Ohio State Launches High-Performance Deep Learning Project

Deep learning is one of the hottest topics at SC16. Now, DK Panda and his team at Ohio State University have announced an exciting new High-Performance Deep Learning project that aims to bring HPC technologies to the DL field. “Welcome to the High-Performance Deep Learning project created by the Network-Based Computing Laboratory of The Ohio State University. Availability of large data sets like ImageNet and massively parallel computation support in modern HPC devices like NVIDIA GPUs have fueled a renewed interest in Deep Learning (DL) algorithms. This has triggered the development of DL frameworks like Caffe, Torch, TensorFlow, and CNTK. However, most DL frameworks have been limited to a single node. The objective of the HiDL project is to exploit modern HPC technologies and solutions to scale out and accelerate DL frameworks.”
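
The HiDL designs themselves live in the project's released packages, but the core scale-out idea, keeping model replicas in sync by averaging gradients across nodes with MPI, can be sketched in a few lines. The example below is only a minimal illustration using mpi4py and NumPy on a toy least-squares model; it is not code from the HiDL project, and the model, data, and learning rate are placeholders.

```python
# Minimal sketch of data-parallel training across nodes with MPI
# (mpi4py + NumPy). Illustrates the general scale-out idea only;
# this is not code from the HiDL project.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)
weights = np.zeros(10)            # model parameters, replicated on every rank

for step in range(100):
    # Each rank computes a gradient on its own shard of data
    # (random data stands in for a real mini-batch here).
    x = rng.standard_normal((32, 10))
    y = x @ np.arange(10.0) + rng.standard_normal(32) * 0.1
    grad = -2.0 * x.T @ (y - x @ weights) / len(y)

    # Allreduce sums gradients across all ranks, keeping replicas in sync.
    global_grad = np.empty_like(grad)
    comm.Allreduce(grad, global_grad, op=MPI.SUM)
    global_grad /= size

    weights -= 0.01 * global_grad  # synchronized SGD update

if rank == 0:
    print("final weights:", weights)
```

Run under an MPI launcher (for example, `mpirun -np 4 python train.py`); each additional rank adds another shard of data per step while the allreduce keeps every replica identical.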

Video: Why use Tables and Graphs for Knowledge Discovery System?

In this video from the 2016 HPC User Forum in Austin, John Feo from PNNL presents: Why use Tables and Graphs for Knowledge Discovery System? “GEMS software provides a scalable solution for graph queries over increasingly large data sets. As computing tools and expertise used in conducting scientific research continue to expand, so have the enormity and diversity of the data being collected. Developed at Pacific Northwest National Laboratory, the Graph Engine for Multithreaded Systems, or GEMS, is a multilayer software system for semantic graph databases. In their work, scientists from PNNL and NVIDIA Research examined how GEMS answered queries on science metadata and compared its scaling performance against generated benchmark data sets. They showed that GEMS could answer queries over science metadata in seconds and scaled well to larger quantities of data.”
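
To make the idea of a semantic graph query concrete, here is a toy sketch in the SPARQL style that engines like GEMS execute at scale. It uses rdflib over a tiny in-memory graph rather than GEMS itself, and the "science metadata" triples are invented for illustration.

```python
# Toy semantic-graph query in the SPARQL style that engines like GEMS
# run at scale. Uses rdflib on an in-memory graph, not GEMS itself;
# the metadata triples below are made up.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/meta/")
g = Graph()

# A few metadata triples: dataset -> creator, instrument, year
g.add((EX.dataset1, EX.creator, Literal("Smith")))
g.add((EX.dataset1, EX.instrument, EX.massSpectrometer))
g.add((EX.dataset1, EX.year, Literal(2014)))
g.add((EX.dataset2, EX.creator, Literal("Jones")))
g.add((EX.dataset2, EX.instrument, EX.massSpectrometer))
g.add((EX.dataset2, EX.year, Literal(2015)))

# Which datasets were collected with a mass spectrometer, and by whom?
query = """
PREFIX ex: <http://example.org/meta/>
SELECT ?dataset ?creator WHERE {
    ?dataset ex:instrument ex:massSpectrometer .
    ?dataset ex:creator ?creator .
}
"""
for row in g.query(query):
    print(row.dataset, row.creator)
```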

Rescuing Lost History: Using Big Data to Recover Black Women’s Lived Experiences

“A lot of times when people think about big data, they think about it in ahistorical times…outside of this political context,” said Ruby Mendenhall, an associate professor of sociology at UIUC. “It’s really important to think about whose voice is digitized, in journals and newspapers. A lot of that for black women has been lost and you need to make a concerted effort to recover it.” Mendenhall’s study employs Latent Dirichlet allocation (LDA) algorithms and comparative text mining to search 800,000 periodicals in JSTOR (Journal Storage) and HathiTrust from 1746 to 2014 to identify the types of conversations that emerge about Black women’s shared experience over time.
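
As a rough illustration of the approach (not the study's actual pipeline), the sketch below fits an LDA topic model with scikit-learn on a handful of invented text snippets; the real analysis runs over roughly 800,000 periodicals with far more careful preprocessing and comparative text mining.

```python
# Minimal sketch of LDA topic modeling in the spirit of the study,
# using scikit-learn on a few made-up snippets; the actual work spans
# ~800,000 JSTOR/HathiTrust periodicals from 1746 to 2014.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "club women organized mutual aid societies in the city",
    "migration north brought women into domestic and factory work",
    "church networks supported education and community organizing",
    "editorials debated suffrage citizenship and civil rights",
]

# Bag-of-words counts, then an LDA model with a small number of topics.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top words per topic to inspect the conversations that emerge.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```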

Adding Security and More to Intel® Enterprise Edition for Lustre* Software version 3.0

Intel Enterprise Edition for Lustre* Software has taken a leap toward greater enterprise capabilities and improved features for HPC with the release of version 3.0. This latest version includes new security enhancements, dynamic LNET configuration support, ZFS snapshots, and other features requested by the HPC community inside and outside the enterprise. It also adds the Intel Omni-Path Architecture drivers.

The Future of Data Science

Here at insideBIGDATA, we’re very serious about data science and machine learning. Data science holds the potential to dramatically impact our lives and how we work. Despite its promise, many questions about data science remain.

Interview: Dr. Michael Ernst from Brookhaven National Laboratory

I recently caught up with Dr. Michael Ernst, Director of the RHIC and ATLAS Computing Facility at Brookhaven National Laboratory, to discuss how the lab has found an innovative and inexpensive way to use AWS spot instances to speed up research for CERN's LHC ATLAS experiment during critical time periods.
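
For readers curious about the mechanics, the sketch below shows one way to request EC2 Spot capacity with boto3. It is only an illustration of the general approach discussed in the interview; the AMI, instance type, bid price, and region are placeholders, not Brookhaven's configuration.

```python
# Hedged sketch of requesting EC2 Spot capacity with boto3, the kind of
# mechanism discussed in the interview. AMI ID, instance type, bid price,
# and region are placeholders, not Brookhaven's actual setup.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.request_spot_instances(
    SpotPrice="0.10",               # maximum bid per instance-hour (placeholder)
    InstanceCount=10,               # burst capacity for a compute campaign
    LaunchSpecification={
        "ImageId": "ami-12345678",  # placeholder AMI with the experiment software
        "InstanceType": "c4.8xlarge",
    },
)

for req in response["SpotInstanceRequests"]:
    print(req["SpotInstanceRequestId"], req["State"])
```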

Video: Machine Learning Overview from NERSC

In this video from the HPC User Forum in Tucson, Prabhat from NERSC presents: Machine Learning. “Prabhat leads the Data and Analytics Services team at NERSC. His current research interests include scientific data management, parallel I/O, high performance computing and scientific visualization.”

Best Practices – Big Data Acceleration

“This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark, and Memcached will be presented. Enhanced designs for these components to exploit in-memory technology and parallel file systems (such as Lustre) will be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project (http://hibd.cse.ohio-state.edu) will be shown.”

Adler Planetarium Uses Attendance Data to Visualize and Enhance Visitor Experience

Adler Planetarium enlisted the help of Chicago-based Inquidia Consulting, a data engineering and data science firm, to select and build an advanced data management and visualization platform.

Case Studies: Big Data and Scientific Research

This is the fifth and final article in an editorial series whose goal is to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.