Video: Why use Tables and Graphs for Knowledge Discovery System?

In this video from the 2016 HPC User Forum in Austin, John Feo from PNNL presents: Why use Tables and Graphs for Knowledge Discovery System? “GEMS software provides a scalable solution for graph queries over increasingly large data sets. As computing tools and expertise used in conducting scientific research continue to expand, so have the enormity and diversity of the data being collected. Developed at Pacific Northwest National Laboratory, the Graph Engine for Multithreaded Systems, or GEMS, is a multilayer software system for semantic graph databases. In their work, scientists from PNNL and NVIDIA Research examined how GEMS answered queries on science metadata and compared its scaling performance against generated benchmark data sets. They showed that GEMS could answer queries over science metadata in seconds and scaled well to larger quantities of data.”

Rescuing Lost History: Using Big Data to Recover Black Women’s Lived Experiences

“A lot of times when people think about big data, they think about it in ahistorical times…outside of this political context,” said Ruby Mendenhall, an associate professor of sociology at UIUC. “It’s really important to think about whose voice is digitized, in journals and newspapers. A lot of that for black women has been lost and you need to make a concerted effort to recover it.” Mendenhall’s study employs Latent Dirichlet allocation (LDA) algorithms and comparative text mining to search 800,000 periodicals in JSTOR (Journal Storage) and HathiTrust from 1746 to 2014 to identify the types of conversations that emerge about Black women’s shared experience over time.

Adding Security and More to Intel® Enterprise Edition for Lustre* Software version 3.0

Intel Enterprise Edition for Lustre* Software has taken a leap toward greater enterprise capabilities and improved features for HPC with the release of version 3.0. This latest version includes new security enhancements, dynamic LNET configuration support, ZFS snapshots, and other features requested by the HPC community inside and outside the enterprise. It also adds the Intel Omni-Path Architecture drivers.

The Future of Data Science

Here at insideBIGDATA, we’re very serious about data science and machine learning. Data science holds the potential to dramatically impact our lives and how we work. Despite its promise, many questions about data science remain.

Interview: Dr. Michael Ernst from Brookhaven National Laboratory

I recently caught up with Dr. Michael Ernst, Director of the RHIC and ATLAS Computing Facility at Brookhaven National Laboratory, to discuss how the laboratory has found an innovative and inexpensive way to use AWS cloud spot instances for its work with CERN’s LHC ATLAS experiment, speeding up research during critical time periods.

Video: Machine Learning Overview from NERSC

In this video from the HPC User Forum in Tucson, Prabhat from NERSC presents: Machine Learning. “Prabhat leads the Data and Analytics Services team at NERSC. His current research interests include scientific data management, parallel I/O, high performance computing and scientific visualization.”

Best Practices – Big Data Acceleration

“This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark, and Memcached will be presented. Enhanced designs for these components to exploit in-memory technology and parallel file systems (such as Lustre) will be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project (http://hibd.cse.ohio-state.edu) will be shown.”

Adler Planetarium Uses Attendance Data to Visualize and Enhance Visitor Experience

Adler Planetarium enlisted the help of Chicago-based Inquidia Consulting, a leading data engineering and data science firm, to select and build an advanced data management and visualization platform.

Case Studies: Big Data and Scientific Research

This is the fifth and final article in an editorial series that aims to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.

Big Data and Open Science Data

This article is the fourth in an editorial series that aims to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.