In this video from the 2016 HPC User Forum in Austin, John Feo from PNNL presents: Why use Tables and Graphs for Knowledge Discovery System? “GEMS software provides a scalable solution for graph queries over increasingly large data sets. As computing tools and expertise used in conducting scientific research continue to expand, so have the enormity and diversity of the data being collected. Developed at Pacific Northwest National Laboratory, the Graph Engine for Multithreaded Systems, or GEMS, is a multilayer software system for semantic graph databases. In their work, scientists from PNNL and NVIDIA Research examined how GEMS answered queries on science metadata and compared its scaling performance against generated benchmark data sets. They showed that GEMS could answer queries over science metadata in seconds and scaled well to larger quantities of data.”
“A lot of times when people think about big data, they think about it in ahistorical times…outside of this political context,” said Ruby Mendenhall, an associate professor of sociology at UIUC. “It’s really important to think about whose voice is digitized, in journals and newspapers. A lot of that for black women has been lost and you need to make a concerted effort to recover it.” Mendenhall’s study uses Latent Dirichlet allocation (LDA) algorithms and comparative text mining to search 800,000 periodicals in JSTOR (Journal Storage) and HathiTrust, published from 1746 to 2014, and identify the types of conversations that emerge about Black women’s shared experience over time.
“Benchmarks, customer experiences, and the technical literature have shown that code modernization can greatly increase application performance on both Intel Xeon and Intel Xeon Phi processors. Colfax Research recently published a study showing that image tagging performance using the open source NeuralTalk2 software can be improved 28x on Intel Xeon processors and by over 55x on the latest Intel Xeon Phi processors.”
In this special guest feature, Rob Farber from TechEnablement writes that the Intel Scalable Systems Framework is pushing the boundaries of machine learning performance. “Machine learning and other data-intensive HPC workloads cannot scale unless the storage filesystem can scale to meet the increased demands for data.”
Intel Enterprise Edition for Lustre Software has taken a leap toward greater enterprise capabilities and improved features for HPC with the release of version 3.0. This latest version includes new security enhancements, dynamic LNET configuration support, ZFS snapshots, and other features requested by the HPC community inside and outside the enterprise. It also adds the Intel Omni-Path Architecture drivers.
“Presto is a perfect fit with the Teradata Unified Data Architecture, an integrated analytical ecosystem for our enterprise customers. Presto enables companies to leverage standard ANSI SQL to execute interactive queries against Hadoop data. With Presto, utilizing Teradata’s Query Grid connector for Presto, customers can execute queries that originate in Teradata Integrated Data Warehouse that join data within the IDW and Hadoop leveraging Presto.”
In this slidecast, Marc Hamilton from NVIDIA describes the latest updates to the company’s Deep Learning Platform. “Great hardware needs great software. To help data scientists and developers make the most of the vast opportunities in deep learning, we’re announcing today at the International Supercomputing show, ISC16, a trio of new capabilities for our deep learning software platform. The three — NVIDIA DIGITS 4, CUDA Deep Neural Network Library (cuDNN) 5.1 and the new GPU Inference Engine (GIE) — are powerful tools that make it even easier to create solutions on our platform.”
In this video from the HPC User Forum in Tucson, Prabhat from NERSC presents: Machine Learning. “Prabhat leads the Data and Analytics Services team at NERSC. His current research interests include scientific data management, parallel I/O, high performance computing and scientific visualization.”
“This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark, and Memcached will be presented. Enhanced designs for these components to exploit in-memory technology and parallel file systems (such as Lustre) will be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project (http://hibd.cse.ohio-state.edu) will be shown.”
Today Cornell University announced the Aristotle Cloud Federation, a five-year, $5 million project sponsored by the National Science Foundation to build a federated cloud composed of data infrastructure building blocks (DIBBs) and designed to support scientists and engineers who require flexible workflows and analysis tools for large-scale data sets.