Interview: David Steinmetz, Machine Learning Engineer at Capital One

I recently caught up with David Steinmetz, a Machine Learning Engineer at Capital One Bank, to discuss how to get a job at Capital One, the types of skills the company is looking for, and what his typical day looks like.

The Importance of Vectorization Resurfaces

Vectorization offers potential speedups in codes with significant array-based computations—speedups that amplify the improved performance obtained through higher-level, parallel computations using threads and distributed execution on clusters. Key features for vectorization include tunable array sizes to reflect various processor cache and instruction capabilities and stride-1 accesses within inner loops.
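To make the "stride-1 accesses within inner loops" point concrete, here is a minimal sketch in C. It is not taken from the article: the function names, the flat row-major layout, and the 4 x 8 array size are all assumptions chosen for illustration. Both functions compute the same sum, but only the first walks memory contiguously in its inner loop, which is the pattern compilers generally find easiest to vectorize. Whether a given compiler actually vectorizes either loop depends on its optimization flags and floating-point settings.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a rows x cols array (flat, row-major) with 0, 1, 2, ... */
void fill_sequence(double *a, size_t rows, size_t cols) {
    for (size_t k = 0; k < rows * cols; ++k)
        a[k] = (double)k;
}

/* Stride-1 inner loop: j moves through consecutive memory locations. */
double sum_stride1(const double *a, size_t rows, size_t cols) {
    double total = 0.0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            total += a[i * cols + j];
    return total;
}

/* Strided inner loop: i jumps `cols` elements on every iteration. */
double sum_strided(const double *a, size_t rows, size_t cols) {
    double total = 0.0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            total += a[i * cols + j];
    return total;
}

/* Small self-check: both traversal orders visit the same elements. */
void demo(void) {
    double a[4 * 8]; /* hypothetical size; tune rows/cols per target cache */
    fill_sequence(a, 4, 8);
    assert(sum_stride1(a, 4, 8) == sum_strided(a, 4, 8));
}
```

Because `rows` and `cols` are passed as parameters, the array dimensions stay tunable, in the spirit of the "tunable array sizes" the excerpt mentions.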

“Above the Trend Line” – Your Industry Rumor Central for 9/18/2017

Above the Trend Line: your industry rumor central is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items grouped by category such as people movements, funding news, financial results, industry alignments, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz.

‘Learning Database’ Speeds Queries from Hours to Seconds

University of Michigan researchers developed software called Verdict that enables existing databases to learn from each query a user submits, finding accurate answers without trawling through the same data again and again. Verdict allows databases to deliver answers more than 200 times faster while maintaining 99 percent accuracy. In a research environment, that could mean getting answers in seconds instead of hours or days.

“Above the Trend Line” – Your Industry Rumor Central for 9/11/2017

Above the Trend Line: your industry rumor central is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items grouped by category such as people movements, funding news, financial results, industry alignments, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz.

Julia: A High-Level Language for Supercomputing and Big Data

Julia is a new language for technical computing that is meant to address the problem of language environments not designed to run efficiently on large compute clusters. It reads like Python or Octave, but performs as well as C. It has built-in primitives for multi-threading and distributed computing, allowing applications to scale to millions of cores. In addition to HPC, Julia is also gaining traction in the data science community.

“Above the Trend Line” – Your Industry Rumor Central for 9/4/2017

Above the Trend Line: your industry rumor central is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items grouped by category such as people movements, funding news, financial results, industry alignments, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz.

TOP 10 insideBIGDATA Articles for August 2017

In this continuing regular feature, we give all our valued readers a monthly heads-up for the top 10 most viewed articles appearing on insideBIGDATA. Over the past several months, we’ve heard from many of our followers that this feature will enable them to catch up with important news and features flowing across our many channels. We’re happy to oblige! We understand that busy big data professionals can’t check the site every day.

From the Editor’s Bookshelf: My Favorite Titles for Data Science and Machine Learning

As a practicing data scientist, I’ve spent years building up my library of academic and practical resources that I routinely draw upon to help me do my work. Although my library is vast, I have a select group of books that occupy a prominent position on my desk. I’ve been asked about my “favorite titles” list enough times that I thought I’d write this article for my readers.

Taking Control of System Storage Performance

The Intel Storage Performance Snapshot Tool gives you a fast, high-level look at system storage performance and helps you understand the potential benefits of moving to faster storage. To demonstrate the power of this tool, let’s consider two snapshots while running a MySQL database workload against the same system but with two storage configurations.