Interview: Ida Johnsson, Ph.D. Candidate at the Department of Economics at USC

I recently caught up with Ida Johnsson, a Ph.D. Candidate at the Department of Economics at University of Southern California, to discuss how she is actively transitioning to the field of data science. This interview can serve as a compelling example for others wishing to move into the field of data science from other disciplines and explore career opportunities.

Hadoop 3.0 Perspectives by Hortonworks’ Hadoop YARN & MapReduce Development Lead, Vinod Kumar Vavilapalli

In the Q&A below, Vinod Kumar Vavilapalli, Hortonworks’ Hadoop YARN & MapReduce Development Lead, offers his perspectives on the recent release of Hadoop 3.0, the latest version of the open source software framework for reliable, scalable, distributed computing.

Interview: Joe Pasqua, Executive Vice President of Products at MarkLogic

I recently spoke with Joe Pasqua, Executive Vice President of Products at MarkLogic. Our discussion touched on a number of related issues, including the importance of effective data integration as organizations work to implement initiatives as broad as digital transformation and as specific as compliance with the new General Data Protection Regulation (GDPR), which goes into effect this May.

Top 5 Mistakes When Writing Spark Applications

In the presentation below from Spark Summit 2016, Mark Grover goes over the top 5 things that he’s seen in the field that prevent people from getting the most out of their Spark clusters. When some of these issues are addressed, it is not uncommon to see the same job running 10x or 100x faster with the same clusters, the same data, just a different approach.

The Data Scientist’s Guide to Apache Spark

Looking to dive deeper into the more cutting-edge machine learning use cases in Apache Spark? To successfully use Spark’s advanced analytics capabilities, including large-scale machine learning and graph analysis, check out The Data Scientist’s Guide to Apache Spark, from our friends over at Databricks.

Interview: Shalini Agarwal, Director, Engineering and Product at LinkedIn

I recently caught up with Shalini Agarwal, Director, Engineering and Product at LinkedIn, to discuss why we need more data scientists to make our applications smarter, and how automated workflows and tools can make data scientists more efficient and let them accomplish more. Non-data scientists can also use these tools to leverage the established workflows, removing repetitive tasks from the mountain of work expected of a data scientist.

AtScale Brings its Universal Semantic Layer to the AWS Cloud

AtScale announced the preview availability of its universal semantic platform for business intelligence (BI) on Amazon Redshift. With this offering, enterprises will gain faster time to insight by deploying big data analytics on the Amazon cloud and benefit from an enhanced ROI by running production-ready workloads on the cost-effective Amazon cloud platform.

Interview: Ben Bromhead, CTO and Co-founder at Instaclustr

I recently caught up with Ben Bromhead, CTO and Co-founder at Instaclustr, to discuss the departure of DataStax from the Apache Cassandra open source project, and how there’s now a void with regard to Cassandra dev community health, database feature upgrades, and overall commits. Instaclustr is looking to step into this space, fill the vacuum, get commits back up, and replace DataStax on these initiatives.

Interview: Ayush Parashar, Co-Founder and Vice President of Engineering at Unifi Software

I recently caught up with Ayush Parashar, Co-Founder and Vice President of Engineering at Unifi Software, to discuss the role and value of metadata as enterprises embrace self-service access to data. Unifi leverages metadata in almost every aspect of what they do. From delivering an elegant user experience to powering AI, it’s a key asset in supporting the capabilities and functionality they are able to deliver to a business user.

Databricks Launches Delta To Combine the Best of Data Lakes, Data Warehouses and Streaming Systems

Databricks, provider of the leading Unified Analytics Platform and founded by the team who created Apache Spark™, announced Databricks Delta, the first unified data management system that provides the scale and cost-efficiency of a data lake, the query performance of a data warehouse, and the low latency of a streaming ingest system. Databricks Delta, a […]