Monte Carlo Simulations in Ad-Lift Measurement Using Spark

In this talk from Spark Summit East 2016, Prasad Chalasani explores some of the challenges that arise in setting up scalable simulations for a specific application, and shares solutions and lessons learned along the way, in both mathematics and programming.

Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming

In the talk below, Michael Armbrust gives an overview of some of the exciting new APIs available in Spark 2.0, namely Datasets and Structured Streaming. Together, these APIs bring the power of Catalyst, Spark SQL’s query optimizer, to all users of Spark.

IBM Unleashes the Power of Machine Learning with Watson-enabled Data Platform

IBM (NYSE:IBM) announced IBM Watson Data Platform to help companies gain more valuable insights from data. According to IBM, the platform delivers the world’s fastest data ingestion engine and cognitive-powered decision-making to data professionals, allowing them to collaborate in the IBM Cloud using the services they prefer. IBM is also making the IBM Watson Machine Learning Service available, aiming to make machine learning simple through an intuitive, self-service interface.

Databricks Sets New World Record for CloudSort Benchmark Using Apache Spark at $1.44 Per Terabyte

Databricks®, the company founded by the team that created the popular Apache® Spark™ project, announced that, in collaboration with industry partners, it has broken the world record in the CloudSort Benchmark, a third-party industry benchmarking competition for processing large datasets.

Apache Spark Survey Reveals Increased Growth in Users and New Workloads Including Exploratory Data Science and Machine Learning

In order to better understand Apache Spark’s growing role in big data, Taneja Group conducted a major market research project, surveying approximately 7,000 people. The sample was made up of technical and managerial job roles from around the world directly involved in big data.

Splice Machine Announces Native PL/SQL Support to Accelerate Migrations from Oracle to Hadoop

Splice Machine, provider of the open-source SQL RDBMS powered by Hadoop and Spark, announced that its database now supports PL/SQL natively.

Bigstep Launches High-Performance, Low-Latency Spark-as-a-Service for Real-Time Streaming Applications

Bigstep, the big data cloud provider, today launched a bare-metal Spark-as-a-Service offering.

Databricks Adds Deep Learning Support to Cloud-Based Apache Spark Platform

Databricks®, the company founded by the creators of the Apache® Spark™ project, today announced the addition of deep learning support to its cloud-based Apache Spark platform.

Distributed System Architectures for Healthcare and Life Sciences

The insideBIGDATA Guide to Healthcare & Life Sciences is a useful new resource directed toward enterprise thought leaders who wish to gain strategic insights into this exciting new area of technology. This segment focuses on the use of distributed system architectures, specifically Hadoop and Spark.

Apache Spark Survey 2016 Report

More than 1,600 members of the Apache Spark community from over 900 organizations have spoken, and Spark continues to be the most active open-source project in the big data space today. The 2016 Databricks Apache Spark Survey shows a rise in production deployments of Spark in the public cloud, as well as an increased usage […]