Top 5 Mistakes When Writing Spark Applications

In the presentation below from Spark Summit 2016, Mark Grover goes over the top 5 things he has seen in the field that prevent people from getting the most out of their Spark clusters. When some of these issues are addressed, it is not uncommon to see the same job run 10x or 100x faster on the same cluster with the same data, just with a different approach.

The Data Scientist’s Guide to Apache Spark

Looking to dive deeper into the more cutting-edge machine learning use cases in Apache Spark? To successfully use Spark’s advanced analytics capabilities, including large-scale machine learning and graph analysis, check out The Data Scientist’s Guide to Apache Spark, from our friends over at Databricks.

The Data Scientist’s Guide to Apache Spark™

For data scientists looking to apply Apache Spark’s advanced analytics techniques and deep learning models at scale, Databricks is happy to provide The Data Scientist’s Guide to Apache Spark. Download this eBook to: Learn the fundamentals of advanced analytics and receive a crash course in machine learning. Get a deep dive on MLlib, the primary […]

Databricks Launches Delta To Combine the Best of Data Lakes, Data Warehouses and Streaming Systems

Databricks, provider of the leading Unified Analytics Platform and founded by the team who created Apache Spark™, announced Databricks Delta, the first unified data management system that provides the scale and cost-efficiency of a data lake, the query performance of a data warehouse, and the low latency of a streaming ingest system. Databricks Delta, a […]

Apache Spark Expands With Cypher, Neo4j’s ‘SQL For Graphs,’ Adds Declarative Graph Querying

Neo4j, a leader in connected data, announced that it has released the preview version of the Cypher for Apache Spark (CAPS) language toolkit. This combination allows big data analysts to incorporate graphs and graph algorithms in their work, which will dramatically broaden how they reveal connections in their data.

Impetus Technologies Delivers Visual Spark Studio – A New, Free Development Tool to Accelerate Spark Adoption in Enterprises

Impetus Technologies, a big data software products and services company, announced the immediate availability of Visual Spark Studio™, a new standalone tool aimed at addressing the increasing demand for Spark-based analytic and data processing solutions in enterprises.

Interview: Ash Munshi, CEO at Pepperdata

I recently caught up with Ash Munshi, CEO at Pepperdata, to get a rundown on his company, a sense for how big data and DevOps are related, some highlights on new product offerings, and his sense for where Pepperdata is headed in the future.

IBM Combines All-Flash and Storage Software Optimized for Hortonworks

IBM (NYSE: IBM) announced a new all-flash, high-performance data and file management solution for enterprise clients running exabyte-scale big data analytics, cognitive and AI applications. The combined flash and storage software solution has been certified with the Hortonworks Data Platform (HDP) to provide clients with more choice in selecting the right platform for their big data analytics on data processing engines like Hadoop and Spark.

Unravel Data Adds Native Support for Impala and Kafka

Unravel Data, the Application Performance Management (APM) platform designed for Big Data, announced that it has integrated support for Cloudera Impala and Apache Kafka into its platform, allowing users to derive the maximum value from those applications. Unravel continues to offer the only full-stack solution that doesn’t just monitor and unify system-level data, but rather tracks, correlates, and interprets performance data across the full stack in order to optimize, troubleshoot, and analyze from a single pane.

Databricks Simplifies and Scales Deep Learning with New Apache Spark Library

Databricks, the company founded by the creators of the popular Apache Spark project, announced Deep Learning Pipelines, a new library to integrate and scale out deep learning in Apache Spark.