Apache Spark Archives - insideBIGDATA

Databricks Announces Major Contributions to Flagship Open Source Projects

July 2, 2022 by Editorial Team

Databricks announced that it will contribute all the features and enhancements it has made to Delta Lake to the Linux Foundation and open source all Delta Lake APIs as part of the Delta Lake 2.0 release. The company also announced MLflow 2.0, which includes MLflow Pipelines, a new feature to accelerate and simplify ML model deployment. Finally, it introduced Spark Connect, which enables Spark to be used on virtually any device, and Project Lightspeed, a next-generation Spark Structured Streaming engine for data streaming on the lakehouse.


StreamSets Launches StreamSets Transformer

September 15, 2019 by Editorial Team

StreamSets, Inc., provider of the DataOps platform for modern data integration, released StreamSets® Transformer, an easy-to-use, drag-and-drop UI tool for creating native Apache Spark applications. Designed for a wide range of users, including those without specialized skills, StreamSets Transformer enables the creation of pipelines for ETL, stream processing and machine learning operations. Data engineers, scientists, architects and operators gain deep visibility into the execution of Apache Spark, while Spark usage broadens across the business.


State of the Art Natural Language Processing at Scale

July 5, 2018 by Editorial Team

The two-part presentation below from Spark+AI Summit 2018 is a deep dive into key design choices made in the NLP library for Apache Spark. The library natively extends the Spark ML pipeline APIs, which enables zero-copy, distributed, combined NLP, ML and DL pipelines that leverage all of Spark’s built-in optimizations.
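The pipeline pattern the library builds on can be illustrated without Spark. The following is a minimal, stdlib-only Python sketch of the Estimator/Transformer idea behind Spark ML pipelines, assuming rows are plain dicts rather than DataFrames. Every class here (`ToyTokenizer`, `ToyVocabulary`, `ToyPipeline` and so on) is hypothetical and is neither the Spark nor the NLP library's API; the sketch also does not model distribution or zero-copy behavior. It only shows why an NLP stage exposing the same `fit`/`transform` interface as an ML stage can be chained with it in one pipeline.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumption: a row is a plain dict, not a Spark DataFrame row


class ToyTokenizer:
    """Transformer (NLP stage): adds a lower-cased 'tokens' column from 'text'."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "tokens": str(r["text"]).lower().split()} for r in rows]


class ToyVocabularyModel:
    """Transformer produced by fitting ToyVocabulary: adds a count-vector 'features' column."""

    def __init__(self, vocab: Dict[str, int]) -> None:
        self.vocab = vocab

    def transform(self, rows: List[Row]) -> List[Row]:
        out = []
        for r in rows:
            counts = [0] * len(self.vocab)
            for tok in r["tokens"]:
                if tok in self.vocab:  # tokens unseen during fit are ignored
                    counts[self.vocab[tok]] += 1
            out.append({**r, "features": counts})
        return out


class ToyVocabulary:
    """Estimator (ML stage): learns a token-to-index mapping in first-seen order."""

    def fit(self, rows: List[Row]) -> ToyVocabularyModel:
        vocab: Dict[str, int] = {}
        for r in rows:
            for tok in r["tokens"]:
                vocab.setdefault(tok, len(vocab))
        return ToyVocabularyModel(vocab)


class ToyPipelineModel:
    """Applies each fitted stage's transform() in order."""

    def __init__(self, stages: List[Any]) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline:
    """Fits estimators in order, feeding each stage the previous stage's output."""

    def __init__(self, stages: List[Any]) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> ToyPipelineModel:
        fitted = []
        for stage in self.stages:
            # Estimators are fitted first; plain transformers are used as-is.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


train = [{"text": "Spark runs NLP"}, {"text": "Spark runs ML"}]
model = ToyPipeline([ToyTokenizer(), ToyVocabulary()]).fit(train)
result = model.transform([{"text": "NLP on Spark"}])
print(result[0]["features"])  # prints [1, 0, 1, 0] (vocabulary order: spark, runs, nlp, ml)
```

In PySpark itself, the corresponding entry point is `pyspark.ml.Pipeline`, whose `fit()` returns a `PipelineModel` that applies each fitted stage's `transform()` in order.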


Databricks Partners with RStudio To Increase Productivity of Data Science Teams

June 29, 2018 by Editorial Team

Databricks, a leader in unified analytics and founded by the original creators of Apache Spark™, announced a partnership with RStudio, providers of a free and open-source integrated development environment for R, to increase the productivity of data science teams. The partnership will allow the two companies to seamlessly integrate Databricks’ Unified Analytics Platform with the RStudio Server, simplifying R programming on big data.


Apache Spark 2.0: A Deep Dive Into Structured Streaming

May 28, 2018 by Editorial Team

In this talk, Tathagata Das takes a deep dive into the concepts and API of Structured Streaming and shows how they simplify building complex “Continuous Applications.” Tathagata is an Apache Spark committer and a member of the PMC. He is the lead developer behind Spark Streaming and is currently employed at Databricks.
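A central idea of Structured Streaming, as described in the Spark programming guide, is to treat a stream as an unbounded input table and run an ordinary query on it incrementally. The following minimal, stdlib-only Python sketch illustrates that idea under stated assumptions: the class and function names are hypothetical, the running state is a plain `Counter` rather than Spark's managed state store, and none of this is the Spark API. It checks that folding in micro-batches one at a time yields the same word counts as rerunning the batch query over everything seen so far, which is the result table that "complete" output mode reports.

```python
from collections import Counter
from typing import Iterable, List


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch query: count words over a static table of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class IncrementalWordCount:
    """Streaming query: keeps running state and folds in each new micro-batch."""

    def __init__(self) -> None:
        self.state: Counter = Counter()  # stand-in for Spark's aggregation state

    def process_batch(self, new_lines: List[str]) -> Counter:
        # Only the newly appended rows are scanned; earlier rows are not re-read.
        for line in new_lines:
            self.state.update(line.split())
        # "Complete" mode: emit a copy of the whole updated result table.
        return Counter(self.state)


# Hypothetical micro-batches arriving on the unbounded input table.
micro_batches = [["spark streaming", "spark sql"], ["structured streaming"], []]

query = IncrementalWordCount()
seen: List[str] = []  # everything appended to the input table so far
for batch in micro_batches:
    seen.extend(batch)
    result = query.process_batch(batch)
    # Incremental result equals the batch query rerun over the whole prefix.
    assert result == batch_word_count(seen)

print(dict(result))  # prints {'spark': 2, 'streaming': 2, 'sql': 1, 'structured': 1}
```

In PySpark, the equivalent query would read its input with `spark.readStream`, aggregate with the usual DataFrame operations, and write with `writeStream.outputMode("complete")`; Spark, not user code, manages the incremental state.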


Top 5 Mistakes When Writing Spark Applications

January 7, 2018 by Editorial Team

In the presentation below from Spark Summit 2016, Mark Grover goes over the top 5 things he has seen in the field that keep people from getting the most out of their Spark clusters. Once some of these issues are addressed, it is not uncommon to see the same job run 10x or 100x faster on the same cluster with the same data, using just a different approach.


The Data Scientist’s Guide to Apache Spark

December 27, 2017 by Editorial Team

Looking to dive deeper into the more cutting-edge machine learning use cases in Apache Spark? To make effective use of Spark’s advanced analytics capabilities, including large-scale machine learning and graph analysis, check out The Data Scientist’s Guide to Apache Spark from our friends at Databricks.


The Data Scientist’s Guide to Apache Spark™

December 26, 2017 by Daniel Gutierrez

For data scientists looking to apply Apache Spark’s advanced analytics techniques and deep learning models at scale, Databricks is happy to provide The Data Scientist’s Guide to Apache Spark. Download this eBook to: Learn the fundamentals of advanced analytics and receive a crash course in machine learning. Get a deep dive on MLlib, the primary […]


Databricks Launches Delta To Combine the Best of Data Lakes, Data Warehouses and Streaming Systems

October 26, 2017 by Editorial Team

Databricks, provider of the leading Unified Analytics Platform and founded by the team who created Apache Spark™, announced Databricks Delta, the first unified data management system that provides the scale and cost-efficiency of a data lake, the query performance of a data warehouse, and the low latency of a streaming ingest system. Databricks Delta, a […]


Apache Spark Expands With Cypher, Neo4j’s ‘SQL For Graphs,’ Adds Declarative Graph Querying

October 24, 2017 by Editorial Team

Neo4j, a leader in connected data, announced the release of a preview version of the Cypher for Apache Spark (CAPS) language toolkit. Combining Cypher with Spark allows big data analysts to incorporate graphs and graph algorithms into their work, which the company says will dramatically broaden how they reveal connections in their data.


