The inside Spark channel is a resource for professionals looking to learn about the benefits of Apache Spark.

MLOps | Is the Enterprise Repeating the Same DIY Mistakes?

August 9, 2022 by Editorial Team

In this contributed article, Aaron Friedman, VP of Operations at Wallaroo.ai, discusses why hiring data scientists isn’t the answer to unlocking ML value (especially at a time when finding qualified candidates is harder than ever).

Filed Under: Big Data, Data Science, Featured, Google News Feed, inside SPARK, Machine Learning, News / Analysis, Opinion, Uncategorized Tagged With: data scientist, Machine Learning, MLOps, Weekly Newsletter Articles

Databricks Announces Major Contributions to Flagship Open Source Projects

July 2, 2022 by Editorial Team

Databricks announced that the company will contribute all features and enhancements it has made to Delta Lake to the Linux Foundation and open source all Delta Lake APIs as part of the Delta Lake 2.0 release. In addition, the company announced MLflow 2.0, which includes MLflow Pipelines, a new feature to accelerate and simplify ML model deployments. Finally, the company introduced Spark Connect, to enable the use of Spark on virtually any device, and Project Lightspeed, a next-generation Spark Structured Streaming engine for data streaming on the lakehouse.

Filed Under: Big Data, Big Data Services, Big Data Software, Cloud, Databricks, Google News Feed, inside SPARK, Machine Learning, Main Feature, News / Analysis, Spark 101, Uncategorized Tagged With: Apache Spark, databricks, lakehouse, MLflow, Weekly Newsletter Articles

Don’t Call It A “Data Product” Unless It Meets These 5 Requirements

June 9, 2022 by Editorial Team

In this special guest feature, Barr Moses, Co-founder and CEO of Monte Carlo, argues that data products can transform an organization’s ability to be data-driven, as long as they meet 5 key requirements and are implemented correctly and in good faith.

Filed Under: Big Data, Featured, Google News Feed, Industry Perspectives, Infrastructure, inside SPARK, News / Analysis, Uncategorized

Databricks Launches SQL Analytics to Enable Cloud Data Warehousing on Data Lakes

November 14, 2020 by Editorial Team

Databricks, the data and AI company, announced the launch of SQL Analytics, which for the first time enables data analysts to perform workloads previously meant only for a data warehouse on a data lake. This expands the traditional scope of the data lake from data science and machine learning to include all data workloads including Business Intelligence (BI) and SQL.

Filed Under: Analytics, Big Data, Big Data Software, Databricks, Featured, Google News Feed, inside SPARK, News / Analysis, Spark 101, Uncategorized Tagged With: analytics, cloud data warehouse, data lake, data warehouse, databricks, SQL, Weekly Newsletter Articles

Understanding Intention: Using Content, Context, and the Crowd to Build Better Search Applications

January 8, 2020 by Editorial Team

This white paper by enterprise search specialists Lucidworks points out that unlike consumer search, which has become a seamless part of our everyday lives, the enterprise side might as well still be running Windows 95. Imagine if Amazon, Google, or Facebook treated every user the same, regardless of who they are, where they are, what they’re searching for, and what they’ve clicked. Your users expect that same sophistication in their enterprise apps.

Filed Under: Analytics, Big Data, Featured, Google News Feed, inside SPARK, News / Analysis, Sponsored Post, Uncategorized, White Papers Tagged With: enterprise search, Lucidworks, Weekly Featured Newsletter Post

StreamSets Launches StreamSets Transformer

September 15, 2019 by Editorial Team

StreamSets, Inc., provider of the DataOps platform for modern data integration, released StreamSets® Transformer, a simple-to-use, drag-and-drop UI tool to create native Apache Spark applications. Designed for a wide range of users — even those without specialized skills — StreamSets Transformer enables the creation of pipelines for performing ETL, stream processing and machine-learning operations. Now, data engineers, scientists, architects and operators gain deep visibility into the execution of Apache Spark while broadening usage across the business.

Filed Under: Databricks, Google News Feed, inside SPARK, Main Feature, News / Analysis, Uncategorized Tagged With: Apache Spark, Weekly Newsletter Articles

Addressing Governmental Challenges when Engaging AI, ML and Data Analytics

June 19, 2019 by Daniel Gutierrez

Gartner recently stated that all industries and levels of government agree the top three game-changing technologies today are AI/machine learning, data analytics/predictive analytics and cloud technologies. However, there are some primary sticking points when it comes to innovation in these areas. Government organizations continue to encounter challenges when trying to pursue these initiatives due to complex security and compliance requirements, poor scalability of legacy IT infrastructure, and perceived risks associated with cloud and IT modernization efforts. How can these challenges be addressed?

Filed Under: Big Data, Databricks, Featured, Google News Feed, Government, inside SPARK, News / Analysis, Opinion, Uncategorized Tagged With: AI, artificial intelligence, data analytics, Government, Machine Learning, Weekly Newsletter Articles

The Power of Crunching Big Data Effectively

March 31, 2019 by Editorial Team

In this contributed article, Lex Boost, CEO of Leaseweb USA, points out that according to an Accenture study, 79% of enterprise executives agree that companies not embracing big data will lose their competitive edge. Considering that data creation is on track to grow 10-fold by 2025, it’s crucial for companies to be able to process it more quickly and meaningfully.

Filed Under: Big Data, Cloudera, Featured, Google News Feed, Hadoop, inside Hadoop, inside SPARK, News / Analysis, Opinion, Uncategorized Tagged With: Big Data, Weekly Newsletter Articles

Databricks and RStudio Introduce New Version of MLflow with R Integration

October 14, 2018 by Editorial Team

Databricks, a leader in unified analytics founded by the original creators of Apache Spark™, and RStudio today announced a new release of MLflow, an open source multi-cloud framework for the machine learning lifecycle, now with R integration. RStudio has partnered with Databricks to develop an R API for MLflow v0.7.0.

Filed Under: Databricks, Google News Feed, inside SPARK, Machine Learning, Main Feature, News / Analysis, Uncategorized Tagged With: Machine Learning, Weekly Newsletter Articles

State of the Art Natural Language Processing at Scale

July 5, 2018 by Editorial Team

The two-part presentation below from the Spark+AI Summit 2018 is a deep dive into key design choices made in the NLP library for Apache Spark. The library natively extends the Spark ML pipeline APIs, which enables zero-copy, distributed, combined NLP, ML & DL pipelines, leveraging all of Spark’s built-in optimizations.

Filed Under: Big Data, Featured, Google News Feed, inside SPARK, Machine Learning, News / Analysis, Uncategorized, Video Tagged With: Apache Spark, NLP, Weekly Newsletter Articles

