CIOs Say Data Management is Critical for Successful AI Adoption in New Global Research Report

A new survey report by MIT Technology Review Insights highlights AI and data management as essential pillars of enterprise success, but finds that a majority of respondents cite data mismanagement as a critical factor that could jeopardize their company’s future AI success. The report, “CIO vision 2025: Bridging the gap between BI and AI,” was conducted in May and June 2022 in association with Databricks, pioneer of the lakehouse architecture.

Databricks Announces Major Contributions to Flagship Open Source Projects

Databricks announced that it will contribute all features and enhancements it has made to Delta Lake to the Linux Foundation and open-source all Delta Lake APIs as part of the Delta Lake 2.0 release. In addition, the company announced MLflow 2.0, which includes MLflow Pipelines, a new feature to accelerate and simplify ML model deployments. Finally, the company introduced Spark Connect, which enables the use of Spark on virtually any device, and Project Lightspeed, a next-generation Spark Structured Streaming engine for data streaming on the lakehouse.

Databricks Announces General Availability of Delta Live Tables

Databricks, the Data and AI company and pioneer of the data lakehouse paradigm, announced the general availability of Delta Live Tables (DLT), the first ETL framework to use a simple declarative approach to build reliable data pipelines and to automatically manage data infrastructure at scale. Turning SQL queries into production ETL pipelines often requires a lot of tedious, complicated operational work. Because DLT uses modern software engineering practices to automate the most time-consuming parts of data engineering, data engineers and analysts can concentrate on delivering data rather than on operating and maintaining pipelines.

Databricks Launches Data Lakehouse for Retail and Consumer Goods Customers

Databricks, the Data and AI company and pioneer of the data lakehouse architecture, announced the Databricks Lakehouse for Retail, the company’s first industry-specific data lakehouse for retailers and consumer goods (CG) customers. Lakehouse for Retail gives data teams a centralized data and AI platform tailored to help solve the most critical data challenges facing retailers, their partners, and their suppliers.

How ML Powers Data Access Governance with Immuta & Databricks

If data isn’t accessible for real-time analytics, is it still valuable? Immuta’s native Databricks integration avoids this dilemma by using ML to streamline data access governance and deliver analytics-ready data quickly and securely. For Databricks users leveraging Immuta, ML drives sensitive data discovery, dynamic access control, and consistent policy enforcement.

Databricks Launches SQL Analytics to Enable Cloud Data Warehousing on Data Lakes

Databricks, the data and AI company, announced the launch of SQL Analytics, which for the first time enables data analysts to run workloads on a data lake that previously required a data warehouse. This expands the traditional scope of the data lake from data science and machine learning to all data workloads, including Business Intelligence (BI) and SQL.

StreamSets Launches StreamSets Transformer

StreamSets, Inc., provider of the DataOps platform for modern data integration, released StreamSets® Transformer, a simple-to-use, drag-and-drop UI tool to create native Apache Spark applications. Designed for a wide range of users, even those without specialized skills, StreamSets Transformer enables the creation of pipelines for performing ETL, stream processing and machine-learning operations. Now, data engineers, scientists, architects and operators gain deep visibility into the execution of Apache Spark while broadening usage across the business.

Addressing Governmental Challenges when Engaging AI, ML and Data Analytics

Gartner recently stated that all industries and levels of government agree the top three game-changing technologies today are AI/machine learning, data analytics/predictive analytics and cloud technologies. However, there are some primary sticking points when it comes to innovation in these areas. Government organizations continue to encounter challenges when trying to pursue these initiatives due to complex security and compliance requirements, poor scalability of legacy IT infrastructure, and perceived risks associated with cloud and IT modernization efforts. How can these challenges be addressed?

The Future of Open Source Big Data Platforms

Three well-funded startups (Cloudera Inc., Hortonworks Inc., and MapR Technologies Inc.) emerged a decade ago to commercialize products and services in the open-source ecosystem around Hadoop, a popular software framework for processing huge amounts of data. The hype peaked in early 2014 when Cloudera raised a massive $900 million funding round, valuing it […]

Informatica Announces Enterprise Data Catalog Integrations With Microsoft, Tableau, and Databricks

Informatica, the enterprise cloud data management leader, announced the industry’s most comprehensive enterprise-scale intelligent data catalog, enhanced with technology innovations and tight strategic-partner integrations. The Informatica® Enterprise Data Catalog (EDC) creates a “catalog of catalogs” with AI-driven data discovery across multi-cloud and hybrid environments, providing broad metadata connectivity to support organizations’ data-driven digital transformations.