Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming

In the talk below, Michael Armbrust gives an overview of some of the exciting new APIs available in Spark 2.0, namely Datasets and Structured Streaming. Together, these APIs are bringing the power of Catalyst, Spark SQL’s query optimizer, to all users of Spark.

SAP S/4HANA® Enhancements Accelerate Digital Transformation Adoption

SAP SE (NYSE: SAP) announced the SAP S/4HANA® 1610 release, the latest enhancement to SAP’s next-generation ERP business suite. By utilizing a simplified data model and the award-winning SAP Fiori® 2.0 user experience, the new release helps reduce complexity and unleash the next wave of business productivity. Through prediction and pattern recognition capabilities, along with machine learning, SAP S/4HANA lays the foundation to reduce exception management for more routine transactions, empowering the workforce to focus on higher-value tasks with embedded analytics and real-time insights.

Governance Gets Sexy

In this special guest feature, Joe Pasqua, Executive Vice President of Products at MarkLogic, discusses how effective data governance is more important—and more elusive—than ever before. It enables companies to frame data according to their strategic initiatives, regulatory mandates and core principles. It reduces risk and increases a company’s ability to make the most out of the data in its possession.

MongoDB 3.4 Accelerates Digital Transformation for the Modern Enterprise

MongoDB, the database for giant ideas, announced MongoDB 3.4, the latest version of the popular modern database. MongoDB 3.4 adds key features that embrace additional data models, combining operational and analytical processing, elastic cross-region scaling and sophisticated operational tooling to simplify data management for customers.

How Breakthrough Innovations at Experian’s DataLabs Expand the Horizons of Doing Good Things with Data

In this contributed article, Eric Haller, Executive Vice President at Experian DataLabs, describes how his company uses large data sets coupled with data science methodologies to solve strategic marketing and risk-management problems, with an emphasis on financial services, telecommunications and healthcare.

Progress DataDirect Hybrid Data Pipeline Revolutionizes Data Access for Cloud Applications

Progress (NASDAQ: PRGS) announced the release of Progress® DataDirect® Hybrid Data Pipeline, a data access service that provides simple, secure access to organizations’ cloud and on-premises data sources for hybrid cloud applications, such as CRM, data management platforms or hosted analytics. Progress DataDirect Hybrid Data Pipeline represents the first vendor-agnostic hybrid connector that provides secure firewall-friendly access to back-office data from any cloud, independent of vendor or technology. It enables developers to integrate applications and data quickly, no matter where that data lives: on-site, in the cloud or both.

Driving Engagement and Revenue with Segmentation in the Digital World

In this special guest feature, Patrick Smith, Manager of Data Science Services at Mather Economics, discusses the value of attaching digital advertising revenue to online activity at the customer level to show which content creates the highest revenue stream. This knowledge can help, and has helped, publishers create successful, more profitable business strategies.

Trifacta Introduces Wrangler Edge for Analyst Teams

Trifacta, a global leader in data wrangling, announced the launch of Wrangler Edge, a new offering designed for analyst teams wrangling diverse data outside of big data environments. Since its introduction in late 2015, Trifacta’s free desktop edition, Wrangler, has seen broad adoption with more than 4,000 companies in 132 countries using Wrangler to explore, transform and join diverse data for analysis.

Building a Business Case for Data Quality

The infographic below was developed by Experian Data Quality as a byproduct of their recent survey of 402 management-level professionals. The infographic covers how managers feel about data. Data is your organization’s most valuable asset, and having good data quality is necessary for sustained success.

IBM Unleashes the Power of Machine Learning with Watson-enabled Data Platform

IBM (NYSE: IBM) announced IBM Watson Data Platform to help companies gain more valuable insights from data. The platform delivers the world’s fastest data ingestion engine and cognitive-powered decision-making to data professionals, allowing them to collaborate in the IBM Cloud with the services they prefer. IBM is also making IBM Watson Machine Learning Service available, with an intuitive, self-service interface intended to make machine learning simple.