EnterpriseDB Announces New Apache Spark Connector to Speed Postgres Big Data Processing

EnterpriseDB® (EDB™), the database platform company for digital business, announced the general availability of a new version of the EDB Postgres Data Adapter for Hadoop with compatibility for the Apache Spark cluster computing framework. The new version gives organizations the ability to combine analytic workloads based on the Hadoop Distributed File System (HDFS) with operational data in Postgres, using an Apache Spark interface.

Interview: Paulo Sampaio, Data Scientist at EDITED

I recently caught up with Paulo Sampaio, Data Scientist at EDITED, to talk about applying machine learning, neural networks, natural language processing, and big data analytics to the retail industry. Paulo and his team apply neural networks and other machine learning models to analyze over 520 million products in real time across 42 countries, making gradual distinctions in clothing styles, sizes, and categories.

The Leaky Pipeline Problem: Making Your Mark as a Woman in Big Data

insideBIGDATA was on hand for the recent Spark Summit East 2017 conference in Boston, and one of the more compelling presentations was by Kavitha Mariappan, VP Marketing at Databricks. The talk focused on the premise that despite the tremendous growth and opportunities in big data today, women still play a small role in this arena.

Visualizing Malaria in Zambia using Big Data

EXASOL, a high-performance in-memory analytic database developer, and PATH, an international nonprofit organization and global leader in health and innovation, announced a partnership to support the Zambian government’s ambitious campaign to eliminate malaria by 2020.

Percipient Launches SparkPLUS to Solve Apache Spark’s Out-of-memory Problems

Percipient, a Singapore-based startup, is launching a solution to address the memory issues encountered by users of the open source platform Apache Spark. By delivering unified data a priori to the Spark platform, Percipient’s SparkPLUS solution is able to multiply the platform’s computing space, thereby greatly enhancing its utility for real-time and analytical applications.

Research Firm Advises Analytics Stakeholders and Security Professionals to Build Plans for Securing Hadoop-based Assets

Dataguise, a technology leader in secure business execution, announced inclusion in a report by Gartner titled, “Rethink and Extend Data Security Policies to Include Hadoop.” The report provides best practices for addressing data security concerns related to Apache Hadoop deployments and highlights several leading vendors in the category to support these endeavors.

Monte Carlo Simulations in Ad-Lift Measurement Using Spark

In this talk from Spark Summit East 2016, Prasad Chalasani explores some of the challenges that arise in setting up scalable simulations in a specific application, and shares some solutions and lessons learned along the way, in the realms of mathematics and programming.

Interview: Natalia Hernandez, Data Scientist at Foodpairing

I recently caught up with Natalia Hernandez, Data Scientist at Foodpairing, to discuss how her company’s data scientists mine public online data for general trend insights, then combine that consumer intelligence with molecular analysis of ingredients to forecast the next big flavors in the food industry.

Splice Machine’s New OLAP Engine Adds Columnar Storage and In-Memory Caching to its Hybrid Relational Data Platform

Splice Machine, provider of the open-source SQL RDBMS powered by Apache Hadoop® and Apache Spark™, announced the release of version 2.5 of its industry-leading data platform for intelligent applications. The new version strengthens its ability to concurrently run enterprise-scale transactional and analytical workloads, frequently referred to as HTAP (Hybrid Transactional and Analytical Processing).

Hortonworks Advances Cloud Strategy with Availability of Hortonworks Data Cloud for Amazon Web Services

Hortonworks, Inc.® (NASDAQ: HDP), a leading innovator of open and connected data platforms, announced the availability of Hortonworks Data Cloud on the Amazon Web Services (AWS) Cloud. Hortonworks Data Cloud for AWS enables users to harness the agility and elasticity of Apache® Hadoop™ and Apache® Spark™ in the cloud for powering new workloads and analytic applications. The new cloud service, powered by open source, delivers the most popular enterprise-grade capabilities of Hortonworks Data Platform (HDP®) with both hourly and annual billing options available on the AWS Marketplace.