Paxata Debuts Spring ’15 Release at Strata + Hadoop World



Paxata, an Adaptive Data Preparation™ application and platform built from the ground up to help business analysts, data scientists, developers, data curators, and IT teams automate, collaborate and dynamically govern the data integration, data quality and enrichment process in a self-service fashion, today announced the availability of its Spring ’15 release at Strata + Hadoop World.

The new release delivers innovations around the Pillars of a Modern Data Prep Platform: performance, elasticity, efficiency, connectivity and scalability. The architecture is the first in the industry to deliver interactive performance on data preparation tasks at massive volumes. Elasticity is achieved in a utility-based model, as the system dynamically scales costs and capacity up or down as workloads and users increase or decrease. The platform delivers efficiency as massive data prep projects are performed with no coding, sampling or pre-defined models required. This release builds on existing connectivity options, providing seamless access to machine and interaction data stored in Hadoop clusters and relational databases. The new release focuses on scalability through automation of data prep tasks, project repeatability and data extraction with a new REST API toolkit.

“We have proven that the power center of data preparation has moved away from legacy IT-only, relational and batch processing, into the self-service, interactive, poly-structured, elastic world,” said Prakash Nanduri, Co-founder and CEO of Paxata. “Our Spring ’15 release blows the doors off other self-service data preparation solutions in terms of both interactive massive-scale processing and automation. While I am excited by the fact that we hit our 35th customer milestone in just over a year of commercial availability, it is more exciting to know that we did that by proving ourselves in the face of the most rigorous data prep scenarios our customers trust us to help them solve.”

The Modern Data Prep Platform

The Paxata platform was built with a data management layer that persists data inside the Hadoop Distributed File System (HDFS) and a real-time columnar parallelized in-memory pipeline data prep engine powered by Intellifusion™. The data prep engine wraps Apache Spark v1.2 with additional functionality built to optimize Spark performance and responsiveness. This unified experience allows for a seamless transition between interactive and batch execution models supported by advanced compilation and caching techniques.

  • The Spring ’15 release has outperformed expectations in terms of processing speed and capacity. One highlight is a data migration project that involved rapidly extracting, organizing and cleaning 30 million records from over 400 datasets that were being moved into an SAP system. More impressively, all of the data preparation was done interactively in the cloud.
  • Unlike rigid systems that are sized for maximum usage scenarios, the Paxata platform is the first truly elastic architecture, giving cloud and on-premise customers the ability to dynamically scale usage up or down based on ever-changing workloads, data volumes, projects and concurrent users.
  • Unlike other vendors, which force users to script data preparation against samples and execute it in batch mode within a legacy MapReduce environment, or which require proprietary graph databases to persist their data, Paxata provides an elegant way to increase efficiency in the data prep process end-to-end. The Spring ’15 release builds on the ability of Paxata to handle big data volumes by making it possible to seamlessly extract value from Hadoop environments without spending hours coding ETL processes, which today represent 80% of MapReduce jobs. This minimizes the time and effort it takes to surface Hadoop and non-Hadoop data to the business.
  • In addition to the standard data sources Paxata handles (e.g., Excel, XML, JSON, Avro), this release makes it easy for IT to surface machine-generated data from sensors, clickstreams, and server logs from numerous NoSQL data stores, Hadoop and non-Hadoop systems simultaneously to their business users without re-architecting the entire access process.
  • In addition to the industry-defining Intellifusion capabilities that come with the easy-to-use self-service end user application, Paxata power users now also have programmatic access to the entire Paxata enterprise data preparation platform through the introduction of its comprehensive REST API. This allows business analysts to run a data prep project, then hand it off to their operations team, which can use tools like Apache Oozie Workflow Scheduler to automate the entire project lifecycle, from ingestion to data preparation to published AnswerSets™.
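
As a rough illustration of the hand-off described in the last point, the lifecycle (ingestion, data preparation, published AnswerSets) can be pictured as an ordered series of REST calls that a scheduler issues one after another. The sketch below is an assumption-laden outline in Python, not the documented Paxata REST API: the base URL, the endpoint paths, the project identifier and the names `LIFECYCLE`, `PlannedCall` and `plan_lifecycle` are all hypothetical, and the code only builds the plan without sending any request.

```python
from dataclasses import dataclass
from urllib.parse import urljoin

# Hypothetical stages and endpoint paths, chosen only to mirror the
# "ingestion -> data preparation -> published AnswerSets" lifecycle.
# They are NOT taken from the actual Paxata REST API.
LIFECYCLE = [
    ("ingest", "POST", "datasets/import"),
    ("prepare", "POST", "projects/{project_id}/run"),
    ("publish", "POST", "projects/{project_id}/answersets"),
]


@dataclass(frozen=True)
class PlannedCall:
    """One HTTP call a scheduler would issue for a lifecycle stage."""
    stage: str
    method: str
    url: str


def plan_lifecycle(base_url: str, project_id: str) -> list[PlannedCall]:
    """Return the ordered calls for one project; nothing is sent over the network."""
    base = base_url.rstrip("/") + "/"  # trailing slash so urljoin appends the path
    return [
        PlannedCall(stage, method, urljoin(base, path.format(project_id=project_id)))
        for stage, method, path in LIFECYCLE
    ]


if __name__ == "__main__":
    # Placeholder host and project name, for illustration only.
    for call in plan_lifecycle("https://dataprep.example.com/rest", "sales-cleanup"):
        print(call.stage, call.method, call.url)
```

In a setup like this, a workflow scheduler such as Apache Oozie would run each planned call as one step and move on to the next only when the previous one succeeds, which is what makes a project repeatable without an analyst re-running it by hand.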

According to Forrester Research, “Vendors are building a new generation of data preparation tools that use Apache Hadoop and Spark to run machine-learning algorithms in support of data preparation tasks. These new tools provide users with more predictive and intelligent assistance as they structure data, identify types, script transformations, and clean or enrich data.” The report goes on to say: “With Paxata, analysts work in spreadsheet-like interface on top of a machine-learning engine that shapes and improves the data automatically. They can source data, reports and pivot tables. Paxata learns from acceptance and additional shaping of the data by analysts from other projects.”

Cloud-Based and On-Premise Data Prep

Paxata Adaptive Data Preparation is available in a multi-tenant cloud service as well as an on-premise deployment. Cloud customers are able to access the robust Paxata architecture without additional cost or burden of maintenance. On-premise or private cloud customers have the ability to deploy Paxata within a dedicated Hadoop environment or as part of their existing Hadoop cluster. Regardless of deployment model selected, all customers benefit from the only modern data preparation solution built for data prep at scale.

Paxata provides simple, transparent subscription pricing for teams of analysts or enterprise-wide and is available for trial use. For more information visit

