Altiscale Announces Apache Spark on the Altiscale Data Cloud

Strata + Hadoop World News

Altiscale, Inc., a leading provider of Hadoop-as-a-Service, today announced that Apache Spark is now available on the Altiscale Data Cloud. Altiscale customers can now leverage Apache Spark on Apache Hadoop in order to achieve their critical analytical and business objectives. The addition of Apache Spark provides a broader array of analytical services for machine learning, stream processing, and data processing for large data sets.

“Altiscale is dedicated to helping customers quickly find value in the ever-increasing flood of data generated by the connected world,” said Raymie Stata, co-founder and CEO of Altiscale. “Apache Spark in the Altiscale Data Cloud ensures that customers can take advantage of the latest in-memory processing techniques as they are processing their data assets. The Altiscale Data Cloud is purpose-built to provide the fastest, most scalable Hadoop-as-a-Service and is the ideal place to run Spark.”

Apache Spark is an open source framework that is gaining adoption for its machine learning, interactive analytics, and streaming analytics capabilities for large datasets. Spark is well suited to low-latency computations and iterative algorithms that benefit from its in-memory computing capabilities. At Altiscale, Spark is fully integrated into the larger Hadoop ecosystem, so customers benefit not only from Spark, but also from Hive, Pig, MapReduce, and even tools like R and H2O. All of these tools run side-by-side on the same Hadoop Distributed File System (HDFS) cluster, managed through YARN.

“There’s a common misunderstanding that you have to choose between Hadoop and Spark,” said David Chaiken, CTO, Altiscale. “Hadoop is a large ecosystem that includes storage, security, and multiple ways to process your data. Spark is a computing paradigm that fits into and runs best in the Hadoop ecosystem. At Altiscale, customers get the full benefit of leveraging Spark’s complementary strengths.”

Altiscale customers are already using both MapReduce and Apache Spark on the Altiscale Data Cloud. For example, one customer that had been using MapReduce to perform regular billing analysis on tens of millions of customer records needed fast, daily geographic analysis and reporting. Apache Spark was quickly added to its Altiscale Data Cloud subscription, and the customer expects to expand its use.

Apache Spark runs reliably in the Altiscale Data Cloud, where all aspects of hardware, networking, security, software, tools, and operations are optimized for the processing and analysis of massive data sets. The Altiscale Data Cloud with Apache Spark is available now.

