The Apache Software Foundation Announces Apache™ Tajo™ v0.9

The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 200 Open Source projects and initiatives, has announced the availability of Apache™ Tajo™ v0.9, the advanced Open Source data warehousing system in Apache Hadoop™.

“With Apache Tajo v0.9, our goal of bringing traditional SQL performance to massive data is a step closer,” said Hyunsik Choi, Vice President of Apache Tajo. “We really enjoyed working to improve Tajo’s leading-edge native SQL support, and its lightning performance across divergent workloads. We’re very excited about the release of Apache Tajo 0.9.”

Dubbed an “SQL-on-Hadoop” solution, Apache Tajo is used for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities. Overall, Apache Tajo v0.9 delivers more powerful native SQL support on an even faster platform.

“We have been determined from the outset to find ways of boosting query processing speed without compromising system robustness and solution accessibility,” said Jihoon Son, member of the Apache Tajo Project Management Committee. “In practice, that means using cutting-edge query techniques and processing algorithms as our source of ‘speed’, while maintaining three key features: fault tolerance, the ability to fully utilize working memory and write to disk, and data source neutrality. We think those design choices give Apache Tajo long-run flexibility and coherence.”

Features and enhancements in Apache Tajo v0.9 include:

  • More comprehensive and powerful SQL capabilities, such as TIMESTAMP, DATE, TIME, and INTERVAL type support, as well as WINDOW functions, OVER clause support, and multiple distinct aggregation;
  • Performance improvements, such as an off-heap sort algorithm for ORDER BY and runtime code generation for evaluating expressions, that push the boundaries of massive data query speeds;
  • Improvements to the hash shuffle I/O, boosting bottom-line speeds by 200-300% on “heavy”, complex queries;
  • Enhanced Hadoop integration, including support for Hadoop 2.2.0 up to Hadoop 2.5.1, and expanded Hive Metastore access;
  • An improved catalog backup and restore feature, as well as accessibility enhancements that streamline performance across disparate technology environments.

Apache Tajo is in use as part of the Apache Hadoop ecosystem at a variety of organizations, including Gruter, Korea University, and NASA JPL’s Radio Astronomy and Airborne Snow Observatory projects, among others. At SK Telecom, South Korea’s largest wireless carrier, Apache Tajo has undergone a brutal testing regimen, where it has had to deal with telco-sized data stores, node growth and cluster expansion, and a grueling company-wide data analysis and reporting schedule. “The fast processing capabilities of Apache Tajo have allowed us to build an entirely new big data warehouse and OLAP system,” said Eddy Park, Hadoop-based Data Warehouse Project Manager at SK Telecom. “Apache Tajo now plays a vital role in data-driven decision making at our company.”

“We run Apache Tajo in-house on 30 cluster nodes in order to power Seenal, our social network analysis service that supplies social media insight to government and corporate clients,” said Hyoungjun Kim, CTO of Gruter. “On the one hand, this involves running complex ETL processes on hundreds of gigabytes of data per day in order to detect market and opinion signals. On the other hand, analysts and project teams often need to run very specific analyses on much smaller data sets. Tajo is able to handle the full spectrum of Seenal’s data processing and query needs at high speed and with minimal fuss.”

Availability and Oversight

As with all Apache products, Apache Tajo software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Tajo, visit the project’s website.
