Apache Tajo Update Offers Open, Relational Big Data Warehousing Solution
Now here is an interesting open source project that has been flying under the radar: The Apache Software Foundation (ASF), which stewards more than 350 open source projects and initiatives, announced the availability of Apache Tajo v0.10.0, the latest version of its advanced, open data warehousing system for Apache Hadoop.
Apache Tajo is used for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources. "By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities," notes the announcement from Apache.
Although it doesn't grab a lot of headlines, Tajo is in use at numerous organizations worldwide, including Gruter, Korea University, Melon, NASA JPL Radio Astronomy and Airborne Snow Observatory projects, and SK Telecom for processing Web-scale data sets in real time.
"Tajo has evolved over the last couple of years into a mature 'SQL-on-Hadoop' engine," said Hyunsik Choi, Vice President of Apache Tajo. "The improved JDBC driver in this release allows users to access Tajo as easily as they would a traditional RDBMS. We have verified the new JDBC driver with many commercial BI solutions and various SQL tools. It was easy, and it works successfully."
Tajo v0.10.0 includes dozens of new features and improvements, including:
- Oracle and PostgreSQL catalog store support
- Direct JSON file support
- HBase storage integration (allowing users to directly access HBase tables through Tajo)
- Improved JDBC driver for easier use from JDBC applications
- Improved Amazon S3 support
A complete overview of all new enhancements can be found in the project release notes at https://dist.apache.org/repos/dist/dev/tajo/tajo-0.10.0-rc1/relnotes.html
"I'm very happy that Tajo has developed rapidly in recent years," said Jihoon Son, member of the Apache Tajo Project Management Committee. "One of the most impressive parts is the improved support for Amazon S3. Thanks to the EMR bootstrap, users can exploit Tajo's advanced SQL functionality on AWS with just a few clicks."
Apache Tajo software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Tajo, visit http://tajo.apache.org/ and https://twitter.com/ApacheTajo