How Fast is Your Hadoop? Here's How to Measure That

by Ostatic Staff - Dec. 18, 2015

Hadoop is certainly on a roll. Allied Market Research has forecast that the global market for Hadoop, along with related hardware, software, and services, will reach $50.2 billion by 2020, propelled by greater use of raw, unstructured, and structured data. At the same time, many questions are being raised about Hadoop's complexity and what to do about the shortage of workers familiar with it.

Many organizations are also wrestling with how to quantify the performance they are actually getting from big data tools like Hadoop and Spark. On that front, there is good news. The Transaction Processing Performance Council (TPC) has announced two new additions to its growing arsenal of industry-standard benchmarks: TPC-DS 2.0 and TPCx-V. TPC-DS 2.0 is billed as "the first industry-standard benchmark for SQL-based Big Data systems, including Hadoop and Apache Spark-based systems, as well as relational database management systems (RDBMSs)." It could provide a standard for quantifying big data performance.

The TPC-DS benchmark is crafted to measure query response time, query throughput, data integration performance, and data load for a given system configuration. These are all critical metrics when using tools like Hadoop and Spark. The benchmark executes SQL queries with various operational requirements and complexities (e.g. ad-hoc, reporting, iterative OLAP, and data mining), and periodically synchronizes with source databases through maintenance functions.
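To make two of those metrics concrete, here is a minimal sketch of how query response time and query throughput can be measured in general. It is not the TPC-DS kit or its official metric: the in-memory SQLite database, the `sales` table, the two queries, and the helper names `time_queries` and `throughput_per_hour` are all assumptions chosen for illustration.

```python
import sqlite3
import time


def time_queries(conn, queries):
    """Run each query once and return its response time in seconds."""
    timings = []
    for sql in queries:
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch every row so the query runs to completion
        timings.append(time.perf_counter() - start)
    return timings


def throughput_per_hour(num_queries, elapsed_seconds):
    """Convert a query count and elapsed time into queries per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_queries * 3600.0 / elapsed_seconds


# Hypothetical stand-in for a real system under test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(1000)],
)

# Illustrative queries only; these are not TPC-DS queries.
queries = [
    "SELECT COUNT(*) FROM sales",
    "SELECT item_id, SUM(amount) FROM sales GROUP BY item_id",
]

timings = time_queries(conn, queries)
for sql, seconds in zip(queries, timings):
    print(f"{seconds:.6f}s  {sql}")

total = sum(timings)
if total > 0:
    print(f"throughput: {throughput_per_hour(len(queries), total):.0f} queries/hour")
```

A real benchmark run would point the same kind of harness at a Hadoop or Spark SQL endpoint and follow the rules in the TPC-DS specification, which defines its own metric, data set, and query set.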

“TPC-DS 2.0 marks the next major release of TPC-DS,” said Meikel Poess, chairman of the TPC-DS committee. “It is the world’s first industry-standard benchmark designed to measure performance of SQL-based Big Data implementations, and there are simply no comparable alternatives available today. Developing Version 2 was a substantial undertaking – and ultimately a highly rewarding accomplishment – as both the data and execution models are extraordinarily complex.”

Additional information on TPC-DS 2.0 is available via the following URL: http://www.tpc.org/tpcds/default.asp

And, in case you are wondering, the TPC is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate performance benchmarks to industry.