LinkedIn Open Sources "Pinot" for Powerful Data Analytics

by Ostatic Staff - Jun. 12, 2015

When it comes to new open source tools that can make a difference, it's wise to look to some of the tech companies that regularly open source their own in-house platforms and tools. Just witness Netflix, which has open sourced troves of useful cloud utilities. Facebook and Google have released a lot of useful tools as well.

Now, LinkedIn has open sourced Pinot, its own real-time distributed analytics and datastore infrastructure, aimed at low-latency queries over data at scale.

According to a LinkedIn announcement:

"We’ve been using it at LinkedIn for more than two years, and in that time, it has established itself as the de facto online analytics platform to provide valuable insights to our members and customers. At LinkedIn, we have a large deployment of Pinot storing 100’s of billions of records and ingesting over a billion records every day. Pinot serves as the backend for more than 25 analytics products for our customers and members. This includes products such as Who Viewed My Profile, Who Viewed My Posts and the analytics we offer on job postings and ads to help our customers be as effective as possible and get a better return on their investment."

"In addition, more than 30 internal products are powered by Pinot. This includes XLNT, our A/B testing platform, which is crucial to our business – we run more than 400 experiments in parallel daily on it."

Does the name Pinot refer to Pinot Noir? Indeed it does: the name refers to the wine and its grape varietal, which is known for its toughness.

How do you interface with Pinot? The post notes the following:

"For ease of use we decided to provide a SQL like interface. We support most SQL features including a SQL-like query language and a rich feature set such as filtering, aggregation, group by, order by, distinct. Currently we do not support joins in order to ensure predictable latency. We leveraged Apache Helix as the control plane for cluster-wide coordination."

That remark about Apache Helix is a good reminder that if you do anything online, you are probably being assisted by Apache tools and platforms.
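To give a feel for what a SQL-like query with filtering, aggregation and group by might look like, here is a minimal Python sketch that simply assembles a query string. It is not Pinot's client API, and the exact query syntax Pinot accepts may differ; the helper `build_query`, the table name `profileViews` and the column names are all hypothetical and chosen only for illustration.

```python
def build_query(table, aggregations, filters=None, group_by=None, limit=None):
    """Assemble a SQL-like SELECT string.

    Illustrative only: this is a hypothetical helper, not part of Pinot.
    It covers filtering, aggregation and group by, and deliberately
    offers no way to express a join, mirroring the restriction
    described in the announcement.
    """
    parts = ["SELECT " + ", ".join(aggregations), "FROM " + table]
    if filters:
        # Each filter is a plain predicate string; they are AND-ed together.
        parts.append("WHERE " + " AND ".join(filters))
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    return " ".join(parts)


# Hypothetical table and columns, loosely inspired by "Who Viewed My Profile".
query = build_query(
    table="profileViews",
    aggregations=["COUNT(*)"],
    filters=["viewedMemberId = 42", "daysSinceEpoch >= 16500"],
    group_by=["viewerIndustry"],
)
print(query)
```

The single-table shape of the helper reflects the design choice quoted above: with no joins, every query touches one table, which is what LinkedIn says keeps latency predictable.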

The source code for Pinot is available on GitHub under the Apache 2.0 License. Documentation covering getting started, design, and how to use Pinot is published on the project wiki. LinkedIn is also soliciting feedback on the project.