From Apache to Google: Notable Open Source Offerings from Tech Titans

by Ostatic Staff - Dec. 29, 2016

Each year, we at OStatic round up our ongoing collections of open source resources, tutorials, and tools. We regularly collect the best developer tools, free online books on open source topics, and newly open sourced projects.

In this post, you'll find some of the best new tools from 2016.

From NPR. Whether you do some blogging, work as a journalist or just make use of popular social media and cloud computing tools, you probably regularly need to acquire and customize publishable graphics. The good people at NPR are out to make that job easier.

They have released a collection of fully customizable, open source tools to help anyone create appealing images for social media, the cloud or websites.

The tools, dubbed Quotable, Factlist and Waterbug, are part of an overarching suite of open source offerings called Lunchbox. You can get the tools for Windows or Mac systems on NPR's blog, which includes useful, illustrated guidelines, or you can go to GitHub to get the source code and customize the suite.

From Facebook and Twitter. Facebook open sourced the hardware design of its machine learning server, built for artificial intelligence (AI) computing at large scale. It's based on Nvidia hardware. Meanwhile, Twitter has open sourced Diffy, software that developers can use to ferret out bugs when they're updating parts of their code. Diffy is now available on GitHub.

A Twitter blog post explains what Diffy is designed for:

"Today, we’re excited to release Diffy, an open-source tool that automatically catches bugs in Apache Thrift and HTTP-based services. It needs minimal setup and is able to catch bugs without requiring developers to write many tests. Service-oriented architectures like our platform see a large number of services evolve at a very fast pace. As new features are added with each commit, existing code is inevitably modified daily – and the developer may wonder if they might have broken something."

"As the complexity of a system grows, it very quickly becomes impossible to get adequate coverage using hand-written tests, and there’s a need for more advanced automated techniques that require minimal effort from developers. Diffy is one such approach we use."
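Twitter's description leaves out the mechanism. As we understand Diffy's published design, it sends the same request to three instances: a primary and a secondary that both run the known-good code, and a candidate that runs the new code. Differences between primary and secondary reveal noise (timestamps, random IDs and the like), so only candidate differences outside that noise are flagged as possible bugs. The Python sketch here is a minimal illustration of that idea, assuming flat, dictionary-shaped responses; `diff_fields` and `find_regressions` are hypothetical names, not part of Diffy, which compares full Thrift and HTTP responses.

```python
from typing import Any, Dict, Set

# Sentinel so that a missing key is distinguishable from a key whose value is None.
_MISSING = object()


def diff_fields(a: Dict[str, Any], b: Dict[str, Any]) -> Set[str]:
    """Return the keys whose values differ between two flat responses."""
    keys = set(a) | set(b)
    return {k for k in keys if a.get(k, _MISSING) != b.get(k, _MISSING)}


def find_regressions(primary: Dict[str, Any],
                     secondary: Dict[str, Any],
                     candidate: Dict[str, Any]) -> Set[str]:
    """Flag fields where the candidate differs from the primary, ignoring noise.

    Fields that already differ between two instances of the known-good code
    (primary vs. secondary) are treated as nondeterministic noise.
    """
    noise = diff_fields(primary, secondary)
    raw = diff_fields(primary, candidate)
    return raw - noise


if __name__ == "__main__":
    primary = {"user": "alice", "balance": 10, "request_id": "a1"}
    secondary = {"user": "alice", "balance": 10, "request_id": "b7"}
    candidate = {"user": "alice", "balance": 12, "request_id": "c3"}
    # request_id differs everywhere (noise); balance differs only in the candidate.
    print(find_regressions(primary, secondary, candidate))
```

The noise set is what lets this style of tool catch bugs without hand-written assertions: the old code serves as its own oracle.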

As for Facebook's newest open source offering, the company's Kevin Lee and Serkan Piantino wrote in a blog post that the open sourced AI hardware is more efficient than off-the-shelf options because the servers can be operated within data centers based on Open Compute Project standards.

Google's Hat is in the Ring. On the artificial intelligence front, there is a true renaissance going on right now, and it includes a slew of new open source tools, many of which are likely to give rise to businesses built around them. For example, Google recently open sourced a machine learning library called TensorFlow. It's based on the same internal toolset that Google has spent years developing to support its AI software and other predictive and analytics programs. You can find out more about TensorFlow at its site, and you might be surprised to learn that it is the engine behind several Google tools you may already use, including Google Photos and the speech recognition found in the Google app.

Now, Google has open sourced a "Show and Tell" algorithm to developers, who can purportedly use it to recognize objects in photos with up to 93.9 percent accuracy and help automate smart photo captioning. It's based on TensorFlow, and its code base is on GitHub.

From Microsoft. Microsoft released a new open source UWP Community Toolkit that eases app development with helper functions, custom controls and app services that simplify common developer tasks. Meanwhile, Microsoft has also open sourced PowerShell under an MIT license and ported it to Red Hat, CentOS, and Ubuntu, making the command-line shell and scripting platform available for both Linux and Mac.

MSBuild, the build platform for Microsoft's Visual Studio tools and the .NET platform, is also officially open source. The code is available on GitHub now.

From Apache. In recent months, we've steadily taken note of the many Big Data projects that the Apache Software Foundation has been elevating to Top-Level Project status. Projects that reach Top-Level status gain valuable community support. Just recently, the foundation announced that Apache Kudu had graduated as a Top-Level Project.

Apache Geode has also graduated from the Apache Incubator. It is a very interesting open source in-memory data grid that provides transactional data management for scale-out applications that need low-latency response times under highly concurrent processing.

The Geode codebase was originally developed by GemStone Systems in 2002. GemFire, the original commercial distribution of Geode, was first widely adopted by the financial sector as the transactional, low-latency data engine used in Wall Street trading platforms. Pivotal, which owns the GemFire technology, submitted the Geode code to the Apache Incubator in April 2015.

"We are excited to see Geode graduate from the Apache Incubator to a Top-Level Project. It's quite a feat to transform a mature commercial product into a widely adopted open source project," said Elisabeth Hendrickson, VP of Big Data R&D at Pivotal. "The committers in Geode have worked hard at building community and making the project accessible to newcomers, paving the way for developers everywhere to benefit from a proven in memory data grid technology."

Here is more on numerous other Apache Big Data projects that are moving forward:

Allura. According to the Allura project page, new features include an Admin Nav Bar, which improves how users customize the tools of a project. There is also a new interface. Apache encourages users to read an admin toolbar post to see how easy it is to access tool configurations and add new tools with Allura.

Brooklyn. The foundation announced that Apache Brooklyn is now a Top-Level Project (TLP), "signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles." Brooklyn is an application blueprint and management platform used for integrating services across multiple data centers as well as a wide range of software in the cloud.

According to the Brooklyn announcement:

"With modern applications being composed of many components, and increasing interest in micro-services architecture, the deployment and ongoing evolution of deployed apps is an increasingly difficult problem. Apache Brooklyn’s blueprints provide a clear, concise way to model an application, its components and their configuration, and the relationships between components, before deploying to public Cloud or private infrastructure. Policy-based management, built on the foundation of autonomic computing theory, continually evaluates the running application and makes modifications to it to keep it healthy and optimize for metrics such as cost and responsiveness."

Brooklyn is in use at some notable organizations. Cloud service providers Canopy and Virtustream have created product offerings built on Brooklyn. IBM has also made extensive use of Apache Brooklyn to migrate large workloads from AWS to IBM SoftLayer.

Kylin. Meanwhile, the foundation has also just announced that Apache Kylin, an open source big data project born at eBay, has graduated to Top-Level status. Kylin is an open source Distributed Analytics Engine designed to provide an SQL interface and multi-dimensional analysis (OLAP) on Apache Hadoop, supporting extremely large datasets. It is widely used at eBay and at a few other organizations.

"Apache Kylin's incubation journey has demonstrated the value of Open Source governance at ASF and the power of building an open-source community and ecosystem around the project," said Luke Han, Vice President of Apache Kylin. "Our community is engaging the world's biggest local developer community in alignment with the Apache Way."

As an OLAP-on-Hadoop solution, Apache Kylin aims to fill the gap between Big Data exploration and human use, "enabling interactive analysis on massive datasets with sub-second latency for analysts, end users, developers, and data enthusiasts," according to developers. "Apache Kylin brings back business intelligence (BI) to Apache Hadoop to unleash the value of Big Data," they added.
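Kylin's sub-second latency comes largely from precomputation: it builds OLAP cubes ahead of time, storing aggregates for combinations of dimensions so that a query reads a stored result instead of scanning raw data. The toy Python sketch here illustrates that general idea in memory, assuming a small list of rows with one numeric measure; `build_cube` and `query` are hypothetical helpers, not Kylin APIs. Real Kylin builds its cubes with distributed jobs on Hadoop and serves SQL over them.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]
Cube = Dict[Tuple[str, ...], Dict[Tuple[object, ...], float]]


def build_cube(rows: List[Row], dimensions: Sequence[str], measure: str) -> Cube:
    """Precompute SUM(measure) for every subset of dimensions (every "cuboid")."""
    cube: Cube = {}
    for r in range(len(dimensions) + 1):
        # combinations() keeps dimension order, so cuboid keys are canonical.
        for dims in combinations(dimensions, r):
            agg: Dict[Tuple[object, ...], float] = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                agg[key] += float(row[measure])
            cube[dims] = dict(agg)
    return cube


def query(cube: Cube, dims: Sequence[str], values: Sequence[object]) -> float:
    """Answer a filtered aggregate by lookup; dims must follow the build order."""
    return cube[tuple(dims)].get(tuple(values), 0.0)


if __name__ == "__main__":
    sales = [
        {"site": "US", "category": "books", "price": 10.0},
        {"site": "US", "category": "toys", "price": 5.0},
        {"site": "UK", "category": "books", "price": 7.0},
    ]
    cube = build_cube(sales, ["site", "category"], "price")
    print(query(cube, ["site"], ["US"]))       # total for site US
    print(query(cube, [], []))                 # grand total
```

The trade-off is storage and build time: n dimensions yield 2^n cuboids, which is why production OLAP engines prune which combinations they materialize.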

Lens. Apache recently announced that Apache Lens, an open source Big Data and analytics tool, has graduated from the Apache Incubator to become a Top-Level Project (TLP).

According to the announcement:

"Apache Lens is a Unified Analytics platform. It provides an optimal execution environment for analytical queries in the unified view. Apache Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores."

"By providing an online analytical processing (OLAP) model on top of data, Lens seamlessly integrates Apache Hadoop with traditional data warehouses to appear as one. It also provides query history and statistics for queries running in the system along with query life cycle management."

"Incubating Apache Lens has been an amazing experience at the ASF," said Amareshwari Sriramadasu, Vice President of Apache Lens. "Apache Lens solves a very critical problem in Big Data analytics space with respect to end users. It enables business users, analysts, data scientists, developers and other users to do complex analysis with ease, without knowing the underlying data layout."

Ignite. The ASF has announced that Apache Ignite is to become a Top-Level Project. It's an open source effort, driven by GridGain Systems and WANdisco, to build an in-memory data fabric.

Apache Ignite is a high-performance, integrated and distributed In-Memory Data Fabric for computing and transacting on large-scale data sets in real-time, "orders of magnitude faster than possible with traditional disk-based or flash technologies," according to Apache. It is designed to easily power both existing and new applications in a distributed, massively parallel architecture on affordable, industry-standard hardware.

Tajo. Apache Tajo v0.11.0, an advanced open source data warehousing system on Apache Hadoop, is another new Top-Level project. Apache claims that Tajo provides the ability to rapidly extract more intelligence from Hadoop deployments, third-party databases, and commercial business intelligence tools.

And of course, Spark and other previously announced Big Data tools overseen by Apache are flourishing. Look for many more data- and developer-focused tools to move forward at Apache in the months to come, and look for more on the best open sourced projects from OStatic in 2017.