Apache Flink, an Unsung Big Data Tool, Arrives in Version 1.0
Are you familiar with Apache Flink? Not everyone is, but Flink is competing with tools like Apache Spark in the Big Data space, and has released its first API-stable 1.0 version this week. Flink came from Berlin’s Technical University, and it was previously known as Stratosphere before it was added to Apache’s incubator program.
Like Spark, Flink is essentially positioned as a possible improvement on Hadoop’s MapReduce technology. Spark is primarily for in-memory processing of batch data, while Flink emphasizes the streaming data model. Here are more details.
"The community put significant effort into improving and extending Apache Flink since the last release, focusing on improving the experience of writing and executing data stream processing pipelines in production. We encourage everyone to download the release and check out the documentation."
"The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/“exactly-once” processing). This shift and the new terminology can be quite confusing for people being new to the space of stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink’s API features very flexible window definitions on data streams which let it stand out among other open source stream processors."
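For readers new to the term, "windowing" means grouping an unbounded stream into finite chunks so that aggregates can be computed over each chunk. The sketch below shows the simplest variant, a fixed-size non-overlapping ("tumbling") window, in plain Python. It does not use Flink's API; the function name, the event format, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs (hypothetical format)
    window_size: window length in seconds
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Each event falls into exactly one window, identified by its start time.
        window_start = (timestamp // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


# Made-up sample events: (timestamp in seconds, event type).
events = [
    (1, "click"), (4, "click"), (7, "view"),
    (11, "click"), (12, "view"), (19, "click"),
]
result = tumbling_window_counts(events, window_size=10)
for (window_start, key), count in sorted(result.items()):
    print(f"window [{window_start}, {window_start + 10}) {key}: {count}")
```

A real stream processor would emit each window's result as the window closes rather than waiting for a finite list to end, and would also have to deal with late or out-of-order events; this sketch ignores both for brevity.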
"Flink provides more efficient memory processing than Spark since it has a memory management system that reduces the amount of garbage collection performed by the JVM. Spark has done a lot of work to address these issues via its Project Tungsten initiative, but Flink implemented such ideas far earlier in its lifecycle. Any state data that needs to be stored when processing a stream is held in an instance of RocksDB, an open source key-value store developed by Facebook."
"Flink is also likely to eclipse Apache Storm, a stream-processing system with a broad ecosystem of development. Users can take Storm's topologies and run them in Flink to transition between the two."
"Overall, we have seen Flink grow in terms of functionality from an engine to one of the most complete open-source stream processing frameworks available," Flink's developers said. "The community grew from a relatively small and geographically focused team, to a truly global, and one of the largest big data communities in the Apache Software Foundation."