Apache Delivers Open Source Java Tool for Working with PDFs

by Ostatic Staff - Mar. 21, 2016

As I noted recently, the Apache Software Foundation, which incubates more than 350 open source projects and initiatives, has been steadily advancing a number of important open source projects. Now, the foundation has announced the availability of Apache PDFBox v2.0, an open source Java tool for working with Portable Document Format (PDF) documents.

PDF, of course, was first released as a format by Adobe Systems in 1993, and became an ISO International Standard (ISO 32000-1) in 2008.

Apache PDFBox supports creating new PDF documents; manipulating, rendering, and signing existing documents; and extracting content from them. In addition, PDFBox includes several command line utilities. In February 2015, the project became the first Open Source Partner Organization of the PDF Association.

The Apache PDFBox library enables users to create new PDF documents, manipulate existing documents, extract content, digitally sign and print files, and validate them against the PDF/A-1b standard. Its command line utilities include encrypt, decrypt, overlay, debugger, merger, PDFToImage, and TextToPDF.

A guide to migrating to v2.0 is available at http://pdfbox.apache.org/2.0/migration.html, with community support at http://pdfbox.apache.org/mailinglists.html

"PDF is a very popular and easy to use format for document exchange. It is used by millions of people every day, however the format itself is quite complicated and a real challenge to write a piece of software to work with it," said Andreas Lehmkühler, Vice President of Apache PDFBox. "This new major release of PDFBox includes a lot of improvements, fixes and new features which should make the life easier for our users."

"We thank all the people from our small but fine community for their support," explained Lehmkühler. "Special thanks also goes to our fellow colleagues from the Apache Tika project for their cooperation in stress-testing with a corpus of 250,000 PDF files."

"We are grateful for the Google Summer of Code program," said PDFBox committer Tilman Hausherr. "The project allowed us to hire students to improve 3D rendering and the PDFDebugger stand-alone application, which also sped up our own bug finding." 

"Apache PDFBox v2.0 is a significant milestone as it took us several years to complete," added Lehmkühler. "This long-awaited release is the collective achievement of more than 150 individuals who have contributed code to date. Without their frequent contributions it wouldn't be possible to drive a project like PDFBox."

For downloads, documentation, and ways to become involved with Apache PDFBox, visit http://pdfbox.apache.org/