New Census Project Collects Data on Open Source Projects, Bolsters Security

by Ostatic Staff - Jul. 10, 2015

The Linux Foundation Core Infrastructure Initiative (CII), backed by companies like Google, Facebook, Salesforce, HP, and others, has announced a new Census Project that automates the collection and analysis of data on different open source projects, ultimately creating a risk score for each project based on the results. It's not the first approach to auditing open source usage, but it has several distinctive aspects.

One of the goals is to help the tech industry find an efficient barometer for assessing software from a security point of view. Here are more details.

The Census Project, developed by David Wheeler and Samir Khakimov of the Institute for Defense Analyses (IDA), is live now. Co-funded by CII, it automates analysis of a large number of open source projects to provide a quick way to prioritize which projects to look at more closely. The Census Project calculates a "risk score" based on a number of metrics about the project, some of which are relatively static (language, website, network access) and some of which change over time (contributor count and popularity).
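To make the idea of a combined risk score concrete, here is a minimal Python sketch of how such metrics might be folded into a single number. It is only an illustration: the field names, point values, and thresholds below are assumptions made for this example, not the formula the Census Project actually uses.

```python
from dataclasses import dataclass


@dataclass
class ProjectMetrics:
    """Hypothetical per-project record; field names are illustrative only."""
    name: str
    language: str           # relatively static
    has_website: bool       # relatively static
    network_exposed: bool   # relatively static
    contributor_count: int  # changes over time
    popularity: float       # changes over time, assumed normalized to 0.0-1.0


def risk_score(m: ProjectMetrics) -> int:
    """Combine the metrics into one score; higher means 'look closer'.

    All point values here are made-up assumptions for illustration.
    """
    score = 0
    if m.language.lower() in {"c", "c++"}:
        score += 2  # assumption: memory-unsafe languages add risk
    if not m.has_website:
        score += 1  # assumption: no website hints at low upkeep
    if m.network_exposed:
        score += 2  # assumption: network-facing code is more exposed
    if m.contributor_count <= 3:
        score += 4  # assumption: very few contributors
    elif m.contributor_count <= 10:
        score += 2
    if m.popularity >= 0.9:
        score += 2  # assumption: wide use raises the impact of a flaw
    elif m.popularity >= 0.5:
        score += 1
    return score


def rank_by_risk(projects):
    """Return (name, score) pairs, highest score first."""
    scored = [(p.name, risk_score(p)) for p in projects]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Both projects are invented examples, not real census data.
    sample = [
        ProjectMetrics("barlib", "Python", True, False, 40, 0.3),
        ProjectMetrics("libfoo", "C", False, True, 2, 0.95),
    ]
    for name, score in rank_by_risk(sample):
        print(f"{name}: {score}")
```

In this sketch, a widely used, network-facing C library with only two contributors rises to the top of the list, while a modestly popular project with many contributors scores zero. Swapping in different weights or thresholds changes the ordering, which is the kind of experimentation the project invites.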

According to an announcement post:

"The Heartbleed vulnerability in the open source software (OSS) program OpenSSL had widespread impact and serious ramifications. It led to the formation of the multi-million dollar Core Infrastructure Initiative backed by The Linux Foundation and industry leaders like Amazon Web Services, Facebook, Google, IBM, Microsoft."

"The Census Project expands on the CII’s efforts to collaboratively identify and fund critical open source projects in need of assistance. It automates the collection and analysis of data on different open source projects, ultimately creating a risk score for each project based on the results. Projects with a higher ranking are especially in need of reinforcements and funding; and, as a result, CII will consider such projects priority candidates for funding. A high score means that the project may not be getting the attention that it deserves and that it merits further investigation."

“Measuring software security is an ongoing struggle that’s notoriously difficult given missing or messy data,” said Jim Zemlin, Executive Director at The Linux Foundation. “There’s no perfect set of metrics to guarantee that software is secure or not. The Census Project brings the power of the open source collaboration to help fill this massive gap, which will provide a useful barometer for assessing software from a security point of view. We look forward to feedback on the effort in order to improve the census itself and subsequently the software that we all depend on for our privacy and security.”

Full source and data for the project are available on GitHub, and developers and security experts are invited to participate in The Census Project. That includes experimenting with different metrics, providing corrected data, proposing new projects to include in the evaluation, and suggesting alternative formulas for combining the data. Anyone can submit a pull request proposing changes based on the alternatives that prove most successful.

The supporting software for capturing data is written in Python by Samir Khakimov of IDA and is released under the open source MIT license. The data is sourced from the Black Duck Open HUB (formerly Ohloh), a free online community and public directory of free and open source software (FOSS). Black Duck, of course, is well-known for its history of open source audits and metric tracking.