Jena, and the Open Source Semantic Web

by Guest Editor - May 13, 2008

By Raj Bala

Many people, even technology folks, don't really know what to make of the Semantic Web, and several camps disagree on what its various terms mean. Now that the general concept is finally getting some traction, there is even some groupthink about moving away from the current moniker. Some people prefer "Linked Data Web," because they consider it a better description of the technologies and components surrounding the Semantic Web. We'll keep using the name most people recognize. Whatever it's called, the open source community is pitching in through the Jena project.

Here's a quick summation of the Semantic Web goal: Data that is better described than other data will serve its users better. When hyperlinks are marked up with more descriptive HTML, machine agents and web crawlers can understand these descriptions and do a better job of giving users what they need.

The idea is that these descriptions are either broadly understood already, or extensions of what is broadly understood. In either case, when descriptions are published appropriately, machine agents are better informed, and thus so are their users. That's the sort of low-tech, but useful definition of the Semantic Web that XHTML and Microformat folks favor.

Then there's the camp unofficially led by Sir Tim Berners-Lee. This camp takes a more formal approach to the Semantic Web, and introduces a correspondingly higher level of complication. Its members think there's a better way to manage lots of marked-up data near the storage level. Laying data elements out based on their relationships to each other allows applications to infer complex relationships that could be valuable to their users.
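To make the inference idea concrete, here is a minimal Python sketch. It is not taken from Jena or any other framework; the `ex:` names and the `infer_transitive` helper are made up for illustration. Facts are written as (subject, relationship, object) tuples, and the helper derives new facts by chaining a relationship that is assumed to be transitive.

```python
def infer_transitive(triples, predicate):
    """Return the triples plus every fact implied by treating
    `predicate` as transitive (A->B and B->C implies A->C)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current edges so the set can grow while we loop.
        edges = [(s, o) for (s, p, o) in inferred if p == predicate]
        for s1, o1 in edges:
            for s2, o2 in edges:
                new_fact = (s1, predicate, o2)
                if o1 == s2 and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred


# Hypothetical data: three stated facts about location.
facts = {
    ("ex:lab", "ex:locatedIn", "ex:Bristol"),
    ("ex:Bristol", "ex:locatedIn", "ex:England"),
    ("ex:England", "ex:locatedIn", "ex:UK"),
}

closure = infer_transitive(facts, "ex:locatedIn")
# The fact that the lab is in the UK was never stated, but it is now derivable.
lab_in_uk = ("ex:lab", "ex:locatedIn", "ex:UK") in closure
```

Real reasoners work from declared vocabularies and rules rather than a hard-coded loop, but the principle is the same: relationships that are stored explicitly can be combined into relationships that nobody typed in.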

There's not much to the low-tech approach described first. Mark up a web application's rendered HTML view with additional data that's broadly understood, and you're finished.

As far as the Tim Berners-Lee camp goes, there are some great open source components that developers can use to build Semantic Web applications under the formal approach. Leading the charge is Jena, an open source Semantic Web framework written in Java. The project was born at HP Labs in the U.K. and has strong support from both HP Labs and the developer community at large.

Jena is essentially an API that gives developers a mechanism to insert and query RDF-encoded triples. The resulting data store is, creatively, called a “triple store.” The word “triple” comes from the three parts of each stored statement: a subject, a predicate, and an object.
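For readers who have never seen one, the following is a toy in-memory triple store in Python. It is only a sketch of the concept and does not reflect Jena's Java API; the `Triple` and `TripleStore` names and the `ex:` identifiers are invented for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """A toy store: a set of triples plus wildcard pattern matching."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore()
store.add("ex:Jena", "ex:writtenIn", "ex:Java")
store.add("ex:Jena", "ex:developedAt", "ex:HPLabs")
store.add("ex:ARQ", "ex:partOf", "ex:Jena")

# Everything the store knows about Jena.
about_jena = store.match(subject="ex:Jena")
# Everything that is part of something else.
parts = store.match(predicate="ex:partOf")
```

Inserting is just adding a three-part statement, and querying is just filling in the parts you know and leaving the rest open.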

SPARQL, the W3C query language for RDF, provides query access to RDF-encoded triples, and Jena supports it through its ARQ module. ARQ provides functionality equivalent to the SPARQL W3C Recommendation, with a few extensions for added functionality.

Like most open source, database-persisted SPARQL implementations, Jena's has limitations. When triples are persisted in a relational database, it's essentially a query interface built on top of SQL, so it's only ever as fast as the underlying database subsystem. With disk I/O still one of the largest application constraints, SPARQL queries can be very slow, especially over massive amounts of data.
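A rough way to see why the database matters is to store triples in a single SQL table and hand-translate a small graph pattern into SQL. The sketch below uses Python's built-in `sqlite3` module. The one-table schema, the `ex:` data, and the translation are assumptions made for illustration, not Jena's actual storage layout or query planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per triple (subject, predicate, object).
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:carol"),
        ("ex:bob", "ex:worksOn", "ex:Jena"),
        ("ex:carol", "ex:worksOn", "ex:Jena"),
    ],
)

# SPARQL-style pattern being translated by hand:
#   ?a ex:knows ?b .
#   ?b ex:worksOn ex:Jena .
# Each triple pattern becomes one more reference to the same table,
# and a shared variable (?b) becomes a join condition.
sql = """
    SELECT t1.s, t1.o
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'ex:knows'
      AND t2.p = 'ex:worksOn'
      AND t2.o = 'ex:Jena'
    ORDER BY t1.s
"""
rows = conn.execute(sql).fetchall()
conn.close()
```

A two-pattern query already needs one self-join. Longer patterns add more, so query cost is tied directly to how well the underlying database handles joins over one very large table.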

Jena also provides a web service implementation of SPARQL called Joseki, so that remote applications can query as if they were running locally. The idea is that application developers can open up their RDF triple stores to third parties by providing them tight integration over HTTP.
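As a sketch of what querying over HTTP looks like from the client side, the Python below builds a request URL in the style of the SPARQL protocol, which passes the query text in a `query` parameter. The endpoint address is a placeholder, not a real Joseki deployment, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_query_url(endpoint, query):
    """URL-encode the query text into the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)

# Decoding the URL again shows the query text a server would receive.
received = parse_qs(urlsplit(url).query)["query"][0]
```

A remote application would issue an HTTP GET against that URL and parse the result set that comes back, without needing any access to the underlying store.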

Very few companies have been able to successfully scale a Semantic Web triple store for millions of users. While the Semantic Web has some mathematically sound theoretical foundations, there are still very few (if any) real-world implementations of this stuff scaling well.

If anyone solves the scaling issues, it will likely be the open source community surrounding Jena. That community has emerged as one of the most vibrant parts of the Semantic Web world, thanks to an enthusiastic team of contributing developers. If you're interested in the Semantic Web, definitely investigate Jena.






1 Comment
 

I don't quite get the gist of your post. For instance, why are all triple stores lumped into the "Do Not Scale" bucket? If there is one thing the Semantic Web gives us, it is an easy way to say "In My Humble Opinion (IMHO)" or "Based on what I know," etc.

I am a major proponent of the label "Linked Data Web," but note that every post that I make re. "Linked Data" has the tag "semanticweb". And each tag is linked to the DBpedia URI http://dbpedia.org/resource/Semantic_Web via MOAT in my Blog Data Space.

Thus, you (human or machine) can discern that in my realm of discourse "Linked Data" is associated with the Semantic Web.

BTW - you refer to the "Semantic Web" as complicated (or overly so), which is a prevalent perception that continues to blur comprehension of the vision's intrinsic value, across many fronts. Facing such reality (since perception is increasingly reality), why would a more tightly focused moniker (still related back to the root) be problematic to you?

The Linked Data Web is simply a moniker for "Linked Data on the Web": a Web of Data Objects endowed with dereferenceable URIs, which enable remote object referencing via HTTP.

The rate of comprehension of the Semantic Web vision has grown exponentially since the emergence of DBpedia [1], the Linking Open Data Community [2], and the use of the moniker Linked Data Web [3] :-)

There is even a Linked Data Planet Conference in June [4] where I believe Andy Seaborne will be giving a talk re. Jena and related SPARQL matters.

Links:
1. http://moat-project.org/ontology
2. http://dbpedia.org
3. http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenD...
4. http://www.openlinksw.com/blog/~kidehen
5. http://www.linkeddataplanet.com

Kingsley Idehen
