Pursuing an RDF Epiphany
I've worked with XML and related web technologies for a while now, and I've struggled to fully grasp or otherwise grok the Resource Description Framework. It seems to have an unfortunate reputation for being too complicated, esoteric, or impractical. At first glance, it can appear to be a solution looking for a problem.
The main question for me has always been why one would use RDF (or RDF/XML) instead of a plain, non-RDF XML vocabulary for data transfer, sharing, integration, etc. I'll attempt to answer that question (primarily for my own edification) and then briefly discuss a real-world example, hopefully dislodging the lingering RDF monkey from my back.
Flexibility: RDF is the Model
The plain XML vs. RDF question was posed by Leigh Dodds a while ago, with a few members of the RDF community responding.
One of the most concise answers was from Shelley Powers:
RDF is based on a domain-neutral model that allows one set of statements to be merged with another set of statements, even though the information contained in each set of statements may differ dramatically. Plain XML is hierarchical and only needs to be well-formed (and hopefully valid against a schema); extracting anything semantically within the document is dependent upon some shared, explicit understanding between consumers and producers of the XML.
In contrast, RDF is composed of simple statements (subject-predicate-object triples) that can be consumed immediately, without having to worry about structure or order (i.e., elements, child nodes, attributes, etc.).
RDF is the model. The processing of triples is highly predictable and static, reducing the effort involved when things change.
Plain XML has an ever-varying model depending on the vocabulary; only its syntax remains the same.
What happens if a plain XML schema changes, structurally and/or semantically? In an environment of distributed data, with multiple parties owning or generating that data, the time and effort required to accommodate modifications could be quite high.
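To make the "RDF is the model" point concrete for myself, here is a minimal Python sketch. It assumes statements are held as plain (subject, predicate, object) tuples; the example.org URIs and the describe helper are hypothetical, made up purely for illustration, and this is not how any particular RDF library works.

```python
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"      # Dublin Core element namespace
PROJECT = "http://example.org/projects/foo"  # hypothetical resource URI

# Each statement is one (subject, predicate, object) triple. Order is irrelevant.
statements = [
    (PROJECT, DC + "title", "Foo"),
    (PROJECT, DC + "creator", "Alice"),
]


def describe(triples, subject):
    """Group the objects of every statement about `subject` by predicate."""
    properties = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            properties[p].append(o)
    return dict(properties)


# A new kind of statement appears later. The processing code does not change.
statements.append((PROJECT, "http://example.org/terms/license", "MIT"))
summary = describe(statements, PROJECT)
```

The loop in describe never looks at nesting, element order, or attributes, so a new predicate simply shows up as one more key in the result. With a plain XML vocabulary, the equivalent change would likely mean touching a schema and every parser written against it.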
Efficient Integration of Decentralized Data
As alluded to above, perhaps the most significant aspect of RDF is how the basic triple model enables the merging and integration of decentralized data.
The processing of triples from two or more sources (even ones using different RDF vocabularies altogether) can occur immediately, because namespaces keep each vocabulary's terms distinct.
Integration of decentralized data also requires the ability to uniquely identify resources.
RDF's reliance on URIs, which are designed to be globally unique identifiers, provides this uniqueness in a simple and elegant manner.
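As a rough sketch of what that merging looks like, assume two independent sources publish statements about the same URI using different vocabularies. The project URI and homepage below are hypothetical, and the sketch ignores blank nodes, which need extra care when graphs are merged.

```python
DC = "http://purl.org/dc/elements/1.1/"      # Dublin Core element namespace
FOAF = "http://xmlns.com/foaf/0.1/"          # FOAF namespace
PROJECT = "http://example.org/projects/foo"  # hypothetical shared resource URI

# Source A describes the project with Dublin Core terms.
source_a = {
    (PROJECT, DC + "title", "Foo"),
    (PROJECT, DC + "creator", "Alice"),
}

# Source B, published elsewhere, uses a FOAF term for the same URI.
source_b = {
    (PROJECT, FOAF + "homepage", "http://example.org/foo/"),
    (PROJECT, DC + "title", "Foo"),  # duplicate statement, harmless in a set
}


def merge(*sources):
    """Merge sets of triples with a plain set union. No schema mapping step."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


graph = merge(source_a, source_b)
predicates = sorted(p for s, p, o in graph if s == PROJECT)
```

Because both sources name the project with the same URI, their statements line up without any coordination, and the full predicate URIs (namespace plus local name) keep the Dublin Core and FOAF terms from colliding.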
Graph Based Data Models
Data sets that adhere to a basic graph model are especially well suited for representation in RDF.
The simple hyperlinked characteristics of RDF allow loose coupling and late binding of resources directly in the RDF model.
The object in a triple statement is often another resource with its own URI, effectively creating a relationship between resources that may not reside in the same domain. In addition, the RDF Schema property rdfs:seeAlso allows a description to be extended by linking to other sets of RDF data that exist elsewhere on the web.
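Here is a small sketch of how following rdfs:seeAlso links might work. To keep it self-contained, a dictionary called WEB stands in for the real web, and the fetch function is a hypothetical placeholder for an HTTP request plus an RDF parser; all the example.org and example.net URLs are invented.

```python
from collections import deque

FOAF = "http://xmlns.com/foaf/0.1/"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical stand-in for the web: document URL -> triples in that document.
WEB = {
    "http://example.org/alice.rdf": [
        ("http://example.org/alice#me", FOAF + "name", "Alice"),
        ("http://example.org/alice#me", RDFS_SEE_ALSO,
         "http://example.net/projects.rdf"),
    ],
    "http://example.net/projects.rdf": [
        # The object is a resource that lives in a different domain.
        ("http://example.net/projects/foo", FOAF + "maker",
         "http://example.org/alice#me"),
    ],
}


def fetch(url):
    """Placeholder for 'GET the document and parse it into triples'."""
    return WEB.get(url, [])


def crawl(start_url):
    """Collect triples from a document and everything it links via seeAlso."""
    seen, queue, graph = set(), deque([start_url]), set()
    while queue:
        url = queue.popleft()
        if url in seen:
            continue  # guard against documents that link to each other
        seen.add(url)
        for s, p, o in fetch(url):
            graph.add((s, p, o))
            if p == RDFS_SEE_ALSO:
                queue.append(o)  # late binding: discovered only while reading
    return graph


graph = crawl("http://example.org/alice.rdf")
```

The crawler knows nothing about the second document in advance. It learns of it from the data itself, which is the loose coupling and late binding described above.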
Plays Nicely with RESTful Web APIs
RDF fits nicely with web services adhering to a REST architecture.
In an article about connecting social content services, Leigh Dodds points out the complementary features of RDF and REST:
as RDF uses URIs as the means of identifying resources, the API URL structure and the response format can be closely related.
This is not to say that a plain XML vocabulary has no place in RESTful web APIs, but both RDF and REST are inherently "resource"-centric, and combining them can result in a more elegant and flexible service.
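One way to picture that close relationship is a toy handler in which the URL being requested is also the subject of every triple in the response. This is only a sketch under my own assumptions: the api.example.org root, the STORE dictionary, and the get function are all hypothetical and do not describe any real service.

```python
BASE = "http://api.example.org"          # hypothetical API root
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical backing store: URL path -> properties of that resource.
STORE = {
    "/projects/foo": {DC + "title": "Foo", DC + "creator": "Alice"},
}


def get(path):
    """Answer GET <path> with triples whose subject is the request URL itself."""
    properties = STORE.get(path)
    if properties is None:
        return 404, []
    resource_uri = BASE + path
    return 200, [(resource_uri, p, o) for p, o in properties.items()]


status, triples = get("/projects/foo")
```

The client never has to map an API URL onto some separate identifier buried in the payload, since the URL it dereferenced is the identifier the data talks about.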
DOAP & the OSS Community
My interest in RDF has been piqued by the DOAP vocabulary created by Edd Dumbill, which describes open source software projects. A basic goal of DOAP is to allow people managing a project to maintain and control project metadata on their own terms and in one place and, in theory, avoid the time and effort involved in notifying various repositories or services that updates have occurred.
An interesting project called DOAPspace, started by Rob Cakebread, is a DOAP repository being actively seeded with freely available project data from sources such as Freshmeat and SourceForge. Rob also has a solution (doapurl.org) to provide an authoritative source of DOAP project URLs, following the model of Persistent URLs (PURLs). DOAP URLs will essentially be permanent: authorized project members can edit the PURL-like DOAP URL if the actual project URL they control ever changes. DOAPspace can then reference the URLs managed by doapurl.org.
These services are basically a platform for enabling the decentralized nature of DOAP and allowing project members to maintain their own project data. However, there is one more critical piece here: notifying interested parties and services of DOAP updates. This involves an intermediary service called Ping the Semantic Web (PTSW). When a DOAP description is updated, the PTSW service can be pinged, and the update event will be archived and time-stamped. DOAPspace (or any other service) can then use PTSW to learn of any DOAP update events.
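The ping-and-poll flow can be sketched roughly as follows. To be clear, PingLog is a hypothetical in-memory stand-in that I made up to illustrate the idea; it is not the actual PTSW interface, and the URLs and dates are invented.

```python
from datetime import datetime, timezone


class PingLog:
    """Hypothetical in-memory stand-in for a ping service such as PTSW."""

    def __init__(self):
        self._events = []  # (timestamp, doap_url) pairs, in arrival order

    def ping(self, doap_url, when=None):
        """Archive an update notification together with a timestamp."""
        when = when or datetime.now(timezone.utc)
        self._events.append((when, doap_url))

    def updates_since(self, since):
        """Return URLs pinged after `since`, oldest first, without duplicates."""
        urls = []
        for when, url in sorted(self._events):
            if when > since and url not in urls:
                urls.append(url)
        return urls


def day(n):
    """Helper for readable, made-up timestamps in January 2008."""
    return datetime(2008, 1, n, tzinfo=timezone.utc)


log = PingLog()
# Project owners ping whenever they change the DOAP data they control.
log.ping("http://example.org/foo/doap.rdf", day(2))
log.ping("http://example.net/bar/doap.rdf", day(3))
log.ping("http://example.org/foo/doap.rdf", day(4))

# A repository that last checked on 2 January only needs to re-fetch these.
to_refetch = log.updates_since(day(2))
```

The project owner never contacts the repository directly. Both sides only talk to the intermediary, which is what lets the DOAP data stay decentralized.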
ossmosis, a nascent web service I've briefly mentioned before, is in the same 'semantic' realm, as it were, and the role of DOAP with respect to ossmosis is evolving. The service is focused on contextual aspects of OSS projects and people, and we hope to contribute to (as well as benefit from) the emerging DOAP-friendly OSS community. I hope to write a bunch more on this in the future, once various pieces and thoughts have solidified.