RDF+XML is a much loathed format.
It is a way of writing RDF data (triples of subject,predicate,object) in XML.
RDF+XML is not RDF. It’s a way of encoding RDF. There are better ones, such as N3, but it’s the one everyone expects you to provide, so you’d better learn the basics.
RDF+XML is way too big. You can do everything lots of ways. That makes things confusing, so I figured I’d write a guide to the bare minimum you need to know to create valid RDF+XML.
The basics
The subject & predicate are always a URI.
The object is a URI /or/ a literal value. If it’s a literal it may have an associated data type URI or a language code, but not both.
predicate is just a fancy word for “relation”. It relates the subject to the object, eg. Bob hasFriend Jill. (Note that you can’t assume Jill has a friend Bob, it’s a one way thing. Sorry Bob)
The correct mimetype is “application/rdf+xml”
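To make the rules in this section concrete, here is a minimal sketch in Python (picked purely for illustration). The names `Literal`, `Triple` and `check_literal`, and the example.com URIs for Bob and Jill, are all made up for this example.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """A literal object: a value plus an optional datatype URI OR language code."""
    value: str
    datatype: Optional[str] = None  # e.g. "http://www.w3.org/2001/XMLSchema#int"
    lang: Optional[str] = None      # e.g. "en"


class Triple(NamedTuple):
    subject: str    # always a URI
    predicate: str  # always a URI
    obj: object     # either a URI (str) or a Literal


def check_literal(lit: Literal) -> Literal:
    """Reject a literal that carries both a datatype and a language code."""
    if lit.datatype is not None and lit.lang is not None:
        raise ValueError("a literal may have a datatype or a language, not both")
    return lit


# "Bob hasFriend Jill" as a triple (hypothetical URIs).
bob_has_friend_jill = Triple(
    "http://example.com/bob#me",
    "http://example.com/ns/hasFriend",
    "http://example.com/jill#me",
)

# The reverse statement is a different triple and is not implied by the first.
jill_has_friend_bob = Triple(
    bob_has_friend_jill.obj,
    bob_has_friend_jill.predicate,
    bob_has_friend_jill.subject,
)
print(bob_has_friend_jill == jill_has_friend_bob)
```

The last line prints `False`: swapping subject and object gives you a different statement.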
How to write RDF+XML
This is going to cover the smallest learning curve approach. There’s lots more to RDF+XML but it’s all optional sugar. Don’t worry about it.
I’m assuming you already know what actual triples you want to write. If not this isn’t the correct tutorial for you yet.
An RDF document is an XML document and so always starts with
<?xml version="1.0" encoding="utf-8" ?>
If in doubt, always encode it as utf-8.
Here is a minimum RDF document defining no actual data:
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>
See that bit which says “xmlns:rdf”? That defines that any tag starting with “rdf:” is in the “namespace” http://www.w3.org/1999/02/22-rdf-syntax-ns#
That means that the unique identifier for that element is http://www.w3.org/1999/02/22-rdf-syntax-ns#RDF
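If you want to see the namespace mapping happen, here is a small sketch using Python's standard `xml.etree.ElementTree` module. ElementTree reports element names as `{namespace}localname`; gluing the two parts together gives the identifier described above. The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

MINIMAL_DOC = """<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>"""

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(MINIMAL_DOC.encode("utf-8"))

# ElementTree has already replaced the "rdf:" prefix with the namespace.
print(root.tag)

# Namespace + local name = the unique identifier for the element.
namespace, local_name = root.tag[1:].split("}", 1)
full_uri = namespace + local_name
print(full_uri)
```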
If you want to use any predicates, and you do, you’ll need to define namespaces for them in the opening tag. Most common namespaces have a widely accepted prefix.
To find the standard prefix for a namespace you can look it up on prefix.cc, which is handy. Don’t use a different prefix without a good reason. If you can’t find the namespace on prefix.cc then pick something sensible. If you are writing a document with a bunch of namespaces, prefix.cc has a very funky shortcut… try this link:
Neat, huh? You can just cut and paste it. This saves time and typos. You might not notice missing a “#” from the end, but a computer will treat it as a completely different namespace!
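Here is a rough sketch of why that missing “#” matters. A prefixed name is just shorthand for the namespace with the local name stuck on the end, so one lost character produces a different URI. The `expand` helper and the `PREFIXES` table are hypothetical names for this example.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(qname: str, prefixes: dict) -> str:
    """Turn 'prefix:local' into a full URI by plain string concatenation."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local


good = expand("rdf:type", PREFIXES)

# Same prefix, but the trailing '#' was lost when typing the namespace.
typo = expand("rdf:type", {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns"})

print(good)
print(typo)
print(good == typo)
```

The two URIs differ by one character, and as far as any RDF tool is concerned they are unrelated predicates.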
OK. Now to encode some data. Here’s my data. I’m going to use the prefixes to keep it readable:
- My name is Marvin
  - http://example.com/marvin#me
  - foaf:name
  - “Marvin Fenderson”
- I am a Person
  - http://example.com/marvin#me
  - rdf:type
  - foaf:Person
- My hat size is 10
  - http://example.com/marvin#me
  - hats:hatSize
  - 10 (type is http://www.w3.org/2001/XMLSchema#int)
- The big head club is an organization
  - http://example.com/bigheadsclub#org
  - rdf:type
  - foaf:Organization
- The big head club has a member who is me!
  - http://example.com/bigheadsclub#org
  - foaf:member
  - http://example.com/marvin#me
- The big head club is called “The Big Head Club” in English.
  - http://example.com/bigheadsclub#org
  - foaf:name
  - “The Big Head Club” (in English)
OK, that’s enough data. Note that because predicates are one way sometimes you say things backwards. I wanted to say “I’m a member of the club”, but because I’m using a predicate that relates organizations to members, I have to do it that way around.
Note that many things (like Organization in FOAF) have the US spelling. Don’t correct it, computers want an exact string. If you feel annoyed add a label to stuff with a en-gb language version of the label!
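For reference, the six triples listed above can be written down as plain data. This is only a sketch: the tuple layout, the dict shape used for literals and the constant names are my own choices, and the hat-size namespace is the http://example.com/hats/ns/ one that appears in the XML examples further down. Grouping by subject is handy because the encoding works one subject at a time.

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
HATS = "http://example.com/hats/ns/"
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"
ME = "http://example.com/marvin#me"
CLUB = "http://example.com/bigheadsclub#org"

# (subject, predicate, object): a URI object is a str, a literal object is a dict.
TRIPLES = [
    (ME, FOAF + "name", {"value": "Marvin Fenderson"}),
    (ME, RDF + "type", FOAF + "Person"),
    (ME, HATS + "hatSize", {"value": "10", "datatype": XSD_INT}),
    (CLUB, RDF + "type", FOAF + "Organization"),
    (CLUB, FOAF + "member", ME),
    (CLUB, FOAF + "name", {"value": "The Big Head Club", "lang": "en"}),
]

# Group the predicate/object pairs under their subject.
by_subject = defaultdict(list)
for subject, predicate, obj in TRIPLES:
    by_subject[subject].append((predicate, obj))

for subject, pairs in by_subject.items():
    print(subject, len(pairs))
```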
Here’s how to encode the above: For each distinct “subject” (the #me and the #org are the “subjects” in the above data; 3 triples start with each), create a sub-element of the top level rdf:RDF element. Call these sub-elements <rdf:Description> and give them an rdf:about attribute which is the URI of the subject:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.com/marvin#me">
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/bigheadsclub#org">
  </rdf:Description>
</rdf:RDF>
OK! That’s still valid RDF (assuming it’s inside the <rdf:RDF> element), but it still contains no data. We need to relate Marvin to the Big Head Club.
For triples where the object is a URI (which indicates they relate the subject resource to another resource, not just a number or string), add them as a tag matching the predicate. The namespace must have been correctly aliased in an xmlns:xxxx="yyyy" attribute. The element should close itself at once and contain the attribute rdf:resource="URI" where URI is the object of the triple. Note that you don’t use the short version of the namespace in rdf:resource or rdf:about, just in the predicates relating subjects to objects.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:hats="http://example.com/hats/ns/">
  <rdf:Description rdf:about="http://example.com/marvin#me">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/bigheadsclub#org">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization" />
    <foaf:member rdf:resource="http://example.com/marvin#me" />
  </rdf:Description>
</rdf:RDF>
OK. The last bit is to add in the literals: the strings and the number. Create a tag of the same name as you would for linking to a resource, but this time don’t close it at once; wrap it around the value instead. If there is a language to express for a string, add an xml:lang="xx" attribute, where xx is the language code. Alternatively, if you need to express a datatype, use rdf:datatype="xxx".
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:hats="http://example.com/hats/ns/">
  <rdf:Description rdf:about="http://example.com/marvin#me">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>Marvin Fenderson</foaf:name>
    <hats:hatSize rdf:datatype="http://www.w3.org/2001/XMLSchema#int">10</hats:hatSize>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/bigheadsclub#org">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization" />
    <foaf:name xml:lang="en">The Big Head Club</foaf:name>
    <foaf:member rdf:resource="http://example.com/marvin#me" />
  </rdf:Description>
</rdf:RDF>
The order of relations inside a description, and the order of the descriptions themselves, does not matter. I think it’s nice to put ‘types’ and ‘labels’ near the top of each description. Relations can be repeated.
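The recipe above is mechanical enough to sketch in code. This is one possible way to do it with Python's standard `xml.etree.ElementTree`, not the only way: the `to_rdfxml` and `clark` helpers, and the convention that a URI object is a plain string while a literal is a dict with `value` plus optional `lang` or `datatype`, are assumptions made for this example. It also assumes every predicate URI can be split after its last “#” or “/”.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Ask ElementTree to use the widely accepted prefixes when writing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def clark(uri: str) -> str:
    """Split a URI after its last '#' or '/' into ElementTree's {ns}local form."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return "{%s}%s" % (uri[:cut], uri[cut:])


def to_rdfxml(triples) -> str:
    """Write triples using only rdf:Description, rdf:about, rdf:resource and literals."""
    root = ET.Element(clark(RDF + "RDF"))
    descriptions = {}
    for subject, predicate, obj in triples:
        # One rdf:Description per distinct subject.
        if subject not in descriptions:
            desc = ET.SubElement(root, clark(RDF + "Description"))
            desc.set(clark(RDF + "about"), subject)
            descriptions[subject] = desc
        prop = ET.SubElement(descriptions[subject], clark(predicate))
        if isinstance(obj, str):
            # URI object: self-closing element with rdf:resource.
            prop.set(clark(RDF + "resource"), obj)
        else:
            # Literal object: element wrapped around the value.
            prop.text = obj["value"]
            if "lang" in obj:
                prop.set(XML_LANG, obj["lang"])
            elif "datatype" in obj:
                prop.set(clark(RDF + "datatype"), obj["datatype"])
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body


example = to_rdfxml([
    ("http://example.com/marvin#me", RDF + "type", FOAF + "Person"),
    ("http://example.com/marvin#me", FOAF + "name", {"value": "Marvin Fenderson"}),
    ("http://example.com/bigheadsclub#org", FOAF + "name",
     {"value": "The Big Head Club", "lang": "en"}),
])
print(example)
```

Building the document through an XML library rather than by pasting strings together means characters like “&” and “<” in names get escaped for you.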
At this point you could add an additional rdf:Description, the about of which is the URL of the RDF document. This allows you to make statements about the document as a whole, such as who wrote it, what license it is, what it’s called etc. There’s still no agreement on what is useful, but a title and license are handy. Use rdfs:label to label it.
While it’s not strictly required, it’s helpful to add an rdfs:label and rdf:type to describe every URI that is a subject or object in the document, not counting the objects of rdf:type. Some people say this is overkill, but it does help debugging.
Checking your RDF
Don’t skip checking it. I keep running into broken RDF produced by people who never sanity checked it.
The best way to check your RDF is to put it on a URL and poke things at it. The first thing I usually do is load it in Firefox and check that it’s valid XML. If it’s not that’s a dealbreaker before we start. Here’s a link to an online copy of our file
If you load it in Firefox it’ll tell you about any XML errors; other browsers are not so helpful.
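If you would rather not rely on a browser, the same “is it even valid XML?” check can be sketched with Python's standard library. The `xml_problem` function name is made up for this example, and it only checks that the XML is well-formed, not that it is sensible RDF.

```python
import xml.etree.ElementTree as ET


def xml_problem(text: str):
    """Return None if text is well-formed XML, otherwise the parser's error message."""
    try:
        # Parse from bytes so a utf-8 encoding declaration is honoured.
        ET.fromstring(text.encode("utf-8"))
    except ET.ParseError as err:
        return str(err)
    return None


good = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"></rdf:RDF>'
bad = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'  # never closed

print(xml_problem(good))
print(xml_problem(bad))
```

The first call prints `None`; the second prints whatever complaint the parser has about the unclosed element.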
Once you’ve done that, you should load it into an RDF aware viewer. I use the Graphite Quick & Dirty RDF Browser which I wrote. Here’s what the data looks like if you view it in the browser.
The rdf:type’s have been spotted and are shown on the right hand top corner of each box. The foaf:names have also been highlighted. This helps you spot obvious mistakes. Also, because we’ve got a valid label for Marvin, the list showing members of the organisation is showing his name rather than the URI (hover the mouse to see the URI). This is also handy in spotting obvious mistakes.
If the Graphite Browser can’t parse your RDF it’ll link you to the W3C RDF Validator which is sometimes helpful. Also double check your xmlns definitions. Missing a character off the end will cause lots of problems!
Better than using my generic RDF viewer, if available also check your data in one that is designed to understand the namespaces you’re working with. There’s not many around yet, but that will change. Personally I find the existing ones quite confusing.
If you don’t check your RDF+XML it’s bound to be buggy.
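As one more sanity check, you can read your file back and eyeball the triples that come out. The sketch below is deliberately limited: it only understands the tiny subset of RDF+XML used in this guide (rdf:Description with rdf:about, rdf:resource, and plain, typed or language-tagged literals) and is no substitute for a real RDF library or validator. The function name `subset_triples` and the tuple shape used for literals are assumptions of this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.com/bigheadsclub#org">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization" />
    <foaf:name xml:lang="en">The Big Head Club</foaf:name>
    <foaf:member rdf:resource="http://example.com/marvin#me" />
  </rdf:Description>
</rdf:RDF>"""


def subset_triples(text: str):
    """List the triples in a document written in this guide's minimal style."""
    root = ET.fromstring(text.encode("utf-8"))
    triples = []
    for desc in root.findall("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)
        for prop in desc:
            # "{namespace}local" -> namespace + local = the predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get("{%s}resource" % RDF)
            if resource is not None:
                triples.append((subject, predicate, resource))
            else:
                # Literal: (value, datatype URI or None, language code or None).
                literal = (prop.text or "",
                           prop.get("{%s}datatype" % RDF),
                           prop.get(XML_LANG))
                triples.append((subject, predicate, literal))
    return triples


for triple in subset_triples(DOC):
    print(triple)
```

If a predicate URI in the output looks wrong, the usual culprit is a typo in one of the xmlns definitions.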
What I’ve skipped
Almost everything! But the only *useful* thing I’ve skipped is how to write bNodes (resources without an associated URI), that’s to keep this simple and because I have a dislike for them.
RDF+XML offers huge numbers of short cuts, but you don’t need any of them. They just make it easier to make mistakes. Sod ’em.
How to read RDF+XML
Simple. Use a library. Most programming languages now have a good library for parsing all the crazy crap in RDF+XML. Don’t bother trying to do it yourself, it’s a waste of time and there’s more important work to be done!
In PHP, I use ARC2. It handles lots of other RDF formats as well, and so I have one less problem to worry about. I just point it at web addresses and it sucks down triples. Who cares how they were encoded?
N3
This is another way to encode the same data. It also has some shortcuts, but is much more elegant. Check out the same data:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix hats: <http://example.com/hats/ns/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://example.com/bigheadsclub#org>
    a foaf:Organization;
    foaf:member <http://example.com/marvin#me>;
    foaf:name "The Big Head Club"@en .

<http://example.com/marvin#me>
    a foaf:Person;
    hats:hatSize "10"^^<http://www.w3.org/2001/XMLSchema#int>;
    foaf:name "Marvin Fenderson" .
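To show how little machinery this style of syntax needs, here is a rough sketch that writes triples one per line with no prefixes or shortcuts at all (the N-Triples style, which as far as I know Turtle parsers also accept). The helper names `nt_term` and `to_ntriples` and the dict shape for literals are assumptions of this example, and the string escaping only covers backslashes, double quotes and line breaks.

```python
def nt_term(obj) -> str:
    """Format an object: a str is a URI, a dict is a literal."""
    if isinstance(obj, str):
        return "<%s>" % obj
    text = (obj["value"]
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r"))
    if "lang" in obj:
        return '"%s"@%s' % (text, obj["lang"])
    if "datatype" in obj:
        return '"%s"^^<%s>' % (text, obj["datatype"])
    return '"%s"' % text


def to_ntriples(triples) -> str:
    """One '<subject> <predicate> object .' line per triple."""
    return "\n".join(
        "<%s> <%s> %s ." % (s, p, nt_term(o)) for s, p, o in triples
    )


CLUB = "http://example.com/bigheadsclub#org"
FOAF = "http://xmlns.com/foaf/0.1/"

lines = to_ntriples([
    (CLUB, FOAF + "member", "http://example.com/marvin#me"),
    (CLUB, FOAF + "name", {"value": "The Big Head Club", "lang": "en"}),
])
print(lines)
```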
End
I’m sure that I’ve made a mistake or two myself. Suggestions on how to improve the above would be welcome.
This article is copyright 2010 Christopher Gutteridge and is released as CC-by, you are free to reuse it and modify it so long as you attribute me (my name and a link to this blog).
Thanks for the http://prefix.cc/foaf,skos,gr.xml trick. That is genius!
For readability: use colo(u)rs to highlight the info just added.
For completeness: link to an N3 tutorial, link to a language codes reference, and possibly to more info on common datatypes.
Random thoughts:
*In “We need to relate these things to other things.” s/these things/Marvin and the Big Heads Club
*Watch the open quotes (there are some right smart-quotes where you want left ones)
There is often confusion around N3 and Turtle (both of which are a much easier format to author by hand, and almost readable too 🙂).
You can actually do wacky things in N3 that are outside of RDF, Turtle is a subset of N3 that is the useful stuff.
Full syntax etc is a little scary to read, but the examples are useful. http://www.w3.org/TeamSubmission/turtle/
If you’re on a Linux-based platform, the Raptor library (http://librdf.org/raptor/) comes with an extremely useful tool called “rapper” which can both check for parse/syntax errors and convert between a variety of RDF formats (eg rdf+xml, turtle, RDFa (parser))
sudo apt-get install raptor-utils
Hey thanks for the write-up, clean and straight to the point! I’ll definitely pass it on to less rdf-aware colleagues!