

Publishing CSV to RDF part 2

OK. Having realised I was dangerously on the path to writing my own language, I’ve revised my plan. My current thinking is that I’ll make it a two-step process.

Step one converts the CSV/Excel/Whatever into an XML file, with optional extras included to allow local values to be set.
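A rough idea of what step one might look like, sketched in Python using only the standard library. The element names (`rows`, `row`, `cell`, `extra`) and the `csv_to_xml` function are invented for illustration and are not part of any existing tool; the `extras` argument stands in for the "optional extras" that carry local values.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, extras=None):
    """Turn CSV text into a simple <rows> XML document.

    The element names (rows, row, cell, extra) are made up for this sketch.
    """
    root = ET.Element("rows")
    # Optional locally set values travel alongside the data.
    for key, value in (extras or {}).items():
        extra = ET.SubElement(root, "extra", name=key)
        extra.text = value
    reader = csv.DictReader(io.StringIO(csv_text))
    for record in reader:
        row = ET.SubElement(root, "row")
        for column, value in record.items():
            # Column headings go in an attribute, since headings such as
            # "Room No." are not valid XML element names.
            cell = ET.SubElement(row, "cell", name=column)
            cell.text = value
    return ET.tostring(root, encoding="unicode")
```

Something like `csv_to_xml("name,room\nAlice,101\n", {"base": "http://example.com/"})` would then give an XML string ready to hand to the second step.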

The XML will then be processed using an XSLT file, which is already well supported. I’m not a huge fan of XSLT but you need to think carefully before embarking on inventing an arbitrary new language. The goal is a system that non-research organisations (like our central IT) can maintain, and using an established technology makes it easier to find someone to maintain it.

That said, I’m not sure what to make the XSLT output. It really needs to be XML (although I *think* you can do other stuff, it’s more fiddly). So, assuming the XML restriction, my options are:

  • Any RDF/XML
  • Subset of RDF/XML
  • XML format I’ve not yet heard of
  • XML format of my own invention

This last one had me tempted for a while... something like:

 <triple>
   <subject>http://....</subject>
   <predicate>http://....</predicate>
   <object>http://....</object>
   <datatype>http://....</datatype>
 </triple>

… but I think I’m suffering from an attack of over-engineering. So what it should output is valid RDF/XML, which my tool can then validate and process into the triple serialisation of your choice.
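For comparison, here is roughly how little code the home-grown triple format would need on the consuming side. This is only a sketch under assumptions: the `<triples>` wrapper element, the function names, and the rule that a `<datatype>` child marks the object as a literal are all guesses made for illustration, not a defined format.

```python
import xml.etree.ElementTree as ET


def escape_literal(text):
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def triples_xml_to_ntriples(xml_text):
    """Flatten <triple> elements into N-Triples lines.

    Assumption for this sketch: a <datatype> child means <object> holds a
    literal; without one, <object> is treated as a URI.
    """
    lines = []
    for triple in ET.fromstring(xml_text).iter("triple"):
        subject = triple.findtext("subject")
        predicate = triple.findtext("predicate")
        obj = triple.findtext("object")
        datatype = triple.findtext("datatype")
        if datatype:
            obj_term = '"%s"^^<%s>' % (escape_literal(obj), datatype)
        else:
            obj_term = "<%s>" % obj
        lines.append("<%s> <%s> %s ." % (subject, predicate, obj_term))
    return "\n".join(lines)
```

Note that the URIs are written out exactly as found, with no escaping or validation, which is one of the things a real RDF/XML parser would otherwise take care of.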

Posted in RDF.



4 Responses


  1. Jakob says

    How about n-triples? It’s a standard RDF serialization, which can even be handled with simple command line tools or a few *very* simple regular expressions. By the way, it’s much like CSV 😉

    • Christopher Gutteridge says

      I thought about that but it’s an arse to produce n-triples using XSLT. I did consider inventing some XML format (or just using reified triples), e.g. (using [] as WordPress eats angle brackets):
      [rdf:Statement]
      [rdf:subject]http://...[/rdf:subject]
      [rdf:predicate]http://...[/rdf:predicate]
      [rdf:object]http://...[/rdf:object]
      [/rdf:Statement]

      However, that then loses you the handy URI escaping. Converting correct RDF/XML to n-triples is trivial if you install ‘rapper’. Also, using triples means you have to output every URI for every triple, which makes the XSLT complex and ugly.

      The whole point of using XSLT is that it is pretty cross-platform. You could easily take the configuration files from this system and use them in Java, .NET or whatever.

      … | rapper -o ntriples - http://example.com/

      Works fine!

      More to follow on this soon, much progress has been made!

  2. Jakob says

    Well, the cross-platform argument always depends on the platforms you want to cross 😉 You could also argue for C, Java, JavaScript etc. depending on the exact use cases. Many users already give up if you confront them with concepts like “installation” or “configuration”, so a web service to upload data may be easier?

  3. Alexander Dutton says

    There’s TriX, too: http://en.wikipedia.org/wiki/TriX_%28syntax%29 .

    I have no idea what support for it is like.


