Some academics have urged me to consider using a triple-store as a back-end for some of our websites, as opposed to our normal MySQL approach. I’m not convinced, but it’s an interesting challenge. I started by looking into which common “patterns” we use in SQL and would need to replicate in RDF, or whether we would need to change how we approach problems in the first place.
Our usual MySQL “patterns”
- Create a record from <object>
We create a record, which is in effect a serialisation of an object. Most often it represents a human, an account, an event, an organisation or an article (text + metadata). We use the database to generate a unique key for the item in the current context, generally an integer. In MySQL we use AUTO_INCREMENT for this, but every SQL DB has a variant.
- Delete record with <ID>
- Update record with <ID> to match <object>
- Retrieve record with <ID>
- Find/retrieve records matching <criteria>
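As a rough illustration of the create, retrieve and delete patterns above, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL (SQLite's AUTOINCREMENT plays the role of AUTO_INCREMENT). The `person` table and the function names are assumptions made up for the example, not our real schema.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)


def create_person(name):
    """Create a record and return the integer key the database generated."""
    cur = conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def retrieve_person(person_id):
    """Retrieve the record with the given ID, or None if it does not exist."""
    return conn.execute(
        "SELECT id, name FROM person WHERE id = ?", (person_id,)
    ).fetchone()


def delete_person(person_id):
    """Delete the record with the given ID."""
    conn.execute("DELETE FROM person WHERE id = ?", (person_id,))
    conn.commit()


first = create_person("Alice")
second = create_person("Bob")
```

The point to notice is that the caller never chooses the key: the database hands out the next integer, and that is the part with no obvious equivalent in a triple store.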
Update can reasonably be abstracted to “delete then create” so let’s ignore it.
“Find” and “retrieve” require some new techniques, but are not a big concern.
My current understanding is that when adding a set of triples you can say that they are all part of a “graph” with URI <x>, and later you can remove or replace all triples from graph <x>.
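To make the “replace everything in graph <x>” idea concrete, here is a small sketch that builds a SPARQL 1.1 Update request as a string. It assumes the store accepts SPARQL 1.1 Update, which I have not checked for any particular product. The function name, the graph URI and the example triple are all made up for illustration.

```python
def replace_graph_update(graph_uri, triples):
    """Build a SPARQL 1.1 Update string that replaces a named graph's contents.

    `triples` is a list of (subject, predicate, object) strings that are
    already written in SPARQL term syntax, e.g. "<http://...>" or '"text"'.
    """
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    return (
        # Remove the old triples; SILENT avoids an error if the graph is absent.
        f"DROP SILENT GRAPH <{graph_uri}> ;\n"
        # Add the new triples to the same named graph.
        f"INSERT DATA {{\n  GRAPH <{graph_uri}> {{\n{body}\n  }}\n}}"
    )


# Hypothetical record: one graph per person, named after the person's URI.
update = replace_graph_update(
    "http://webscience.org/person/6",
    [(
        "<http://webscience.org/person/6>",
        "<http://xmlns.com/foaf/0.1/name>",
        '"Alice"',
    )],
)
```

Using one graph per record is what would make “update” the same as “delete then create”: drop the graph, then insert the new serialisation of the object.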
The one thing entirely missing is the ability to generate new integers in a sequence.
I’ve been given two suggested solutions by the experts…
UUID
Suggestion one was to use UUIDs (universally unique identifiers) or hashes. The problem is that I want to use these in URLs and URIs, and I want to use http://webscience.org/person/6.html not http://webscience.org/person/e530d212-0ff1-11df-8660-003048bd79d6.html
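For comparison, this is roughly what the UUID approach looks like, sketched with Python's standard uuid module. The `person_url` helper is an assumption for the example; the URL pattern is the one quoted above.

```python
import uuid


def person_url(identifier):
    """Build a person page URL from an identifier (hypothetical helper)."""
    return f"http://webscience.org/person/{identifier}.html"


# uuid1() is the time-based variety, the same version as the UUID quoted above.
uuid_url = person_url(uuid.uuid1())
int_url = person_url(6)
```

No locking or shared state is needed to generate the UUID, which is its appeal, but the identifier part of the URL is always 36 characters long.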
Flock a file
A second suggestion was to flock a local file containing the next value (lock file, read file, update file, unlock file). This would work, but I want the current position in each sequence to be stored with my other data, and accessed/modified using the same rights as reading/writing the triple store. That’s what I’m used to with MySQL.
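The lock, read, update, unlock cycle might look something like this in Python, using fcntl.flock from the standard library (Unix only). The function name and the one-file-per-sequence layout are assumptions for the sketch.

```python
import fcntl
import os


def next_in_sequence(path):
    """Return the next integer from the counter file at `path`.

    The file holds the next value to hand out; a missing or empty file
    is treated as a sequence starting at 1.
    """
    # Open read/write, creating the file if needed, without truncating it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)        # lock file (waits if already locked)
        try:
            text = f.read().strip()          # read file
            value = int(text) if text else 1
            f.seek(0)
            f.truncate()
            f.write(str(value + 1))          # update file with the next value
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)    # unlock file
    return value
```

Note that flock is advisory: it only protects against other processes that also take the lock, and who may read or write the sequence is decided by file permissions rather than by the database.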
My Idea 1: Sequence Service
My first idea was to create a standalone service which could run on the same server as the triple store, and which you could query, via HTTP or the command line, for a new integer in a sequence. Sequences could be identified via a URI.
http://dbserver:12345/next_in_sequence?seq=http://webscience.org/people_sequence
Which would return “23” then “24” etc. The locking code could be handled in the sequence server, and the assumption would be that only trusted parties could connect (like SPARQL). This service could work by:
- Locking (all requests processed sequentially)
- Querying the triple store for <http://webscience.org/people_sequence> <currentValue> ?value
- Replacing the triple with ?value+1
- Unlocking
- Returning ?value+1
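The steps above can be sketched in Python as follows. This is only an outline of the locking logic: `InMemoryStore` is a made-up stand-in for the triple store (a real version would issue a query and an update instead), the HTTP front end is left out, and starting a new sequence at 1 is my assumption.

```python
import threading

CURRENT_VALUE = "currentValue"  # predicate name taken from the steps above


class InMemoryStore:
    """Hypothetical stand-in for a triple store: (subject, predicate) -> object."""

    def __init__(self):
        self._triples = {}

    def get(self, subject, predicate):
        return self._triples.get((subject, predicate))

    def replace(self, subject, predicate, obj):
        self._triples[(subject, predicate)] = obj


class SequenceService:
    """Hands out integers in sequence, keeping the state in the store."""

    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()

    def next_in_sequence(self, seq_uri):
        with self._lock:  # locking: requests are processed one at a time
            # Query the store for <seq_uri> <currentValue> ?value
            value = self._store.get(seq_uri, CURRENT_VALUE) or 0
            new_value = value + 1
            # Replace the triple with ?value+1
            self._store.replace(seq_uri, CURRENT_VALUE, new_value)
        # The lock is released on leaving the "with" block; return ?value+1
        return new_value


service = SequenceService(InMemoryStore())
seq = "http://webscience.org/people_sequence"
first = service.next_in_sequence(seq)
second = service.next_in_sequence(seq)
```

The lock here only serialises requests inside one process, so this relies on the sequence service being the only thing that ever writes the currentValue triple.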
While this is a bit icky, it does mean that my data remains stored in one place, including the state of each sequence.
What this doesn’t do is provide one access point. All SQL implementations provide a solution for this, and I suspect that, long term, so will triple stores. But I can’t see the purists liking sequence generation going through the same access interface as the data, as it’s clearly a hack.
Non-technical concerns with RDF back-ends
On a non-technical note, I’m also concerned that an RDF+PHP solution is not very maintainable. You can’t easily hire someone with these skills yet.