I’m still looking at the barriers to using an RDF triple store as the back-end for a website. I discussed some of this back in February, but the problems remain unsolved.
Our usual pattern, when designing a website, is to identify the various types of entity that will be described by pages on the site. For an academic site these would be things like people, groups, projects, publications, events and articles. We then create a database table or tables for each of these, plus PHP wrapper functions to get individual records and lists of records, and methods to create and update records of each type. In PHP, we have an object representing the set of items (e.g. Events) and an object representing each item. The SQL is kept abstracted away as much as possible.
The PHP classes which represent an item or a list of items have methods for mapping the data into various formats: a short HTML summary, a full HTML page, RDF, XML, .ics, RSS, Atom and so on. Occasionally some fields may not be shown to the public, for example if we use the same database for some internal administration.
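To make that concrete, here is a minimal sketch of the pattern for events. The class, table and column names are made up for illustration, and the real thing keeps the SQL behind a shared database layer:

```php
<?php
// Minimal sketch of the usual pattern; class, table and column names are
// hypothetical. Assumes a PDO connection to the relational database.

class Event
{
    private $row;

    public function __construct(array $row)
    {
        $this->row = $row;
    }

    // Short HTML summary, e.g. for a listings page.
    public function renderSummary()
    {
        return '<li><a href="/event/' . (int) $this->row['id'] . '">'
             . htmlspecialchars($this->row['title']) . '</a></li>';
    }

    // Full HTML page body.
    public function renderPage()
    {
        return '<h1>' . htmlspecialchars($this->row['title']) . '</h1>'
             . '<p>' . htmlspecialchars($this->row['description']) . '</p>';
    }

    // One of several alternative serialisations (RDF, XML, .ics, RSS, Atom...).
    public function renderIcs()
    {
        return "BEGIN:VEVENT\r\nSUMMARY:" . $this->row['title'] . "\r\nEND:VEVENT\r\n";
    }
}

class Events
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function get($id)
    {
        $stmt = $this->db->prepare('SELECT * FROM events WHERE id = ?');
        $stmt->execute(array($id));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row ? new Event($row) : null;
    }

    public function upcoming()
    {
        $stmt = $this->db->query(
            'SELECT * FROM events WHERE start_date >= CURRENT_DATE ORDER BY start_date');
        $events = array();
        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $events[] = new Event($row);
        }
        return $events;
    }
}
```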
On some sites, we have a table which stores all revisions of each item, and a table which maps each primary_item_id to its current revision_id. Previous versions should never, ever be shown to the public, as they may contain errors or information we actively do not want to be public.
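Roughly, the shape of that is as follows (table and column names hypothetical): the revisions table holds every saved version, the mapping table points each item at the one revision that is live, and the public site only ever reads through that mapping.

```php
<?php
// Sketch only: 'event_revisions' holds every saved version of an item;
// 'event_current' maps each primary_item_id to the single revision_id that
// is live. The public site only reads through this join, so superseded
// revisions are never exposed.

function getPublicEvent(PDO $db, $eventId)
{
    $stmt = $db->prepare(
        'SELECT r.*
           FROM event_current c
           JOIN event_revisions r ON r.revision_id = c.revision_id
          WHERE c.primary_item_id = ?');
    $stmt->execute(array($eventId));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row ? $row : null;
}
```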
What I’m interested in is how normal web developers, rather than researchers, can achieve this.
I am still imagining a system with “classes” of things, like people and events, where the PHP is set up to create/retrieve/update/delete individual “records”, where each triple belongs to exactly one record, and where we have PHP functions which retrieve data from a set of records (abstracting SPARQL instead of SQL).
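A very rough sketch of what that could look like, assuming a SPARQL 1.1 endpoint that accepts queries and updates over HTTP, and assuming each record gets its own named graph so that every triple belongs to exactly one record. The class name, endpoint URLs and graph-per-record scheme are all my assumptions, not a finished design:

```php
<?php
// Rough sketch, not a finished design. Assumes a SPARQL 1.1 endpoint
// (query + update over the standard HTTP protocol) and one named graph
// per record, so every triple belongs to exactly one record.
// Real code would also need to escape/validate anything spliced into SPARQL.

class Records
{
    private $queryEndpoint;
    private $updateEndpoint;

    public function __construct($queryEndpoint, $updateEndpoint)
    {
        $this->queryEndpoint  = $queryEndpoint;   // e.g. http://localhost:3030/site/query
        $this->updateEndpoint = $updateEndpoint;  // e.g. http://localhost:3030/site/update
    }

    // All triples belonging to one record, i.e. the contents of its graph.
    public function retrieve($recordUri)
    {
        $sparql = 'SELECT ?s ?p ?o WHERE { GRAPH <' . $recordUri . '> { ?s ?p ?o } }';
        $ch = curl_init($this->queryEndpoint . '?query=' . urlencode($sparql));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER,
            array('Accept: application/sparql-results+json'));
        $json = curl_exec($ch);
        curl_close($ch);
        return json_decode($json, true);
    }

    // Create or replace a record wholesale: drop its graph, insert new triples.
    public function store($recordUri, $triplesAsTurtle)
    {
        $this->runUpdate(
            'DROP SILENT GRAPH <' . $recordUri . '> ; ' .
            'INSERT DATA { GRAPH <' . $recordUri . '> { ' . $triplesAsTurtle . ' } }');
    }

    public function delete($recordUri)
    {
        $this->runUpdate('DROP SILENT GRAPH <' . $recordUri . '>');
    }

    private function runUpdate($sparql)
    {
        $ch = curl_init($this->updateEndpoint);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('update' => $sparql)));
        curl_exec($ch);
        curl_close($ch);
    }
}
```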
Unanswered questions:
- Internally, do we use our own namespace for the predicates, established namespaces (FOAF, SIOC etc.), or a mixture?
- If we use our own namespace, do we map into common schemas (FOAF, SIOC…) for the public view of the .rdf data? Do we map it on demand, or when a record is updated? Do we expose our internal namespace predicates? I don’t believe just providing a mapping and letting people apply it themselves is a reasonable option. (One possible mapping approach is sketched after this list.)
- Do we expose all of the triples? (What about the ones used for administration? Do we just make sure we have no secrets in the triplestore?) If so, how do we handle revisions? Do we run two triplestores, one for the public and one for admin? Or can triplestore SPARQL endpoints be configured in fancy ways?
- How do we generate brief, unique URIs for items when they are created? In my experience, URIs built from any of the meaningful data in an item (surnames, for example) are a mistake. UUIDs are not an option: they are ugly. http://webscience.org/person/6 is better. My previous post suggested some solutions, and Talis have a weird solution using a pool of available IDs, but I don’t regard it as a solved problem. Then again, there’s no standard solution in SQL databases either. (One unglamorous minting approach is sketched after this list.)
- If we use tools to add value by importing or generating additional triples, how do we manage these? For example, do we need to erase any of them if the records they refer to are removed or updated?
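On the mapping question above: one option I can imagine, though not a settled answer, is to keep our own predicates in the store and run a SPARQL CONSTRUCT whenever the public .rdf view of a record is requested or regenerated. The site: namespace and property names below are made up purely for illustration:

```php
<?php
// Illustration only: map hypothetical internal predicates (site:) to FOAF
// when building the public RDF view of a person record. The result of this
// CONSTRUCT, serialised as RDF/XML or Turtle, is what gets published at the
// record's .rdf URL; the internal predicates never leave the store.

$recordUri = 'http://webscience.org/person/6';

$construct = '
PREFIX site: <http://webscience.org/ns/internal#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
    <' . $recordUri . '> a foaf:Person ;
        foaf:name     ?name ;
        foaf:homepage ?homepage .
}
WHERE {
    GRAPH <' . $recordUri . '> {
        <' . $recordUri . '> site:fullName ?name .
        OPTIONAL { <' . $recordUri . '> site:homepage ?homepage }
    }
}';
```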
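On minting short URIs: one unglamorous possibility, sketched below, is to keep a per-class counter outside the triplestore and build the URI from it, much as an SQL database hands out primary keys. The id_counters table and the function are hypothetical:

```php
<?php
// Sketch: mint http://webscience.org/person/6 style URIs from a per-class
// counter kept in a small SQL table (id_counters: class_name, next_id).
// Assumes a transactional database so two requests cannot get the same id.

function mintUri(PDO $db, $className, $baseUri)
{
    $db->beginTransaction();
    $stmt = $db->prepare(
        'UPDATE id_counters SET next_id = next_id + 1 WHERE class_name = ?');
    $stmt->execute(array($className));
    $stmt = $db->prepare('SELECT next_id FROM id_counters WHERE class_name = ?');
    $stmt->execute(array($className));
    $id = $stmt->fetchColumn();
    $db->commit();

    return $baseUri . '/' . $className . '/' . $id;
}

// e.g. mintUri($db, 'person', 'http://webscience.org')
// might return http://webscience.org/person/7
```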
I think there are probably answers to all of these, but they need to be moved from ‘research’ to ‘development’. I’ll post updates if people solve any of these for me.