I’ve been considering how to make open linked data websites more friendly to consumers of data, with the specific example of data provided by an organisation or by an event. Such data has value as part of the greater scheme, but a key value is going to be for people dealing with specific, immediate questions.
It is my strong belief that providing open linked data in standard patterns will make it easier to consume. This will increase the number of consumers and this, in turn, will increase the value of producing open and linked data. Self-interest is a much better motivator than altruism!
With this in mind I suggest defining certain RDF classes which indicate that a resource of that class follows a certain pattern.
Aspects of a Pattern
What a pattern consists of can and should vary wildly, but could include:
- What formats the data is available in when you resolve the URI with different client “Accept” headers.
- What information will be available in the RDF document you get when you resolve the URI, and using what namespaces.
- How to discover an endpoint to query the data via SPARQL (or OAI-PMH, or even some REST interface etc.)
- How to efficiently download all relevant data pertaining to this thing.
- What sub patterns apply to URIs referenced in the data available from the main URI.
- What structure related URIs will take.
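The first aspect in the list above, asking for a format via the “Accept” header, can be sketched in a few lines of Python. This is only an illustration: the short format names and the `build_request` helper are made up for this example, and the request is built but never sent.

```python
from urllib.request import Request

# Hypothetical short names mapped to the MIME type we would ask for.
ACCEPT_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "html": "text/html",
}


def build_request(uri, fmt):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": ACCEPT_TYPES[fmt]})


req = build_request("http://data.example.ac.uk/id/org/exampleuni", "turtle")
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual fetch. A pattern would then state which of these formats a server is expected to honour.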
Obviously, if rolling your own data from scratch you won’t always fit into a very specific pattern, but you may still be building large parts of it using a standard pattern. For example, foaf:PersonalProfileDocument already does this. It tells you that this document can be resolved as RDF and tells you foaf facts about a person.
An oddity is that owl:sameAs does not transfer a pattern to the same-as URI as it almost certainly will provide data in a different pattern. That’s kinda the point.
EPrints as an Example
An EPrints repository may serve many purposes. The most common is as a repository of the research output of an organisation, but it could just as easily hold teaching and learning materials or software. The open linked data about each of these can tell us that they are all datasets, which isn’t all that useful. We could add some classes to describe the content of each, for example <http://files.eprints.org/id/repository> rdf:type myns:SoftwareRepository . This is all very well, but it doesn’t help a tool consume the data.
To help systems understand how to navigate and consume data in an EPrints repository, the top level URI is declared to be of rdf:type eprints:Repository, a class which defines the entire way that an EPrints repository publishes open linked data. That way a tool built to work with repositories can see this RDF class and know what to expect. Maybe DSpace would define their own. With some simple tweaks you could then build an application which works with the majority of repositories and auto-adjusts its behaviour to paper over the cracks between them.
See our freshly defined EPrints open linked data ontology. Note that we use bibo, Dublin Core and VoID to describe most of the data in the dataset; the EPrints classes and relations are only used to describe the native structure of the data.
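As a rough sketch of what a consuming tool might do with such a class: it looks at the rdf:type values of the top level URI and picks a handler for the first pattern it recognises. Triples are plain Python tuples here to keep the example self-contained. The expanded EPrints and myns URIs and the two handler functions are assumptions made for illustration, not the real ontology.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed URIs for illustration; the real ones come from each tool's ontology.
EPRINTS_REPOSITORY = "http://example.org/ns/eprints#Repository"
DSPACE_REPOSITORY = "http://dspace.com/ns/DSpaceRepository"


def harvest_eprints(uri):
    # Placeholder for code that knows the EPrints layout.
    return f"harvest {uri} using the EPrints pattern"


def harvest_dspace(uri):
    # Placeholder for code that knows the DSpace layout.
    return f"harvest {uri} using the DSpace pattern"


HANDLERS = {
    EPRINTS_REPOSITORY: harvest_eprints,
    DSPACE_REPOSITORY: harvest_dspace,
}


def choose_handler(triples, subject):
    """Return the handler for the first pattern class we recognise, else None."""
    for s, p, o in triples:
        if s == subject and p == RDF_TYPE and o in HANDLERS:
            return HANDLERS[o]
    return None


REPO = "http://files.eprints.org/id/repository"
triples = [
    # Says what the repository holds; no handler is registered for it.
    (REPO, RDF_TYPE, "http://example.org/ns/myns#SoftwareRepository"),
    # Says how the data is published; this is what the tool acts on.
    (REPO, RDF_TYPE, EPRINTS_REPOSITORY),
]

handler = choose_handler(triples, REPO)
print(handler(REPO))
```

The content class (SoftwareRepository) is skipped because no handler knows it, while the pattern class selects the behaviour. Supporting another repository package would mean adding one more entry to `HANDLERS`.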
Doesn’t this happen already?
Well, maybe unofficially. What I’m keen to do is get people attaching specific patterns to such classes and keeping them separate from classes which represent what the resource is. It may be that some people would prefer to link these with a different predicate to rdf:type. Maybe implementsPattern?
One reason it hasn’t happened much yet is that there aren’t that many packages which pump out linked data. There are a few plugins for things like WordPress, but they are not yet mainstream. EPrints 3.2.1 automatically supplies linked data in a reasonable pattern with minimal work from the site admin. This means there will be a proliferation of sites offering open data in a very similar structure. A reasonable solution is just to identify that it’s in the pattern as produced by that tool. That’s what we’re doing at EPrints. A better long-term solution will be when people start defining generic patterns.
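To show what that separation could buy a consumer, here is a small sketch that keeps “what the resource is” apart from “which pattern it follows” by using a dedicated predicate. The implementsPattern URI and the other expanded namespaces are invented for this example; no such vocabulary is defined here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical predicate URI, used only for this sketch.
IMPLEMENTS_PATTERN = "http://example.org/ns/pattern#implementsPattern"


def describe(triples, subject):
    """Split statements about a subject into (kinds, patterns)."""
    kinds, patterns = set(), set()
    for s, p, o in triples:
        if s != subject:
            continue
        if p == RDF_TYPE:
            kinds.add(o)        # what the resource is
        elif p == IMPLEMENTS_PATTERN:
            patterns.add(o)     # how its data is published
    return kinds, patterns


REPO = "http://files.eprints.org/id/repository"
triples = [
    (REPO, RDF_TYPE, "http://example.org/ns/myns#SoftwareRepository"),
    (REPO, IMPLEMENTS_PATTERN, "http://example.org/ns/eprints#Repository"),
]

kinds, patterns = describe(triples, REPO)
print(kinds)
print(patterns)
```

With a separate predicate a tool never has to guess which rdf:type values are patterns and which are ordinary classes.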
Where this starts to get interesting
So, I’ve been considering how we might deal with the complexity of open linked data for an entire university (I’m still growing that map, contributions welcome). At a basic level, what we’ve discussed is having a top level index of other datasets. What I would like to aim for is that people could write tools which can find the primary URI for an organisation and find what elements of open data are available, or if the standard element they want is auto-discoverable.
Let me walk you through a scenario. I’m going to a seminar in Building X1 of Example University. I’ve already installed a phone app which helps me navigate open linked data for organisations. The app understands many standard patterns of data that organisations provide.
It is also preloaded with the fact that <http://data.ac.uk/resources/universities.rdf> can be resolved to get a list of basic information about UK universities. In this case we want the primary, resolvable URI for Example University, but the list also contains homepages, .ac.uk domains, a foaf:based_near link to the nearest major city, the data.gov.uk URI for the university and a few more handy facts to hold at the .ac.uk level. It also describes them all as of rdf:type jisc:University and many as somens:OpenOrgPattern, which indicates we can resolve the URI as RDF and it will tell us some basic facts about the org. More usefully, it will tell us what sub-datasets are available, each indicated, where meaningful, by a set of standard RDF classes giving both the semantic meaning of the sub-dataset and the pattern or patterns it is made available in.
Selecting the “UK Universities List”, I can easily navigate to Example University. The app guessed it after I typed “Exa”. The URI is <http://data.example.ac.uk/id/org/exampleuni>
Now the phone nips off and grabs (or uses a cached copy of) the RDF document describing Example University. This document isn’t too large. It tells us some basic foaf such as the name, homepage and primary phone numbers etc. It also defines a whole bunch of hasDataset relations to a variety of datasets from key parts of the institution. Each of these has an rdfs:label, at least one rdf:type indicating its content and usually one indicating what pattern the dataset implements.
For example:
<http://dspace.example.ac.uk/#dataset>
    rdf:type <http://dspace.com/ns/DSpaceRepository> ;
    rdf:type myns:ResearchRepository ;
    rdfs:label "Example University Research Repository" .
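A minimal sketch of how the app might read that document: collect every dataset linked from the organisation, gather its label and types, then keep only the datasets whose types include a pattern the app understands. The triples mirror the example above as plain Python tuples. The expanded hasDataset and myns URIs are assumptions, since no namespace is given for them here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumed expansions, for illustration only.
HAS_DATASET = "http://example.org/ns/openorg#hasDataset"
MYNS = "http://example.org/ns/myns#"

ORG = "http://data.example.ac.uk/id/org/exampleuni"
REPO = "http://dspace.example.ac.uk/#dataset"

TRIPLES = [
    (ORG, HAS_DATASET, REPO),
    (REPO, RDF_TYPE, "http://dspace.com/ns/DSpaceRepository"),
    (REPO, RDF_TYPE, MYNS + "ResearchRepository"),
    (REPO, RDFS_LABEL, "Example University Research Repository"),
]


def index_datasets(triples, org):
    """Map each dataset linked from `org` to its label and set of types."""
    index = {}
    for s, p, o in triples:
        if s == org and p == HAS_DATASET:
            index[o] = {"label": None, "types": set()}
    for s, p, o in triples:
        if s in index:
            if p == RDF_TYPE:
                index[s]["types"].add(o)
            elif p == RDFS_LABEL:
                index[s]["label"] = o
    return index


def understood(index, known_patterns):
    """Keep only datasets that declare at least one pattern we know."""
    return {
        uri: info
        for uri, info in index.items()
        if info["types"] & known_patterns
    }


index = index_datasets(TRIPLES, ORG)
usable = understood(index, {"http://dspace.com/ns/DSpaceRepository"})
for uri, info in usable.items():
    print(uri, info["label"])
```

Anything the app does not recognise simply drops out of `usable`, so unknown datasets cost nothing.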
Maybe there’s more data about VoID and licenses which isn’t a required part of the OpenOrgPattern. If so, my phone doesn’t currently understand it, so it ignores it. My phone has discovered a dataset of type myns:OpenBuildingsPattern and is going to follow that. From there it can understand how to find a list of the names of all buildings, and easily find the lat & long of building X1 and show it on a map. It has spotted another standard dataset it understands in the RDF returned by http://data.example.ac.uk/id/org/exampleuni: myns:PublicTransport, which lists locations of relevant transport nodes such as bus stops, train stations and taxi ranks. It adds the nearest ones to the map it’s showing me, along with nearby public car parks it found in the OpenBuildingsPattern.
To make all this awesomeness happen, all that’s needed is to start converging on some standard patterns and give software clues about how to consume them.
Christopher Gutteridge
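The last step, picking the nearest transport nodes to a building, might look like the sketch below. It uses the haversine great-circle formula, which is my choice for the illustration rather than anything the patterns prescribe. The coordinates and node names are invented sample data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; plenty accurate for walking distances


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest(origin, nodes, count=2):
    """Return the `count` nodes closest to `origin`, nearest first."""
    lat, lon = origin
    return sorted(nodes, key=lambda n: distance_km(lat, lon, n[1], n[2]))[:count]


# Invented sample data: (label, latitude, longitude).
BUILDING_X1 = (50.9370, -1.3960)
TRANSPORT_NODES = [
    ("Train station", 50.9077, -1.4136),
    ("Bus stop A", 50.9372, -1.3958),
    ("Taxi rank", 50.9300, -1.4000),
]

for label, lat, lon in nearest(BUILDING_X1, TRANSPORT_NODES):
    print(label)
```

In the scenario the building position would come from the OpenBuildingsPattern dataset and the nodes from the PublicTransport one. The app only needs to know where in each pattern the lat & long live.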
‘Application profiles’ are one way to express these sorts of patterns.
Ah, I’d heard the phrase “Application Profile” but never twigged that that’s what it meant.
There appears to be a lot of work describing these patterns but I can see nothing about people consuming them.
I’m moving slightly away from the idea of declaring an RDF document to be a specific profile, and more towards using a predicate to indicate that a document (probably RDF) defines the subject using a specific profile (Application Profile).
These great big committees defining standards are all very well, but I plan to build the applications as I go. This makes it easier for people to sanity check their data.
For example, see http://programme.ecs.soton.ac.uk/1.0/
There’s a PHP tool to render it, e.g. see http://www.semhe.org/programme.html