

Nobody needs a 303

Ian Davis (@iand) has just written this rather challenging blog post about the future of the 303 redirect. He’s onto something, but the idea needs work and I have an idea…

Background

A key point of the semantic web is that you can use URIs (which look exactly like URLs) to represent concepts which are not resolvable into a sequence of 1’s & 0’s. A URI represents a single thing, be it an HTML document about Rice Pudding, an RDF document about Rice Pudding, or the concept “Rice Pudding” itself.

Concept: http://dbpedia.org/resource/Rice_Pudding
RDF: http://dbpedia.org/data/Rice_Pudding.xml
HTML: http://dbpedia.org/page/Rice_Pudding

If you use an HTTP request to ask for the concept, it can’t serve you rice pudding over a TCP/IP stream, so rather than tell you “200 OK” (and give you pudding) or “404 Not Found” it tells you “303 See Other” and gives a redirection to the RDF document. If it’s being extra clever, it listens to what format your client prefers (your HTTP client expresses a preference when it makes a request) and redirects you to a URL with the most palatable data format for you.

(Side note, in many ways HTTP response 418 might make more sense in this case if there was no document available).

To watch this in action try (on a Linux command line, or Terminal on OSX):

curl -v http://dbpedia.org/resource/Rice_Pudding

then

curl -v -H 'Accept: application/rdf+xml' http://dbpedia.org/resource/Rice_Pudding
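The decision the server is making in those two curl calls can be sketched in a few lines. This is a minimal sketch, not DBpedia’s actual implementation; the path layout is borrowed from the Rice Pudding example above, and the function name is my own invention.

```python
# A minimal sketch of the 303 decision, assuming the /resource, /data and
# /page layout shown above. A concept URI can never be answered with a 200,
# so the server picks a document URL to redirect to based on the client's
# Accept header.

REPRESENTATIONS = {
    "application/rdf+xml": "/data/{name}.xml",
    "text/html": "/page/{name}",  # default for ordinary browsers
}

def respond(path, accept="text/html"):
    """Return (status, location) for a request to this hypothetical server."""
    if not path.startswith("/resource/"):
        return (200, None)  # an ordinary document: just serve it
    name = path[len("/resource/"):]
    template = REPRESENTATIONS.get(accept, REPRESENTATIONS["text/html"])
    return (303, template.format(name=name))

print(respond("/resource/Rice_Pudding"))
# (303, '/page/Rice_Pudding')
print(respond("/resource/Rice_Pudding", accept="application/rdf+xml"))
# (303, '/data/Rice_Pudding.xml')
```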

The problem is that this is a pain to configure on a webserver, and makes things complicated in general. Also, when you ask a person “what’s your URI?” they stare at you blankly. It’s non-trivial to get URIs even out of linked data experts; if we want Linked Data to take off, it must be achievable by people who don’t really care.

Enter Ian Davis

Ian has just written this blog post: http://iand.posterous.com/is-303-really-necessary. I really want to disagree, just to make the Fatboy Slim reference, but I think he’s onto something. He is, if I’ve understood correctly, suggesting that when you resolve a URI you should expect to get a “200 OK” and a document about that subject. This does make things simpler, but means that the URI for a thing is now different from the URL of the document about it.

It’s going in the right direction, and really helps solve the problem of how to ask a layman for a URI. I’ve some ideas of how to make it work; either or both could be used.

Making it clear what’s going on in the HTTP response

Add a new HTTP return code “208 Metadata” which indicates what you are getting is data about the resource you requested rather than the resource itself. This could also be achieved by putting specific triples in the returned document, but this feels much cleaner. However it still has the issue of requiring special server configuration.

The thing Ian is going for (correct me if I’m wrong) is to allow someone to just place an RDF document in a directory, have it served over the web, and be done. That’s OKish, but to make Apache serve it with the correct mimetype it will need a ‘.rdf’ suffix, which means making a URI for rice pudding like http://data.totl.net/puddings/ricepudding.rdf which feels wrong to me. You can make it work pretty easily with PHP:

curl -v http://graphite.ecs.soton.ac.uk/experiments/208.php

… but now I’ve got a .php suffix! I can’t find any handy Apache .htaccess config that lets me set the HTTP response code for a directory; the best I can find quickly is

ForceType text/n3
Header set Semantic Metadata 

Which would at least mean you could create a directory of suffixless files, and indicate to a semantic-header-aware tool that this was not the requested resource, but rather metadata about it.

curl -v http://graphite.ecs.soton.ac.uk/experiments/setmime/ricePudding

Extending the URI syntax to indicate “Subject of URI”

I really quite like this one; which is to add something to the way you write the URIs. Put a symbol, let’s say “!”, at the start of the URI to indicate it represents the subject of the document at the given URL. This feels a little like the use of & in C code.

<!http://users.ecs.soton.ac.uk/cjg/> foaf:name "Christopher Gutteridge" .

<!http://users.ecs.soton.ac.uk/cjg/> foaf:homepage <http://users.ecs.soton.ac.uk/cjg/> .

I really think this could work! Most semantic systems just treat URIs as strings so this in no way breaks their ability to reason and process data. The only time it would matter is when they come to resolve the URI. Resolving the URI would not work for clients that didn’t understand the syntax, but that’s not a big deal, they’ll be easy to fix and just won’t get extra data — their loss.

I’ve done a couple of experiments to see how the ARC2 parser copes with this:

The result isn’t great. It is valid, but it treated the URI as relative to the current document. So we can’t really put anything at the front, but we could put something on the end… What’s a character which is in basic ASCII but explicitly not legal at the end of a URI? (We want one which can never conflict with a real URL, so things like “#topic” or “!” are out as they could be legal URLs.)

% as the suffix?

That’s really really not legal to put by itself in a valid URI as it must be followed by two hex digits. So let’s try using <http://users.ecs.soton.ac.uk/cjg/%> to represent the subject of that page (ie. me)

That works much better, but I can no longer put it into a namespace definition as it’s appended rather than prepended. If a naive client tries to resolve it, it will get back a 400 HTTP response; a smart client knows to strip the trailing %. A nice webserver might add a plugin which spots requests ending in %, and if there is a valid URL without the %, sends a 303 See Other. That would enable most existing RDF libraries to keep working, unless they were super touchy about the URI being valid before requesting it.
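To make that concrete, here is a minimal sketch of the trailing-% convention from both sides: a smart client stripping the suffix, and a server plugin issuing the 303. The function names are my own invention, not an existing API.

```python
# Sketch of the proposed trailing-% convention (hypothetical functions).
# A smart client strips the % before fetching; a helpful server 303s a
# %-suffixed request to the real document, as suggested above.

def subject_uri_to_url(uri):
    """A smart client: strip the trailing % to get the fetchable URL."""
    return uri[:-1] if uri.endswith("%") else uri

def server_response(request_path, known_urls):
    """A server plugin: 303 a %-suffixed request to the real document."""
    if request_path.endswith("%"):
        document = request_path[:-1]
        if document in known_urls:
            return (303, document)  # keeps naive RDF libraries working
        return (400, None)          # a bare % is not a legal URL
    if request_path in known_urls:
        return (200, request_path)
    return (404, None)

print(subject_uri_to_url("http://users.ecs.soton.ac.uk/cjg/%"))
# http://users.ecs.soton.ac.uk/cjg/
```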

One thing that doesn’t work is using the % suffix in predicates in RDF+XML as that requires you to write <test:foo%>Testing</test:foo%> which is not valid XML. You also can’t use it in the shortcut for class names, eg. <test:Bar% rdf:about="…"> but that’s not a problem as you just describe it using <rdf:type …>.

In this way, a URI for the Web Science Trust is just <http://www.webscience.org/%>

Using ‘#’ as a suffix?

One other option would be to describe a URI ending in # as referring to the subject of the document without the ‘#’. This has some big advantages over using ‘%’ as it is a legal URI, still, and like any # URI, it will resolve to the source document.

I know that foo#bar indicates a sub-part of foo; in XML/HTML it’s the element with id=’bar’. However, what does foo# mean? The element with an empty id? Or can we safely set a standard that this is the _subject_ of the URL without the #?

You still won’t be able to use it in RDF/XML predicates as <foo:bar#> isn’t legal.

Using this idea, the URI for the Web Science Trust is <http://www.webscience.org/#> which is rather elegant.

UPDATE: Dammit, Steve Harris has pointed out that URIs of the format <http://example.com/foo#> are used to identify namespaces. Dang.

Posted in RDF.


What is our URI?

Canonical URI is already a bit of a loaded term, but what I really mean is what URI should I use to refer to Southampton University when writing linked data about it. Or, for that matter, how about The WebScience Trust?

Here’s the rule I think we should follow:

  1. If the organisation who grants your charter assigns you a URI then use that.
  2. Failing that, mint one for yourself in your own domain.

I don’t think it makes much sense to use your dbpedia URI — they are too volatile.

In both cases you should mint your own URIs for any entities which are within your scope, such as sub-organisations.

The problem is that (1) won’t resolve to your open data about your organisation, but rather to your parent organisation’s data about your organisation. In this case I suggest the following pattern is added to your ‘boilerplate’ which you add to most or all RDF documents:

<http://data.example.ac.uk/docs/exampleacuk.rdf> 
  foaf:primaryTopic 
  <http://education.data.gov.uk/id/school/666666> .
<http://data.example.ac.uk/docs/exampleacuk.rdf> 
  rdf:type 
  oocore:OpenOrgDocument .

What’s oocore?

oocore is the (still in development) core namespace for a bunch of namespaces for tools to help “Open Organisations” provide useful information about themselves and make it discoverable. The focus is not on perfect models (beware the Modeller) but rather on making the data easy to use and reuse.

The idea of an OpenOrgDocument is that it would obey certain conventions, and would be a little like a foaf:personalProfileDocument for an organisation. It will have some strong guidelines on what is useful to include, and link to additional OpenOrg documents for common facets of organisational data, such as buildings and amenities, organisational structure, news, publications, membership, financial information etc.

What if our parent organisation creates a URI for us in the future?

Well, that’s an issue. You’ll have the choice of using that in future, or just adding a sameAs link. It’s a pain, but I suspect most places will just continue to use the URI they picked early. The key thing is not to mint a URI if there’s already one out there.

Discovering the OpenOrg Document

If you request “/” from the organisation’s main domain, eg. www.example.ac.uk, with an HTTP Accept header that prefers ‘application/rdf+xml’, then you should be redirected to the open org document. In addition, the homepage should have a

<link rel="alternate" type="application/rdf+xml" href="..path to openorg document.." />

This will mean that people can discover standard data about your organisation without jumping through any complicated hoops, or having to try 10 different fiddly approaches.
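For instance, a consumer could pull the OpenOrg document location out of a homepage with nothing more than an HTML parser. A sketch using Python’s standard library (the homepage markup here is made up):

```python
from html.parser import HTMLParser

class AlternateLinkFinder(HTMLParser):
    """Find the <link rel="alternate" type="application/rdf+xml"> href."""
    def __init__(self):
        super().__init__()
        self.rdf_href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") == "application/rdf+xml"):
            self.rdf_href = a.get("href")

# A made-up homepage carrying the link tag described above.
homepage = ('<html><head><title>Example University</title>'
            '<link rel="alternate" type="application/rdf+xml"'
            ' href="/openorg.rdf" /></head><body></body></html>')
finder = AlternateLinkFinder()
finder.feed(homepage)
print(finder.rdf_href)  # /openorg.rdf
```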

What should go in an OpenOrg Document?

Well, we’ll work that out as we go, but I’d go with some of:

  • Basic name of the organisation
  • contact details; main email, main homepage, main phone number
  • based_near to the nearest population centre
  • based_near also to a geo:point for simple navigation purposes.
  • links to additional openorg documents which (and this is a neat bit) can be the current document. If it’s a small organisation, you might as well put all the data in one big document which is rdf:type several types of openorg document.
  • links to additional datasets, with enough data to let a system know if it’s helpful to resolve the URI or not.

That last bit is important. By saying that a URL is of type “OpenOrgBuildingsDocument” that tells a consumer that the resulting data will not only be in RDF but will follow a known pattern, which should help it provide a user interface to it, especially for mobile applications.
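That consumer behaviour can be sketched as a simple dispatch on the declared pattern classes; the class names and handlers here are hypothetical, just to show the shape:

```python
# Hypothetical consumer: pick behaviour from the document's declared
# pattern class, falling back to a generic browser for unknown patterns.

def show_buildings(doc_url):
    return "render building map for " + doc_url

def show_finances(doc_url):
    return "render finance tables for " + doc_url

HANDLERS = {
    "OpenOrgBuildingsDocument": show_buildings,
    "OpenOrgFinanceDocument": show_finances,
}

def consume(doc_url, rdf_types):
    """Dispatch on the first pattern class the consumer understands."""
    for t in rdf_types:
        if t in HANDLERS:
            return HANDLERS[t](doc_url)
    return "generic RDF browser for " + doc_url

print(consume("http://data.example.ac.uk/buildings.rdf",
              ["OpenOrgBuildingsDocument"]))
# render building map for http://data.example.ac.uk/buildings.rdf
```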

Posted in Uncategorized.


Local or Canonical URIs?

Here’s a little question; when should you ‘mint’ a URI for something and when should you just use a canonical one?

The idea of a canonical URI is that it is the one defined by the agent with authority for that resource. Very obvious in terms of organisations, people, journals and events. Less obvious for places. Really not obvious at all for income tax or rice pudding.

The University of Southampton does not yet have a formal URI for itself, but let’s make one up for this example: http://data.soton.ac.uk/id/uos

This question applies to any RDF dataset whose resources have resolvable URIs, but I’ll use EPrints as the example as I know it best.

In an EPrints repository, a passing reference to an organisation (event, person, place..) has a URI generated by hashing the data about it. For example; in this RDF record about a thesis it has generated such a URI for the University of Southampton, the organisation the thesis was submitted to. This URI is:

… which if resolved will tell you every fact the EPrints server knows about the University of Southampton.

So here’s my dilemma

Assume that I have a reliable way to know that http://eprints.ecs.soton.ac.uk/id/org/ext-6aadb868f6af7a581b985086216d8d28 and http://data.soton.ac.uk/id/uos are the same thing. (possibly a human process).

Option One
We add an owl:sameAs predicate to the EPrints RDF to point at the canonical university URI. As we control the university URI too we can make it describe itself as sameAs the eprints URI.

This is currently what we do for people in ECS as we can reliably identify our own staff. Have a look again at that RDF record and you’ll see the sameAs link right at the bottom.

Pros: URI on the eprints server is resolvable to get all information it holds about that resource.
Cons: Creates a new URI for a resource with a known existing URI. Requires library staff to map onto other URIs inside (and maybe outside) their university.

Option Two
Rewrite the URIs in eprints so that it replaces certain ext- style URIs with their known canonical versions, so that the RDF always refers to the preferred URI. The ext- style URI is still resolvable, and any time a preferred URI is referenced in an RDF document a seeAlso link will be included to the ext- URI, which will redirect to more information on the resource.

In this way an RDF record would look something like this:

<http://eprints.ecs.soton.ac.uk/id/eprints/21600>
    dct:creator <http://id.ecs.soton.ac.uk/person/24320> ;
    dct:issuer <http://data.soton.ac.uk/id/uos> ;
    dct:title "Social Niche Construction: Evolutionary Explanations for Cooperative Group Formation"^^xsd:string .
<http://data.soton.ac.uk/id/uos>
    dct:hasPart epid:org/ext-fc4fc6f35c57c793ab9034d542fa9406 ;
    foaf:name "University of Southampton"^^xsd:string ;
    rdf:type foaf:Organization ;
    rdfs:seeAlso epid:org/ext-6aadb868f6af7a581b985086216d8d28 .
<http://id.ecs.soton.ac.uk/person/24320>
    foaf:name "Simon T. Powers"^^xsd:string ;
    rdf:type foaf:Person ;
    rdfs:seeAlso epid:person/ext-24320 .

Pros: Reduces the proliferation of URIs, and makes the data more truly and usefully Linked.

Cons: Not all resources will have known preferred URIs so they will probably still use the ext- style URIs. This would be inconsistent, but could be mitigated by still including the rdfs:seeAlso predicate on themselves so that every record had the same structure.

Option Three
Use an external service to fight your semantic battles for you. To every entity add a seeAlso to an external service. eg.

epid:org/ext-6aadb868f6af7a581b985086216d8d28
    foaf:name "University of Southampton"^^xsd:string ;
    rdf:type foaf:Organization ;
    rdfs:seeAlso <http://bibresolver.org/?uri=http://eprints.ecs.soton.ac.uk/id/org/ext-6aadb868f6af7a581b985086216d8d28> .

epid:person/ext-24320
    foaf:name "Simon T. Powers"^^xsd:string ;
    rdf:type foaf:Person ;
    rdfs:seeAlso <http://bibresolver.org/?uri=http://eprints.ecs.soton.ac.uk/id/person/ext-24320> ;
    owl:sameAs <http://id.ecs.soton.ac.uk/person/1248> .

The theoretical magicbibservice would return a list of URIs it believed were the same as the one referenced in the link.

Pros: Can be added to the code once and it’ll just get on with it. Can be used in conjunction with options one or two.

Cons: Doesn’t stop proliferation of URIs. Depends on 3rd party for quality and reliability. No such service currently exists.

Conclusion

What EPrints is doing out of the box is OK but the only way to make it ‘linked’ out of the box is option 3, which relies on a service I’ve made up, so not really an option, yet.

I think we should recommend option 2 to be used where possible. It would be pretty easy to configure for a resource type (people, places etc…). You would just override the URI generator and add a seeAlso triple.
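A sketch of what that override might look like, as a plain rewrite over triples. The mapping table stands in for the human process mentioned above, and the function is my own invention, not part of EPrints.

```python
# Sketch of Option Two: swap known ext- URIs for their canonical versions
# on export, and add a seeAlso from each canonical URI back to the local,
# resolvable ext- URI.

CANONICAL = {
    "http://eprints.ecs.soton.ac.uk/id/org/ext-6aadb868f6af7a581b985086216d8d28":
        "http://data.soton.ac.uk/id/uos",
}
EXT = {v: k for k, v in CANONICAL.items()}  # reverse map for seeAlso links

def export_triples(triples):
    out, rewritten = [], set()
    for s, p, o in triples:
        s2, o2 = CANONICAL.get(s, s), CANONICAL.get(o, o)
        if s2 != s:
            rewritten.add(s2)
        if o2 != o:
            rewritten.add(o2)
        out.append((s2, p, o2))
    # every rewritten URI gets a seeAlso back to the resolvable ext- URI
    for uri in sorted(rewritten):
        out.append((uri, "rdfs:seeAlso", EXT[uri]))
    return out

record = [("http://eprints.ecs.soton.ac.uk/id/eprints/21600", "dct:issuer",
           "http://eprints.ecs.soton.ac.uk/id/org/ext-6aadb868f6af7a581b985086216d8d28")]
result = export_triples(record)
for triple in result:
    print(triple)
```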

Anyone got any alternate ideas or comments on how this would improve/harm the usefulness of the data?

Posted in Best Practice, RDF, Repositories.


More on defining patterns in RDF

Having looked a bit more at foaf:PersonalProfileDocument, I am now not sure if the best way to indicate a pattern in RDF is by adding an attribute to a URI or to the document which describes that URI.

The problem I’m trying to solve, by the way, is that we are about to see a proliferation of organisations and events publishing Open Linked Data. This will be far more useful if the organisation can publish a description of what standard datasets it makes available. For each dataset there are two key ways to describe it, both valuable. One is to say what information it contains and the other is the way in which that information is structured.

My initial thought was that this should be done via semantics attached to the URI for an organisation (or other entity). eg.

 <data.example.ac.uk/ExampleUni> rdf:type foo:OpenOrg .

After having slept on it I think the above is probably wrong. It feels wrong. The eprints:Repository example wasn’t too bad, but for other things it feels a bit off attaching publishing-pattern information to the entity. Looking at foaf:PersonalProfileDocument I noticed it’s not a property of the person, but rather of the document. This makes a lot more sense. Having refined the previous idea, I am now imagining something like this next example as what you would reasonably expect from resolving the URI of a large organisation providing Open Linked Data.

<http://data.example.ac.uk/ExampleUni.rdf> rdf:type foo:OpenOrgDocument ;
  foaf:primaryTopic <http://data.example.ac.uk/ExampleUni> .
<http://data.example.ac.uk/ExampleUni> rdf:type foaf:Organization ;
  openorg:hasDataset <http://finance.data.example.ac.uk/dataset> ;
  openorg:hasDataset <http://eprints.example.ac.uk/id/repository> ;
  openorg:hasDataset <http://data.example.ac.uk/location> .
<http://finance.data.example.ac.uk/dataset>
  rdf:type openorg:FinanceDataset ;
  rdfs:label "Example University Open Finances Data" .
<http://finance.data.example.ac.uk/dataset.rdf>
  rdf:type <http://unit4.co.uk/id/AgressoOpenFinanceDocument> ;
  foaf:primaryTopic <http://finance.data.example.ac.uk/dataset> .
<http://eprints.example.ac.uk/id/repository>
  rdf:type <http://eprints.org/ontology/Repository> ;
  rdfs:label "Example University EPrints Repository" .
<http://eprints.example.ac.uk/cgi/export/repository/RDFXML/repository.rdf>
  rdf:type <http://eprints.org/ontology/RepositoryDescriptionDocument> ;
  foaf:primaryTopic <http://eprints.example.ac.uk/id/repository> .
<http://data.example.ac.uk/location>
  rdf:type openorg:LocationDataset ;  
  rdfs:label "Example University Buildings Dataset" .
<http://data.example.ac.uk/location.rdf>
  rdf:type openorg:LocationsDescriptionDocument ;
  foaf:primaryTopic <http://data.example.ac.uk/location> .
...etc...

So here’s an example of why this is useful. If I’m going to visit Example Uni. I want to know where to drive to, and when I get there I want to know which public carpark is best for building X23. Given a tool that understands the “openorg:LocationsDescriptionDocument” and “foo:OpenOrgDocument” patterns, it can confidently find the building with X23 in the name and tell me the lat & long, plus look at the lat & long of all carparks, find the nearest and see if it’s got a postcode listed to give to my satnav. If not, it could get a postcode from some other webservice.

The simplest format for a document is one which includes all the information from the dataset in a single RDF document. Where it gets trickier is when the first document does not contain all the available information, but a standard pattern should at least make that easier. For example, if an org has too many buildings and locations to be listed in a single document then you could search for items which match a parameter.

  http://data.example.ac.uk/location?q=X23

…which would return RDF or HTML depending on the accept header. The key thing is to define some standard useful ways of publishing this stuff so people have some basic starting places.

Posted in RDF.


Using rdf:type to indicate a publishing pattern

I’ve been trying to consider how to make open linked data websites more friendly to consumers of data, with the specific example of data provided by an organisation or by an event. Such data has value as part of the greater scheme, but a key value is going to be for people dealing with specific immediate questions.

It is my strong belief that providing open linked data in standard patterns will make it easier to consume. This will increase the number of consumers and this, in turn, will increase the value of producing open and linked data. Self interest is a much better motivator than altruism!

With this in mind I suggest defining certain RDF classes which indicate that a resource of that class follows a certain pattern.

Aspects of a Pattern

What a pattern consists of can and should vary wildly, but could include:

  • What format data is available if you resolve the URI with different client “accept” headers.
  • What information will be available in the RDF document you get when you resolve the URI, and using what namespaces.
  • How to discover an endpoint to query the data via SPARQL (or OAI-PMH 2.0, or even some REST interface, etc.)
  • How to efficiently download all relevant data pertaining to this thing.
  • What sub patterns apply to URIs referenced in the data available from the main URI.
  • What structure related URIs will take.

Obviously, if rolling your own data from scratch you won’t always fit into a very specific pattern, but you may still be building large parts of it using a standard pattern. For example, foaf:PersonalProfileDocument already does this. It tells you that this document can be resolved as RDF and tells you foaf facts about a person.

An oddity is that owl:sameAs does not transfer a pattern to the same-as URI as it almost certainly will provide data in a different pattern. That’s kinda the point.

EPrints as an Example

An EPrints repository may serve many purposes. The most common is as a repository of the research output of an organisation, but it could just as easily hold teaching and learning materials or a repository of software. The open linked data about each of these can tell us that they are all datasets, which isn’t all that useful. We could add some classes to describe the content of each, for example <http://files.eprints.org/id/repository> rdf:type myns:SoftwareRepository . This is all very well, but it doesn’t help a tool consume the data. To help systems understand how to navigate and consume data in an EPrints repository, the top-level URI is defined as of rdf:type eprints:Repository, which defines the entire way that an EPrints repository publishes open linked data. That way a tool built to work with repositories can see this RDF class and know what to expect. Maybe DSpace would define their own. Then, with some simple tweaks, you could build an application which could work with the majority of repositories and auto-adjust its behaviour to paper over the cracks between them.

See our freshly defined EPrints open linked data ontology — note we use BIBO, Dublin Core and voiD to describe most of the data in the dataset; the EPrints classes and relations are only used to describe the native structure of the data.

Doesn’t this happen already?

Well, maybe unofficially. What I’m keen to do is get people attaching specific pattern classes like these, and keeping them separate from classes which represent what the resource is. It may be that some people would prefer to link these with a different predicate than rdf:type. Maybe implementsPattern?

One reason it hasn’t happened much yet is that there aren’t that many packages which pump out linked data. There are a few plugins for things like WordPress, but they are not yet mainstream. EPrints 3.2.1 automatically supplies linked data in a reasonable pattern with minimal work from the site admin. This means there will be a proliferation of sites offering open data in a very similar structure. A reasonable solution is just to identify that it’s in the pattern as produced by that tool. That’s what we’re doing at EPrints. A better long-term solution will be when people start defining generic patterns.

Where this starts to get interesting

So, I’ve been considering how we might deal with the complexity of open linked data for an entire university (I’m still growing that map, contributions welcome). At a basic level, what we’ve discussed is having a top level index of other datasets. What I would like to aim for is that people could write tools which can find the primary URI for an organisation and find what elements of open data are available, or if the standard element they want is auto-discoverable.

Let me walk you through a scenario. I’m going to a seminar in Building X1 of Example University. I’ve already installed a phone app which helps me navigate open linked data for organisations. The app understands many standard patterns of data organisations provide.

It is also preloaded with the fact that <http://data.ac.uk/resources/universities.rdf> can be resolved to get a list of basic information about UK universities. In this case we want the primary, resolvable URI for Example University, but it also contains a list of homepages, .ac.uk domains, foaf:located_near links to the nearest major city, the data.gov.uk URI for each university and a few more handy facts to hold at the .ac.uk level. It also describes them all as of rdf:type jisc:University, and many as somens:OpenOrgPattern, which indicates we can resolve the URI as RDF and it will tell us some basic facts about the org. More usefully, it will tell us what sub-datasets are available, indicated, where meaningful, by a set of standard RDF classes giving both the semantic meaning of each sub-dataset and what pattern or patterns it is made available as.

Selecting the “UK Universities List” I can easily navigate to Example University. It guessed it from my typing “Exa”. The URI is <http://data.example.ac.uk/id/org/exampleuni>

Now the phone nips off and grabs (or uses a cached copy of) the RDF document describing Example University. This document isn’t too large. It tells us some basic foaf such as the name, homepage and primary phone numbers etc. It also defines a whole bunch of hasDataset relations to a variety of datasets from key parts of the institution. Each of these has an rdfs:label, at least one rdf:type indicating its content and usually one indicating what pattern the dataset implements.

For example;

 <http://dspace.example.ac.uk/#dataset>
   rdf:type <http://dspace.com/ns/DSpaceRepository> ;
   rdf:type myns:ResearchRepository ;
   rdfs:label "Example University Research Repository" .

Maybe there’s more data about voID and licenses which isn’t a required part of the OpenOrgPattern. If so, my phone doesn’t currently understand it, so we ignore it. My phone has discovered a dataset of type myns:OpenBuildingsPattern and is going to follow that. From there it can understand how to find a list of the names of all buildings, and easily find me the lat & long of building X1 and show it on a map. It’s spotted another standard dataset it understands that’s part of the RDF returned by http://data.example.ac.uk/id/org/exampleuni, and that’s myns:PublicTransport, which lists locations of relevant transport nodes such as bus stops, train stations and taxi ranks, and adds the nearest ones to the map it’s showing me, along with nearby public carparks it found in the OpenBuildingsPattern.

To make all this awesomeness happen, all that’s needed is to start converging on some standard patterns and give software clues of how to consume it.

— Christopher Gutteridge

Posted in Uncategorized.


A University Linked Data Diagram

Many of us will have seen the Linking Open Data cloud diagram, which shows which datasets are linked to each other.

This is great for showing the rough state of the web of linked data, and seeing how it changes over time.

As a developer though, it’s of less use. It serves as a guide to which identifiers for a subject might be good to use (e.g. a lot of datasets link to DBpedia, so it might be wise to refer to that rather than creating our own URIs for things), but doesn’t really say how that data is being used.

I’ve been thinking about how to best show this for the growing area of University linked data. As someone working on University linked data, I want to know what other Universities are using and how. I want to know what best practice should be for describing the same concepts.

At this stage, it seems unlikely that many institutions will have data that links to other institutions – most institutions will initially just have (and be making available as linked data I hope!) data about themselves.

It therefore seems more important to me that institutions talk about things in the same way, rather than necessarily talking about the same things.

This basically comes down to using the same ontologies to describe things – if a dozen Universities all describe their people using FOAF for example, that’s a great start. People can start building common tools/applications/aggregators based on this, without having to reinvent the wheel to deal with University X’s custom data structures.

I’m not great at visualisation, but I’ve started putting together a simple diagram to indicate some of the things that some Universities have made available as linked data, and more importantly, the ontologies they’re using:

[Diagram: University Linked Data (view full size)]

As you can see, it’s pretty small/limited at the moment, and the only University data sets I know about are our own, Sheffield’s, and the Open University’s.

Additions welcome! I’ll aim to keep the diagram updated as much as is feasible – any suggestions on a better visualisation for this would also be welcome. I’m hoping once we’ve got a few more institutions on board, it’ll be a useful starting point for University linked data developers to see which vocabularies are most commonly in use.

Dave Challis
dsc@ecs.soton.ac.uk

Posted in Uncategorized.


The Modeller

Ben O’Steen, gentleman library hacker, just sent us this rather awesome portrait of The Modeller.

I’m tempted to upload it to cafepress…

Posted in Uncategorized.


Using RDF data to add value to a page

I am a regular reader of the Blog written by Tony Hirst at the Open University. He’s always looking for existing tools which can solve new problems. I think my thing is looking for little tools which don’t exist but should.

I’ve had an idea for a while that it would be cool to use RDF to add data into a webpage, but there’s no easy way to do it. So what I’ve done is a combination of a web service and a JavaScript library.

Here’s how to use it to take a URI of a location with a geo:lat and geo:long and embed a map in your webpage:

 <script src='injectgeo.js' type='text/javascript' ></script>

 <div id='mymap'></div>
 <script type='text/javascript'>
 injectGeo('http://dbpedia.org/resource/Brading','mymap');
 </script>

It would be only a little extra work to start using linked data goodness in the mix.

Images in Linked Data

One interesting problem with the above page is that the picture of TimBL is bloody huge. We don’t include size information in the FOAF data so it’s not possible to pick which size to show. Last week I said we should aim to always include an rdf:type and one of the normal label attributes (dc:title, foaf:name, rdfs:label etc.). I think I’m going to suggest that images should have a width and height too. Anyone know some good predicates for that?

Posted in Uncategorized.


Changes

This is a week of beginnings and endings. The new freshers are arriving, but we are bidding farewell to two members of the ECS Web Team: Sarah Prendergast and Joe Price. Both have applied for, and been accepted onto, the university voluntary severance scheme.

We’re very sorry to see them go; they are valued members of the team.

Joe is a skilled all round web designer and developer who is going to start his own small business. To anyone considering engaging him, let me say that he has been a pleasure to work with. His designs are always of a high standard and he has shown a level headed pragmatism – balancing innovation with good basic design and use of standards. He finishes what he starts and people are always happy with the result. (I figured it would save me responding to queries for references if he can just link to this blog post *grin*)

Sarah has also been a pleasure to work with, and her work can be seen throughout our publications, posters and websites. As a new mother and an enthusiastic equestrian, she’s sure to find plenty to keep her busy for a while!

Planning for the Future

With the university being restructured there’s bound to be some change, but with Web Science and Linked Data on the ascent, there’s still plenty of interesting stuff to look forward to. What I have been doing for the last month is trying to plan how to manage with such a reduction in resources. That horrible phrase “more with less” is being spoken.

I understand why web design has taken the brunt of our staff reductions. While I am sad and frustrated, I can see that reducing the web team means fewer and less rich websites, but reducing the network, servers or email means things take longer to get fixed when they break. This has a very obvious cost compared to the very subtle cost of confusing and out of date websites.

My job now is to work out where to cut the services we provide. Joe has had some simple but excellent suggestions.

Triage

The key thing is to work out what we currently do in terms of costs and benefits. The only real cost to my team is time; servers and bandwidth are negligible compared to staff time. The place I think we can reduce our effort is in providing bespoke websites for projects and conferences. Historically we’ve done some great work, but we just don’t have the slack to build awesome sites from scratch any more. Joe’s suggestion is to build up some standard site templates and use these to build project & event websites. We’ve already started this on a small scale by creating a standard template for projects, and by building sites with the intent of reusing the templates and back-end design. I think this is a good approach as it reduces variation, not quality. I’d rather have both, but needs must when the devil drives.

Other tasks we may hire students for. There’s some great talent amongst our students, and they work relatively cheaply while gaining experience that looks good on their CVs. They will be inexperienced, but we exist to help make students better, so that’s not a big concern.

Other very time consuming things which Joe and Sarah did for us, such as producing videos, we’ll just have to do less of.

I plan to take a firm line with academics expecting the same bespoke service they’ve got used to. We already work both smart and hard. I’m not going to spend my weekends doing work unless it’s the bits I enjoy. What does make me sad is seeing an academic on the pay grade above me go and buy a book on PHP. His time costs the university more than mine.

Austerely Awesome

At the past two Institutional Web Managers Workshops we have all talked about how big cuts to university web teams were on the way. For my team they are now here. If it were just us I would be more unhappy than I am; there’s some comfort in knowing we did nothing wrong. We’ll not be able to do as much now, but we’ll still run the ECS web systems to the best of our ability. The best of our ability is pretty darn awesome. Our level of awesomeness will remain high; just the quantity will have to reduce a little.

Posted in Uncategorized.


Don’t just assume people will resolve URIs

Some RDF documents are very large and not intended for immediate consumption, but rather for loading into an endpoint or for complex analysis.

However, others are intended for more-or-less stand-alone use. Examples would include a FOAF profile or the RDF event programme that we’ve been working on. Having worked with a few of these as a developer, I suggest that it is good practice to include an rdf:type and a suitable label (dc:title, foaf:name, rdfs:label, skos:prefLabel) for the things they describe. This makes working with them much, much easier.

For example if you include:

example:myphoto foaf:depicts <http://dbpedia.org/resource/Andrew_Eldritch> .

That’s all well and good, but I can’t actually render that into, say, a caption without at least knowing…

<http://dbpedia.org/resource/Andrew_Eldritch> foaf:name "Andrew Eldritch" .

Resolving secondary URIs from a document is expensive if done on the fly, so any reduction is very helpful. If I’m trying to, say, index-by-Person the photos described in your photo collection RDF, then knowing…

<http://dbpedia.org/resource/Andrew_Eldritch> rdf:type foaf:Person .

…would also help a hacker out.
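To sketch why those extra triples help, here is roughly what an index-by-Person pass looks like when the types and names are already in the document, so no secondary URI needs resolving. The array-of-triples format is my own assumption for illustration:

```javascript
var FOAF = 'http://xmlns.com/foaf/0.1/';
var RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

// Group photo URIs by the name of each depicted foaf:Person, using
// only triples already present in the document.
function indexByPerson(triples) {
  var names = {}, isPerson = {}, index = {};
  triples.forEach(function (t) { // t = [subject, predicate, object]
    if (t[1] === RDF_TYPE && t[2] === FOAF + 'Person') { isPerson[t[0]] = true; }
    if (t[1] === FOAF + 'name') { names[t[0]] = t[2]; }
  });
  triples.forEach(function (t) {
    if (t[1] === FOAF + 'depicts' && isPerson[t[2]]) {
      var name = names[t[2]] || t[2]; // fall back to the raw URI
      (index[name] = index[name] || []).push(t[0]);
    }
  });
  return index;
}

var idx = indexByPerson([
  ['http://example.org/photo1', FOAF + 'depicts', 'http://dbpedia.org/resource/Andrew_Eldritch'],
  ['http://dbpedia.org/resource/Andrew_Eldritch', RDF_TYPE, FOAF + 'Person'],
  ['http://dbpedia.org/resource/Andrew_Eldritch', FOAF + 'name', 'Andrew Eldritch']
]);
```

Without the rdf:type and foaf:name triples, the same pass would have to resolve each person URI over HTTP before it could build the index.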

The new versions of the Graphite RDF Browser and SPARQL Browser both show this information if it’s available, and it makes things much easier to work with. For example, the Data.gov.uk schools data is pretty good at giving a label to everything and a type to most things.

The SPARQL Browser attempts to get labels, using the following ugly but surprisingly fast query:

SELECT DISTINCT ?s ?p ?o WHERE {
{ <http://id.ecs.soton.ac.uk/person/1248> ?x ?s . ?s ?p ?o . ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o .  }  UNION
{ <http://id.ecs.soton.ac.uk/person/1248> ?x ?s . ?s ?p ?o . ?s <http://purl.org/dc/terms/title> ?o .  }  UNION
{ <http://id.ecs.soton.ac.uk/person/1248> ?x ?s . ?s ?p ?o . ?s <http://purl.org/dc/elements/1.1/title> ?o .  }  UNION
{ <http://id.ecs.soton.ac.uk/person/1248> ?x ?s . ?s ?p ?o . ?s <http://xmlns.com/foaf/0.1/name> ?o .  }  UNION
{ ?s ?x <http://id.ecs.soton.ac.uk/person/1248> . ?s ?p ?o . ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o .  }  UNION
{ ?s ?x <http://id.ecs.soton.ac.uk/person/1248> . ?s ?p ?o . ?s <http://purl.org/dc/terms/title> ?o .  }  UNION
{ ?s ?x <http://id.ecs.soton.ac.uk/person/1248> . ?s ?p ?o . ?s <http://purl.org/dc/elements/1.1/title> ?o .  }  UNION
{ ?s ?x <http://id.ecs.soton.ac.uk/person/1248> . ?s ?p ?o . ?s <http://xmlns.com/foaf/0.1/name> ?o .  }
}

And then a similar one for rdf:type. Both are, of course, subject to the max-results limit of the SPARQL endpoint. Having been an SQL programmer in the past, this limit bites me again and again.
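If your endpoint supports SPARQL 1.1, the same labels can be fetched rather more compactly with a VALUES clause in place of the eight UNION branches. This is a sketch I haven’t benchmarked against this endpoint, so the “surprisingly fast” claim may or may not carry over:

```sparql
SELECT DISTINCT ?s ?p ?o WHERE {
  { <http://id.ecs.soton.ac.uk/person/1248> ?x ?s . }
  UNION
  { ?s ?x <http://id.ecs.soton.ac.uk/person/1248> . }
  VALUES ?p {
    <http://www.w3.org/2000/01/rdf-schema#label>
    <http://purl.org/dc/terms/title>
    <http://purl.org/dc/elements/1.1/title>
    <http://xmlns.com/foaf/0.1/name>
  }
  ?s ?p ?o .
}
```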

Posted in Best Practice, Graphite, RDF, SPARQL.