Using SPARQL to help with the next reorganisation

If there’s one thing that really causes extra work for webmasters and database admins, it’s a university reorganisation… and the current one at Southampton is a doozy! We are basically restructuring everything: academic and professional services. Most of this isn’t actually a problem for the ECS web team, but ECS is now part of a faculty (along with what was the School of Physics and the Optical Research Centre).

One part of the reorganisation is to reduce the number of research groups in the faculty to a manageable number. Thankfully, the decisions about all of this happen well above our paygrade, but it does mean that all our research groups will be changing. Merging, splitting, renaming.

Our current list looks like this:

Some of these have very custom websites, but most have a very standard basic architecture.

The new world order will see these groups about halved, so that’s a pretty major change! In all the time we’ve had group websites, we’ve only ever had one group change at a time.

Architecture of a Research Group Website

The current pages are built from our SQL database, which already contains the required info. PHP libraries abstract much of the detail. Research group webmasters may edit the PHP directly or just edit the content in pages via a web interface (skill levels and available time vary wildly). The current setup adds a slightly annoying requirement that the webserver serving the group websites must be able to connect to our core database server (not ideal, security-wise). Also, we rely on the PHP libraries to manage what data is shown. A rogue member of staff (or, more realistically, a postgrad who quickly fixes something on the site without understanding the bigger picture) could easily expose information on staff and students who’ve not given permission to appear on the website. We’ve never had any really serious incidents, but the setup could be better.

Publications info is not via SQL but grabbed via HTTP requests to

So anyway, the layout of a research group site is generally something like this. I’m using IAM as an example, but many of the groups are very similar.

  • Homepage – some plain text about the group with dynamic content showing news, recent publications, etc.
    • About Group X – plain HTML
    • News – News pages with content pulled from main ECS news database, but only items tagged as about Group X
    • Research Themes – A list of the research themes selected by this group.
      • Theme Page – A dynamically generated page for each research theme. The content comes from the local CMS, but the list of projects comes from the SQL.
    • Current Projects – A list of all current projects in the group, with info such as funders, dates etc.
      • Project Page – A dynamically generated page for each project in the projects DB
    • Seminars – This section is also entirely built from the SQL database.
    • Publications – These pages are built almost entirely by data grabbed via HTTP from the EPrints server.
    • People – Various lists of members of the group, filtered in different ways, but never showing the public anyone who didn’t give permission to appear in the online phonebook.
    • Join – Plain HTML page
    • Contact – Also a plain HTML page
    • Intranet – beyond scope for today.

We like this design as almost all the content comes from databases which are used to build other sites and also perform other functions for ECS. (I nearly wrote “the School”, but ECS isn’t a school anymore; we’re a conceptual slice of the new Faculty.)

Mapping our existing records to new groups

This is really the painful bit. All our databases need to be updated.

People: Our People database can cope with people having a primary group and being added to other groups too, so we could create the new groups and transition slowly. Mapping the people is probably going to be a by-hand exercise.

Groups: The list of groups in the database can just grow, and eventually some groups get flagged as ‘deprecated’ but that won’t break much.

Seminars: This is an interesting one, as we’ve got seminar series associated with groups. I’m tempted to suggest that we rename each of these to the most appropriate group, merging two series in some cases, but not lose too much sleep over it.

Projects: Ended projects can remain attached to their old groups. To sort out the current projects, I guess we’ll need to produce a page which lists all projects still associated with a deprecated group and get staff to sort it slowly. Any member of ECS staff can edit any project record, wiki style, so that’s not impractical.
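The "projects still associated with a deprecated group" page boils down to a simple filter. Here is a minimal sketch in Python, assuming made-up project records with `group` and `end` fields and a hand-maintained set of deprecated group codes. None of these names come from our real schema; they are purely illustrative.

```python
from datetime import date

# Hypothetical set of group codes flagged as deprecated (illustrative only).
DEPRECATED_GROUPS = {"OLD1", "OLD2"}

# Hypothetical project records; the real projects DB schema will differ.
PROJECTS = [
    {"title": "Project A", "group": "OLD1", "end": date(2009, 6, 30)},
    {"title": "Project B", "group": "OLD1", "end": date(2012, 1, 31)},
    {"title": "Project C", "group": "NEW1", "end": date(2012, 1, 31)},
    {"title": "Project D", "group": "OLD2", "end": None},  # no end date set
]


def projects_needing_remap(projects, deprecated, today):
    """Return current projects still attached to a deprecated group.

    Ended projects are skipped, since they can stay with their old group.
    A project with no end date is treated as current.
    """
    return [
        p for p in projects
        if p["group"] in deprecated
        and (p["end"] is None or p["end"] >= today)
    ]


for project in projects_needing_remap(PROJECTS, DEPRECATED_GROUPS, date(2011, 1, 1)):
    print(project["title"], "is still attached to", project["group"])
```

Passing `today` in as a parameter, rather than reading the clock inside the function, keeps the filter easy to check against known data.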

Themes: These probably need rethinking by every group, or maybe even retiring temporarily or permanently. Themes are associated with research groups and projects.

News: We’ll just have to go and retag the last 6 months or so of news to add the new groups, just so they have some news to start with. With only 5 groups to choose from it shouldn’t be that painful and we can always link them to multiple groups if in doubt.

Publications: This one is an utter nightmare! Research papers are directly linked to research groups in addition to being linked to the authors (who are members of groups). This way, if someone changes group, their papers do not follow them. Right now I’m tempted to either (a) not have publications on the new research group sites or (b) just list them on the assumption that the papers of a group are the publications of its members. (b) causes problems, as our normal staff database only lists current members, and just because Professor Awesome retired, it doesn’t mean that the group won’t want his papers to linger long after he’s gone to the great conference dinner in the sky. I think the best thing to do here is to add the newly mapped research groups of the authors to each paper, and tweak anything later if people don’t agree.
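That last option could be sketched roughly like this: for each paper, look up each author's new group and add those groups to the paper, keeping the old links. The identifiers and record layout below are hypothetical. The one deliberate assumption is that the author-to-group mapping is kept separately from the staff database, so retired authors can still be mapped.

```python
# Hypothetical mapping from author ID to their newly assigned group(s).
# Keeping this separate from the staff database means retired authors
# can still be given a group.
AUTHOR_NEW_GROUPS = {
    "alice": {"GROUP_A"},
    "bob": {"GROUP_B"},
    "prof_awesome": {"GROUP_A"},  # retired, but mapped by hand
}


def add_new_groups(paper, author_groups):
    """Return a copy of the paper with its authors' new groups added.

    Existing group links are kept, so nothing is lost if the guess is
    wrong and someone wants to tweak it later. Authors with no mapping
    (for example external co-authors) contribute nothing.
    """
    groups = set(paper["groups"])
    for author in paper["authors"]:
        groups |= author_groups.get(author, set())
    return {**paper, "groups": sorted(groups)}


paper = {
    "title": "An Example Paper",
    "authors": ["alice", "bob", "external_coauthor"],
    "groups": ["OLD1"],
}
print(add_new_groups(paper, AUTHOR_NEW_GROUPS)["groups"])
```

Because the function only ever adds groups and returns a new record, running it twice is harmless, and the original record is there to compare against if people disagree with the result.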

Getting ready for the New Group Websites

It seems that there’s some planning going into finding people who can write copy for the new websites, but I suspect the actual database mapping will just be done quietly between the cracks as usual. Ah well. The real thing we need to do is get a nice template ready for the sites, so that the groups can hit the ground running but also customise them, as they’ll each have their own quirks that they will want to show. One of the new groups is going to have a Web focus, so they no doubt will want the moon on a stick in HTML5.

So, here’s the plan; we create a basic PHP website, using a version of the template we’ve been working on which is entirely on-brand in the new University style, but still quite nice to work with and standards compliant etc. But here’s the funky bit…

We use the ECS SPARQL endpoint, which we recently launched, to build the site. (Almost) all the information we need to build a research group website is now available from our SPARQL endpoint, which is a way to query all our public data.
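To give a flavour of what querying a SPARQL endpoint over plain HTTP involves, here is a minimal Python sketch that builds (but does not send) a request for the members of a group. The endpoint URL and group URI are placeholders, and modelling membership with FOAF's `foaf:member` and `foaf:name` is an assumption for illustration; the real ECS data may be shaped differently.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint URL, not the real ECS endpoint address.
ENDPOINT = "https://sparql.example.org/sparql"

# FOAF is a real vocabulary, but whether the ECS data models group
# membership this way is an assumption made for illustration.
QUERY_TEMPLATE = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE {
  <%s> foaf:member ?person .
  ?person foaf:name ?name .
}
ORDER BY ?name
"""


def build_request(endpoint, group_uri):
    """Build (but do not send) an HTTP GET request for a group's members.

    group_uri is expected to come from our own site config, not from
    user input, since it is substituted straight into the query.
    """
    query = QUERY_TEMPLATE % group_uri
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, "http://example.org/group/x")
print(request.get_method(), request.full_url[:60])
```

Passing the resulting object to `urllib.request.urlopen` would actually run the query; the sketch stops short of that so it works without a live endpoint.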

Pro: Demonstrating new technology. ECS is very involved with open, linked data and it’s a good move to use it to solve our own problems.

Pro: Does not require special firewall rules. It just runs over HTTP so the website does not have to be on a webserver with special firewall rules, or even inside ECS. Ideally, it does need to have a pretty quick network connection to the SPARQL server, but no more than with an SQL server.

Con: Extra point of failure. The old way was: website generated from database. This way, the SPARQL data is built from the database and the website is built from the SPARQL, so that adds one extra point of failure for these sites. However, the risks are reasonable. An outage on a research group website is annoying, but less serious than one on a site providing a service, or a site for an upcoming or current event such as a conference.

Con: Untried technology. Stuff that, we’re a university; it’s our job to try new things. If it all goes horribly wrong we just fall back to the old system, so that’s not a good excuse not to try.

Con: Can’t show extra information to internal people: The SPARQL endpoint only knows about public things. The current pages show all staff (even those who don’t want to appear to the public) if you view them from within the university IP range. This is actually kinda a bad idea: as discussed in an earlier article, if you trust an IP range with random research webservers in it, then it’s pretty likely someone will accidentally build some kind of proxy which lets Google see every page that proxy can see. Which leads neatly into the next pro…

Pro: Bad Group Webmaster code can’t expose confidential data: Because the data source just doesn’t have any confidential info in it, there’s no risk of anything leaking, no matter what wacky queries they write.

So there’s plenty of pros and cons, but as a leading webby research type place we should really be trying this stuff. Our next step is to start to build a generic group website on top of the SPARQL and see how many issues we run into.
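A first building block for such a generic site might be turning query results into page content. The sketch below parses a hand-written sample in the standard SPARQL JSON results layout and renders it as an HTML list. The variable names `name` and `page`, and the sample people, are invented for illustration.

```python
import json
from html import escape

# A hand-written sample in the standard SPARQL JSON results layout,
# standing in for a real response from the endpoint.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["name", "page"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Ada Example"},
     "page": {"type": "uri", "value": "http://example.org/people/ada"}},
    {"name": {"type": "literal", "value": "Bob <Sample>"}}
  ]}
}
"""


def rows_from_results(text):
    """Flatten SPARQL JSON results into plain dicts of variable -> value.

    Variables that are unbound in a row are simply missing from that dict.
    """
    data = json.loads(text)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


def people_list_html(rows):
    """Render rows as an HTML list, escaping every value from the data."""
    items = []
    for row in rows:
        name = escape(row["name"])
        if "page" in row:
            name = '<a href="%s">%s</a>' % (escape(row["page"]), name)
        items.append("<li>%s</li>" % name)
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(people_list_html(rows_from_results(SAMPLE_RESPONSE)))
```

Even though the data source is public, values are still escaped before going into the page, since a name containing markup characters would otherwise break the HTML.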

Posted in Best Practice, PHP, RDF, SPARQL, web management.

2 Responses

  1. Neil Crookes says

    Have you seen how the BBC music website is built? SPARQL and XSLT, it’s pretty sweet.

  2. Christopher Gutteridge says

    That’s all very well, but most people can’t maintain XSLT. HTML+CSS is common, PHP common enough, but XSLT is a tad arcane for a research group webmaster.
