

Twilight of the JISC

This year many JISC funded services are “sunsetting”, presumably due to the cuts.

(nb. not everything JISC does is ending, but enough to be pretty brutal)

I have benefitted hugely in my career and projects from the support of many JISC services, events and staff. Dev8D changed my professional life in a really good way.

I offer my sincerest thanks to all JISC funded staff moving on to new jobs this year.

– Christopher Gutteridge

How have JISC staff, services or events helped you?

UPDATE: OSS Watch “changing funding model” http://osswatch.jiscinvolve.org/wp/2013/02/15/a-new-future-for-oss-watch/

Posted in Uncategorized.


Gateway to Research API Hack Days

Ash and I are at the Gateway to Research API hack days. Gateway to Research contains data since 2006 about UK research project funding and related organisations, people and publications.

They use the CERIF data model, which is a bit of a monster. The CERIF people are very nice, but have limited resources to produce the kind of documentation I’ve become accustomed to. I enjoy cursing the darkness, but eventually I feel guilty and decide to light a candle. The CERIF people kept looking sad when I berated them about documentation, and all they really had were the XML from their modelling tool (TOAD) and the XSD document which it spits out. With some Perl & DOM hacking and lots of advice from them, I’ve managed to produce a CERIF description document which I feel is more useful to code hackers who get twitchy when the only documentation is in PDF and the only introductions are in PowerPoint slides. They got me a couple of pints as thanks, which was nice.
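The original hack was Perl & DOM, which isn’t reproduced here, but the general XSD-to-summary idea can be sketched with Python’s standard library. The schema below is a made-up toy, not real CERIF, and the output format is just one plausible choice:

```python
# Sketch: turn a (toy) XSD into a plain-text summary of entities and
# fields, in the spirit of generating readable docs from a schema dump.
# The element names below are illustrative, not the real CERIF schema.
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

toy_xsd = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="cfProject">
    <xs:sequence>
      <xs:element name="cfProjId" type="xs:string"/>
      <xs:element name="cfTitle" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

def summarise(xsd_text):
    """List each complexType and its child elements, one per line."""
    root = ET.fromstring(xsd_text)
    lines = []
    for ctype in root.iter(XS + "complexType"):
        lines.append(ctype.get("name"))
        for el in ctype.iter(XS + "element"):
            lines.append("  %s : %s" % (el.get("name"), el.get("type")))
    return "\n".join(lines)

print(summarise(toy_xsd))
```

Real CERIF would need more cases (attributes, references between types), but the principle is the same: walk the schema tree and emit something a coder can skim.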

GtR API

I’ve also been kicking around the API. The things I noticed were some minor inconsistencies with XML naming, which I’ve pointed out to them. But they are niggles. There are more pressing things, so here’s my wishlist:

  • URI scheme: All (most) stuff in GtR is identified by a UUID, but a documented URI scheme would be very helpful for creating linksets.
  • Data dump location with ALL the data in one big file (maybe put this on bit-torrent)
  • On the individual HTML pages, put <link rel='alternate'> headers and icons linking to the XML and JSON versions of the information.
  • RDF Output (well, I would say that, wouldn’t I)
  • Release the code early and often. The current plan is to release code at the end of the project which means no community input to the code will be possible.
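On the <link rel='alternate'> wish: once those links exist, a consumer can autodiscover the machine-readable versions from the HTML alone. A minimal sketch with Python’s standard-library parser; the page content and URLs here are invented, not from the real GtR service:

```python
# Sketch: find rel="alternate" links in an HTML page -- the
# autodiscovery mechanism the wishlist asks GtR to support.
from html.parser import HTMLParser

class AlternateFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> tags."""
    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "alternate":
            self.alternates.append((a.get("type"), a.get("href")))

# Invented example page for a hypothetical project record.
html = """<html><head>
<link rel="alternate" type="application/json" href="/project/1234.json"/>
<link rel="alternate" type="application/xml" href="/project/1234.xml"/>
</head><body>Project 1234</body></html>"""

finder = AlternateFinder()
finder.feed(html)
print(finder.alternates)
```

With links like these in place, tools don’t need to guess or hard-code the URL pattern for the XML and JSON versions.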

Posted in Gateway to Research.



Agile Documents for agile development

Like a lot of large IT providers, the work we do here in iSolutions is often steeped in documentation. This comes in various levels of usefulness, from “godsend” down to “written but never read” (aka a complete waste of staff time). In TIDT our processes tend to be quite documentation-light. If a document doesn’t serve a purpose for us, we do not write it. Less time shuffling paper means more time writing code. However, just because we do not have a lot of paperwork does not mean we do not have a plan. We work closely with users and develop in an agile way. Because our changes are small and frequent, we need far less documentation per change.

People who do not understand the way we work don’t understand our documentation. An excellent example is a document (linked below) emailed to me by Lucy Green from comms regarding some changes to SUSSED. This document is a beautiful example of agile documentation. It is information-heavy, easy to understand and, because the change is relatively small, it is nice and short. Writing it down serves an important purpose because it gives us an artefact to talk around in our meeting. Because it’s highly visual there are far fewer misunderstandings of intent. Documents like this make me happy. It tells me what I need to know. After the change it will serve no purpose; the reasons for making the change will be listed in the iSolutions formal change management documentation, a much drier and less well-read affair.

Agile documentation

Posted in Uncategorized.


Adding a custom Line Break Plugin to the TinyMCE WYSIWYG editor inside Drupal 7

This is a long title for a blog post, but it is a complicated and tricky task and I couldn’t find a complete solution, so this is a summary of how I did it. It also provides a good basis for adding other features to TinyMCE inside Drupal. First of all, the versions of the software I was working with were TinyMCE 3.5.4.1 and Drupal 7.14 (yes, we need to upgrade that!)

I spent a lot of time hacking inside the Drupal WYSIWYG plugin and inside TinyMCE itself before I discovered the clean plugin-based solution. My starting point was this simple TinyMCE newline plugin from SYNASYS MEDIA. This didn’t work for me out of the box. It came only as compressed JavaScript, so I had to figure out how to decompress it first. Once I’d done that, after lots of debugging, I worked out that the reason I couldn’t get it to show up inside Drupal is that you have to make a new (minimal) Drupal module to register it properly with the WYSIWYG plugin (see below).

After that I worked out that they had used '<br />', which didn’t work in all circumstances, so I changed it to "<br />\n", which nearly did what I wanted, but the cursor got screwed up if you did a newline at the end of the text. I tried adding ed.execCommand('mceRepaint',true); but that didn’t help. I kept looking at the list of mce commands and spotted “mceInsertRawHTML”, but that was worse. In the end I decided to ignore the glitch as it’s purely cosmetic.

My final version is below. I’ve kept the name “smlinebreak” but I’ve bolded it so if you wanted your own name for a plugin you can see where you’d have to tweak it.

(function() {
        tinymce.PluginManager.requireLangPack('smlinebreak');
        tinymce.create('tinymce.plugins.SMLineBreakPlugin', {
                init: function(ed, url) {
                        // Insert a literal "<br />\n" rather than splitting the paragraph
                        ed.addCommand('SMLineBreak', function() {
                                ed.execCommand('mceInsertContent', true, "<br />\n");
                        });
                        ed.addButton('smlinebreak', {
                                title: 'smlinebreak.desc',
                                cmd: 'SMLineBreak',
                                image: url + '/img/icon.gif'
                        });
                },
                getInfo: function() {
                        return {
                                longname: 'Adapted version of SYNASYS MEDIA LineBreak',
                                author: 'Christopher Gutteridge',
                                authorurl: 'http://users.ecs.soton.ac.uk/cjg/',
                                infourl: 'http://www.ecs.soton.ac.uk/',
                                version: "1.0.0"
                        };
                }
        });
        tinymce.PluginManager.add('smlinebreak', tinymce.plugins.SMLineBreakPlugin);
})();

which replaces the editor_plugin.js in the SMLineBreak I downloaded from http://synasys.de/index.php?id=5. The other files are trivial, just the image for the icon in img/icon.gif and a language file in langs/en.js which looks like

tinyMCE.addI18n('en.smlinebreak',{desc : 'line break'});

This plugin I placed in …/sites/all/libraries/tinymce/jscripts/tiny_mce/plugins/smlinebreak. Then I had to register it, not directly with TinyMCE, but rather with the Drupal WYSIWYG plugin, using a custom Drupal module…

Drupal WYSIWYG Plugin

I gave my plugin the catchy title of “wysiwyg_linebreak”. This needs to be inserted into the filenames and function names, so I’ll put it in bold for clarity, so you can see the bit that’s the module name. This module gets placed in sites/all/modules/wysiwyg_linebreak/ and has just two files. wysiwyg_linebreak.info is just the bit that tells Drupal some basics about the module. As it’s an in-house hack I’ve not put much effort into it.

name = TinyMCE Linebreaks
description = Add Linebreaks to TinyMCE
core = 7.x
package = UOS

The last line means it gets lumped-in with all my other custom (University of Southampton) modules so they appear together in the Drupal Modules page. The module file itself is wysiwyg_linebreak.module and this is a PHP file which just tweaks a setting to add the option to the Drupal WYSIWYG module.

<?php
/* Implementation of hook_wysiwyg_plugin(). */
function wysiwyg_linebreak_wysiwyg_plugin($editor) {
  switch ($editor) {
    case 'tinymce':
      return array(
        'smlinebreak' => array(
            'load' => TRUE,
            'internal' => TRUE,
            'buttons' => array(
              'smlinebreak' => t('SM Line Break'),
            ),
        ),
      );
  }
}
?>

… and that seemed to be enough. To enable it you first need to go into the Drupal Modules page and enable the module, then go to Administration » Configuration » Content authoring » WYSIWYG Profiles and enable the new button in the buttons/plugin section. Then if you’re very lucky it might work.

Summary

It’s possible, even easy, to add new features to the editor inside Drupal. I’ve written this out long-form as I couldn’t find a worked example myself of how to add such a feature, and it took me enough time that I hope this may give a few shortcuts to people needing this or similar features.

Posted in Drupal, Javascript.


Combining and republishing datasets with different licenses

We’ll soon be launching data.ac.uk! Right now it’s all a bit of a work in progress. The plan is for us to start with a few useful subdomains then have other subdomains run by other organisations. Southampton neither can nor should be the sole proprietors.

The goal of the domain is to provide a permanent home for URIs, datasets and services. The problem with the .ac.uk level scheme is that sites are named either after an organisation, or after a project. But a good service should outlive the project which creates it, and if you’re trying to create a linked data resource for the ages then using http://www.myuni.ac.uk/~chris/project/2008/schema/ as your namespace is a ticking timebomb of breakiness.

There are several different projects to create sub-sites right now. These are all focused on “infrastructure” rather than “research” data, but that should not be seen as a firm precedent. That said, UK-level services for research data are artificial — it shouldn’t matter where good data comes from, but from a practical point of view the UK is a funder of research, so there may be times when national aggregation and services are created.

For projects like Gateway to Research to create good linked data they’ll need good URIs. Obviously some of their data structures are going to be complex and specialised, but we want solid URIs for institutions, funding bodies, projects, researchers, publications, patents, etc.

hub.data.ac.uk

OK, this is the bit this post was supposed to actually be about.

One of the sub-domains which already exists is http://hub.data.ac.uk/ which is intended as a hub for UK academia open-data services. It has a hand-maintained list of the current open data services and their contacts. We also set it up so that it would periodically resolve the self-assigned URI for each university, and combine the triples it found there into a big document which you could query in one go.

The first problem we encountered with this was that Oxford and Southampton have chosen to make their “self-assigned” URIs resolve to short RDF documents describing the organisation [Oxford] [Southampton]. However, the Open University made a different assumption about what should happen when you resolve their URI. Their service generates a document describing every triple referencing their university. This isn’t wrong; it’s just large, and answers a different question.

To address this we’ve hit on the idea of asking each open data service to produce a “Profile Document” which may be what their self assigned URI redirects to, but will also be auto discoverable from their main website. This we can (more) safely download knowing more or less what to expect, and we can provide standard ways to describe elements which may be useful to list on hub.data.ac.uk.
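The aggregation step can then guard against the describe-everything case by refusing to merge any response that is suspiciously large. A minimal sketch; the profile data, URLs and the 1000-triple threshold are all invented for illustration:

```python
# Sketch of the hub's aggregation step: merge triples (held here as
# plain tuples) from each organisation's profile document, skipping
# any document that looks like a dump of everything rather than a
# short profile. All data below is made up.
MAX_TRIPLES = 1000  # arbitrary illustrative threshold

profiles = {
    "http://id.example-a.ac.uk/profile.rdf": [
        ("http://id.example-a.ac.uk/", "rdfs:label", "University of Example A"),
    ],
    # Simulates a service that returns every triple it knows about:
    "http://id.example-b.ac.uk/profile.rdf": [
        ("http://id.example-b.ac.uk/", "rdfs:label", "University of Example B"),
    ] * 2000,
}

combined = []
skipped = []
for url, triples in sorted(profiles.items()):
    if len(triples) > MAX_TRIPLES:
        skipped.append(url)  # too big to be a profile document
        continue
    combined.extend(triples)

print(len(combined), skipped)
```

In practice you would fetch and parse real RDF rather than hard-code tuples, but the size guard is the point: a profile document should be small enough to merge blindly.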

Combining Datasets

The problem I’m facing this week is how to handle combining datasets with multiple licenses.

Right now I’m thinking:

For every source dataset, include a “provenance event” describing where it was downloaded from, and the license on the document that was used as the source.

nb. this is not proper RDF, I’m just explaining my thoughts:

 <#event27> a ProvenanceEvent ;
     source <http://www.example.ac.uk/profile.rdf> ;
     action <downloaded> ;
     result <#source27> .

 <http://www.example.ac.uk/profile.rdf> 
     license <Open government License> ;
     attribution "University of Examples" .
 <#event28> a ProvenanceEvent ;
     source <#source20>,<#source21>,<#source22>,<#source27> ;
     action <merge> ;
     result <>

OK. So the above is true but I’m not sure how useful it is. If I’m using a dataset, all I really want to know is:

  • Can I use it for the purpose I have in mind?
  • What restrictions does it place on me?
  • What obligations (attribution) does it place on me?

So far as I can see, combining datasets with different licenses results in a dataset which is licensed by all of them at the same time. This isn’t the same as when software is “dual licensed” and you can pick which license; this dataset is simultaneously under several licenses (like wiring them in series, rather than in parallel). Even a “must attribute” license gets out of hand with data from 180 sources (BSD was modified for a reason!)

The licenses we’re planning to accept (or at least recommend) are, in order of increasing restrictions, CC0, ODCA and OGL.

One option we’re considering is to provide several downloads:

  1. CC0 data only under a CC0 license
  2. CC0 and ODCA data only under a ODCA license (with a long attribution list)
  3. CC0, ODCA & OGL data under the OGL. (with a longer attribution list)

I’m not a lawyer, but this seems to go with the intent of the original publishers’ licences.
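The three downloads could be generated mechanically. A minimal sketch, treating the post’s three licences as a simple ordering of restrictiveness; the dataset names are invented:

```python
# Sketch of the three-bundle plan: a dataset may be redistributed in
# any bundle whose licence is at least as restrictive as its own.
# Ordering follows the post's "increasing restrictions" list.
RANK = {"CC0": 0, "ODCA": 1, "OGL": 2}

# Invented example datasets and their source licences.
datasets = {
    "example-a": "CC0",
    "example-b": "ODCA",
    "example-c": "OGL",
}

def bundle(max_licence):
    """Names of datasets that can go in a bundle released under max_licence."""
    return sorted(name for name, lic in datasets.items()
                  if RANK[lic] <= RANK[max_licence])

for lic in ("CC0", "ODCA", "OGL"):
    print(lic, bundle(lic))
```

The CC0 bundle contains only CC0 data, the ODCA bundle adds the ODCA data, and the OGL bundle contains everything, matching the three downloads listed above. The attribution lists could be built the same way, accumulating as the bundles grow.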

There’s also the issue of the ODCA phrase “keep intact any notices on the original database”, which would be easy to do if combining datasets by hand, but is going to be very difficult to automate. What if their notice turned out to be in the XML comments in an RDF/XML file?

I came quite late to the Semantic Web, so I suspect many of these issues were discussed a decade ago; any tips or leads from the community would be most welcome.

All in all, my favourite license remains “please attribute” rather than “must attribute”. It’s legally the same as CC0, and makes no additional requirements for reuse, but just asks nicely if you could credit the source when and if convenient.


Posted in RDF.



How to mirror a TWIKI

We ran a few TWikis back in the day and they were pretty good, but now we tend to prefer MediaWiki. We wanted to retire some of our old TWikis because they were putting a lot of load on our webserver. Some of the code isn’t very efficient in the version we were running, but rather than upgrading we decided to close them and make a static mirror using wget. If you’ve never heard of a static mirror, or never known how to make one, I have always referred to: http://fosswire.com/post/2008/04/create-a-mirror-of-a-website-with-wget/

I searched pretty hard for how best to do this and couldn’t find any kind of useful information. TWiki gets into an infinite loop if you try to spider it, so I had to find the combination of arguments to wget which wouldn’t get trapped in a loop but would still give me all the important content of the site.

wget -mk -w 1 --exclude-directories=bin/view/TWiki,bin/edit,bin/search,bin/rdiff,bin/oops <site_url>


Posted in Uncategorized.


Disappointed by THE Awards

So I’m actually quite excited to be going to the Times Higher Education Awards, as Southampton has been short-listed for Outstanding ICT Initiative for http://data.southampton.ac.uk/. When (OK, if) we win, it’ll give us some great bragging rights. I’ve met one of the other short-listed ICT teams, as we’re working with them doing cool stuff with equipment data, so I won’t be too grumpy if they win; they’ve done some neat stuff too.

The problem is, what use are these awards? Check out the “Previous Winners” page from last year – it’s bloody useless. It doesn’t even tell you the names of the projects. This entirely fails to promote good practice in the sector, and it would be so easy to link to the winning (and short-listed) teams’ entries, or better still to their project sites so we could check them out for ourselves. I want to see what other great things are going on in UK ICT and they are failing to take this simple step.

These awards are like the Oscars announcing only that a “Paramount” movie won the award for best supporting male actor, but not bothering to tell anybody who the actor was or what the movie was called. That’s a bit lame.

Win or lose, it’s a missed opportunity for us and the other projects involved.

I’ve got to rent a tuxedo for the first time in my life so that’ll be… novel.

*** UPDATE ***

I’ve heard back from them, and they were (a) good natured about my bloggy-banter and (b) seemed to be willing to consider the issue. I don’t think they are going to change the policy, which is a pity, but if they start to hear this from more angles then maybe in time they’ll work out how they can do it.

Posted in Uncategorized.


Merging WordPress Multisites

ECS had a blog server for some years, home to a number of mature blogs. As part of the university-wide systems centralisation, these blogs had to be migrated to the existing Southampton WordPress server. Patrick and I were tasked with this.

Our initial googling returned very little information about this, other than people saying how hard it was, so we decided that it was well worth documenting. It wasn’t as hard as all that, though we did things that can’t be considered good computer science.

This is presented as a set of instructions, and we’re assuming that there are two multisite installations that need to be moved onto a single new server. It relies on the database structure that WordPress 3.4.1 uses, so if you have a different version, your mileage may vary.

Continued…

Posted in Wordpress.


Join us…

So we’ve just written two fairly cynical blog posts about some of the challenges we’ve encountered producing open data for the university. The other side of the coin is that, other than governments and councils, we’re one of the first organisations in the world to attempt this. We feel it’s really important to share the negative as well as the positive, and to describe our mistakes as well as our successes. Being as honest as possible should help other people following a similar route to us.

The other big win is that our senior management have really bought into the idea of Linked & Open Data. I got asked recently to produce a slide summarising RDF & LOD for Phil Nelson, one of our Pro V.C.s, who was giving a talk about equipment-list open data to senior staff from other Russell Group universities.

This year sees the launch of the Open Data Institute, which is being set up primarily by Southampton staff.

Our team has also recently been granted data.ac.uk to create a central point for ac.uk data and help create Cool URIs for the sector.

Why am I talking this all up? …because the university management have decided to invest some of the ‘strategic fund’ in creating a full-time Linked Open Data specialist post. I believe we’re the first university in the world to have committed to open data to the degree of creating a post. Because of the way it’s funded, it’s a 2-year fixed post. It’s very possible it will become permanent after that, but I don’t want to make promises I might not be able to keep.

I love my job. I started in October 1997 and I’ve still not got bored.

I’ve put more information about this on the data.soton blog.

Location: Highfield Campus
Salary: £27,578 to £33,884
Full Time Fixed Term
Closing Date: Sunday 19 August 2012
Interview Date: To be confirmed
Reference: 146112JF

More Information

Job Description and Person Specification [Word Document] — we will use the person specification to determine who gets the job, I anticipate we may know, or even be friends with, some of the applicants, so judging everybody by the person spec. helps keep it fair.

Posted in Uncategorized.


Enterprise Disasterware

This post is in some ways a follow up to Chris’s previous post http://blog.soton.ac.uk/webteam/2012/08/01/fear-uncertainty-and-doubt/. Specifically:

Our software has no API (because it’s enterprise, unlike that crappy interoperable stuff)…

  1. ..so tough, you can’t have it
  2. ..and the SQL is too difficult to understand, it’s all like “table233.fieldA3”
  3. ..we’ll have to pay the people who make the software a fortune to get the data

This is a problem we encounter on a daily basis. We bought this thing and it does the bare minimum we needed for about £100,000. We are in an ever-changing world and the requirements for the system change, sometimes before it is even deployed. At this point the enterprise solution becomes an albatross around your institution’s neck. You start hearing phrases like “we can’t do that because we can’t add functionality to Banner easily” or “we can’t understand how to move our data out of Syllabus+”.

Now you have a number of problems: not only can you not go forward and build the system in your grand plans, but you also cannot go back. You cannot go back because you have spent your £100,000; the money for this solution is spent. Even more concerning, in the unlikely event your institution will stump up the money for a project to replace this system, it is virtually impossible to get the data out (it’s the reason you want to replace the damn thing). This means if you find a new system that works in your new plan you have to go through a painful migration process. Add to that the potential that the system you are moving to will have the same problem by the time you have successfully done the move. What results is Enterprise Rot: you are landlocked, and your institution can’t innovate because your information systems are stifling you.

One thing it seems we never plan for when purchasing systems is an exit strategy. It is highly likely that sooner or later, for whatever reason, you are going to have to leave your system. If you have a good exit strategy, that is one less burden when choosing new systems. An enterprise solution will not have any discussion of what happens when you want to leave. Why would it? They don’t want you to leave; you are paying them. With an open source solution you can download the code, try it out and find out how easy it will be to leave. Also, with a good open source solution you are much more likely to be able to add functionality and export/import data, so there might be less need to move.

I come from the repository world, where the name of the game is data sharing. Our business is information modelling and preservation. Importing and exporting data is one of the core functions of all the major platforms. Most of the solutions are also open source, which means that you have a lot of flexibility about how to integrate the system into your business architecture. This makes learning about enterprise systems where this is not the case quite frustrating. I have discussed my feelings before at http://blog.soton.ac.uk/oneshare/2011/11/25/your-insitutions-data-mobility/

The Take Home

The aim of a business solution is to facilitate your business objectives. Your business objectives change frequently in response to environmental changes, and often in unpredictable ways. It is vital you have solutions which facilitate your objectives, not hamper them. Open source software provides some potential opportunities for exploring exit strategies before you commit to a solution.

Tell Us

If your institution is bogged down by enterprise disasterware, tell us. It would be nice to have a list of bits of software which we should avoid buying. It would be good to have a list of bits of software which are recommended. Tell us about your personal horror stories. You can reply in the comments or tweet #disasterware. If there is any interest I might look at running something in this vein at next year’s Dev8D.

Posted in Best Practice, Open Source, Repositories, Sharepoint.