

Open Data Camp 2

Last weekend we went to UK Open Data Camp 2 in Manchester. This is an unconference, so you don’t quite know what you’re going to get, and it was indeed a mixed bag.

The very first session I attended was the shakiest. It was about asking the audience for feedback on an app the presenter’s fire service was having developed. It would have worked OK with a smaller group or some facilitation, but it had the misfortune to be scheduled at a time with only one other session. Hopefully the chap got something useful out of it. I’ve been to enough of these events to know the ‘law of two feet’ and failed to enact it, so more fool me.

The session that I should have gone to was about a service someone had set up, inspired by Open Data Camp 1, giving leftover food to a nearby homeless shelter. The idea of his site is that you can register as being interested in accepting leftover fresh food. This means that if an organisation has overcatered, it can conveniently find a worthy use for the food without having to think too hard about it. http://ediblegiving.org

I have a bunch of open-data training that I could give on different topics, and would quite have liked to, but I didn’t know the audience well enough to pitch something. My fear was ending up running a poorly attended session and missing something I could learn from. What might work better is aiming for more ‘speculative’ pitches which only run if there is enough interest.

One thing that I’ve not seen happen before at an unconference was that two sessions were so popular that they got declared spontaneous plenaries. One of these was about open data from DEFRA and the other was a history of the UK address (data) wars. This was a great decision from the organisers because I would have hated to run a competing session.

An interesting moment in the DEFRA session was when they asked whether people wanted the cleaned and checked final-version data about river water levels or the raw current sensor data. Pretty much everyone who was interested wanted both, and that theme recurred in a couple of other sessions. It’s OK to publish the same data in different ways for different audiences. If I’m buying a house or doing town planning then long-term data on flood patterns is very useful. If I own a house by a river then real-time water-height data, even if it’s sometimes flawed, could save my carpets.

Making open data suck a bit less

At the end of the first day I ran a session called ‘making open data suck a little less’ which I think came out OK, and people said some nice things on Twitter, but the frustration is that (surprisingly) we’re not a very empowered community. I think the involvement of really big guns like government agencies means that everybody expects someone else to do stuff, and it’s a bit intimidating.

I wanted to ensure that we got some suggestions of solutions as well as problems, so we started by brainstorming what pisses us off about open data, and then I tried to resolve that into some key areas and get people to suggest solutions. In the front row was Amanda Smith, the Community Engagement Manager at the ODI (Open Data Institute), making lots of notes, which was heartening.

The things which pissed us off included:

  • restrictions on who can use the data, eg. academic use only
  • having to fill in forms to get data
  • Javascript interfaces which make it hard to access the data directly and mechanically
  • open data available only from an API rather than also providing a bulk download
  • data not linked to related data, eg. you’ve found the 2012 street crime data for Basingstoke, but is there other 2012 data? Is there 2015 data for Basingstoke?
  • formats without any clues of how to get started with them (Geodata, we’re looking at you)
  • no licence, which is a problem for people wanting to use the data more formally.
  • data in CSV, especially “O.N.S.Esque data”, a sarcastic nod to the Office for National Statistics’ less helpfully formatted spreadsheets.
  • data supplied in PDF (sigh, people still do that)
  • data supplied as a picture. Graphs, for example.
  • non-intuitive XML structures.

The solutions were not as fun as the gripe session; I tried to boil the above frustrations down into broad sections. For some of them we didn’t have much idea of how to realistically address the problem.

  • Frustrating Legal Stuff
  • Frustrating Access
    • a problem here is that the first time the community sees the user interface is usually after the development company have already finished the product. Release early, release often.
  • Frustrating Formats
    • For formats that might be opaque to newbies, it would be good for the community to have well known pages that can be linked to when you publish data in certain formats or APIs. The question is who does this? The ODI? OKFN? The Open Data Handbook?
    • It would be very helpful for more obscure formats to tell people what tools and libraries exist, and to remember that your ‘standard format’ might well be baffling to others.
    • One person’s bad format choice is another person’s preference. It’s OK to publish data in more than one format.
    • Promote tools like CSV Lint.

Since the event I have created a small page about LIDAR data which has some of the information I wish I’d had when I started.

Pub vs Restaurant?

In the evening we ended up taking the ‘pub’ track rather than the ‘restaurant’ track, and ended up with just me, Ash and a chap from the Isle of Man. I learned lots of interesting stuff about the Isle of Man: it has a smaller population than the Isle of Wight, has its own government and laws, and would prefer not to be seen as a tax haven. I suggested they should make sure all their data is on OpenCorporates (if it’s not already; I’m not sure). We did manage to catch up with some more delegates later in the evening.

I should also mention that this was not a great weekend to get a hotel in Manchester. There were several sports-ball events, and Ash’s hotel room was expensive and ideal if you wanted to go to bed at 4am after attending the nightclub underneath (sorry Ash..). I was in the Premier Inn. Fun fact: there are 6 Premier Inns in Manchester. Another fun fact: all the buildings in Manchester are huge and solid and really screw up GPS, so trust the street signs, not the GPS. Another tip for Manchester navigation: when dealing with canals, stop and look at the map to make sure you’re on the right bank or you’ll end up backtracking.

Oh, and I’m still trying to work out who I borrowed £20 from in the pub so I can pay them back (sorry about that, I utterly forgot about it until I was on the way home).

Isle of Data

The next day I went to the session about what an open data service for the Isle of Man could/would/should look like. While technically this was another ask-the-community-for-help session, it was far more interesting as it was about practical solutions and best practice. As someone who both creates and consumes open data I had lots of opinions, and I did my best to let other people talk too, but my best wasn’t very good (sorry). Some interesting points that I remember:

  • start with identifiers not datasets
  • publish what you have already, but aim to make better datasets later (raw and cooked)
  • make recurring publication plans clear so people can build systems with confidence
  • where possible make government departments communicate using open data methods

Some things I learned.

  • I should have gone out and purchased a nicer jar of coffee than the gawdawful Nescafe. I could have stuck ‘sponsored by data.ac.uk’ on a jar of more expensive coffee granules and got some very cheap promotion.
  • Provide a bit of guidance to help people new to unconferences think about what they should do in advance.
  • For unconferences, let people request sessions as well as propose them
  • Give some time after the pitches for people to mingle.
  • Check sports fixtures if possible when planning a city event
  • Trust a map and street signs more than GPS

Thank-you to those who organised this. Please know that any negative comments are always about making it the best it can be, not complaints. I know how much work these things are to put on.

Also, the venue was excellent. It’s hardly worth mentioning, as the worst thing about it was chairs a bit too slopey to put my coffee on; everything else was ideal for this kind of event.

 

Posted in Events, Geo.



Weeks Five to Eight

Sorry it’s been a while since my last post; in this one I will look over the work I have been doing for the last four weeks. I’ve mainly been working on the two projects I had already started on: the Drupal website for the Project Showcase, and the JavaScript application for viewing desk layout maps of iSolutions buildings.

But as well as those two projects, for the last few weeks I have been working on writing unit tests for FloraForm, a library for generating and processing web forms with complex data types. FloraForm was written in PHP by two other members of the web development team, Patrick McSweeny and Chris Gutteridge, and is intended to be used to generate complex list fields on web pages. It allows components to be easily composed (for example, a list of fields of different types) and provides constructor methods for the library’s classes.

My task has been to expand the tests for the different component classes within FloraForm. This has been my first encounter with PHP, a web scripting language similar in function to JavaScript; I’ve found PHP to be a far easier language to work with. In order to write the tests I first had to get to grips with the library’s classes, which often required some trial and error to understand how they function. For each field class my intention has been to test the construction methods, any functionality for formatted input to the fields, and the class’s validation of data sent from the form.

Example of several tests having been run

For Project Showcase I needed to improve the site enough that it could be shown to a third party as a demonstration of the final website’s design. This meant ensuring the site used the University’s website branding and fulfilled almost all the stories described in the requirements documents. The focus was on requirements related to student usage of the site, such as functionality for searching and sorting lists of supervisors, projects and themes within the system, and less on more advanced administrator features such as the ability to import projects from the ECS Handin system.

Whilst working more and more on the Project Showcase I found that the Drupal platform has both considerable strengths and flaws. It is a powerful platform which can be the basis for very complex websites, but I think it would prove a highly inappropriate choice of content management system for new or inexperienced developers, due to its complexity and poor-quality documentation. The latter is often a result of Drupal’s reliance on user-developed modules, an approach which on one hand makes it easier to expand and manage the functionality of a Drupal website, but can also result in important functionality relying on poorly documented or badly written modules. The lack of good documentation often requires a developer to experiment with different modules in order to find one appropriate for a specific piece of functionality they are trying to implement. For example, I tried several different approaches until I settled on using the Rules module to automatically unpublish projects which have been marked as inaccessible by an administrator.

The work I have been doing on the Floorplan Map Viewer has mostly been concerned with completing the datasets for the iSolutions office in Guildhall Square One, and the two floors of the research area in Building 32 on Highfield Campus. The aim is to deliver, by the end of my internship, these datasets, a JavaScript library for parsing them into usable formats, and a simple exemplar application which can render the datasets onto the floorplan maps. Since I have a week and a half left, I hope before then to expand on the current simple example application and refactor my code into a more useful library format.

 

Posted in Uncategorized.


Deliberately Tainted Data

Do you see tainted data? Call ServiceLine now on…

In iSolutions, we run a fairly common setup for services: a live, a pre-production and one or more development copies of each service. It took me a while to come around to this approach, having mostly done seat-of-the-pants hacking on live services in the past, but come around to it I have. One problem we’ve encountered a number of times is that the pre-production service contained more or less the same data as the live service, so either people used it in error, or it sent real emails to real people based on test data.

A long time ago I came up with a way to massively reduce such incidents. Not stop, but reduce. The idea was inspired by the smell of natural gas. Natural gas doesn’t actually have much of a smell; the distinctive smell of unburned gas is added artificially and makes it very easy to notice if there’s a leak. While this approach doesn’t directly stop explosions, it means that 99% of incidents are caught long before anything bad can happen.

ChristopheX GutteridgX

My idea is to add a “taint” to some columns in the dev and pre-production databases to make it obvious to a human that the data is tainted, without impacting testing. To do this I pick some free-text columns which are going to be frequently viewed in any user interface, for example Person_Forename, Person_Surname, Event_Title, Document_title. If one of these has 3 or more characters, I replace the last character with a capital X. That way it doesn’t change the length of any data or notably change the indexing. So I would appear as “ChristopheX GutteridgX”, and John Wu would be “JohX Wu”. It’s immediately obvious that something is off, but the system can be tested as usual. If pre-prod or dev data ever accidentally ends up in a live system, it’s immediately obvious. This can happen if a database hostname is accidentally included in the version-controlled part of the application, rather than in a config file outside normal version control.
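The rule is simple enough to fit in a few lines. As a sketch (this is my own illustrative JavaScript, not code from any of our actual systems):

```javascript
// Taint a free-text value for dev/pre-prod databases: if it has
// 3 or more characters, replace the last with a capital X.
// Length and leading characters are unchanged, so indexing,
// sorting and layout behave much as they do with live data.
function taint(value) {
  if (typeof value !== "string" || value.length < 3) return value;
  return value.slice(0, -1) + "X";
}
```

Applying this to the chosen columns with a single UPDATE right after each refresh of pre-prod from live is all the process it needs.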

This is no substitute for proper checks and processes, but it makes an excellent extra line of defence at no significant cost.

It works! (sample size: 1)

Today someone told me that their live database is showing tainted data. I’ve checked the database tables, and they have the correct untainted data, so I can deduce he’s still using an ODBC connection to the pre-prod database. A small victory, but it’s the first time this approach has paid off, so I wrote this blog post to celebrate.

I’m sure I can’t be the only person who’s thought of this. Does the technique have a name? Is it a good idea or an antipattern?

Posted in Best Practice, Database.



Week Four

This week has been somewhat quieter in the office due to various members of the team working elsewhere or being on annual leave at different times of the week. I’ve largely been working on my own on outstanding tasks for the Project Showcase project, along with some work on the Floorplan Map Viewer project.

I have been taking time getting to grips with Drupal. Drupal is admittedly a platform with a difficult learning curve, and its reliance on third-party modules for much of its functionality can be frustrating, but this is made up for by Drupal being a very powerful platform to develop on once you have got to grips with it. The Drush utility makes developing for Drupal considerably easier by providing a straightforward way of managing modules within a project. For the tasks I have been working on this week I’ve often needed to test out several modules, one after the other, in order to find the one which best solves the problem at hand or which provides the functionality required to implement a desired feature. For example, I have used the Relation and Relation Edit modules to implement many-to-many relationships between content types (Drupal’s name for data classes within a website), and at the time of writing I’m investigating the use of the Search API module for providing better and more detailed search tools for the website. One part of Drupal I have found consistently useful is the Features module, which allows the user to save elements of a Drupal website, and the required dependencies for those elements, into self-contained packages (‘features’) which can be saved to files and exchanged between Drupal installs. This has helped simplify version control for the project considerably, with the features and some setup scripts written by Kevin Puplett being added to the project Git, as opposed to the entire Drupal directory.

Front page of Drupal site in development, using Southampton theming and with the admin toolbar visible at the top.

The work I’ve done on the Floorplan Map Viewer has involved fixing outstanding issues with the code in order to get the demonstration application functioning before next week, when I will be spending the entire week extending this demo. When I last worked on the demo I was experiencing a frustrating error where the latitudes of my points were being flipped horizontally within the building. I discovered this was due to an error in one of my data sets. I also attempted to extend the demo to display the floorplan maps for floors four and five of Guildhall Square One as separate layers, functionality offered by Leaflet.js.

The leaflet.js demo, with the desk plan of floor four of GS1 shown.

Posted in Uncategorized.


Weeks two and three

Unfortunately I missed last week’s blog post, so this week I will be catching up on both. It’s been over a fortnight since my last post due to me being on holiday for a week, and since then I’ve been working on two different projects.

Last week I worked on fulfilling outstanding tasks on the CLECC research project. CLECC (Creative Learning Environments for Compassionate Care) is a web API being developed by iSolutions to be used by health professionals for recording interactions between themselves and patients on hospital wards. The project is late into its development, and my task was to go through and attempt to complete open tasks for the project on Sourcekettle. This was an interesting experience for me, since it required me to interpret and make use of code written by a number of different developers at different times, whilst also requiring me to understand an unfamiliar application’s source code midway through its development. Most of the tasks I completed were smaller enhancements to existing features or bug fixes, and I also found several small bugs which I added to the open tasks for the project as I went.

Task screen for CLECC project

This week I spent some of my time away from the rest of the web development team, working with Chris Gutteridge on a demonstration project which is intended to later be extended into a more general-purpose API. The basic concept was to develop a web application which presents maps of the floor plan of the iSolutions offices in the One Guildhall Square building. The project quickly expanded beyond this when it became apparent that other team members were interested in a potential solution to similar problems. For example, Kevin Puplett, another member of the web development team, was interested in an application for displaying the current usage of workstations in the undergraduate labs in the Zepler building. We expanded the project to be essentially an API which could be adapted to different roles, with the aim of initially concentrating on a demo application which could overlay the locations of desks in the fourth floor office of 1GS on an interactive map.

We first needed to produce a collection of datasets which could model the relevant data on the objects we wanted to be able to map in our application, from university sites through to buildings, floors within those buildings, rooms and desks on the floors, and ‘resources’ (computers, monitors, etc.) which could be associated with desks. The format we used was CSV, since it is a lightweight data format which we could easily produce initial test data in, but which would also be easily expandable in the future. Later in the project I found the risks that can be encountered when attempting to produce such datasets by hand, but I was able to fix the problems I had without much issue.
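To give a flavour, one of the desk-level files might look something like this (the column names here are illustrative guesses, not the actual dataset headers):

```csv
floor_id,desk_id,x,y
gs1-f4,desk-001,12.5,3.0
gs1-f4,desk-002,14.0,3.0
b32-f3,desk-101,2.0,8.5
```

Each layer (sites, buildings, floors, rooms, desks, resources) would be its own CSV file, with a column keying each row to its parent in the layer above.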

The concept behind the backend code was that, for each floor plan to be mapped within the application, the data for that plan would include the longitude and latitude of three known points on the building/floor the plan was for, as well as the x and y coordinates of those points within a relative coordinate system. Chris produced a function which can find the longitude and latitude of a given point on the plan where only its x and y coordinates within the relative system are known.
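The three reference points pin down an affine transform between the plan’s coordinate system and latitude/longitude. A sketch of how such a function could be derived (this is my own reconstruction, not the actual pointlocator.js code; the names are mine):

```javascript
// Given three reference points, each with plan coordinates (x, y)
// and geographic coordinates (lat, lon), solve for the affine map
//   lat = a*x + b*y + c,  lon = d*x + e*y + f
// and return a function applying it to any other plan point.
function makePointLocator(p1, p2, p3) {
  const det = (m) =>
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
    m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
    m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
  const A = [
    [p1.x, p1.y, 1],
    [p2.x, p2.y, 1],
    [p3.x, p3.y, 1],
  ];
  const dA = det(A);
  // Cramer's rule: replace column i of A with v and divide dets.
  const col = (i, v) => A.map((row, r) => row.map((c, j) => (j === i ? v[r] : c)));
  const solve = (v) => [det(col(0, v)) / dA, det(col(1, v)) / dA, det(col(2, v)) / dA];
  const [a, b, c] = solve([p1.lat, p2.lat, p3.lat]);
  const [d, e, f] = solve([p1.lon, p2.lon, p3.lon]);
  return (x, y) => ({ lat: a * x + b * y + c, lon: d * x + e * y + f });
}
```

Note this assumes the three reference points are not collinear; if they are, the determinant is zero and the transform is undefined.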

The application itself required a number of libraries, either written ourselves or found online, to fulfil various tasks. The two libraries I found online, or which were suggested to me, were Papa Parse (http://papaparse.com/), a JavaScript library for interpreting CSV files, and Leaflet.js (http://leafletjs.com/), a JavaScript and CSS library for embedding interactive maps into applications. Chris wrote a library, asyncset.js, for accessing the CSV datasets themselves and reading them into strings using Ajax, and I wrote the pointlocator.js library which implemented Chris’s point-finding function.

Example of Leaflet.js taken from the API’s website

The end of the week was spent producing a demo that brought together these different libraries and parts of the application after they had each been demonstrated to work independently. This application is intended as an initial demonstration of the concept, and the code is meant as the basis of what will eventually be a more fully rounded application. Next week I am working on a different project, but hopefully I’ll be able to spend some more time on this application over the summer.

Posted in Uncategorized.


Some of my many failures

I fail quite often. I have tons of ideas which seem like they might work and loads flop. Occasionally I do something that fails to fail and those are the ones people notice. I usually have fun and learn some skills on things that don’t work out so it’s not wasted time. I thought it might be useful to review some of the things which never really worked out.

You can’t see it in our wordpress template, but most of the headings are links to the github pages for each project.

Command Line Triple Tools

I started work on a suite of tools to process N-Triples on the command line. It’s been used by me a couple of times, but by nobody else. I really thought Unix pipeline tools would be useful, but not yet.

Southampton Developer Meetup (SoTech)

For two years I organised a monthly pub meetup for technology enthusiasts. It limped along for a while, but without really reminding people they just didn’t show up, and a couple of meetups drew only a couple of people, at which point I called time on it. It was worth trying, but it wasn’t the event people really wanted, which would have had more actual content, and that was more than I was willing to commit to.

JS Document Viewer Framework

This was an idea to view documents linked from a webpage using nothing but JavaScript. It worked OK and was aimed at the research data repository community, but it hasn’t captured anybody’s imagination. The nice thing was how extensible it is and that it can run on top of any repository, as it’s pure JS.

For a demo see http://lemur.ecs.soton.ac.uk/~cjg/tableview/ and check out how clean the page source is.

JS Tweak

This tool modifies the font size until the text fits into a specific height and width, kinda like what PowerPoint does. It’s still a bit buggy on single words for some reason. It’s ideal for making HTML-based display screens where you don’t know the resolution or aspect ratio.
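The usual way to do this kind of fitting, and roughly the idea here, is a binary search over the font size using the browser’s own measurement of the rendered text. A sketch of the core loop (illustrative code, not the actual JS Tweak source; `measure` stands in for a DOM measurement such as reading scrollWidth/scrollHeight after setting style.fontSize):

```javascript
// Binary-search the largest font size (in px) whose rendered text
// still fits within the target box. `measure(size)` must return
// the rendered { width, height } of the text at that font size.
function fitFontSize(measure, maxWidth, maxHeight, lo = 1, hi = 200) {
  while (hi - lo > 0.5) {
    const mid = (lo + hi) / 2;
    const { width, height } = measure(mid);
    if (width <= maxWidth && height <= maxHeight) {
      lo = mid; // fits: try a larger size
    } else {
      hi = mid; // overflows: try a smaller size
    }
  }
  return lo; // largest size known to fit, to within half a pixel
}
```

Because rendered size grows monotonically with font size, the search converges in a handful of reflows rather than shrinking one pixel at a time.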

SharePerltopus

This monstrous creation is a Perl library and command-line tool which talks to some of the more common SOAP functions in SharePoint and lets you get at the data. It’ll dump out datasets as CSV and calendars as iCal. We use it internally, but I’ve never heard of anybody else getting any value out of it.

Alicorn

This one is a bit of a heartbreaker. I thought this was a game changer, but I’ve never got any interest in it. It is built on top of what Ash and I learned from building and rebuilding data.soton.ac.uk, and lets you make a templated website with minimal configuration.

I used DBPedia as the initial demo — I thought it was a good choice but other people have suggested otherwise.

Check out this page. It’s built using dbpedia SPARQL and these configuration files: config, template.

Ah, well, maybe its time hasn’t yet come.

XTypes

Extra types for RDF. Normally you just use the XSD types for RDF but we found it very useful to use a datatype to indicate if a literal contained plain text or a fragment of HTML markup. Extrapolating from that I created the XTypes vocabulary thinking it might be useful, but it’s never caught on.

Inside the Box

I never fully got this one working. It was intended to look at an RDF dataset and tell you what the main types were, what properties they had, and what the objects of those properties were. This is distinct from the vocabulary; it’s instead about what is actually in a given dataset. I thought it would reduce the time-to-grok when working with undocumented RDF datasets.

They Work For SUSU

This was an attempt to automate turning the Southampton University Students’ Union meeting minutes into RDF. It looked at the layout of the document, not just the order of the words, and gave surprisingly good results for the effort involved, but the interest from the Union waned (as it always does, with a one-year turnover in leadership).

See a demo of They Work for SUSU.

RDF of the Library Jewish Collection

I thought I could map the metadata from one of the library collections into something semantic, but it was just a list of terms which might be dates, approximate dates, names, organisations or anything else, and in the end I gave up. It would have been faster to go through the 500 records by hand and annotate whether each keyword was a date, person, organisation or something else.

cpv.data.ac.uk

This is a linked data version of the European Common Procurement Vocabulary. When any public sector organisation in the EU buys something big it has to go to tender, and these terms must be used. It contains codes for everything from cigarette papers to warships… but not 3D printers as it’s from 2008. As most university research equipment in the UK will have been procured using these codes I thought it would make a good base vocabulary for categorising things on equipment.data.ac.uk but nobody has done anything with it that I’m aware of.

Event Programme Ontology

I really felt that this ontology was a missing bit of the linked data web and would help start to enable real open linked data for large events, which could massively benefit from economies of scale in producing tools. I’ve got good use out of it myself, but it’s not been used by anybody else that I’m aware of.

observatory.data.ac.uk

I expected this to really take off. It keeps weekly stats on a bunch of factors about *.ac.uk homepages, and I thought the data would be really valuable to university comms departments and web scientists, but nobody seems very interested.

Southampton Open Data Minecraft

I used open data sources to generate 1.6×5.5km of Southampton as a Minecraft map. I thought people would be really interested and it would inspire people to build on top of it, but I’ve had virtually no interest even after presenting it at a couple of events.

Summary

Well, I feel a bit glum looking at that list, although I enjoyed working on every one of them. I have a few successes under my belt, but they come at the price of a vast number of failed ideas, or at least yet-to-take-off ideas.

What projects have you failed to get off the ground? How do you decide when to stop working on them?

Posted in Open Source, Programming.



Brave New Jisc

(Chris Gutteridge)

I’m also at the Jisc dataspring sandpit. This set of project funding has several experimental innovations from Jisc, and I want to put down some of my thoughts about how they are working.

Innovation one: These projects were formed out of a workshop with a lot of opportunity for feedback and similar ideas to be merged into a single stronger proposal. The proposals were turned into posters and then the attendees at the workshop used sticky stars to indicate projects they supported. After this there was a day of formal pitches.

Innovation two: the funding is broken into phases and the projects must justify their continuation at each step. The current event is 3 months in: the projects spent the first day doing a show-and-tell with questions, and today is a pitch to judges for their continued funding.

 

I think it’s important for Jisc to change things up, as in its previous life as JISC it had a reputation for funding quite a few turkeys: projects which didn’t actually produce something of value for the community in return for the money given. Don’t get me wrong, they also funded some gems, but there was a clear need to change the rules to improve the value delivered.

I felt both innovations were a good idea in theory, but in practice I’m not sure the second is working in its initial form.

The first workshop was a very new experience and the resulting projects are much stronger than usual. To be more specific, there are no obvious lame ducks that got lucky, and the community of projects has a much better picture than usual of what the others are doing and why it is worthwhile. It was also helpful to show poorly received projects that they had failed to engage the room, and gave the funders a clear idea of what the community felt was worthwhile.

The expectation of a life-or-death review at 3 months has put a rocket up the bottom of all the projects and many already have active github repositories and demonstrators.

However, the 3-month review event is not making me so happy. It’s costing a good amount of time and effort, and people are not as collaborative as they normally would be at a Jisc event. The pitches on day two are not an update; they are a plea for survival, so people are putting their focus into that rather than talking with each other.

Yesterday afternoon went quite poorly. The idea was that projects would give an update with time for questions afterwards, but the venue was bad. I couldn’t hear much of what was said, and many people just gave an update without a chance for questions, at which point I’m not clear what the value is. The first day should have been friendly and collaborative, but someone in Jisc sent the full force of their comms team to film and photograph, and this made people uncomfortable. It would have been fine for a ‘see our cool stuff’ show-and-tell event like Dev8D or Digifest, but was inappropriate for talks where the theme is ‘please can you give us any ideas to justify keeping getting paid’. The comms staff were just following orders, and apparently eventually took their shoes off to try to minimise disruption, but ultimately they caused a fair bit of harm, and whoever at Jisc comms dispatched them needs to tone it down for such stressful situations.

Many people had an early night, or went to their room to polish their pitch. This meant there was a bit less of the after-dinner chat than usual, and the people who did socialise fragmented around 9pm, which was a pity.

Today each team pitches why they should get continuation funding. There’s too much information for the audience to absorb, and it’s primarily for the 4 judges, but it takes up the time of 100 other people who are sitting in silence hearing information they mostly heard yesterday.

I really did like it when one of the judges mentioned that he’d been noticing the number of commits on GitHub. That’s a pretty solid metric for projects which are intended to produce infrastructure, not just documents.

Suggestions for the future

Don’t make people go through a second round of Dragons’ Den style justification. Do have deliverables that will get funding cut if not delivered, maybe a video or blog post with the same content, but the timescales are so short that these projects can only be done by permanent staff, and permanent staff already have a day job.

Don’t force everyone to sit in on a day of pitches. Jisc would get more value if the people in the room were talking to each other and getting ideas and feedback.

Don’t record any talk which is designed to get feedback rather than to inform. It is intimidating.

Ensure that people arrive promptly to sessions with a tight turnaround. People coming in late was very disruptive. Give a brief at the start of what the audience are there to do — the room is full of experts who could help, but it would be helpful for the chair to provide a frame of what feedback they could be giving.

Not everybody knows how to project their voice — provide mics.

Don’t ever use the rooms at Imperial we used yesterday. The doors bang and don’t shut automatically, and reaching the second room meant crossing the first room, banging two doors. The floor echoed very badly and the columns made it hard to see and hear.

Sessions with 5 talks in an hour need to be strictly timed, and the chance for discussion is essential; otherwise why did we all come to London rather than just record a YouTube video?

Conclusion

Overall I think the fact that Jisc is mixing it up and trying new approaches is absolutely appropriate and is bound to have mixed results. The aim of this blog post is not to complain or grump, but rather to provide what I hope will be useful feedback.

All in all I feel that Jisc is back on track after the massive cuts and refocus they were forced into, and I am optimistic about their future work.

Posted in Events, Jisc, Research Data.


Building a project legacy

(Patrick McSweeney)

I am at #dataspring sandpit 2 listening to what people have done in their projects so far. A project can achieve a lot over its course, but to get real value for taxpayers’ money it needs other people to use its outputs. This legacy is where good work can become really valuable. One of the key things I have identified at the centre of a lasting legacy is the project website. Most people will find out about your project outputs from your website, so you should make it clear and easy for them to explore.

In the spirit of openness I will review three past projects I have worked on and talk about how I think their websites have influenced or hindered their success. In all these projects I was developing software, but some projects will be generating policy or guidance material. The same principles still apply; I just don’t have any examples of my own to critique.

The Faroes Project

URL: http://blog.lsl.ecs.soton.ac.uk/Faroes

This was the first JISC project I ever worked on. As a team we were not clear what was expected of us and we were struggling to find our feet. This really shows in our website. The website still exists 7 years later, which is one of the few things it has going for it. It has a tiny bit of information on it, but that information is not very useful. It says what we did but does not provide any outputs from those activities. We did a big requirements capture workshop which is documented on the blog, but the requirements we gathered are not. This was a document that existed, and it would have been really easy to attach to the post. The project also produced a running service, languagebox.ac.uk, which is still accessible and working today but completely unlinked on the website. The code for this service is open source but that is not linked either. What we learnt is not really documented here, and it easily could be. It featured in reports and PowerPoint presentations which, for almost no effort, could have been attached. It also spawned a few papers which could have been linked to for very limited effort. The low link count on this site means searching for the Faroes project in Google puts this site on the second page. A well-linked post on another blog gets our top spot.

Legacy: The languagebox.ac.uk site, although you would not know that from the website. The website provides no value that I am really aware of; the project which led to the software has been more or less completely lost.

Lessons

Do:
* Have a website hosted somewhere it will last.
* Upload as much information as you can. Even if it’s not polished, it is better to get it on the web than leave it on your laptop to polish up later (that never happens).
* Link all the external materials from your project website so people can find it.

Don’t:
* Make managing your website into a chore. Do not let perfect be the enemy of good. More medium quality information is much better than almost no high quality information.
* Put too much stock in a domain name. As soon as your domain lapses (and it will) your site can disappear off the web.

The OneShare Project

URL: http://oneshare.ecs.soton.ac.uk/

This was a follow-up project to Faroes which joined up with the EdShare project. The website itself is quite poor. The site was flat HTML (no CMS), so it was difficult to update, meaning it was never really updated. It links out to a few different resources and, most importantly, to the OneShare blog (http://blog.soton.ac.uk/oneshare/). This should have been the project website: it is on WordPress, maintained by Southampton’s IT department, so the software is kept up to date by someone else. It was easy to add content to, and this was a huge boon. We ran a lot of other projects off the back of this blog because it was so easy to use; we even had interns each keep a log of their activities on there. There is a lot of value in the blog posts, and the pages link out to the vital information. I have had several people contact me by email to ask for help installing the EdShare software, so it does get read. The website is a bit of a flop, but the blog was a complete success. The website does turn up on the front page of Google, but if you get there you might miss the good content, which is on the blog.

Legacy: Six repositories containing a substantial set of open educational resources. The blog has a reasonable explanation of our approach and links out to the code we produced. The code has been reused in a number of places by people who were not related to the project.

Lessons

Do:
* Use a tool which allows you and your colleagues to publish information easily (or you might not do it at all)
* Link out to live demonstrators
* Link out to your code
* Show your workings, talk about what you found out and how it informed your development

Don’t:
* Fragment your content any more than you have to (we should have just had a blog)
* Hesitate to run multiple projects from the same blog; just provide aggregations using tags or categories

All About MePrints Project

URL: http://blog.soton.ac.uk/meprints/

This is probably my favourite project website just because of its simplicity. Having learnt our lessons, we produced a really good, simple site with really valuable content. We used University-hosted WordPress again but kept all the content there, having realised a blog is just a website where it is easy to add content. There are clear links out to the code, the documentation and even the little workshop paper we got out of it. Making the software easy to install was a big win which resulted in high uptake. For a short 6 month project it really gives bang for the buck. If I had been wiser when I worked on the Faroes project it would have been a similar success.

Legacy: As well as the seven repositories we installed MePrints on during the project, there have been many installations since, including some high profile ones (e.g. http://eprints.gla.ac.uk/). The way we approached the project is well enough documented that someone might even use it to inform their approach to a project.

Lessons

Do:
* Make any software as easy to install as you can
* Do link to the software documentation
* Write posts with lots of content so someone can repeat your approach

Don’t:
* Think that your project is too small for a website. We wrote 11 blog posts in 6 months and it would have been worth it if we had only done half that

Conclusion

A website really is a showcase of the work that you have done. The value of the material you put on it will directly influence how much your outputs are reused. The combination of the quality of your outputs and how well they are communicated on your website will build a picture of your project’s legacy. Make sure you do the dos and avoid the don’ts, and in 7 years’ time you might have a project legacy which you can be proud of.

Posted in Uncategorized.


Week One – Starting my placement!

My name is Henry Wilkes and I’ve just finished the first full week of my summer placement at iSolutions, the University of Southampton’s IT department. I’m here until the end of the summer when I begin the final year of my MEng course, and until then I will be keeping this blog about my experience.

I applied for this placement through the University’s Careers Service Excel Programme, and in the short time I’ve been working at the office in Guildhall Square I’ve already learned a lot and have really been made to feel like a valued member of my department. I’m working as part of iSolutions’ technical innovation and development team, making use of a range of web technologies in various projects related to the University’s websites and network infrastructure. This is a field I’ve not previously had experience in, so I’m having to learn a lot of what I’m doing as I go, but I am receiving guidance from members of the team.

My task this week has been to update a set of HTML templates used for several Student Services websites to a more up-to-date look and feel, and in particular to make use of the Bootstrap framework. Bootstrap provides a library of CSS which defines a consistent look for HTML pages, with an emphasis on providing a good experience on a range of mobile devices as well as desktops. The task involved replacing an older library of CSS files written within the department with Bootstrap, and adapting the relevant HTML templates to make use of it.

As a starting point I used a template which was already written to use Bootstrap, and implemented the various features present in the original HTML templates being replaced. In order to test my changes I set up a virtual host running on Apache, which allowed me to view the template’s test pages in Firefox. Here I could use the Firebug plugin to identify the cause of specific issues with the HTML and styles. By the afternoon I was able to complete this task fully, having produced a template which can be applied to web projects.

An example of a page using the new template

That was my first week at iSolutions, and in that week I feel I’ve managed to settle in well in the department and make good progress on my personal goals to expand my range of technology skills. I’ve managed to produce something I’m proud of and I’m looking forward to moving onto new projects in the coming weeks.

Bootstrap is available at

http://getbootstrap.com/

Firebug is available at

http://getfirebug.com/

Posted in Best Practice, Open Source, Team, Training.


Character set hacking

This little note is mainly for my own benefit. Dealing with character sets can be a bit of a headache. If you didn’t understand that statement then begin by reading Joel on Software’s beginner’s guide right now! You’re going to need to know all this one day. All the programming languages I have ever used have UTF-8 compatibility: Java is probably the best and Python 2 is utterly shocking. This is just one of the reasons Python is not, as some people have claimed, “the perfect language”.

Character encoding problems mainly fall into two buckets:

  1. You didn’t know what character encoding you received (or worse you were told what it was but that was a lie)
  2. You are saying your output is in a different character set than it actually is.

Usually option 1 results in option 2 because you take input from the user, store it incorrectly and then output it incorrectly. It doesn’t take long to get in a real mess.

There is a command line tool called iconv which can be very helpful for solving these problems. Most languages have an iconv library which you can make use of. Most people’s go-to solution for this problem is to clean up the data into UTF-8 on input, store it as UTF-8 and then (you guessed it) output it as UTF-8.

You can get whatever crap the user sent you into UTF-8 using iconv. The first thing you probably need to do is detect the character set as best you can. In PHP that looks like this:

$current_encoding = mb_detect_encoding($text, 'auto');
$utf8_text = iconv($current_encoding, 'UTF-8', $text);
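Python’s standard library has no equivalent of mb_detect_encoding, but the same best-effort idea can be sketched as a chain of trial decodes. This is my sketch, not anything from the PHP above; the candidate list is a guess suited to Western European input, and latin-1 sits last because every byte sequence decodes under it, so it is a backstop rather than real detection:

```python
def best_effort_utf8(data: bytes) -> str:
    """Decode bytes of unknown encoding into a Python string.

    Tries the strictest candidate first; latin-1 always succeeds,
    so it acts as a last resort rather than genuine detection.
    """
    for encoding in ("utf-8", "windows-1252", "latin-1"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue

print(best_effort_utf8("café".encode("utf-8")))    # café
print(best_effort_utf8("café".encode("latin-1")))  # café
```

Like mb_detect_encoding with ‘auto’, this can guess wrong on short or ambiguous input, which is exactly why it is better to know the real encoding in the first place.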

There will be some situations where the person getting data from you won’t be able to accept UTF-8. This is a superb headache if you are storing all your data as UTF-8, because you will more than likely have characters they are unable to read. iconv to the rescue again: it can convert into smaller character sets. You probably don’t want to throw away the data that doesn’t fit in the output character set, so use “transliterate” to substitute characters in your data for a near alternative. This means nonsense like right-quote and left-quote get turned into ' (a plain quote), with other similar substitutions. It’s not perfect, but it’s better than just throwing letters away in the middle of words for all the Áá Ắắ Ấấ Ǻǻ Ćć Ḉḉ Éé Ếế Ǵǵ Íí Ḯḯ Ḱḱ Ĺĺ Ḿḿ Ńń of the world.

If the person on the other end doesn’t know what they actually need (frustratingly common) then you probably want to give them ascii. In PHP that looks like this:

$ascii_output = iconv('UTF-8', 'ASCII//TRANSLIT', $output);

For more information on the PHP iconv library, see the docs at http://php.net/manual/en/function.iconv.php
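As an aside, Python has no built-in //TRANSLIT, but a rough stand-in (my own sketch, nothing to do with the PHP above) is to decompose accented characters with NFKD and drop whatever won’t fit in ASCII. Unlike iconv’s transliterate it silently discards characters that have no decomposition, such as curly quotes, so it is a cruder tool:

```python
import unicodedata

def ascii_ish(text: str) -> str:
    # NFKD splits 'é' into 'e' plus a combining accent; encoding to
    # ASCII with errors="ignore" then drops the accents, and anything
    # else that has no ASCII decomposition vanishes entirely
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(ascii_ish("Ćafé"))  # Cafe
```

If you need proper transliteration of punctuation and non-Latin scripts in Python, you really do want a binding to iconv or a dedicated library rather than this trick.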

Posted in Best Practice, PHP, Programming.
