

Week 4 – Choices goes live once more!

Skills demonstrated
– Unit testing
– Manual testing
– Bash scripting
– User Consultation
– Crazy Golf

My hope of a choices-free week was delusional. Choices had to go live last week, which meant more testing.

These tests were different from my usual unit testing: they were full system tests. Usually a developer would use a program called Selenium to test a system as a whole. It simulates keystrokes and mouse input, and it can be creepy to witness, or so I hear. I wouldn’t know, though; there was no time to implement Selenium tests. Instead I spent Monday, Tuesday and part of Wednesday doing the tests by hand. In truth, the tests themselves probably only took a few hours; the rest of those two and a half days went on fixing the endless torrent of errors. Who would’ve thought that an innocent action such as opening a form could blow up in my face? I can accept that none of the allocation controllers worked first time; they are practically magic. But opening a form? That is just unfair. As is the way with choices.
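For anyone curious what a Selenium test would have looked like, here is a rough PHP sketch using the php-webdriver bindings. The Selenium server address, URL and link text are all invented for illustration; this is not how choices was actually tested.

<?php
// Hypothetical browser-driven system test; names and URLs are made up.
require 'vendor/autoload.php';

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\WebDriverBy;

// Connect to a running Selenium server and open the page under test.
$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::chrome());
$driver->get('https://choices.example.ac.uk/allocations');

// Simulate the user opening a form, then check the page didn't blow up.
$driver->findElement(WebDriverBy::linkText('Open form'))->click();
assert(strpos($driver->getPageSource(), 'Fatal error') === false);

$driver->quit();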

Having managed to corral choices into its holding pen once more, we celebrated with a round of TIDT golf. I didn’t want to show up my colleagues too much, so I kept it close and managed to pull away with a 4-point lead. We will return for the last 9 holes; the suspense is killing me.

Next on the agenda was Docpot and its migration. Docpot is a vestigial file share which needs operating on. It’s only used by a handful of ECS staff, and it’s my job to merge the remnants into a newer, pre-existing share. The implementation is simple enough; keeping the users of Docpot happy is the tricky part. With some shell scripting magic I was able to list all the files modified this year, along with who modified them. Then more shell wizardry converted those names into email addresses to paste into Outlook. You can see where this is going. It turns out that only one directory needs preserving. More developments on Docpot in next week’s blog post.

I’m now a third of the way through my internship. Having made it this far, I can say I’ve enjoyed every day so far, despite the steep difficulty curve. I curse more at my work monitor than I do at a game of Rocket League, but I’ve learnt more in these four weeks than in any other four weeks in a long time.

Posted in Apache, Database, HTTP, Javascript, PHP.



HESA Open Data Consultation: University of Southampton response

This response has been submitted to HESA in response to their consultation on open data.

This is a temporary location; the text will be moved to our consultation responses site in the next few days, and I’ll put a link here instead.

Question 1

  1. Do you support HESA’s aim to make as much of our core data as possible available as open data?
  2. Please explain your answer

In general we support the aim to make data as open as possible, provided the data is drawn only from the University’s signed-off HESA returns, i.e. data that has been through rigorous quality control.

Question 2

  1. Do you agree with HESA’s assessment of its data sources regarding suitability to publish as open data (Annex A)?
  2. If not please elaborate on any areas in which you disagree

Yes, we agree with HESA’s assessment of the suitability of the data sources to publish as open data. It must not be possible to identify individual students or staff members from information published. It is vital that there is consultation and collaboration with HE institutions on the content of the data subsets released.

Question 3

Do you feel that the list of open data resources to be published in Annex B is comprehensive, or do you feel there are any other types of open data publication HESA should be planning?

Yes, we feel the list published in Annex B is comprehensive.

For linked-data purposes it would be very useful to provide mapping tables (“linksets”) from ID schemes used by HESA data to other common ID schemes. http://learning-provider.data.ac.uk/ currently provides a number of these “linksets” which could be polished and expanded. These reduce the costs to data-consumers wanting to join HESA data with other datasets.

Question 4

  1. Do you agree that it is important for HESA to publish meta-data as open data in addition to the data sets?
  2. What benefits will this deliver for users?

Yes, we feel the meta-data should be published—without this, data users will struggle to understand and analyse the data.  We feel that the HESA data model meta-data should be released in the same time frame as the data sets—without this information it is likely that data users who are not familiar with HE may misinterpret the data and reach incorrect conclusions, generating unnecessary queries and work for HE institutions.

Question 5

  1. Do you feel that HESA’s aims on ODI certification are pitched at an appropriate level of ambition?
  2. If not please elaborate on the reasons for your answer

A cautious ‘yes’. There is a risk of spending a limited budget on less important aspects simply in order to achieve certification.

Question 6

  1. Do you agree that Creative Commons Attribution 4.0 is the most appropriate open data licence for HESA to use?
  2. Please explain your answer

Yes, but for some datasets it may be desirable to make them even more open, e.g. not to require attribution at all. This should be considered for any data likely to be combined with data from dozens of sources, or for the simplest (non-statistical) parts of datasets where attribution could impede use: for example, simple lists of IDs and labels, or linksets.

Question 7

Do you have any advice for HESA in establishing communications channels to open data communities and users?

HESA have very strong ties with HE institution administration teams, but less so with academics, students and developers. Attempts should be made to promote the existence and value of the newly opened data directly to researchers and developers (who may be able to provide surprising new uses), to make it easy for such people to report new applications back to HESA, and always to be asking “why aren’t you using our data?”. In our experience, it is often hard for an open data producer to see where small amounts of investment could remove an impediment of which they are not aware.

Question 8

  1. Do you think the list of proposed actions is appropriate and comprehensive?
  2. If not, are there other elements which should be considered?

In addition, we would encourage HESA to consider reviewing http://learning-provider.data.ac.uk/ and http://opd.data.ac.uk/; these are very low-cost sites that we have produced for the benefit of the UK HE community. It may be appropriate for HESA to lift some of these ideas or build upon the work.

Question 9

Do you have any other general or specific comments about HESA’s proposed approach to open data?

The area where we have the most concern is the availability of data without any supporting context to aid interpretation. Raw data can lead to wrong assumptions being drawn, and comparability between one year and the next for an institution, and between institutions within the sector, can be difficult. For example, the introduction of new accounting standards (FRS 102) from the 2015/16 year onwards has introduced significantly more volatility into a university’s financial results, which makes comparisons between years and with other universities difficult without a supporting narrative that explains the key factors affecting those figures. This applies to all data sets. We think that HESA needs to ensure that the project addresses how to help users of this data draw appropriate conclusions.

We are also concerned about how the data may be used commercially—at the moment consultants who market to us do not have access to our data; providing open access may generate a flood of firms writing to us to sell services and systems. A concern has also been raised that if data is going to be open it may impact the completion of the optional elements of returns; not completing these aspects of the returns may become a way to prevent the data becoming widely available.

We encourage HESA to make a clear plan for preservation and long-term access.

The University of Southampton has been very active in exploring the benefits of open data approaches for the UK HE sector. We have a comprehensive open data service covering many aspects of the university infrastructure (http://data.southampton.ac.uk). We founded data.ac.uk as a deliberately generic place to host data and linked-data identifiers (URIs) so that they could remain unchanged even if the hosting or sponsoring organisation changed, renamed or rebranded.

This policy could engage new and less-experienced software developers and other consumers, but it should not be assumed that they have the same cultural background and training as those who have historically consumed HESA data. Clear guidance and easily noticed warnings will be required.

One unusual aspect for the Open Data community is that staff at universities may be subject to additional professional restrictions about how they can publish data from HESA, even if it has a CC-BY licence. This will need to be communicated clearly so nobody unwittingly breaks rules.

It is desirable to make datasets from multiple years compatible so that an investment in tools and services based on one year’s data gives value for several years. Changes to the data structure of datasets are inevitable but effort should be made to design dataset structures that can be extended in future years while still being compatible with tools developed to work with earlier years.

HESA is one of the pillars of data in UK HE. As such, HESA should work with the other data services in the sector to align identifiers as much as possible. This provides two important benefits. The first is the ability to join other datasets to the HESA data without expensive mapping exercises. The other is to provide an information infrastructure that other organisations can use for their own datasets.

All field definitions, terms and identity schemes used in the datasets and metadata should be available for other people to view in their entirety and to reuse under a licence at least as open as, and compatible with, that of the dataset. Where possible, international schemes should be used, or mappings provided, to reduce the costs of comparing and combining datasets from other providers, both domestic and international. Where possible and relevant, data terms and classes should use established data vocabularies such as the Organization Ontology (https://www.w3.org/TR/vocab-org/). A listing of vocabularies in popular use in open data can be found at http://prefix.cc/popular/all.

Much of the data HESA publishes needs to go through strict quality control and will probably only be published annually. There is other data which can be “self-certified” by an organisation, for example its current undergraduate admissions page or email address. This information may change out of sync with the annual publication process. We have successfully built a system to harvest such self-certified information and datasets (http://opd.data.ac.uk/) and would encourage HESA to consider this as a route for keeping up to date with self-certified data. This route is also lower cost, as it doesn’t require a constant formal relationship with every data provider (although HESA will have one anyhow).

A simple but powerful example is this dataset: http://opd.data.ac.uk/dataset/linkingyou, which is built nightly from open data from 32 HE organisations. This data may be valuable to HESA and its users, but can be “self-certified”, unlike statistical information, which requires quality control before publication.

Information on the process for correcting issues and errors should be included in the metadata for a dataset.

We would be interested in working further with HESA on this project.


Posted in Open Data, Uncategorized.


Open Data Internship: Open Data Pipelines

I’ve spent this week reorganising the folders in My Documents. “But Callum!” you might say, “Isn’t that a complete waste of a week?”. Perhaps for some. In reality, I’ve been working towards creating an Open Data Pipeline.

Open Data Pipeline is a term I just created, and I think it refers to something like this:

A diagram of the Open Data Pipeline.

In this post, I’m going to outline the pipeline I’ve created so far and the lessons I’ve learned in making it.

The pipeline so far

So far, the pipeline has four key stages:

Gathering – This is collecting the raw data using some system, which could be as simple as pen and paper. In my case, I’m using a homegrown tool called OpenGather: a web application designed for gathering categorised open data. You can input data, record GPS locations and send the data for remote storage. The remote database then exports a CSV file.

Storage – This is the long-term storage for the data. Currently, I’m using an Excel (or Calc) workbook for text and numeric data, with one sheet for data gathered by the tool and one for data I’ve had to enter by hand. A folder of images is kept for each category, for example “Buildings” or “Portals”. Long term, the folder hierarchy and exact data format still need work.

Processing – This is taking the stored data and converting it into a format ready for publishing. To process the output from OpenGather, I use another homegrown tool. As yet unnamed, it reformats the data as a CSV file, marking any missing data for completion. Optionally, it attempts to use timestamps to match entries with the images taken on the camera (a rough sketch of this idea follows below).

Publishing – This is the act of making the data available to the public. To do this, I hand Ash a USB stick with the data on it. Occasionally I have to copy CSV data into a Google Document. The rest of the open data service takes over from here!

One of the Excel sheets used for long-term storage.
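To make the processing step more concrete, here is a rough PHP sketch of the timestamp-matching idea mentioned above. The actual tool isn’t shown in this post, so the field names and the two-minute tolerance are assumptions.

<?php
// Rough sketch (not the real tool): pair each gathered record with the photo
// whose file modification time is closest, within a tolerance.
function matchPhotos(array $records, array $photoFiles, $toleranceSeconds = 120) {
    $matches = array();
    foreach ($records as $record) {
        $recordTime = strtotime($record['timestamp']); // e.g. "2016-07-14 10:32:05"
        $best = null;
        $bestGap = PHP_INT_MAX;
        foreach ($photoFiles as $file) {
            $gap = abs(filemtime($file) - $recordTime);
            if ($gap < $bestGap) {
                $best = $file;
                $bestGap = $gap;
            }
        }
        // Leave the entry blank for manual completion if no photo is close enough.
        $matches[$record['tag']] = ($bestGap <= $toleranceSeconds) ? $best : null;
    }
    return $matches;
}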

With that said, here are some of the things to do and avoid when building a pipeline:

Things to do

  • Use an appropriate set of input fields for each category – I originally had the fields “Timestamp, Tag, Category, Latitude, Longitude, Accuracy” for everything. I found that for some objects, such as Images, I could throw away the geo data. For others, I needed extra data I later had to enter by hand.
  • Be consistent in your data gathering process – For example, knowing that an image is always taken after the data is entered is extremely useful. It can be used to later infer information you’ve forgotten, lost or never had.
  • Keep a backup of your data – You never know when your tools will delete or corrupt an important file. Excel did this to me more than once!
  • Be thorough in gathering your data – Gathering too much data and throwing it away is far easier than needing it and not having it.
  • Challenge your assumptions and provide for corner cases – I guarantee making assumptions about the properties of a data type will come back to bite you.
  • Get some data up – Even if it’s just one entry, it’s a great feeling getting it hosted for the world to see.
  • Process the data as a single, large batch – Removing the need to repeatedly process different chunks of data will save time in the long run.

Things to avoid

  • Taking photos in many formats – Some cameras take both RAW and JPG. This makes storing the files and matching images to data entries that much harder. Use a single image format and convert it as you need it.
  • Directly using Geolocation data – For most types of object, I’ve found GPS accuracy to be too low (6m radius at best). I used a clickable map to get accurate data, using the GPS to roughly centre it. If nothing else, precise data adds a level of professionalism.
  • Using Excel for CSV files – If you do, format it all as “text”. Otherwise, Excel is fond of re-formatting your data to be less accurate when you save it back to CSV.

So where am I going from here?

My upcoming ideas for the OpenGather tool involve:

  • Using different input fields for each data type. This should make processing the data more accurate.
  • Providing the option to submit data to iSolutions via Serviceline. This should allow each contribution from students to be reviewed.

In the longer term, I’m looking to:

  • Start work on an Open Data link validator. This tool will detect broken URIs and URLs, flagging them for correction.
  • Start building maps of Union facilities ready for use during bunfight.

Posted in Geo, Open Data, Open Source.



Minecraft Archaeology day.

This weekend I helped out at the University of Southampton Archaeology department family day. There were lots of hands-on activities, such as reconstructing a horse skeleton and melting copper in a firepit. I was, of course, up in the (air-conditioned) computer room playing Minecraft.

More accurately, I was running a row of computers with a variety of Archaeology-related Minecraft maps.


All of the maps used for the event can be freely downloaded online:

Dig Site

Not a very professional excavation.

By far the most engaging thing was something I created at the last minute. I took a nice model of a Roman villa and replaced all the “air” blocks with dirt up to a height just above the roof. I then made the top look like a normal Minecraft field with flowers and trees, but with a tell-tale ridge in the dirt showing the top of the building.

The player then gets a box of spades to dig out the site. It’s a very naive approach, but in Minecraft creations are almost always sitting right there on the surface; having to uncover one interactively was quite a novel idea, and it really engaged the younger visitors. It made my day when the first child to play it got a real thrill from discovering the edge of something just under the field.

Portus

This model was based on much earlier information about Portus. It's OK for a first impression but needs much work.


We also provided a fairly out-of-date model of the Roman port of Portus for the visitors to explore. This was much less engaging, as there wasn’t enough interaction. It needs a bit more to pull you in: maybe a printed worksheet to put the interesting bits you are exploring into context, something that makes it more interactive rather than just a thing to walk around.

Contemporary LIDAR models

I’ve done a lot of work creating a tool that combines LIDAR and OpenStreetMap data into Minecraft maps. These are good at showing some of the techniques used in archaeology, but showing modern cities like London and Southampton wasn’t really in the spirit of the day, so I created two more appropriate maps: Stonehenge and Avebury.

Avebury from open data sources.

The problem with Stonehenge is that it’s a pretty boring landscape for the most part, though the LIDAR does show some of the nearby earthworks quite well. The child who interacted the most was a girl who switched to creative mode and had fun “improving” the world heritage site.

The Avebury model is a bit more interesting but it only really clicked with the parents, not the children. I got a nice result by running the model through Chunky, the 3D renderer for Minecraft.

Further work

Some possible ideas came to me as a result of this event.

  1. Create a Minecraft mod that adds interesting archaeological features. There are already some that add “ruins”, but this would be intended to leave things mostly buried so that they could sometimes be found by accident when digging.
  2. Make a Minecraft map that can be used by a school as part of teaching archaeology. The map would seem like a vanilla Minecraft world but would have interesting and reasonably accurate things to excavate. We could simulate “ground-penetrating radar” with a website where you submit your world coordinates and it gives you back a fuzzy picture. I think running this as an open server would be a mistake, as only one person gets each ‘discovery’; running a server for a small group makes more sense.
  3. Look into using TerraFirmaCraft in outreach. This is a mod with far more realistic crafting: ores are found in veins and small amounts can be collected from the ground or in streams, tools are made by chipping stones, there are fire pits and so forth. The problem is that it’s quite hard for younger children, and I suspect that anything requiring a departure from “vanilla” Minecraft will put off most teachers, who don’t have the slack to learn to install mods for a one-off thing.
  4. Write a tool which takes a Minecraft world and ages it.
    1. Remove plant life
    2. Repeat a few times
      1. Randomly collapse unsupported structures
      2. Add a (random?) layer of dirt or sand, but make it “fall” off edges so that large things stick out of mounds and small things end up inside them.
    3. Re-add grass and trees to the new top layer.

Overall

I really enjoyed my day as an honorary archaeologist. Compared to the computer science events I’ve been involved in, this one was less well organised, but showed significantly better taste in the beer provided for the wind-down afterwards. When it comes to pitching in and love of the subject, archaeology and computer science are about equal; but if I ever have to choose who makes the campfire, it’s going to be the archaeologist.

Posted in Minecraft, Outreach.


Week 3 – Composing the Merge

Demonstrated Skills:
– Composer
– Documentation Writing (Trial and Error…)
– GitHub

I think that Monday morning will become the regular slot for the weekly blog post. Once again, Friday afternoon consisted of testing, not blogging.

So week 3 was another week of choices, and I hope the last for a while. Same procedure as before: look at the Kanban board, pick a task and try to fix or implement it. There was one major change which I assisted with. Cake 2.4, which we were using before, was ancient, so Oli Bills, a fellow temporary colleague, installed Composer on choices. Composer is a dependency manager for PHP; it makes keeping a system up to date easier and should mitigate the incompatibility issues we had with PHPUnit. At least, that is the idea. Oli did most of the legwork behind Composer; I acted as his test subject. Let me elaborate: he needed to write some instructions so the rest of the team could install Composer on their machines. He told me to install it on my machine using his initial instructions, and he amended them every time something blew up in my face. After a long afternoon we got Composer working on my machine, and Oli’s instructions were much more concise. This week (as in the week starting today, not the week I am currently writing about) is Oli’s last week in the office, so the instructions are quite important.
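For context, the usual shape of this change (a sketch of the general Composer-plus-CakePHP-2 pattern, not a copy of Oli’s actual instructions) is to declare dependencies such as PHPUnit in composer.json, run composer install, and then pull Composer’s autoloader into the Cake bootstrap. The Vendor path below is an assumption about the project layout.

// In app/Config/bootstrap.php (sketch only): requiring the generated
// autoloader makes every package declared in composer.json -- PHPUnit,
// plugins and so on -- available throughout the CakePHP application.
require APP . 'Vendor' . DS . 'autoload.php';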

I spent most of Tuesday making sure that the rest of the source files were in the right directories. As well as installing Composer, Oli reorganised the project files so that the Cake framework sat in a ‘src’ directory. On the surface the reorganisation appeared to work, but when I started a new task I noticed that none of the plugins worked any more. There was a single line in the Cake bootstrap which still referred to the old plugins directory. That was fun to find. I added the ‘_APPDIR_’ constant so that any future directory changes should just work.
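The fix is roughly the following; this is a sketch of the idea rather than the real choices bootstrap, and the old hard-coded path is invented.

// app/Config/bootstrap.php -- sketch only, not the actual choices code.
// Before: a hard-coded plugin path that broke when the app moved under src/.
// App::build(array('Plugin' => array('/var/www/choices/plugins/')));

// After: build the path from a constant so future directory moves just work.
App::build(array('Plugin' => array(APP . 'Plugin' . DS)));
CakePlugin::loadAll();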

After fixing plugins, I modified the choice priority plugin. When a user deleted an existing choice, its priority remained behind in a separate table, which broke random allocation for that user’s choices. I updated the priorities controller to include a function which deletes the choice priority and reorganises the remaining priorities so that they are sequential again, and I made sure the choice controller invokes it when a user deletes a choice. The function appeared to work, but appearing to work is not enough, so I spent most of Thursday writing PHPUnit tests. I’m becoming quite well acquainted with PHPUnit.
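In CakePHP 2 terms, the new function looks roughly like the sketch below. The model and field names are my guesses for illustration, not the real choices schema.

// Sketch of a ChoicePriority model method (hypothetical names).
public function removeForChoice($choiceId) {
    $row = $this->find('first', array(
        'conditions' => array('ChoicePriority.choice_id' => $choiceId),
    ));
    if (!$row) {
        return true; // nothing to clean up
    }
    $userId = $row['ChoicePriority']['user_id'];
    $this->delete($row['ChoicePriority']['id']);

    // Re-number the user's remaining priorities so they are sequential again,
    // otherwise random allocation trips over the gap.
    $remaining = $this->find('all', array(
        'conditions' => array('ChoicePriority.user_id' => $userId),
        'order' => array('ChoicePriority.priority' => 'ASC'),
    ));
    $rank = 1;
    foreach ($remaining as $item) {
        $this->id = $item['ChoicePriority']['id'];
        $this->saveField('priority', $rank++);
    }
    return true;
}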

Friday was merge day. There were about five branches which needed merging into the Composer branch, but by the end of the day all branches had converged into an up-to-date master branch. How satisfying.

I am scheduled to spend week four doing something called “DocPot Migration”. I’m not too sure what a “DocPot” is, or why it would need to fly south for the winter. Perhaps I should investigate.

Posted in Apache, Best Practice, Database, Javascript, PHP, Programming, testing.


Open Data Internship: The Winchester Expedition

Friday saw the first real-world test of the data gathering tool. Surprisingly, it didn’t crash, erase any of the gathered data or otherwise combust spectacularly. I’m almost disappointed.

Instead, Chris and I spent a few hours wandering campus in the glorious sunshine. We took photos of buildings and doors, using the tool to log their locations. In theory, the timestamps produced by the tool will help match tags to photos. We’ll have to see about that one; check back next week!

Tomorrow marks the start of the serious data gathering. I’m mounting a one-man expedition to the wilds of Winchester, to pay a visit to the School of Arts. Whilst there, I’m hoping to discover:

  • The names they call their buildings. Supposedly, they’re different to the names on record. Mysterious!
  • The locations of any water fountains. A rare catch, these need documenting for further study.
  • Lecture rooms available, and which buildings they’re in. After all, nobody knows what’s really inside the Winchester School of Arts buildings.
  • Showers available for use by cyclists. There’s a surprising number of these hidden away in the ECS buildings in Highfield.

This is in addition to the usual building and doors data.

The plan is to start on one side of the campus and do a first pass through the building interiors. This is because my guide only has a limited amount of time. I need his valuable door access to delve deep into the bowels of WSA.

The outsides of the buildings come next: recording portal data and building images. I’ll follow this up with an excursion to Erasmus Park, the local halls.

Who knows, maybe I’ll grab an ice cream along the way too?


Warning: Technical description ahead.

This is the full process I’m using to gather data, as of this post:

Preparation

  • Clear any testing data from the data gathering tool database.
  • Apply the geo locations of the items to be investigated to a map.
    • Generate a KML/CSV/GeoJSON file of the items.
    • Host the file in a publicly accessible location. I prefer Git; Google Drive or an online paste tool like Pastey also work.
    • Using a mapping tool such as umap (http://umap.openstreetmap.fr/en/), add a layer, then either import the data or add the hosted file as a remote data source. (When using umap, tick “Use Proxy” to ensure the icons load correctly.)
  • Screenshot and print off the map.
  • Print off open data consent forms
  • Make sure your phone and camera have adequate amounts of battery (ideally full).

Overall Process

  1. Pick a location on the map and decide which buildings to gather data from.
  2. For each building, gather the data needed, using the instructions below.

Taking a Building Image

  1. Take a picture of the building, attempting to get as much of the building in frame as possible.
    • A good photo will make the building easily identifiable as you walk past it.
  2. Using the open data tool, select the category “Image” and write a tag in the tool.
    • Wait for the GPS to update to the current location.
    • If the accuracy is low (say, less precise [higher] than 6m), click/touch the map to mark a more accurate position.

The geo-location data isn’t necessary for buildings that are already marked on the map, but it helps automatically match images to names later on.

Gathering Portal Data

  1. Walk around the building and try to identify all entrances that aren’t fire escapes (which we aren’t permitted to record, as of 14/07/2016).
  2. For each entrance, take a picture identifying it. A good photo will make the entrance easily identifiable as you walk past.
  3. Follow the procedure for getting consent, if any people are in your photo (an ideal photo has no people).
  4. Use the data gathering tool to mark the location of the entrance on the map. Try to get as close as possible to where you think the entrance is on the map.
  5. Select the “Portal” category in the tool.
  6. Add a tag, starting with the building ID, followed by the type of entrance. For example, “32 Main” or “32 Main North” or “32 Rear”.
  7. Submit the data.

Requesting Consent
Attempt to get nobody in the shot, unless you’re taking pictures of a reception or Point of Service stand, where behind-the-counter staff can make it look friendlier.

If people need to be in the shot:

  1. Verbally ask permission before taking the picture, explaining that you represent the Open Data Service, and what that is. Ensure they’re okay signing a consent form.
  2. Take the photo.
  3. Ask them to fill in an entry on the consent form.

Cross buildings off as you go, to mark them as completed.

Posted in Community, Open Data.



Week 2 – The Birth of a new Controller and Unit Testing

Demonstrated Skills:
– Testing With PHPUnit
– Controller Generation with Cake Bake (Cake PHP)
– Controllers the old-fashioned way
– Data Validation
– Caffeine Stimulation

So it’s Friday of week two, and it is time for another blog post. Okay, it’s not actually Friday, it’s Monday morning; I spent Friday writing PHPUnit tests. For the sake of consistency, I will explain Friday after I’ve explained the other four days.

So on Monday I began by drinking some coffee, then returned to the Kanban board for more tasks. Unfortunately for me, only difficult tasks remained: I had completed most of the easy ones. After a second early-morning coffee I was ready to commit to a task. In week 1, while working on the failed-user feedback, I discovered the lookup system. The idea is that when you type usernames into the list, autocomplete speeds things up. The previous implementation did work, but it was not an adequate solution: the lookup functionality was not in a controller of its own.

Lookup was implemented across three separate source files, each bolted on where necessary. It was sloppy. So I had to make a new controller which communicates with the alpha table, a nightly snapshot of all the users in the university. It’s faster to query this table as it is non-volatile, which makes it ideal for autocomplete. The three old files became the actions of the new controller: ‘lookup.php’, which looked up users from an identifier; ‘lookup2.php’, which looked up users from a course code; and ‘lookup3.php’, which looked up courses from an identifier.
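Structurally, the new controller looks something like the sketch below. The model, field and action names here are illustrative guesses, not the real choices code.

<?php
// Sketch of consolidating the three lookup scripts into one CakePHP 2
// controller backed by the nightly "alpha" snapshot. Names are assumptions.
class LookupsController extends AppController {

    public $uses = array('Alpha');
    public $components = array('RequestHandler');

    // Was lookup.php: autocomplete users from an identifier fragment.
    public function users($term = '') {
        $results = $this->Alpha->find('list', array(
            'conditions' => array('Alpha.username LIKE' => $term . '%'),
            'fields' => array('Alpha.username', 'Alpha.name'),
            'limit' => 10,
        ));
        $this->set('results', $results);
        $this->set('_serialize', array('results')); // return JSON for the autocomplete widget
    }

    // Was lookup2.php: users enrolled on a given course code.
    public function usersByCourse($courseCode = '') {
        $results = $this->Alpha->find('list', array(
            'conditions' => array('Alpha.course_code' => $courseCode),
            'fields' => array('Alpha.username', 'Alpha.name'),
        ));
        $this->set(compact('results'));
        $this->set('_serialize', array('results'));
    }

    // Was lookup3.php: courses matching an identifier fragment.
    public function courses($term = '') {
        $results = $this->Alpha->find('list', array(
            'conditions' => array('Alpha.course_code LIKE' => $term . '%'),
            'fields' => array('Alpha.course_code', 'Alpha.course_name'),
        ));
        $this->set(compact('results'));
        $this->set('_serialize', array('results'));
    }
}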

After a lot of headaches, and some help, the lookup controller worked. The new controller needed testing, which is where PHPUnit comes in. PHPUnit does what it says on the tin: unit tests for PHP. This was where the real fun began. Getting PHPUnit to behave on my workstation was a real son of a glitch (see what I did there?). A bit of advice: make sure your version of the testing software is compatible with the pre-existing tests already on the system.

Tuesday and Wednesday went along the same sort of vein. Coffee -> Confusion -> Coffee -> Lunch -> Testing -> Coffee -> Coffee -> More Testing -> Coffee -> Home. This brings me on to Thursday, when I started my first high-priority task. By using a combination of browser history and luck, users could get past the selection page with an invalid number of choices. As users added to their choices, the table containing them updated on the fly, which meant an invalid number of choices could end up in the database. My initial thought was to make a temporary table to store choices as a user makes them, then validate the whole basket at once, but on consultation with Pat he recommended I change how stage validation worked instead. We pair-programmed a solution so that when a user moves on to a new stage, the server validates all previous stages and redirects the user to any invalid one. The state of the existing stages implementation shocked us, so we added “fixing the mess” to the Kanban board, implemented a temporary working solution using a refactored version of the existing code, and then made a joint decision to conclude our evening at the pub.
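The gist of the pair-programmed fix is roughly the sketch below; the method and model names are invented, and this shows the shape of the approach rather than the code we committed.

// Controller-level sketch: before showing a stage, re-validate every earlier
// stage and bounce the user back to the first invalid one.
public function stage($stage) {
    for ($previous = 1; $previous < $stage; $previous++) {
        if (!$this->Selection->stageIsValid($this->Auth->user('id'), $previous)) {
            $this->Session->setFlash('Please complete stage ' . $previous . ' first.');
            return $this->redirect(array('action' => 'stage', $previous));
        }
    }
    // ... render the requested stage as before ...
}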

I spent Friday testing the event actions, which can, for example, send email notifications to users when they complete their choices. It started off easily enough, but then choices threw a screwball my way when I was only expecting a curveball: the version of PHPUnit choices uses cannot mock static methods with the newer helpers. I spent a lot of time trying to write a full unit test for the email notifications; it turned out that a semi-integrated test was the way to go for now. My tests found a few logical errors in the email code which would otherwise have gone unnoticed. There was a lot of functionality hidden in separate source files, and the code was not written by me, which made it more challenging to test.
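For the record, the “semi-integrated” style looks roughly like this: rather than mocking the static mailer, route the message through CakeEmail’s Debug transport and assert on what would have been sent. The class name, addresses and message below are invented for illustration.

<?php
// Sketch of a semi-integrated email test (hypothetical names), using
// CakeEmail's Debug transport so nothing is actually sent.
App::uses('CakeEmail', 'Network/Email');

class CompletionNotificationTest extends CakeTestCase {

    public function testCompletionEmailMentionsChoiceCount() {
        $email = new CakeEmail();
        $email->transport('Debug')
            ->from('choices@example.ac.uk')
            ->to('student@example.ac.uk')
            ->subject('Your choices are complete');

        // With the Debug transport, send() returns the headers and body that
        // would have gone out, which we can assert against.
        $result = $email->send('You have selected 3 of 3 choices.');

        $this->assertContains('3 of 3 choices', $result['message']);
    }
}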

All in all, last week was quite a hectic week, hence me writing the blog on Monday morning. There’s not much work I can do at the minute, as half of our team aren’t here. This week may be less frantic, though that may just be wishful thinking.

Posted in Apache, Database, HTTP, Javascript, PHP, Programming, testing.


Week 1 – PHP Overload

I will try to include a list of skills that each post relates to, in the hope that someone can read my posts without actually having to read them… I tend to ramble on a bit.

Demonstrated Skills:
– CakePHP
– Project Management

Having previously worked for the University, I thought I had a general idea of what my first week would be like. However, this is iSolutions; things are a bit different here. They had me working on the choices system, which is implemented with CakePHP.

A frustrating morning installing apache2 on a VM was immediately followed by a mystifying afternoon trying to sacrifice the correct sequence of farmyard animals required by the CakePHP gods. They returned the favour by granting me the power to add a small line of text underneath an image upload box, stating that the maximum size of an image upload is 2MB. This triumph, although small, is still a worthwhile success. It basically meant that I could move a piece of paper from the backlog part of the Kanban board to the resolved part. Not bad for the first day – did I mention that I’d never had any experience with PHP before this internship?


Choices Kanban board.

PHP is easy enough to pick up; I think my previous experience with C/C++, HTML/CSS and JavaScript helped a lot. But nothing could have prepared me for choices. Choices is the university system in which students are allocated supervisors and vice versa. It’s based on CakePHP, a framework which ““makes building both small and complex systems simpler, easier and, of course, tastier”” – official CakePHP homepage, cakephp.org (I used double quotes: one set for the quote itself, and one for the sarcastic delivery. Let me know if there is a special piece of punctuation for that exact purpose).

I soon moved on to the next piece of paper, which was more complex than the first. A problem was brought to our attention: a user could add a long list of choosers, and a flash message would appear stating that ‘x’ choosers were added, but with no feedback about which choosers were not added. I implemented a method to retrieve the associated errors and list the un-added users. This forced me to delve deeper into the inner workings of CakePHP. With each new piece of paper passed my way, I slowly (and painfully) understood CakePHP slightly more.
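The shape of the fix, in rough CakePHP 2 terms, is below; the model and field names are guesses for illustration, not the actual choices code.

// Sketch of reporting which choosers failed, not just how many were added.
$added = 0;
$failed = array();
foreach ($usernames as $username) {
    $this->Chooser->create();
    if ($this->Chooser->save(array('Chooser' => array('username' => $username)))) {
        $added++;
    } else {
        // validationErrors records why this row was rejected (unknown user, duplicate, ...).
        $failed[$username] = $this->Chooser->validationErrors;
    }
}
$message = $added . ' choosers added.';
if ($failed) {
    $message .= ' Not added: ' . implode(', ', array_keys($failed));
}
$this->Session->setFlash($message);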

I have learnt a lot in this first week of my internship, and I truly enjoy being part of the iSolutions team. Despite the headache of CakePHP, the work is engaging. The satisfaction of moving a piece of paper from one side of the board to the other is akin to completing a challenging boss fight in a video game. I look forward to starting my own project, and also to the obligatory round of mini-golf with the iSolutions crew.

Posted in Apache, Database, HTTP, Javascript, PHP, Programming.


Open Data Internship: The First Week

Last Friday marked the end of the first week of my Open Data Internship. It’s the first time the University of Southampton has had an open data intern, or, to my knowledge, any university for that matter. This puts me in the interesting position of figuring out what it is an open data intern does.

A Little Backstory

I’m Callum, a graduate going on to a PhD at the University of Southampton. I started as a developer about five years ago, around 2011, scripting for a game called Garry’s Mod. I’d never touched a programming language before; for the most part, I learned by studying the work of other people.

I didn’t realise it then, but that was my first introduction to open source. Since then, I’ve come to understand how important open software is to driving innovation. It aids learning and provides a platform for other projects to grow on. Most importantly of all, though, it brings the community together: total strangers working towards a common goal. I think that’s vital in an increasingly insular society. In many ways, I believe open data can do the same thing.

With all that covered, what has my first week been like? Well, the open data service here has been running for around six years, and over that time it has amassed quite a large amount of data. I feel like a new-age explorer, navigating twisting jungles of information. I’ve come across some spectacular datasets I didn’t know existed. My favourite moment so far was discovering building 185. In the middle of the Indian Ocean. In other words, it’s pretty fun.

What have I been up to?

The team has had quite a few ideas on the back-burner for a while, generally things that they haven’t had time to do. These ranged from straightforward data gathering to projects in their own right.

One of the more straightforward tasks was to fill in some of the bigger holes in the data about the University, one of which was in the building data.

The first phase was to identify which buildings were missing data. The tool to do this, without a shadow of a doubt, was the University’s SPARQL endpoint. SPARQL is a language that allows querying of the open data service, much like how databases use SQL. In fact, the syntax is similar. I spent the first part of my week learning SPARQL and creating queries to hunt for the missing data.

Next up, I imported the locations of the buildings missing data into a mapping tool. At that point, I realised the magnitude of the task.

A map showing the spread of buildings missing portal data. Several pins are in Malaysia, while many are in Southampton.

Unfortunately, the department was unable to sanction a photo-gathering trip to Malaysia. The sad result is that I booked my holiday in Cornwall instead.

The difficulty of the task assessed, I developed a cunning plan to gather the data. Naturally, being a Computer Scientist, I dislike the notion of writing on paper. Even more so, I dislike the idea of then copying that data to a database by hand. I thus proposed creating a tool that would allow quick gathering and labelling of data.

This aligned well with the aims of the team. They had been looking for the time to create a tool to crowdsource data from students on campus.

Thus, I set out on a quest to kill two rats with one high-precision stone.

Update: The SPARQL Code

PREFIX soton: <http://id.southampton.ac.uk/ns/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
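# Added note: this query finds buildings that have no foaf:Image depicting
# them, returning a label and coordinates where these are known.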

SELECT ?building ?label ?lat ?long WHERE {
    ?building a soton:UoSBuilding .
    OPTIONAL {
      ?image a foaf:Image ;
      foaf:depicts ?building
    }
    OPTIONAL {
      ?building rdfs:label ?label 
    }
    OPTIONAL {
      ?building geo:lat ?lat .
      ?building geo:long ?long
    }
    FILTER (!BOUND(?image))
} 


Posted in Open Source, Team.



3D Model Escrow

This is a quick note to record an idea I had in response to a conversation over lunch at “Connected Data London”.

The question was around how to enable 3D printing for the construction industry, or more specifically making 3D replacement parts, but the idea works for a lot more use cases.

There are a couple of problems here. First of all, taking a 3D scan of something and using it to make a copy is probably copyright infringement (I am not a lawyer). It may also infringe patents. The other problem is that for more complex shapes there is no incentive for the copyright owner to help you: no business will help you make a cheaper copy of their own product, with the possible exception of celebrity chefs.

So here’s a possible future where everybody wins.

One or more trusted organisations set up a data escrow service. This service holds the 3D print information required to make a copy of the part, with a set of conditions under which it will be made public. It can also hold other design documents, manufacturing instructions and so on.

A manufacturing company pays a small fee to the escrow service to hold the data for each product and to host the data on the web in the event that certain conditions are met. My first thought is that the data should become public if the part is no longer available on the market and isn’t going to be back on the market in short order. This rule applies even if the rights to the part are sold on or the company is purchased, because the copyright holder has given the escrow service the data under a licence which allows them to share it under a public-domain-like licence if the product is no longer available.

This shouldn’t cost very much and gives the first mover a unique selling point. It has the advantage that the company can guarantee that replacement parts will always be possible, either from them or, should they go bust or discontinue that product line, from anybody given the specifications.

As more companies offer this service, it can be added as a preference or even a requirement when choosing suppliers. While the initial question was about the construction industry, it could work for any manufacturing industry.

I like this idea because everybody wins — it doesn’t require anybody to work against their own interests.

Never appeal to a man’s “better nature.” He may not have one. Invoking his “self-interest” gives you more leverage.

– Robert Heinlein, “The Notebooks of Lazarus Long”

Posted in Repositories.