
# Image focal point hint

This is a quick idea that may already exist somewhere. Let me know if “there’s a vocab for that”.

We have a large set of images of our university buildings. There’s a variety of sizes & aspect ratios. Sometimes there’s more than one image of a building.

To render these in the university templates we need to trim them to certain sizes and aspect ratios. What would be useful is if we could store a “hint” of where the most important content is in the picture. For example take this image:

Clearly the most important part of this picture is the relationship between the researcher and the tool. I would say about here:

Which is about 36% from the left and 50% from the bottom. What I’m wondering is whether there are (or should be) some standard terms for indicating the focal point of a picture, e.g.

<http://example.com/pictures/research.jpg> ns:focalPointX "0.36"^^xsd:float .
<http://example.com/pictures/research.jpg> ns:focalPointY "0.5"^^xsd:float .

That way, when our HTML page generation asks for cropped images, it could crop around the hinted point for this picture instead of a default focus (usually the centre point). You can see the results of making a portrait crop of this image using a focal point hint and without:

Without hint:

Using hint:

I think this makes a massive difference and seems like a really useful thing to optionally store with our images and publish as part of the metadata for them in the open data service.
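As a rough sketch of how a renderer might use such a hint, the function below picks the largest crop of a given aspect ratio and slides it as close to the focal point as the image edges allow. The name `focal_crop_box` and its parameters are made up for illustration and are not part of any existing vocabulary or tool. The Y value is treated as a fraction measured from the bottom, as in the example above.

```python
def focal_crop_box(width, height, focal_x, focal_y, target_aspect):
    """Return (left, top, right, bottom) for a crop centred near a focal point.

    focal_x is the fraction from the left edge, focal_y the fraction from the
    bottom edge, and target_aspect is crop width divided by crop height.
    """
    # Largest crop with the requested aspect ratio that fits in the image.
    if width / height > target_aspect:
        crop_h = height
        crop_w = round(height * target_aspect)
    else:
        crop_w = width
        crop_h = round(width / target_aspect)

    # Focal point in pixels, with y converted to "from the top".
    fx = focal_x * width
    fy = (1.0 - focal_y) * height

    # Centre the crop on the focal point, then clamp it inside the image.
    left = min(max(round(fx - crop_w / 2), 0), width - crop_w)
    top = min(max(round(fy - crop_h / 2), 0), height - crop_h)
    return (left, top, left + crop_w, top + crop_h)
```

For a 1000×500 image with the hint above and a 1:2 portrait crop, this gives the box (235, 0, 485, 500), rather than the centred (375, 0, 625, 500) you would get with a default focus of 0.5, 0.5.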

Other useful metatags for images linked in open data would be:

• An indication that this image should be treated as the illustrative image of something.
• For logos: whether the image is better suited to a light or dark background (ideally letting the renderer pick the more suitable variation).
• For background style images, a hint if the image is a light or dark background (so we know how to place contrasting text over the top.)

Does this exist already somewhere out in ontology land? Are there any other useful things we should add?

Posted in Best Practice, RDF.

# Open Data Internship – How to Gather Data: Mark III

This is an updated set of instructions on gathering Open Data in Southampton. The previous versions of these instructions have been deprecated.

This is written assuming you’re using the OpenGather tool. It covers the sort of data we’re looking for and how to gather it.

Objects we’re looking for:

• Drinking water dispensers (Fountains, coolers, etc)
• Gender neutral toilets (toilets a person of any gender could use, e.g. most disabled toilets). These can be found in the “Room” category in OpenGather.
• Portals into buildings and between them (e.g. doors)
• Public showers a cyclist could use
• Reception Desks (And any other Points of Service)
• Images of University buildings (that we don’t already have)

It’s a bit like a game of I spy (or Pokémon Go). The aim is to hunt round, find each of the items above and record info about it. For some of them (Portals, Building Images) we know the data we’re missing. For the others, we have no idea, so a little urban exploration may be required!

For all of these, we’re interested in where they are. For most of the above, this means a building number, floor and geo-location. OpenGather has a clickable map to allow for precise geo-location. Otherwise, http://lemur.ecs.soton.ac.uk/~cjg/clickymap/ is available.

### How to Gather Data

Preparation

• If you’re hosting your own copy of OpenGather, make sure to clear any testing data out first!
• If you’re recording portals and building images, it can be helpful to plot the things you’re looking for on a map. If you’ve retrieved a list of items using SPARQL, you can use the following to plot the items on the map.
• Run a SPARQL query to generate the list of missing items, complete with latitude and longitude (examples to come soon!)
• Generate a KML/CSV/GeoJSON file from the data produced by the SPARQL endpoint.
• Host the KML/CSV/GeoJSON file in a publicly accessible location. I prefer Git, but Google Drive or an online pasting tool like Pastey also works.
• Using uMap (http://umap.openstreetmap.fr/en/) as a mapping tool, add a layer, then either import the data from the remote file or add it as a remote data source.
• (When using uMap, tick “Use Proxy” to ensure the icons load correctly.)
• Print off a copy of the map. uMap doesn’t work very well in mobile browsers.
• Print off University of Southampton photograph consent forms. These are needed before we can use photos that show people’s faces.
• Make sure your phone and camera have adequate amounts of battery (ideally full).
• If your camera and phone are different, check the two clocks show the exact same time. This makes matching images and data far easier.
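For the “generate a GeoJSON file” step, something along these lines should do the job. It’s only a sketch: I’m assuming the SPARQL results have already been read into a list of dictionaries, and the keys `label`, `lat` and `long` are names I’ve picked for the example rather than anything the endpoint guarantees. The example row uses made-up values.

```python
import json


def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of points from result rows."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON positions are [longitude, latitude], in that order.
                "coordinates": [float(row["long"]), float(row["lat"])],
            },
            "properties": {"name": row["label"]},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Made-up example row; real rows would come from the SPARQL results.
    example_rows = [{"label": "Example entrance", "lat": "50.93", "long": "-1.39"}]
    print(json.dumps(rows_to_geojson(example_rows), indent=2))
```

Save the printed output as a .geojson file and host it as described above.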

Overall Process

1. Pick a location on the map and decide which buildings to gather data from.
2. For each building, gather the data needed, using the instructions below.

General Method

This is the quick-and-easy summary of how to record data. More specific information is available below.

1. Open the OpenGather tool in your browser.
2. Select the type of object to record.
3. Fill in the appropriate fields.
4. Submit the data.
5. Take a picture of the object.
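Because the picture is taken straight after the data is submitted, photos can be paired with entries afterwards by comparing timestamps (which is why the clocks need to agree). The sketch below shows one way this could work. The function name, the tuple layout and the five minute window are all assumptions of mine, not something OpenGather provides.

```python
from datetime import datetime, timedelta


def match_photos_to_entries(entries, photos, max_gap=timedelta(minutes=5)):
    """Pair each photo with the latest entry submitted shortly before it.

    entries: list of (entry_id, submitted_at) tuples
    photos:  list of (filename, taken_at) tuples
    Returns a dict mapping filename to entry_id, or None when nothing fits.
    """
    ordered = sorted(entries, key=lambda entry: entry[1])
    matches = {}
    for filename, taken_at in photos:
        best = None
        for entry_id, submitted_at in ordered:
            # Only consider entries submitted before the photo, within the window.
            if submitted_at <= taken_at and taken_at - submitted_at <= max_gap:
                best = entry_id  # later entries replace earlier candidates
        matches[filename] = best
    return matches
```

Photos that come back as None still need matching by hand.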

Taking a Building Image

1. Using the OpenGather tool, select the category “Building Image”.
• Fill in the “Building Number” field.
• Wait for the GPS to update to the current location.
• If the accuracy is poor (say, the reported figure is higher than 6m), click/touch the map to mark a more accurate position.
2. Take a picture of the building, attempting to get as much of the building in frame as possible.
• A good photo will make the building easily identifiable as you walk past it.
3. Send the image(s) to opendata [at] soton.ac.uk.

The geo-location data isn’t necessary for buildings that are already marked on the map, but it helps automatically match images to names later on.

Gathering Portal Data

Walk around the building looking for entrances. Try to identify all entrances that aren’t fire escapes (which we aren’t permitted to gather as of 14/07/2016).

For each entrance that you find:

1. Select the category “Building Entrance” in the OpenGather tool.
2. Record the geo-location of the entrance. This can be done by tapping on the map in the OpenGather tool. Do not rely on the GPS being accurate.
3. Fill in the fields using the tool.
• “Building Number” – Number of the building the entrance is attached to
• “Entrance Label” – An arbitrary letter to identify the entrance. Typically starting from ‘A’.
• “Description” – A brief description of the entrance, such as “Staff”, “Main”, “Side”, “North-east”.
• “Access Method” – Is a card or key needed to get in?
• “Opening Method” – How do you physically open the door? This is used to determine disabled accessibility.
4. Submit the data
5. Take a picture identifying the entrance. This doesn’t need to be recorded separately in the tool. A good photo will make the entrance easily identifiable as you walk past.
6. Follow the procedure for getting consent, if any people are in your photo (an ideal photo has no people).

Recording a Drinking Water Source

Should one of these rare and majestic creatures be spotted:

1. Throw a greatball at it
2. Select the category “Drinking Water Source” in the OpenGather tool
3. Fill in the fields using the tool.
• “Building Number” – Number of the building the water source is in
• “Floor” – The floor the water source is on, level 1 is usually the ground floor.
4. Record geo-location using the map. Zoom in on the building you’re currently in, and try to mark your position in the building by clicking on the map.
5. Submit the data
6. Take a picture of the water source. Ideally, this will make it clear where the water source is located in that part of the building.
7. Follow the procedure for getting consent, if any people are in your photo (an ideal photo has no people).

Recording the location of Public Showers

1. Select the category “Public Showers” in the OpenGather tool
2. Fill in the fields using the tool.
• “Building Number” – Number of the building the showers are in
• “Floor” – The floor the showers are on; level 1 is usually the ground floor.
• “Room Number” – Room number of the shower, if it has one.
3. Record geo-location using the map. Zoom in on the building you’re currently in, and try to mark your position in the building by clicking on the map.
4. Submit the data

Reception Desk (Point of Service)

1. Select the category “Point of Service” in the OpenGather tool
2. Fill in the fields using the tool
• “Description” – What the Point of Service is. For example, “Library Reception Desk” or “Student Services Information Desk”
• “Building Number” – Number of the building the point of service is in (assuming it isn’t a standalone service)
• “Phone” – A phone number for contacting that point of service, if available.
• “Email” – An email address for contacting that point of service, if available.
• “Opening Hours: Mon…etc” – Times when the service is usable. For example, a reception desk is usable when it’s manned, e.g. “9:00-18:00”, “24 hours”, “7am-7pm, closed 12pm-1pm”
3. Record geo-location using the map. Zoom in on the building you’re currently in, and try to mark your position in the building by clicking on the map.
4. Submit the Data
5. Take a picture of the desk. It’s nice to have a friendly receptionist in the photo if possible, but don’t force anyone!
6. If anyone (including any member of staff) is in the picture, follow the procedure for gaining consent.

Requesting Consent

Try to keep people out of the shot, unless you’re taking pictures of a reception or Point of Service stand.

For a Point of Service, staff can improve the photo by making it look friendlier.

The consent form required is available here.

If people need to be in the shot:

1. Verbally ask permission before taking the picture, explaining that you represent the Open Data Service, and what that is. Ensure they’re okay signing a consent form.
2. Take the photo.
3. Ask them to fill in an entry on the consent form.

Cross buildings off as you go, to mark them as completed.

Posted in Data, Open Data, Open Source, Training, Uncategorized.

# An Intern’s WAISfest: Introduction to the BBC Microbit

Over the Wednesday, Thursday and Friday of last week, the WAIS (Web and Internet Science) group held its annual WAISfest. This event is a chance for people in the group to explore side-projects and ideas they haven’t had time to do. The aim of the event is to get people thinking about possible areas of research. To stimulate some extra creativity, so to speak.

Luckily, I got to take part.

Wednesday morning began with the ideas unconference. The aim of this was to source ideas, loosely grouping people together to work on them. Ideas ranged from virtual reality workspaces to ways of teaching programming in schools using Microbits.

It was this latter project I hopped aboard, swayed by their stash of robotic buggies and a mountain of BBC Microbits.

# Investigating the BBC Microbit

At the start of the project, I had no idea what exactly the Microbit was, let alone how to use it. We spent the first day of the WAISfest digging up information on how to use it. Hopefully, by posting this here, it’ll make someone’s life a little easier than ours was!

For those who don’t know, the Microbit is a low-cost embedded board given free to all year 7 students in the UK. It has an accelerometer, magnetometer, radio, GPIO pins and USB on-the-go. It can be programmed in a variety of languages, including Python, JavaScript, Microsoft Block Editor, Microsoft Touch Develop and C++.

Behind the scenes, all of these languages use the same core C++ library, published by Lancaster University. This library provides a simplified means to interact with the hardware.

The source code for the C++ library, the MicroPython runtime and editor, and Touch Develop is available at https://www.microbit.co.uk/open_source.

# Getting Started with the Microbit

How to get started with the Microbit depends, in my opinion, on your level of experience:

• If you’re completely new to programming, try playing around with one of the online editors. They’re well documented, most coming with tutorials. Uploading your program is as simple as copying the file onto the board as if it was a USB drive.
• If you’re a bit more experienced, or want to do a medium-sized project, try using Mu and MicroPython. Writing longer programs (up to a few hundred lines) is pretty straightforward in MicroPython. Mu makes putting the file on the board significantly quicker.
• If you’re an experienced developer looking to do something complex, get the runtime and write using C++. There are issues with using MicroPython for large projects, which I’ll get into in the next section.

We opted for the second approach, as we only had 3 days to produce a result. It was also essential that kids and teachers could understand the code we were writing.

# MicroPython for Microbit

“MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a small subset of the Python standard library” – MicroPython Website

MicroPython is one of the languages available to program the Microbit. In my opinion, it exists as a middle ground between JavaScript/Touch Develop and C++. It’s useful for programs a few hundred lines in length, but struggles with anything larger.

Advantages:

• Derived from Python, a widely used programming language. Skills easily transferred to other platforms.
• Quick to upload to the Microbit. Deploying a new script takes seconds.
• Easier to debug than many other embedded languages, as error messages scroll across the Microbit’s LEDs.
• Editable, buildable and uploadable offline. There’s no need to use the BBC’s online editors.
• Avoids needing to understand memory management, as is the case with C and C++.
• Radio library provides a simple, minimalist interface to radio based networking.

Disadvantages:

• With scripts larger than a few hundred lines, the Microbit runs out of memory. This essentially imposes a limit on how long your program can be.
• Bluetooth isn’t available due to the memory usage of the Bluetooth stack.
• Unable to easily split code across files. Importing requires extra files to be flashed onto the Microbit each time a script is uploaded.
• Radio library only available in the Mu editor and not the BBC’s online editor.

# Getting Started with MicroPython

There are two main ways of getting started with MicroPython:

## Using the online code editor

The online editor is provided by the BBC. It lets you write, edit and save code online and compile it for the Microbit. Uploading the code is as simple as clicking “Download” and copying the file to the Microbit. No software needs installing; the editor runs in any modern browser.

While this editor is easy to use and fast to get started with, it has some downsides. To save scripts for later, you need to make an account and sign in. The editor also requires you to have a constant internet connection.

Another downside is, as of the date this was posted, it doesn’t support the Microbit ‘radio’ library, which allows Microbits to communicate with each other.

## Using Mu

Mu is an offline, open source editor made by Nicholas Tollervey. You can write, edit and save code with it similarly to the online editor. However, files are saved locally, giving more flexibility about when you work and how you store the scripts.

The Mu editor also has several other features the online editor does not:

• Upload scripts to the Microbit by pressing a button
• Browse files stored in the Microbit’s flash memory
• Error messages output to the screen via “REPL”
• Syntax checking (identifying any errors)
• Tabs for editing multiple scripts

The downside is that the Mu editor needs downloading and running, something that may not be easy on many school systems. On Windows, the Mbed serial driver is also needed for file browsing and REPL functionality.

Unless you’re just having a quick play, I recommend Mu over the online editor. The extra functionality (especially REPL) is invaluable, as is the ability for more experienced developers to version control their scripts.

As for writing your first script… there are plenty of tutorials on Python, and the syntax here is identical. Access to the Microbit’s hardware is provided by the Microbit library, with documentation available here.

# Aside: Fixing the MicroPython Memory Error

When a MicroPython for Microbit script grows to a certain size, MemoryErrors start appearing. I’m not sure if the size is file size, or number of function/variable definitions. The error looks something like this:

MemoryError: memory allocation failed, allocating 1584 bytes

If this occurs, the simplest solution is to remove unused lines of code. Shortening variable and function names is another option. Using long strings as comments exacerbates the issue, as they aren’t optimised out by the MicroPython runtime, so they use annoyingly large amounts of memory.

Another option is to try the online MicroPython editor. When compiled on there, I didn’t get MemoryError issues. I’m not certain why this is, but it seems to work!

# Other Bits of Useful Microbit Info

• The online editors run a different version of MicroPython to Mu and the board firmware. In situations such as the “MemoryError” issue, different results can occur.
• The compass in the Microbit can’t be relied on for consistent bearings. It appears to be affected by any nearby cabling or bits of metal.

Posted in Community, Open Source, Outreach, Programming, python, Tips, Tutorial.

# Introduction

Front end testing of web projects is a useful part of a testing strategy, as it lets you test how the full stack works together. Automating this process has significant advantages over manual testing in terms of speed and reliability, and running the tests as part of a CI environment makes them a real asset.

While .net based projects will often use Microsoft tools to implement continuous integration of automated front end tests, the open source world also provides a set of (free) tools to do this work.

Getting open source tools to work together is not always as easy as the Microsoft route. This article discusses the tips and tricks we learned when getting Selenium, Jenkins, ASP.NET MVC and SQL Server to integrate and provides a sample solution with web site and tests project to show how these tips might be implemented.

## Sample Code

Example web and tests projects, plus scripts used in our Jenkins setup can be downloaded from GitHub here.

## Technologies

• ASP.NET MVC – Microsoft web framework.
• Selenium – an open source automated testing framework.
• Jenkins – an open source continuous integration environment.
• SQL Server Express – a free implementation of the SQL Server database.
• Code first migrations – the mechanism we chose to deploy our database.
• IIS – Microsoft application server.

# The Website

The example website project is a simple MVC 5 site that allows you to list, create, edit, view and delete basic details about countries stored in a SQL Server database. The website uses standard .net forms authentication, Entity Framework to handle data connections and code first migrations to deploy the database.

## Things we have learned

### Separate the connection string and other configuration information.

Placing configuration information such as connection strings in a separate file is useful as it allows different settings on the Continuous Integration (Jenkins) machine. In our case we used the configSource attribute on the connectionStrings element and provided a separate DB.config file. This allows the connection string to be changed easily for each environment and means that connection strings can be excluded from source control.

 <connectionStrings configSource="DB.config" />

If taking this approach it is important to remember to specify that the included config file(s) should be copied to the output directory.

Visual Studio -> Solution Explorer -> Right click file -> Properties -> Copy to Output Directory -> Copy Always

### Use a SQL Express instance rather than LocalDB.

LocalDB does not allow remote connections, so it is fine in a development environment but does not work well when the website is run from full IIS (which is used for hosting when running integration tests).

# The Tests Project

The example tests project (SeleniumDemo.IntegrationTests) is generated from the standard UnitTestProject template. We then add the following nuget packages to enable Selenium testing:

Install-package Selenium.Support
Install-package Selenium.WebDriver
Install-package Selenium.WebDriver.ChromeDriver

We also use the FluentAssertions nuget package to improve the readability of our tests and add EntityFramework to the tests project:

Install-package FluentAssertions
Install-package EntityFramework

## Test project structure

The tests project has the following structure:

SeleniumDemo.IntegrationTests

• Countries – contains test classes relating to the Countries webpages.
• PageObjects – contains PageObject classes for reading and manipulating the Countries webpages via the Selenium API.
• KnownData – classes which ensure the database is populated with expected data.
• Model – classes used when populating the database with expected data.
• Login – classes relating to accessing and completing the login webpage to ensure the user is logged in to the webpage before other pages are requested.
• App.config – configuration settings related to the test project.
• DB.config – separate file used to store the database connection string.
• IntegrationSpecificationBase.cs – base class providing common functionality for test classes.
• Packages.config – Nuget configuration file which records packages associated with the project.
• PageObjectBase.cs – base class providing common functionality for reading and manipulating webpages via the Selenium API.
• UrlBuilder.cs – class that handles generating URLs for the website being tested.

## Things we have learned

### Ensure known data before each test is run.

It is critical that the database is in a consistent, known state so that you can assert the existence of expected data on the page. To this end, create a set of classes that handle generating known data.

In the example test project the KnownData folder contains classes for generating known account information (AccountDataGenerator) and known Countries information (CountriesDataGenerator). Data generating methods on these classes are then called before each login is attempted (see LoginPageBase.PerformLogin) and before each test is run (see CountriesSpecification.TestInitialize).

Note: These classes should also clear any data generated as part of the tests. It makes debugging easier if you clean up data BEFORE tests are run rather than after, as this allows you to manually check the state of the database after a failed test.
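The “reset before, not after” idea is independent of language. Here is a minimal Python stand-in using sqlite3 and unittest. It is not the example project’s C# code: the table, the seed data and the function name are all invented for the sketch, and an in-memory database is used only to keep it self-contained.

```python
import sqlite3
import unittest

KNOWN_COUNTRIES = ["France", "Spain"]  # hypothetical seed data


def reset_known_countries(conn):
    """Clear leftovers from earlier runs, then insert the known rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS countries (name TEXT NOT NULL)")
    conn.execute("DELETE FROM countries")
    conn.executemany(
        "INSERT INTO countries (name) VALUES (?)",
        [(name,) for name in KNOWN_COUNTRIES],
    )
    conn.commit()


class CountriesSpecification(unittest.TestCase):
    def setUp(self):
        # In a real setup this would be the shared test database.
        self.conn = sqlite3.connect(":memory:")
        self.addCleanup(self.conn.close)
        # Reset BEFORE the test; nothing is deleted afterwards, so a shared
        # database would still hold the evidence after a failure.
        reset_known_countries(self.conn)

    def test_known_countries_are_present(self):
        rows = self.conn.execute(
            "SELECT name FROM countries ORDER BY name"
        ).fetchall()
        self.assertEqual([row[0] for row in rows], KNOWN_COUNTRIES)
```

Note there is no data clean-up in a tear-down step; the next run’s reset takes care of it.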

### Use the PageObjects pattern.

The Page Object pattern separates the operations for reading/interacting with the page being tested from the tests themselves. This makes the tests more readable, less brittle if the UI markup changes and allows you to move from page to page within your tests. This and a number of other general Selenium best practices are available in this informative article.

The example test project shows an implementation of this pattern. The Countries/PageObjects folder contains a class to expose the data and operations required by the tests on that page. These objects inherit from a standard base class that provides a number of methods to simplify these operations. Where possible these operations return the current page object to allow for a clearer, fluent interface. The tests in CountriesSpecification can then make simple, readable calls on the page objects.
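To give a feel for the shape of a page object, here is a small Python sketch. It is an illustration only, not a translation of the example project: the class, its selectors and its method names are hypothetical. The `driver` can be any object offering `find_element(by, value)` and `find_elements(by, value)`, which is how Selenium’s Python WebDriver is called.

```python
class CountriesPage:
    """Page object for a hypothetical 'create country' page."""

    # Selectors live in one place, so markup changes only touch this class.
    NAME_INPUT = ("css selector", "#Name")
    SAVE_BUTTON = ("css selector", "input[type='submit']")
    ROWS = ("css selector", "table tbody tr")

    def __init__(self, driver):
        self.driver = driver

    def enter_name(self, name):
        field = self.driver.find_element(*self.NAME_INPUT)
        field.clear()
        field.send_keys(name)
        return self  # returning the page allows fluent chaining

    def save(self):
        self.driver.find_element(*self.SAVE_BUTTON).click()
        return self

    def country_names(self):
        return [row.text for row in self.driver.find_elements(*self.ROWS)]
```

A test can then read as `page.enter_name("Peru").save()` followed by an assertion on `page.country_names()`, with no selectors in sight.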

### Use the Login PageObject as an entry point for other pages.

In a system where a login is required, the login page object can be used as a mechanism for accessing other page objects (see example code in CountriesSpecification.cs). This ensures that the user is always logged in before trying to access a page that requires login.

In the example test project the CountriesSpecification class inherits from an IntegrationSpecificationBase class which initializes a LoginPage within the TestInitialize method. The tests can then make calls on the LoginPage to access any page object within the application.

The actual login is performed in the LoginPageBase.PerformLogin method. This is called each time a page object is requested and checks whether the user is logged in, ensures a suitable account exists in the database, then performs the actions on the webpage to log the user in.

### Use Chrome developer tools to identify selectors.

The page objects rely on css, xpath or other selectors to identify DOM elements within the page under test. An easy way of getting a starting point for selectors is to use Chrome developer tools.

Load the page in Chrome -> F12 -> Find the element within the Elements tab -> Right click -> Copy Selector or XPath

Treat the output only as a starting point, as the generated selectors can be verbose and brittle if you change your markup.

CSS selectors are preferable to XPath – here is a useful game that provides training on the subtleties of CSS selections.

### Use IIS on your development machine to host the website when running tests.

We experimented with using IIS Express or Cassini to host the website while running the automated tests; however, both proved unreliable and hard to configure appropriately. Eventually we turned to installing IIS on our Windows development machines and this proved to be more successful. A couple of points that relate to this:

• When creating the websites within IIS select ‘All Unassigned’ IP addresses but specify a port. This allows you to host multiple websites on your development machine without needing to configure DNS.
• Use Visual Studio publish profiles. As you will need to redeploy the site to your local IIS instance frequently it is convenient to set up Publish Profiles with configuration transforms to make the publishing process as simple as possible. Publish profiles are available by right clicking the website project in Visual Studio Solution Explorer. If there are permissions problems with publishing your site to the local machine then run Visual Studio as an administrator. It is

### Don’t be afraid to change your markup to simplify testing.

Some selectors may be complex and brittle (i.e. when the markup changes the selector no longer applies). It is OK to add ids or classes to your HTML markup to make Selenium selection easier.

### Don’t hard code URLs within tests.

URLs can be hard to manage as site structure and environments change. Rather than hard code URLs within tests provide a class that manages URLs and make this available to your test classes.

In the example test project the UrlBuilder class provides a mechanism for obtaining URLs. It gets the base URL and port of the project under test from a config file, and is passed around the test project wherever it is required, e.g. Login/CountriesLoginPage.cs.
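The idea can be sketched in a few lines of Python. This is a simplified stand-in rather than the example project’s class: the method name `url_for` is my own, and the base URL and port are passed in directly where the real class would read them from configuration.

```python
class UrlBuilder:
    """Builds absolute URLs for the site under test from one base setting."""

    def __init__(self, base_url, port):
        # Keep the environment-specific part in a single place.
        self.root = "{}:{}".format(base_url.rstrip("/"), port)

    def url_for(self, path=""):
        # Accept paths with or without a leading slash.
        return "{}/{}".format(self.root, path.lstrip("/"))
```

Tests then ask for `urls.url_for("Countries/Edit/3")` and never mention a host name, so moving between a development machine and the Jenkins box only means changing the two constructor values.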

# The Jenkins Configuration

At a high level the key steps we include in our Jenkins project are as follows:

• Pull the solution from source control.
• Run a PowerShell or similar script to set up and build the project.
• Run unit tests with VSTest.console.exe.

The scripts used in the example project are available in the CIBuildResources folder.

The Jenkins CI server will need the following elements installed:

• Jenkins
• Visual Studio – this is required as it provides the necessary command line build tools. It can also be useful for debugging and identifying build problems.
• IIS – use IIS to host the website to be tested.
• SQL Server Express – the Express version of SQL Server is free and accepts external connections, so it is suitable for the Jenkins server.

## Things we have learned

### It’s easier to work on the command line than Jenkins.

By far the most important thing we learned about working with Jenkins is that it is relatively slow and can be hard to work with. This is because Jenkins is essentially a workflow and each step needs to complete before the next step can be run. This makes the debug/fix/test cycle slow and cumbersome. The consequence of this is:

• Write scripts to do the work where you can.
• Make sure all the steps work correctly before you integrate them with Jenkins.
• When everything is ready configure the Jenkins project.

When you work with handwritten scripts you can easily call them from the command line, tweak them, comment out sections etc. This massively increases the speed of getting each step to work correctly as you don’t need to wait for the source control pull to complete etc. before you check that some post build action has completed. Because many tweaks may be required to get an entire build process to work this approach can save days of frustration.
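One way to keep a build script easy to poke at from the command line is to make each step a named function, so that any step can be run on its own. Our own scripts are PowerShell; the Python sketch below only illustrates the structure, and the step names and their print-only bodies are placeholders.

```python
import sys


def remove_old_results():
    print("removing old test results")  # placeholder for the real work


def build_solution():
    print("building solution")  # placeholder for the real work


def deploy_site():
    print("deploying site")  # placeholder for the real work


STEPS = [
    ("clean", remove_old_results),
    ("build", build_solution),
    ("deploy", deploy_site),
]


def run_steps(selected=None):
    """Run the named steps in order; run every step when none are named."""
    ran = []
    for name, step in STEPS:
        if selected and name not in selected:
            continue
        step()  # an exception here stops the run, like a failed build step
        ran.append(name)
    return ran


if __name__ == "__main__":
    run_steps(sys.argv[1:])
```

Running the script with `build deploy` as arguments skips the clean step, which is handy when you are only debugging the later stages.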

### Use IIS on the Jenkins box to host the site to be tested.

As with the development phase, it is easier to use a full IIS instance on the Jenkins box to host the website to be tested than to use IIS Express. The website will need to be set up and tested.

### Enable Nuget package restore on the Solution.

Nuget packages are now an essential part of .net. To ensure that all packages are downloaded when required on the Jenkins CI Server, enable Nuget Package Restore from Solution Explorer.

Visual Studio -> Solution Explorer -> Right click Solution -> Enable Package Restore

### Use Migrate.exe to deploy a code first migration based database.

Strategies for deploying the database will vary based on your data layer. However, if you are using code first migrations you can use the migrate.exe file to perform the update-database action that would usually be called in the Visual Studio Package Manager Console. Migrate.exe is included when Entity Framework is added as a Nuget package and can be found in the following folder:

Solution/packages/EntityFrameworkxxx/tools

Migrate.exe needs to be copied into the bin folder of the assembly that contains your migrations. See this article for more details.

# Putting it all together

Once Jenkins, IIS and SQL Server Express are installed on the Jenkins server, create a website within IIS for the project to be tested.

Next, create a script that will prepare, build and deploy the website and database each time the Jenkins job is run. The example code includes the folder CIBuildResources which would be included in the Jenkins workspace folder. This folder contains the script setup_and_build.ps1 – this performs the following steps:

• Remove old Test Results files.
• Remove all files from the existing IIS website folder.
• Drop the existing database.
• Copy the DB.config files containing appropriate connection strings into the website and tests project ready for building.
• Copy the migrate.exe file to the website bin folder to allow for migrations to be run.
• Build the solution using MSBuild.
• Recreate the database schema and seed the database by running the migrate.exe file in the website bin folder.
• Create the necessary SQL Server user and update the database permissions by running the databse_permission.sql file.
• Copy the website from the workspace folder to the IIS folder.

Once each step is tested and you have confirmed that the website can be served by IIS, this script can then be included in the Jenkins project. Below is a screenshot of the Jenkins configuration for the example project to show how the steps combine.

The Jenkins configuration for the sample project.

Posted in Programming, testing, Tips.

# Open Data Internship: What is OpenGather?

Our tale begins a month and a half ago, in a lab at the University of Southampton….

I was at the beginning of my internship, and we had decided one of the key jobs was fleshing out our data. The Open Data Service previously gathered data using pen and whiteboard. The issue with these archaic tools is time. Not just the time spent gathering data, but processing it all by hand at the other end.

As such, I set out on a journey. A journey to create one tool that would facilitate the easy gathering of data. It’s still far from complete, but here’s what it does so far.

Overview
DISCLAIMER: None of this tool is considered “stable” at this point in time. Data formats can and will change, use at your own risk!

The aims of the tool are:

• To speed up the gathering and processing of data. More specifically, data gathered on-location.
• To enable and encourage non-technical people to contribute open data.

The result is a responsive website, written in PHP. PHP was chosen as it’s trivially integrated into the Open Data Service. The source is at https://github.com/Spoffy/OpenGather

User Interface

The current interface is extremely simple, but it gets the job done. It shows a set of object types (schemas) people can submit. Changing the schema changes the form fields shown. These can then be filled in and submitted. Any required fields that weren’t filled in are highlighted in red.

The tool currently supports text fields, dropdown fields and geolocation fields. For geolocation fields, the initial values of longitude and latitude use the phone’s GPS. It’s possible to click on the map to select a more precise location. This is especially useful when recording location-sensitive objects such as doors.

The Database

A small subset of the tables.

The tool uses MySQL as its default backend. The details are configurable in config.php. There’s a single central table that records each data item entered. It stores an id, the time and the schema id.

Each schema has its own table of details. Each entry’s id acts as a foreign key, relating an entry to the details about it in the schema’s table.
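The layout described above can be sketched in a few lines. This is only an illustration using Python’s built-in sqlite3 rather than MySQL, and the table and column names (`entry`, `drinking_water_source`, `created_at` and so on) are assumptions, not the tool’s real ones.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    """Create the central entry table plus a details table for one schema."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.execute(
        """
        CREATE TABLE entry (
            id INTEGER PRIMARY KEY,
            created_at TEXT NOT NULL,
            schema_id TEXT NOT NULL
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE drinking_water_source (
            entry_id INTEGER PRIMARY KEY REFERENCES entry(id),
            building_id TEXT NOT NULL,
            floor TEXT NOT NULL
        )
        """
    )


def add_water_source(conn, created_at, building_id, floor):
    """Insert the central row first, then the schema-specific details."""
    cursor = conn.execute(
        "INSERT INTO entry (created_at, schema_id) VALUES (?, ?)",
        (created_at, "drinking_water_source"),
    )
    entry_id = cursor.lastrowid
    conn.execute(
        "INSERT INTO drinking_water_source (entry_id, building_id, floor) "
        "VALUES (?, ?, ?)",
        (entry_id, building_id, floor),
    )
    return entry_id


def load_water_sources(conn):
    """Join each central entry with its details via the shared id."""
    return conn.execute(
        """
        SELECT entry.id, entry.created_at, details.building_id, details.floor
        FROM entry
        JOIN drinking_water_source AS details ON details.entry_id = entry.id
        ORDER BY entry.id
        """
    ).fetchall()
```

The central table stays small and schema-agnostic, while each new schema only adds one more details table keyed on the same id.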

Exporting Data

Currently, the data is exportable as JSON. This format allows several schemas to exist together seamlessly. JSON is also human editable, making it easy to correct long-term data. The tool makes the export publicly available at http://yourwebsite/path/to/tool/dumpjson.php. There’s no issue with making data public, as it’s designed to gather open data.
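To show why JSON copes well with several schemas at once, here is a rough Python sketch. The grouped-by-schema layout and the `dump_entries` name are assumptions for illustration; the real output of dumpjson.php may look different.

```python
import json


def dump_entries(entries):
    """Serialise entries to JSON, grouped by the schema they belong to."""
    grouped = {}
    for entry in entries:
        # Everything except the schema name is treated as entry detail.
        details = {key: value for key, value in entry.items() if key != "schema"}
        grouped.setdefault(entry["schema"], []).append(details)
    return json.dumps(grouped, indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = [
        {"schema": "Drinking Water Source", "id": 1, "buildingId": "32", "floor": "1"},
        {"schema": "Building Entrance", "id": 2, "buildingId": "59", "entranceId": "A"},
    ]
    print(dump_entries(sample))
```

Each schema gets its own list, and entries in different lists are free to carry different fields.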

The data is also available through the MySQL instance. This method of access isn’t recommended.

The Schema Generator

Personally, I think this is the coolest part of the system. It allows you to quickly specify a schema using PHP. This schema is then transformed into HTML for the web forms and SQL for the database. The web interface updates the schema list when the page loads. Dynamically loading these schemas allows submissions to go straight to the database.

The upshot is that defining a new schema is incredibly easy. There’s no need to mess around with HTML or SQL. Just a few PHP objects get the job done!
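The idea of one schema definition producing both the form and the table can be sketched like this. It is a Python illustration of the technique only, not the tool’s PHP code: the `to_html` and `to_sql` methods, the `entry_id` column and the generated markup are all assumptions.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class TextField:
    name: str              # label shown to the user
    id: str                # form field name and SQL column name
    required: bool = True

    def to_html(self) -> str:
        required = " required" if self.required else ""
        return (
            f'<label for="{escape(self.id)}">{escape(self.name)}</label>'
            f'<input type="text" id="{escape(self.id)}" name="{escape(self.id)}"{required}>'
        )

    def to_sql(self) -> str:
        # Field ids are written by the developer, so they are trusted here.
        null = " NOT NULL" if self.required else ""
        return f"{self.id} TEXT{null}"


@dataclass
class ObjectSchema:
    name: str
    fields: List[TextField]

    def table_name(self) -> str:
        return self.name.lower().replace(" ", "_")

    def to_html(self) -> str:
        inputs = "\n".join(field.to_html() for field in self.fields)
        return f'<form method="post">\n{inputs}\n<button type="submit">Submit</button>\n</form>'

    def to_sql(self) -> str:
        columns = ",\n  ".join(
            ["entry_id INTEGER PRIMARY KEY"] + [field.to_sql() for field in self.fields]
        )
        return f"CREATE TABLE {self.table_name()} (\n  {columns}\n);"
```

Because both outputs are derived from the same field list, the form and the table cannot drift apart when a field is added or renamed.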

The following is a sample schema I use to gather data from around the University:

//Format:
//    new ObjectSchema($name, $fieldsArray);
//    new TextField($name, $id, $required);
//    new DropdownField($name, $id, $optionsArray, $required=true);
$schemas = array(
    new ObjectSchema("Building Entrance", array(
        new TextField("Building Number", "buildingId", true),
        new TextField("Entrance Label", "entranceId", false),
        new TextField("Description", "description", true),
        new GeoField("Latitude", "lat", true),
        new GeoField("Longitude", "long", true),
        new DropdownField("Access Method Daytime", "accessDaytime", $accessOptions),
        new DropdownField("Access Method Evening", "accessEvening", $accessOptions),
        new DropdownField("Opening Method", "openingMethod", $openingOptions)
    )),
    new ObjectSchema("Drinking Water Source", array(
        new TextField("Building Number", "buildingId", true),
        new TextField("Floor", "floor", true),
        new GeoField("Latitude", "lat", true),
        new GeoField("Longitude", "long", true)
    ))
);

In conclusion….

The tool is still very much in the early stages of development. Feel free to use it, but be wary that things may break between versions! If you’re feeling particularly adventurous, merge requests are more than welcome…

Improvements currently on the roadmap include:

• An updated, friendlier user interface.
• Versioning for the schema, including a tool to move data between schema versions. This should help with the long-term preservation of data.
• Support for image uploading. One major weakness is the need to take images separately and link them later.
• Add a README to explain installation and usage.

The source code is available at https://github.com/Spoffy/OpenGather

(Edit) Eventually, rather than next week, I’ll be talking about taming QGIS, building tilesets and designing GeoJSON maps!

Posted in Uncategorized.

# Week 7 – Choices Hospitalised Me.

Skills Demonstrated
• User Consultation
• Specification Acquisition
• Adding Functionality to Legacy Code

This was another successful week working on choices. Just like the previous six weeks. Due to my manager’s spontaneous expedition to Iceland, Docpot was put on hold. I’ve been to Iceland before, didn’t take me a whole week to get my groceries though…(Sorry).

The medical department uses a convoluted method to allocate students to their courses. This requires a separate allocation algorithm due to the uniqueness of their demands. Last year they split the students into three cohorts depending on their background. Some students were able to pick a language in both slots; some could not as one slot was pre-allocated. This needs changing now.
It turns out that choices has finally forced me to make an appointment in the hospital. Meetings are much more productive when all parties are in the same room, even if that room is in a dark corner of a large city hospital. The medical department was a pleasure to work with. It’s refreshing when an agreed solution doesn’t need a complete reworking of existing code.

Kev and I split the work between ourselves. The endgame was to ask questions about the language options so students were aware of the requirements. There were three difficulty levels for each language: beginner, intermediate and advanced. Each required varied qualifications. We needed to implement questions which are only asked when a student picks specific options. If the student does not pick the option, the answer should be set to a default false value.

Kev worked on making actions to add choosers to groups depending on their options. I made the changes to the add and edit questions pages to allow an admin to add this functionality to their form. Using what I learnt last week, I refactored the questions controller so it was easier to work with. I cannot stress the usefulness of testing enough!

We got a demo working by Friday. Kev was showcasing it to the medical department this morning. There are a few minor bugs to fix but they seemed happy with our progress.

I am now over half way through my internship. Choices is starting to look healthier now. I know I say this every week but I am looking forward to when I can move on from choices. This week TIDT are doing WAISFest, which is a hackathon hosted by the university. It should give me a good enough reason to escape choices. We shall see…

Posted in Uncategorized.

# Weeks 5 & 6 – The big refactor

Skills Demonstrated
• Integration Testing
• Refactoring Legacy Code

So week 5 was the tail end of front-end fortnight. What is that, you ask? Well, I can tell you in layman’s terms as I was the only one in the whole team who had no involvement with it.
Our team had two weeks to figure out which front-end framework they would use for the foreseeable future. That left me pretty much to my own devices for the duration. That meant choices for me – there is always something to fix in choices.

The more astute of my readers will have noticed that this blog post covers two weeks. These few weeks have left my blogging ability rather diminished. But I digress.

This is where I would like to link you to my manager’s recent blog post “Zen and the Art of Legacy Camp Site Cleaning”. I paraphrase: “Bolting unit tests onto legacy code is pretty much impossible. Untested legacy code which has been chugging along without issue should not be rewritten. The trick is to leave the campsite cleaner than you found it”.

There were two incomprehensible behemoth functions in the options controller totalling about 400 lines. Much like choices, 70% of the universe consists of dark energy. No physicist can explain what dark energy is, or how it got there; only what it does. This is where integration tests come in handy. Unlike in unit testing, the controller communicates with fake databases called fixtures. This allowed me to give this pile of code a data set and then examine how it was digested at the other end. I assumed the add and edit functions worked before I touched them and proceeded to make many integration tests using this method. It was important that I had good code coverage in the tests before starting the cleaning process. Good testing allowed for quick refactoring thanks to immediate feedback.

Since pushing choices to live, we’ve had even more requests from users. And so it goes on. A user found a bug where the ‘all pages’ button on the paginator was dysfunctional. The code for the all button was commented out; it appeared as though somebody couldn’t get it to work but then forgot to remove the button from the view. The fix was actually simple. They tried setting the options per page limit to 1000 to make the all button work.
This would have worked but there is a sneaky Cake setting called ‘maxLimit’ which is set to 100 by default; this overrode the 1000 limit.

There are a few more things which I fixed and implemented on choices this week. But this blog post is already getting rather long for my liking, and most likely yours, so this is it for now.

Posted in Uncategorized.

# Zen and the Art of Legacy Camp Site Cleaning

This post combines a few ideas from the following books:

• Zen and the Art of Motorcycle Maintenance – Disregard the title, this book is about the philosophy of value and quality. I enjoyed reading the meandering tale of a man on a motorcycle tour thinking about what it means to do things well. You will need chapter 25, things which shake your quality focus during a maintenance task.
• Working Effectively with Legacy Code – This is a practitioner’s guide to coping with a problem we have all created for ourselves at some point or other. For me this period was about the first 10 years of my career. Legacy code is code without tests, almost always accompanied by having no documentation. This code has merrily chugged along, doing what it does since its birth, but now you must add a new feature or fix a bug. We have all had a system with 9 bugs where you fix one and now you have 12 bugs.
• Clean Code – I read this book at the tail end of last year and it completely changed the way I programmed. If you only read one book about programming, read this. It also gave me an excellent gauge for what Pirsig describes as classical quality. There are a lot of things I intuited were good but did not know why, and this book made those things explicit. The important message for this post is “The boy scout rule”: always leave the camp site cleaner than you found it.

Combining the lessons from these books has helped me take a beastly code base and make it elegant. Begin by getting your “Zen” right; this is not a smash and grab, you are going to turn this mess into a quality product.
The gumption (quality focus) traps to avoid include:

• Wishing you started from scratch – Don’t spend your maintenance time wishing you had re-written the code from the ground up. The grass always looks greener in that empty document root but trust me it is not. If this was so easy to do from scratch how did you make such a hash of it the first time? The problems of your current code are hard; start-over code will have hard problems too but they are further away. You felt as positive when you started this code base as you do about starting over. You will feel just as bad again; the fact is good software is hard to write.
• Thinking you don’t need tests – You need tests. Yet you would be a fool not to have a good click around in your application after you make a proper change. Clicking around your software is slow and inconsistent. Even with a testing team you won’t click the full set of possibilities. Write tests for the existing code before you refactor and you will be sure you won’t introduce bugs.
• Fear of big changes – It is surprising how you stop worrying about making changes when you have tests. It is reassuring to know you aren’t breaking things.
• The big rush – This code base is hard to change, it makes everything slow, and writing tests is even more time consuming. Writing it from scratch will be slower and once you have written a few tests you will get the rhythm. As you test more, the speed at which you can make changes will start to improve until you can actually go quite fast. What’s the hurry though? You are trying to build something quality; you shouldn’t rush quality. Also you have the perfect excuse for going slow: “sorry this code is old and hard to work on”. You won’t have the excuse if you start from scratch.

The trick of working with legacy code is to determine suitable chunks to test. From my experience adding a full set of unit tests to a legacy code base of any size is almost impossible so do not try.
User interface tests like Selenium can be good but it is hard to get good coverage with them. They are slow to write, slow to run and they can be quite brittle. Have a light covering of these tests over a broad set of features to avoid embarrassing bungles.

The real name of the game here is integration tests. If your code wasn’t written with unit tests it is certain to be un-unit-testable; there will be tight coupling all over the shop. Use integration tests to test natural lumps of functionality. You are likely to need a database of test data. Your integration tests will soon end up taking tens of seconds to run so your tool chain must let you run a subset of tests. This will let you iterate on the particular area of the code you are testing. Once you have a nasty lump of code under test you can start refactoring the components and add better tests as you go. After not so long the lump of code will be a set of quite well encapsulated classes with unit tests. If you repeat this pattern for every lump of code in the project you will have a quality code base.

The hardest test to write is the first. It’s psychologically hard because it is a change of mindset. It is a big learning curve because you will have to work out how to get the test framework into the existing project. It will be boring because you need to create a bunch of test data. It is a technical challenge because you have to work out what the existing code does to test it. In spite of all that you have to do it. It is the only way you will ever achieve quality and if you read this far it’s because you want to deliver quality.

So now you have a code cleaning approach, how do you choose what to clean? The boy scout rule! The whole code base is a mess so start with the bit where you want to add your feature. To add the feature you will have to learn how the surrounding code works. You can specify that knowledge in tests. You add some integration tests and then clean the camp site for a while.
Once you are happy that it is clean enough to work with you can add your new feature and its associated unit tests. Now that you have a clean place to put it, your new code can have proper unit tests.

It will take a long time to get your code base clean using this method. Adding new features will give you good test coverage and make the code easier to work with. You will have well defined methods with good naming and simple logic. You might choose to spend your Friday afternoon doing a bit of camp site cleaning before you clock off. Remember there is no rush. There are always new features to add so you will have plenty of opportunities to clean.

Remember well written tests specify what the code should do. If another programmer wants to change your code they should read your tests. If they change the code’s behaviour a test will tell them the impact of that change.

Posted in Best Practice, Programming, testing, Tips.

# Open Data Internship: How to Gather Data – Mark II

These instructions have been deprecated, see the new version available here.

This is a quick(ish) instructional post on how to gather open data in Southampton. This is written assuming you’re using the OpenGather tool. It covers the sort of data we’re looking for and how to gather it.

Objects we’re looking for:

• Drinking water dispensers (Fountains, coolers, etc)
• Gender neutral toilets (Toilets a person of any gender could use, e.g. most disabled toilets)
• Portals into buildings and between them (E.g. Doors)
• Public showers a cyclist could use
• Reception Desks (And any other Points of Service)
• Images of University buildings (that we don’t already have)

It’s a bit like a game of I-spy (or Pokemon Go). The aim is to hunt round, find each of the items above and record info about it. For some of them (Portals, Building Images) we know the data we’re missing. For the others, we have no idea, so a little urban exploration may be required!

For all of these, we’re interested in where they are.
This involves a building number, floor and geo-location for most of the above. OpenGather has a clickable map to allow for precise geo-location. Otherwise, http://lemur.ecs.soton.ac.uk/~cjg/clickymap/ is available.

### How to Gather Data

Preparation

• If you’re hosting your own copy of OpenGather, make sure to clear any testing data out first!
• If you’re recording portals and building images, it can be helpful to plot the things you’re looking for on a map. If you’ve retrieved a list of items using SPARQL, you can use the following to plot the items on the map.
    • Run a SPARQL query to generate the list of missing items, complete with latitude and longitude (examples to come soon!)
    • Generate a KML/CSV/GEOJSON file from the data produced by the SPARQL endpoint.
    • Host the KML/CSV/GEOJSON file in a publicly accessible location. I prefer Git, but Google Drive or an online pasting tool like Pastey also works.
    • Using umap (http://umap.openstreetmap.fr/en/) as a mapping tool, add a layer, then either import the data from the remote, or in umap, add as a remote data source. (When using umap, tick “Use Proxy” to ensure the icons load correctly.)
    • Print off a copy of the map. umap doesn’t work very well in mobile browsers.
• Print off University of Southampton photograph consent forms. These are needed to use photos with people’s faces in.
• Make sure your phone and camera have adequate amounts of battery (ideally full).
• If your camera and phone are different, check the two clocks show the exact same time. This makes matching images and data far easier.

Overall Process

1. Pick a location on the map and decide which buildings to gather data from.
2. For each building, gather the data needed, using the instructions below.

General Method

This is the quick-and-easy summary of how to record data. More specific information is available below.

1. Open the OpenGather tool in your browser.
2. Select the type of object to record.
3. Fill in the appropriate fields.
4. Submit the data.
5. Take a picture of the object.

Taking a Building Image

1. Using the open data tool, select the category “Building Image”.
    • Fill in the “Building Number” field.
    • Wait for the GPS to update to the current location.
    • If the accuracy is low (say, less precise [higher] than 6m), click/touch the map to mark a more accurate position.
2. Take a picture of the building, attempting to get as much of the building in frame as possible.
    • A good photo will make the building easily identifiable as you walk past it.

The geo-location data isn’t necessary for buildings that are already marked on the map, but it helps automatically match images to names later on.

Gathering Portal Data

Walk around the building looking for entrances. Try to identify all entrances that aren’t fire escapes (which we aren’t permitted to gather as of 14/07/2016). For each entrance that you find:

1. Select the category “Building Entrance” in the OpenGather tool.
2. Record the geo-location of the entrance. This can be done by tapping on the map in the OpenGather tool.
3. Fill in the fields using the tool.
    • “Building Number” – Number of the building the entrance is attached to
    • “Entrance Label” – An arbitrary letter to identify the entrance. Typically starting from ‘A’.
    • “Description” – A brief description of the entrance, such as “Staff”, “Main”, “Side”, “North-east”.
    • “Access Method” – Is a card or key needed to get in?
    • “Opening Method” – How do you physically open the door? This is used to determine disabled accessibility.
4. Submit the data
5. Take a picture identifying the entrance. A good photo will make the entrance easily identifiable as you walk past.
6. Follow the procedure for getting consent, if any people are in your photo (an ideal photo has no people).

Recording a Drinking Water Source

Should one of these rare and majestic creatures be spotted:

1. Throw a greatball at it
2. Select the category “Drinking Water Source” in the OpenGather tool
3. Fill in the fields using the tool.
    • “Building Number” – Number of the building the water source is in
    • “Floor” – The floor the water source is on, level 1 is usually the ground floor.
4. Record geo-location using the map. Zoom in on the building you’re currently in, and try to mark your position in the building by clicking on the map.
5. Submit the data
6. Take a picture of the water source. Ideally, this will make it clear where the water source is located in that part of the building.
7. Follow the procedure for getting consent, if any people are in your photo (an ideal photo has no people).

Recording the location of Public Showers

1. Select the category “Public Showers” in the OpenGather tool
2. Fill in the fields using the tool.
    • “Building Number” – Number of the building the shower is in
    • “Floor” – The floor the shower is on, level 1 is usually the ground floor.
    • “Room Number” – Room number of the shower, if it has one.
3. Record geo-location using the map. Zoom in on the building you’re currently in, and try to mark your position in the building by clicking on the map.
4. Submit the data

Reception Desk (Point of Service)

1. Select the category “Point of Service” in the OpenGather tool
2. Fill in the fields using the tool
    • “Description” – What the Point of Service is. For example, “Library Reception Desk” or “Student Services Information Desk”
    • “Building Number” – Number of the building the point of service is in (assuming it isn’t a standalone service)
    • “Phone” – A phone number for contacting that point of service, if available.
    • “Email” – An email for contacting that point of service, if available.
    • “Opening Hours: Mon…etc” – Opening times for the point of service for that day. E.g. “9:00-18:00”.
3. Record geo-location using the map. Zoom in on the building you’re currently in, and try to mark your position in the building by clicking on the map.
4. Submit the data
5. Take a picture of the desk. It’s nice to have a friendly receptionist in the photo if possible, but don’t force anyone!
6. If anyone (including any member of staff) is in the picture, follow the procedure for gaining consent.

Requesting Consent

Attempt to get nobody in the shot, unless you’re taking pictures of a reception or Point of Service stand, where behind-the-counter staff can make it look friendlier. If people need to be in the shot:

1. Verbally ask permission before taking the picture, explaining that you represent the Open Data Service, and what that is. Ensure they’re okay signing a consent form.
2. Take the photo.
3. Ask them to fill in an entry on the consent form.

Cross buildings off as you go, to mark them as completed.

Posted in Uncategorized.

# Open Data Internship: Task Lists, Links and a rant on PHP

Week 5 already? It feels like the time has flown by. This is a particularly interesting week for me, as Chris is away story-ing. I have plenty to do, but it all needs doing under my own steam. That’s why this is a fantastic time to talk about time management. More specifically, how woeful I am and how to improve.

Thoughts on Time

To begin, I want to talk about task lists. An easy-to-use task list has single-handedly made the biggest difference to my time management. It’s done that by ensuring I never forget a task. I can always see the tasks I have to do, prioritise them and set deadlines. By prioritising them, I can always work on the most important thing.

The trick to using a task list effectively is to put everything on it. Every little job, no matter how small, needs to go on it. It needs to be an authoritative list of everything that needs doing. When the list contains everything, you can work purely off the contents of the list. You just pick off the most important task you have time to work on and get to work.

My two biggest time management issues are tunnel vision and getting distracted. The former is where you get caught up working on one thing.
The mistake is not stepping back and asking “What’s the right thing to be working on?” periodically. As for getting distracted… it’s hard not to, in a lab filled with shiny objects and fantastic people. Hopefully, these will be solved in a later blog post.

Work for the Week

This week, I’ll be mainly working on creating a SPARQL URL checker. This will build a database of all the URLs a SPARQL endpoint knows about (URLs in this context meaning website addresses, rather than RDF URIs). It will then launch requests to each of those URLs, reporting the status of each. The aim is to identify any broken links that need repairing. Chris and I spent Friday planning the system, which should look something like this:

A high-level plan of SPARQL-Detective

The code will be available at https://www.github.com/Spoffy/SPARQL-Detective in the near future.

I’ll also be working to implement some of the changes to OpenGather I mentioned in my previous blog post. The main focus is to implement a schema for each type of data to be gathered.

PHP – Or as I now call it, “Interpreted C”

As promised, a short rant. So, this internship has been my first time working with PHP. I used it for the OpenGather tool, and up until now it hasn’t been so bad. However, I have since been introduced to the joys of the cURL library. Here’s a short snippet to query a URL in cURL.

$curlHandle = curl_init($url);
//Force it to use get requests
curl_setopt($curlHandle, CURLOPT_HTTPGET, true);
//Force a fresh connection for each request. Not sure if this is needed...
curl_setopt($curlHandle, CURLOPT_FRESH_CONNECT, true);
//Get Headers in case we need Location or other.
curl_setopt($curlHandle, CURLOPT_HEADER, true);
curl_setopt($curlHandle, CURLOPT_FOLLOWLOCATION, true);
//Do we care about SSL certificates when checking a link is broken?
//...Possibly if there's SSL errors. V2.
curl_setopt($curlHandle, CURLOPT_SSL_VERIFYPEER, false);

//Don't actually care about the output...
ob_start();
$result = curl_exec($curlHandle);
ob_end_clean();

$link_status[$url] = curl_getinfo($curlHandle, CURLINFO_HTTP_CODE);
curl_close($curlHandle);


Oh, sorry, did I say short? I lied. This bit of code summarises my thoughts on PHP perfectly. The code is verbose, unnecessarily so. It’s missing high level abstractions (You have to manually parse the returned string for header data). And it all somehow feels… clunky. Thankfully, at least memory management isn’t a problem.. right?

For comparison, here’s the same snippet in Python.

link_status[url] = urlopen(url).getcode()


….The joys of cURL!
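In fairness to cURL, the Python one-liner hides a catch: in Python 3, `urlopen` raises `HTTPError` for 4xx and 5xx responses instead of returning them, so a link checker has to catch that to record the status. Below is a minimal sketch of the “request each URL and report its status” step under that assumption. The names `check_link` and `check_links` and the injectable `opener` parameter are illustrative, not taken from SPARQL-Detective.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def check_link(url, opener=urlopen, timeout=10):
    """Return the HTTP status code for `url`, or None if no response arrived."""
    try:
        response = opener(url, timeout=timeout)
    except HTTPError as error:
        # Raised for 4xx/5xx responses; the exception still carries the code.
        return error.code
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts
        # are both OSError subclasses: there is no status to report.
        return None
    try:
        return response.getcode()
    finally:
        response.close()


def check_links(urls, opener=urlopen):
    """Map every URL to its status code (or None when unreachable)."""
    return {url: check_link(url, opener=opener) for url in urls}
```

Passing the opener in as a parameter keeps the function testable without touching the network; in normal use the default `urlopen` is all that is needed.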

Posted in Uncategorized.