We need to talk about online advertising

We are getting very used to seeing obvious lies in advertising on reputable websites.

This worries me.

It’s much more serious than mere “clickbait”.

Some of the biggest lies are the ones which use your IP address to guess your city or country. This started out on more “disreputable” sites, where I would see adverts claiming that “Women in Reading want to meet for sex”. Which was odd, as I was in Southampton, but my ISP was giving me IP addresses coded to Reading. It’s clearly a lie, but we accept it, and that acceptance is risky.

These days most websites don’t sell their own advertising, they use companies like Outbrain or Taboola to provide the adverts. The adverts you see are custom to you, based on both location and your own browsing history.

Advert in context on the page

If the Telegraph newspaper carries an advert with a lie in it, it’s relatively easy to complain about. However, if the Telegraph website carries a false advert, who’s responsible? The advertiser? The Telegraph? The advertising platform? There’s no guarantee any of them are in the UK, and it’s not clear where responsibility lies.

The most alarming advert I’ve seen is one for a “tactical flashlight” which has the text something like “Police in Southampton recommend everyone carries one of these”.

Close up of the advert which appeared on the Telegraph website (also Daily Echo)

It links to a fake news article with $cityname in the URL to tell me that there’s been a rise in violent crime in $cityname and hence the police say you should buy the product. This is beyond immoral; it’s dangerous and almost certainly illegal. The fact that this advert still appears in various forms scares the hell out of me. The fact that some newspapers now muddle the advertising in with the “other stories on this site” makes it harder to evaluate the source of information.

Seemingly “legit” news articles as advertising

If you visit “disreputable” parts of the web (porn, piracy etc.) you will get very used to pop-unders advertising a mix of sex sites, gambling, malware and financial scams (“The Brit Method”) etc. What I’ve noticed in my “research” looking at such sites is that sometimes the pop-under window is just a news site with a story on. What the hell is going on there? Why is ibtimes.com trying to open innocuous windows on my browser with random stories from their site… are they hoping that getting it in my eyeline will get some social media links? I don’t know.

Your filter might be racist

There have been some worrying reports of targeted advertising on Facebook being used to offer something only to certain communities, or to target political messages at certain ethnic groups. Also, remember that location can be a proxy for race. If you have rough data on where different races live, filtering by postcode or even town can be quite creepy. Adverts don’t tell you “You are seeing this because you live in the white middle class part of your city”.

It’s a trap

Some online advertising is for flat out scams. The “get rich quick scheme” is alive and well in 2017. The first image on this page is a good example, but I’ve hit reload a bunch of times trying to find one today and for some reason can’t.

What I did find is adspider.io, which looks like an interesting tool that tracks online adverts and which sites are showing them.

Here’s a nice example of a site with the hallmarks of a scam (fake comments, fake location heading). For fun I’ve linked to it with $cityname. I found this from an advert on a website for sport.

Possible remedies for issues with online advertising

I’ve been trying to think what to do about this. What should lawmakers, ISPs and media be doing?

First of all, let’s put the ads on “disreputable” sites out of scope; they have no licence or “good name” to threaten.

But what can we do about ads on Wired, the Telegraph, or the Daily Echo (our local paper)?

Idea one: Advert identification codes

Every unique advert shown on a website should be assigned a unique code for that website or advertising platform. This would let people complain about something more concrete, rather than something entirely ephemeral.

Idea two: Personal advertisement log

A user should be able to click a link near the advert to get a list of every advert they have been shown from this source (website or ad platform) for the past N days. N negotiable, but I’d suggest 90 days minimum. Each advert in this review would also tell them the data used to make the decision to show it. Actually, this would be nice anyway. Do you ever see a really cool ad on Facebook, and then a window pops up over it, and by the time you’ve dealt with that the advert is gone forever? Every instance of every advert should have a unique URL which is visible to anybody.
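
To make the idea concrete, here’s a rough sketch (purely illustrative, in Python, with made-up field names) of the kind of record such a log might hold for each advert impression:

# Purely illustrative: one entry in a personal advertisement log.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AdImpression:
    advert_id: str       # the unique code from Idea one, e.g. "72B0391CF0F"
    permalink: str       # a URL anyone can use to refer to this exact impression
    shown_at: datetime   # when it was shown to this user
    platform: str        # the ad platform that served it
    site: str            # the website it appeared on
    targeting_reasons: list = field(default_factory=list)

# Hypothetical example entry:
impression = AdImpression(
    advert_id="72B0391CF0F",
    permalink="https://example-adplatform.com/impressions/72B0391CF0F",
    shown_at=datetime(2017, 11, 5, 9, 30),
    platform="Example Ad Platform",
    site="example-news-site.co.uk",
    targeting_reasons=["Viewer in UK", "Viewer searched for 'impotence cure'"],
)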

Idea three: Public advertisement log

This is more hardcore; but I think that EVERY advert shown to EVERY user in the past X days, along with the logic used to create/show it, should be made available to the public.

Idea four: Sort out how to complain and escalate complaints

Who should be held responsible when I visit a site of a UK company, hosted in Germany, using a USA advertising platform showing an advert for a Chinese company? This is tough, and I just don’t know, but we need to find a solution to this question.

Right now, it’s too easy for a local news site to wash their hands and take no responsibility for the bad behaviour of the advertising platform they use. The best idea I have is that a UK standards agency could ban the use of non-compliant advertising platforms by UK companies.

Problems with these suggestions

Advertisement 72B0391CF0F shown because: Viewer in UK & Viewer searched for “Impotence cure”

It’s very hard to define who a user is in a way that lets them reliably see their own advertising history, even though we know that these companies know exactly who we are… If I search for something on the John Lewis website, I see related adverts on other sites the next day.

Another issue is that an advert could contain information that could not be made public because it includes identifiable personal information. If the advert image contained the target viewer’s real name, then you couldn’t publish it to the public along with the reasons it was shown. This could be used as an excuse not to make adverts public at all: “people want personalised adverts” would become the justification for it being impossible to disclose adverts without breaching someone’s privacy.

Conclusion

The above is based on my own experience browsing the web. Maybe you see different adverts to me? How would I know?

Anyhow, the current situation needs to change, and to do so we need concrete things to ask for. Am I unrealistic or am I not going far enough? What do you think?

Posted in Advertising.


50 years since the “Mother of All Demos”: What’s that got to do with the price of fish?

Demonstrating a user interface to manipulate structured data

So, we’ve been discussing ways to mark the 50th anniversary of the Mother of All Demos [Youtube, Wikipedia]. In this demo, Doug Engelbart demonstrated the tools that he and his team had built to make themselves smarter and more effective. Some of these tools would become household items. He was one of the most important inventors in history.

The anniversary is 9th December 2018 (so 13 months from now). There’s some thoughts at http://doug-50.info/

Frode Hegland is head cheerleader for our discussions, and has asked us to think about where we can demonstrate and celebrate Doug’s ideas and vision, and how we can take it further.

So “augmenting human intellect”… how hard can that be?

What I’ve been thinking about is something I don’t have a perfect description of yet. It is about how humans interact with information: researchers and scientists most of all, but everyone else too.

Containers being transferred to a cargo ship at the container terminal of Bremerhaven, by Hannes Grobe

There’s an excellent blog post by Mia Ridge which outlines much of the problem with information in 2017. Our data is anaemic. We can move it around the world in moments, and request strings of ones and zeros, but we know almost nothing about what they contain.

People are so used to the status-quo that they don’t realise there’s a problem and how much better it could be. It’s like shifting sacks of cargo onto a ship. That used to be “just how you did things”.

The best phrase I’ve got for this idea, so far, is “Intermodal information”. I’m stealing the idea from the freight industry. While I’m stealing, I’ll steal the whole definition from Wikipedia.

Intermodal freight transport involves the transportation of freight in an intermodal container or vehicle, using multiple modes of transportation (e.g., rail, ship, and truck), without any handling of the freight itself when changing modes. The method reduces cargo handling, and so improves security, reduces damage and loss, and allows freight to be transported faster. Reduced costs over road trucking is the key benefit for inter-continental use. This may be offset by reduced timings for road transport over shorter distances.

The introduction of containers that worked between trains, ships and trucks changed the economy of the world for the better. We’ve already experienced something similar with data, three times. Storing information digitally was the first. The advent of the packet-switching network (IP) was the second: we can now move data over networks from any computer to any computer. The IP network sends out packets of data, and those packets move over wifi, wires, fibre optics… even via satellite. The Web (HTTP) was the third revolution in the interoperability of data. Now we can request computer files from all over the world and get them with some basic metadata (MIME types tell us a little about how they should be interpreted), and the URL system means we can link to computer files and talk about them.

It’s no secret that this has changed the world and our species’ relationship to data.

So what’s the problem?

Data is great, but it’s the start of the story, not the end. When you download a web page, the response includes some header information that is distinct from the file you are downloading. The header that tells your computer how to interpret the file is called “Content-Type”, and its value is a “MIME type”: generally something like “image/png”, “text/xml” or “application/vnd.google-earth.kml+xml”. Sometimes there’s a character encoding bit as well, e.g. “text/html; charset=utf-8”. MIME types work almost the same as the “suffix” on the end of a filename, e.g. badger.png or secrets.html. That’s a lot more useful than just guessing what the file is, but not much better than the filename on a hard drive.
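
As a quick illustration (my own sketch, nothing more), here’s how little a server actually tells you about a file it sends you:

# All we learn about a remote file is its Content-Type header.
from urllib.request import urlopen

with urlopen("https://example.org/") as response:   # any URL will do
    print(response.headers.get("Content-Type"))     # e.g. "text/html; charset=UTF-8"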

What I hope we can achieve is a way to better describe the contents of files. There are different ways to interpret the same file: a KML file is also a valid XML file, which is a valid text file, which is a sequence of bytes, which is a sequence of bits. None of that tells us that the KML file describes the locations of park benches in Southampton.

Datasets come in many forms… except they don’t really. On computers, data files are usually structured either as trees of information, where each thingy has zero or more sub-thingies, or as tabular data, where information is organised into sets of homogeneous records and each record has information in more or less the same shape: CSV, spreadsheets and stuff like that. There’s also “graph” data, but that’s less common.
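
As a toy illustration (made-up park-bench data), here are the two usual shapes side by side:

# The same made-up information in the two common shapes.

# Tree: each thingy has zero or more sub-thingies.
tree = {
    "city": "Southampton",
    "parks": [
        {"name": "Example Park",
         "benches": [{"id": "B1", "lat": 50.92, "long": -1.40}]},
    ],
}

# Table: homogeneous records, one per row, all more or less the same shape.
table = [
    ["bench_id", "park", "lat", "long"],   # heading row
    ["B1", "Example Park", 50.92, -1.40],
    ["B2", "Example Park", 50.93, -1.41],
]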

What’s that got to do with the price of fish?

Maine Avenue Fish Market (Bien Stephenson)

What interests me is for our tools to be able to record, transmit and understand the structure and meaning of a file. This is the distinction between data and information. All a MIME type tells us is roughly which tools can read a file, and no more. Let’s take a very simple example of a spreadsheet containing a list of prices of fish. All we get from MIME is “application/vnd.ms-excel”, which just tells us we can read it in Excel. We know it’s going to have one or more worksheets, each with tabular data, but it would be helpful to know for sure that the first worksheet is the one of interest, that it is structured in rows with one row per record and the first row as headings, and that the sheet represents a list of products and their prices. Going further, it would be helpful to know that it’s about fish, that it’s relevant to a certain vendor, that we can validate the vendor really provided these prices, and the timescale and audience for which it’s valid. It would be helpful to link it to product categories, weight, specifications, species… and to have all those things done automatically and unambiguously, with no extra work for anyone.
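
To make that concrete, here is a very rough sketch of the kind of description that could travel with such a spreadsheet. The field names are entirely hypothetical, loosely in the spirit of things like CSVW metadata, and the values are made up:

# Hypothetical descriptor for the fish-price spreadsheet (illustrative only).
fish_price_descriptor = {
    "media_type": "application/vnd.ms-excel",
    "worksheet_of_interest": 0,        # the first worksheet holds the data
    "layout": "rows",                  # one record per row
    "header_row": 1,                   # the first row is column headings
    "describes": "product price list",
    "subject": "fish",
    "vendor": "https://example.org/vendor/123",
    "valid_from": "2017-11-01",
    "valid_until": "2017-11-30",
    "columns": {
        "Species": "http://example.org/ns/species",
        "Weight (kg)": "http://example.org/ns/weight",
        "Price (GBP)": "http://example.org/ns/price",
    },
    # plus some way to verify the vendor really published these prices,
    # e.g. a digital signature -- hand-waved here.
}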

Dock workers loading sacked cargo – MS Rothenstein NDL, Port Sudan 1960

This is not easy. But it will happen eventually, somehow, and when it does we’ll look back on this as the olden days and think of the computer files we use now the way we look back on sacks of cargo loaded by gangs of stevedores. We can’t get there in one big jump, but it’s what we should be aiming for. Our data should just work, and get out of our way. Not just open data but all our data.

This is a bit bigger than I usually aim, but the brief of celebrating and extending the work of Doug Engelbart is an unreasonable one, so maybe we need to start thinking beyond what is reasonable…

And for me that’s “Intermodal information”. Hopefully we can come up with a catchier name.

Posted in Doug Engelbart, Research Data.


Little bugs, hidden features, and lots of chatting – Week 11

This week has been full of lots of odd jobs for KnowledgeNow. I’ve fixed up the breadcrumbs, added a visual response for a failed feedback submission, cached the navigation data, split the search box out into a partial layout for re-usability, and submitted the project for a code review. The feedback I got from Andy was really useful, and it was reassuring to have somebody else more experienced check over my work.

I think what’s surprised me most about this week has been the amount of talking I’ve done. I’ve discussed how we’ll integrate the website into the existing iSolutions site with Pat, I’ve discussed navigation and template issues with Graeme, and I’ve talked with the Service Management team about integration with ServiceNow and how we plan on logging search queries and view counts. I’ve even decided to hold a demo of KnowledgeNow for the Service Management team and some of the interns next week, to get feedback on it. It will only be a couple of days before I leave, so there probably won’t be enough time to take everyone’s feedback into account, but it will be nice to know what people think and to leave a plan for the team to develop this further.

The information we plan on logging goes beyond my current knowledge of web development: recording IP addresses, user agents and search queries, along with the ID of the article being viewed. It will involve using sessions, something that Martin tells me is trivial, but I’ve learnt not to underestimate seemingly small tasks like this, and I look forward to tackling it first thing on Monday morning.

It seems the closer I get to completing this project, the slower progress becomes: things like logging and custom error pages that I expected to be relatively straightforward aren’t, and there aren’t so many easy jobs left to fill my time while waiting on other people, or to tackle when my brain gets tired. The learning curve continues, and the list of little jobs to do before going live keeps growing, with a new item popping up every time I cross one off. My final week of working for iSolutions will most likely be a frantic race to get my project well and truly production ready, and a battle to prove to the people in charge that they should put it out there.

Posted in Programming.



Open Data Internship – Week 12 – A retrospective

As I come to the end of my time here, I find myself looking back on what I have done. So here I have collected my actions in one place, to give next year’s intern an idea of what they should do and where to start. So…

What have I done?

To start with, I read all the blogs written by last year’s intern.

After doing that I started looking into SPARQL, and eventually put together this query for finding all buildings without images.

PREFIX soton: <http://id.southampton.ac.uk/ns/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?building (SAMPLE(?label) AS ?Label) (SAMPLE(?lat) AS ?Lat) (SAMPLE(?long) AS ?Long)
WHERE {
  # every university building...
  ?building a soton:UoSBuilding .
  # ...optionally with a depicting image, a label and coordinates
  OPTIONAL {
    ?image a foaf:Image ;
           foaf:depicts ?building .
  }
  OPTIONAL { ?building rdfs:label ?label . }
  OPTIONAL {
    ?building geo:lat ?lat ;
              geo:long ?long .
  }
  # keep only the buildings with no known image
  FILTER (!BOUND(?image))
}
GROUP BY ?building

It works by getting all buildings, then optionally finding the associated image, label and geodata, and finally filtering out all the buildings which do have an image.
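
If next year’s intern wants to run this from a script rather than a web form, something like the sketch below works. The endpoint URL and the query filename are my assumptions, so check the open data site if they have moved.

# Minimal sketch: run the query above with the SPARQLWrapper library.
from SPARQLWrapper import SPARQLWrapper, JSON

# assumed public endpoint for the university's open data
sparql = SPARQLWrapper("http://sparql.data.southampton.ac.uk/")
# assuming the query above has been saved to this (hypothetical) file
sparql.setQuery(open("buildings_without_images.rq").read())
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    # each row is a building with no known photo, plus its label and coordinates
    print(row["building"]["value"],
          row.get("Label", {}).get("value", "(no label)"),
          row.get("Lat", {}).get("value", "?"),
          row.get("Long", {}).get("value", "?"))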

Then I went out into the world armed with a clipboard, a list and a camera to take photos of the missing buildings. It is best to do this in good weather, so planning is vital. I would advise the next person to set this up very early in their internship; then, when the weather is good, go out and gather the data.

The next thing I did was make a map showing building internals. This was my main project for the course of my internship; the aim was to make an app which people could use to navigate around Building 37. The source code is available here, primarily in the map.js file. It uses Leaflet and leaflet-indoor to draw a map using the university’s curated map tiles, then draws polygons representing the rooms, with leaflet-indoor handling the rooms being on different levels.

I drew these rooms in QGIS, by tracing floor plans given to us for the purpose of this project. They were stored as GeoJSON polygons, with some information about each room: typically the type of room (office, way, stairs, lift), the level, and a label giving the room number.
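
For reference, one room ends up looking roughly like the sketch below, shown as a Python dict mirroring the GeoJSON structure; the property names and values are illustrative rather than the exact ones used.

# Illustrative room feature (hypothetical coordinates and property names).
room_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-1.3955, 50.9371], [-1.3954, 50.9371],
                         [-1.3954, 50.9372], [-1.3955, 50.9372],
                         [-1.3955, 50.9371]]],   # traced outline of the room
    },
    "properties": {
        "room_type": "office",   # office, way, stairs or lift
        "level": 2,
        "label": "37/2015",      # hypothetical room number
    },
}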

The next part of this project was to allow a user to navigate the map. To do this I decided to take a graph-based approach: a user starts at one node, which is connected to other nodes by edges, and moves from node to node along those edges. The nodes are placed in rooms (as destinations) and in corridors and doorways, giving a model of travel that balances faithfulness to the real layout against simplification for computational reasons. I started to do this with a breadth-first search (BFS), but this quickly became too unwieldy, as the number of nodes a BFS expands grows as roughly the branching factor raised to the depth, which amounts to the number of nodes in the graph, so the complexity is O(n). So I decided to move to an A* search, using the following heuristic:

Distance travelled along edges so far + straight-line distance to the end + (0.35 × levels changed). This means that I sort all the nodes I have reached so far by this value and expand the node with the smallest value first. Whilst the worst-case complexity is still of the order of branching factor raised to the depth, a perfect heuristic gives an effective branching factor of 1, as you assume you will always take the best route. The depth is slightly harder to justify, but our map is finite and currently covers just one building (with hopes of eventually covering the whole campus), so the depth is bounded and not directly tied to the number of nodes in the map; it can be treated as a constant too. Based on these two points, the complexity of the new navigation method is effectively O(1).
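
For whoever picks this up next year, the core of that search looks something like the sketch below. This is a rough Python illustration of the idea rather than the actual map.js code, and the node/edge data structures are hypothetical.

# Rough A* sketch with the cost described above:
# distance travelled + straight-line distance to goal + 0.35 per level changed.
import heapq
import math

def heuristic(node, goal, nodes):
    (x1, y1, l1), (x2, y2, l2) = nodes[node], nodes[goal]
    return math.hypot(x2 - x1, y2 - y1) + 0.35 * abs(l2 - l1)

def a_star(nodes, edges, start, goal):
    # nodes: {id: (x, y, level)}, edges: {id: [(neighbour_id, edge_length), ...]}
    frontier = [(heuristic(start, goal, nodes), 0.0, start, [start])]
    visited = set()
    while frontier:
        f, g, current, path = heapq.heappop(frontier)
        if current == goal:
            return path
        if current in visited:
            continue
        visited.add(current)
        for neighbour, length in edges.get(current, []):
            if neighbour not in visited:
                new_g = g + length   # distance travelled along edges so far
                new_f = new_g + heuristic(neighbour, goal, nodes)
                heapq.heappush(frontier, (new_f, new_g, neighbour, path + [neighbour]))
    return None   # no route found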

The first time I drew the nodes to navigate, I drew them in QGIS and manually added the edges. However, I eventually built a webpage that would allow me to add nodes and draw the edges automatically. This is available here in the edit.html and edit.js files.

I spent a sizable amount of August making a JSON editor in Python, available here. I was mildly proud of this, as the vast majority of the functionality is dynamically generated from a schema for the data. However, due to a lack of foresight and testing it was decided that this was not the best way to add/edit map data, so I made the edit.js code instead.

The other thing I did during my time here was to look through the work done by the previous intern. This comprised looking through an app they had written for data-gathering purposes, and some data they had gathered which had not made it onto the open data site. Once I found the data that had not been added, I worked out what it was for and got it added to the website.

Posted in Geo, Javascript, Open Data, Programming, python.


Recaptcha, routing, and more testing – Week 10

The slow and steady progress towards having a web application ready to go live continues. This week, I’ve added reCAPTCHA to form submissions and cleaned up routing to allow for URLs such as /KnowledgeBaseArticle/ArticleDetails/KB0011703 as opposed to /KnowledgeBaseArticle/ArticleDetails?query=”KB0011703″. The branding is also in the process of being redone, carefully designed to fit seamlessly with the existing iSolutions website. It will, as far as users are concerned, simply be a new section of the website, with all of the menus and navigation options remaining identical.

This has proved more difficult than I would have hoped, simply because the templates and standards I expected don’t exist. I was provided with a link to a JSON file that is supposed to represent the core university and iSolutions navigation bars, but on further inspection they don’t quite match what’s displayed on the university website, and one of the navigation bars isn’t represented in the JSON at all. When I asked about this, one of the responses I got from another iSolutions web developer was along the lines of “you can always add static links, just keep an eye on them, I’ve done it for projects before”. And when I explained that I don’t want my website to fall behind and no longer work properly after I leave, I was told “that’s normal, all websites stop working at some point”.

As an inexperienced web developer, perhaps I came into this field too optimistically, expecting more from existing systems than is entirely reasonable. Perhaps a future project, for either a full-time employee here or another intern, would be to standardise the navigation across the university website and to provide a template to the entirety of iSolutions that can be updated in one place, instead of having people all across the company running around trying to clean up the fallout whenever a minor change is made to the core website. We all work to avoid code duplication in our own projects, but if we could avoid duplication across all projects then our systems would work better, and we’d all spend a little less time reinventing the wheel and a little more time on the interesting, innovative work that really matters.

Moving on from KnowledgeNow, this week Pat and I took a trip up to Highfield Campus to meet with the open data team to discuss student engagement in the run-up to the new academic year. We started off by talking about the different kinds of users who could possibly be interested in the open data service, and then moved on to looking at the open data website and considering how it could be improved for each of these kinds of users. Having found interaction with the open data service frustrating in the past, it was interesting to talk to the people behind making and maintaining the website. The first point we came across immediately was the state of the navigation menu on the website. It’s thoroughly confusing, and the entire purpose of the service isn’t immediately clear. (Not to mention the lack of university branding; if only there were a shared university template they could have used.)


The Open Data Service Homepage

I was strongly in favour of splitting the services provided using open data from access to the open data itself, to allow non-technical students to use the services more easily, with less risk of being scared away by technical jargon and data they have no interest in using. At first this idea seemed to be seen as too much work for too little value, but in my mind it was simply a case of moving links around and bearing in mind whether a page should be targeted at technical individuals or not. By the end of the session, it was agreed that the navigation menu should be split between technical and non-technical details, and that links between the two should be minimal. I’m glad that I managed to get my point of view heard, and I really think it will help with engaging technical and non-technical students alike. Even computer scientists like things to be organised every now and then.

That wasn’t our only field trip this week; we also had an intern-wide trip to the data centre. It was really interesting to see where all of the university systems are running, and the technology involved is incredible. The entire building has been kitted out specifically to deal with supercomputers and server racks, with underfloor and overhead cooling, and the most thorough anti-fire precautions I’ve ever seen in my life. As cool as it was to see this kit in action, I felt thoroughly out of my depth. I’m not a hardware nerd, and when people describe engineers as people who have been taking apart machines since they were children, I never feel that really describes me. I’m a software engineer through and through; I preferred decision maths to physics at school and nothing’s changed. I’m grateful to have been able to see this kit, but the talk we were given about the specs of various machines and the mechanics of the systems supporting them all went over my head.

Data centre tour

Posted in Data, Open Data, Programming.



Off with the training wheels – Week 9

Last week, I spent the majority of my time refactoring and restructuring KnowledgeNow with Martin’s help, either pair programming or checking in with him about what I should be changing and how, then going away and coming back when it was done. Martin left for two weeks’ holiday at the beginning of this week, but I was under the impression last Friday that there really wasn’t much left to be done, and that I’d be able to polish it off nice and quickly all by myself. It turns out I was wrong, and one week later it feels like I’ve hardly dented my “polishing off” checklist. The missing tests turned out to be trickier to write than expected, and the extra features that needed adding were a lot more involved than I gave them credit for. The team were really supportive, and were happy to help me crack any particularly tough problems, but not having anybody else working full time on this project with me was a bigger shock than I was expecting. Suddenly I was captain of my own ship, and it was a lot more work than I realised. It’s been a really fun experience though, and the satisfaction of having my web application now report feedback back into ServiceNow, and seeing those results in the ServiceNow web app itself, was fantastic after all those hours of work to make that happen.

Adding the “Was this helpful?” functionality involved adding methods to the ServiceNow API wrapper (originally written for Fast Track Tickets) that allow new items to be posted to the Knowledge Base Feedback table. This means that users can report whether or not they found an article helpful, and are then presented with a comment box to add extra information if they wish.
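
The wrapper itself is C#, but under the hood this boils down to a POST against the ServiceNow Table API. Roughly speaking the call looks like the Python sketch below; the instance URL, credentials, and especially the table and field names are my assumptions for illustration, so check the instance before relying on them.

# Hand-wavy sketch of posting one piece of feedback via the ServiceNow Table API.
import requests

SN_INSTANCE = "https://your-instance.service-now.com"   # placeholder instance

# Table and field names below are assumptions, not confirmed against our instance.
feedback = {
    "article": "KB0011703",                  # the article the feedback is about
    "useful": "true",                        # the "Was this helpful?" answer
    "comments": "Clear and easy to follow.", # optional free-text comment
}

response = requests.post(
    f"{SN_INSTANCE}/api/now/table/kb_feedback",
    json=feedback,
    auth=("api_user", "api_password"),       # placeholder credentials
    headers={"Accept": "application/json"},
)
response.raise_for_status()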


The rest of my week was spent cleaning this code up and adding tests, something that I always seem to underestimate in terms of the time and effort required. Crossing the 50% line for test code coverage shouldn’t have been a huge achievement but it felt like it. My plan for the coming week is to keep the momentum going with writing tests and to try to cover the entirety of the project as thoroughly as possible before refactoring again and requesting a code review from the team.

Posted in Programming.



KnowledgeNow and the major clean up operation – Week 8

As the buzz of having such a successful meeting last Friday wore off, the reality of turning a mock-up into a fully functional project started to set in. Almost the very first thing I did on Monday was create a code map from Visual Studio for the project. Trying to explain the connections between classes and functions to other members of the team showed me just how little I understood my own work. It was a jumble of lines from tutorials and Stack Overflow articles that were somehow leaning against each other in such a way that they didn’t crumble to the ground. In the relatively safe and stable environment of short presentations that was fine, but the real world wouldn’t go so easy on my code, so something had to change.

Original code map for KnowledgeNow

The code map that was generated is pictured here, not for anyone to analyse and understand but more to illustrate what a large amount of code needed changing. Attempting to write tests for this mess before digging in and moving things around, I found that badly structured code is naturally harder to test than clean code. The terrible irony is that the code most in need of thorough tests is the hardest to write them for. A lot of my classes got restructured and refactored without any underlying tests; we just relied on the fact that we could revert to an older version if anything went terribly wrong, and then wrote tests for the cleaner code afterwards.

The seemingly insurmountable hill has certainly been an uphill battle, but it’s actually been quite fun at times. The satisfaction of taking something messy and working at it like clay until it’s something clean and logical can’t be overstated. We’re at the point now where the structure actually makes sense, some tests have been written, and a bunch of empty test cases have been set up ready to be filled in. Soon I can turn my attention to the requested changes to the system and the features I hadn’t included in the original mock-up.

KnowledgeNow newer code map

Profiles.soton has also been going along in the background. I attended TAG on Monday, a relatively frustrating experience at first, being told we hadn’t planned this well enough or done a rigorous enough requirements capture. The meeting soon got back to its original purpose though, and architecture options were talked about. Seeing documents I’d created on the TV at the front of the meeting room, and having people analyse them there in front of me, was a little nerve-wracking, but it was really interesting to hear what people thought of the options. My favourite option, adding everything into the Active Directory database and encouraging people to use that as their go-to source for information on users, was immediately discarded. My understanding of the purpose of Active Directory was clearly a little off: it seems it’s designed to be a very slim database only used for information such as roles and permissions. The committee seemed to quite like my second favourite option of the three, adding this information on top of IDM and reading it straight out of that system, rather than sending it downstream to Active Directory.

We decided to go away, do some more research into existing profiles systems, and return next week to present our ideas again. I spent a little bit of time investigating existing profiles systems at the university such as efolio, Pure, SharePoint, and even Blackboard. None of them quite meet the requirements we’ve gathered, but the idea of building yet another profiles system on top of the large number the university is already using, without offering some kind of consolidation across the various platforms, didn’t seem right. The plan now is to provide information to these systems and have a central system for updating it, disabling editing in the other systems being used. This allows current users of those systems to still display their information there, but centralises the process of updating it, ruling out any chance of discrepancies between systems.

Once I’ve finished working on KnowledgeNow, I’ll be helping with the development of profiles.soton, so it’s been good to keep up to date on what decisions are being made about it.

Posted in Programming.



Open Data Internship – Week 8 – Life Serial

Excluding merges, 1 author has pushed 18 commits to master and 18 commits to all branches. On master, 5 files have changed and there have been 365 additions and 435 deletions.

The last week has flown by, and whilst I have been working I can’t pin down anything of value to write about, so instead I shall run through the boring stuff. I have continued working on my data maintenance tool, implementing sorting, filtering and saving, as well as a number of validation checks. I have tidied, commented and generally made the code nice, keeping it as data-independent as possible. As a result the tool is basically finished now; two issues remain, both of which arise from outside my code, and fixing them looks like it would take more time and effort than is reasonable.

In addition to this there was an open data meeting where the issue of how to encourage greater use of the data and services provided was discussed; no real progress was made on that issue yet. However, this week there will be a meeting of all the open data people, me and another intern to discuss how to implement some ideas they have been working on for a while; specifically, how to implement them so as to have the greatest effect on undergraduate engagement with the open data service.

Posted in Open Data.


Presenting my proof of concept – Week 7

Coming back from my week away, This is Malware had moved along and was much closer to being released than when I’d worked on it two weeks ago. There were still a few features that needed adding, and I got to help out with some of those, mostly hiding UI elements depending on user privileges. My next small helpful task was to test some of the online eprints functionality after changes had been made to the plugin structure, reporting bugs back to Patrick, who then fixed them.

I also got to help write up the documentation for the profiles.soton architecture options. Profiles.soton is a profiles system that is in the early stages of development. The documents and diagrams I helped put together are going to be taken to TAG, the Technical Architecture Group, on Monday morning. I’m hoping that my diagrams and analysis will encourage the group to go for what is, in my opinion, the cleanest solution: it involves the most changes to existing systems and stores more information in Active Directory, but results in a cleaner, more easily maintainable overall system. Moreover, it would allow other systems to easily follow suit once that extra information is accessible from one source. Systems currently gather this information from various disparate sources, which could easily lead to inconsistency.

Next I carried out a small manual data management task for the eprints system. Since the update to Pure, the eprints “shelf” system has been broken, and articles now need to be added to a project manually. This was a simple task of copying and pasting publication ID numbers into the web application and adding the results to a project.

The most important part of my week, however, was preparing to present my proof of concept web application to Service Management on Friday morning. I’ve worked on this web application on and off for a few weeks, and this meeting originally seemed like the final full stop on the project. The original plan was that if Service Management wanted a system like this, they’d request it from the team and have a project added to the pipeline for development in a few months, long after I’ve left.

Presentation Notes

However, the demo was met with a more positive response than I could ever have hoped for, and they asked if we could make this live as soon as possible. Suddenly, the code I’d thrown together in a language I’d not touched in years, using technology I hadn’t really thoroughly researched, was being added to the team igit, and I began kitting it out with a test suite. In hindsight, I should have designed the system more carefully, and it may be the case that a lot of it gets rewritten to make it suitable for actually going live. I believe that by the time I’ve got this project ready for deployment, I’ll have really thoroughly learnt to always keep my code clean and maintainable regardless of its purpose and scope. As difficult as it will be to clean this up and get it production ready, I’m thrilled that something I’ve worked on during this internship could be live by the time I leave. My goal is to have it ready in time for Freshers’ Week, so that new students adapting to life at the University will be able to take advantage of an open, more accessible knowledge base.

 

Posted in Programming.



Open Data Internship – Week 7 – In the name of progress

“OK, so the plan is simple: we cut through the vent, squeeze down the air duct, take a left and avoid the IR lasers, before carefully dropping down onto the desk so as to avoid setting off the pressure sensors, then diving through the door and into the office. So, any questions?”

“Yes. Why don’t we just go in through the door behind you, which is wide open?”

 

The past two weeks have been quiet, and for quite a few days I was the only person in the office, so I have been left somewhat to my own devices. This has meant that I have primarily been working on tools for maintaining the data behind the campus navigation map, which is currently a large JSON file.

Whilst developing these tools I have been considering ways to improve the lifespan of the programs I am developing. The main way I have been working on this is by trying to keep the application as abstracted from the data as possible. To achieve this I decided to create a schema for the data used in the map, which is used to generate the interface for the data viewer/editor that I have been building. A frame for the data is created from the schema, and the data is placed into it; this allows the data structure to change, as a user could change the schema and the data would still display, albeit with holes where new, unfilled fields exist.
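
The frame idea is easier to show than describe. Here is a stripped-down sketch of it; the schema and data are hypothetical and much simpler than the real map data.

# Stripped-down sketch of the schema-driven "frame" idea (hypothetical schema).
def build_frame(schema):
    # Build an empty frame from the schema: every field present, values blank.
    return {field: build_frame(sub) if isinstance(sub, dict) else None
            for field, sub in schema.items()}

def fill_frame(frame, data):
    # Place whatever data we have into the frame; unknown fields in the data
    # are ignored, missing fields stay as holes (None).
    filled = {}
    for field, blank in frame.items():
        value = data.get(field) if isinstance(data, dict) else None
        if isinstance(blank, dict):
            filled[field] = fill_frame(blank, value if isinstance(value, dict) else {})
        else:
            filled[field] = value
    return filled

schema = {"id": "string", "label": "string", "position": {"lat": "number", "lon": "number"}}
node = {"id": "n1", "position": {"lat": 50.937}}
print(fill_frame(build_frame(schema), node))
# {'id': 'n1', 'label': None, 'position': {'lat': 50.937, 'lon': None}}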

To achieve this I created a text file which was essentially a JSON file with the ‘{‘ and ‘}’ characters replaced with tabs. Unfortunately, at the time I did not realise that was what I was doing, and went on to write a not insignificant amount of code to read this file into a Python dictionary. Mildly impressed with what I had done, I showed it to a friend, who pointed out that I was essentially reading in a dodgy JSON file, and that if I changed the format to JSON I could read the file in using about three lines of code. The following morning I spent a good five minutes trying to justify to myself a way to keep the work I had done in my project, as it felt wasteful to delete it. However, in the end it was axed.

This did get me wondering about things such as pair programming, which I usually dismiss as a waste of time and resources, since only one person can type at a time, and people have the annoying habit of having unique(ish) thoughts, which can lead to conflicts over how to achieve tasks. However, here it would have saved me half a day’s work. Eventually I reconciled these two views with a compromise which I think balances efficiency with… well, efficiency: regular code reviews. You still get two people spending their time programming, but regular reviews also help to prevent programmers from running too far down the wrong path when solving a problem, as the reviewer can point out such errors, as happened in my case. This would further help with the maintainability of code, as if you have multiple people regularly reading the code then the comments on that code should be useful.

Posted in Open Data, python.