Rayna Episode 6: I’m a Web Developer

Weekly Rayna!

This week I developed some web. I’ve come such a long way that I barely remember the week.

Kanban Chart

Before this week I had no development process. I was simply trying to accumulate as much functionality as possible by stacking chunks of code on top of each other. This was making me anxious, as you might have noticed in previous episodes of the blog. It was time to establish some structure!

I was introduced to the wonders of SourceKettle and Kanban. I used to think SourceKettle was a GitHub alternative similar to iGit. Now I’ve discovered that it has advanced task management features and the draggy feature (dragging tasks between columns!!). You could say I’m a Kanban fan.

My Kanban chart was quickly populated with all the tasks that would make my system basically usable. This was going to be my first sprint, which was estimated to take 2 weeks.

I think I’m almost done with this sprint so I might have some time to spare. Woooohoooooooooo!!!

Web Dev

How do I explain what I did without making it 100% boring… I made the back-end for asking questions, answering questions etc etc etc, then I made pages for asking questions, answering questions etc etc etc.

On the front-end, I am now using the University template which uses Bootstrap 3. It doesn’t exactly inspire my artistic senses, but it’s starting to look decent. I still have the option to abandon the UoS theme and plop all my stuff onto a blank white page and I think it would look better. I hope I’m allowed to do that. 🙂 🙂 🙂

Screenshot dump

Lecturer module page – split into Unanswered questions & Answered questions (ignore the ANSWERED labels – I was playing around):

Student module page – split into My Questions and Public Questions:

Answered question page:

Unanswered question – lecturer view:

Ask a question page:

Next Week

Next week I will keep going until I finish this sprint and I will start the next one. Yay!

I only have 4 weeks left :'(

Posted in Uncategorized.


Getting a Blessing from ECS & Starting Development – Rayna’s Weekly Intern Blog #5

Heyyyyy there, it’s time for another weekly recap of what I’ve done at work!

Getting a blessing from ECS

On Monday morning Patrick and I went to Highfield to meet up with Nick Gibbins, Director of Programmes for Computer Science, who has been the primary voice of the lecturers in this project. I expected him to mostly disapprove of my proposition, given that he was a fan of the Stack Overflow idea where students can answer each other’s questions. To my surprise, he agreed that a system where lecturers are the only ones to answer questions is feasible and would be useful. He also agreed that for Stack Overflow to work, students would have to all jump ship from their current platforms at the same time and migrate their communication on there. The only compromise we had to agree on was to drop the anonymous questions idea, because it would cause problems however we decide to implement it.

This approval of my system idea gave me internal peace after a long phase of uncertainty over the project. Phew!

Starting development

I finally started coding. I hadn’t started beforehand because of my anxiety over the concept – I didn’t know if anyone was going to want the system I had in my head. I also didn’t know what the exact requirements would be.

I used my previous dummy Laravel project (as seen in Weekly Blog #2, I think) as an example and adjusted it to fit the new system. I quickly ran into things I didn’t understand or didn’t know how to do – e.g. many-to-many relationships between models. Because I’m new to Laravel, I was doing things in a random order, accumulating enough code to get it to do something. The system actually has some complexity to it, so it took me a while to get it to a semi-semi-semi-usable state where things are hooked together.

So far I’ve got it to a point where:

  • users can be created (through the command line)
  • modules can be created (through the command line)
  • courses (instances of modules) can be created (through the command line)
  • users can ask questions through the web interface!
    • and it populates the list of courses in the drop-down with the user’s modules
  • courses can be viewed through the web interface, including all their questions
  • questions can be viewed through the interface
    • if it isn’t answered, you can answer it
    • if it is answered, the answer is displayed

Testing

I was instructed that I actually should be doing test-driven development, i.e. I should be writing tests before I develop the respective features. I knew that!

Kev was very helpful and he explained a lot to me about how testing works in Laravel and PHP in general. It was a lot to wrap my head around but I’ve started creating some tests and I’ve made a long long list of tests that I need to write. Wooooooooooohooooooooooooooooooo

Next week

  • Make a KANBAN CHART!
  • Finish my tests such that they cover my currently developed functionality
  • Convert my process to test driven development from now on

See you then!



More Selenium & More Project Planning – Rayna’s Weekly Intern Blog #4

Hey everyone and welcome to your favourite blog!

This week has been pretty uneventful and honestly not my most productive. I’ll try to pick up the pace next week and really do a lot of hands-on stuff.

Selenium continued

Recap: 7000 records on Pure have a link that needs to be removed; we are only allowed to do it through the web interface. Last week I made a script that more or less does the job. I tested it on 50 records and it worked. I spoke to the relevant person and they came this Monday to see it in action. They were very impressed because it looks like a phantom that uses the internet.

Here’s what needs to happen from now on:

  • Patrick needs to approve the code
  • Appsman team need to confirm that we’re allowed to run Selenium on Pure
  • A person from Pure needs to confirm the same thing
  • The code will then be tested on the dev version of Pure where the database has been copied fairly recently
  • If all goes well, it can run on the live Pure :O

After I showed the script to the person I’m making it for, I worked on cleaning it up and making it more aesthetically pleasing to the programmer.

Q&A System Planning Continued

Last week I emailed Nick Gibbins with an idea for a system which allows students to send questions to lecturers anonymously. It also allows lecturers to share their answers with everyone so that people don’t keep asking the same questions. His feedback was that the anonymity feature should be dropped. I personally am quite fond of that feature because it has no equivalent currently. So I decided to propose a system with optional anonymity. I am having a meeting with him on Monday to see what he thinks of my idea.

Here are the features I’ve settled on:

  • Website linked on the module pages of the ECS Intranet
  • Students can send questions to the module team regarding the module
  • Students can tick a box which makes them “anonymous” i.e. hides their identity from the lecturers
  • Lecturers can answer a question
  • Lecturers can make an answer public which shares it with everyone on the module
  • Lecturers can dismiss a question if it isn’t worth replying to
  • Lecturers can report a question which temporarily bans the sender from using the system
  • Lecturers can tag a previously answered question in their reply to a question
  • Students can tag a previously answered question in a new question
  • Students & lecturers can control what email notifications they receive

I’ve tried to achieve a balance between students and lecturers so that my system doesn’t get completely rejected by teaching staff. Here are some of my arguments as to why:

  • Appealing to students because
    • It sends questions directly to lecturers, which is valuable
    • It allows anonymity, which means shy people are more likely to ask a question
    • It keeps track of all the questions that have been asked and answered in the same place
  • Appealing to lecturers because
    • It allows them to share their answers with everyone so that they don’t have to answer the same question multiple times
    • Pages for modules will be set up with no action needed from lecturers
    • It isn’t truly anonymous so offensive users can be identified
    • It can be switched off entirely by the module leader

I also made a big set of interactive wireframes which I highly recommend checking out.

>>>WIREFRAMES

PRO TIP: click the play button on the top right for an immersive experience

I also started writing down the formal requirements (user stories) in a Word document, following another requirements document from our team. I did the wireframes first to make it clearer for myself what I wanted to do.

Tiny bit of Laravel and mod_auth_mellon

I also watched a couple more Laracasts which were on authentication. It looks like Laravel makes it really easy to set things up.

On Wednesday, Kev and I had a meeting with Clayton, who explained to us what mod_auth_mellon is and how we might use it to authenticate our users. Kev understood everything and I understood almost nothing due to my lack of prior knowledge about that stuff. Oh well! I’ll learn it as I go.

Staff Party

I was the only person from TID Web who attended the Staff Party on Wednesday. I went with a bunch of interns and we had a good time. I got two ice cream cones because the second time I was able to cut the line to where my friends were. The lines were crazy long – I guess it was hard to find a second ice cream tricycle for the occasion.

THE END – SEE YOU NEXT WEEK


Questioning Everything & Selenium – Rayna’s Weekly Intern Blog #3

Recap

Last week I did some hardcore data gathering. I interviewed 8 students and 3 lecturers to see if I should make a system for asking questions anonymously in lectures. Then I discovered Meetoo – a system already purchased by the university which does exactly everything that I was about to implement.

I figured another worthy idea was to make something like Stack Overflow but simpler and more geared towards students. Lecturers and students were fans of that idea generally, but I didn’t gather quite enough evidence that it was worth making.

At the end of the week that was pretty much the only idea I had for my project, and I wasn’t exactly sure of it either. Oh well!

This week – questioning everything

On Monday and Tuesday, Patrick and I were trying to figure out what I should do. Because most of my data gathering was focused on something else, I didn’t really have enough evidence to back up any project idea, so we were questioning everything. Here are the main points that we came up with in our discussions:

  • The Stack Overflow idea has been attempted before in different ways, mainly STACS Overflow, which was ECS-exclusive and was discontinued due to lack of interest from students
    • It likely won’t be successful this time around either
    • The fact that students can answer questions means that lecturers are less likely to engage with it
  • Making a system for ECS students is more likely to be impactful than trying to make a system for the whole university
    • Because we can integrate it with the ECS Intranet where students will find it
    • Because I have way more ECS friends that I can interview (including interns)
  • Alternative idea – questions are sent privately to the lecturer and are only posted publicly if the lecturer chooses to publicise them
    • Forces lecturers to engage
    • May fail for that reason
  • Anonymity => students say mean stuff anonymously

After MUCH debate we settled on the following idea:

  • Tab on the module page on secure.ecs
  • Send a question to the lecturers for the module anonymously
  • Lecturers can choose to:
    • answer
    • dismiss
    • report
    • if answered, they can choose to post the question & answer for everyone to see
  • If a lecturer reports a student, the student receives a ban from the system
  • The module leader can choose to disable the Q&A system altogether

I made a very detailed document describing this idea on Google Docs, but it seems to have been reverted to a very old version. How?? Anyway..

I emailed Nick Gibbins (Director of Programmes for Computer Science) to ask for his opinion on this idea. He had some criticisms, but generally wasn’t 100% opposed to it, to my surprise. He was more of a fan of the Stack Overflow idea, but if I have to listen to the facts, it’s been done before and it wasn’t popular, so my hands are tied. 🙁

He also proposed to drop the anonymity part, because if someone abuses the system very badly a la Professional Development snafu, they would have to be de-anonymised to face justice, which might defeat the purpose of the anonymity. I think optional anonymity (tick box) is a good option to encourage people to not be anonymous.

It seems that the idea as a whole could work. Yay!

Selenium

So there’s this problem on Pure where 7500 records point to a link which no longer exists. We aren’t allowed to fiddle with the database, so the links have to be removed by hand through the web interface.

I timed myself doing it by hand and it takes 20 seconds per record at the fastest. That means over 41 hours in total of intense clicking.

Thankfully Selenium exists and I had explored it a little bit while working for the Quality & Test Team. As far as I’m aware, Selenium should be allowed, given that it’s the same as a super fast person clicking around.

I was given access to a sandbox version of Pure to play around in. First I made a program which creates records, because that part is even more tedious and time-consuming. I left it cooking overnight and I had ~500 records the next day. I then made a script which removes a link that partially matches a string. I tested it on 50 records and it worked flawlessly.
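In case you’re wondering what the removal script roughly looks like, here’s a Python sketch. Everything Pure-specific below (the selectors, the button names, the record URLs) is invented for illustration – the real script depends on how Pure’s edit pages are actually built:

```python
# Rough sketch of the link-removal script. All selectors and URLs are
# hypothetical placeholders, not the real Pure interface.

def links_to_remove(hrefs, fragment):
    """Return the links whose URL partially matches the obsolete fragment."""
    return [href for href in hrefs if href and fragment in href]

def remove_obsolete_links(driver, record_urls, fragment):
    """Visit each record's edit page and delete any link matching `fragment`."""
    # imported here so the pure helper above has no dependencies
    from selenium.webdriver.common.by import By
    for url in record_urls:
        driver.get(url)
        for anchor in driver.find_elements(By.CSS_SELECTOR, "a.record-link"):
            if links_to_remove([anchor.get_attribute("href")], fragment):
                # assume the delete button sits next to the link
                anchor.find_element(By.XPATH, "..") \
                      .find_element(By.CSS_SELECTOR, "button.delete").click()
        driver.find_element(By.CSS_SELECTOR, "button.save").click()
```

The same pattern (drive the browser, find elements, click) is what made the overnight record-creation script possible too.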

The only thing left to do is to modify it so it works with the live version of Pure (slightly newer) and the real records. Oh and also getting approval from the people that have given me this task.

Next week

  • Meeting people who will hopefully approve my Selenium script
  • Making a start on the Q&A project in Laravel
  • Learning what an authentication melon is (mod_auth_mellon)

Thanks for the attention – like, comment & subscribe!



Talking to People in Highfield – Rayna’s Weekly Intern Blog #2

Recap

Last week was my first week as an intern. I spent most of it learning Laravel from scratch. I also planned my requirements gathering for my project, which would involve 2 focus groups with students and interviews with as many lecturers as possible. These were meant to be done this week.

My project was going to be a system for asking questions anonymously in lectures, which addresses the intimidation of raising a hand in a lecture to ask a potentially dumb question.

Focus Groups – gathering info from students

Focus Group 1

My first focus group had 3 students from archaeology, English-and-film and computer science respectively. We brought pizza as an incentive for the students to participate.

We learned the following things:

  • Students are, in fact, intimidated and not likely to ask questions by raising their hand in a lecture
  • Another problem is when lecturers ask students a question – students are too scared to answer for fear of being dumb
  • Therefore, students would like to be able to ask or answer questions anonymously online during a lecture
  • Another useful feature would be to indicate that they don’t understand at a certain point
  • Paying attention in a lecture is crucial, so a distracting app might do more harm than good
  • The key features that students value are anonymity and getting answers from the lecturer

The full notes of what they said can be found here, and the report (which includes their wireframes) can be found here.

Focus Group 2

The second focus group had 4 students all from computer science, all Bulgarians – it was a Bulgarian programmer pizza party.

They provided the following extra information:

  • They too never ask or answer questions by raising a hand (with small exceptions)
  • They agreed with all the previous points, except the below point
  • They thought in-class quizzes were unfeasible because they’re extra effort for lecturers that they don’t need to put in currently
  • As computer scientists, they were leaning towards a Stack Overflow style question and answer platform, where students can answer and upvote questions and answers
  • They suggested that students shouldn’t be able to downvote questions, as there is no reason to discourage people from asking questions

Full notes of focus group 2 · Report of focus group 2

Discovering that an app like this already exists

Some students had told me that an app similar to what I’m planning to make is already available on the MySouthampton app, called Meetoo. I kept that in the back of my mind with the intention of researching it later.

Well, maybe I should have researched it earlier because once I looked into it, I discovered that it had 90% of the main features people asked me about:

  • People can ask questions (potentially anonymously) during a lecture
  • People can upvote questions to push them to the top
  • The lecturer can launch a poll with a question and people can vote anonymously on their phones
  • Stats from the poll can then be displayed immediately afterwards

They even had a video on YouTube of Southampton students praising the app! Ugh!

With this in mind, I went to do my lecturer interviews with a somewhat reduced level of confidence.

Interviews – gathering opinions from lecturers

It’s not that I didn’t try to find any lecturers from outside of ECS – the 6 lecturers that I emailed just didn’t reply to me. I don’t blame them – they have no connection to this so they probably wouldn’t prioritise it over other tasks etc.

I interviewed 3 lecturers from ECS – Nick Gibbins, Kirk Martinez and Klaus-Peter Zauner. I showed them wireframes for 3 different ideas for systems:

  1. A system for asking questions anonymously during lectures (like the one that already exists, sigh)
  2. A system for asking questions anytime anonymously online (like Stack Overflow but simplified and mobile friendly)
  3. A system which only has a button which says “I don’t understand”

All wireframes of the system can be found in the report linked at the end of this section.

The opinions were pretty consistent and can be summarised to this:

  • It is hard to see how System 1 can be done in a way that’s convenient for the lecturer
  • It is preferred that the students ask questions by raising a hand
  • System 2 seems like a good idea if executed correctly
  • It should encourage students to answer each other’s questions
  • System 3 could be useful but faces some of the challenges of System 1, mainly having to adjust the lecture flow to fit it
  • System 3 might put too much pressure on the lecturer & give too much control to the students that might not know what’s best for them

I could not get a definitive answer of whether System 2 is worth implementing, but the general opinion of it was positive.

The detailed report of the lecturer interviews can be found here.

Conclusions from the data gathering

  • The initial idea exists and is provided by the university (Meetoo) and is satisfactory as it is
  • There is some interest in the idea of anonymous questions anytime with written answers (stack overflow style)
  • I’m not sure if I need to do more data gathering

I’ve learned a lot but I am confused. Maybe this is what wisdom feels like.

Bonus – Accessibility workshop!

The team went to an accessibility workshop in Highfield, where they taught us some stuff about accessibility. Yay!

I learned that there are some handy guides for accessibility online, that there is incoming legislation which will make accessibility mandatory in more cases, and that there is no way to fully assess accessibility in an automated way, although tools exist to help.

Next week

I’m not really sure what I’ll do next week. Possibilities include:

  • Another focus group with students with a focus on the stack overflow idea
  • More interviews with lecturers, hopefully from outside ECS
  • Writing up requirements properly
  • Trying to make a Hello World in Exchange Web Services
  • None of the above

Have a nice weekend!


Getting to know Laravel – Rayna’s Weekly Blog #1

Introduction – about me

Hi, I’m Rayna (pronounced similarly to China) and I am the new TID intern. I just finished my 2nd year studying Computer Science and I am an aspiring web developer, as demonstrated on http://bit.do/raynaslinks – the best website in the world. Last summer and through the academic year I worked for the Quality & Test Team in iSolutions, so I’ve been here for a bit.

My main project for these 12 weeks will be a system for anonymously asking questions in lectures. Other tasks of mine will include manually removing an obsolete link from thousands of records in Pure, as well as anything else my team wants me to do.

What I did in my first week

My first task was to set up Linux alongside Windows on my work laptop (because I wanted Linux). It took a few hours of mainly waiting and unsuccessful attempts at making a bootable flash drive. In the end I had a healthy new Ubuntu and unaffected Windows. Then I had to set up Laravel, which was straightforward.

I spent most of the week watching and following the official Laravel tutorials (Laracasts) to familiarise myself with Laravel. I developed some understanding of it and I was able to produce a stub of my intended project with very limited functionality.

The screenshot below shows the initial screen, where you can type in the “lecture code” (the lecture’s number in the database) and it opens the lecture’s page.

On this screen you can see the title of the lecture along with the lecture code, as well as all the questions asked so far. You can also add a question. There are no users and logins – the prototype is very simple.

I also planned my requirements gathering process (described in the next section), contacted people to participate and prepared questions to ask them & wireframes to show them to spark a conversation.

What I will do next week

Next week the requirements gathering will begin. We’re putting together a focus group on Monday with students from different disciplines (excluding ECS). The focus group will aim to capture their current experience asking questions in lectures, their experience with any existing solutions for anonymous questions, and their expectations from an ideal system with that purpose. Later in the week another focus group will (probably) happen, where ECS students will be interviewed. The reason for the separation is that ECS students tend to have loud opinions on technology which may overshadow or contrast with the views of less tech-oriented students. Some students might be asked to participate in individual interviews later in the week also.

The information gathered from students will be used as a basis for interviews with lecturers (assuming students agree with the idea of the system). We will aim to interview lecturers from different disciplines (e.g. health sciences, ECS, humanities) to capture a broader set of experiences and requirements. Lecturers will also be shown wireframes and will be asked for any comments or criticisms.

On the technical side, I shall be trying to “help” my teammates with their endeavours which involve:

  • Automated Testing
  • Pure
  • Team City

Another big task will be analysing the data from the focus groups and interviews and writing up formal requirements, as well as making wireframes of the design of the “final” system.



Meepcraft Minecraft Cafe Checklist

The University of Southampton kindly lets me sign out laptops so I can run a Minecraft cafe for “The Big Day In”, which is an autism-friendly family event. This morning I’ve just been prepping the laptops, and I thought this checklist might be helpful for other people running a similar event. I’ve learned lots of things that help make it go more smoothly.

Meepcraft setup

  • Whitelist our accounts in advance

Laptop setup (from fresh install)

  • Connect them to my home wifi
  • Disable the setting that turns off the trackpad while using the keyboard… MC needs you to use both at once.
  • Open a web browser
  • Search “Minecraft Download” (It’s easier than navigating from the MC homepage)
  • Download Java edition installer
  • Install it with the default options
  • Tell it: Yes I want to let it make changes to my system
  • Tell it: Yes now finish and open minecraft
  • Log in with a unique account
  • Wait for more stuff to download
  • In Minecraft
    • Set Master volume to 50%
    • Set Music volume to 0%
    • Set Weather volume to 50% (or 0%?)
    • Open multiplayer and tell the OS it’s OK for it to be connecting to the network.
    • Add Meepcraft to the saved servers and check it worked.
  • Bonus: some kids want to see mods, so getting Technic Launcher, FTB or something similar set up would be good, but some of these have merged into Twitch and now require accounts to be set up and other bother.
  • Download any maps we might want on the local machine (London, Southampton, Ventnor, Redstone)

Accounts

  • The first MC event I ran, I logged in once then disabled wifi to stretch a single login, but this is a bit naughty. And you can’t go online or their servers will spot multiple logins and that’ll be a bother.
  • The next few we “borrowed” logins from friends who let us use them for the day which was better, but a pain to organise.
  • Now we have a set purchased for outreach events, with logical user names & email addresses which makes things much easier. Even if Southampon04 is a boring name.

Stuff to bring

  • Laptops are heavy and it’s easiest to be self-sufficient, so a rucksack is good.
  • Pack the plugs in a cotton bag to stop them scratching other things.
  • USB mice as most kids don’t seem to like trackpads as much.
  • 3D glasses – these are fun for “anaglyph mode”. They are dirt cheap too, so an easy giveaway.
  • A bag of cardboard Minecraft blocks, which aren’t essential, but as I’ve already got them they can help decorate the space, and sometimes distract an annoying little sibling.
  • Wet wipes. I’ve never actually needed them but I’m always aware of the risk of jammy fingers on laptops I don’t own. The fact it’s not been a problem suggests that most kids are already used to not getting jam on iPads.
  • Child-friendly laminated explanation on how 3D anaglyph works.
  • Booklets on educational maps like London & Redstone. These include teleport locations of landmarks for London & Southampton. These are from the University Science and Engineering Family Day, and I don’t expect to need them but it can’t hurt if someone’s keen.
  • A memory stick for moving data around.
  • Power supplies for all the laptops.
  • A power bar with enough sockets for all the laptops.
  • Additional sockets for people who bring their own laptop.
  • A mains extension cable.
  • Tape to make the cable safe in case it crosses a walkway.

Ideally the electrical equipment should have an in-date “PAT” test.

For this event I’m not planning a mechanism to limit how long a child uses a laptop. With the science and engineering day I tried timers (some kids cheated on them), so the next year I just hit a big gong every 15 minutes which worked OK until I thoughtlessly bonged it with someone’s elderly gran sitting right by it!


Posted in Minecraft.


GDPR preparations

I am not a lawyer. This blog post represents my own understanding and not an official view of my employer.

The key thing with GDPR is that we’re not going to be perfect, but we can certainly improve. I’ve been working on my team’s GDPR activities since January, and as our team deals with many small, innovative and legacy databases, this was… interesting. I’ve audited 88 “information assets”, most of which contain information about people in some way or another.

The single most useful thing we’ve done so far is turn off (and erase) a few datasets and services that were not actually required. I’ve also identified about 4 datasets that I was pulling from source and pushing into systems that no longer exist or now get their data elsewhere.

I’ve been trying to boil things down to a few areas for people to be thinking about. This isn’t a complete list; it’s more about how to spend limited resources in the most appropriate way.

Why bother?

The obvious answer is “to avoid fines”, but the better answer is “to do right by our users”. Data breaches should be minimised, but we shouldn’t just concentrate on avoiding them; we should also minimise their impact if they do happen. Not holding data inappropriately, and not keeping it in inappropriate locations or giving inappropriate access to it, can go a long way towards reducing the actual harm caused. Keep this in mind and you’ll be on the right track.

Audit all the things!

We need to be aware of which systems we are responsible for that hold personal data, what data they contain and why it’s used.

This chore is largely done for our team.

Start with high risks

A common problem I have with addressing GDPR is “what about… <obscure system that has a users table>”

Many of our systems have records of usernames or other IDs that caused actions, eg. approved something, updated a wiki page, updated a webpage. While these are technically in scope of GDPR, they are at the bottom of the TODO list.

For example, we’ve a few dozen sites using the Drupal CMS. Each site has under ten users. The user records may contain a person’s name, and they do contain their email address. They contain a list of what that person has edited on the site. There’s also the implicit metadata that this list of users are the editors of this website. However, the risk of breach is low to medium. Drupal is a target for hackers, but generally not to steal this kind of data [citation required] (this is an admitted assumption on my part). The damage done by such a breach is also relatively low compared to leaking a larger list or more detailed information.

I find making these calculations difficult, because it feels like I’m saying such a breach doesn’t matter. It does, but other risks matter more and should receive more investment of resources, both to prevent them from happening and to mitigate the consequences if they do. Which brings me nicely to:

Shut off anything unnecessary

This is work we should be doing right now!

Any system which is no longer needed should be shut off.

Systems that have been shut off for over a year should probably have their data erased entirely. It’s tempting to keep things forever “just in case”. We have to stop doing that. If in doubt, agree it with management and/or the person doing the role that owns that data.

Some systems have unused tables or fields with data about people. These should just be removed.

Some systems have data feeds to/from other systems which provide more information about people than is required. Any fields that aren’t required should be identified and removed from the feed, rather than just ignored by the target system (which is what we sometimes lazily do now).
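To make the feed-trimming idea concrete, here’s a minimal Python sketch of minimising records at the source. The field names and the “required” set are invented for illustration:

```python
# Strip fields that aren't required from a person-feed before it leaves the
# source system, instead of letting the target system silently ignore them.
# All field names below are made up for illustration.
REQUIRED_FIELDS = {"username", "email", "role"}

def minimise(record, required=REQUIRED_FIELDS):
    """Keep only the fields the downstream system actually needs."""
    return {field: value for field, value in record.items() if field in required}

# Example record carrying more personal data than the target system needs:
record = {"username": "ab1c23", "email": "ab1c23@example.ac.uk",
          "role": "student", "home_address": "redacted", "date_of_birth": "redacted"}
print(minimise(record))  # only username, email and role survive
```

The nice side effect is that the feed itself documents what the downstream system uses.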

Remove unused cached copies of personal data.

A more subtle thing is who has access to data about other people. It’s easiest to remove data entirely, but if that’s not possible then consider how to restrict access to only the people who need it. Does everyone need to see the fields with more sensitive information?

It’s worth telling our GDPR team when work like this is done, so they can note that the work was done, or note that the system is now off/erased.

Reminder: sensitive data

Breaches of data which can cause additional harm to the subject are treated as more serious. The official list is as follows:

  • the racial or ethnic origin of the data subject,
  • their political opinions,
  • their religious beliefs or other beliefs of a similar nature,
  • whether they are a member of a trade union (within the meaning of the Trade Union and Labour Relations (Consolidation) Act 1992),
  • their physical or mental health or condition,
  • their sexual life,
  • the commission or alleged commission by them of any offence, or
  • any proceedings for any offence committed or alleged to have been committed by them, the disposal of such proceedings or the sentence of any court in such proceedings.

I’ve been checking for any unexpected issues and found a few surprises:

  • membership of certain mailing lists could indicate trade union membership, sexuality, ethnic origin, religion.
  • “reason for leave”, ie. why someone was off work, can include physical and mental health info.
  • I also read through every reason a student has had an extension to a coursework deadline, as this is recorded in the ECS Handin system. It’s a free-text field, but thankfully people have used it responsibly and just list where the authority for such an extension came from. Although there’s a batch that just say “volcano”, which is the coolest excuse for handing your coursework in late!

Data protection statements

This is another thing we should already be doing.

Any service which collects information from people who are not current members of the university should have a statement clearly saying how that data will be used. If you think you might want to use it for another purpose (eg. analysis) later, say so, but don’t be too vague. eg. If someone signs up for a university open day, are we going to add them to our database to send them more information on related topics, or keep this request on file for analysis? We probably are, so we should say that.

EPrints lets people request an unavailable paper, and that logs the request and their contact info and passes it on to the creators of the paper. You know what? We probably do want to do some analysis on how that service is used, so we should say so up front. I’m thinking something like

“We may also use this information to help us understand and improve how this service is used. Other than passing your information to the creators of this work, we won’t share individual details outside the University of Southampton, but data aggregated by internet domain, country or organisation might be shared or published in the future.”

While most of the information our staff and students submit into our systems probably doesn’t need additional data protection sign-off, it still may be required if we’re going to use that data for something unexpected or not to do with their relationship to the university. eg. If we collected data on how email was used by our own members for service improvement, that probably doesn’t need a specific statement. If we were using it for a research project, then consent would be required. If in doubt, ask the GDPR office.

Data retention periods

For all retention, it’s a trade-off. It may harm people to keep the data too long. It may harm people not to keep it long enough.

The university has several key sets of people we have data about:

  • Students
  • Staff (and “Visitors” who are treated like staff on our IT systems)
  • People who are neither staff nor students but interact with us.
  • Research subjects (eg. people a research project collected data on)

Research projects’ data retention is usually very clear, and handled as part of ethics. As GDPR beds in, its principles should be incorporated, but consent is generally already given.

Data on interactions with the public (eg. open days, logged IP addresses, conference delegates) will all have an appropriate retention period but it’s not yet clear what these will be.

For data about staff and students the retention period will either be years since the data was created, or years since the student graduated or the person left the university. Possibly it could be a period after they leave a particular job post.

What we should be doing right now is have a plan for how either of these retention policies could be implemented. I think it’s more likely that the years-since-data-creation method will be used for most things, as it’s so much simpler.
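The years-since-data-creation policy is, at its core, just a periodic purge keyed off a creation date. A minimal sketch of the idea (the table name, column names and six-year period are all hypothetical, chosen purely for illustration):

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical retention period and schema; the real values vary per system.
RETENTION_YEARS = 6

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extension_requests (id INTEGER, reason TEXT, created TEXT)")
conn.execute("INSERT INTO extension_requests VALUES (1, 'volcano', '2010-04-16')")
conn.execute("INSERT INTO extension_requests VALUES (2, 'illness', '2018-01-05')")

# Years-since-data-creation policy: purge anything created before the cutoff.
today = datetime(2018, 3, 1)
cutoff = (today - timedelta(days=365 * RETENTION_YEARS)).date().isoformat()
deleted = conn.execute(
    "DELETE FROM extension_requests WHERE created < ?", (cutoff,)
).rowcount
conn.commit()

print(deleted)  # 1 (the 2010 record is purged, the 2018 one kept)
```

The leaving-date variants are harder, because they need a feed telling each system when the person actually left, which is exactly why the simpler method will probably win.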

SARs: Subject Access Requests

It’s likely to become more common for people to ask for what the organisation knows about them. Not all information is covered by this, but we should be ready for it.

What we should be doing right now:

Document the primary way people are identified in each system you are responsible for:

  • Staff/Student number
  • Email – the university provides everyone with several aliases to make this more complex
  • Username
  • ECS username – ECS had a different accounts system and 180 staff who’ve been here forever have a different username to their main uni one
  • UNIX UID
  • ORCID
  • An ID local to this system (if so, is that linked to any of the above? If not how are we going to identify that it’s the right person?)

Think about how the person, or an authorised person, could extract all their information from the system in a reasonable form (XML, JSON, ZIP, HTML, PDF…). For many of our systems this would currently be a set of manual SQL queries, but where possible these should be available as tools, both for the person to get their own data and for admins to get anybody’s.
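Turning those manual SQL queries into a tool could be as small as a function that gathers everything keyed to one identifier into a machine-readable bundle. A minimal sketch, with an entirely hypothetical schema and data:

```python
import json
import sqlite3

# Hypothetical schema; in practice the queries are system-specific.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (username TEXT, title TEXT, asked TEXT)")
conn.execute("INSERT INTO questions VALUES ('cjg', 'Where is the lecture room?', '2018-02-01')")

def export_subject_data(conn, username):
    """Gather everything held about one person into a JSON bundle for a SAR."""
    rows = conn.execute(
        "SELECT title, asked FROM questions WHERE username = ?", (username,)
    ).fetchall()
    return json.dumps(
        {"subject": username,
         "questions": [{"title": t, "asked": a} for t, a in rows]},
        indent=2,
    )

bundle = export_subject_data(conn, "cjg")
print(bundle)
```

A real version would union the results from every table that mentions the person, which is exactly why documenting the identifiers (above) comes first.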

These requests are coming. We don’t know exactly how, but it’s probable some people will make them just out of curiosity and ask for everything they possibly can. We need to keep the costs of these down.

Obviously, if responding to such a formal request, ensure that you only pass the data on to an appropriate person in legal who is handling the request. More formal methods are likely to evolve.

The right to be forgotten

Under the old DPA, people have always been able to demand that information held about them is corrected. Under the new law they can also request to have information about them removed.

It seems very unlikely that current staff or students will make such a request about their current job or course, and it’s unclear if that would be a reasonable thing to request. However, someone could ask for information about a past course or post to be purged. It’s impossible to find every file or bit of paper, but it’s quite likely we might be asked to remove them from a given system.

What we should be doing right now: considering how we would do this on the systems we run, and, if it’s a likely request, starting to implement features to enable it.

Finally, email mailto: links

This is a novel but simple way to reduce data breaches caused by people picking the wrong email alias from their address book. When someone clicks on a mailto: link in a webpage, it’s usually just in the format “acronymsoup@example.org”. However, you can write email addresses with display names included, so that when people mail them, the human-readable name is saved into their local address book and they are less muddled about who they are sending to. Sometimes big lists of people have email addresses that look very similar to those of a single role or office. This can cause data breaches.

Compare these two mailto links:
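The original links haven’t survived here, but the idea can be sketched in Python: the first form saves only a cryptic address into the reader’s address book, the second also carries a human-readable name. The display name “ECS Web Team” is a made-up example, and note that mail client support for display names in mailto: links varies.

```python
from email.utils import formataddr
from urllib.parse import quote

plain = "acronymsoup@example.org"
# formataddr produces the RFC 5322 "Display Name <address>" form.
named = formataddr(("ECS Web Team", plain))

# The two links to compare: address only, vs. address with a display name
# (percent-encoded so it is a valid URI).
href_plain = "mailto:" + plain
href_named = "mailto:" + quote(named)

print(href_plain)  # mailto:acronymsoup@example.org
print(href_named)  # mailto:ECS%20Web%20Team%20%3Cacronymsoup%40example.org%3E
```

When the second form is used, a reply or a later mail to that entry shows “ECS Web Team” rather than a string of characters one typo away from the all-staff list.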

Continued…

Posted in Best Practice, Data, GDPR.


Maturation of an organisation’s open data service

Our open data service, data.soton.ac.uk, has been around for a long time now. Most of our sister services at other UK universities have been de-invested and are gone or limping along quietly. Currently, we still have a full time open data specialist, Dr Ash Smith.

What matters to our “decision makers” isn’t open data. It’s reputation, student satisfaction, sustainability and saving money. Our open data service enables most of these, with the possible exception of sustainability. I’ve been thinking about how to reframe what we actually have from these perspectives, rather than from an idealist early adopter viewpoint.

What we’ve actually built is a corporate knowledge graph that only contains public information, and primarily contains current information. Our archives actually have what the menu was for lunch in the staff restaurants, and the student Piazza, every single day for the last few years. Nobody cares. It’s just what’s for lunch today that’s in the live knowledge graph (aka the SPARQL database, aka triplestore).

What has open data done for us, anyway?

Having this information all in one place has enabled some valuable services to be produced by our own team. As it was already cleared as open data, there’s no hassle getting permission to use it to make a new service, even though it contains data from several different parts of the organisation.

The crown jewel of the services it enables is maps.soton.ac.uk, but there’s a number of others. The actual “open data” can be viewed as just another service enabled by this knowledge graph. One of the easily missed, but useful, features is the ability to view and download simple lists of things, like buildings or parts of the organisation, and to view a page per, er, thing with a summary of information available about that thing. Of these pages, the most valuable are the pages for rooms used for student teaching. These are linked to by the timetable system, so are a part of our information infrastructure now.

The open data has enabled several useful developments. Primarily excellent maps of campus produced by Colin Williams (PhD Student) and Chris Baines (Undergraduate). The problem with these maps is that they were so useful we needed to support them when they left and the best approach for us was to nick all the good ideas but rebuild our map from scratch. The current map.southampton.ac.uk wouldn’t exist without their work, which only happened because the data was and is open, so they could play.

Another innovation Colin inspired was the augmentation of corporate data. Our university didn’t have a good database of building shapes (as a polygon of lat/long points), a reference lat/long per building, a photo of each building, etc. Colin started producing these as datasets which augmented the official data we get from Planon, and since then we’ve hired summer interns to maintain and improve them. This includes the lat/long and a photo of most building entrances, which wasn’t much work for the interns to create and needs little curation, as not many entrances move or change each year. Once the entrances have ID codes, we can then start to make more data about them, such as which entrance is best for getting to a given lecture room.

Where we’ve seen less return on our investment is in providing resolvable URIs that give data on a single entity. These return RDF, and the learning curve is too steep for casual users. I’ve spoken to people using regular expressions to extract data from an RDF/XML page, and that’s a mismatch between what our users need and what we provide.

Sadly, organisational open data services have not caught on. Yet. It’s still not normal, and I suspect open data is just starting its way up the “Slope of Enlightenment“. The recent work on UK Government open registers is a great example. It’s simple and knows what it’s there to do. It’s learned lessons from data.gov.uk and gov.uk, and it’s built on a really well designed API model; unless you look, you wouldn’t notice how simple and elegant it is. It’s a normal and sensible thing for any government to provide in the digital age: official lists of things and the codes for those things. This is simple and valuable, like having a standard voltage for mains and the same shaped plugs, or the train tracks in Southampton and London being on the same gauge. It’s clearly good sense, but it didn’t happen by luck.

Our work on the open data service has also taught us loads, and I’m proud to have helped lead a session at Open Data Camp in Belfast, which produced a document listing crowd-sourced solutions to improving open data services. A few years back Alex Dutton (data.ox.ac.uk) and I produced a similar document listing our experiences of the challenges of setting up an open data service. I’m really proud of both of those. The meta-skill I’ve learned is to be more introspective, both as an individual and as a community, so we can work out what we’ve learned and share it effectively. Hey, like this blog post! Meta!

Where are we now?

Where we’ve stalled is that we now have all the corporate data that’s practical to get, so new datasets and services are becoming rarer. One of our more recent additions was a list of “Faith-related locations in Southampton“, which has value both to current students and to students considering moving to the city, but from a technical point of view it was an identical dataset to the one listing “waste pickup points” for the university. With the exception that a picture of a place of worship is usually quite nice, and a picture of a bin store is… less so.

Over the summer of 2017 our intern, Edmund King (see this blog for his experiences), experimented with in-building navigation tools. The conclusion was that the work to create and maintain such information for the university estate was too expensive for the value it would provide. When we did real tests we discovered lots of quirks like “that door isn’t to be used by students” or “that internal door is locked at 5pm”, and these all massively complicate the costs of providing a genuinely useful in-building navigation planner. Nice idea, but it can’t be skunkworks, and that’s a perfectly good outcome.

As new datasets are getting rarer, we’ve been looking more at improving rather than expanding. Part of this has been work to harden each part of the service, and get it running on better-supported infrastructure. The old VMs Edward and Bella have lots of legacy and cruft. The names come from the fact Edward used to do all the SPARQL but then the SPARQL moved to Bella. I suggested Beaufort and Edythe as names for the new servers but that’s mostly got me funny looks.

Another part of our current approach is the shocking move to retire datasets! Now we’re focused on quality over quantity, the “locations of interest for the July 2015 open day” dataset needs to just go away. It’s not been destroyed, just removed from public view as not-very-helpful. There’s also a few other datasets that seemed a good idea at the time but are more harmful than useful as they are woefully out of date, like our “list of international organisations we work with” that’s about 6 years out of date.

Where do we go from here?

The biggest issue is “how do we move forward as a service”, or maybe even “should we?”. My current feeling is that yes, we should, but focusing on the knowledge graph to enable joined-up and innovative solutions, with open data as just another service depending on it, not the raison d’être for the project. Open data, done right, will continue to enable our staff and students to produce better solutions than we could have thought of, which we can sometimes incorporate back into our offerings. Last year a student project on the open data module produced a Facebook chatbot you could ask questions about campus, and it would give answers based on your current location. eg. If you asked it “where can I get a coffee?” it would identify that “coffee” was a product or service in our database, look at points of service that provided it, filter out the ones that were not currently open, and send you a list of suggestions starting with the one physically closest to you. I investigated the complexities of running it for real, and found it was a bit brittle, needing 3rd-party APIs and lots of nursing to understand the different ways people ask questions. Also, there are big data protection implications in asking where people are and what they want in a machine-readable way!

The point is that open data stimulates innovation. Not as much as we’d like, and it doesn’t do our job as uni-web-support for us, but it helps us find ways to do it better.

Long term I think the service needs to stop being a side-project. We should strip back everything that we can’t justify, and just have a knowledge graph be part of our infrastructure, like biztalk. We then turn the things built on top into normal parts of IT infrastructure. Ideally the pages for services, rooms, buildings etc. would merge into the normal corporate website, but this raises odd issues. We have been asked what the value is in providing a page on a shed. For me, it’s obvious, and that makes me bad at explaining it.

We could keep a separate “innovation graph” database which included zany new ideas, and sometimes broke, but the core graph database should be far more strictly managed, with new datasets being carefully considered and tested that they don’t break existing services.

What does the future hold?

In the really long term, well-structured, auto-discoverable open data should be the solution to the 773 frustration. If you look at the right-hand side of that diagram, almost everything is lists of structured information. That information isn’t special, either. It’s information many other organisations would provide, and with the same basic structure. One day maybe we can have nice discoverable formats for such information and get over using human-readable prose documents to convey it. We did a bit of work early on suggesting standards for such information from organisations, but this was trying to answer a question that nobody was yet asking. I still think that time will come, and when it does we’ll look back and laugh at how dumb 2018 websites were, still presenting information as lists in HTML. The middle ground is schema.org, with which I have a bit of a love-hate thing going. It’s excellent, but answering the wrong question: it helps you get your data into Google. I don’t want my data needlessly mediated by corporations, but I get that most people don’t really care so much about that.

The good news is that once people have seen something done in a sensible, interoperable way, it’s hard to go back. I can’t imagine people buying a house with just “Apple” sockets that didn’t fit normal appliances. Then again, computer systems are less compatible now than 10 years ago, so who knows for sure?

I’m optimistic that eventually we’ll achieve some sea-change moment in structured data that will be impossible to backtrack from. But such “luck” requires a lot of work, and we may fail many times before we succeed.

We didn’t quite change the world with data.southampton, but the by-products are valuable enough to have easily returned the investment.

Posted in Open Data.


Exploiting the Bitcoin aftermath

I don’t fully understand the ins and outs of Bitcoin. What I do just about understand (correct me if I’m wrong!) is that the amount of processing it’s worth doing per day is a function of the price, and vice versa.

Bitcoin mining is effectively an arms race to produce the most processing power per unit of electricity/money, and to do this there is now custom hardware. Graphics cards and “normal” supercomputers have their theoretical power measured in FLOPS (there are other, better measures; I think FLOPS is the BMI of supercomputing). The important point here is that it stands for “floating point operations per second”. A floating point number is basically the same as scientific notation: a number, and then the number of times to shift the decimal point left or right. Bitcoin mining uses hardware optimised specifically for integer operations, rather than floating point.
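To make the “scientific notation” analogy concrete: Python’s `math.frexp` splits a float into exactly those two parts, a fractional mantissa and a power-of-two exponent (binary rather than decimal, but the same idea):

```python
import math

# A float is stored as mantissa * 2**exponent: binary scientific notation.
# 6.5 == 0.8125 * 2**3
m, e = math.frexp(6.5)
print(m, e)  # 0.8125 3

# Reassembling the parts recovers the original value.
print(math.ldexp(m, e))  # 6.5
```

Integer hardware skips all of this mantissa/exponent machinery, which is why silicon optimised for one is not automatically good at the other.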

Why does this matter?

Well, it seems reasonably likely that Bitcoin might crash and make the vast amount of bespoke hardware deployed to mine it suddenly unprofitable. The owners will have the choice of scrapping/mothballing it or trying to monetise it in other ways, which would mean a sudden glut of available processing power for integer tasks (unlike the more common FLOPS). Maybe that’s something we should be preparing for? Could it have a significant impact on commodity data processing? Maybe there’s going to be a window where teams with tasks that can run on such hardware can get very good deals… until everyone else catches up.

I’m not experienced with supercomputing, so this is just a back-of-the-envelope theory.

Is this a realistic scenario? Have I overestimated the significance of floating point vs integer hardware?

Posted in Bitcoin.