Observing The Web: ethics in a data-sharing world
This event took place on a sweltering day in London, in the offices of TaylorWessing. Thankfully, the space was air conditioned – essential on a day when outside temperatures rose well above 30°C.
Any discussion of the ethics of gathering, analysing and sharing data is fascinating, given how many situations and nuances even Web Science students like myself haven’t previously considered, and this symposium didn’t disappoint. The proceedings were opened by Dame Wendy Hall (@DameWendyDBE), who presented a short history of the web and her role in its development, and touched on the perennial question ‘what IS web science?’. I get asked this a lot, and I seem to have a slightly different answer every time. The WWW is often referred to as a socio-technical construction, which is a way of saying that human action shapes technology. The hashtag I was using to tweet the discussion – #dataethics – is a good example of this. The hashtag was created to make a topic of discussion searchable on Twitter, and has gone on to become incredibly important as a campaign tool. There is an alternative argument, though, that technology shapes people, which you might subscribe to if you think social media is making people passive and less likely to protest injustice, for example.

Wendy suggested that ‘Data Science’ might be a better way to think of Web Science, although, as she acknowledged, that in itself isn’t enough. What IS data? We have web science students researching things like electronic identity, which involves the law, current and future legislation, and issues like security. That doesn’t sound like ‘data’ as most people would think of it.
One thing that is apparent, though, is that the kind of data we might be more familiar with – our names, locations, tweets, Facebook posts, shopping preferences, online searches and other activity – is now being generated in huge volumes and gathered by many, many interested parties. Some of those parties will be researchers, but many will be commercial organisations that value our personal data for a variety of reasons, not all of which we might agree with. What’s more, they persuade us to surrender our data by clever, and sometimes highly dubious, means.
This was the subject of the first presentation, by Professor Woodrow Hartzog (@hartzog). He argued for privacy to be built into design, and gave the example of Snapchat. For those of you who don’t know, Snapchat is an app that allows you to send messages or photos to friends that self-destruct after a specified amount of time. It sounds like a great idea and yes, it was mostly used to send risqué pictures between friends. The problem was, the data was still accessible. You can read about it here. There are plenty of other examples where privacy, if it was thought about at all, was considered an afterthought and perhaps bolted on after the damage to consumers had already been done. You can completely understand why he advocates that privacy must be built in at the initial design stage of anything new. Also, Professor Hartzog’s presentation used cartoons. I love that.
Following a break, there was a panel discussion introduced and chaired by Dr Thanassis Tiropanis. One of the first issues highlighted, by Professor Jon Crowcroft (@tforcworc), was the more rigorous standards applied to researchers who want to gather and analyse data. A proposed study into epidemics was rendered pointless when the ethics committee ruled that data from the young and the old could not be used: the very people, it turned out, who were of most interest to the study. Caroline Wilson discussed ethics from the point of view of law, making the important point that such issues shouldn’t be framed as a burden.

Libby Bishop spoke about her work at the UK Data Archive. This organisation holds data from various UK government departments as well as opinion poll data from market research companies, among many other sources. This data can be shared with researchers, but of course we must always remember that the originators of the data, i.e. you and me, didn’t envisage it being used in any way other than the original purpose for which it was intended, and therefore access is carefully controlled. One thing I didn’t realise was that data from Twitter cannot legally be anonymised, which is certainly going to impact my research when I come to publication.

Finally, Dr Mariarosaria Taddeo (@RosariaTaddeo) talked about the environment in which ethics is embedded. Specifically, she talked about machine learning. Just to explain, ‘machine learning’ simply means that a system can be given a set of instructions to follow (which is essentially what an algorithm is) but also gathers information as it carries out its task and learns from it. So, the vacuum cleaner that cleans your carpet automatically and learns where your furniture is located is a good, if simplistic, example – there’s a little sketch of the idea below. She stated, “while we can’t embed values in machine learning ex ante [i.e. in advance], … [we] can keep refining, auditing, testing, checking.” I’m sure there were many more points made, but I’m sorry to say that your blogger was unable to listen and take notes at the same time! That’s the drawback with discussions; presentations are much slower, and there’s the advantage of having slides to help you follow what’s being said.
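To make that vacuum-cleaner example a little more concrete, here’s a minimal sketch of my own (in Python – nothing like this was presented at the symposium, and the grid and furniture are entirely made up). The robot follows the same fixed instructions on every pass – that’s the algorithm – but it records where it bumps into furniture and avoids those spots next time, which is the ‘learning’ part:

```python
GRID = 5  # a tiny 5x5 room

# "Furniture" the robot doesn't know about in advance (hypothetical layout).
FURNITURE = {(1, 3), (2, 2), (4, 0)}

def clean(learned_obstacles):
    """One sweep of the room, row by row (the fixed algorithm),
    skipping any cells the robot has already learned are blocked."""
    bumps = 0
    for x in range(GRID):
        for y in range(GRID):
            if (x, y) in learned_obstacles:
                continue  # learned behaviour: steer around known furniture
            if (x, y) in FURNITURE:
                bumps += 1                     # bump!
                learned_obstacles.add((x, y))  # ...and remember where it was
    return bumps

memory = set()
print("bumps on pass 1:", clean(memory))  # 3 - it hits every piece of furniture
print("bumps on pass 2:", clean(memory))  # 0 - it has 'learned' the room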
It’s worth mentioning, too, that there were several questions following both Professor Hartzog’s presentation and the panel discussion. The main ones centred on responsibility – who is responsible for considering ethics? Is it the designers? The people who manage the designers? Senior managers? Directors? What about the end user? Or even, as someone suggested, the shareholders? Certainly we’re beginning to teach school students about managing the data they put online about themselves, but they don’t always appreciate the full implications of their actions. Then we have a generation or two of adults who haven’t grown up with technology and are almost entirely unaware of how and why they should manage their data.
The final speaker of the day was the utterly fascinating Professor Mireille Hildebrandt. The title of her presentation was ‘Promiscuous Data Sharing in times of Data-Driven Animism’. Even as I type this, I realise that I’m struggling to capture the essence of what she said. She combined the disciplines of philosophy and law, and walked us through some of the unintended consequences of data sharing with some well-chosen quotes, references and (again) cartoons. I really hope that her presentation slides will be made available, because they will communicate her main points much better than I can. She pointed out that just because huge amounts of data exist doesn’t mean it’s all worth our time analysing: we must have a clear purpose behind our work and, more importantly, that purpose must be made clear at the outset to the person (or machine) generating the data. Furthermore, she pointed out that even though we may think we’re being clever using machine learning techniques to extract patterns (and information) from data, we should always remember that, because we’re human, all our algorithms will contain bias: it is we who point them in a particular direction so that learning can take place (the toy example below shows how easily our choices creep in).
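As a toy illustration of that last point – again my own, not Professor Hildebrandt’s, with made-up numbers and labels – even in the simplest ‘learning’ setup a human has to choose how to measure similarity, and that choice alone can change what the machine concludes from identical data. Here a one-nearest-neighbour classifier gives the same query point a different label depending on which feature the analyst decided matters more:

```python
# Two labelled examples: (feature_a, feature_b) -> decision
examples = [((1.0, 4.0), "approve"), ((4.0, 1.0), "reject")]
query = (3.0, 3.5)  # the new case we ask the machine to decide

def nearest_label(weights):
    """1-nearest-neighbour, using a human-chosen feature weighting."""
    def dist(point):
        return sum(w * (point[i] - query[i]) ** 2 for i, w in enumerate(weights))
    return min(examples, key=lambda ex: dist(ex[0]))[1]

# Same data, same algorithm - only the analyst's idea of 'similar' differs.
print(nearest_label((1.0, 1.0)))  # both features weighted equally -> 'approve'
print(nearest_label((1.0, 0.1)))  # feature_a deemed 10x more important -> 'reject'
```

Nothing in the data changed between those two calls – only a human judgement about what counts. That, in miniature, is the bias she was talking about.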
To sum up, then, it was an amazing morning. Web Science is an interdisciplinary field of research that encompasses so many areas, but this symposium really made the point that what it isn’t about is (just) computer science. The technicalities are important, but behind all of the algorithms and the technology and the data and the websites and the apps and the commercial interests seeking to monetise what we do are people. We may not be able to articulate our concerns, often because we don’t know what they are until something goes wrong and our private data is violated in some way, but we deserve to be taken into account, and it’s people like all those I’ve written about here, and those in the audience, who are trying to ensure that this happens.
And people ask me ‘what do you web scientists talk about when you get together?’…