WSTNet Web Science Summer School – by Ryan Javanshir
Group Project
On the first day of the summer school, attendees were split into groups, and each group was given a problem to solve during the downtime. Each problem required us to consider the issues we had learned about during the week while applying machine learning techniques to OpenStreetMap data. Working on these problems helped us consolidate our understanding of those issues and gave us the chance to tackle a real-world problem.
Machine Learning
On one particular day, we were given a comprehensive overview of different machine learning techniques and of the process a data scientist follows when delivering a solution to a problem. Although time constraints meant we could only receive an overview, we not only learned about the different types of machine learning and when to use them, but also experimented with programming in Python by classifying offensive tweets in a large dataset. The exercise taught us how to set up the development environment, with step-by-step guidance provided by the tutor. Overall, my understanding of machine learning, and of the problems data scientists face when choosing an algorithm and cleaning data, improved dramatically.
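To give a flavour of what a tweet classification exercise of this kind can look like, here is a minimal sketch in Python. It is not the code from the session: the choice of a naive Bayes classifier, the toy sentences, and the function names train and classify are all assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration only.
# This is NOT the dataset used at the summer school.
TRAIN = [
    ("you are an idiot", "offensive"),
    ("shut up you fool", "offensive"),
    ("what a stupid idiot", "offensive"),
    ("have a lovely day", "ok"),
    ("great talk today thanks", "ok"),
    ("see you at the workshop", "ok"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the log prior of the label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, doc_counts = train(TRAIN)
print(classify("you stupid fool", word_counts, doc_counts))
print(classify("thanks for a great day", word_counts, doc_counts))
```

With only six training sentences this is a toy, but it shows the two steps that matter in practice: turning raw text into word counts, and choosing the label that best explains those counts. A real dataset would also need the cleaning step mentioned above, such as handling punctuation, links and user mentions.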
Ethics
Throughout the week, ethics was an underlying theme that framed much of the content. I now have a much better understanding of which questions need to be asked before any system is developed. Where is this data coming from? Who collected the data? What biases exist in the data? What are the effects of using this data? What are the effects of the system overall? These were some of the questions that we considered as we learned about machine learning technologies and their affordances.
One particularly useful and interesting session was given by a representative of the Office for National Statistics. It involved deep consideration of a range of ethical issues, including consent, storage and security, privacy, transparency, systems maintenance, diversity of data, accountability, law and unintended consequences.
Data Observatory
Later in the week, we visited the Data Observatory at Imperial College London. This was a fantastic opportunity not only to see the impressive 360-degree screen display, but also to learn about the research Imperial are doing in machine learning and data science. After a demo of the Data Observatory and a short talk about the affordances of the system, we were given the opportunity to learn how to display map data from our own laptops on the 360-degree screens. Although the process was rather complicated and lengthy, it was rewarding to see the content from our laptops displayed on the screens.
Overall, I was really impressed with the summer school, and I feel invigorated to learn more about the topics that were covered during the week. We touched upon a wide variety of issues, learned the basics of machine learning and were given the intellectual resources needed to explore these topics further. One week is not enough to become an expert, but it is a great way of introducing a large and complicated topic to a diverse group who are unfamiliar with the area, giving them a greater appreciation and understanding not only of the technology itself, but also of the underlying practical, ethical and legal questions it raises.