Simon Rogers, editor of The Guardian’s Datablog, last week posted a top 10 list of data.gov.uk datasets by how relevant they could be to people, highlighting a number of very interesting data sets. He featured national transport statistics, a massive data set cataloguing not only every bus, rail, coach stop or pier in the UK but every bus, train, tram, or ferry that docked for a week in October; as well as statistics on government spending (COINS), the UK labour market and employment statistics by year, youth perspectives and attitudes by region, and statistics on dog messes by UK region. For each he gives a short synopsis of the contents of the data set, highlights its potential uses, and describes its problems and limitations.
This post is of tremendous use not merely because it highlights a tiny, delicious morsel from a rather immense soup of more than 4,223 datasets on data.gov.uk, but because it helps make that data relevant to people: he describes (in easily human-understandable terms) what is in each data set, why the data is relevant or interesting, and, most importantly, ways that it can be put to use.
The chasm between publishing and use is currently large and daunting. Many of the data sets are in “raw” form: Excel spreadsheets created by public servants (using highly specialized government vocabulary) or immense, multi-gigabyte CSV files with little supporting documentation. What we see now is a gold rush, on both sides of the Atlantic (data.gov and data.gov.uk), of citizen-hackers who are downloading this data, writing scripts to parse through it, and generating visualisations and apps that make it possible for end-users to actually use it in various ways.
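To give a flavour of what such a parsing script might look like, here is a minimal Python sketch that tallies the rows of a large CSV file by the value in one column, reading a row at a time so that a multi-gigabyte file never has to fit in memory. The function name, file name, and column name are all made up for illustration; they are not taken from any real data.gov.uk dataset.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Tally rows by the value found in `column`.

    `lines` is any iterable of text lines (for example an open file),
    so rows are processed one at a time rather than loaded all at once.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Hypothetical usage; "stops.csv" and "stop_type" are illustrative names only:
# with open("stops.csv", newline="") as f:
#     print(count_by_column(f, "stop_type").most_common(5))
```

Even a script this small assumes you already know what the columns mean, which is exactly where the missing documentation hurts.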
But the Guardian Datablog suggests that this might be an ideal place for journalism to come in as well. While citizen-hackers have been effective at rolling mash-ups that let everyday people get at the data, it still takes a journalist or reporter to get the tasty bits out of it: to transform the raw bits into information, to add speculation and perspective, to put it in the context of world and current events, and to weave it into a story that leads people to question what the data say about the ways they live each day.
These data journalists, of course, do not have to come from Big Media (TV, newspapers) as such; the ones that do just happen to be best equipped with the right set of skills. In the future, it would be interesting to see whether the many emerging sense-making and visualisation tools, such as ManyEyes, Google Fusion Tables, Freebase Gridworks, and our own work, enAKTing’s GEORDI browser (forthcoming), could make data journalism more accessible to citizens without a background in statistical data analysis or a journalism degree. If so, these tools could unleash masses of newly equipped citizen-journalists on the terabytes of open data now publicly available, so that it can be more immediately transformed into information that can start to make a difference in people’s lives.