

50 years since the “Mother of All Demos”: What’s that got to do with the price of fish?

Demonstrating a user interface to manipulate structured data

So, we’ve been discussing ways to mark the 50th anniversary of the Mother of All Demos [Youtube, Wikipedia]. In this demo, Doug Engelbart demonstrated the tools that he and his team had built to make themselves smarter and more effective. Some of these tools would become household items. He was one of the most important inventors in history.

The anniversary is 9th December 2018 (so 13 months from now). There are some thoughts at http://doug-50.info/

Frode Hegland is head cheerleader for our discussions, and has asked us to think about where we can demonstrate and celebrate Doug’s ideas and vision, and how we can take it further.

So “augmenting human intellect”… how hard can that be?

What I’ve been thinking about is something I don’t have a perfect description of yet.

Containers being transferred to a cargo ship at the container terminal of Bremerhaven by Hannes Grobe

It is about how humans interact with information: researchers and scientists most of all, but everyone else too. There’s an excellent blog post by Mia Ridge which outlines much of the problem with information in 2017. Our data is anemic. We can move it around the world in moments, and request strings of ones and zeros, but we know almost nothing about what they contain.

People are so used to the status quo that they don’t realise there’s a problem, or how much better things could be. It’s like shifting sacks of cargo onto a ship. That used to be “just how you did things”.

The best phrase I’ve got for this idea, so far, is “Intermodal information”. I’m stealing the idea from the freight industry. While I’m stealing, I’ll steal the whole definition from Wikipedia.

Intermodal freight transport involves the transportation of freight in an intermodal container or vehicle, using multiple modes of transportation (e.g., rail, ship, and truck), without any handling of the freight itself when changing modes. The method reduces cargo handling, and so improves security, reduces damage and loss, and allows freight to be transported faster. Reduced costs over road trucking is the key benefit for inter-continental use. This may be offset by reduced timings for road transport over shorter distances.

The introduction of containers that worked between trains, ships and trucks changed the economy of the world for the better. We’ve already experienced something similar with data, three times. Storing information digitally was the first. The advent of the packet-switching network (IP) was the second: it means we can now move data over networks from any computer to any computer. The IP network sends out packets of data, and those packets move over wifi, wires, fibre optics… even via satellite. The Web (HTTP) was the third revolution in the interoperability of data. Now we can request computer files from all over the world, and get them with some basic metadata (MIME types tell us a little about how they should be interpreted), and the URL system means we can link to computer files, and talk about them.

It’s no secret that this has changed the world and our species’ relationship to data.

So what’s the problem?

Data is great, but it’s the start of the story, not the end. When you download a webpage, the response includes some headers that are distinct from the file you are downloading. The header that tells your computer how to interpret the file is called “Content-Type”. Its value is a “MIME type” and is generally something like “image/png” or “text/xml” or “application/vnd.google-earth.kml+xml”. Sometimes there’s a character encoding bit as well, e.g. “text/html; charset=utf-8”. MIME types work almost the same way as the suffix on the end of a filename, e.g. badger.png, or secrets.html. That’s a lot more useful than just guessing what the file is, but not much better than the filename on a hard drive.
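To make this concrete, here’s a minimal sketch (in Python) of pulling apart a Content-Type value the way a browser or tool would, before deciding how to interpret the bytes it has downloaded. The helper name is mine, not any standard API:

```python
def parse_content_type(value):
    """Split a Content-Type header like 'text/html; charset=utf-8'
    into its MIME type and any extra parameters."""
    parts = [p.strip() for p in value.split(";")]
    mime_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return mime_type, params

print(parse_content_type("text/html; charset=utf-8"))
# → ('text/html', {'charset': 'utf-8'})
print(parse_content_type("application/vnd.google-earth.kml+xml"))
# → ('application/vnd.google-earth.kml+xml', {})
```

Note how little survives the parsing: a type, a subtype, maybe an encoding. That’s all the receiving computer is told.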

What I hope we can achieve is a way to better describe the contents of files. There are different ways to interpret the same file. A KML file is also a valid XML file, which is a valid text file, which is a sequence of bytes, which is a sequence of bits. None of that tells us that the KML file describes the locations of park benches in Southampton.
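Those layers of interpretation can be walked through in a few lines of Python. The sample document below is invented (the real park-bench data would be longer), but the layering is exactly as described: bytes, then text, then XML, then, via the namespace, KML. Nothing at any layer says what the file is about:

```python
import xml.etree.ElementTree as ET

# Layer 1: a sequence of bytes.
raw = (b'<?xml version="1.0" encoding="UTF-8"?>'
       b'<kml xmlns="http://www.opengis.net/kml/2.2">'
       b'<Placemark><name>A park bench</name></Placemark></kml>')

text = raw.decode("utf-8")    # Layer 2: it's valid UTF-8 text.
root = ET.fromstring(text)    # Layer 3: it's valid XML.

# Layer 4: the namespace tells us it's KML.
is_kml = root.tag == "{http://www.opengis.net/kml/2.2}kml"
print(is_kml)  # → True
```

Even at the top layer, all we have learned is “this is KML”: that it contains park benches, or that they’re in Southampton, is nowhere in the stack.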

Datasets come in many forms… except they don’t, really. On computers, data files are usually structured either as trees of information, where each thingy has zero or more sub-thingies, or as tabular data, where information is organised into sets of homogeneous records, each with information in more or less the same shape: CSV, spreadsheets and stuff like that. There’s also “graph” data, but that’s less common.
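Here’s the same invented park-bench data in both common shapes: a tree (nested structures, as in XML or JSON) and a table (homogeneous records, as in CSV or a spreadsheet). The field names are made up for illustration:

```python
# Tree shape: each thingy has zero or more sub-thingies.
tree = {
    "city": "Southampton",
    "benches": [
        {"id": 1, "lat": 50.9097, "lon": -1.4044},
        {"id": 2, "lat": 50.9105, "lon": -1.4010},
    ],
}

# Tabular shape: one heading row, then one row per record.
table = [["city", "id", "lat", "lon"]]
for bench in tree["benches"]:
    table.append([tree["city"], bench["id"], bench["lat"], bench["lon"]])

print(table[1])  # → ['Southampton', 1, 50.9097, -1.4044]
```

The information is identical; only the shape differs, and nothing in either file format records which shape the author intended.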

What’s that got to do with the price of fish?

Maine Avenue Fish Market (Bien Stephenson)

What interests me is for our tools to be able to record, transmit and understand the structure and meaning of a file. This is the distinction between data and information. All MIME types tell us is roughly what tools can read a file, but no more. Let’s take a very simple example: a spreadsheet containing a list of prices of fish. All we get from MIME is “application/vnd.ms-excel”, which just tells us we can read it in Excel. We know it’s going to have one or more worksheets, each with tabular data, but it would be helpful to know for sure that the first worksheet is the one of interest, that it is structured in rows with one row per record and the first row as headings, and that the sheet represents a list of products and their prices. Going further, it would be helpful to know it’s about fish and relevant to a certain vendor, and to be able to validate that the vendor really provided these prices, and the timescale and audience for which they’re valid. It would be helpful to link it to product categories, weight, specifications, species… and to have all those things done automatically and unambiguously, with no extra work for anyone.
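There is no standard that carries this today, but a hedged sketch of what such a description might look like, travelling alongside the spreadsheet, could be something like the record below. Every field name is invented for illustration, and the vendor is borrowed from the photo caption above:

```python
import json

# A hypothetical description of the fish-price spreadsheet:
# everything MIME cannot say, written down explicitly.
description = {
    "mime_type": "application/vnd.ms-excel",   # all MIME gives us today
    "worksheet_of_interest": 0,                # the first sheet holds the data
    "orientation": "one-row-per-record",
    "first_row_is_headings": True,
    "represents": "a list of products and their prices",
    "topic": "fish",
    "vendor": "Maine Avenue Fish Market",      # who provided the prices
    "valid_from": "2017-11-01",                # the timescale it's valid for
    "valid_to": "2017-11-30",
}

print(json.dumps(description, indent=2))
```

The hard part, of course, is not writing such a record but having it produced automatically and unambiguously, with no extra work for anyone.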

Hafenarbeiter bei der Verladung von Sackgut – MS Rothenstein NDL, Port Sudan 1960

This is not easy. But it will happen eventually, somehow, and when it does we’ll look back on this as the olden days and think of the computer files we use now the way we look back on sacks of cargo loaded by gangs of stevedores. We can’t get there in one big jump, but it’s where we should be aiming. Our data should just work, and get out of our way. Not just open data, but all our data.

This is a bit bigger than I usually aim, but the brief of celebrating and extending the work of Doug Engelbart is an unreasonable one, so maybe we need to start thinking beyond what is reasonable…

And for me that’s “Intermodal information”. Hopefully we can come up with a catchier name.

Posted in Doug Engelbart, Research Data.


2 Responses


  1. skreutzer says

    There’s a way to make our and foreign data meaningful by adding semantics, and it’s called XML. It’s enough if the MIME type states “text/xml” (of course we have to see if a general XML file handler will respond to it and pass the data to a specific client/agent/application upon payload/content inspection, or if we just add new MIME types), and in the XML itself, the DOCTYPE declaration or namespace states what semantics apply. As it is up to everybody to come up with their own XML-based format and as it’s up to client implementors to decide if they want to interpret the markup according to the official definition, eventually there can be a lot of competing, coexisting and cooperating “standards” (small and big ones), but as it doesn’t make a lot of sense for a format about product prices (fish) to specify location semantics (vendor), they need to be intermixable, which already works today based on the notion of microformats and using namespaces on sub-elements.

    True, it can get a little bit messy when dealing with such intermixed data (if it is one big hierarchical file containing everything, for example other worksheets of the spreadsheet application we’re not interested in), so we might want to translate those XML conventions to more Doug/Ted/NOSQL-style data structures, where all data stays isolated for itself, and we interlink it via generic conventions that don’t lack a way to bind semantics to the referenced data. This should be easy and fine for mostly static documents; for raw/application data, I’m not so sure, as developers may change the semantics frequently without updating some kind of WSDL/XML-Schema for it, which led to approaches like ReST, at least in theory.

    There are plenty of people doing some part of this already, but it’s usually none of the big Internet companies, because if a generic solution would be rolled out at one point, they would have to change their business models, as everybody would be able to organize the world’s information as a curator and knowledge worker.

Continuing the Discussion

  1. 50th years since the “Mother of All Demos”: What’s that got to do with the price of fish? | Demo@50 linked to this post on November 12, 2017

    […] Christopher Gutteridge’s post is at http://blog.soton.ac.uk/webteam/2017/11/12/50-years-since-the-mother-of-all-demos/ […]


