

When Linked Data is not Open Data

I made a mistake! Potentially one which could have exposed information to the Internet which should never have left the intranet. It’s unlikely that anything leaked out, and the hole is now closed for good.

IP-range restricted pages subverted by proxies

Here’s what happened: A few years back we set up our first stab at an RDF service for ECS. This only contained information on members who had agreed to appear in our public directory, and never contained information on people’s offices. However, we wanted to play with that data in RDF, so we decided to be clever and also create intra.rdf.ecs.soton.ac.uk, which would serve such data, but only to our IP range. All was then fine, and many 3rd year projects (well, 3 or 4) used the intranet data for interesting demos.

Where things went wrong was when I recently launched my RDF browser, which allows you to view RDF documents in a more human-friendly way. All well and good, until I was playing with it later that week and noticed I was able to browse our intra.rdf server from my home machine. The RDF browser had access to the confidential data because it runs inside our network. As soon as I found out I added a rule to block my RDF browser. Then it occurred to me that anyone in ECS could write a web proxy, and any intranet information restricted only by IP address could then become visible to the world, including Google!
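To make the risk concrete, here is roughly what such a proxy amounts to. This is a deliberately naive sketch in Python, not code anyone actually wrote; the hostname comes from above, and the path and port are made up. Run on any machine inside the trusted range, it relays IP-restricted pages to anyone who asks:

# Minimal relay: it sits inside the trusted IP range and fetches whatever
# URL it is asked for, so anything restricted "by IP only" becomes
# world-readable through it.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class NaiveProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /?url=http://intra.rdf.ecs.soton.ac.uk/person/cjg
        target = self.path.split("url=", 1)[-1]
        data = urlopen(target).read()      # fetched from inside the IP range
        self.send_response(200)
        self.send_header("Content-Type", "application/rdf+xml")
        self.end_headers()
        self.wfile.write(data)             # handed straight to anyone outside

HTTPServer(("", 8080), NaiveProxy).serve_forever()

A dozen lines, and every “IP-only” restriction behind it means nothing.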

For this reason we’ve moved to secure all our intranet information by username/password rather than by IP range. This is a bit annoying, but necessary for data protection: we’re a research department, and we shouldn’t be preventing postgrads from building web proxies for fun and experimentation.
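For what it’s worth, the difference between the two checks is tiny at the code level. A sketch, assuming a WSGI-style environ and some account-database lookup; check_password and the exact IP range are stand-ins, and in practice ours now sits behind the single sign-on mentioned below:

# Old rule vs. new rule for a request to the intranet RDF service.
import base64
from ipaddress import ip_address, ip_network

ECS_RANGE = ip_network("152.78.0.0/16")    # illustrative range, not the real ACL

def allowed_by_ip(environ):
    # The old check: trust anything coming from inside the departmental range;
    # a proxy running inside that range defeats it completely.
    return ip_address(environ["REMOTE_ADDR"]) in ECS_RANGE

def allowed_by_credentials(environ, check_password):
    # The new check: the request must carry credentials that can actually be
    # verified against the account database, whoever relayed it.
    header = environ.get("HTTP_AUTHORIZATION", "")
    if not header.startswith("Basic "):
        return False
    user, _, password = base64.b64decode(header[6:]).decode().partition(":")
    return check_password(user, password)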

However, our cookie-based single sign-on is a very ugly way to access an RDF document. So it got me thinking about whether we should even have closed linked data and, if so, how it should be handled.

Closed Linked Data

After a bit of a think I’ve decided that there are two very distinct types of closed linked data:

  1. Data about me. For example: my contact details, office location, calendar/schedule/lecture timetable, what modules I am studying.
  2. Data I am authorised to view. For example: the grades of my tutees, the list of servers in a server room, the communications budget expenditure details for 2009.

What I should be allowed to do with type (1) closed data is very different to type (2). If I choose, it’s perfectly reasonable for me to give a smartphone application access to read data about me. I can make my own call about trusting the 3rd party developer. However, there’s no way in hell I should be uploading student marks or confidential budgets to such an application. Whether they are to be trusted should be a decision made by my organisation, and they should then be granted access that way.

One of our students wrote an iPhone app called “iSoton” to which you give your username & password; it logs into the main university intranet portal and navigates through a couple of pages to get your timetable out as CSV. It’s so popular it’s not been blocked, even though the developer could be harvesting the username/password pairs.

The thing is, there’s no need to use your main username/password to grant access to this data. What I propose should happen for type (1) data is that if you request such a URL/URI without a (valid) username and password, it will provide some minimal triples describing how to create a username/password pair. The app can then show you these instructions. Basically, you log into your university account with your real username and password and ask to create a username/password pair for this app to use to access the data you approve of it seeing. Much safer.

Your username: [cjg.............]
Your password: [*********.......]
ID of service: [isoton..........]
Allow Access   [x] Contact Information
           to: [.] Location Information
               [x] Calendar and Timetable
               [.] Allow app to pass your information to 3rd parties?
               [.] Allow app to place any of this information on the public web?
  Expiry date: [2011-07-12] (optional)

Thank you. The app may access your contact and calendar information at:
http://cjg+isoton:ybBiebYB3@data.soton.ac.uk/person/cjg
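An app holding one of these disposable username/password pairs would then fetch the document with plain HTTP Basic auth, never touching the real account password. A sketch in Python; the URL and credentials are the made-up ones from the form above:

# Fetch the RDF document using the per-app credentials issued above.
import base64
from urllib.request import Request, urlopen

URL = "http://data.soton.ac.uk/person/cjg"
USER, PASSWORD = "cjg+isoton", "ybBiebYB3"   # per-app, revocable, expiring

token = base64.b64encode((USER + ":" + PASSWORD).encode()).decode()
request = Request(URL, headers={
    "Authorization": "Basic " + token,
    "Accept": "application/rdf+xml",         # or text/turtle
})
print(urlopen(request).read().decode())

The same request without (valid) credentials would just get back the minimal “here’s how to get your app an account” triples described above.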

This would be entirely inappropriate for type (2) data, but for type (1) it allows all the cool mashups to be done without compromising the password used for email. The “allow app to” options would control what license information was included in the RDF boilerplate. This should also contain info on when the data was generated and for which disposable username, so if it does get released into the wild there is some kind of audit trail.

Desktop Applications for type (2) Closed Linked Data

While you wouldn’t pass your type (2) RDF (stuff you don’t have a personal right to republish) on to a third-party service, you may well want to use it with a desktop application, in much the same way you might download an Excel file from your intranet and open it on your laptop.

In this case it’s perfectly reasonable to use your main username/password to authenticate (unless the application is malicious, but that’s a known problem and much easier to cope with on the desktop than with phone apps, cloudy websites, etc.).

However, as with the type (1) data, if it is provided in RDF it should contain some boilerplate saying when it was generated, for whom, and from which IP address it was requested. That way, if it leaks accidentally, it can be traced. Obviously this is no proof against malicious leaking, but it should be considered best practice to include such a header, plus a clearly NOT OPEN license, in any non-open linked data document.

Boilerplate Triples for Closed Linked Data

Here’s a sketch of what I’m thinking. I’m not sure about the exact predicates to use.

<> a foaf:Document ;
   dc:license <http://data.soton.ac.uk/licence/our-bloody-closed-eyes-only-license> ;
   xxx:generatedFor <http://data.soton.ac.uk/person/cjg> ;
   xxx:requestedBy "152.78.71.23" ;
   xxx:generatedOn "2010-07-23T12:32:01Z" ;
   rdfs:license """This file contains confidential data and should not be redistributed.
     If you receive or discover it by accident please notify yikes@soton.ac.uk and
     delete all copies.""" .

You get the gist. xxx: is for predicates I’m not sure about. There may already be useful ones I’ve forgotten somewhere in the bowels of dcterms.
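For anyone generating such documents, stamping the header on takes only a few lines; here’s a sketch using rdflib (assumed to be available). The xxx: namespace URI and the document URI are placeholders, as above:

# Attach the audit-trail boilerplate to an outgoing closed-data document.
from datetime import datetime, timezone
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

DC   = Namespace("http://purl.org/dc/elements/1.1/")
RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
XXX  = Namespace("http://data.soton.ac.uk/ns/xxx#")   # placeholder namespace

def add_closed_data_boilerplate(graph, doc_uri, person_uri, client_ip):
    doc = URIRef(doc_uri)
    graph.add((doc, RDF.type, FOAF.Document))
    graph.add((doc, DC.license, URIRef(
        "http://data.soton.ac.uk/licence/our-bloody-closed-eyes-only-license")))
    graph.add((doc, XXX.generatedFor, URIRef(person_uri)))
    graph.add((doc, XXX.requestedBy, Literal(client_ip)))
    graph.add((doc, XXX.generatedOn, Literal(
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))))
    graph.add((doc, RDFS.license, Literal(
        "This file contains confidential data and should not be redistributed. "
        "If you receive or discover it by accident please notify "
        "yikes@soton.ac.uk and delete all copies.")))
    return graph

g = add_closed_data_boilerplate(
    Graph(),
    "http://data.soton.ac.uk/person/cjg.rdf",   # illustrative document URI
    "http://data.soton.ac.uk/person/cjg",
    "152.78.71.23")
print(g.serialize(format="turtle"))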

Posted in Best Practice, Intranet, RDF.


3 Responses


  1. Mischa Tuffield says

    FWIW:

    xxx:generatedFor could possibly be “http://xmlns.com/foaf/0.1/primaryTopic”

    and xxx:generatedOn could be a “http://purl.org/dc/elements/1.1/date”

    — it *could* be but I think those are too woolly. Also the primary topic might well be different to the person the data is being generated for. – cjg

  2. Kingsley Idehen says

    Chris,

    Yet another example of the kind of problems resolved by the WebID Protocol via its ACL dimension.

    On the FOAF+SSL mailing list I kicked off some resource-oriented ACL demos. Naturally, you can also secure Named Graphs using the WebID Protocol.

    Links:

    1. http://esw.w3.org/Foaf%2Bssl – WebID (née FOAF+SSL)
    2. http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1625 – a recent post with demo links re. the WebID Protocol.

  3. Olaf Hartig says

    Hey,

    You could use the Provenance Vocabulary to represent xxx:requestedBy and xxx:generatedOn. Just write something like:

    prv:retrievedBy [
        rdf:type prv:DataAccess ;
        prv:performedBy _:x ;
        prv:performedAt "2010-07-23T12:32:01Z"^^xsd:dateTime ] .

    _:x rdf:type prvTypes:DataAccessor .

    Now, you only need some property that states the IP address of the data accessor represented by the blank node _:x.


