
Sharepoint 2010 – Light at the end of the tunnel?

…or oncoming train?

I recently challenged Dr Kenji Takeda, our local Microsoft fanboy, to see if he could convince me that Sharepoint was a good thing. I come from a LAMP background and have never had a very high opinion of Microsoft products. It’s a tough sell, but it wouldn’t be interesting if it wasn’t.

Kenji is game and we’ve had our first session, so here’s a braindump of what I’ve learned. The key thing is that (he says) Sharepoint 2010 kicks ass and takes names compared to previous versions. Apparently it implements REST, which’ll make our LAMP scripting people much happier. The other interesting departure is that it’s browser neutral, working on Chrome, Firefox etc., although, amusingly, not IE6. I still haven’t got an install to play with and kick the tires; when I do I’ll make another post. It can apparently offer web-based document editing similar to Google Docs, possibly with more features (I want to see that too), and can store your data on-site, unlike Google, who offer somewhere-in-the-cloud or nothing. Google Docs can, I believe, undertake not to take your data outside the EU.

The thing which worries me is that we’ve got lots of places we can and might use SP in the university:

  • Intranets for Projects – both research and student learning. It can provide a shared document store, internal blog and wiki. This might be great for some project teams and dreadful for others. One size does not fit all and I don’t think we should try and force it on everyone.
  • Public sites for Projects – This makes me nervous due to getting locked into the URLs. We should at least try and make sure that it provides good solid cool URLs for pages and resources (e.g. not tied to technology suffixes like .html or .aspx).
  • Workflows – to manage business processes like expenses claims or Health and Safety forms.
  • Document Management – I’m not a big fan of our current DMS, but I suspect I just don’t like DMSs, so I’m not yet convinced this would provide an improvement.
  • Social Tools – facebook-style profile pages, comment walls etc. These are kinda cool but need to be use-case driven. Where possible they should source all information from existing sources, but give users clear instructions about how to get it updated and what the turn-around will be. For information collected by the system (e.g. a profile photo), it should be made available to other systems to avoid a proliferation of data sources. It strikes me this would be excellent for building database-driven public profile pages too, as ECS has had for years.

So far I’m still not convinced! There are some good noises, but I’ve never actually used Sharepoint. I suspect it has the power to be very annoying if not well configured and lovingly tended. There are some very positive things (interoperability), but I won’t trust that it really delivers until I’ve seen it working.

I’ll write another update sometime later in the summer, when I’ve learned more.

Posted in Intranet, Sharepoint.


ECS Infrastructure RDF in the Public Domain

We announced on Tuesday (13th July 2010) that all the RDF made available about our school would be placed in the public domain.

Around five years ago we (The School of Electronics and Computer Science, University of Southampton) had a project, led by Dr Nick Gibbins, to make our infrastructure data available as open data. This included staff, teaching modules, research groups, seminars and projects. This year we have been overhauling the site based on what we’ve learned in the interim. We made plenty of mistakes, but that’s fine and what being a university is all about. We’ll continue to blog about what we’ve learned.

We have formally added a “CC0” public domain license to all our infrastructure RDF data, such as staff contact details, research groups and publication lists. One reason few people took an interest in working with our data is that we didn’t explicitly say what was and wasn’t OK, and people are disinclined to build anything on top of data which they have no explicit permission to use. Most people instinctively want to preserve some rights over their data, but we can see no value in restricting what this data can be used for. Restricting commercial use is not helpful, and restricting derivative works of data is nonsensical!

Here’s an example: someone is building a website to list academics by their research area, and they use our data to add our staff to it. How does it benefit us to force them to attribute our data to us? They are already assisting us by making our staff and pages more discoverable, so why would we want to impose a restriction? If they want to build a service that compiles and republishes data, they would need to track every license, and that’s going to be a bother on a similar scale to the original BSD advertising clause (Clause 3).

Our attitude is that we’d like attribution where convenient, but not if it’s a bother. Rather than the legal requirement of must-attribute, we say “please attribute”. It’s our hope that this step will help other similar organisations take the same step, with the confidence of not being the first to do so.

The CC0 license does not currently extend to our research publication documents (just the metadata) or to research data. It is my personal view that research funders should make it a requirement of funding that a project publishes all data produced, in open formats, along with any custom software used to produce or process it, including the source and (ideally) the complete CVS/SVN/Git history. This is beyond the scope of what we’ve done recently in ECS, but the University is taking the management of research data very seriously and it is my hope that this will result in more openness.

Another mistake we have learned from is that we made a huge effort to model and describe our data as semantically accurately as possible. Nobody cares enough about our data to explain to their tool what an “ECS Person” is. We’re in the process of adding in the more generic schemes such as FOAF and SIOC. The awesome thing about the RDF format is that we can do this gently and incrementally. So now everybody is both (is rdf:type of) an ecs:Person and a foaf:Person (example). The process of making this more generic will continue for a while, and we may eventually retire most of the extraneous ecs:xyz site-specific relationships except where no better ones exist.
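
To give a flavour of how gentle that incremental step can be, here’s a minimal sketch using the ARC2 library (which we use elsewhere). The library path, the example document URI and the ecs: namespace URI are assumptions for illustration, not our production code.

<?php
// Sketch: parse an existing document and assert a generic foaf:Person type
// alongside every site-specific ecs:Person, keeping both.
require_once 'arc2/ARC2.php';                       // path is an assumption

$parser = ARC2::getRDFParser();
$parser->parse('http://rdf.ecs.soton.ac.uk/person/1269');   // hypothetical example URI
$index = $parser->getSimpleIndex(0);                // keep the full object structure

$RDF_TYPE    = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
$ECS_PERSON  = 'http://rdf.ecs.soton.ac.uk/ontology/ecs#Person';  // assumed namespace
$FOAF_PERSON = 'http://xmlns.com/foaf/0.1/Person';

foreach ($index as $uri => $properties) {
    if (!isset($properties[$RDF_TYPE])) { continue; }
    foreach ($properties[$RDF_TYPE] as $type) {
        if ($type['value'] == $ECS_PERSON) {
            // keep the specific type and add the generic one alongside it
            $index[$uri][$RDF_TYPE][] = array('value' => $FOAF_PERSON, 'type' => 'uri');
            break;
        }
    }
}

$ser = ARC2::getTurtleSerializer();
echo $ser->getSerializedIndex($index);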

The key turning point for us was when we started trying to use this data to solve our own problems. We frequently build websites for projects and research groups, and these want views on staff, projects, publications etc. Currently this is done with an SQL connection to the database, and we hope the postgrad running the site doesn’t make any cock-ups which result in data being made public which should not have been. We’ve never had any (major) problems with this approach, but we think that loading all our RDF data into a SPARQL server (like an SQL server, but for RDF data, and accessed over HTTP) is a better approach. The SPARQL server only contains information we are making public, so the risk of leaks (e.g. staff who’ve not given formal permission to appear on our website) is minimised. We’ve taken our first faltering steps and discovered immediately that our data sucked (well, wasn’t as useful as we’d imagined). We’d modelled it with an eye to accuracy, not usefulness, believing that if you build it they will come. The process of “eating our own dogfood” rapidly revealed many typos and poor design decisions which had not come to light in the previous 4 or 5 years!
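
For the curious, here is roughly the kind of query a project site would run against such a SPARQL server instead of opening an SQL connection. This is a sketch: the endpoint URL, the group URI and the use of foaf:member / foaf:name are assumptions, not a description of our actual schema.

<?php
// Sketch: list the members of a research group from a public SPARQL endpoint.
require_once 'arc2/ARC2.php';

$store = ARC2::getRemoteStore(array(
    'remote_store_endpoint' => 'http://sparql.ecs.soton.ac.uk/',   // assumed endpoint
));

$sparql = '
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name WHERE {
        <http://rdf.ecs.soton.ac.uk/group/example> foaf:member ?person .
        ?person foaf:name ?name .
    } ORDER BY ?name';

foreach ($store->query($sparql, 'rows') as $row) {
    echo '<li><a href="' . htmlspecialchars($row['person']) . '">'
        . htmlspecialchars($row['name']) . "</a></li>\n";
}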

Currently we’re also thinking about what the best “boilerplate” data is to put in each document. Again, we’re now thinking about how to make it useful to other people rather than how to accurately model things.

There’s no definitive guidance on this. I’m interested to hear from people who wish to consume data like this, to tell us what they *need* to be told, rather than what we want to tell them. Currently we’ve probably got overkill!

→ rdf:type → owl:Ontology
→ cc:attributionName → “University of Southampton, School of Electronics and Computer Science”
→ dc11:description → “This rdf document contains information about a person in the Department of Electronics and Computer Science at the University of Southampton.”
→ dc11:title → “Southampton ECS People: Professor Nigel R Shadbolt”
→ dc:created → “2010-07-17T10:01:14Z”
→ rdfs:comment → “This data is freely available, but you may attribute it if you wish.  If you’re using this data, we’d love to hear about it at webmaster@ecs.soton.ac.uk.”
→ rdfs:label → “Southampton ECS People: Professor Nigel R Shadbolt”

One field which I believe should be standard, but which we don’t have, is where to send corrections. Some of the data on data.gov.uk is out of date, and an instruction on how to get it corrected would be nice and would benefit everyone.

At the same time we have started making our research publication metadata available as RDF, also CC0, via our EPrints server. It helps that I’m also lead developer for
the EPrints project! By default, any site upgrading to EPrints 3.2.1 or later will get linked data made available automatically (albeit with an unspecified license).

Now let me tell you how open linked data can save a university time and money!

Scenario: The university cartography department provides open data in RDF form describing every building, its GPS coordinates and its ID number. (I was able to create such a file for 61 university buildings in less than an hour’s work. It is already freely published on maps on our website, so it’s no big deal making it available.)

The university teaching support team maintain a database of learning spaces, and the features they contain (projectors, seating layout, capacity etc.) and what building each one is in. They use the same identifier (URI) for buildings as the cartography dept. but don’t even need to talk to them, as the scheme is very simple. Let’s say:
http://data.exampleuniversity.ac.uk/location/building/23

Each team undertakes to keep their bit up to date, which is basically work they were doing anyway. They source any of their systems from this data so there’s only one place to maintain it. They maintain it in whatever form works for them (SQL, raw RDF, textfile, Excel file in a shared directory!) and data.exampleuniversity.ac.uk knows how to get at this and provide it in well formed RDF.
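
As a sketch of how little work that re-publishing step needs to be: suppose the cartography team’s source of truth is a simple CSV export (id,name,lat,long). The file name, property choices and output style below are my assumptions, purely for illustration.

<?php
// Sketch: turn a buildings CSV into RDF using the agreed URI scheme.
$base = 'http://data.exampleuniversity.ac.uk/location/building/';

echo "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n";
echo "@prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .\n\n";

$fh = fopen('buildings.csv', 'r');   // hypothetical export from the cartography team
fgetcsv($fh);                        // skip the header row
while (($row = fgetcsv($fh)) !== false) {
    list($id, $name, $lat, $long) = $row;
    echo '<' . $base . $id . ">\n";
    echo '    rdfs:label "' . str_replace('"', '\\"', $name) . "\" ;\n";
    echo '    geo:lat "' . $lat . "\" ;\n";
    echo '    geo:long "' . $long . "\" .\n\n";
}
fclose($fh);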

The timetabling team wants to build a service to allow lecturers and students to search for empty rooms with certain features, near where they are now. (This is a genuine request that has been made of our timetabling team at Southampton, and they would like a solution for it.)

The coder tasked with this gets the list of empty rooms from the timetabling team. Possibly this won’t be open data, but it still uses the same room IDs (URIs), e.g. http://data.exampleuniversity.ac.uk/location/building/23/room/101

She can then mash this up with the learning-space data and the building location data to build a search which shows empty rooms, filtered by required feature(s). She could even take the building you’re currently in and sort the results by distance from you. The key thing is that she doesn’t have to recreate any existing data, and as the data is open she doesn’t need to jump through any hoops to get it. She may wish to register her use so that she’s informed of any planned outages or changes to the data she’s using, but that’s about it. She has to do no additional maintenance, as the data is being sourced directly from the owners. You could do all this with SQL, but this approach allows people to use the data with confidence without having to get a bunch of senior managers to agree a business case. An academic from another university, running a conference at exampleuniversity, can use the same information without having to navigate any of the politics and bureaucracy, and improve their conference site’s value to delegates by joining each session to its accurate location. If they make the conference programme into linked data (see http://programme.ecs.soton.ac.uk/ for my work in this area!) then a 3rd party could develop an iPhone app to mash up the programme & university building location datasets and help delegates navigate.
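
Here’s a rough sketch of the mashup itself. The endpoint URL, the ex: property names (hasFeature, inBuilding), the free-room list and the “where you are now” coordinates are all assumptions; the point is that she only writes the join, not the data.

<?php
// Sketch: empty rooms with a projector, sorted by rough distance from the user.
require_once 'arc2/ARC2.php';

$store = ARC2::getRemoteStore(array(
    'remote_store_endpoint' => 'http://data.exampleuniversity.ac.uk/sparql',  // assumed
));

// The (possibly non-open) list of currently free rooms, supplied by timetabling.
$free_rooms = array(
    'http://data.exampleuniversity.ac.uk/location/building/23/room/101',
    'http://data.exampleuniversity.ac.uk/location/building/59/room/2207',
);
$filters = array();
foreach ($free_rooms as $uri) { $filters[] = "?room = <$uri>"; }

$sparql = '
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX ex:  <http://data.exampleuniversity.ac.uk/ontology/>
    SELECT ?room ?lat ?long WHERE {
        ?room ex:hasFeature ex:projector ;
              ex:inBuilding ?building .
        ?building geo:lat ?lat ;
                  geo:long ?long .
        FILTER( ' . implode(' || ', $filters) . ' )
    }';

// Sort by crude squared distance from the building the user is in (assumed coords).
$here = array('lat' => 50.9373, 'long' => -1.3976);
$sorted = array();
foreach ($store->query($sparql, 'rows') as $row) {
    $d = pow($row['lat'] - $here['lat'], 2) + pow($row['long'] - $here['long'], 2);
    $sorted[sprintf('%020.12f', $d) . $row['room']] = $row['room'];
}
ksort($sorted);
foreach ($sorted as $room) { echo $room . "\n"; }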

But the key thing is that making your information machine readable, discoverable and openly licensed is of most value to your own organisation’s members. It stops duplication of work and reduces time wasted trying to get a copy of data other staff maintain.

“If HP knew what HP knows, we’d be three times more profitable.” – Hewlett-Packard Chairman and CEO Lew Platt

I’ve been working on a mindmap to brainstorm every potential entity a university may eventually want to identify with a URI. Many of these would benefit from open data. Please contact me if you’ve got ones to add! It would be potentially useful to start recommending styles for URIs for things like rooms, courses and seminars as most of our data will be of a similar shape, and it makes things easier if we can avoid needless inconsistency!

Christopher Gutteridge, Web Projects Manager, Electronics and Computer Science (ECS), University of Southampton. July 2010. cjg@ecs.soton.ac.uk.

Posted in Uncategorized.


Data Visualisation

Yesterday I attended a session at IWMW entitled “Getting Awesome Results from Data Visualisation” by Rich Kirk from Chameleon Net.

I produce lots of open data, but I rarely consume it, and I figured this might give me some ideas.

The first bit really put me in mind of the data visualisation software in Douglas Adams’ “Dirk Gently’s Holistic Detective Agency”.

“…you can turn your figures into, for instance, a flock of seagulls, and the formation they fly in and the way in which the wings of each gull beat will be determined by the performance of each division of your company. Great for producing animated corporate logos that actually mean something.”

Rich said people doing data visualisations could do worse than starting with the “Choosing a Good Chart” diagram.

The session inspired me to have a go. We keep a large amount of configuration and monitoring data, most of which needs to remain confidential. However, the number of hits in June 2010 on each of the 333 virtual hosts on our infrastructure web servers is not a secret, so I produced a CSV of site,server,hits and uploaded it to ManyEyes.
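
The CSV itself is trivial to produce. Here’s a rough sketch of the sort of script that could do it; the log locations, naming scheme and the assumption that each vhost has its own already-rotated June 2010 log file are mine, not a description of our real setup.

<?php
// Sketch: count hits per virtual host from per-vhost access logs on one server.
// A real run would execute this on each web server and merge the results.
$out = fopen('vhost-hits-june-2010.csv', 'w');
fputcsv($out, array('site', 'server', 'hits'));

// e.g. /var/log/apache2/www.example.ecs.soton.ac.uk-access.log (assumed layout)
foreach (glob('/var/log/apache2/*-access.log') as $logfile) {
    $site = basename($logfile, '-access.log');
    $hits = 0;
    $fh = fopen($logfile, 'r');
    while (fgets($fh) !== false) { $hits++; }   // one log line per hit
    fclose($fh);
    fputcsv($out, array($site, php_uname('n'), $hits));
}
fclose($out);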

I was very easily able to produce this visualisation: ECS Web Traffic for June 2010, which shows very clearly that our sites are spread quite reasonably over our servers. There’s a version of this visualisation with time as a dimension. It would be fun to capture data for a whole year and see sites grow and shrink as you move the slider!


Posted in Uncategorized.



Open & Linked Data for Universities

I’m currently at IWMW2010, which is going pretty well. I ran a 90-minute workshop yesterday on linked data for university web managers. My basic points were:

  • Get your URIs right. eg. http://data.foo.ac.uk/type/scheme/id.format
  • Start with DC, SKOS, SIOC, FOAF, GEO (I wish I had that list 5 years ago!)
  • Pick the easy stuff and do that first
  • Don’t focus on accurate modeling in the ontology — rather think about how people might use the data for something useful
  • CSV is much better than no raw data
  • RDFa is not the place to start learning RDF.
  • A linked-data manager does not need to understand the fine details of RDF, any more than a web manager needs to understand HTML & CSS
  • Build data for your own consumption
  • You’re already paying someone somewhere to keep much of this data up to date, but they are just failing to share it in a useful way. Turning it into open data should save lots of people around the university recreating existing datasets.
  • Use a tool to check your RDF says what you meant (see the sketch after this list)
  • Don’t worry about OWL & ontologies. You are better off (initially) writing your ontology for humans to understand, rather than machines.
  • I am not my homepage! — as RDF uptake increases there will be more people confusing URLs of documents with URIs for things. We’ll cope with that from the great unwashed linked data sources, but there’s no reason to do it when you know better.
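
On the “check your RDF says what you meant” point: one quick-and-dirty way is just to parse the document and eyeball the triples it actually contains. A minimal sketch with ARC2 follows; the library path and example URL are assumptions.

<?php
// Sketch: parse a document, report parse errors, and dump what it really says.
require_once 'arc2/ARC2.php';

$parser = ARC2::getRDFParser();
$parser->parse('http://data.example.ac.uk/building/23.rdf');   // hypothetical document

foreach ($parser->getErrors() as $error) {
    echo "Problem: $error\n";
}
foreach ($parser->getTriples() as $t) {
    echo $t['s'] . ' ' . $t['p'] . ' ' . $t['o'] . "\n";
}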

For my talk I did something which I wish could become standard practice: make a web page containing all the links in the slides and give the audience a tiny URL to write down, to save them trying to scribble every URL down. I also owe thanks to Dr Nick Gibbins, who provided me with his Intro to Linked Data slides and saved me a hell of a lot of work.

I’ve spent the past few months working on a brainstorm of all the possible datasets a university could consider publishing.

Over the session, and talking to people at IWMW, I’ve added a few new entries such as Reading Lists and data for accountability. Suggestions from the peanut gallery are positively encouraged!

I’ve also been busy converting Southampton’s list of “Common Learning Spaces” into RDF, via screen scraping and a whole bunch of RegExps. Next I plan to build a dataset of all the university buildings, including latitude and longitude. It turns out this is quite easy to do: all I had to do was create a Google Map of University Buildings based on our campus map and then export it to Google Earth, which really produces a KML file which I can easily munge into RDF.
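
The munging step is a few lines of PHP. This is only a sketch: the file name, the placeholder building IDs and the URI scheme are assumptions (real building IDs would need mapping to the campus numbering).

<?php
// Sketch: pull each Placemark's name and coordinates out of the KML export
// and emit simple geo: triples.
$kml = simplexml_load_file('university-buildings.kml');   // hypothetical export
$kml->registerXPathNamespace('k', 'http://www.opengis.net/kml/2.2');

echo "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n";
echo "@prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .\n\n";

$n = 0;
foreach ($kml->xpath('//k:Placemark') as $placemark) {
    // KML coordinates are "longitude,latitude[,altitude]"
    $coords = explode(',', trim((string) $placemark->Point->coordinates));
    $uri = 'http://data.example.ac.uk/building/' . (++$n);   // placeholder IDs
    echo "<$uri>\n";
    echo '    rdfs:label "' . str_replace('"', '\\"', (string) $placemark->name) . "\" ;\n";
    echo '    geo:long "' . $coords[0] . "\" ;\n";
    echo '    geo:lat "' . $coords[1] . "\" .\n\n";
}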

Given that data I should then be able to create a tool which mashes up the two sets of data and can produce a map of all teaching rooms on campus with movable seating, or a smart screen. Hopefully this will be a killer example of how very simple open & linked data can be a win for a university.

Posted in Best Practice, RDF.


When Linked Data is not Open Data

I made a mistake! Potentially one which could have exposed information to the Internet which should never have left the intranet. It’s unlikely that anything leaked out, and the hole is now closed for good.

IP-range restricted pages subverted by proxies

Here’s what happened: a few years back we set up our first stab at an RDF service for ECS. This only contained information on members who had agreed to appear in our public directory, and never contained information on people’s offices. However, we wanted to play with that data in RDF, so we decided to be clever and also create intra.rdf.ecs.soton.ac.uk, which would serve such data, but only to our IP range. All was then fine, and many 3rd year projects (well, 3 or 4) used the intranet data for interesting demos.

Where things went wrong was when I recently launched my RDF browser, which allows you to view RDF documents in a more human-friendly way. All well and good, until I was playing with it later that week and noticed I was able to browse our intra.rdf server from my home machine. The RDF browser had access to the confidential data as it was inside our network. As soon as I found out I added a rule to block my RDF browser. Then it occurred to me that anyone in ECS could write a web proxy, and any intranet information restricted only by IP address could then become visible to the world, including Google!

For this reason we’ve moved to make all our Intranet information secured by username/password rather than IP range. This is a bit annoying, but necessary for data protection: we’re a research department, and we shouldn’t be preventing postgrads from building web proxies for fun and experimentation.

However, our cookie-based single sign-on is a very ugly way to access an RDF document. So it got me thinking about whether we should even have closed linked data and, if so, how it should be handled.

Closed Linked Data

After a bit of a think I’ve decided that there are two very distinct types of closed linked data:

  1. Data about me. For example: my contact details, office location, calendar/schedule/lecture timetable, what modules I am studying.
  2. Data I am authorised to view. For example a list of the grades of my tutees, the list of servers in a server room, the communications budget expenditure details for 2009.

What I should be allowed to do with type (1) closed data is very different to type (2). If I choose, it’s perfectly reasonable for me to give a smartphone application access to read data about me. I can make my own call about trusting the 3rd party developer. However, there’s no way in hell I should be uploading student marks or confidential budgets to such an application. Whether they are to be trusted should be a decision made by my organisation, and they should then be granted access that way.

One of our students wrote an iPhone app called “iSoton”, to which you give your username & password; it logs into the main university intranet portal and navigates through a couple of pages to get your timetable out as CSV. It’s so popular it’s not been blocked, even though the developer could be harvesting the username/password pairs.

The thing is, there’s no need to use your main username/password to grant access to this data. What I propose should happen for type (1) data is that if you request such a URL/URI without a (valid) username and password, it will provide some minimal triples describing how to create a username/password pair. The app can then give you these instructions. Basically, you should log into your university account with your real username and password and ask to create a username/password pair for use by this app, to get access to the data you approve of it seeing. Much safer.

Your username: [cjg.............]
Your password: [*********.......]
ID of service: [isoton..........]
Allow Access   [x] Contact Information
           to: [.] Location Information
               [x] Calendar and Timetable
               [.] Allow app to pass your information to 3rd parties?
               [.] Allow app to place any of this information on the public web?
  Expiry date: [2011-07-12] (optional)

Thankyou-- the app may access your contact and calendar information at:
http://cjg+isoton:ybBiebYB3@data.soton.ac.uk/person/cjg

This would be entirely inappropriate for type (2) data but for type (1) it allows all the cool mashups to be done without compromising the password used for email. The “allow app to” options would control what license information was included in the RDF boilerplate. This should also contain info on when the data was generated and for what disposable username, so if it does get released into the wild there is some kind of audit trail.
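
A rough sketch of what the data endpoint might do when no valid app-specific credentials arrive. Everything here is an assumption for illustration: the made-up xxx: namespace, the signup URL and the credential check itself.

<?php
// Sketch: serve minimal "how to get credentials" triples instead of the data.
function valid_app_credentials($user, $pass) {
    // placeholder: look the pair up in a table of app-specific tokens
    return false;
}

$user = isset($_SERVER['PHP_AUTH_USER']) ? $_SERVER['PHP_AUTH_USER'] : '';
$pass = isset($_SERVER['PHP_AUTH_PW'])   ? $_SERVER['PHP_AUTH_PW']   : '';

if (!valid_app_credentials($user, $pass)) {
    header('HTTP/1.0 401 Unauthorized');
    header('WWW-Authenticate: Basic realm="data.soton.ac.uk personal data"');
    header('Content-Type: text/turtle');
    echo "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n";
    echo "@prefix xxx:  <http://data.soton.ac.uk/ns/access#> .\n\n";   // made-up namespace
    echo "<> rdfs:comment \"This document requires an app-specific username/password.\" ;\n";
    echo "   xxx:credentialSignupPage <https://id.soton.ac.uk/create-app-password> .\n";
    exit;
}
// ...otherwise emit the requested personal data as RDF...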

Desktop Applications for type (2) Closed Linked Data

While you wouldn’t pass your type (2) RDF (stuff you don’t have a personal right to republish) to a third-party service, you may well want to use it with a desktop application. In much the same way you might download an Excel file from your intranet and run it on your laptop.

In this case it’s perfectly reasonable to use your main username/password to authenticate (unless the application is malicious, but that’s a known problem and much easier to cope with on the desktop than with phone apps, cloudy websites etc.).

However, as with the type (1) data, if it is provided in RDF it should contain some boilerplate saying when it was generated, who for, and for what IP address. That way, if it leaks accidentally, it can be traced. Obviously this is not proof against malicious leaking, but it should be considered best practice to include such a header, plus a clearly NOT OPEN license, in any non-open linked data document.

Boilerplate Triples for Closed Linked Data

Here’s a sketch of what I’m thinking. I’m not sure about the exact predicates.

<> a foaf:Document ;
   dc:license <http://data.soton.ac.uk/licence/our-bloody-closed-eyes-only-license> ;
   xxx:generatedFor <http://data.soton.ac.uk/person/cjg> ;
   xxx:requestedBy "152.78.71.23" ;
   xxx:generatedOn "2010-07-23T12:32:01Z" ;
   rdfs:comment """This file contains confidential data and should not be redistributed.
     If you receive or discover it by accident please notify yikes@soton.ac.uk and
     delete all copies.""" .

You get the gist. xxx: is for predicates I’m not sure about. There may already be useful ones I’ve forgotten somewhere in the bowels of dcterms.

Posted in Best Practice, Intranet, RDF.


PHP to clean up iffy XML

OK, this is an icky hack, but it’ll remove stray & from crappy XML:

$data = preg_replace( "/&(?!amp;|gt;|lt;|quot;)/", "", $data );

Posted in Uncategorized.



Dear Edinburgh Fringe Website

Dear Fringe Website,

Your terms and conditions make me sad.

I work with the Web Science Trust and some of the big names in the Semantic Web, and I was hoping I would be able to create “linked data” for the Fringe festival. Linked data is the technique being used to publish government data on data.gov.uk and, according to Sir Tim Berners-Lee, is the future of the web.

If I was able to do this (which I would happily do for free and with no bother to you), it would result in dozens of websites and phone apps remixing the Fringe guide. While I’m sure your own iPhone app will be good (although I have an Android phone, so it’s no use to me), it would have been exciting to have hundreds of people providing alternate ways to work with the programme, and far more in the spirit of the Fringe. Sadly it looks like the rules have been written from the perspective of advertising revenue and control, rather than fostering creativity and experimentation.

The Fringe will be awesome without linked data, but it could be and should be awesomer.

– Christopher Gutteridge.

ps. You really should rethink the policy “About linking by hypertext to our website” as it is unrealistic and draconian. I broke the terms and conditions by mentioning your URL in an unauthorised tweet.

Posted in Uncategorized.


Quick and Dirty RDF Reader

I know the world has quite a few RDF readers already, but I wanted something which was friendly and worked the way I do. I built the Q&D-RDF browser for my own benefit, but in the last few days it’s proved so useful I have made the effort to tidy it up a bit and announce it as a service.

It’s built on top of the excellent ARC2 library, and my graphite library. I’ve downloaded all the current prefixes from prefix.cc so that it will use short names for namespaces wherever possible. As it’s built on ARC2 it will handle lots of formats including RDF+XML, N3 and friends, and also RDFa and other abominations.

What I’ve found it most useful for is to point it at some RDF I’ve just completed and do a visual check that things look right. I’ve caught several dumb typos this way. The Graphite “dump” format also seems to be much less intimidating for people than the N3 or XML formats.
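
The core of that workflow is only a few lines with Graphite. This is a hedged sketch: the library paths and the example URI are assumptions, and the exact API is best checked against the Graphite documentation.

<?php
// Sketch: load a document and dump it in the human-friendly Graphite format.
require_once 'arc2/ARC2.php';     // Graphite is built on ARC2
require_once 'Graphite.php';

$graph = new Graphite();
$count = $graph->load('http://rdf.ecs.soton.ac.uk/person/1269');   // hypothetical document
echo "<p>Loaded $count triples.</p>";
echo $graph->resource('http://rdf.ecs.soton.ac.uk/person/1269')->dump();   // eyeball for typos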

Dubious Data

“What RDF have you been writing?”, is what I’m sure you’re now wondering.

What I’ve been doing is brushing up my skills by writing some well-engineered but deliberately pointless or inaccurate RDF datasets. However, by accident, one of them is quite useful. The playing cards dataset at http://data.totl.net/playingcards/ [Raw RDF, Browse] actually looks like it might be a good example to use in an introduction to RDF.

Posted in Graphite, RDF.


Using a triplestore instead of MySQL as a backend

I’m still looking at the barriers to using an RDF triple store as the back-end for a website. I discussed some of this back in February, but the problems remain unsolved.

Our usual pattern, when designing a website, is to identify the various types of entity that will be described by pages on the site. For an academic site we have some subset of: people, groups, projects, publications, events, articles. We then create a database table or tables for each of these, plus PHP wrapper functions to get individual records and lists of records, and methods to create & update records of each type. In PHP, we have an object representing the set of items (e.g. Events) and an object representing each item. The SQL is kept abstracted away as much as possible.

The PHP classes which represent an item or a list of items have methods for mapping the data into various formats: short HTML summary, an HTML page, RDF, XML, .ics, RSS, Atom etc. Occasionally some fields may not be shown to the public, for example if we use the same database for some internal administration.

On some sites, we have a table which stores all revisions of each item, and a table which maps each primary_item_id to its revision_id. Previous versions should never, ever be shown to the public as they may have contained errors or information we actively do not want to be public.

What I’m interested in is how normal web developers, rather than researchers, can achieve this.

I am still imagining a system with “classes” of things, like people and events, where the PHP is configured in such a way as to be able to create/retrieve/update/delete individual “records”, where each triple belongs to only one record, and where we have PHP functions which retrieve data from a set of records (by abstracting SPARQL instead of SQL).
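
A minimal sketch of the kind of abstraction I mean: a “record set” class that hides the SPARQL in the same way our current classes hide SQL. The endpoint, the class URI and the property names are assumptions.

<?php
// Sketch: a record-set class backed by a SPARQL endpoint rather than SQL.
require_once 'arc2/ARC2.php';

class EventRecordSet
{
    private $store;

    public function __construct($endpoint)
    {
        $this->store = ARC2::getRemoteStore(array('remote_store_endpoint' => $endpoint));
    }

    // Return uri/title/date for every record of the configured class.
    public function getAll()
    {
        $sparql = '
            PREFIX dc: <http://purl.org/dc/terms/>
            PREFIX ev: <http://example.org/ns/event#>
            SELECT ?uri ?title ?date WHERE {
                ?uri a ev:Event ; dc:title ?title ; dc:date ?date .
            } ORDER BY ?date';
        return $this->store->query($sparql, 'rows');
    }
}

$events = new EventRecordSet('http://data.example.ac.uk/sparql');
foreach ($events->getAll() as $event) {
    echo $event['date'] . ' : ' . $event['title'] . "\n";
}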

Unanswered questions:

  • Internally, do we use our own namespace for the predicates, or established namespaces (FOAF, SIOC etc.), or a mixture?
  • If we use our own namespace, do we map into common schemas (FOAF, SIOC…) for the public view of .rdf data? Do we map it on demand, or when a record is updated? Do we expose our internal namespace predicates? I don’t believe just providing a mapping and letting people map it themselves is a reasonable option.
  • Do we expose all of the triples? (what about ones used for administration? do we just make sure we have no secrets in the triplestore?) If so, how do we handle revisions? Have 2 triplestores — One for the public and one for admin? Or can triplestore SPARQL endpoints be configured in fancy ways?
  • How do we generate brief, unique URIs for items when they are created? In my experience URIs built from any of the meaningful data in an item (e.g. surnames) are a mistake. Using UUIDs is not an option: they are ugly. http://webscience.org/person/6 is better. My previous post suggested some solutions, and Talis have a weird solution using a pool of available IDs, but I don’t regard it as a solved problem. Then again, there’s no standard solution in SQL databases either.
  • If using tools to add value by importing/generating additional triples, how do we manage these? For example do we need to erase any of these if the records they refer to are removed or updated?

I think there are probably answers to all of these, but they need to be moved from ‘research’ to ‘development’. I’ll post updates if people solve any of these for me.

Posted in Best Practice, RDF.


MediaWiki Authentication Using Twitter and OAuth

The Dev8D wiki I set up for a recent JISC event uses OAuth to allow people to log in to the wiki using their twitter accounts (or users can register for wiki accounts in the usual way).

As promised in an earlier post, here’s a rough guide to how it was done.

1. Set up MediaWiki

I won’t go into details of how to do this here, but the first step should be to download and install a recent release of MediaWiki.

For a new wiki, I’d recommend installation of the reCAPTCHA plugin, to prevent automatic account registrations from spam bots.

I’d also prevent anonymous editing/creation of pages on the wiki, by adding the following lines to the bottom of your LocalSettings.php:

# Disable anonymous editing and page creation
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['*']['createpage'] = false;

2. Create a new table in the MediaWiki database

Create a table named ‘twitter_users’ in your wiki database, with the following fields:

CREATE TABLE IF NOT EXISTS `twitter_users` (
    `user_id` int(10) unsigned NOT NULL,
    `twitter_id` varchar(255) NOT NULL,
    PRIMARY KEY  (`user_id`),
    UNIQUE KEY `twitter_id` (`twitter_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Note: If you’re using a prefix for your wiki database tables, this ‘twitter_users’ table will also need the prefix.

This table maps MediaWiki user accounts to twitter user accounts. It’s used to track whether an account on MediaWiki was created using twitter OAuth or not, and ensures only accounts created from twitter can be authenticated against twitter.

Without this, someone could create a twitter account with the same username as a non-twitter based wiki account (such as an admin account), and gain access.
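
Purely for illustration, here is roughly the kind of check this table enables. The real logic lives in TwitterAuth.php and may differ; the function name below is made up.

<?php
// Sketch: only treat a wiki account as Twitter-backed if it appears in twitter_users.
function is_twitter_account( $wiki_user_id ) {
    $dbr = wfGetDB( DB_SLAVE );            // MediaWiki's read database handle
    $res = $dbr->select(
        'twitter_users',                   // the table created above (prefix added automatically)
        array( 'twitter_id' ),
        array( 'user_id' => intval( $wiki_user_id ) ),
        __METHOD__
    );
    return $res->numRows() > 0;
}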

3. Register a new twitter application

Go to http://twitter.com/oauth_clients, and follow the “Register a new application” link.

Fill in the fields as follows:

  • Application Icon: anything you like
  • Application Name: anything you like
  • Description: anything you like
  • Application Website: http://[your wiki base URL]/
  • Organization: anything you like
  • Website: anything you like
  • Application Type: Browser
  • Callback URL: http://[your wiki base URL]/oauth/callback.php
  • Default Access type: Read-only
  • Use Twitter for login: Yes

After submitting the form, you should get a Consumer key, Consumer secret, Request token URL, Access token URL, Authorize URL (make a note of these, or keep the window open somewhere for now).

4. Setup PHP OAuth library

I used the twitteroauth library for this (.tgz download).

This library requires PHP’s cURL library to be installed (package php5-curl on Ubuntu or other Debian-like systems).

Untar and unzip this into your MediaWiki extensions directory, and rename the directory to ‘oauth’:
cd /[wiki root directory]/extensions
wget http://github.com/abraham/twitteroauth/tarball/0.2.0-beta3
tar xzf abraham-twitteroauth-76446fa.tar.gz
mv abraham-twitteroauth-76446fa oauth

Recommended: Some of the code from this library needs to be accessible from a browser, so I’d recommend symlinking to this directory from the wiki root:
cd /[wiki root directory]/
ln -s extensions/oauth

You don’t have to do this, but it looks a bit neater than having URLs containing your wiki extensions directory.

Edit the config file in the oauth directory:
vi /[wiki root directory]/extensions/oauth/config.php
Set the ‘CONSUMER_KEY’ and ‘CONSUMER_SECRET’ to the values you got when you registered your OAuth application with twitter.
Set the ‘OAUTH_CALLBACK’ to ‘http://[your wiki base URL]/oauth/callback.php’.

To test that everything’s worked so far, visit:
http://[your wiki base URL]/oauth/
and click the button to sign in using twitter.

You should then be taken to a page on twitter.com which asks about allowing the application access to your twitter account. Clicking on the ‘Allow’ button should then redirect you back to:
http://[your wiki base URL]/oauth/index.php

Refresh the page, and you should see all the information twitter has passed back to the application.

5. Set up the wiki to use OAuth

Download TwitterAuth.php, and put it into the extensions directory:
cd /[wiki root directory]/extensions
wget http://github.com/davechallis/misc-scripts/raw/master/TwitterAuth.php

Modify your LocalSettings.php, and add the following lines:

require_once("$IP/extensions/TwitterAuth.php");

global $wgHooks;
$wgHooks['UserLoadFromSession'][] = 'twitter_auth';
$wgHooks['UserLogoutComplete'][] = 'twitter_logout';

Once you’ve added this, and signing in using OAuth worked as in the section above, try navigating to any wiki page. You should now be logged in with your twitter username.

6. Additional Setup

Two last things need adding before we’re done:

6.1 Add a login button to the login page

Make a copy of the original, and then edit:
/[wiki root directory]/includes/templates/Userlogin.php

After the line which reads:

<p id="userloginlink"><?php $this->html('link') ?></p>

add the following lines:

<?php
$return = '';
if (isset($_GET['returnto'])) {
    // urlencode so the page title survives the round trip through the link
    $return = '?returnto=' . urlencode($_GET['returnto']);
}
?>
<p>Or: <a href="http://[wiki base URL]/oauth/redirect.php<?php echo htmlspecialchars($return); ?>">
<img src="/Sign-in-with-Twitter-lighter.png" alt="Sign in with Twitter" /></a></p>

Change the text/image above to anything suitable for your site (twitter has some preferred button images for this).

6.2 Redirect to the correct page after login

Some code needs adding/tweaking so that a user returns to the page they were on after logging in (the code added above for the login button helps with this).

Modify:
/[wiki root directory]/extensions/oauth/callback.php
and change the line near the bottom from:
header('Location: ./index.php');
to:
header('Location: http://[wiki base URL]/index.php/' . $_SESSION['returnto']);

And finally modify:
/[wiki root directory]/extensions/oauth/redirect.php
Underneath the line which reads:
case 200:
add the following:

    if (isset($_GET['returnto'])) {
        $_SESSION['returnto'] = $_GET['returnto'];
    }
    else {
        $_SESSION['returnto'] = '/';
    }

That’s mostly it! I’ve probably forgotten a few things, and a lot of changes were made at the last minute/during Dev8D, so any fixes/suggestions are welcome.

Posted in Uncategorized.