There are lots of features coming in SPARQL 1.1.
However, there’s one little one I’m really looking forward to: SAMPLE. It’s right at the bottom of the new aggregates (or “set functions”; Harris corrected me on the name in the comments).
UPDATE:
I misunderstood how SAMPLE works. Dave got it working on our local endpoint and it appears to just be the equivalent of doing LIMIT 1. That’s bloody useless.
SELECT (SAMPLE(?s) AS ?an_s) ?p ?o WHERE { ?s ?p ?o } LIMIT 10
I would have expected the above to return 10 rows, but with only a single row for each ?s, paired with the first ?p and ?o found for that ?s.
I’m hoping that this is a bug and not the intended implementation of SAMPLE, otherwise it’s utterly useless. Why bother if it’s semantically the same as LIMIT 1… turns out I didn’t RTFM, so GROUP BY….
It seems I was missing a GROUP BY in my examples.
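For the record, something along these lines is what I should have written. This is a sketch of the corrected query rather than one I’ve actually run, but grouping by ?s and sampling the other variables is the idea:

SELECT ?s (SAMPLE(?p) AS ?a_p) (SAMPLE(?o) AS ?an_o)
WHERE { ?s ?p ?o }
GROUP BY ?s
LIMIT 10

Note that the sampled ?p and ?o aren’t guaranteed to come from the same triple; SAMPLE just picks an arbitrary value from each group.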
Lat & Long & Lat & Long
When we were overhauling the way I create data for buildings in data.southampton, I accidentally left in the old lat & long list as well as the new one. That meant that some buildings had two lats and two longs. When I asked for a list of buildings with their lat and long, I got four results per building, as it gave every possible combination. So if a building had the following data:
building59 rdfs:label "My Building" ;
    geo:lat "0.100" ;
    geo:long "0.200" ;
    geo:lat "0.101" ;
    geo:long "0.202" .
The results of
SELECT ?label ?lat ?long
WHERE { ?b a <http://vocab.deri.ie/rooms#Building> ; rdfs:label ?label ; geo:lat ?lat ; geo:long ?long }
end up multiplied out:
?label      | ?lat  | ?long
------------|-------|------
My Building | 0.100 | 0.200
My Building | 0.101 | 0.200
My Building | 0.100 | 0.202
My Building | 0.101 | 0.202
All of which are, of course, true, but it’s not really what I wanted. The new SAMPLE feature will limit a field to a single value per group.
SELECT ?label (SAMPLE(?lat) AS ?a_lat) (SAMPLE(?long) AS ?a_long)
WHERE { ?b a <http://vocab.deri.ie/rooms#Building> ; rdfs:label ?label ; geo:lat ?lat ; geo:long ?long }
GROUP BY ?b ?label
What I’m hoping is that it’s one sample per group, not one sample from all the rows returned, so that there will still be one valid ?lat for each building row returned.
Bad URIs
A bigger cock-up I made recently was generating URIs for our phonebook by hashing the email address of the person. That seemed to work fine, but I didn’t notice that some people didn’t have an email address, so they all ended up with the same URI. This resulted in one URI having many (nearly 100) given names, family names, names and phone numbers. So if you just request all people with their given name, family name and phone number, then this one rogue URI (generated from an empty (“”) email address) has 100 of each, which returns every combination: 100 × 100 × 100, a million rows, which isn’t very helpful. I know that someone working with the data was getting out-of-memory problems!
Would SAMPLE fix this? Yes.
Isn’t that masking an error? Well, yes, but I’m a Perl programmer at heart. It’s more important to have a system that works than one you let keep breaking just so you can spot issues. Build a better way to spot issues, one that doesn’t inconvenience the users!
If the query had been for (SAMPLE(?given) AS ?a_given) (SAMPLE(?family) AS ?a_family) {….} GROUP BY ?person then it would still have had one weird record, but the day-to-day operation wouldn’t have broken. It would have taken longer to spot and fix the problem, but the system wouldn’t have been overloaded and breaky while we were unaware of it.
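To spell that out, here’s a sketch of the kind of query I mean. The foaf: predicates are just assumptions for illustration; the real phonebook data may well use different properties:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person (SAMPLE(?given) AS ?a_given) (SAMPLE(?family) AS ?a_family) (SAMPLE(?phone) AS ?a_phone)
WHERE {
  ?person foaf:givenName ?given ;
          foaf:familyName ?family ;
          foaf:phone ?phone .
}
GROUP BY ?person

The rogue URI then contributes one row with an arbitrary given name, family name and phone number, rather than a million combinations.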
But bad data is, er, bad!
If the semantic web/linked data/open data concepts are to work, then you’ve got to deal with the fact that there’s going to be bad data. If it’s so fragile that the services fall over every time someone gets URIs wrong, then it ain’t going to work, as people make mistakes all the time. Plan for it.
Anything consuming open data should be considering how to deal with all kinds of broken data:
- Typos in literals
- Factual errors (not the same as typos)
- Structural semantic errors, like many people incorrectly having the same URI. If I did it by accident, it’s likely to happen to other data sources now and then (see the sketch after this list).
- “Impossible” semantics. It’s likely you’ll have some facts in a big data-set that the ontology says are mutually exclusive. Beware RDF versions of the Bible.
- Simple malicious data such as incorrect literals, or false predicates
- Malicious semantics, where someone creates innocuous seeming triples which do something unexpected when combined with certain other datasets.
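On the structural-errors point above, and on building ways to spot issues that don’t inconvenience users, something like the following report would have caught our phonebook problem. Again, the foaf: predicate is an assumption; swap in whatever your data actually uses:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person (COUNT(DISTINCT ?given) AS ?given_names)
WHERE { ?person foaf:givenName ?given }
GROUP BY ?person
HAVING (COUNT(DISTINCT ?given) > 1)
ORDER BY DESC(?given_names)

Run now and then, that flags any URI which has picked up an implausible number of names, without breaking anything for people just querying the data.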
Malicious Semantics
…or, at the very least, sneaky semantics.
Hugh Glaser pointed this out to me; I’ll see if I can explain it:
Imagine a Judge has declared it illegal to make a certain fact public, specifically that two people know each other.
Site-A defines:
person:678AF foaf:knows person:67D32 .
Then Site-B defines:
person:678AF foaf:name "Timogen Stohmas" .
Then Site-C defines:
person:67D32 foaf:name "Byron Briggs" .
You can find out, from combining the three sources, that Timogen apparently knows Byron, but who let that cat out of the bag? Site-A did, assuming that they meant the URIs to mean what B & C claim, but could you prove that?
See Hugh’s examples at http://who.isthat.org/
Support for SAMPLE in 4store
I’m told that 4store has implemented the SAMPLE feature from 1.1 along with the other aggregates, most of the Update operations, and most of the FILTER functions.
It didn’t work for me when I just tried it on our local copy, but that may be because we are on a stable rather than dev version, or possibly due to the PHP that sits between the public SPARQL endpoint and the 4store back-end-point.
UPDATE: it does work, I just haven’t had enough coffee.
Live Example
The following selects all buildings from our endpoint, and an example of something that they contain. Most buildings contain many things, but this only lists one thing per building.
SELECT ?building ?building_name (SAMPLE(?inside_thing) AS ?an_inside_thing) (SAMPLE(?inside_thing_name) AS ?an_inside_thing_name)
WHERE {
  ?building a <http://vocab.deri.ie/rooms#Building> ; rdfs:label ?building_name .
  OPTIONAL { ?inside_thing spacerel:within ?building ; rdfs:label ?inside_thing_name }
}
GROUP BY ?building ?building_name
