

Searching a SPARQL endpoint

Recently, OUseful Blog has been talking about how to get started hacking SPARQL queries. So here’s a simple one. It looks for things with a search string in their name, title or label:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?thing ?name ?type {
 { ?thing foaf:name ?name }
 UNION { ?thing rdfs:label ?name }
 UNION { ?thing dct:title ?name  }
 OPTIONAL { ?thing a ?type }
 FILTER (REGEX(?name, "YOUR-QUERY-STRING-HERE","i"))
}

For example, try searching for Ventnor in the Ordnance Survey endpoint. I suspect it’s not that fast because it has to filter its way through a huge pile of data. I thought searching for “^Ventnor” might be faster (it would be in SQL, as the indexes can do string-starts-with quickly), but it doesn’t seem to be. Any advice on optimising?
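If you want to drop a user's search string into the query above from a script, the string needs escaping twice: once so regex metacharacters match literally, and once so it survives inside a double-quoted SPARQL literal. Here is a rough Python sketch of that. The helper names (build_label_search and friends) are made up for illustration, and it assumes single-line search terms.

```python
# Sketch only: helper names are hypothetical, not part of any library.
PREFIXES = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""

QUERY_TEMPLATE = PREFIXES + """
SELECT DISTINCT ?thing ?name ?type {
 { ?thing foaf:name ?name }
 UNION { ?thing rdfs:label ?name }
 UNION { ?thing dct:title ?name }
 OPTIONAL { ?thing a ?type }
 FILTER (REGEX(?name, "PATTERN-GOES-HERE", "i"))
}
"""

# Characters that have a special meaning in a regular expression.
REGEX_SPECIALS = set("\\|.?*+(){}[]^$")


def escape_regex(text):
    """Backslash-escape regex metacharacters so the text matches literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in text)


def escape_sparql_string(text):
    """Escape backslashes and double quotes for a double-quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_label_search(term, starts_with=False):
    """Return the label-search query for term, optionally anchored with ^."""
    pattern = escape_regex(term)
    if starts_with:
        pattern = "^" + pattern
    return QUERY_TEMPLATE.replace(
        "PATTERN-GOES-HERE", escape_sparql_string(pattern)
    )


print(build_label_search("Ventnor", starts_with=True))
```

The starts_with flag only changes the pattern to “^Ventnor”; whether that is any faster depends entirely on the endpoint.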

If people are interested, I could add this as an option to the Graphite SPARQL Browser.

SPARQL/SQL Translation

For SQL users, UNION is in effect an “OR”, OPTIONAL can be thought of as “LEFT JOIN” and FILTER as a WHERE.

If the SPARQL endpoint were an SQL database, it would be a single table containing three columns: subject, predicate and object. (Yes, I’m skipping some stuff here to keep it simple.) I’m going to remove the UNION for now, as that’s basically like running several SELECTs and merging the results. Note that “a” is an alias for “rdf:type”.

SELECT DISTINCT ?thing ?name ?type {
 { ?thing foaf:name ?name }
 OPTIONAL { ?thing rdf:type ?type }
 FILTER (REGEX(?name, "YOUR-QUERY-STRING-HERE","i"))
}

In SQL, against that imaginary triples table, this would be roughly:

SELECT DISTINCT t1.subject, t1.object, t2.object
FROM triples AS t1
LEFT JOIN triples AS t2
  ON t1.subject = t2.subject
  AND t2.predicate = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
WHERE t1.predicate = 'http://xmlns.com/foaf/0.1/name'
AND ( t1.object LIKE '%your string here%' )

Note that I’ve changed the regexp to a LIKE. I’m not 100% sure I’ve got this entirely correct, but it should give an SQL hacker a feel for what’s going on. Every triple in the SPARQL select is effectively an inner join where the named parts ?foo are joined to the columns they were associated with in the previous triples. You can do some very funky things in SPARQL, but you need to get joins from lesson one. Even a trivial query on a property of a field will probably require you to add a { ?item a foaf:Person } or you’ll get all things of all types, which isn’t what you’re going to want.
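To play with the single-table picture, here is a small sketch using Python's built-in sqlite3 module. The triples table, the ex: identifiers and the sample rows are all invented for illustration; a real triplestore is not laid out like this.

```python
import sqlite3

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def make_demo_store():
    """Build an in-memory 'triplestore' as one three-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    # Made-up sample data: ex:pier has a name but no rdf:type triple.
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("ex:ventnor", FOAF_NAME, "Ventnor"),
            ("ex:ventnor", RDF_TYPE, "ex:Town"),
            ("ex:pier", FOAF_NAME, "Ventnor Pier"),
            ("ex:pier", "ex:near", "ex:ventnor"),
            ("ex:ryde", FOAF_NAME, "Ryde"),
            ("ex:ryde", RDF_TYPE, "ex:Town"),
        ],
    )
    return conn


def search_triples(conn, term):
    """Name search with an optional type, mirroring OPTIONAL as LEFT JOIN."""
    sql = """
        SELECT DISTINCT t1.subject, t1.object, t2.object
        FROM triples AS t1
        LEFT JOIN triples AS t2
          ON t1.subject = t2.subject AND t2.predicate = ?
        WHERE t1.predicate = ?
          AND t1.object LIKE ?
        ORDER BY t1.subject
    """
    params = (RDF_TYPE, FOAF_NAME, "%" + term + "%")
    return conn.execute(sql, params).fetchall()


for row in search_triples(make_demo_store(), "ventnor"):
    print(row)
```

In this sketch the rdf:type test lives in the ON clause, not the WHERE. That way ex:pier, which has other triples but no type, still comes back with an empty type column, which is what OPTIONAL does.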

I think that as RDF and the semantic web achieve escape velocity [PDF], we’ll need to make some tutorials for people who just want to get the job done. Right now we’re still working almost entirely with early adopters. We need to make getting data out of SPARQL achievable for people who don’t really care. I found a PHP library for working with SPARQL, but it seems to be from more than 5 years ago. Perhaps I should write a SPARQL library which looks like an SQL library? sparql_connect(), sparql_query() etc? (Comment if it’s worth my time…)
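To give a feel for what an SQL-flavoured interface might look like, here is a very rough sketch in Python (not PHP). The function names follow the ones floated above, but everything about their behaviour is an assumption: it sends the query as a GET with a query parameter, asks for the standard SPARQL JSON results format, and flattens each result into a plain dict. The endpoint URL is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def http_fetch(url):
    """Fetch a URL, asking the endpoint for SPARQL JSON results."""
    req = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def sparql_connect(endpoint, fetch=http_fetch):
    """Return a 'connection': just the endpoint URL and a fetch function."""
    return {"endpoint": endpoint, "fetch": fetch}


def sparql_query(conn, query):
    """Run a SELECT query and return rows as dicts, like an SQL library."""
    url = conn["endpoint"] + "?" + urlencode({"query": query})
    data = json.loads(conn["fetch"](url))
    names = data["head"]["vars"]
    return [
        # Unbound variables (e.g. from OPTIONAL) come back as None.
        {n: (binding[n]["value"] if n in binding else None) for n in names}
        for binding in data["results"]["bindings"]
    ]


# Demo with a canned response instead of a real endpoint.
def fake_fetch(url):
    return json.dumps({
        "head": {"vars": ["thing", "type"]},
        "results": {"bindings": [
            {"thing": {"type": "uri", "value": "http://example.org/a"}},
        ]},
    })


conn = sparql_connect("http://example.org/sparql", fetch=fake_fetch)
print(sparql_query(conn, "SELECT ?thing ?type { ?thing a ?type }"))
```

Passing the fetch function in makes it easy to try the row-flattening without touching the network; swap in the default to talk to a real endpoint.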

Redirect to SPARQL

Dave Challis had an interesting suggestion yesterday… Making a URL which accepts a ?q=XXX query and redirects to the SPARQL query that searches relevant labels in our endpoint. That way we select which predicates we consider labels, and it gently cues people into SPARQL without forcing the initial learning curve.
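A sketch of how that redirect might be computed, in Python. The endpoint URL, the list of label predicates and the function names are all placeholders; the only thing taken from the SPARQL protocol is that the query goes in a query parameter.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint

# The predicates we choose to treat as labels.
LABEL_PREDICATES = [
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://xmlns.com/foaf/0.1/name",
    "http://purl.org/dc/terms/title",
]


def label_search_query(term):
    """Build a SPARQL query matching term against every label predicate."""
    # Escape regex metacharacters, then escape for a quoted SPARQL literal.
    pattern = re.sub(r"([\\|.?*+(){}\[\]^$])", r"\\\1", term)
    literal = pattern.replace("\\", "\\\\").replace('"', '\\"')
    unions = " UNION ".join(
        "{ ?thing <%s> ?name }" % predicate for predicate in LABEL_PREDICATES
    )
    return (
        "SELECT DISTINCT ?thing ?name ?type { "
        + unions
        + " OPTIONAL { ?thing a ?type } "
        + 'FILTER (REGEX(?name, "%s", "i")) }' % literal
    )


def redirect_target(request_url):
    """Turn .../find?q=XXX into the endpoint URL to redirect to."""
    params = parse_qs(urlsplit(request_url).query)
    term = params.get("q", [""])[0]
    return ENDPOINT + "?" + urlencode({"query": label_search_query(term)})


print(redirect_target("http://example.org/find?q=Zepler"))
```

Because the user lands on a URL with the full query in it, they can see and tweak the SPARQL that did the search.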

Update:

@ldodds points out that the O.S. endpoint has some funky Talis features, including a simple search API which gives pretty useful results. I’ve seen, in passing, searches which return RSS, but what I’d not realised until today was that the RSS contains lots of useful triples, so in effect it’s just a structured list of RDF descriptions. This approach looks very useful for some use cases I’ve been thinking of, specifically how to make it easy to search an organisation’s datasets. For example, how to find a building at Southampton University when all you know is “Zepler”.



2 Responses


  1. Ian Millard says

    In general triplestores are not designed to be used for string searching.

    Whereas a SQL dbase is likely to have internal indexes and/or table ordering over particular text fields, this is not usually the case in a triplestore. A REGEX usually requires the equivalent of a full table scan… so adding a ^ is unlikely to help much.

Continuing the Discussion

  1. My Understanding of SPARQL, the First Attempt… « OUseful.Info, the blog… linked to this post on November 29, 2010

    [...] @cgutteridge’s Searching a SPARQL Endpoint demonstrates a useful ‘get you started’ query for exploring a real datastore (look for [...]


