Southampton Web and Data Innovation Team

Ideas and Tips from the Team

Searching a SPARQL endpoint

Recently, OUseful Blog has been talking about how to get started hacking SPARQL queries. So here’s a simple one. It looks for things with a search string in their name, title or label:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?thing ?name ?type {
 { ?thing foaf:name ?name }
 UNION { ?thing rdfs:label ?name }
 UNION { ?thing dct:title ?name  }
 OPTIONAL { ?thing a ?type }
 FILTER (REGEX(?name, "YOUR-QUERY-STRING-HERE","i"))
}

For example, try searching for Ventnor in the Ordnance Survey endpoint. I suspect it’s not that fast because it has to filter its way through a huge pile of data. I thought searching for “^Ventnor” might be faster (it would be in SQL, as the indexes can do string-starts-with quickly), but it doesn’t seem to be. Advice on optimising?
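
If you want to drop a user's search string into the query from a script, it needs escaping twice: once so it is treated as plain text by REGEX, and once so it is a valid SPARQL string literal. Here is a minimal Python sketch of that. The function names are my own invention, and the list of characters to escape assumes the XPath-style regular expressions that SPARQL's REGEX uses.

```python
# Characters that have a special meaning in the regex syntax used by
# SPARQL's REGEX(); each one is escaped with a backslash.
REGEX_SPECIAL = set("\\|.?*+(){}[]^$")

QUERY_TEMPLATE = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?thing ?name ?type {
 { ?thing foaf:name ?name }
 UNION { ?thing rdfs:label ?name }
 UNION { ?thing dct:title ?name }
 OPTIONAL { ?thing a ?type }
 FILTER (REGEX(?name, %s, "i"))
}
"""


def escape_regex(text):
    """Make text match literally when used as a regex pattern."""
    return "".join("\\" + ch if ch in REGEX_SPECIAL else ch for ch in text)


def sparql_literal(text):
    """Wrap text in double quotes as a SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_search_query(term, starts_with=False):
    """Fill the search term into the query; optionally anchor it with ^."""
    pattern = escape_regex(term)
    if starts_with:
        pattern = "^" + pattern
    return QUERY_TEMPLATE % sparql_literal(pattern)


if __name__ == "__main__":
    print(build_search_query("Ventnor"))
```

Calling build_search_query("Ventnor", starts_with=True) gives the “^Ventnor” variant mentioned above.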

If people are interested, I could add this as an option to the Graphite SPARQL Browser.

SPARQL/SQL Translation

For SQL users, UNION is in effect an “OR”, OPTIONAL can be thought of as “LEFT JOIN” and FILTER as a WHERE.

If the SPARQL endpoint were an SQL database, it would be a single table containing three columns: subject, predicate and object. (Yes, I’m skipping some stuff here to keep it simple.) I’m going to remove the UNION for now, as that’s basically like running several SELECTs and merging the results. Note that “a” is an alias for “rdf:type”.

SELECT DISTINCT ?thing ?name ?type {
 { ?thing foaf:name ?name }
 OPTIONAL { ?thing rdf:type ?type }
 FILTER (REGEX(?name, "YOUR-QUERY-STRING-HERE","i"))
}

In SQL that would look something like:

SELECT DISTINCT t1.subject, t1.object, t2.object
FROM triples AS t1
-- OPTIONAL: the type test lives in the ON clause, so things with no type still appear (with NULL)
LEFT JOIN triples AS t2
  ON t1.subject = t2.subject
  AND t2.predicate = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
WHERE t1.predicate = 'http://xmlns.com/foaf/0.1/name'
  AND t1.object LIKE '%your string here%'

I’ve changed the regexp to a LIKE, and I’m not 100% sure I’ve got this entirely correct, but it should give an SQL hacker a feel for what’s going on. Every triple pattern in the SPARQL SELECT is effectively an inner join, where the named variables (?foo) are joined to the columns they were bound to in the previous triples. You can do some very funky things in SPARQL, but you need to get joins from lesson one. Even a trivial query on a property of a field will probably require you to add a { ?item a foaf:Person } or you’ll get all things of all types, which isn’t what you’re going to want.
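
To make the “one big table of triples” picture concrete, here is a small Python sketch that builds such a table in an in-memory SQLite database and runs the LEFT JOIN version of the search. The table name, the ex: identifiers and the sample rows are all made up for illustration; a real triplestore does not work like this internally. The type test sits in the ON clause so that a thing with a name but no rdf:type still comes back, with NULL (None) for its type, which is what OPTIONAL does.

```python
import sqlite3

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def demo_connection():
    """Create an in-memory triples table holding a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("ex:ventnor", FOAF_NAME, "Ventnor"),
            ("ex:ventnor", RDF_TYPE, "ex:Town"),
            # A named thing with another property but no rdf:type.
            ("ex:pier", FOAF_NAME, "Ventnor Pier"),
            ("ex:pier", "ex:near", "ex:ventnor"),
            ("ex:ryde", FOAF_NAME, "Ryde"),
            ("ex:ryde", RDF_TYPE, "ex:Town"),
        ],
    )
    return conn


def search_names(conn, term):
    """Return (thing, name, type) rows whose name contains term.

    Note: % and _ inside term act as LIKE wildcards; they are not escaped.
    """
    sql = """
        SELECT DISTINCT t1.subject, t1.object, t2.object
        FROM triples AS t1
        LEFT JOIN triples AS t2
          ON t1.subject = t2.subject AND t2.predicate = ?
        WHERE t1.predicate = ?
          AND t1.object LIKE ?
        ORDER BY t1.subject
    """
    params = (RDF_TYPE, FOAF_NAME, "%" + term + "%")
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    for row in search_names(demo_connection(), "ventnor"):
        print(row)
```

SQLite's LIKE ignores case for ASCII letters by default, which roughly stands in for the "i" flag on REGEX.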

I think that as RDF and the semantic web achieve escape velocity [PDF], we’ll need to make some tutorials for people who just want to get the job done. Right now we’re still working almost entirely with early adopters. We need to make getting data out of SPARQL achievable for people who don’t really care. I found a PHP library for working with SPARQL, but it seems to be from more than 5 years ago. Perhaps I should write a SPARQL library which looks like an SQL library, with sparql_connect(), sparql_query() etc? (Comment if it’s worth my time…)
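
As a rough idea of what the “looks like an SQL library” approach could feel like, here is a sketch in Python rather than PHP. It is not an existing library: sparql_query() and rows_from_results() are hypothetical names, and it assumes the endpoint accepts the query as a GET parameter called query and can return the standard SPARQL JSON results format. Each result row comes back as a plain dictionary, with None for variables left unbound by an OPTIONAL, much as an SQL library would hand back NULLs.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def rows_from_results(doc):
    """Turn a SPARQL JSON SELECT result into a list of dicts.

    Variables with no binding in a row (e.g. from OPTIONAL) become None.
    """
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append(
            {
                var: (binding[var]["value"] if var in binding else None)
                for var in variables
            }
        )
    return rows


def sparql_query(endpoint, query, timeout=30):
    """Send a SELECT query to an endpoint URL and return its rows.

    Assumes the endpoint supports GET with a 'query' parameter and
    honours the Accept header for JSON results.
    """
    url = endpoint + "?" + urlencode({"query": query})
    request = Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urlopen(request, timeout=timeout) as response:
        return rows_from_results(json.load(response))
```

A caller would then loop over sparql_query(endpoint_url, query_text) and read row["name"] or row["type"] without needing to know anything about the results format.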

Redirect to SPARQL

Dave Challis had an interesting suggestion yesterday… Making a URL which accepts a ?q=XXX query and redirects to the SPARQL query that searches relevant labels in our endpoint. That way we select which predicates we consider labels, and it gently cues people into SPARQL without forcing the initial learning curve.
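
A redirector like that can be very small. The sketch below is a minimal WSGI application in Python, not the code behind any real demo: the endpoint URL is a placeholder, the list of label predicates is just the three used earlier in this post, and it assumes the endpoint takes the query as a GET parameter called query.

```python
from urllib.parse import parse_qs, urlencode

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint

# The predicates we choose to treat as "labels".
LABEL_PREDICATES = [
    "http://xmlns.com/foaf/0.1/name",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://purl.org/dc/terms/title",
]

REGEX_SPECIAL = set("\\|.?*+(){}[]^$")


def label_search_query(term):
    """Build a SPARQL query that searches every label predicate for term."""
    # Escape regex metacharacters, then escape for a SPARQL string literal.
    pattern = "".join("\\" + c if c in REGEX_SPECIAL else c for c in term)
    literal = (
        pattern.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    branches = " UNION ".join(
        "{ ?thing <%s> ?name }" % predicate for predicate in LABEL_PREDICATES
    )
    return (
        "SELECT DISTINCT ?thing ?name ?type { %s "
        "OPTIONAL { ?thing a ?type } "
        'FILTER (REGEX(?name, "%s", "i")) }' % (branches, literal)
    )


def app(environ, start_response):
    """WSGI app: redirect ?q=XXX to the equivalent SPARQL query URL."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    term = params.get("q", [""])[0]
    if not term:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Missing ?q= parameter\n"]
    location = ENDPOINT + "?" + urlencode({"query": label_search_query(term)})
    start_response("302 Found", [("Location", location)])
    return [b""]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Try http://localhost:8000/?q=Zepler
    make_server("localhost", 8000, app).serve_forever()
```

Because the browser ends up on the endpoint's own URL with the full query visible, people can see the SPARQL that ran and start tweaking it.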

Demo of a SPARQL ?q= redirector on the Ordnance Survey endpoint

Update:
@ldodds points out that the O.S. endpoint has some funky Talis features, including a simple search API, which gives pretty useful results. I’ve seen, in passing, searches which return RSS, but what I’d not realised until today was that the RSS contains lots of useful triples, so in effect it’s just a structured list of RDF descriptions. This approach looks very useful for some use cases I’ve been thinking of, specifically how to make it easy to search an organisation’s datasets. For example, how to find a building at Southampton University when all you know is “Zepler”.

Posted in Uncategorized.

By Christopher Gutteridge – November 10, 2010

2 Responses

Ian Millard says

In general triplestores are not designed to be used for string searching.

Whereas a SQL dbase is likely to have internal indexes and/or table ordering over particular text fields, this is not usually the case in a triplestore. A REGEX usually requires the equivalent of a full table scan… so adding a ^ is unlikely to help much.

November 10, 2010, 10:27 am

Continuing the Discussion

My Understanding of SPARQL, the First Attempt… « OUseful.Info, the blog… linked to this post on November 29, 2010
[…] @cgutteridge’s Searching a SPARQL Endpoint demonstrates a useful ‘get you started’ query for exploring a real datastore (look for […]
