
Fast SPARQL XML Results Parser in Python

For one of our projects we need results from SPARQL endpoints as quickly as possible, with little to no need for validation.

As such, I re-wrote our original SPARQL XML results parser to use Expat, the non-validating (and fast) XML parser.

The results come back as a list of dicts, one per result row, in roughly the same shape as the bindings part of the SPARQL JSON results format.
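
To make that shape concrete, here is a minimal sketch that pulls the bindings out of a SPARQL JSON results document. The sample document and its values are made up for illustration; only the "results" / "bindings" layout comes from the SPARQL JSON results format.

```python
import json

# A small, made-up SPARQL JSON results document (illustrative data only).
sample_json = """
{
  "head": {"vars": ["s", "label"]},
  "results": {
    "bindings": [
      {
        "s": {"type": "uri", "value": "http://example.org/a"},
        "label": {"type": "literal", "value": "Thing A"}
      }
    ]
  }
}
"""

bindings = json.loads(sample_json)["results"]["bindings"]

# Each row maps a variable name to a dict with 'type' and 'value' keys,
# which is the per-row shape the XML parser in this post builds as well.
for row in bindings:
    for var, term in row.items():
        print(var, term["type"], term["value"])
```

Note that the parser in this post stores only 'value' and 'type' for each binding, so the "datatype" and "xml:lang" keys that the JSON format can carry for literals do not appear in its output.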

Example of use:

sp = SparqlParser()
results = sp.Parse(xmlstring)

Code:

import xml.parsers.expat

# Fast Expat based SPARQL stream parser Copyright (c) 2011 Daniel Alexander Smith, University of Southampton
class SparqlParser:

    def __init__(self):
        self.results = []
        self.current = {}
        self.current_name = ""
        self.current_chars = ""
        self.current_type = ""
        self.getting_chars = False
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.char_data

    def start_element(self, name, attrs):
        if name == 'binding':
            # Remember which variable this binding belongs to.
            self.current_name = attrs['name']
        elif name in ('literal', 'bnode', 'uri'):
            # The element name doubles as the value's type; start collecting its text.
            self.current_type = name
            self.getting_chars = True

    def end_element(self, name):
        if name == 'binding':
            # Store the finished value under its variable name.
            self.current[self.current_name] = {'value': self.current_chars, 'type': self.current_type}
            self.current_chars = ""
        elif name in ('literal', 'bnode', 'uri'):
            # Stop collecting text once the value element closes.
            self.getting_chars = False
        elif name == 'result':
            # One complete row of bindings.
            self.results.append(self.current)
            self.current = {}

    def char_data(self, data):
        if self.getting_chars:
            self.current_chars = self.current_chars + data

    def Parse(self, data):
        self.parser.Parse(data, 0)
        return self.results
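
Because Parse passes 0 as Expat's "is final" flag, the underlying parser is never told that the input has ended. That leaves room for feeding a response in pieces, but it also means a truncated document is not reported as an error. The sketch below shows both behaviours with a stripped-down handler that only collects uri values. The collect_uris helper and the sample document are hypothetical, written for this illustration, and are not part of the class above.

```python
import xml.parsers.expat

# Made-up SPARQL XML results document, used only for illustration.
xml_doc = (
    '<?xml version="1.0"?>'
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="s"/></head>'
    '<results>'
    '<result><binding name="s"><uri>http://example.org/a</uri></binding></result>'
    '<result><binding name="s"><uri>http://example.org/b</uri></binding></result>'
    '</results>'
    '</sparql>'
)


def collect_uris(chunks):
    """Feed an iterable of string chunks to Expat and return the text of each <uri>."""
    uris = []
    parts = []                      # character data may arrive in several pieces
    state = {"in_uri": False}

    def start(name, attrs):
        if name == "uri":
            state["in_uri"] = True

    def end(name):
        if name == "uri":
            uris.append("".join(parts))
            del parts[:]
            state["in_uri"] = False

    def chars(data):
        if state["in_uri"]:
            parts.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars

    for chunk in chunks:
        parser.Parse(chunk, False)  # more data may follow
    parser.Parse("", True)          # end of input: truncated XML raises ExpatError here
    return uris


# Feed the document 16 characters at a time, as if it arrived over the network.
chunks = [xml_doc[i:i + 16] for i in range(0, len(xml_doc), 16)]
print(collect_uris(chunks))
```

If the input is cut short, the final Parse call raises xml.parsers.expat.ExpatError. Without that final call the handlers would still have fired for every complete element seen so far, which is why the class above returns usable results even though it never finalises the parse.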

One Comment

  1. Danny wrote:

    Not sure I understand the problem :)
    Won’t the time taken querying/GETting the results be much, much greater than parsing? And if you get results back (HTTP 200 OK), then I can’t see much chance of them being invalid XML.

    Anyhow – nice work!

    Looks to be a fairly generic SAX parser, so presumably it could be used with libs other than Expat, and probably easily ported to other languages keeping the JSONish bindings.

    Friday, January 28, 2011 at 1:25 pm | Permalink

