For one of our projects we need results from SPARQL endpoints as quickly as possible, with little to no need for validation.
As such, I re-wrote our original SPARQL XML results parser to use Expat, the non-validating (and fast) XML parser.
The result is a list of dicts, one per result row, in roughly the same shape as the bindings part of the SPARQL JSON results format.
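To make the shape concrete, here is a hand-written illustration of what a parsed two-row result might look like. The variable names `s` and `name` and all the values are invented for the example; only the `value` and `type` keys come from the parser below.

```python
# Illustrative only: the variable names "s" and "name" and the values are made up.
example_results = [
    {
        "s": {"value": "http://example.org/person/1", "type": "uri"},
        "name": {"value": "Alice", "type": "literal"},
    },
    {
        "s": {"value": "http://example.org/person/2", "type": "uri"},
        "name": {"value": "Bob", "type": "literal"},
    },
]

# Each row is a plain dict keyed by variable name.
for row in example_results:
    print(row["s"]["value"], row["name"]["value"])
```

Note that the parser only records the value and its type; attributes such as a literal's datatype or language tag, which the JSON format can carry, are not captured.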
Example of use:
sp = SparqlParser()
results = sp.Parse(xmlstring)
Code:
import xml.parsers.expat

# Fast Expat based SPARQL stream parser
# Copyright (c) 2011 Daniel Alexander Smith, University of Southampton

class SparqlParser:
    def __init__(self):
        self.results = []           # completed rows, one dict per <result>
        self.current = {}           # row currently being built
        self.current_name = ""      # variable name of the current <binding>
        self.current_chars = ""     # text collected for the current value
        self.current_type = ""      # 'literal', 'bnode' or 'uri'
        self.getting_chars = False  # True only while inside a value element

        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.char_data

    def start_element(self, name, attrs):
        if name == 'binding':
            self.current_name = attrs['name']
        if name == 'literal':
            self.current_type = 'literal'
            self.getting_chars = True
        if name == 'bnode':
            self.current_type = 'bnode'
            self.getting_chars = True
        if name == 'uri':
            self.current_type = 'uri'
            self.getting_chars = True

    def end_element(self, name):
        if name == 'binding':
            self.current[self.current_name] = {'value': self.current_chars, 'type': self.current_type}
            self.current_chars = ""
        if name == 'literal':
            self.getting_chars = False
        if name == 'bnode':
            self.getting_chars = False
        if name == 'uri':
            self.getting_chars = False
        if name == 'result':
            self.results.append(self.current)
            self.current = {}

    def char_data(self, data):
        # Expat may deliver one text node in several pieces, so accumulate.
        if self.getting_chars:
            self.current_chars = self.current_chars + data

    def Parse(self, data):
        # The final flag is 0, so the parser stays open and Parse() can be
        # called again with further data.
        self.parser.Parse(data, 0)
        return self.results
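Because `Parse` does not set Expat's final flag, the underlying parser stays open for more input, which is what makes feeding a response in pieces possible. Below is a minimal standalone sketch of that idea, not the class above: the `parse_chunks` function, the sample document and the 16-byte chunk size are all assumptions made for illustration. It uses the same handler logic, feeds byte chunks one at a time, and then signals the end of the document explicitly.

```python
import xml.parsers.expat

VALUE_TAGS = ("uri", "literal", "bnode")


def parse_chunks(chunks):
    """Feed an iterable of bytes chunks to Expat and return the result rows."""
    results = []
    state = {"row": {}, "name": "", "type": "", "chars": [], "capture": False}

    def start(name, attrs):
        if name == "binding":
            state["name"] = attrs["name"]
            state["type"] = ""
            state["chars"] = []
        elif name in VALUE_TAGS:
            state["type"] = name
            state["capture"] = True

    def end(name):
        if name in VALUE_TAGS:
            state["capture"] = False
        elif name == "binding":
            state["row"][state["name"]] = {
                "value": "".join(state["chars"]),
                "type": state["type"],
            }
        elif name == "result":
            results.append(state["row"])
            state["row"] = {}

    def chars(data):
        # Text can arrive in several pieces, especially across chunk boundaries.
        if state["capture"]:
            state["chars"].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars

    for chunk in chunks:
        parser.Parse(chunk, False)  # more data may follow
    parser.Parse(b"", True)         # signal end of document
    return results


# Hypothetical sample response, split into arbitrary 16-byte chunks.
SAMPLE = (
    b'<?xml version="1.0"?>'
    b'<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    b'<head><variable name="s"/><variable name="name"/></head>'
    b'<results>'
    b'<result>'
    b'<binding name="s"><uri>http://example.org/person/1</uri></binding>'
    b'<binding name="name"><literal>Alice</literal></binding>'
    b'</result>'
    b'<result>'
    b'<binding name="s"><bnode>b0</bnode></binding>'
    b'<binding name="name"><literal>Bob</literal></binding>'
    b'</result>'
    b'</results>'
    b'</sparql>'
)

rows = parse_chunks(SAMPLE[i:i + 16] for i in range(0, len(SAMPLE), 16))
```

In practice the chunks would come from reading the HTTP response in blocks, so rows can be built while the rest of the response is still arriving.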
One Comment
Not sure I understand the problem
Won’t the time taken querying/GETting the results be much, much greater than the time spent parsing? And if you get results back (HTTP 200 OK), then I can’t see much chance of them being invalid XML.
Anyhow – nice work!
Looks to be a fairly generic SAX parser, so presumably it could be used with libs other than Expat, and probably easily ported to other languages keeping the JSONish bindings.
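As a rough illustration of that point, the same three event handlers map almost directly onto Python's standard `xml.sax` interface. This is only a sketch under assumptions: the names `SparqlSaxHandler` and `parse_with_sax` and the sample document are invented here, and in CPython the default `xml.sax` driver is itself backed by Expat, so this shows portability of the handler logic rather than a different parsing engine.

```python
import xml.sax


class SparqlSaxHandler(xml.sax.ContentHandler):
    """The same event logic, expressed as a standard SAX ContentHandler."""

    VALUE_TAGS = ("uri", "literal", "bnode")

    def __init__(self):
        super().__init__()
        self.results = []
        self.current = {}
        self.current_name = ""
        self.current_type = ""
        self.current_chars = []
        self.getting_chars = False

    def startElement(self, name, attrs):
        if name == "binding":
            self.current_name = attrs["name"]
            self.current_chars = []
        elif name in self.VALUE_TAGS:
            self.current_type = name
            self.getting_chars = True

    def endElement(self, name):
        if name in self.VALUE_TAGS:
            self.getting_chars = False
        elif name == "binding":
            self.current[self.current_name] = {
                "value": "".join(self.current_chars),
                "type": self.current_type,
            }
        elif name == "result":
            self.results.append(self.current)
            self.current = {}

    def characters(self, content):
        if self.getting_chars:
            self.current_chars.append(content)


def parse_with_sax(xml_bytes):
    """Parse a complete SPARQL XML result document given as bytes."""
    handler = SparqlSaxHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.results


# Hypothetical one-row sample document.
SAMPLE_XML = (
    b'<?xml version="1.0"?>'
    b'<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    b'<head><variable name="page"/></head>'
    b'<results><result>'
    b'<binding name="page"><uri>http://example.org/page</uri></binding>'
    b'</result></results>'
    b'</sparql>'
)

rows = parse_with_sax(SAMPLE_XML)
```

Only the method names and the way the handler is registered change; the bindings structure that comes out is the same.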