

SPARQL Query Caching with Nginx

Motivation

As we use our linked data and triplestores to drive more of our sites and services, it’s becoming apparent that many queries to the store are repeated with exactly the same parameters (especially for “list” type pages, which serve as jumping-off points to individual resources).

While 4store does some partial query caching, it makes sense to avoid hitting the store entirely for frequent queries against slowly changing data. Putting the cache in a separate reverse proxy means each application or site can be pointed at either the cached or the live store individually.

SPARQL queries are typically sent as plain HTTP GET requests, with the query URL-encoded in the query string, so using a tried and tested web cache looked promising.
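To make that concrete: a GET query is just the endpoint URL plus a query parameter holding the URL-encoded SPARQL text, so the same query text always maps to the same URL, which is exactly what an HTTP cache keys on. The Bash sketch below shows the encoding; urlencode and sparql_get_url are hypothetical helper names, and the /sparql/ path in the usage line is an assumption (adjust it for your store).

```shell
#!/usr/bin/env bash
# Sketch only: build the GET URL a SPARQL client would send.

# Percent-encode a string byte by byte, leaving RFC 3986 unreserved
# characters (A-Z a-z 0-9 . ~ _ -) untouched.
urlencode() {
  local LC_ALL=C            # treat the string as bytes, not multibyte characters
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'c" yields the byte's numeric value
    esac
  done
  printf '%s\n' "$out"
}

# Append the encoded query as the "query" parameter used by the SPARQL protocol.
sparql_get_url() {
  printf '%s?query=%s\n' "$1" "$(urlencode "$2")"
}
```

For example, sparql_get_url http://localhost:8001/sparql/ 'ASK {}' prints http://localhost:8001/sparql/?query=ASK%20%7B%7D. Queries that differ only in whitespace or variable names encode to different URLs, so they are cached separately.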

Varnish, Nginx, Squid and Apache’s own mod_cache were all candidates, but Nginx won out in the end, purely for simplicity of setup and configuration (thanks also to Dan Smith for some advice).

Setting Up Nginx

The examples below assume that you’re running as root (or prefixing each command with sudo). Exact file locations may vary by Linux distribution.

1. Install Nginx
On Ubuntu, this was a simple case of running:

  apt-get install nginx

2. Stop the nginx service (if it’s running)

  service nginx stop

3. Disable the default site
We only want Nginx to act as a reverse proxy, so remove the symlink to the default site:

  cd /etc/nginx/sites-enabled
  rm default

4. Create a cache directory for the store

  mkdir -p /var/cache/nginx/ts_cache

5. Make sure that the cache dir is writable by the proxy server
On Ubuntu, the default user for nginx is ‘www-data’. So run:

  chown www-data:www-data /var/cache/nginx/ts_cache

6. Create a config file for nginx

  cd /etc/nginx/sites-available

Create and edit a file named something like

  001-ts_cache

The following is a minimal config to get you up and running, though it will need tweaking to suit your needs. Full details can be found in the proxy module section of the Nginx wiki.

proxy_cache_path /var/cache/nginx/ts_cache
                            levels=1:2
                            keys_zone=ts_cache:8m
                            max_size=1000m
                            inactive=10m;

server {
        listen   8001 default;
        server_name  localhost;

        access_log  /var/log/nginx/localhost.access.log;

        location / {
                proxy_pass      http://sparql.example.org:8000/;
                proxy_cache     ts_cache;
                proxy_cache_valid       200 302 10m;
                proxy_cache_valid       404 1m;
                proxy_cache_methods     GET HEAD POST;
        }
}

To briefly explain the above:

This sets up a reverse proxy listening on port 8001 (on all interfaces, not just localhost; use listen 127.0.0.1:8001 if you want to restrict it). It acts as a cache for the real SPARQL endpoint, which is at http://sparql.example.org:8000/.

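It caches responses to HTTP GET, HEAD and POST requests that return an HTTP 200 or 302 status for 10 minutes, and caches 404s for 1 minute. Be aware that the default cache key ($scheme$proxy_host$request_uri) does not include the request body, so different queries POSTed to the same URL would share one cache entry; if your clients send queries by POST, it is safer to remove POST from proxy_cache_methods.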

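It will hold up to 1000 MB of cached responses (max_size=1000m), removing the least recently used entries when that limit is exceeded, and it removes entries that haven’t been accessed for 10 minutes (inactive=10m), whether or not they are still valid. keys_zone=ts_cache:8m names the shared memory zone (8 MB) that holds the cache keys, and levels=1:2 spreads the cache files over a two-level directory tree.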
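To see how levels=1:2 lays the files out: nginx names each cache file after the MD5 hash of the request's cache key, then takes the first directory name from the last hex digit of that hash and the second from the two digits before it. The Bash sketch below predicts that path; ts_cache_file is a hypothetical helper, it assumes GNU md5sum, and it assumes the default proxy_cache_key of $scheme$proxy_host$request_uri, since the config above doesn't set one.

```shell
#!/usr/bin/env bash
# Sketch: predict the file nginx uses to cache a response.
# Assumes GNU coreutils md5sum (on BSD/macOS, "md5 -q" prints the bare hash).

# $1 = cache directory (the proxy_cache_path argument)
# $2 = cache key; with the default proxy_cache_key this is
#      $scheme$proxy_host$request_uri concatenated with no separators
ts_cache_file() {
  local cache_dir="$1" key="$2" hash
  hash=$(printf '%s' "$key" | md5sum | cut -d' ' -f1)
  # levels=1:2 -> first directory is the last hex digit of the hash,
  # second directory is the two digits just before it.
  printf '%s/%s/%s/%s\n' "$cache_dir" "${hash: -1}" "${hash: -3:2}" "$hash"
}
```

With the config above, a request for http://localhost:8001/sparql/?query=ASK%20%7B%7D would have the key httpsparql.example.org:8000/sparql/?query=ASK%20%7B%7D ($proxy_host is the host and port from proxy_pass), so you could look for its file with:

  ts_cache_file /var/cache/nginx/ts_cache 'httpsparql.example.org:8000/sparql/?query=ASK%20%7B%7D'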

7. Enable the site config created above
Create a symlink to the site config in the sites-enabled directory:

  cd ../sites-enabled
  ln -s ../sites-available/001-ts_cache

8. Start Nginx

  service nginx start

If all is well, you should be able to start making requests to your proxy server (on http://localhost:8001/ in the example above). You can also keep an eye on things in the logs in /var/log/nginx/, and check that items are being added to the cache in /var/cache/nginx/ts_cache/.
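One rough way to confirm the cache is working from the command line is to send the same query twice and watch the cache directory. The Bash sketch below assumes curl is installed and that the store's query path is /sparql/ (an assumption; adjust it); time_query and count_cache_entries are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check that the proxy is caching SPARQL responses.

# Send one query through the proxy and print "<status> <seconds>".
# -G moves the --data-urlencode data into the URL's query string (a GET).
time_query() {
  local endpoint="$1" query="$2"
  curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
       -G --data-urlencode "query=$query" "$endpoint"
}

# Count the files nginx has written under the cache directory.
count_cache_entries() {
  find "$1" -type f | wc -l
}
```

Run nginx -t first to check that the configuration parses, then something like:

  time_query http://localhost:8001/sparql/ 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
  time_query http://localhost:8001/sparql/ 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
  count_cache_entries /var/cache/nginx/ts_cache

If the backend's response is cacheable, the repeat query should normally come back faster, and the entry count should go up after the first request.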

Posted in 4store, RDF, SPARQL, Triplestore.


2 Responses


  1. g-hennux says

    Note: At least for Joseki as a backend, you should add

    proxy_ignore_headers Cache-Control

    to the config, as Joseki (at least within Jetty) has “Cache-Control: no-cache” set by default, which will make the cache superfluous.

Continuing the Discussion

  1. SPARQL Nginx HOWTO | Nginx Lighttpd Tutorial linked to this post on June 1, 2011

    […] SPARQL Query Caching with Nginx […]


