Motivation
As we use our linked data and triplestores to drive more of our sites and services, it’s becoming apparent that many queries to the store are repeated with exactly the same parameters (especially for “list” type pages, which serve as jumping-off points to individual resources).
While 4store does some partial query caching, it makes sense to avoid hitting the store entirely for frequent queries against slow-changing data. Using a separate reverse proxy for this means that each application or site can be pointed at either the cached or the live store on an individual basis.
SPARQL queries are just HTTP GET requests, so using a tried and tested web cache looked promising.
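For example, a typical SPARQL SELECT over the protocol is just a GET with the query passed as a URL-encoded parameter, something like the following (the endpoint URL and query here are only illustrative):
curl -G 'http://sparql.example.org:8000/sparql/' \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 10'
Because it’s a plain GET, the full URL (endpoint plus encoded query string) can serve as the cache key.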
Varnish, Nginx, Squid and Apache’s own mod_cache all looked like viable options, but Nginx won out in the end, purely for its simplicity of setup and configuration (thanks also to Dan Smith for some advice).
Setting Up Nginx
The examples below assume that you’re running as root (or prefixing the commands with sudo). Exact locations may vary by Linux distribution.
1. Install Nginx
On Ubuntu, this was a simple case of running:
apt-get install nginx
2. Stop the nginx service (if it’s running)
service nginx stop
3. Disable the default site
We only want Nginx to act as a reverse proxy, so remove the symlink to the default site:
cd /etc/nginx/sites-enabled
rm default
4. Create a cache directory for the store
mkdir -p /var/cache/nginx/ts_cache
5. Make sure that the cache dir is writable by the proxy server
On Ubuntu, the default user for nginx is ‘www-data’. So run:
chown www-data:www-data /var/cache/nginx/ts_cache
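If you’re not sure which user your Nginx workers run as, the user directive near the top of the main config should tell you (the path shown is the Ubuntu default):
grep '^user' /etc/nginx/nginx.conf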
6. Create a config. file for nginx
cd /etc/nginx/sites-available
Create and edit a file named something like
001-ts_cache
The following is a minimal config. you’ll need to get up and running, though it’ll need tweaking according to your needs. Full details can be found in the proxy module section of the Nginx wiki.
proxy_cache_path /var/cache/nginx/ts_cache
                 levels=1:2
                 keys_zone=ts_cache:8m
                 max_size=1000m
                 inactive=10m;

server {
    listen 8001 default;
    server_name localhost;

    access_log /var/log/nginx/localhost.access.log;

    location / {
        proxy_pass http://sparql.example.org:8000/;
        proxy_cache ts_cache;
        proxy_cache_valid 200 302 10m;
        proxy_cache_valid 404 1m;
        proxy_cache_methods GET HEAD POST;
    }
}
To briefly explain the above:
This sets up a reverse proxy listening on localhost, port 8001. This acts as a cache for the real SPARQL endpoint, which is at http://sparql.example.org:8000/.
It caches any HTTP GET/HEAD/POST requests that result in an HTTP 200 or 302 status for 10 minutes, and caches 404s for 1 minute.
It will cache up to 1000MB of files, and delete old entries if it exceeds this (max_size=1000m). It will also delete cached files that haven’t been accessed for 10 minutes (inactive=10m).
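Optionally, if your Nginx version exposes the $upstream_cache_status variable, you can surface cache hits and misses as a response header by adding a line like the one below inside the location block (the header name is just a convention, not something Nginx requires):
add_header X-Cache-Status $upstream_cache_status;
A “HIT” value in that header then confirms a response came from the cache rather than the store.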
7. Enable the site config. created above
Create a symlink to the site config. in the sites-enabled directory:
cd ../sites-enabled
ln -s ../sites-available/001-ts_cache
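Before bringing the service back up, you can ask Nginx to sanity-check the configuration:
nginx -t
If there’s a problem with the config., this will report it before you attempt to start the service.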
8. Restart Nginx
service nginx start
If all is well, you should be able to start making requests to your proxy server (on http://localhost:8001/ in the example above). You can also keep an eye on things via the logs in /var/log/nginx/, and check that items are being added to the cache correctly under /var/cache/nginx/.
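As a quick sanity check, you can issue the same request twice against the proxy and time it; the second response should be noticeably faster, coming straight from the cache. The endpoint path and query below are only illustrative and will depend on your backend:
curl -sG 'http://localhost:8001/sparql/' \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 10' \
     -o /dev/null -w '%{time_total}\n'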
Note: At least with Joseki as a backend, you should also add
proxy_ignore_headers Cache-Control
to the config, as Joseki (at least when running within Jetty) sends “Cache-Control: no-cache” by default, which stops Nginx from caching the responses and makes the whole cache superfluous.
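With that added, the location block from the config. above would end up looking something like this:
location / {
    proxy_pass http://sparql.example.org:8000/;
    proxy_cache ts_cache;
    proxy_cache_valid 200 302 10m;
    proxy_cache_valid 404 1m;
    proxy_cache_methods GET HEAD POST;
    # don't let the backend's Cache-Control: no-cache disable caching
    proxy_ignore_headers Cache-Control;
}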