{"id":668,"date":"2011-03-03T15:47:46","date_gmt":"2011-03-03T15:47:46","guid":{"rendered":"http:\/\/blog.soton.ac.uk\/webteam\/?p=668"},"modified":"2011-03-03T15:47:46","modified_gmt":"2011-03-03T15:47:46","slug":"sparql-query-caching-with-nginx","status":"publish","type":"post","link":"https:\/\/blog.soton.ac.uk\/webteam\/2011\/03\/03\/sparql-query-caching-with-nginx\/","title":{"rendered":"SPARQL Query Caching with Nginx"},"content":{"rendered":"<h2>Motivation<\/h2>\n<p>As we use our linked data and triplestores to drive more of our sites and services, it&#8217;s becoming apparent that a lot of queries to the store will be repeated with the same query parameters each time (especially for &#8220;list&#8221; type pages, which serve as jumping off points to individual resources).<\/p>\n<p>While <a href=\"http:\/\/4store.org\/\">4store<\/a> does some partial query caching, it makes sense to avoid hitting the store entirely for frequent queries to slow-changing data.  Using a separate <a href=\"http:\/\/en.wikipedia.org\/wiki\/Reverse_proxy\">reverse proxy<\/a> for this means that each application\/site can be set to use either the cached or the live store on an individual basis.<\/p>\n<p>SPARQL queries are just HTTP GET requests, so using a tried and tested web cache looked promising.<\/p>\n<p><a href=\"http:\/\/www.varnish-cache.org\/\">Varnish<\/a>, <a href=\"http:\/\/wiki.nginx.org\/\">Nginx<\/a>, <a href=\"http:\/\/www.squid-cache.org\/\">Squid<\/a> and Apache&#8217;s own <a href=\"http:\/\/httpd.apache.org\/docs\/2.2\/mod\/mod_cache.html\">mod_cache<\/a> were all candidates, but Nginx won out in the end, purely for simplicity of setup and configuration (thanks also to <a href=\"http:\/\/users.ecs.soton.ac.uk\/das05r\/\">Dan Smith<\/a> for some advice).<\/p>\n<h2>Setting Up Nginx<\/h2>\n<p>The examples below assume that you&#8217;re running as root (or prefixing them with sudo).  Exact locations may vary by Linux distribution.<\/p>\n<p>1. 
Install Nginx<br \/>\nOn Ubuntu, this was a simple case of running:<\/p>\n<pre>  <code>apt-get install nginx<\/code><\/pre>\n<p>2. Stop the nginx service (if it&#8217;s running)<\/p>\n<pre>  <code>service nginx stop<\/code><\/pre>\n<p>3. Disable the default site<br \/>\nWe only want Nginx to act as a reverse proxy, so remove the symlink to the default site:<\/p>\n<pre>\r\n  <code>cd \/etc\/nginx\/sites-enabled<\/code>\r\n  <code>rm default<\/code>\r\n<\/pre>\n<p>4. Create a cache directory for the store<\/p>\n<pre>  <code>mkdir -p \/var\/cache\/nginx\/ts_cache<\/code><\/pre>\n<p>5. Make sure that the cache dir is writable by the proxy server<br \/>\nOn Ubuntu, the default user for nginx is &#8216;www-data&#8217;. So run:<\/p>\n<pre>  <code>chown www-data:www-data \/var\/cache\/nginx\/ts_cache<\/code><\/pre>\n<p>6. Create a config. file for nginx<\/p>\n<pre>  <code>cd \/etc\/nginx\/sites-available<\/code><\/pre>\n<p>Create and edit a file named something like<\/p>\n<pre>  <code>001-ts_cache<\/code><\/pre>\n<p>The following is a minimal config. you&#8217;ll need to get up and running, though it&#8217;ll need tweaking according to your needs.  
Full details can be found in the <a href=\"http:\/\/wiki.nginx.org\/HttpProxyModule\">proxy module<\/a> section of the <a href=\"http:\/\/wiki.nginx.org\/Main\">Nginx wiki<\/a>.<\/p>\n<pre><code>proxy_cache_path \/var\/cache\/nginx\/ts_cache\r\n                            levels=1:2\r\n                            keys_zone=ts_cache:8m\r\n                            max_size=1000m\r\n                            inactive=10m;\r\n\r\nserver {\r\n        listen   8001 default;\r\n        server_name  localhost;\r\n\r\n        access_log  \/var\/log\/nginx\/localhost.access.log;\r\n\r\n        location \/ {\r\n                proxy_pass      http:\/\/sparql.example.org:8000\/;\r\n                proxy_cache     ts_cache;\r\n                proxy_cache_valid       200 302 10m;\r\n                proxy_cache_valid       404 1m;\r\n                proxy_cache_methods     GET HEAD POST;\r\n        }\r\n}<\/code><\/pre>\n<p>To briefly explain the above:<\/p>\n<p>This sets up a reverse proxy listening on localhost, port 8001.  This acts as a cache for the real SPARQL endpoint, which is at http:\/\/sparql.example.org:8000\/.<\/p>\n<p>It caches any HTTP GET\/HEAD\/POST requests that result in an HTTP 200 or 302 status for 10 minutes, and caches 404s for 1 minute.  Note that Nginx&#8217;s default cache key is derived from the request URL and not the request body, so different queries POSTed to the same URL would share one cache entry unless proxy_cache_key is changed to include the body.<\/p>\n<p>It will cache up to 1000 MB of files, and delete old entries if it exceeds this (max_size=1000m).  It will delete files that haven&#8217;t been accessed for 10 minutes (inactive=10m).<\/p>\n<p>7. Enable the site config. created above<br \/>\nCreate a symlink to the site config. in the sites-enabled directory:<\/p>\n<pre>\r\n  <code>cd ..\/sites-enabled<\/code>\r\n  <code>ln -s ..\/sites-available\/001-ts_cache<\/code>\r\n<\/pre>\n<p>8. Restart Nginx<\/p>\n<pre>  <code>service nginx start<\/code><\/pre>\n<p>If all is well, you should be able to start making requests to your proxy server (on http:\/\/localhost:8001\/ in the example above).  
You can also look at the logs in <code>\/var\/log\/nginx\/<\/code> to keep an eye on things, as well as checking that items are being added to the cache correctly at <code>\/var\/cache\/nginx\/<\/code>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Motivation As we use our linked data and triplestores to drive more of our sites and services, it&#8217;s becoming apparent that a lot of queries to the store will be repeated with the same query parameters each time (especially for &#8220;list&#8221; type pages, which serve as jumping off points to individual resources). While 4store does [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[4226,136,411,4227],"tags":[],"class_list":["post-668","post","type-post","status-publish","format-standard","hentry","category-4store","category-rdf","category-sparql","category-triplestore"],"_links":{"self":[{"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/posts\/668","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/comments?post=668"}],"version-history":[{"count":24,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/posts\/668\/revisions"}],"predecessor-version":[{"id":692,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/posts\/668\/revisions\/692"}],"wp:attachment":[{"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/media?parent=668"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/categories?pos
t=668"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/tags?post=668"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}