Varnish, Plone and nginx
Because it's dumb to hit a Plone instance for the same unchanging information time and time again, it's a smart idea to put a caching proxy in front of any Plone instance. Traditionally Squid was used, but Squid is a nasty beast. Some new apps have appeared since I last played with Plone several years ago, Varnish in particular. The only problem is that Varnish can't serve content on its own; it can only proxy to other servers. Fortunately, Varnish is extremely flexible, and with the help of an additional lightweight web server it's dead simple to get the desired result. The same setup should work with any other app server, be it Zope or a burly Java monstrosity.
Now you may be wondering, "Why don't you just use Apache?" Apache is designed to be a web server, serving up its own documents, running its own apps using its own modules. The proxy feature is an afterthought, although combined with mod_rewrite it is quite powerful. Let's face it. If you want to run a PHP app or a mod_perl app, use Apache. Otherwise, use something else.
Rob pointed me to nginx, which will certainly work as a load balancer, but it doesn't have a caching module. There is a third-party module called ncache, but it looks rather scary. Judging by Google results, many Plone installs use Varnish, so I decided to use it as well. As mentioned earlier, it doesn't serve its own content; it only proxies and caches for other servers. I have one particular directory off johnhavard.com for static content that I've used for a couple of years now, mostly taking the place of wisedonkey.sevensages.org, and it would cause great pain to move it elsewhere. I needed another web server to handle that one directory, plus a way to map it into what Varnish thinks is johnhavard.com. As I found nginx to be ridiculously fast, I decided to stay with it for the static content. Now on to making it all work.
The Varnish config generator that's part of the CacheFu project is broken, so I won't link to it. It generates VCL (the Varnish Configuration Language) for some version of Varnish that isn't the current release. I was, however, able to take the generated config and turn it into something useful. The config below is mostly what the CacheFu generator emits. It isn't perfect, and I didn't put much thought into it beyond getting it to work. I'll update this article in the future with a cleaner config.
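/* plone and static server varnish.vcl */
/* Plone (Zope) instance */
backend backend_1 {
    .host = "127.0.0.1";
    .port = "8080";
}
/* nginx, serving the static directories */
backend nginx {
    .host = "127.0.0.1";
    .port = "8186";
}
/* Clients allowed to send PURGE requests */
acl purge {
    "localhost";
    "127.0.0.1";
}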
sub vcl_recv {
    /* Find the right backend for this request */
    if (!req.url) {
        error 404 "Unknown host";
    } elseif (req.http.host ~ "^johnhavard.com") {
        if (req.url ~ "^/(stuff|stuff/.*|blah|blah/.*)") {
            set req.backend = nginx;
        } else {
            set req.backend = backend_1;
            set req.url = regsub(req.url, "(.*)",
                "/VirtualHostBase/http/johnhavard.com:80/johnhavard.com/VirtualHostRoot\1");
        }
    } elseif (req.http.host ~ "^www.johnhavard.com") {
        if (req.url ~ "^/(stuff|stuff/.*|blah|blah/.*)") {
            set req.backend = nginx;
        } else {
            set req.backend = backend_1;
            set req.url = regsub(req.url, "(.*)",
                "/VirtualHostBase/http/www.johnhavard.com:80/johnhavard.com/VirtualHostRoot\1");
        }
    } else {
        error 404 "Unknown host";
    }
    /* Do not cache if request is not GET or HEAD */
    if (req.request != "GET" && req.request != "HEAD") {
        /* Forward to 'lookup' if request is an authorized PURGE request */
        if (req.request == "PURGE") {
            if (!client.ip ~ purge) {
                error 405 "Not allowed.";
            }
            lookup;
        }
        set req.http.connection = "close";
        pipe;
    }
    /* Do not cache if request contains an Expect header */
    if (req.http.Expect) {
        set req.http.connection = "close";
        pipe;
    }
    /* Varnish doesn't do INM requests so pass it through */
    if (req.http.If-None-Match) {
        pass;
    }
    lookup;
}
sub vcl_hit {
    if (req.request == "PURGE") {
        purge_url(req.url);
        error 200 "Purged";
    }
    if (!obj.cacheable) {
        pass;
    }
    deliver;
}
sub vcl_miss {
    /* Varnish doesn't do IMS to backend, so if not in cache just pass it through */
    if (req.http.If-Modified-Since) {
        pass;
    }
    if (req.request == "PURGE") {
        error 404 "Not in cache";
    }
    fetch;
}
sub vcl_fetch {
    if (!obj.cacheable) {
        pass;
    }
    if (obj.http.Set-Cookie) {
        pass;
    }
    /* Do not cache if response contains any 'no cache' tokens */
    if (obj.http.Cache-Control ~ "(private|no-cache|no-store)") {
        pass;
    }
    /* Do not cache if request contains an Authorization header,
     * unless response is 'public'
     */
    if (req.http.Authorization && !obj.http.Cache-Control ~ "public") {
        pass;
    }
    deliver;
}
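The backend selection in vcl_recv is the part most likely to need changing for another site, so here is a rough Python sketch of what it does. This is not part of the setup, just a model of the logic for experimenting with hosts and paths. The names route, HOST_RULES and STATIC_PATHS are made up for illustration; the regular expressions and the /VirtualHostBase/ prefix are copied from the VCL above.

```python
import re

# Same path pattern as the VCL: anything under /stuff or /blah goes to nginx.
STATIC_PATHS = re.compile(r"^/(stuff|stuff/.*|blah|blah/.*)")

# (host pattern from the VCL, host name used in the rewritten URL)
HOST_RULES = [
    (re.compile(r"^johnhavard.com"), "johnhavard.com"),
    (re.compile(r"^www.johnhavard.com"), "www.johnhavard.com"),
]


def route(host, url):
    """Return (backend, url) roughly the way vcl_recv picks them.

    A backend of None stands for the VCL's 404 "Unknown host" answer.
    """
    for pattern, name in HOST_RULES:
        if not pattern.match(host):
            continue
        if STATIC_PATHS.match(url):
            # Static content: nginx gets the URL untouched.
            return "nginx", url
        # Plone gets the URL prefixed with the virtual host path,
        # mirroring the regsub() call in the VCL.
        prefix = "/VirtualHostBase/http/%s:80/johnhavard.com/VirtualHostRoot" % name
        return "backend_1", prefix + url
    return None, url


if __name__ == "__main__":
    print(route("johnhavard.com", "/stuff/notes.txt"))
    print(route("www.johnhavard.com", "/news"))
```

Note that the path pattern is only anchored at the start, so a path such as /stuffing would also be sent to nginx. That is how the generated VCL behaves too.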
Next up is nginx.conf, which is fairly straightforward.
# nginx.conf
worker_processes 1;
events {
    use kqueue;
    worker_connections 4096;
}
http {
    # port_in_redirect off IS VERY IMPORTANT!!
    port_in_redirect off;
    client_max_body_size 32m;
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 300;
    server {
        listen 8186;
        server_name www.johnhavard.com johnhavard.com;
        access_log logs/jhdotcom.access;
        location / {
            root /www/jhdotcom;
            index index.html index.htm;
            autoindex on;
        }
    }
}
There was one problem with my initial config. If you went to /blah, or some directory under it, without the trailing slash, nginx would answer with a redirect that included port 8186, which isn't very amusing when you have it bound to localhost! The port_in_redirect off directive is very important. It strips the port number out of the redirect, which makes nginx play nicely with a front-end proxy.
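If you want to check for this problem on your own setup, one way is to look at the Location header of the redirect (for example from curl -I) and see whether the backend port shows up in it. The helper below is a small sketch of that check. The function name leaks_backend_port and the example URLs are made up for illustration; 8186 is the nginx port from the config above.

```python
from urllib.parse import urlsplit


def leaks_backend_port(location, backend_port=8186):
    """Return True if a redirect's Location header exposes the backend port."""
    # urlsplit(...).port is None when the URL carries no explicit port.
    return urlsplit(location).port == backend_port


if __name__ == "__main__":
    # The kind of redirect described above, then what you want to see instead.
    print(leaks_backend_port("http://johnhavard.com:8186/blah/"))
    print(leaks_backend_port("http://johnhavard.com/blah/"))
```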
With this, the configuration is complete. You have a fast caching proxy in front of your Plone install, and a fast web server for everything else mapped into the URL space for your domain. As mentioned earlier, this config isn't perfect but it will get you running. I'll try to clean it up in the future and keep this page updated.
