Example VCL template

Varnish’s built-in VCL is very conservative and focuses on not caching stateful and personalized content. By following the HTTP standard for caching rules, Varnish is safe by default. Unfortunately, in many real-life situations where backend servers do not send good caching headers, this will result in a low hit rate.

In this tutorial we’ll present a collection of customizable VCL examples for Varnish. We’ll focus on the individual VCL template features and in the end we’ll bring it all together into a single VCL file.

1. Backend definition

The first step is to define the backend. This is the origin server where Varnish fetches the content from.

backend server1 {
    .host = "127.0.0.1";
    .port = "8080";
    .max_connections = 100;
    .probe = {
        .request =
            "HEAD / HTTP/1.1"
            "Host: localhost"
            "Connection: close"
            "User-Agent: Varnish Health Probe";
        .interval  = 10s;
        .timeout   = 5s;
        .window    = 5;
        .threshold = 3;
    }
    .connect_timeout        = 5s;
    .first_byte_timeout     = 90s;
    .between_bytes_timeout  = 2s;
}

Backend connection information

Every backend definition has a name. In the example above this is server1. The .host and .port attributes contain the address and port number of your backend.

In the example above, the origin, which is probably an Apache or Nginx server, is hosted on the same machine as Varnish. Hence the IP address 127.0.0.1, which corresponds to localhost. Since HTTP clients connect to port 80, which is where Varnish listens, the web server has to be set up to listen on a different port. In the example above, port 8080 is used, and the backend needs to be configured to listen on that same port.

The .max_connections attribute will limit the number of simultaneous backend connections that Varnish establishes. If this limit is exceeded, backend requests will start failing.

Health probe

By defining a health probe, Varnish is made aware of the current health of the backend. The backend is probed at regular intervals and is considered healthy if the backend responds correctly often enough.

Backend health is used first and foremost when load balancing through the vmod_directors module. In these cases, a director load balances between several backends, each representing one server. Backends that are considered sick are not included in the load-balancing rotation.
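
As a minimal sketch of such a setup, the following VCL assumes two origin servers, server1 and server2, whose addresses are placeholders. It uses the round-robin director from vmod_directors, and the director name vdir is arbitrary:

```vcl
vcl 4.1;

import directors;

# Two hypothetical origin servers; replace the addresses with your own.
backend server1 {
    .host = "192.0.2.10";
    .port = "8080";
    .probe = {
        .url       = "/";
        .interval  = 10s;
        .timeout   = 5s;
        .window    = 5;
        .threshold = 3;
    }
}

backend server2 {
    .host = "192.0.2.11";
    .port = "8080";
    .probe = {
        .url       = "/";
        .interval  = 10s;
        .timeout   = 5s;
        .window    = 5;
        .threshold = 3;
    }
}

sub vcl_init {
    # Create the director and register both backends.
    new vdir = directors.round_robin();
    vdir.add_backend(server1);
    vdir.add_backend(server2);
}

sub vcl_recv {
    # Let the director pick a healthy backend for this request.
    set req.backend_hint = vdir.backend();
}
```

When one of the two backends is marked sick by its probe, the director only hands out the remaining healthy backend.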

In VCL, the function std.healthy(backend) can be used to check if a given backend is healthy.

In our sample VCL we send an HTTP HEAD request to the backend. This is done using the .request attribute, which allows us to send a raw HTTP request. This can include any valid HTTP request header.

The health probe is configured to send a request to the backend every ten seconds. This is done through the .interval attribute.

If the backend doesn’t respond in five seconds, the poll is considered unsuccessful. This is configured through the .timeout attribute.

If three out of five polling attempts fail, the backend is considered sick. If the backend comes online again, it is considered healthy if three of the last five polling attempts succeed.

The polling window is configured through the .window attribute, and the .threshold attribute defines how many polls within that window must succeed for the backend to be considered healthy.
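
The same interval, timeout, window and threshold settings can also be used in a standalone probe that is referenced by name. The sketch below assumes a hypothetical /health endpoint that returns a 200 status code when the application is fine, and uses the .url and .expected_response attributes instead of a raw .request:

```vcl
vcl 4.1;

# Named probe; "/health" is an assumed endpoint, not part of the template.
probe healthcheck {
    .url               = "/health";
    .expected_response = 200;
    .interval          = 10s;
    .timeout           = 5s;
    .window            = 5;
    .threshold         = 3;
}

backend server1 {
    .host  = "127.0.0.1";
    .port  = "8080";
    # Reference the probe by name so it can be shared between backends.
    .probe = healthcheck;
}
```

A named probe is mostly a matter of convenience: when several backends share the same health check, the settings only have to be maintained in one place.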

Timeouts

Varnish will wait for the backend to respond, but does impose timeouts to avoid excessive waiting times.

In the example above the .connect_timeout attribute is set to five seconds, which means Varnish will wait for a maximum of five seconds while attempting to connect to the backend.

The .first_byte_timeout attribute, which is set to 90 seconds in the example, refers to the amount of time Varnish waits after successfully opening a connection before the first byte is received.

And finally the .between_bytes_timeout specifies the maximum time to wait between bytes when reading the response. In this example it is set to two seconds.
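
These backend-wide timeouts can also be overridden for individual backend requests. As a sketch, assuming a hypothetical /reports/ section of the site that is known to respond slowly, the first byte timeout could be raised for just those URLs:

```vcl
sub vcl_backend_fetch {
    # "/reports/" is a made-up path used for illustration only.
    if (bereq.url ~ "^/reports/") {
        # Allow up to five minutes before the first byte arrives.
        set bereq.first_byte_timeout = 300s;
    }
}
```

All other requests keep using the values from the backend definition.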

2. Purging ACL

One of the features of the sample VCL template is to allow HTTP-based content purging through the PURGE request method. It’s important to avoid unauthorized access to the purge logic.

Therefore, an ACL (Access Control List) should be defined that lists the IP addresses, hostnames and subnets that are allowed to purge content.

Here’s the VCL code that defines an ACL called purge and only allows access from the local server:

acl purge {
    "localhost";
    "127.0.0.1";
    "::1";
}
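
If purge requests also come from other machines, the ACL can be extended with subnets and exclusions. The following variant is only a sketch: the 192.168.1.0/24 subnet and the excluded 192.168.1.23 address are made-up values that you would replace with your own:

```vcl
acl purge {
    "localhost";
    "127.0.0.1";
    "::1";
    # Hypothetical internal subnet that is allowed to purge.
    "192.168.1.0"/24;
    # Hypothetical host inside that subnet that is explicitly excluded.
    ! "192.168.1.23";
}
```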

3. Host header normalization

If your Varnish server listens for incoming connections on multiple ports or port numbers other than 80, the port number may end up in the Host header.

For requests to http://example.com:80, the port number will probably be stripped off by your web client, but for other port numbers that won’t be the case.

Because Varnish uses the Host header as part of the object hash, it makes sense to strip off the port number in VCL. Otherwise multiple hashes will be created for what ultimately is the same content. This will reduce the cache capacity and lower the hit rate.

The following VCL code removes the port number from the Host header to avoid this unwanted duplication:

sub vcl_recv {
    # Anchor the pattern so only a trailing port number is removed
    set req.http.Host = regsub(req.http.Host, ":[0-9]+$", "");
}
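
Letter case is another potential source of duplicate hashes: host names are case-insensitive, but the value that ends up in the hash is not necessarily normalized. Assuming your Varnish version or your VCL flow doesn't already lowercase the Host header for you, a sketch using std.tolower() could look like this:

```vcl
import std;

sub vcl_recv {
    # Treat Example.COM and example.com as the same host.
    set req.http.Host = std.tolower(req.http.Host);
}
```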

4. Httpoxy mitigation

Httpoxy is a set of vulnerabilities that affect certain applications through the Proxy request header.

The general advice is to remove this header to mitigate the impact of httpoxy.

Here’s the VCL code to do this:

sub vcl_recv {
    unset req.http.proxy;
}

5. Sorting query string parameters

Varnish uses both the URL and the Host header to compose the hash that identifies objects in cache. As you’ve seen earlier, Host header normalization is required to avoid cache duplication. However, this also applies to the URL.

Query string parameters, for example id=1 and gid=5 in /?id=1&gid=5, are also part of the URL. Although their order doesn’t impact the response, it does impact the hash.

That’s why we encourage you to alphabetically sort the query string parameters using the following VCL code:

import std;

sub vcl_recv {
    set req.url = std.querysort(req.url);
}
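
To check the effect of the sorting on your own traffic, the URL can be logged before and after the call. This is only a debugging sketch and the log messages are arbitrary; std.log() writes them to the Varnish Shared Memory Log under the VCL_Log tag:

```vcl
import std;

sub vcl_recv {
    # Log the URL as it was received.
    std.log("url before sorting: " + req.url);
    set req.url = std.querysort(req.url);
    # Log the URL that will be used for the cache lookup.
    std.log("url after sorting: " + req.url);
}
```

Requests for /?id=1&gid=5 and /?gid=5&id=1 should then show the same URL in the second log line, which means they share a single object in cache.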

6. Stripping off a trailing question mark

Here’s another attempt at avoiding cache duplication: removing a trailing question mark.

If your URL is /?, the question mark indicates that query string parameters will follow. But if there are none, the trailing question mark serves no purpose.

The following VCL code will strip off the trailing question mark:

sub vcl_recv {
    set req.url = regsub(req.url, "\?$", "");
}

7. Removing Google Analytics URL parameters

A lot of websites use Google Analytics to analyze their traffic. Specific query string parameters are added to the URL to track the user journey.

Although they are beneficial to those who analyze incoming traffic, they are responsible for cache duplication. Removing these parameters has no detrimental effect on the tracking: Google Analytics does not rely on server-side code and only uses client-side JavaScript.

Here’s how to remove the Google Analytics query string parameters:

sub vcl_recv {
    if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=") {
        set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "");
        set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "?");
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "\?$", "");
    }
}

8. Allow purging

Earlier on, we defined an ACL that contains the IP addresses, hostnames and subnets of the clients that are allowed to purge.

Now it’s time to match the client IP address to this ACL, prevent unauthorized clients from purging, and execute return(purge) for authorized clients.

Here’s the VCL code:

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return (synth(405, client.ip + " is not allowed to send PURGE requests."));
        }
        return (purge);
    }
}

The !client.ip ~ purge statement will evaluate to true if the client IP address doesn’t match an entry from the purge ACL that we defined earlier. If that is the case, a synthetic response is returned with a 405 Method Not Allowed status.

If the request method equals PURGE and that client IP address matches the purge ACL, return(purge) is used to purge the requested object from the cache.

9. Dealing with websockets

Although websockets use the same TCP connection as the incoming HTTP request, websockets do not use the HTTP protocol to communicate. Since Varnish only supports HTTP, sending the request directly to the backend using the normal return(pass) procedure will not suffice.

Instead a return(pipe) is required: this opens up the TCP connection to the backend and sends the upgrade request. The websocket data that is returned will not be interpreted by Varnish, and no attempt is made to treat the response as HTTP.

This is the VCL code that is required to make this happen:

sub vcl_recv {
    if (req.http.Upgrade ~ "(?i)websocket") {
        return (pipe);
    }
}

sub vcl_pipe {
    if (req.http.upgrade) {
        set bereq.http.upgrade = req.http.upgrade;
    }
    return (pipe);
}

10. Piping other non-HTTP content

Although the previous VCL snippet was tailored around websockets, there are other situations that warrant the use of return(pipe).

When the request method is not one of the following, we can conclude that we’re not dealing with a valid HTTP request:

  • GET
  • HEAD
  • PUT
  • POST
  • TRACE
  • OPTIONS
  • PATCH
  • DELETE

Here’s the VCL code to pipe content to the backend when the request method doesn’t match our expectations:

sub vcl_recv {
    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "PATCH" &&
        req.method != "DELETE") {
        return (pipe);
    }
}

If you know that your site only supports a subset of the above, you should consider synthesizing a response in Varnish indicating this:

sub vcl_recv {
    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "POST" &&
        req.method != "OPTIONS") {
        return (synth(405, "Method Not Allowed"));
    }
}

sub vcl_synth {
    if (resp.status == 405) {
        set resp.http.Allow = "GET, HEAD, POST, OPTIONS";
        set resp.body = "Method not allowed";
        return (deliver);
    }
}

11. Only cache GET and HEAD requests

Now that we’re certain that we’re dealing with actual HTTP requests, we can decide to bypass the cache for non-cacheable request methods.

Under normal circumstances, only GET and HEAD requests are cached. Other request methods, such as POST or DELETE, are designed to explicitly change the requested resource.

Why would you cache a resource that will be changed once it is called? That’s why the following VCL example will bypass the cache when a request method other than GET or HEAD is used:

sub vcl_recv {
    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }
}

12. Remove tracking cookies

Varnish’s built-in VCL behavior bypasses the cache when a Cookie request header is detected. As mentioned at the start of this tutorial: Varnish is very conservative when it comes to its standard behavior.

Because cookies imply a level of personalization, caching private data is tricky without knowing the purpose of these cookies.

However, tracking cookies aren’t really an issue: they are handled by the browser and can safely be removed by Varnish to ensure a better hit rate.

Google Analytics

The following VCL code will remove tracking cookies set by Google Analytics:

sub vcl_recv {
    set req.http.Cookie = regsuball(req.http.Cookie, "(__utm|_ga|_opt)[a-z_]*=[^;]+(; )?", "");
}

HubSpot

The following VCL code will remove tracking cookies set by HubSpot:

sub vcl_recv {
    set req.http.Cookie = regsuball(req.http.Cookie, "(__)?hs[a-z_\-]+=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "hubspotutk=[^;]+(; )?", "");
}

Hotjar

The following VCL code will remove tracking cookies set by Hotjar:

sub vcl_recv {
    set req.http.Cookie = regsuball(req.http.Cookie, "_hj[a-zA-Z]+=[^;]+(; )?", "");
}

Google advertising products

The following VCL code will remove tracking cookies set by Google advertising products:

sub vcl_recv {
    set req.http.Cookie = regsuball(req.http.Cookie, "(NID|DSID|__gads|GED_PLAYLIST_ACTIVITY|ACLK_DATA|ANID|AID|IDE|TAID|_gcl_[a-z]*|FLC|RUL|PAIDCONTENT|1P_JAR|Conversion|VISITOR_INFO1[a-z_]*)=[^;]+(; )?", "");
}

Other tracking cookies

The list of tracking cookies that was featured in this tutorial only covers some of the big analytics and advertising tools. You might use other tools, which use other cookies.

If your website features other tracking cookies, please strip them off using the code below:

sub vcl_recv {
    set req.http.Cookie = regsuball(req.http.Cookie, "cookiename=[^;]+(; )?", "");
}

Replace cookiename with the name of the tracking cookie you want to remove, or a regular expression pattern that matches multiple names.
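
As an illustration of the pattern-based variant, the sketch below removes two cookies in a single call. The names _fbp and _fbc are only used as examples here; substitute the names of the tracking cookies that your own tools set:

```vcl
sub vcl_recv {
    # The alternation matches either cookie name; extend it with | as needed.
    set req.http.Cookie = regsuball(req.http.Cookie, "(_fbp|_fbc)=[^;]+(; )?", "");
}
```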

Remove semicolon prefix

After you have stripped off all tracking cookies, you might be left with a ; prefix as the remaining value of the Cookie header. If that is the case, it needs to be removed as well:

sub vcl_recv {
    set req.http.Cookie = regsuball(req.http.Cookie, "^;\s*", "");
}

Remove empty cookies

When stripping off tracking cookies results in an empty string, or a collection of whitespace characters, you can safely remove the entire Cookie header as illustrated below:

sub vcl_recv {
    if (req.http.cookie ~ "^\s*$") {
        unset req.http.cookie;
    }
}

Being able to remove the entire Cookie header means that there are no functional cookies left and that the entire page is cacheable.

13. Setting the X-Forwarded-Proto header

The X-Forwarded-Proto header is used to transmit the request protocol that was used by the client. This is useful information to Varnish because the open-source version of Varnish doesn’t support native TLS.

The lack of native TLS support is circumvented by the use of a TLS proxy in front of Varnish. The TLS proxy that handles the HTTPS connection should add an X-Forwarded-Proto header to indicate whether or not HTTPS was used as the request protocol.

If that header wasn’t set, the VCL code below will set it to either https or http depending on the protocol that was used for the request:

import std;

sub vcl_recv {
    if (!req.http.X-Forwarded-Proto) {
        if(std.port(server.ip) == 443 || std.port(server.ip) == 8443) {
            set req.http.X-Forwarded-Proto = "https";
        } else {
            set req.http.X-Forwarded-Proto = "http";
        }
    }
}

sub vcl_hash {
    hash_data(req.http.X-Forwarded-Proto);
}

By default, Varnish has no awareness of the requested protocol and will serve the same object for HTTP and HTTPS requests. This might lead to mixed content or even a redirect loop when an HTTP to HTTPS redirection ends up getting cached.

That’s why it makes sense to add the value of the X-Forwarded-Proto header to the hash when requesting the object from cache. This will ensure that each page has an HTTP version and an HTTPS version stored in cache.

14. Caching static content

By now you should be aware that Varnish doesn’t cache when a Cookie request header is present unless you write the appropriate VCL.

But for static content, such as images, CSS files, JavaScript files and other similar resources, we can force them to be cached. This also means stripping off any potential cookies.

The following VCL snippet will force caching for requests matching the following file extensions:

  • 7z
  • avi
  • bmp
  • bz2
  • css
  • csv
  • doc
  • docx
  • eot
  • flac
  • flv
  • gif
  • gz
  • ico
  • jpeg
  • jpg
  • js
  • less
  • mka
  • mkv
  • mov
  • mp3
  • mp4
  • mpeg
  • mpg
  • odt
  • ogg
  • ogm
  • opus
  • otf
  • pdf
  • png
  • ppt
  • pptx
  • rar
  • rtf
  • svg
  • svgz
  • swf
  • tar
  • tbz
  • tgz
  • ttf
  • txt
  • txz
  • wav
  • webm
  • webp
  • woff
  • woff2
  • xls
  • xlsx
  • xml
  • xz
  • zip

sub vcl_recv {
    if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset req.http.Cookie;
        unset req.http.Authorization;
        # Only keep the following if VCL handling is complete
        return(hash);
    }
}

sub vcl_backend_response {
    if (bereq.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }
}

This means that the Cookie and Authorization request headers are stripped off and the request is immediately looked up in cache. In case of a cache miss, a potential Set-Cookie response header is stripped off as well and the Time-To-Live of the object is set to a day.

15. ESI support

When your application supports Edge Side Includes (ESI), we need to write some VCL code to enable ESI parsing.

The idea is that we announce ESI support through the Surrogate-Capability: key=ESI/1.0 request header. When the application notices this, it should send a corresponding Surrogate-Control header that also contains ESI/1.0. When Varnish sees this Surrogate-Control response header, it will enable ESI parsing as illustrated below:

sub vcl_recv {
    set req.http.Surrogate-Capability = "key=ESI/1.0";
}

sub vcl_backend_response {
    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }
}

16. Setting grace mode

When content has expired from the cache, Varnish needs to revalidate that content with the origin server. This usually involves a backend fetch, and the client that requested the content has to wait until the origin responds. This could potentially be detrimental to the user experience.

Thanks to grace mode, Varnish can asynchronously revalidate content while serving the stale version to the client.

The standard grace value is set to ten seconds, which means a stale object can still be served, while it is asynchronously revalidated, until it is ten seconds past the expiration of its Time-To-Live.

If you want to change this value in your VCL file, you can use the following code:

import std;

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    }
}

sub vcl_backend_response {
    set beresp.grace = 6h;
}

This example will set the grace value to six hours, but will only use ten seconds of grace at request time when the backend is healthy.

If the origin server is down, Varnish will still serve an outdated version of the object for another six hours, which can potentially be a lifesaver. When the origin recovers, the grace value of ten seconds will be used again.

17. Putting it all together

Here’s the all-in-one VCL file that combines the previous snippets:

vcl 4.1;

import std;

backend server1 {
    .host = "127.0.0.1";
    .port = "8080";
    .max_connections = 100;
    .probe = {
        .request =
            "HEAD / HTTP/1.1"
            "Host: localhost"
            "Connection: close"
            "User-Agent: Varnish Health Probe";
        .interval  = 10s;
        .timeout   = 5s;
        .window    = 5;
        .threshold = 3;
    }
    .connect_timeout        = 5s;
    .first_byte_timeout     = 90s;
    .between_bytes_timeout  = 2s;
}

acl purge {
    "localhost";
    "127.0.0.1";
    "::1";
}

sub vcl_recv {
    set req.http.Host = regsub(req.http.Host, ":[0-9]+$", "");
    unset req.http.proxy;
    set req.url = std.querysort(req.url);
    set req.url = regsub(req.url, "\?$", "");
    set req.http.Surrogate-Capability = "key=ESI/1.0";

    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    }

    if (!req.http.X-Forwarded-Proto) {
        if(std.port(server.ip) == 443 || std.port(server.ip) == 8443) {
            set req.http.X-Forwarded-Proto = "https";
        } else {
            set req.http.X-Forwarded-Proto = "http";
        }
    }

    if (req.http.Upgrade ~ "(?i)websocket") {
        return (pipe);
    }

    if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=") {
        set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "");
        set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "?");
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "\?$", "");
    }

    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return (synth(405, client.ip + " is not allowed to send PURGE requests."));
        }
        return (purge);
    }

    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "PATCH" &&
        req.method != "DELETE") {
        return (pipe);
    }

    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }

    if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset req.http.Cookie;
        unset req.http.Authorization;
        return(hash);
    }

    set req.http.Cookie = regsuball(req.http.Cookie, "(__utm|_ga|_opt)[a-z_]*=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "(__)?hs[a-z_\-]+=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "hubspotutk=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_hj[a-zA-Z]+=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "(NID|DSID|__gads|GED_PLAYLIST_ACTIVITY|ACLK_DATA|ANID|AID|IDE|TAID|_gcl_[a-z]*|FLC|RUL|PAIDCONTENT|1P_JAR|Conversion|VISITOR_INFO1[a-z_]*)=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^;\s*", "");

    if (req.http.cookie ~ "^\s*$") {
        unset req.http.cookie;
    }
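}

sub vcl_pipe {
    if (req.http.upgrade) {
        set bereq.http.upgrade = req.http.upgrade;
    }
    return (pipe);
}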

sub vcl_hash {
    hash_data(req.http.X-Forwarded-Proto);
}

sub vcl_backend_response {
    if (bereq.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }

    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }

    set beresp.grace = 6h;
}