Configuring Varnish for Drupal

Drupal is an open-source content management framework that is popular for mid-market and enterprise solutions that require more complex content types and workflows.

This tutorial is a step-by-step guide on how to configure Varnish for Drupal.

1. Install and configure Varnish

If you are already running a Drupal CMS and you want to use Varnish to accelerate it, you’ll have to decide where to install Varnish:

  • You can install Varnish on a dedicated machine and point your DNS records to that server.
  • You can install Varnish on the same server as your Drupal site.

For a detailed step-by-step Varnish installation guide, refer to the dedicated installation tutorial for your operating system.

2. Reconfigure the web server port

The web server that hosts your Drupal CMS is most likely set up to handle incoming HTTP requests on port 80. For Varnish caching to work properly, Varnish needs to listen on port 80 instead, which means your web server has to move to another listening port. We’ll use port 8080 as the new web server listening port.

Depending on the type of web server you’re using, different configuration files need to be modified. Here’s a quick how-to for Apache and Nginx.

Apache

If you’re using Apache as your web server, you need to replace Listen 80 with Listen 8080 in Apache’s main configuration file.

The individual virtual hosts will also contain port information. You will need to replace <VirtualHost *:80> with <VirtualHost *:8080> in all virtual host files.

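Where these settings live depends on your Linux distribution. On Debian and Ubuntu, the Listen directive is in /etc/apache2/ports.conf and the virtual hosts are in /etc/apache2/sites-available/. On Red Hat-based distributions, the Listen directive is in /etc/httpd/conf/httpd.conf and virtual hosts are typically in /etc/httpd/conf.d/.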
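
As a minimal sketch, assuming GNU sed and a stock configuration layout, the two substitutions described above can be scripted. The helper name switch_apache_port is hypothetical, and the paths in the usage comments are distribution defaults that may differ on your system.

```shell
# Hypothetical helper (not part of Apache): rewrites "Listen 80" and
# "<VirtualHost *:80>" to port 8080 in every file passed as an argument.
# Assumes GNU sed for the -i (in-place) and -E (extended regex) flags.
switch_apache_port() {
    sed -i -E \
        -e 's/^([[:space:]]*Listen[[:space:]]+([^[:space:]]+:)?)80[[:space:]]*$/\18080/' \
        -e 's/(<VirtualHost[[:space:]]+[^>[:space:]]*:)80>/\18080>/' \
        "$@"
}

# Usage, from a root shell (adjust the paths to your setup):
#   Debian/Ubuntu: switch_apache_port /etc/apache2/ports.conf /etc/apache2/sites-available/*.conf
#   RHEL family:   switch_apache_port /etc/httpd/conf/httpd.conf /etc/httpd/conf.d/*.conf
```

The patterns only match port 80 exactly, so running the helper a second time leaves lines that already say 8080 untouched.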

These changes will only take effect once Apache is restarted.

Nginx

If you’re using Nginx, you’ll only have to replace listen 80; with listen 8080; in all virtual host files.

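On Debian and Ubuntu, the virtual host files are in /etc/nginx/sites-available/. On Red Hat-based distributions, they are typically in /etc/nginx/conf.d/, with the default server block in /etc/nginx/nginx.conf.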
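
The replacement can be scripted in the same spirit. This is a sketch that assumes GNU sed; switch_nginx_port is a hypothetical helper name, and the paths in the usage comments are distribution defaults.

```shell
# Hypothetical helper (not part of Nginx): rewrites "listen 80" and
# "listen [::]:80" to port 8080 in every file passed as an argument.
# Other listen directives, such as "listen 443 ssl;", are left alone.
# Assumes GNU sed for the -i (in-place) and -E (extended regex) flags.
switch_nginx_port() {
    sed -i -E \
        's/^([[:space:]]*listen[[:space:]]+(\[::\]:)?)80([[:space:];])/\18080\3/' \
        "$@"
}

# Usage, from a root shell (adjust the paths to your setup):
#   Debian/Ubuntu: switch_nginx_port /etc/nginx/sites-available/*
#   RHEL family:   switch_nginx_port /etc/nginx/nginx.conf /etc/nginx/conf.d/*.conf
```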

These changes will only take effect once Nginx is restarted.

3. Install Drupal purging modules

Drupal has a collection of modules that can be used to invalidate the cache. For Drupal to support Varnish, the following modules need to be installed:

  • Purge
  • Generic HTTP Purger

The Purge module has the following set of submodules that also should be enabled, depending on your preferences:

  • Purge Drush
  • Purge Tokens
  • Purge UI
  • Purge Cron processor
  • Purge Late runtime processor
  • Purge Core tags queuer

The Generic HTTP Purger module also has a Generic HTTP Tags Header submodule that needs to be enabled.

You can download these modules yourself, or install them from the /admin/modules panel. However, the quickest way to install these modules is by using the following commands:

composer require drupal/purge drupal/purge_purger_http

This command will install the required dependencies via Composer, the package manager for PHP.

drush en purge_drush \
purge_processor_lateruntime \
purge_queuer_coretags \
purge_processor_cron \
purge_tokens \
purge_ui \
purge \
purge_purger_http \
purge_purger_http_tagsheader

This command will enable the required modules in Drupal.

4. Configure caching and purging in Drupal

The Performance section of the Drupal Administration Configuration allows you to tune caching and cache invalidation settings.

Set cache Time To Live

Before we can configure how Drupal invalidates objects from the cache, we must first ensure the objects are properly stored in cache.

Please follow these steps to configure the caching Time To Live:

  1. Go to the Drupal admin panel
  2. Select Configuration > Performance
  3. Choose a Browser and proxy cache maximum age value
  4. Click the Save configuration button at the bottom of the window
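
To check that the setting took effect, you can look at the Cache-Control header Drupal sends for an anonymous page request. The sketch below assumes the chosen maximum age shows up as a max-age directive in that header; max_age_from_headers is a hypothetical helper, and the URL in the usage comment assumes the web server from step 2 on port 8080.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# max-age value of the Cache-Control header, or nothing if there is none.
# Uses grep -o, which GNU and BSD grep both provide.
max_age_from_headers() {
    grep -i '^cache-control:' | grep -oE 'max-age=[0-9]+' | head -n 1 | cut -d= -f2
}

# Usage (the URL is an assumption, point it at your own site):
#   curl -sI http://localhost:8080/ | max_age_from_headers
```

An empty result suggests that the page is not marked as cacheable, for example because the request carried a session cookie.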

Configuring tag-based cache invalidation

Drupal uses a Purge-Cache-Tags response header to register tags for every page. Varnish stores these tags with the cached object, and a ban expression passed to Varnish’s ban() VCL function can match them. A single ban can therefore invalidate multiple pages at once.

For this to work, the Purge module needs to be configured. This can be done by following these steps:

  1. Go to the Drupal admin panel
  2. Select Configuration > Performance
  3. Click the Purge tab
  4. Click the Add purger button to add the HTTP Purger
  5. Select the HTTP purger option
  6. Click Add
  7. Select the Configure dropdown option next to the newly created HTTP Purger
  8. Assign a name to the new purger (e.g. Varnish - Tag)
  9. Keep Tag as the selected value of the Type field
  10. Ensure the Request tab is selected
  11. Set the hostname of your Varnish server in the Hostname field (defaults to localhost)
  12. Set the port number of your Varnish server in the Port field (defaults to 80)
  13. Keep / as the value of the Path field
  14. Keep BAN as the selected value of the Request Method field
  15. Keep http as the selected value of the Scheme field
  16. Select the Headers tab
  17. Add a new header by setting Purge-Cache-Tags as the HEADER field value
  18. Set [invalidation:expression] as the value of the VALUE field
  19. Click the Save configuration button at the bottom of the window
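
With these settings, the purger sends an HTTP BAN request to Varnish that carries the tag in the Purge-Cache-Tags header. The sketch below sends such a request by hand with curl, which can help when debugging. The ban_tag function and the VARNISH_HOST, VARNISH_PORT and CURL variables are hypothetical names, and CURL exists only so the command can be swapped out for a dry run.

```shell
# Hypothetical settings: they mirror the Hostname and Port fields of the purger.
VARNISH_HOST="${VARNISH_HOST:-localhost}"
VARNISH_PORT="${VARNISH_PORT:-80}"
CURL="${CURL:-curl}"

# Hypothetical helper: sends a BAN request for one cache tag and prints the
# response, including the status line and headers.
ban_tag() {
    "$CURL" -s -i -X BAN -H "Purge-Cache-Tags: $1" \
        "http://${VARNISH_HOST}:${VARNISH_PORT}/"
}

# Usage:
#   ban_tag 'node:1'
```

With the VCL from step 5 in place, a request from an address in the purge ACL is answered with a 200 status and the reason "Ban added.", while other addresses receive a 405.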

When you run the following command, all pages that require an update will be purged from the Varnish cache:

drush p-queue-work

The invalidation also takes place automatically through the Late runtime processor that was enabled earlier. In addition, the Cron processor works through the purge queue at set intervals.

Allow purging all items from the cache

When content on pages changes, the Late runtime processor, the Cron processor, or drush p-queue-work will ensure that these changes result in the right cache invalidation calls: only the affected content will be purged from the cache.

However, there are situations where you want the entire cache to be flushed. The Purge module allows you to do this through the following command:

drush p:invalidate everything

For this to work, the everything invalidation type needs to be configured. The following steps will help you configure this:

  1. Go to the Drupal admin panel
  2. Select Configuration > Performance
  3. Click the Purge tab
  4. Click the Add purger button to add the HTTP Purger
  5. Select the HTTP purger option
  6. Click Add
  7. Select the Configure dropdown option next to the newly created HTTP Purger
  8. Assign a name to the new purger (e.g. Varnish - Everything)
  9. Select Everything as the value of the Type field
  10. Ensure the Request tab is selected
  11. Set the hostname of your Varnish server in the Hostname field (defaults to localhost)
  12. Set the port number of your Varnish server in the Port field (defaults to 80)
  13. Keep / as the value of the Path field
  14. Keep BAN as the selected value of the Request Method field
  15. Keep http as the selected value of the Scheme field
  16. Select the Headers tab
  17. Add a new header by setting Purge-Cache-Tags as the HEADER field value
  18. Set .* as the value of the VALUE field
  19. Click the Save configuration button at the bottom of the window

The drush p:invalidate everything command will remove all Drupal pages from the cache once the right purger configuration has been created.
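
Both purgers rely on Varnish treating the Purge-Cache-Tags value of the BAN request as a regular expression that is matched against the tags stored with each object. The sketch below imitates that match with grep -E to show why .* hits every tagged object. It is a simplified model: Varnish uses PCRE rather than POSIX extended regular expressions, and ban_matches is a hypothetical helper.

```shell
# Hypothetical helper: succeeds when the ban expression ($1) matches the
# stored Purge-Cache-Tags header value ($2), using an unanchored regex match.
ban_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Usage:
#   ban_matches 'node:1' 'node:1 node_list' && echo "invalidated"
#   ban_matches '.*'     'node:7 node_list' && echo "invalidated"
```

Because the match is unanchored, an expression such as node:1 also matches an object tagged node:12 in this model. That only causes some extra invalidation, never stale content.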

5. Deploy the custom Drupal VCL

A custom VCL file containing the necessary caching rules is needed to get decent performance. This file is located at /etc/varnish/default.vcl and also contains the backend definition that Varnish uses to connect to the web server.

The Drupal VCL file

Here’s the complete VCL file you can use:

vcl 4.1;

import std;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Add hostnames, IP addresses and subnets that are allowed to purge content
acl purge {
    "localhost";
    "127.0.0.1";
    "::1";
}

sub vcl_recv {
    # Announce support for Edge Side Includes by setting the Surrogate-Capability header
    set req.http.Surrogate-Capability = "Varnish=ESI/1.0";
    
    # Remove empty query string parameters
    # e.g.: www.example.com/index.html?    
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }

    # Remove port number from host header
    set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");
    
    # Sorts query string parameters alphabetically for cache normalization purposes.    
    set req.url = std.querysort(req.url);
    
    # Remove the proxy header to mitigate the httpoxy vulnerability
    # See https://httpoxy.org/    
    unset req.http.proxy;

    # Add X-Forwarded-Proto header when using https
    if (!req.http.X-Forwarded-Proto) {
        if(std.port(server.ip) == 443 || std.port(server.ip) == 8443) {
            set req.http.X-Forwarded-Proto = "https";
        } else {
            set req.http.X-Forwarded-Proto = "http";
        }
    }

    # Ban logic to remove multiple objects from the cache at once. Tailored to Drupal's cache invalidation mechanism
    if(req.method == "BAN") {
        if(!client.ip ~ purge) {
            return(synth(405, "BAN not allowed for this IP address"));
        }
        
        if (req.http.Purge-Cache-Tags) {
            ban("obj.http.Purge-Cache-Tags ~ " + req.http.Purge-Cache-Tags);
        }
        else {
            ban("obj.http.x-url ~ " + req.url + " && obj.http.x-host == " + req.http.host);
        }

        return (synth(200, "Ban added."));
    }

    # Purge logic to remove objects from the cache
    if(req.method == "PURGE") {
        if(!client.ip ~ purge) {
            return(synth(405,"PURGE not allowed for this IP address"));
        }
        return (purge);
    }
    
    # Only handle relevant HTTP request methods
    if (
        req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "PATCH" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "DELETE"
    ) {
        return (pipe);
    }

    # Remove tracking query string parameters used by analytics tools
    if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=") {
        set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "");
        set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "?");
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "\?$", "");
    }

    # Only cache GET and HEAD requests, and bypass the cache when an Authorization header is present
    if ((req.method != "GET" && req.method != "HEAD") || req.http.Authorization) {
        return(pass);
    }

    # Mark static files with the X-Static-File header, and remove any cookies
    # X-Static-File is also used in vcl_backend_response to identify static files
    if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        set req.http.X-Static-File = "true";
        unset req.http.Cookie;
        return(hash);
    }

    # Don't cache the following pages
    if (req.url ~ "^/status\.php$" ||
        req.url ~ "^/update\.php$" ||
        req.url ~ "^/cron\.php$" ||
        req.url ~ "^/admin$" ||
        req.url ~ "^/admin/.*$" ||
        req.url ~ "^/flag/.*$" ||
        req.url ~ "^.*/ajax/.*$" ||
        req.url ~ "^.*/ahah/.*$") {
        return (pass);
    }

    # Remove all cookies except the session & NO_CACHE cookies
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(S?SESS[a-z0-9]+|NO_CACHE)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.cookie ~ "^\s*$") {
            unset req.http.cookie;
        } else {
            return(pass);
        }
    }
    return(hash);
}

sub vcl_hash {
    # Create cache variations depending on the request protocol
    hash_data(req.http.X-Forwarded-Proto);
}

sub vcl_backend_response {
    # Inject URL & Host header into the object for asynchronous banning purposes
    set beresp.http.x-url = bereq.url;
    set beresp.http.x-host = bereq.http.host;

    # Serve stale content for 2 minutes after object expiration
    # Perform asynchronous revalidation while stale content is served
    set beresp.grace = 120s;

    # If the file is marked as static we cache it for 1 day
    if (bereq.http.X-Static-File == "true") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }

    # If we don't get a Cache-Control header from the backend
    # we default to 1h cache for all objects
    if (!beresp.http.Cache-Control) {
        set beresp.ttl = 1h;
    }

    # Parse Edge Side Include tags when the Surrogate-Control header contains ESI/1.0
    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }
}

sub vcl_deliver {
    # Cleanup of headers
    unset resp.http.x-url;
    unset resp.http.x-host;
    unset req.http.X-Static-File;
}
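
The cookie handling in vcl_recv is the least obvious part of this file. The sketch below replays the same five substitutions with sed so you can see which cookies survive. It is an illustration only: strip_cookies is a hypothetical helper, and sed -E stands in for Varnish’s regsuball().

```shell
# Hypothetical helper: applies the cookie substitutions from vcl_recv to the
# Cookie header value passed as $1 and prints what would be left.
strip_cookies() {
    # Prefix the value with ";" so every cookie is preceded by a semicolon.
    printf ';%s\n' "$1" | sed -E \
        -e 's/; +/;/g' \
        -e 's/;(S?SESS[a-z0-9]+|NO_CACHE)=/; \1=/g' \
        -e 's/;[^ ][^;]*//g' \
        -e 's/^[; ]+|[; ]+$//g'
}

# Usage:
#   strip_cookies 'has_js=1; SESSabc123=xyz; _ga=GA1.2'
```

The second substitution marks the cookies to keep by putting a space after their semicolon, the third removes every cookie without that marker, and the last one trims the leftover separators. For the usage example this leaves SESSabc123=xyz, so the VCL would pass that request to the backend. A request carrying only has_js and _ga ends up with an empty value and remains cacheable.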

Customizations

The VCL file for Drupal contains two sections that might require some customization:

  • The backend definition
  • The access control list (ACL)

Here’s the backend definition:

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

This backend definition makes Varnish connect to 127.0.0.1 on port 8080 when content needs to be fetched. Assuming that Varnish is hosted on the same server as your Drupal CMS, this value can be left unchanged.

If you’re hosting Drupal on another server or on another port, you’ll need to modify the .host and .port values.

The current values inside the purge access control list (ACL) all refer to the local machine, as you can see in the snippet below:

acl purge {
    "localhost";
    "127.0.0.1";
    "::1";
}

Again, assuming that Drupal and Varnish are hosted on the same machine, these values can be left untouched. If your Drupal site is hosted on another machine, the right IP address needs to be added to the list.

If invalidations happen from external locations, the IP addresses, the IP ranges, or the hostnames of these locations have to be added to the ACL.
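
After editing the backend or the ACL, it is worth compiling the file before restarting Varnish, because a syntax error would prevent the service from starting. The sketch below uses varnishd -C, which compiles a VCL file and prints the generated C code without starting the daemon. The check_vcl function and the VARNISHD variable are hypothetical names, and the command may need sudo depending on your setup.

```shell
# Hypothetical setting: lets the varnishd binary be overridden.
VARNISHD="${VARNISHD:-varnishd}"

# Hypothetical helper: compiles the given VCL file (default:
# /etc/varnish/default.vcl) and reports whether compilation succeeded.
check_vcl() {
    if "$VARNISHD" -C -f "${1:-/etc/varnish/default.vcl}" > /dev/null; then
        echo "VCL OK"
    else
        echo "VCL has errors" >&2
        return 1
    fi
}

# Usage:
#   check_vcl /etc/varnish/default.vcl
```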

6. Restart the services

If you’re using Apache as a web server, run the following command to restart it (on Red Hat-based distributions the service is named httpd instead of apache2):

sudo systemctl restart apache2

If you’re using Nginx instead, please run the following command to restart your web server:

sudo systemctl restart nginx

And finally, you’ll have to run the following command to restart Varnish:

sudo systemctl restart varnish

After the restart, your web server will accept traffic on port 8080 and Varnish will handle HTTP traffic on port 80. The restart also ensures that the right VCL file is loaded, so that requests for your Drupal CMS can be properly cached.

7. Flushing the cache

In the final step, we will flush all Drupal caches to ensure a consistent state:

drush cr
drush p:invalidate everything
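
Once the caches are flushed, you can confirm that Varnish is serving pages from cache by requesting the same page twice and inspecting the X-Varnish response header, which contains two transaction IDs on a cache hit and one on a miss. The sketch below wraps that check; is_cache_hit is a hypothetical helper, and the URL in the usage comment assumes Varnish is listening on port 80 of the local machine.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# when the X-Varnish header carries two transaction IDs, which indicates
# that the object was delivered from the cache.
is_cache_hit() {
    tr -d '\r' | grep -Eiq '^x-varnish:[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]*$'
}

# Usage (the first request after a flush is expected to be a miss):
#   curl -sI http://localhost/ > /dev/null
#   curl -sI http://localhost/ | is_cache_hit && echo "HIT" || echo "MISS"
```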