Removing cookies in Varnish


Caching and cookies don’t always go hand in hand. Varnish is very conservative when it comes to handling cookies because the very nature of a cookie is to keep track of state and personalize the response.

According to the built-in VCL, Varnish will not serve an object from the cache if the request contains a Cookie header. Such requests bypass the cache and are sent to the origin.
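For reference, the relevant check can be sketched as follows. This is a simplified excerpt rather than the complete built-in vcl_recv logic, which contains more checks and differs slightly between Varnish versions:

```vcl
vcl 4.1;

sub vcl_recv {
    # Simplified from the built-in VCL: requests that carry credentials
    # or cookies bypass the cache and are passed to the origin
    if (req.http.Authorization || req.http.Cookie) {
        return (pass);
    }
}
```

Because the built-in VCL runs after your own vcl_recv logic, removing the Cookie header in your VCL before this check is reached is what makes the request eligible for a cache lookup.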

This standard behavior doesn’t work in the real world where cookies are omnipresent.

This tutorial presents a VCL-based solution that removes cookies in situations where they are not required.

Cookies are a collection of key-value pairs, separated by a semicolon. They are transported via the Cookie request header.

Here’s an example of a Cookie header:

Cookie: lang=en; _ga=GA1.3.292651669.1502954402; sessionID=0aef28c82761e4507d5f8ae49a259284

There are three distinct cookies in this header (lang, _ga and sessionID). Varnish treats the Cookie request header like any other header. This means that a cookie is nothing more than a string in Varnish.

Accessing individual cookies and cookie values requires the use of pattern matching functions like regsub() and regsuball().
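As a minimal sketch of what that pattern matching looks like, the snippet below copies the value of the lang cookie into a request header. The X-Lang header name is an assumption made for this illustration and is not something Varnish defines:

```vcl
vcl 4.1;

sub vcl_recv {
    # Only act when a cookie named exactly "lang" is present
    if (req.http.Cookie ~ "(^|;\s*)lang=") {
        # Replace the whole header string by the captured cookie value
        # and store the result in the hypothetical X-Lang header
        set req.http.X-Lang = regsub(req.http.Cookie, "(^|.*;\s*)lang=([^;]*).*", "\2");
    }
}
```

For the example Cookie header above, X-Lang would be set to en. The Cookie header itself is left untouched, because regsub() returns a new string rather than modifying its input.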

It’s safe to remove tracking cookies

By removing cookies in Varnish, the origin application won’t be able to use these cookies, which may result in inconsistent behavior or attempts to reset the cookie.

However, for cookies that are processed on the client it doesn’t really matter. Tracking cookies are the perfect example: they are processed by third-party libraries in JavaScript and are not processed by the server.

Removing all cookies

You can remove all cookies by unsetting the Cookie header in VCL. This makes requests to websites that use cookies eligible for a cache lookup, as long as the responses themselves don’t carry a Set-Cookie header.

Here’s the code to do that:

vcl 4.1;

sub vcl_recv {
    unset req.http.Cookie;
}

This is a pretty drastic action that is only justified if you know for sure that the origin application doesn’t need any of the cookies that were set. If you only have tracking cookies, this is a good solution.

Conditionally removing all cookies

However, if some parts of your application rely on a cookie, such as a session cookie for example, you can conditionally remove all cookies.

Here’s an example where all cookies are removed, except for /admin requests:

vcl 4.1;

sub vcl_recv {
    if(req.url !~ "^/admin(/.*)?$") {
        unset req.http.Cookie;
    }
}
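The same idea extends to several protected sections. The sketch below assumes hypothetical /cart and /checkout paths next to /admin, and uses a slightly broader pattern so that URLs with a query string directly after the path, such as /admin?page=2, also keep their cookies:

```vcl
vcl 4.1;

sub vcl_recv {
    # Keep cookies for /admin, /cart and /checkout (hypothetical paths),
    # including their sub-paths and query strings
    if (req.url !~ "^/(admin|cart|checkout)(/|\?|$)") {
        unset req.http.Cookie;
    }
}
```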

Removing individual cookies

Instead of removing all cookies at once by unsetting the Cookie request header, you can also remove individual cookies. As mentioned before, to Varnish cookies are just a string. A find and replace action needs to happen to remove an individual cookie.

Here’s how to do that using the regsuball() function:

vcl 4.1;

sub vcl_recv {
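    # Remove the lang cookie. Anchoring on the start of the header or on a ";"
    # prevents partial matches on cookie names such as "mylang"
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)lang=[^;]*", "");
    # Remove the leading "; " that remains when lang was the first cookie
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
}

The VCL snippet removes the lang cookie from the Cookie header. Because regsuball() replaces the cookie with an empty string, the Cookie header is unset if nothing but whitespace remains.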

Removing tracking cookies

Removing individual cookies becomes useful when you want to get rid of tracking cookies.

Here’s an example where tracking cookies are explicitly removed:

vcl 4.1;

sub vcl_recv {
    # Some generic cookie manipulation, useful for all templates that follow
    # Remove the "has_js" cookie
    set req.http.Cookie = regsuball(req.http.Cookie, "has_js=[^;]+(; )?", "");

    # Remove any Google Analytics based cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__utm[^=]+=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_ga[^=]*=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_gcl_[^=]+=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_gid=[^;]+(; )?", "");

    # Remove DoubleClick offensive cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__gads=[^;]+(; )?", "");

    # Remove the Quantcast cookies (added by some plugin, all __qca)
    set req.http.Cookie = regsuball(req.http.Cookie, "__qc.=[^;]+(; )?", "");

    # Remove the AddThis cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__atuv.=[^;]+(; )?", "");

    # Remove a ";" prefix in the cookie if present
    set req.http.Cookie = regsuball(req.http.Cookie, "^;\s*", "");

    # Are there cookies left with only spaces or that are empty?
    if (req.http.cookie ~ "^\s*$") {
        unset req.http.cookie;
    }
}

Only keep required cookies

The example above where tracking cookies are removed is quite explicit: you know exactly which cookies are removed. But this can get tedious if you have a lot of tracking cookies. You also need to keep this list up to date if new tracking cookies are introduced.

That’s why you can also remove all cookies, except the ones that you need for server-side processing. Here’s the VCL code to do this:

vcl 4.1;

sub vcl_recv {
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(sessionID|cart)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.cookie ~ "^\s*$") {
            unset req.http.cookie;
        }
    }
}

This VCL example will remove every single cookie, except the sessionID cookie and the cart cookie. It also removes the Cookie header entirely if it’s empty. All this logic is only executed if the request actually contains a Cookie header.

In the end, if a sessionID or cart cookie is set, Varnish will not serve the response from the cache, on the assumption that the origin application needs the cookie values to compose the output.
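To see why these regsuball() calls keep only the wanted cookies, it can help to trace them on a short header. The sketch below repeats the same logic and shows, in comments, the value of the Cookie header after each step. The input header and its cookie values are made up for this illustration:

```vcl
vcl 4.1;

sub vcl_recv {
    # Assumed input: lang=en; _ga=GA1.3; sessionID=abc
    if (req.http.Cookie) {
        # Prefix a ";" so that every cookie starts with one
        # Result: ;lang=en; _ga=GA1.3; sessionID=abc
        set req.http.Cookie = ";" + req.http.Cookie;

        # Drop the spaces that follow each ";"
        # Result: ;lang=en;_ga=GA1.3;sessionID=abc
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");

        # Mark the cookies to keep by putting a space back after the ";"
        # Result: ;lang=en;_ga=GA1.3; sessionID=abc
        set req.http.Cookie = regsuball(req.http.Cookie, ";(sessionID|cart)=", "; \1=");

        # Delete every cookie that is not marked with a space
        # Result: ; sessionID=abc
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");

        # Trim leftover ";" characters and spaces at both ends
        # Result: sessionID=abc
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```

The space acts as a temporary marker: only the cookies that match the list get it back, which is what lets the fourth call tell wanted and unwanted cookies apart.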

Cookies are stored client-side and are sent to the server through a Cookie request header. But when you visit a website for the first time, you will not have any cookies in your cookie jar for that site.

When you perform an action on the website that should set a cookie, the server will attach a Set-Cookie header to the response. The client will process this header and will store the cookie value in its cookie jar.

The built-in VCL indicates that Varnish will not store the response in the cache if it contains a Set-Cookie header.

When a response contains a Set-Cookie header, the object will end up on the Hit-For-Miss list for the next two minutes. All subsequent requests for that resource will bypass the cache for that duration or until the next response doesn’t contain a Set-Cookie header.
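The behavior described above can be sketched as follows. This is a reduced illustration, not the actual built-in vcl_backend_response logic, which also checks the TTL and headers such as Cache-Control and Vary, and which differs between Varnish versions:

```vcl
vcl 4.1;

sub vcl_backend_response {
    # Reduced illustration of the built-in behavior for Set-Cookie
    if (beresp.http.Set-Cookie) {
        # Remember for two minutes that this resource is uncacheable
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
        return (deliver);
    }
}
```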

The end result of setting a cookie is a cache miss. Usually, the Set-Cookie header remains untouched because the value of the cookie it sets usually matters.

There are exceptions though.

If your application starts setting cookies when static files are requested, you probably want to remove them. As the name implies, the files are static and their output doesn’t depend on a cookie.

Here’s an example where a potential Set-Cookie header is removed for all images, CSS files, JavaScript files and web font files:

vcl 4.1;

sub vcl_backend_response {
    if (bereq.url ~ "^[^?]*\.(css|gif|ico|jpeg|jpg|js|png|svg|webp|woff|woff2)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
    }
}

You can also use the Content-Type header to determine whether the content is static:

vcl 4.1;

sub vcl_backend_response {
    if (beresp.http.Content-Type ~ "^(((image|video|font)/.+)|application/javascript|text/css).*$") {
        unset beresp.http.Set-Cookie;
    }
}

This VCL snippet will strip off the Set-Cookie header for the following content types:

  • All image content types that start with image/, like for example image/jpeg or image/png
  • All video content types that start with video/, like for example video/mpeg
  • All font content types that start with font/, like for example font/woff2
  • JavaScript content with the application/javascript content type
  • CSS content with the text/css content type

Setting a cookie is usually done when you log in to a website or when you add your first product to a shopping cart. These actions usually set a session identification cookie.

Unfortunately, some websites already set a session ID cookie on the homepage, just in case it is needed later. This results in a cache miss on your homepage, which is arguably your most visited page.

Here’s a VCL example where the Set-Cookie header is removed unless the /admin pages are visited:

vcl 4.1;

sub vcl_backend_response {
    if(bereq.url !~ "^/admin(/.*)?$") {
        unset beresp.http.Set-Cookie;
    }
}
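Stripping the Set-Cookie header makes the response storable, but the built-in VCL still bypasses the cache for requests that carry a Cookie header. A possible approach, sketched here under the assumption that nothing outside /admin depends on cookies, is to pair both removals:

```vcl
vcl 4.1;

sub vcl_recv {
    # Outside /admin, ignore the cookies that the client sends
    if (req.url !~ "^/admin(/.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Outside /admin, prevent the origin from setting new cookies
    if (bereq.url !~ "^/admin(/.*)?$") {
        unset beresp.http.Set-Cookie;
    }
}
```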

As of Varnish Cache version 6.4, vmod_cookie is an in-tree VMOD. This means the module ships with Varnish and can be imported in your VCL without a separate installation.

vmod_cookie provides an API that makes interacting with cookies a lot easier.
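As a small illustration of that API, the sketch below reads a single cookie value with cookie.isset() and cookie.get() instead of a regular expression. The X-Lang header name is an assumption made for this example:

```vcl
vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        # Load the Cookie header into the VMOD's internal storage
        cookie.parse(req.http.Cookie);

        # Copy the value of the lang cookie into a hypothetical header
        if (cookie.isset("lang")) {
            set req.http.X-Lang = cookie.get("lang");
        }
    }
}
```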

The cookie.delete() function is an easy way to remove individual cookies. Instead of coming up with complicated regular expressions that are parsed by regsuball(), you can simply call this one function.

The following example deletes the lang cookie:

vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        cookie.parse(req.http.Cookie);
        cookie.delete("lang");
        set req.http.Cookie = cookie.get_string();
    }
}
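When the deleted cookie was the only one in the request, cookie.get_string() returns an empty string and the Cookie header ends up empty instead of absent. The sketch below assumes that an empty Cookie header would still cause the built-in VCL to bypass the cache, and therefore unsets it, using the same check as the regsuball() examples:

```vcl
vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        cookie.parse(req.http.Cookie);
        cookie.delete("lang");
        set req.http.Cookie = cookie.get_string();

        # If lang was the only cookie, nothing is left to send to the origin
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```

The same check can be appended to the other vmod_cookie examples in this tutorial.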

You can also remove multiple cookies at once with the cookie.filter() function, which takes a comma-separated list of cookie names:

vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        cookie.parse(req.http.Cookie);
        cookie.filter("lang,cart");
        set req.http.Cookie = cookie.get_string();
    }
}

This example removes the lang cookie and the cart cookie.

You can even filter out cookies using a regular expression pattern:

vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        cookie.parse(req.http.Cookie);
        cookie.filter_re("__utm.");
        set req.http.Cookie = cookie.get_string();
    }
}

This example removes the various __utm cookies set by Google Analytics, such as __utma and __utmz.
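The list of regsuball() calls from the tracking cookie example could be folded into a single cookie.filter_re() call along these lines. The pattern is an adaptation of the earlier regular expressions, anchored to the full cookie name, and should be treated as a starting point rather than a complete list of tracking cookies:

```vcl
vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        cookie.parse(req.http.Cookie);

        # Remove has_js, Google Analytics, DoubleClick, Quantcast
        # and AddThis cookies in one pass
        cookie.filter_re("^(has_js|__utm.+|_ga.*|_gcl_.+|_gid|__gads|__qc.|__atuv.)$");
        set req.http.Cookie = cookie.get_string();

        # Drop the header entirely when no cookies are left
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```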

One of the previous examples featured a set of regsuball() calls to remove all cookies except a list of cookies that were needed by the server. The cookie.keep() function can replace these five lines of code as illustrated below:

vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        cookie.parse(req.http.Cookie);
        cookie.keep("sessionID,lang");
        set req.http.Cookie = cookie.get_string();
    }
}

This VCL example will remove all cookies, except the sessionID and lang cookies.

The cookie.keep_re() function does the same thing, but using a regular expression pattern.

Here’s an example where we will only keep a select set of session ID cookies:

vcl 4.1;

import cookie;

sub vcl_recv {
    if (req.http.Cookie) {
        cookie.parse(req.http.Cookie);
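        # Anchor the pattern so that only these exact cookie names are kept
        cookie.keep_re("^(session(ID)?|PHPSESSID)$");
        set req.http.Cookie = cookie.get_string();
    }
}

This example will remove all cookies, except the session, sessionID and PHPSESSID cookies. The ^ and $ anchors prevent other cookies whose names merely contain one of these strings from being kept as well.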