VCL Basics

  • VCL as a state engine
  • Basic syntax
  • vcl_recv and vcl_fetch
  • Regular expressions

The Varnish Configuration Language allows you to define your caching policy. You write VCL code, which Varnish parses, translates to C code, compiles and then links into the running process.

The following chapter focuses on the most important tasks you will do in VCL. Varnish has a number of states that you can hook into with VCL, but if you master the vcl_fetch and vcl_recv methods, you will have covered the vast majority of the actual work you need to do.

VCL is often described as domain specific or as a state engine. The domain-specific part is that some data is only available in certain states. For example, you cannot access response headers before you have actually started working on a response.

The VCL State Engine

  • Each request is processed separately.
  • Each request is independent of any others going on at the same time, previously or later.
  • States are related, but isolated.
  • return(x); exits one state and instructs Varnish to proceed to the next state.
  • Default VCL code is always present, appended below your own VCL.

Before we begin looking at VCL code, it’s worth trying to understand the fundamental concepts behind VCL.

When Varnish processes a request, it starts by parsing the request itself, separating the request method from headers, verifying that it’s a valid HTTP request and so on. When this basic parsing has completed, the very first policy decisions can be done: Should Varnish even attempt to find this resource in the cache? This decision is left to VCL, more specifically the vcl_recv method.

If you do not provide any vcl_recv function, the default VCL function for vcl_recv is executed. But even if you do specify your own vcl_recv function, the default is still present. Whether it is executed or not depends on whether your own VCL code terminates that specific state or not.
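
As a minimal sketch of this behaviour (the /private/ path is a made-up example, not something from your site), the following vcl_recv terminates the state for one group of URLs and lets everything else continue into the default vcl_recv:

sub vcl_recv {
        # Hypothetical rule: never look up anything under /private/ in the cache.
        if (req.url ~ "^/private/") {
                # Terminates vcl_recv here; the default vcl_recv is skipped.
                return (pass);
        }
        # No return statement for other requests, so they fall through
        # to the default vcl_recv that Varnish appends below this code.
}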

Tip

It is strongly advised to let the default VCL run whenever possible. It is designed with safety in mind, which often means it’ll handle any flaws in your VCL in a reasonable manner. It may not cache as much, but often it’s better to not cache some content instead of delivering the wrong content to the wrong user.

There are exceptions, of course, but if you can not understand why the default VCL does not let you cache some content, it is almost always worth it to investigate why instead of overriding it.

Syntax

  • //, # and /* foo */ for comments
  • sub $name functions
  • No loops, limited variables
  • Terminating statements, no return values
  • Domain-specific
  • Add as little or as much as you want

If you have worked with a programming language or two before, the basic syntax of VCL should be reasonably straightforward. It is inspired mainly by C and Perl.

The functions of VCL are not true functions: they do not accept arguments and they do not return values. To pass data around inside VCL, you have to store it in HTTP headers.

The “return” statement of VCL returns control from the VCL state engine to Varnish. If you define your own function and call it from one of the default functions, typing “return(foo)” will not return execution from your custom function to the default function, but return execution from VCL to Varnish. That is why we say that VCL has terminating statements, not traditional return values.

For each domain, you can return control to Varnish using one or more different return values. These return statements tell Varnish what to do next. Examples include “look this up in cache”, “do not look this up in the cache” and “generate an error message”.
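
The sketch below illustrates both points, assuming a hypothetical /api/ path and a made-up X-Is-Api header. The custom function has no arguments and no return value, so it hands data on through a request header, and its return statement ends vcl_recv as a whole rather than returning to the caller:

sub bypass_api {
        if (req.url ~ "^/api/") {
                # Data is carried in a request header (hypothetical name).
                set req.http.X-Is-Api = "1";
                # This leaves VCL and hands control to Varnish.
                # It does not resume vcl_recv after the call statement.
                return (pass);
        }
}

sub vcl_recv {
        call bypass_api;
        # Only reached when bypass_api did not terminate.
        set req.http.X-Is-Api = "0";
}

Keep in mind that a header set on req is also sent to the backend unless you remove it again.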

VCL - request flow

[Figure: VCL request flow diagram (_images/vcl.png)]

Detailed request flow

[Figure: detailed request flow diagram (_images/request.png)]

VCL - functions

  • regsub(str, regex, sub)
  • regsuball(str, regex, sub)
  • ban_url(regex)
  • ban(expression)
  • purge;
  • return(restart)
  • return()
  • hash_data()

VCL offers a handful of simple functions that allow you to modify strings, add bans, restart the VCL state engine and return control from the VCL Run Time (VRT) environment to Varnish.

You will get to test all of these in detail, so the description is brief.

regsub() and regsuball() have the same syntax and do almost the same thing: They both take a string as input, search it with a regular expression and replace the match with another string. The difference between regsub() and regsuball() is that the latter changes all occurrences while the former only affects the first match.
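
A small sketch of the difference, using a URL with repeated slashes as an assumed example input:

sub vcl_recv {
        # For a request URL of /a//b//c:
        #   regsub(req.url, "//", "/")    would give /a/b//c (first match only)
        #   regsuball(req.url, "//", "/") gives /a/b/c (every match)
        set req.url = regsuball(req.url, "//", "/");
}

Note that this also rewrites double slashes inside a query string, which may or may not be what you want.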

ban_url() is one of the original ban functions provided, and is generally not used much. The more flexible ban() function can perform the same task. ban_url(foo) is the equivalent of ban("req.url ~ " + foo): Add a URL, host name excluded, to the ban list. We will go through purging and banning in detail in later chapters.
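
As a rough sketch of how ban() is typically wired up, the following accepts a custom BAN request method from trusted addresses only. The ACL name and the BAN method are assumptions made for this illustration; the request URL is used as the regular expression:

acl banners {
        "localhost";
}

sub vcl_recv {
        if (req.request == "BAN") {
                if (!client.ip ~ banners) {
                        error 405 "Not allowed.";
                }
                # Same effect as ban_url(req.url).
                ban("req.url ~ " + req.url);
                error 200 "Ban added";
        }
}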

return(restart) offers a way to re-run the VCL logic, starting at vcl_recv. All changes made up until that point are kept and the req.restarts variable is incremented. The max_restarts parameter defines the maximum number of restarts that can be issued in VCL before an error is triggered, thus avoiding infinite looping.
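
One possible use, sketched here with an arbitrary limit of two attempts, is to retry a request when the backend answers with a 503:

sub vcl_fetch {
        # req.restarts counts how many times this request has been restarted.
        if (beresp.status == 503 && req.restarts < 2) {
                return (restart);
        }
}

The explicit check on req.restarts keeps the retry count under your own control instead of relying on max_restarts to trigger an error.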

return() is used when execution of a VCL domain (for example vcl_recv) is completed and control is returned to Varnish with a single instruction as to what should happen next. Return values are lookup, pass, pipe, hit_for_pass, fetch, deliver and hash, but only a limited number of them are available in each VCL domain.

Warning

ban_url() uses a regular expression instead of actual string matching. It will be removed in Varnish 4. You should use ban() instead.

VCL - vcl_recv

  • Normalize client-input
  • Pick a backend web server
  • Re-write client-data for web applications
  • Decide caching policy based on client-input
  • Access control
  • Security barriers
  • Fixing mistakes (e.g: index.htlm -> index.html)

vcl_recv is the first VCL function executed, right after Varnish has decoded the request into its basic data structure. It has four main uses:

  1. Modifying the client data to reduce cache diversity. E.g., removing any leading “www.” from the Host header.
  2. Deciding caching policy based on client data. E.g., not caching POST requests, only caching specific URLs, etc.
  3. Executing re-write rules needed for specific web applications.
  4. Deciding which Web server to use.

In vcl_recv you can perform the following terminating statements:

pass the cache, executing the rest of the Varnish processing as normal, but not looking up the content in cache or storing it to cache.

pipe the request, telling Varnish to shuffle bytes between the selected backend and the connected client without looking at the content. Because Varnish no longer tries to map the content to a request, any subsequent request sent over the same keep-alive connection will also be piped, and will not appear in any log.

lookup the request in cache, possibly entering the data in cache if it is not already present.

error - Generate a synthetic response from Varnish. Typically an error message, redirect message or response to a health check from a load balancer.

It’s also common to use vcl_recv to apply some security measures. Varnish is not a replacement for Intrusion Detection Systems, but can still be used to stop some typical attacks early. Simple access control lists can be applied in vcl_recv too. For further discussion about security in VCL, take a look at the Security.vcl project, found at https://github.com/comotion/security.vcl.
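
A minimal sketch of simple access control combined with a typo fix follows. The ACL name, the address range and the /admin path are assumptions for the sake of the example:

acl internal {
        "localhost";
        "192.168.0.0"/24;
}

sub vcl_recv {
        if (req.url ~ "^/admin") {
                if (!client.ip ~ internal) {
                        # Synthetic response generated by Varnish itself.
                        error 403 "Forbidden";
                }
        }
        # Fix a common typo before the cache lookup.
        set req.url = regsub(req.url, "index\.htlm$", "index.html");
}

Requests that are not rejected have no return statement applied to them, so they continue into the default vcl_recv.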

Default: vcl_recv

sub vcl_recv {
    if (req.restarts == 0) {
        if (req.http.x-forwarded-for) {
            set req.http.X-Forwarded-For =
                req.http.X-Forwarded-For + ", " + client.ip;
        } else {
            set req.http.X-Forwarded-For = client.ip;
        }
    }
    if (req.request != "GET" &&
      req.request != "HEAD" &&
      req.request != "PUT" &&
      req.request != "POST" &&
      req.request != "TRACE" &&
      req.request != "OPTIONS" &&
      req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }
    if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    return (lookup);
}

The default VCL for vcl_recv is designed to ensure a safe caching policy even with no modifications in VCL. It has two main uses:

  1. Only handle recognized HTTP methods and cache GET and HEAD
  2. Do not cache data that is likely to be user-specific.

It is executed right after any user-specified VCL, and is always present. You can not remove it. However, if you terminate the vcl_recv function using one of the terminating statements (pass, pipe, lookup, error), the default VCL will not execute, as control is handed back from the VRT (VCL Run-Time) to Varnish.

Most of the logic in the default VCL is needed for a well-behaving Varnish server, and care should be taken when vcl_recv is terminated before reaching the default VCL. Consider either replicating all the logic in your own VCL, or letting Varnish fall through to the default VCL.
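
For instance, rather than returning lookup yourself for static files, you can remove the header that would make the default VCL pass the request, and then fall through. This is only a sketch, and the list of file extensions is an assumption:

sub vcl_recv {
        if (req.url ~ "\.(png|jpg|css|js)(\?|$)") {
                # Without a Cookie header, the default vcl_recv
                # will reach return (lookup) for GET and HEAD.
                unset req.http.Cookie;
        }
        # No return statement: the default vcl_recv still checks
        # the request method and the Authorization header.
}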

Example: Basic Device Detection

One way of serving different content for mobile devices and desktop browsers is to run some simple parsing on the User-Agent header to create your own custom-header for mobile devices:

sub vcl_recv {
        if (req.http.User-Agent ~ "iPad" ||
            req.http.User-Agent ~ "iPhone" ||
            req.http.User-Agent ~ "Android") {
                set req.http.X-Device = "mobile";
        } else {
                set req.http.X-Device = "desktop";
        }
}

You can read more about different types of device detection at https://www.varnish-cache.org/docs/trunk/users-guide/devicedetection.html

This simple VCL will create a request header called X-Device which will contain either mobile or desktop. The Web server can then use this header to determine what page to serve, and inform Varnish about it through Vary: X-Device.

It might be tempting to just send Vary: User-Agent, but that would either require you to normalize the User-Agent header itself and lose the detailed information on the browser, or it would drastically inflate the cache size by keeping possibly hundreds of different variants for each object just because there are tiny variations of the User-Agent header.

For more information on the Vary-header, see the HTTP chapter.

Note

If you do use Vary: X-Device, you might want to send Vary: User-Agent to the users after Varnish has used it. Otherwise, intermediary caches will not know that the page looks different for different devices.
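
One way to do this, sketched under the assumption that the backend sends Vary: X-Device, is to rewrite the Vary header just before the response leaves Varnish:

sub vcl_deliver {
        if (resp.http.Vary) {
                # The cached object keeps Vary: X-Device;
                # only the copy sent to the client is changed.
                set resp.http.Vary = regsub(resp.http.Vary, "X-Device", "User-Agent");
        }
}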

Exercise: Rewrite URLs and Host headers

  1. Copy the original Host-header (req.http.Host) and URL (req.url) to two new request headers of your choice. E.g: req.http.x-host and req.http.x-url.
  2. Ensure that www.example.com and example.com are cached as one, using regsub().
  3. Rewrite all URLs under http://sport.example.com to http://example.com/sport/. For example: http://sport.example.com/article1.html to http://example.com/sport/article1.html.
  4. Use varnishlog to verify the result.

Extra: Make sure / and /index.html are cached as one object. How can you verify that they are, without changing the content?

Extra 2: Make the redirection work for any domain with sport. at the front. E.g: sport.example.com, sport.foobar.example.net, sport.blatti, etc.

The syntax for regsub() is regsub(<string>, <regex>, <replacement>);. string is the input string, in this case, req.http.host. regex is the regular expression matching whatever content you need to change. "^www\." matches a string that begins (^) with www followed by a literal dot; without the backslash, the dot would match any character. replacement is what you want to replace the match with; "" can be used to remove it.

To write a header, use set req.http.headername = "value"; or set req.http.headername = regsub(...);.

To verify the result, you can use varnishlog, or lwp-request. Example command:

GET -H "Host: www.example.com" -USsed http://localhost/

You can use if () to perform a regular expression if-test, or a plain string test. In the above exercise, both are valid. E.g.:

if (req.http.host ~ "^sport\.example\.com$") {

is equivalent to:

if (req.http.host == "sport.example.com") {

It is slightly faster to use == to perform a string comparison instead of a regular expression, but the difference is negligible.

Tip

You do not need to use regsub() on the host header for this exercise unless you want it to apply for all instances of sport.<some domain>. You will, however, need it to prepend /sport to the req.url. Remember, you can match just the beginning of the line with regsub(input,"^",replacement)

Solution: Rewrite URLs and Host headers

sub vcl_recv {
        set req.http.x-host = req.http.host;
        set req.http.x-url = req.url;
        set req.http.host = regsub(req.http.host, "^www\.", "");

        if (req.http.host == "sport.example.com") {
                set req.http.host = "example.com";
                set req.url = regsub(req.url, "^", "/sport");
        }

        // Or:

        if (req.http.host ~ "^sport\.") {
                set req.http.host = regsub(req.http.host,"^sport\.", "");
                set req.url = regsub(req.url, "^", "/sport");
        }
}

Note how both are valid.

VCL - vcl_fetch

  • Sanitize server-response
  • Override cache duration

The vcl_fetch function is the backend-counterpart to vcl_recv. In vcl_recv you can use information provided by the client to decide on caching policy, while you use information provided by the server to further decide on a caching policy in vcl_fetch.

If you chose to pass the request in an earlier VCL function (e.g.: vcl_recv), you will still execute the logic of vcl_fetch, but the object will not enter the cache even if you supply a cache time.

You have multiple tools available in vcl_fetch. First and foremost you have the beresp.ttl variable, which defines how long an object is kept.

Warning

If the request was not passed before reaching vcl_fetch, beresp.ttl is still used even when you return hit_for_pass in vcl_fetch. This is an important detail to remember: When you return hit_for_pass in vcl_fetch, you cache the decision you made. In other words: If beresp.ttl is 10 hours and you return hit_for_pass, a hitpass object will be entered into the cache and remain there for 10 hours, telling Varnish not to cache. If you decide not to cache a page that returns a “500 Internal Server Error”, for example, this is critically important, as a temporary glitch on a page can cause it to not be cached for a potentially long time.

Always set beresp.ttl when you return hit_for_pass in vcl_fetch.

Returning deliver in vcl_fetch tells Varnish to cache, if possible. Returning hit_for_pass tells it not to cache, but does not run the vcl_pass function of VCL for this specific client. The next client asking for the same resource will hit the hitpass-object and go through vcl_pass.
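
A minimal sketch of that habit, with the 10 second lifetime chosen arbitrarily for illustration:

sub vcl_fetch {
        if (beresp.status >= 500) {
                # Keep the hitpass object short-lived, so a temporary
                # backend glitch does not disable caching for long.
                set beresp.ttl = 10s;
                return (hit_for_pass);
        }
}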

Typical tasks performed in vcl_fetch include:

  • Overriding cache time for certain URLs
  • Stripping Set-Cookie headers that are not needed
  • Stripping bugged Vary headers
  • Adding helper-headers to the object for use in banning (more information in later chapters)
  • Applying other caching policies

Default: vcl_fetch

sub vcl_fetch {
    if (beresp.ttl <= 0s ||
        beresp.http.Set-Cookie ||
        beresp.http.Vary == "*") {
                /*
                 * Mark as "Hit-For-Pass" for the next 2 minutes
                 */
                set beresp.ttl = 120 s;
                return (hit_for_pass);
    }
    return (deliver);
}

The default VCL for vcl_fetch is designed to avoid caching anything with a Set-Cookie header. There are very few situations where caching content with a Set-Cookie header is desirable.

The initial value of beresp.ttl

Before Varnish runs vcl_fetch, the beresp.ttl variable has already been set to a value. It will use the first value it finds among:

  • The s-maxage variable in the Cache-Control response header
  • The max-age variable in the Cache-Control response header
  • The Expires response header
  • The default_ttl parameter.

Only the following status codes will be cached by default:

  • 200: OK
  • 203: Non-Authoritative Information
  • 300: Multiple Choices
  • 301: Moved Permanently
  • 302: Moved Temporarily
  • 307: Temporary Redirect
  • 410: Gone
  • 404: Not Found

You can still cache other status codes, but you will have to set the beresp.ttl to a positive value in vcl_fetch yourself.

Since all this is done before vcl_fetch is executed, you can modify the Cache-Control headers without affecting beresp.ttl, and vice versa.

A sensible approach is to use the s-maxage variable in the Cache-Control header to instruct Varnish to cache, then have Varnish remove that variable before sending it to clients using regsub() in vcl_fetch. That way, you can safely set max-age to what cache duration the clients should use and s-maxage for Varnish without affecting intermediary caches.
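
A rough sketch of this approach is shown below. The regular expression is deliberately simple and is an assumption about how your backend formats the header; it can leave a stray comma or space behind, which you may want to clean up as well:

sub vcl_fetch {
        if (beresp.http.Cache-Control ~ "s-maxage") {
                # beresp.ttl was already set from s-maxage before vcl_fetch
                # started, so removing it here does not change the TTL.
                set beresp.http.Cache-Control =
                    regsub(beresp.http.Cache-Control, "s-maxage=[0-9]+", "");
        }
}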

Warning

Varnish, browsers and intermediary caches all parse the Age response header. If you stack multiple Varnish servers in front of each other, this means that setting s-maxage=300 really will cause the object to be cached for only 300 seconds in total across all Varnish servers.

On the other hand, if your web server sends Cache-Control: max-age=300, s-maxage=3600 and you do not remove the Age response header, Varnish will send an Age-header that exceeds the max-age of the objects, which will cause browsers to not cache the content.
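
If you run into this, one option is to remove the header from the response sent to clients. This is only a sketch, and whether dropping Age is acceptable depends on what sits downstream of Varnish:

sub vcl_deliver {
        # Only the response to the client is affected;
        # Varnish still tracks the age of the cached object itself.
        unset resp.http.Age;
}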

Example: Enforce caching of .jpg urls for 60 seconds

sub vcl_fetch {
        if (req.url ~ "\.jpg$") {
                set beresp.ttl = 60s;
        }
}

The above example is typical for a site migrating to Varnish. Setting beresp.ttl ensures it’s cached.

Keep in mind that the default VCL will still be executed, which means that an image with a Set-Cookie header will not be cached.

Example: Cache .jpg for 60 seconds only if s-maxage isn’t present

sub vcl_fetch {
        if (beresp.http.cache-control !~ "s-maxage" && req.url ~ "\.jpg$") {
                set beresp.ttl = 60s;
        }
}

The Cache-Control header can contain a number of directives. Varnish evaluates it and looks for s-maxage and max-age. It will set the TTL to the value of s-maxage if found. If s-maxage isn’t found, it will use max-age. If neither exists, it will use the Expires header to set the TTL. If none of those are present, it will use the default TTL.

This is done before vcl_fetch is executed and the process can be seen by looking at the TTL tag of varnishlog.

The purpose of the above example is to allow a gradual migration to using a backend-controlled caching policy. If the backend supplies s-maxage, it will be used, but if it is missing, a forced TTL is set.

Exercise: Avoid caching a page

  1. Write a VCL which avoids caching the index page at all. It should cover both accessing / and /index.html

  2. Write a VCL that makes Varnish honor the following headers:

    Cache-Control: no-cache
    Cache-Control: private
    Pragma: no-cache

When trying this out, remember that Varnish keeps the Host-header in req.http.host and the part after the hostname in req.url.

For http://www.example.com/index.html, the http:// part is not seen by Varnish at all, but req.http.host will have the value of www.example.com and req.url the value of /index.html. Note how the leading / is included in req.url.

Varnish only obeys the first it finds of “s-maxage” in Cache-Control, “max-age” in Cache-Control or the Expires header. However, it is often necessary to check the values of other headers too - vcl_fetch is the place to do that.

Solution: Avoid caching a page

sub vcl_recv {
        if (req.url ~ "^/index\.html" ||
            req.url ~ "^/$") { return(pass); }
}
// Or:
sub vcl_fetch {
        if (req.url ~ "^/index\.html" ||
            req.url ~ "^/$") { return(hit_for_pass); }
}

// Second part of exercise
sub vcl_fetch {
        if (beresp.http.cache-control ~ "(no-cache|private)" ||
            beresp.http.pragma ~ "no-cache") {
                set beresp.ttl = 0s;
        }
}

The above examples are both valid.

It is usually most convenient to do as much as possible in vcl_recv, and this is no exception. Even though using hit_for_pass in vcl_fetch is reasonable, it creates a hitpass object, which can create unnecessary complexity. Whenever you do use hit_for_pass in vcl_fetch, you should also make it a habit to set beresp.ttl to a short duration, to avoid accidentally adding a hitpass object that prevents caching for a long time.

Exercise: Either use s-maxage or set ttl by file type

Write a VCL that:

  • Uses Cache-Control: s-maxage where present
  • Caches .jpg for 30 seconds if s-maxage isn’t present
  • Caches .html for 10 seconds if s-maxage isn’t present
  • Removes the Set-Cookie header if s-maxage OR the above rules indicate that Varnish should cache.

Tip

Try solving each part of the exercise by itself first. Most somewhat complex VCL tasks are easily solved when you divide the tasks into smaller bits and solve them individually.

Note

Varnish automatically reads s-maxage for you, so you only need to check if it is there or not - if it’s present, Varnish has already used it to set beresp.ttl.

Solution: Either use s-maxage or set ttl by file type

sub vcl_fetch {
        if (beresp.http.cache-control !~ "s-maxage") {
                if (req.url ~ "\.jpg(\?|$)") {
                        set beresp.ttl = 30s;
                        unset beresp.http.Set-Cookie;
                }
                if (req.url ~ "\.html(\?|$)") {
                        set beresp.ttl = 10s;
                        unset beresp.http.Set-Cookie;
                }
        } else {
                if (beresp.ttl > 0s) {
                        unset beresp.http.Set-Cookie;
                }
        }
}

There are many ways to solve this exercise, and this solution is only one of them. The first part checks that s-maxage is /not/ present, then handles .jpg and .html files - including cookie removal. The second part checks whether s-maxage caused Varnish to set a positive TTL and, if so, considers the response cacheable and removes the cookie.

Summary of VCL - Part 1

  • VCL provides a state machine for controlling Varnish.
  • Each request is handled independently.
  • Building a VCL file is done one line at a time.

VCL is all about policy. By providing a state machine which you can hook into, VCL allows you to affect the handling of any single request almost anywhere in the execution chain.

This provides both the pros and cons of any other programming language. There isn’t going to be any complete reference guide to how you can deal with every possible scenario in VCL, but on the other hand, if you master the basics of VCL you can solve complex problems that nobody has thought about before. And you can usually do it without requiring too many different sources of documentation.

Whenever you are working on VCL, you should think of what that exact line you are writing has to do. The best VCL is built by having many independent sections that don’t interfere with each other more than they have to.

This is made easier by the fact that VCL also has a default - which is always present. If you just need to modify one little thing in vcl_recv, you can do just that. You don’t have to copy the default VCL, because it will be executed after your own - assuming you don’t have any return statements.