This chapter is for the web developer course only.
This chapter covers:
HTTP is at the heart of Varnish, or rather the model HTTP represents.
This chapter will cover the basics of HTTP as a protocol, how it’s used in the wild, and delve into caching as it applies to HTTP.
HTTP is a networking protocol for distributed systems. It is the foundation of data communication for the Web. The standard is developed by the IETF and the W3C. At the time of writing, the latest version of the standard is HTTP/1.1.
The IETF httpbis working group is revising the HTTP/1.1 specification, and you can follow its work at http://datatracker.ietf.org/wg/httpbis/charter/. HTTPbis is essentially HTTP/1.1 clarified and refined rather than a new protocol; one example is better-specified caching of web pages.
Each request follows the same strict and fairly simple pattern. A request method tells the web server what sort of request this is: is the client trying to fetch a resource (GET), update some data (POST), or just get the headers of a resource (HEAD)?
There are strict rules that apply to the request methods. For instance, a request body has no defined meaning in a GET request, so clients should not send one, whereas a POST request normally carries a body.
Similarly, a web server must not include a message body in its response to a HEAD request.
GET / HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cache-Control: max-age=0
The above is a typical HTTP GET request for the / resource.
Note that the Host-header contains the hostname as seen by the browser. The above request was generated by entering http://localhost/ in the browser. The browser automatically adds a number of headers. Some of these will vary depending on language settings, others will vary depending on whether the client has a cached copy of the page already, or if the client is doing a refresh or forced refresh.
Whether the server honors these headers will depend on both the server in question and the specific header.
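The request format described above is regular enough to take apart with a few lines of code. The following Python sketch splits a raw request into its request line, headers and body. It is illustrative only: it assumes well-formed input with CRLF line endings, keeps only the last value of a repeated header, and the function name parse_request is hypothetical.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.1 request into method, target, version, headers and body.

    Minimal sketch: assumes well-formed input with CRLF line endings.
    """
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        # A repeated header simply overwrites the earlier value here.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = (
    "GET / HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "Accept-Encoding: gzip,deflate\r\n"
    "Cache-Control: max-age=0\r\n"
    "\r\n"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)  # GET / HTTP/1.1
print(headers["host"])          # localhost
```

Note that the browser's Cache-Control: max-age=0 ends up as an ordinary header value; whether anything acts on it is up to the server or cache.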
The following is an example of an HTTP request using the POST method, which includes a request body:
POST /accounts/ServiceLoginAuth HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: https://www.google.com/accounts/ServiceLogin
Cookie: GoogleAccountsLocale_session=en;[...]
Content-Type: application/x-www-form-urlencoded
Content-Length: 288

ltmpl=default[...]&signIn=Sign+in&asts=
HTTP/1.1 200 OK
Cache-Control: max-age=150
Content-Length: 150

[data]
The HTTP response is similar to the request itself. The status code tells the browser both whether the request succeeded and what type of response this is. The reason phrase (such as "OK") is a textual representation of the same information and is often ignored by the browser itself.
Examples of status codes are 200 OK, 404 Not Found, 304 Not Modified and so forth. They are all defined in the HTTP standard and grouped into the following categories:
- 1xx: Informational
- 2xx: Success
- 3xx: Redirection
- 4xx: Client Error
- 5xx: Server Error
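The category of a status code is given by its first digit, so classifying a response is a single integer division. A small Python sketch (the function name is hypothetical):

```python
def status_class(code: int) -> str:
    """Return the category an HTTP status code belongs to, based on its first digit."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]


for code in (200, 304, 404):
    print(code, status_class(code))
# 200 Success
# 304 Redirection
# 404 Client Error
```

Note that 304 Not Modified counts as a redirection even though nothing is redirected: it tells the client to use its cached copy.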
HTTP/1.1 200 OK
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.7
Cache-Control: public, max-age=86400
Last-Modified: Mon, 04 Apr 2011 04:13:41 +0000
Expires: Sun, 11 Mar 1984 12:00:00 GMT
Vary: Cookie,Accept-Encoding
ETag: "1301890421"
Content-Type: text/html; charset=utf-8
Content-Length: 23562
Date: Mon, 04 Apr 2011 09:02:26 GMT
X-Varnish: 1886109724 1886107902
Age: 17324
Via: 1.1 varnish
Connection: keep-alive

(data)
The client sends an HTTP request to the server, which returns an HTTP response with the message body.
Before we talk about all the various cache headers and cache mechanisms, we will use httpheadersexample.php to experiment and get a sense of what it’s all about.
Try clicking the links twice, hitting refresh, and doing a forced refresh (usually done by hitting Ctrl-F5, depending on the browser).
When performing this exercise, try to see if you can spot the patterns. There are many levels of cache on the Web, and you have to think about more than just Varnish.
If it has not already, the browser cache will likely confuse you at least a few times during this course. When that happens, pull up varnishlog or use another browser.
The Expires response header field gives the date/time after which the response is considered stale. A cache (proxy cache or client cache) will normally not return a stale item without first revalidating it with the origin server.
The syntax for this header is:
Expires: GMT formatted date
It is recommended not to set Expires too far in the future: HTTP/1.1 advises against dates more than one year ahead, and one year is usually enough.
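An Expires value must be written as an HTTP date in GMT. The Python sketch below computes one a year ahead using the standard library's email.utils.format_datetime; the helper name expires_header and the fixed "now" are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now: datetime, lifetime: timedelta = timedelta(days=365)) -> str:
    """Build an Expires header value `lifetime` after `now`, formatted in GMT."""
    # usegmt=True produces the "Mon, 04 Apr 2011 09:02:26 GMT" form used by HTTP;
    # it requires a datetime whose tzinfo is exactly timezone.utc.
    return format_datetime((now + lifetime).astimezone(timezone.utc), usegmt=True)


now = datetime(2011, 4, 4, 9, 2, 26, tzinfo=timezone.utc)
print("Expires:", expires_header(now))
# Expires: Tue, 03 Apr 2012 09:02:26 GMT
```

Because 2012 is a leap year, 365 days from 4 April 2011 lands on 3 April 2012, which stays within the one-year limit.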
Using Expires does not prevent you from updating a cached resource. If the resource changes, you can give it a new name (for instance by adding a version number), so that clients fetch the new version instead of reusing the cached one.
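One way to change a resource's name whenever it changes is to derive the version from its content. In this Python sketch the version is the first eight hex digits of a SHA-256 hash, inserted before the file extension. The helper name and the style.<hash>.css naming scheme are assumptions for illustration, not something Varnish requires.

```python
import hashlib
import posixpath


def versioned_name(path: str, content: bytes) -> str:
    """Return e.g. /css/style.<hash>.css for /css/style.css (hypothetical scheme)."""
    root, ext = posixpath.splitext(path)
    # The first 8 hex digits of the SHA-256 digest act as the version number.
    version = hashlib.sha256(content).hexdigest()[:8]
    return f"{root}.{version}{ext}"


old = versioned_name("/css/style.css", b"body { color: black }")
new = versioned_name("/css/style.css", b"body { color: navy }")
print(old != new)  # True: new content gives a new URL, so no stale copy is reused
```

Since each versioned URL never changes its content, it can safely be served with a far-future Expires header.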
The Cache-Control header field specifies directives that must be applied by all caching mechanisms (from proxy cache to browser cache). Cache-Control accepts the following arguments (only the most relevant are described):
Unlike Expires, Cache-Control is both a request and a response header. Here is the list of arguments you may use in each context:
Example of a Cache-Control header:
Cache-Control: public, must-revalidate, max-age=2592000
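A Cache-Control value is a comma-separated list of directives, some of which take an argument, so it maps naturally onto a lookup table. A minimal Python sketch (parse_cache_control is a hypothetical helper; it does not handle commas inside quoted arguments such as private="a, b"):

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("public, must-revalidate, max-age=2592000"))
# {'public': None, 'must-revalidate': None, 'max-age': '2592000'}
```

Here max-age=2592000 is 30 days (30 × 86400 seconds).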
As you might have noticed, Expires and Cache-Control do more or less the same job, but Cache-Control gives you more control. There is a significant difference between these two headers:
- Cache-Control uses relative times in seconds (see max-age and s-maxage)
- Expires always returns an absolute date
When both are present, the max-age (or s-maxage) directive of Cache-Control overrides Expires.
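The precedence can be summarized in code. This Python sketch computes how many seconds a response stays fresh: s-maxage first (shared caches only, such as Varnish or a proxy), then max-age, then Expires minus Date. It assumes lower-cased header names and simplifies HTTP/1.1 (for example it ignores no-cache and heuristic freshness); freshness_lifetime is a hypothetical name.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict, shared_cache: bool = True) -> int:
    """Seconds a response stays fresh: s-maxage, then max-age, then Expires - Date."""
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg
    # s-maxage only applies to shared caches.
    if shared_cache and directives.get("s-maxage", "").isdigit():
        return int(directives["s-maxage"])
    if directives.get("max-age", "").isdigit():
        return int(directives["max-age"])
    # No max-age: fall back to Expires, measured relative to the Date header.
    if "expires" in headers and "date" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
            date = parsedate_to_datetime(headers["date"])
        except (TypeError, ValueError):
            return 0  # an invalid Expires date means "already expired"
        return max(0, int((expires - date).total_seconds()))
    return 0


# Headers from the response example earlier in this chapter.
resp = {
    "cache-control": "public, max-age=86400",
    "expires": "Sun, 11 Mar 1984 12:00:00 GMT",
    "date": "Mon, 04 Apr 2011 09:02:26 GMT",
}
print(freshness_lifetime(resp))  # 86400: max-age wins over the 1984 Expires date
```

The shared_cache flag also shows why s-maxage is aimed at caches like Varnish: a browser (shared_cache=False) skips it and falls back to max-age.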
By default, Varnish ignores the Cache-Control request header. If you want to let users update the cache with a forced refresh, you need to implement that yourself.
The Last-Modified response header field indicates the date and time at which the origin server believes the variant was last modified. This header is used in conjunction with the If-Modified-Since request header, much as ETag is used with If-None-Match.
Example of a Last-Modified header:
Last-Modified: Wed, 01 Sep 2004 13:24:52 GMT
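The If-Modified-Since request header field is used with a method to make it conditional: if the requested variant has not been modified since the time specified in this field, the server does not return the entity but a 304 Not Modified response without a message body.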
Example of an If-Modified-Since header:
If-Modified-Since: Wed, 01 Sep 2004 13:24:52 GMT
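The server-side decision for a conditional GET or HEAD can be sketched in Python as follows. The function names are hypothetical, and the sketch leaves out other preconditions (304 only applies to GET and HEAD, for example). It treats an unparseable If-Modified-Since date as if the header were absent.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def _http_date(value: str):
    """Parse an HTTP date; treat a date without a zone as UTC."""
    parsed = parsedate_to_datetime(value)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def status_for_get(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since If-Modified-Since, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = _http_date(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an invalid date makes the request unconditional
    return 304 if _http_date(last_modified) <= since else 200


lm = "Wed, 01 Sep 2004 13:24:52 GMT"
print(status_for_get(lm, "Wed, 01 Sep 2004 13:24:52 GMT"))  # 304
print(status_for_get(lm, "Tue, 31 Aug 2004 13:24:52 GMT"))  # 200
```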
The If-None-Match request header field is used with a method to make it conditional.
A client that has one or more entities previously obtained from the resource can verify that none of those entities is current by including a list of their associated entity tags in the If-None-Match header field.
The purpose of this feature is to allow efficient updates of cached information with a minimum amount of transaction overhead. It is also used to prevent a method (e.g. PUT) from inadvertently modifying an existing resource when the client believes that the resource does not exist.
Example of an If-None-Match header:
If-None-Match: "1301890421"
The ETag response header field provides the current value of the entity tag for the requested variant. The idea behind ETag is to provide a unique value for a resource’s contents.
Example of an ETag header:
ETag: "1301890421"
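A sketch in Python of how a server might generate entity tags and evaluate If-None-Match for a GET or HEAD request. Deriving the ETag from a SHA-1 hash of the body is an assumption, since servers may use any scheme. In the response example earlier in this chapter, "1301890421" is the Last-Modified time as a Unix timestamp. The function names are hypothetical, and the parser assumes tags contain no commas.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong entity tag derived from the body (one possible scheme)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison: ignore the W/ prefix of weak tags.
    return tag[2:] if tag.startswith("W/") else tag


def status_for_if_none_match(etag: str, if_none_match: Optional[str]) -> int:
    """Return 304 if any listed tag (or "*") matches the current ETag, else 200."""
    if if_none_match is None:
        return 200
    tags = [t.strip() for t in if_none_match.split(",") if t.strip()]
    if "*" in tags or any(_opaque(t) == _opaque(etag) for t in tags):
        return 304
    return 200


current = make_etag(b"<html>...</html>")
print(status_for_if_none_match(current, current))          # 304
print(status_for_if_none_match(current, '"some-old-tag"'))  # 200
```

A client holding several cached copies can list all their tags in one header. A match on any of them is enough for a 304.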
The Pragma request header is a legacy header and should no longer be used. Some applications still send headers like Pragma: no-cache, but only for backwards compatibility.
Any proxy cache should treat Pragma: no-cache as Cache-Control: no-cache. Pragma should not be seen as a reliable header, especially when used as a response header.
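This mapping is simple to express. The Python sketch below (hypothetical function name, lower-cased header names assumed) returns the directives a cache would act on for a request. It falls back to Pragma only when no Cache-Control header is present, which is how the later HTTP/1.1 caching specification (RFC 7234) resolves the two.

```python
def request_cache_directives(headers: dict) -> str:
    """Return the Cache-Control value a cache should act on for a request."""
    if "cache-control" in headers:
        # When Cache-Control is present, Pragma is ignored.
        return headers["cache-control"]
    pragma = [p.strip().lower() for p in headers.get("pragma", "").split(",")]
    # Legacy clients: treat Pragma: no-cache as Cache-Control: no-cache.
    return "no-cache" if "no-cache" in pragma else ""


print(request_cache_directives({"pragma": "no-cache"}))  # no-cache
```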
The Vary response header indicates the response returned by the origin server may vary depending on headers received in the request.
The most common use of Vary is Vary: Accept-Encoding, which tells caches (Varnish included) that the content might look different depending on the Accept-Encoding-header the client sends. In other words: The page can be delivered compressed or uncompressed depending on the client.
The Vary-header is one of the trickiest headers to deal with for a cache. A cache, like Varnish, does not necessarily understand the semantics of a header, or what part triggers different variants of a page.
As a result, using Vary: User-Agent for instance tells a cache that for ANY change in the User-Agent-header, the content might look different. Since there are probably thousands of User-Agent strings out there, this means you will drastically reduce the efficiency of any cache method.
Another example is Vary: Cookie, which is actually not a bad idea. Unfortunately, you cannot issue Vary: Cookie (but only THESE cookies: ...). And since a client will send you a great deal of cookies, this means that just using Vary: Cookie is not necessarily sufficient. We will discuss this further in the Content Composition chapter.
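To see why Vary: User-Agent and a plain Vary: Cookie hurt cache efficiency, think of the cache key as the URL plus the value of every request header named in Vary. This Python sketch models it that way. This is a simplification, not how Varnish stores variants internally. The function name and the User-Agent and cookie values are made up for illustration.

```python
def variant_key(url: str, vary: str, request_headers: dict) -> tuple:
    """Cache key: the URL plus every request header named in Vary (lower-cased names)."""
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    # A header the client did not send counts as an empty value.
    return (url,) + tuple((n, request_headers.get(n, "")) for n in names)


clients = [
    {"user-agent": "Mozilla/5.0 (Macintosh) Firefox/3.6.16", "accept-encoding": "gzip,deflate"},
    {"user-agent": "Mozilla/5.0 (Windows) Firefox/3.6.16", "accept-encoding": "gzip,deflate"},
    {"user-agent": "Mozilla/5.0 (X11; Linux) Firefox/3.6.16", "accept-encoding": "gzip,deflate"},
]
by_encoding = {variant_key("/", "Accept-Encoding", c) for c in clients}
by_agent = {variant_key("/", "User-Agent", c) for c in clients}
print(len(by_encoding), len(by_agent))  # 1 3

# Same session, different tracking cookie: Vary: Cookie still splits the cache.
a = {"cookie": "session=abc; tracker=1"}
b = {"cookie": "session=abc; tracker=2"}
print(variant_key("/", "Cookie", a) == variant_key("/", "Cookie", b))  # False
```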
From Varnish version 3, Varnish handles Accept-Encoding and Vary: Accept-Encoding for you. This is because Varnish 3 has support for gzip compression. In Varnish 2 it was necessary to normalize the Accept-Encoding-header, but this is redundant in Varnish 3.
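For Varnish 2, normalizing meant collapsing the many Accept-Encoding strings browsers send into a few canonical values, so Vary: Accept-Encoding produced only a handful of variants. A common approach, sketched here in Python rather than VCL (function name hypothetical), is to prefer gzip, then deflate, and otherwise treat the client as accepting no compression. The sketch honours q=0 but ignores the * wildcard.

```python
from typing import Optional


def normalize_accept_encoding(value: Optional[str]) -> Optional[str]:
    """Collapse an Accept-Encoding header to "gzip", "deflate" or None."""
    if not value:
        return None
    accepted = set()
    for item in value.split(","):
        coding, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        # q=0 means the client explicitly refuses this coding.
        if q > 0:
            accepted.add(coding.strip().lower())
    for preferred in ("gzip", "deflate"):
        if preferred in accepted:
            return preferred
    return None  # no supported compression: serve uncompressed


for header in ("gzip,deflate", "gzip, deflate, sdch", "deflate", "gzip;q=0, deflate"):
    print(repr(header), "->", normalize_accept_encoding(header))
```

The first two headers differ as strings but both normalize to gzip, so they share one cached variant.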
Consider what happens if you let Varnish cache content for a week, because you can easily invalidate the cache Varnish keeps. If you do not change the Age-header, Varnish will happily inform clients that the content is, for example, two days old, and that the maximum age should be no more than fifteen minutes.
Browsers will obey this. They will use the reply, but they will also realize that it has exceeded its max-age, so they will not cache it.
Varnish will do the same if your web server emits an Age-header (or if you put one Varnish server in front of another).
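The basic HTTP/1.1 freshness test shows the effect on downstream caches: a response is fresh only while its freshness lifetime exceeds its current age. In this Python sketch (hypothetical function name), the current age is approximated as the Age header plus the time the response has spent in the downstream cache. The full HTTP calculation also corrects for clock skew and network delay.

```python
def is_fresh(max_age: int, age_header: int, resident_time: int = 0) -> bool:
    """Simplified freshness check as seen by a downstream cache such as a browser."""
    current_age = age_header + resident_time
    return max_age > current_age


# Varnish has held the object for two days; the backend said max-age=900.
print(is_fresh(max_age=900, age_header=2 * 24 * 3600))  # False: stale on arrival
print(is_fresh(max_age=900, age_header=0))              # True
```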
We will see in later chapters how we can handle this in Varnish.
The table below lists the HTTP headers seen above and whether each is a request header or a response header.
There is a cache-hit when Varnish returns a page from its cache instead of forwarding the request to the origin server.
There is a cache-miss when Varnish has to forward the request to the origin server so the page can be serviced.
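A toy in-memory cache makes the difference between a hit and a miss concrete. This Python sketch is not how Varnish is implemented. The class name, the fixed TTL and the fetch callback standing in for the origin server are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TinyCache:
    """Toy cache that counts hits and misses (illustration only)."""

    def __init__(self, fetch: Callable[[str], bytes], ttl: float) -> None:
        self.fetch = fetch  # stands in for a request to the origin server
        self.ttl = ttl
        self.store: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = time.monotonic()
        entry = self.store.get(url)
        if entry is not None and now < entry[0]:
            self.hits += 1  # cache hit: answered from the cache
            return entry[1]
        self.misses += 1  # cache miss: forward the request to the origin
        body = self.fetch(url)
        self.store[url] = (now + self.ttl, body)
        return body


origin_requests = []


def origin(url: str) -> bytes:
    origin_requests.append(url)
    return b"page for " + url.encode()


cache = TinyCache(origin, ttl=120)
cache.get("/")
cache.get("/")
cache.get("/about")
print(cache.hits, cache.misses, origin_requests)  # 1 2 ['/', '/about']
```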
Also consider how you would avoid issues like this to begin with. We do not yet know how to modify Varnish's response headers, but hopefully you will understand why you may need to do that.
Varnish is not the only part of your web stack that parses and honors cache-related headers. The primary consumers of such headers are web browsers, and there might also be other caches along the way which you do not control, like a company-wide proxy server.
By using s-maxage instead of max-age we limit the number of consumers to cache servers, but even s-maxage will be used by caching proxies which you do not control.
In the next few chapters, you will learn how to modify the response headers Varnish sends. That way, your web-server can emit response headers that are only seen and used by Varnish.