HTTP caching basics
It’s important to understand HTTP caching because at some point an HTTP cache will protect your web platform from going down. Whether this HTTP cache is a reverse caching proxy like Varnish or a full-blown CDN, you need to understand the rules of the game.
Luckily there are conventions for this. There are even standardized headers that are part of HTTP’s specification that will allow you to control the behavior of a web cache.
What is HTTP caching?
HTTP has caching capabilities built into the protocol to ensure that clients or proxies can store the HTTP response in the cache for a certain amount of time. By caching the response, clients don’t have to connect to the web server every time they want to access that content.
HTTP caching reduces network traffic and server load, which results in lower response times.
Browser cache versus caching proxies
Historically HTTP responses were cached by the web browser to reduce network traffic. In the early days of the web, bandwidth was limited. Being able to cache HTTP responses in the browser avoided expensive HTTP roundtrips.
Unfortunately browser cache is not reliable: users can flush the cache at any time, and they can even disable the cache. Another disadvantage is the fact that the cache is hosted locally, which means there is a cache per user.
By installing proxy servers closer to the user, either in the office or at the internet provider’s data center, clients can retrieve centrally cached copies of the requested content from proxies that act on behalf of the origin web server.
As broadband internet became more common, local caching proxy servers were no longer crucial. Instead the increase of bandwidth shifted the pressure from the client to the server: traffic spikes started jeopardizing the stability of servers. As a consequence, caching proxies also shifted.
Nowadays reverse caching proxies are put in front of the origin web platform to protect it against traffic spikes and prevent the platform from caving in under pressure.
HTTP’s caching policies allow HTTP responses to be cached by both clients and proxies using the same syntax. However, there are also specific instructions that only apply to proxies.
HTTP caching concepts
Not all HTTP responses can or should be cached: if the content is private, it should not be stored in the cache. If the type of request (for example an HTTP POST request) implies a change of the resource, it shouldn’t be cached either. If the returned response uses a Set-Cookie header to change state, the response shouldn’t be cached.
On the one hand you can decide whether or not to store a response in the cache. On the other hand, you can decide whether or not to serve a cached response from the cache.
These rules can be specifically enforced in the implementation or configuration of the cache. However, the HTTP protocol allows you to control the cacheability in the form of specific header syntax.
The Cache-Control: no-cache, no-store header is the perfect example of enforcing cacheability. The no-cache part instructs the cache not to serve a cached response for this resource without first revalidating it with the origin web server. The no-store part prevents the HTTP response from being stored in the cache.
Public versus private content
The scope of cacheable content is either public or private.
- Publicly cacheable content can be cached by both the requesting client as well as reverse caching proxies.
- Privately cacheable content can only be cached by the requesting client and not by reverse caching proxies.
The Cache-Control: public and Cache-Control: private response headers are used to set the caching scope.
Cached objects are only valid for a limited amount of time. The time to live of a cached object can be defined in the implementation or configuration of the cache. But as expected, the HTTP protocol has ways to enforce the time to live through specific cache header syntax.
The Cache-Control: max-age=3600 header is an example of setting the lifetime of a cached object to an hour. The HTTP protocol has more directives to set the time to live and these will be covered in the HTTP caching headers section of the tutorial.
As long as the time to live of a cached object has not expired, the content is considered fresh. This means it can be served from the cache to requesting clients.
The remaining lifetime of an object is a value that changes every second. Once it hits zero, the content is no longer fresh. Instead it is considered stale and in need of revalidation.
Cache revalidation is the process of connecting back to the origin web server and fetching potentially updated content. As soon as the revalidation is finished, the object is considered fresh again for as long as the time to live allows.
Revalidation can also be done conditionally. This means that the origin web server will only send the payload if the requested resource has changed. If the resource hasn’t changed, a 304 Not Modified status code will be returned without a response body.
This reduces the amount of data sent over the wire, and it can also result in a lower server resource consumption at the origin level.
When the cache receives a 304 Not Modified response, the time to live that is defined by the Cache-Control or Expires header will be used to set the lifetime of the object after revalidation.
HTTP response headers like Etag and Last-Modified allow web servers to identify a specific version of a resource or the moment it last changed. These values can be presented by the client or a reverse caching proxy in the form of If-None-Match and If-Modified-Since request headers to compare versions. If these versions differ, a regular 200 OK response will be sent, otherwise the 304 Not Modified status is returned.
Etag, If-None-Match, Last-Modified and If-Modified-Since are covered in the HTTP caching headers section of this tutorial.
Identifying cached objects
Objects in the cache are generally identified by their URI and Host header values. These values are part of the HTTP request, as illustrated below:
GET /about HTTP/1.1
Host: example.com
This example is a request for the /about URI on the example.com host. Both /about and example.com are used to create a hash that identifies the object in the cache.
Sometimes, an HTTP resource can have multiple versions that depend on values coming from request headers. One example of this is a multilingual website that uses the Accept-Language request header to present the resource in multiple languages.
If a resource has multiple versions, knowing that it is identified by its URI, the cached output can be inconsistent. Only using the URI and Host header will not suffice and that’s where cache variations come into play.
A cache variation will extend the hash that is used to identify an object in the cache by adding the value of a request header. Per version of the resource, a variation is added to the cache.
The origin web server can issue a Vary header to tell the cache what request header it should use to base its variations on. In the case of the multilingual website, Vary: Accept-Language is the logical choice.
The goal is to store enough cache variations to cover the available versions of a resource. But if a resource has too many variations, caching all variations will have a detrimental effect on the hit rate and will fill up the cache.
Make sure you have your variations under control, otherwise you’re better off not caching the response at all.
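To make the idea of a cache variation more tangible, here’s a minimal Python sketch of how a lookup key could be derived. The cache_key function and its parameters are hypothetical, and real caches may organize their variations differently. The sketch only shows that the Host and URI form the base of the key and that the request headers named in Vary extend it.

```python
import hashlib


def cache_key(host: str, uri: str, request_headers: dict, vary: str = "") -> str:
    """Hash the Host and URI, extended with the request headers named in Vary."""
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    parts = [host.lower(), uri]
    # Each header listed in Vary adds its request value to the key.
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        parts.append(f"{name}={lowered.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


english = cache_key("example.com", "/about", {"Accept-Language": "en"}, "Accept-Language")
dutch = cache_key("example.com", "/about", {"Accept-Language": "nl"}, "Accept-Language")
```

With Vary: Accept-Language the two requests end up with different keys. Without a Vary header they would share a single cached object.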
HTTP caching headers
Here’s an overview of HTTP caching headers that you can leverage to control the cache from your origin web platform.
The Cache-Control header is probably the most common HTTP caching header. Its syntax is quite extensive and has directives to control the following aspects of HTTP caching:
- Time To Live
- Scope (public or private)
Public and private
The public keyword is used to announce that the resource can be cached by both web clients and caching proxies. If private is used instead, a caching proxy will not store the object in cache whereas a web client will.
Here’s an example where a public resource is announced through the Cache-Control header:
Cache-Control: public
Here’s the equivalent for private content:
Cache-Control: private
Max-age and s-maxage
The max-age directive is used to set the lifetime of an object in the cache. Here’s an example:
Cache-Control: public, max-age=3600
This header instructs the cache to store the object for 3600 seconds, which corresponds to an hour.
The s-maxage directive does the same as max-age, but it is intended for caching proxies rather than for web clients.
Here’s an example where a caching proxy is instructed to cache the object for a day:
Cache-Control: public, s-maxage=86400
It is also possible to combine these directives:
Cache-Control: public, max-age=3600, s-maxage=86400
This will result in the web client caching the resource for an hour and the caching proxy applying a time to live of a day.
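The way a cache picks between these two directives can be sketched in a few lines of Python. The ttl_seconds function and its shared_cache flag are hypothetical names used for illustration. The sketch ignores every other directive and assumes well-formed numeric values.

```python
import re
from typing import Optional


def ttl_seconds(cache_control: str, shared_cache: bool) -> Optional[int]:
    """Pick the time to live, in seconds, that applies to this kind of cache."""
    found = dict(re.findall(r"\b(s-maxage|max-age)\s*=\s*(\d+)", cache_control.lower()))
    # A caching proxy prefers s-maxage and falls back to max-age.
    if shared_cache and "s-maxage" in found:
        return int(found["s-maxage"])
    # A web client only looks at max-age.
    if "max-age" in found:
        return int(found["max-age"])
    return None  # no explicit lifetime was announced


header = "public, max-age=3600, s-maxage=86400"
client_ttl = ttl_seconds(header, shared_cache=False)
proxy_ttl = ttl_seconds(header, shared_cache=True)
```

For the combined header, the client ends up with a lifetime of an hour and the proxy with a lifetime of a day.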
The stale-while-revalidate directive sets the allowed staleness of a cached object, allowing content that has passed its expiration time to be served from the cache.
The stale-while-revalidate value sets the number of seconds past the expiration time that stale content can be served while the cache is revalidating the content.
Here’s an example:
Cache-Control: public, max-age=900, stale-while-revalidate=100
In this example the cached object is considered fresh for 900 seconds. After that revalidation needs to take place. But because of the stale-while-revalidate=100 directive, the object can be served from the cache for another 100 seconds while the cache is asynchronously revalidating with the origin web server.
When staleness is allowed, the end user will not be impacted by potentially slow backends during the revalidation process. Thanks to stale-while-revalidate, the cache can be instructed to serve stale data while a new version of the resource is being fetched.
But what happens when the origin web server is down?
As long as the stale-while-revalidate value is high enough, stale content will be served and the failed revalidation will go unnoticed as far as the user is concerned.
However, by setting a very high stale-while-revalidate value, some business rules may be violated in situations where the origin web server is healthy.
The stale-if-error directive sets the allowed staleness when the origin is down. Here’s an example:
Cache-Control: public, max-age=900, stale-while-revalidate=100, stale-if-error=86400
In this example an object is stored in the cache for 900 seconds and if the origin is healthy, this object may be served up to 100 seconds past the expiration of the object.
If the backend is down, the staleness can be drastically increased. In this case, a stale object may be served a full day past its expiration time.
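The interplay between freshness and the two staleness windows can be expressed as a small decision function. This is a simplified sketch: the serve_decision name, its parameters and the returned labels are hypothetical, and a real cache takes more factors into account.

```python
def serve_decision(age: int, max_age: int, stale_while_revalidate: int = 0,
                   stale_if_error: int = 0, origin_healthy: bool = True) -> str:
    """Decide how a cache could answer a request for an object of the given age."""
    if age < max_age:
        return "fresh"
    staleness = age - max_age  # seconds past the expiration time
    if staleness < stale_while_revalidate:
        # Serve the stale object and revalidate in the background.
        return "stale-while-revalidate"
    if not origin_healthy and staleness < stale_if_error:
        # The origin is down, so the longer error window applies.
        return "stale-if-error"
    return "fetch"  # staleness is no longer allowed, the origin must answer


# Values taken from the example header: max-age=900,
# stale-while-revalidate=100, stale-if-error=86400.
healthy = serve_decision(1100, 900, 100, 86400, origin_healthy=True)
origin_down = serve_decision(1100, 900, 100, 86400, origin_healthy=False)
```

An object that expired 200 seconds ago triggers a fetch when the origin is healthy, but is still served from the cache when the origin is down.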
Must-revalidate and proxy-revalidate
When staleness is not allowed, the must-revalidate keyword is used to enforce this. Here’s an example:
Cache-Control: public, max-age=3600, must-revalidate
In this case the cached object is fresh for an hour, but as soon as it expires synchronous revalidation is mandatory.
If the web client is allowed to serve stale objects from the cache, but intermediary caches aren’t, you can use the proxy-revalidate keyword to enforce this.
Cache-Control: public, max-age=3600, stale-while-revalidate=100, proxy-revalidate
In this case the object is cached for an hour. After that an extra 100 seconds of staleness is allowed while revalidation takes place. Because of proxy-revalidate, staleness is only allowed by the web client. Caching proxies are not allowed to serve stale content.
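As a rough illustration, the check for whether staleness is allowed at all could look like the following Python sketch. The stale_allowed function and its shared_cache flag are hypothetical and only look at the two revalidation keywords.

```python
def stale_allowed(cache_control: str, shared_cache: bool) -> bool:
    """Return True if this kind of cache may serve the object once it is stale."""
    # Keep only the directive names, dropping any "=value" part.
    directives = {d.strip().lower().split("=")[0] for d in cache_control.split(",")}
    if "must-revalidate" in directives:
        return False  # applies to web clients and caching proxies alike
    if shared_cache and "proxy-revalidate" in directives:
        return False  # only caching proxies are restricted
    return True


header = "public, max-age=3600, stale-while-revalidate=100, proxy-revalidate"
client_may_serve_stale = stale_allowed(header, shared_cache=False)
proxy_may_serve_stale = stale_allowed(header, shared_cache=True)
```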
No-cache and no-store
If the web server returns an uncacheable response, the Cache-Control: no-cache, no-store syntax can be used to instruct the cache not to cache this response.
The no-cache directive forces the cache not to serve the cached resource to the requesting client without first revalidating the content with the origin web server.
The no-store directive instructs the cache not to store this resource in the cache.
Both directives can be used separately, but they can also be combined.
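Here’s a minimal Python sketch of how a cache could read these directives. The function names are hypothetical and the parser is deliberately naive: it splits on commas and does not handle quoted values that contain commas.

```python
from __future__ import annotations


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control value into a {directive: argument} mapping."""
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, separator, argument = part.partition("=")
        # Directives without an argument, like no-store, map to None.
        directives[name.strip().lower()] = argument.strip().strip('"') if separator else None
    return directives


def may_store(directives: dict) -> bool:
    """no-store forbids keeping a copy of the response."""
    return "no-store" not in directives


def may_serve_without_revalidation(directives: dict) -> bool:
    """no-cache forbids serving a stored copy without asking the origin first."""
    return "no-cache" not in directives and "no-store" not in directives


parsed = parse_cache_control("no-cache, no-store")
```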
The Expires header can also be used to set the time to live of an object. The Expires header doesn’t use relative numbers like the Cache-Control header does. Instead it sets the date and time of expiration.
Here’s an example:
Expires: Sat, 04 May 2024 08:00:00 GMT
This cacheable resource is considered fresh until Saturday May 4th 2024 at 8 o’clock GMT.
By setting a date and time in the past, the Expires header can instruct the cache not to store the object.
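Because Expires carries an absolute date, a cache has to convert it into a remaining lifetime. The Python sketch below shows one way to do that with the standard library. The ttl_from_expires function is a hypothetical helper, and treating an unparseable value as already expired is an assumption of this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def ttl_from_expires(expires: str, now: datetime) -> float:
    """Seconds until the Expires date, or 0 when it is in the past or invalid."""
    try:
        expires_at = parsedate_to_datetime(expires)
        remaining = (expires_at - now).total_seconds()
    except (TypeError, ValueError):
        # Values that cannot be parsed or compared are treated as expired.
        return 0.0
    return max(0.0, remaining)


# One hour before the expiration time used in the example header.
now = datetime(2024, 5, 4, 7, 0, 0, tzinfo=timezone.utc)
ttl = ttl_from_expires("Sat, 04 May 2024 08:00:00 GMT", now)
```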
The Vary header is used to create cache variations. As explained earlier, cache variations are used to store multiple versions of a cached object, based on a request header.
Here’s an example:
Vary: Accept-Language
This example will create a cache variation for this resource based on the value of the Accept-Language request header. This will allow multilingual websites that use the same URL structure for multiple languages to be properly cached.
Here’s another example:
Vary: X-Forwarded-Proto
This example will create a cache variation based on the value of the X-Forwarded-Proto request header. This header is not sent by the client, but by a TLS proxy that terminates the TLS connection. Possible values are http and https.
This cache variation ensures that there’s an HTTP and an HTTPS version of each page to avoid mixed content.
Etag and If-None-Match
The entity tag that is returned through the Etag response header is used to identify a specific version of the resource.
The Etag value could be any value, but it must be unique to the version it represents. Consider it a fingerprint of the content.
When a web server returns an Etag header, the value can be presented by the client upon subsequent requests in the form of an If-None-Match request header.
The If-None-Match header represents the version of the resource the client currently has. This value can be compared to the Etag that is returned by the web server. If the values are identical, the content hasn’t changed and a 304 Not Modified status code can be returned without attaching a body to the HTTP response.
If the values of If-None-Match and Etag differ, the content has changed and a regular 200 OK response is returned that includes a body.
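On the origin side, this comparison could look like the Python sketch below. Hashing the body is just one possible way to produce a fingerprint. The etag_for and respond functions are hypothetical, and the sketch only handles a single If-None-Match value rather than a list of values.

```python
import hashlib


def etag_for(body: bytes) -> str:
    """Derive a quoted Etag value from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, body) while honoring If-None-Match."""
    etag = etag_for(body)
    if request_headers.get("If-None-Match") == etag:
        # The client already has this version: no body is attached.
        return 304, {"Etag": etag}, b""
    return 200, {"Etag": etag}, body


first = respond(b"Hello", {})
repeat = respond(b"Hello", {"If-None-Match": first[1]["Etag"]})
changed = respond(b"Hello again", {"If-None-Match": first[1]["Etag"]})
```

The first request receives the full response, the repeated request receives a bodyless 304, and the request for the changed content receives a full 200 response with a new Etag.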
Last-Modified and If-Modified-Since
The Last-Modified response header also identifies a specific version of a resource. Unlike the Etag header, it uses a last modified date to identify that resource.
Here’s an example:
Last-Modified: Mon, 08 Nov 2021 18:28:00 GMT
This value represents the last time the resource was modified. The value of that response header can be presented to the web server upon subsequent requests in the form of an If-Modified-Since request header.
Here’s an example:
If-Modified-Since: Sun, 07 Nov 2021 13:18:21 GMT
The value of the If-Modified-Since header is older than the one presented by the Last-Modified header. This means the content has changed and a 200 OK response should be returned.
If the values of If-Modified-Since and Last-Modified were identical, the client has the most recent version of the resource and a bodyless 304 Not Modified response could be returned.
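The date comparison can be sketched in Python with the standard library’s HTTP date parser. The status_for function is a hypothetical helper that assumes both values are valid HTTP dates.

```python
from email.utils import parsedate_to_datetime


def status_for(last_modified: str, if_modified_since: str) -> int:
    """Return 200 if the resource changed after the client's copy, else 304."""
    modified_at = parsedate_to_datetime(last_modified)
    client_copy_at = parsedate_to_datetime(if_modified_since)
    return 200 if modified_at > client_copy_at else 304


# The resource changed on Monday, the client's copy dates from Sunday.
status = status_for("Mon, 08 Nov 2021 18:28:00 GMT", "Sun, 07 Nov 2021 13:18:21 GMT")
```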
The Age header is used to inform the client how long the object has been stored in cache.
Here’s an example:
Age: 100
This means the object has been stored in the cache for 100 seconds.
Imagine the following example:
Cache-Control: max-age=300
Age: 100
We know that the max-age=300 directive sets the Time To Live of the cached object to 300 seconds. The fact that the Age header is set to 100 seconds means that the cached object has a remaining lifetime of 200 seconds.
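The arithmetic behind the remaining lifetime is simple enough to capture in two small Python functions. The names are hypothetical and both values are expected in seconds.

```python
def remaining_lifetime(max_age: int, age: int) -> int:
    """Seconds of freshness left, never lower than zero."""
    return max(0, max_age - age)


def is_fresh(max_age: int, age: int) -> bool:
    """An object is fresh as long as its age is below its time to live."""
    return age < max_age


remaining = remaining_lifetime(max_age=300, age=100)
```

Once the age reaches 300 seconds, the remaining lifetime is zero and the object is considered stale.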
HTTP caching flow
When a reverse caching proxy server like Varnish is used to accelerate your origin server, there is a specific flow depending on the scenario.
We’d like to present four scenarios:
- The cache miss flow
- The cache hit flow
- The cache revalidation flow
- The conditional revalidation flow
Cache miss flow
When a client requests content from an empty cache, a cache miss occurs and the cache has to fetch the content from the origin. The following diagram illustrates this process:
Although we try to keep origin fetches to a minimum, a cache miss is not necessarily a bad thing. A cache miss is simply a hit that hasn’t happened yet.
When the origin web server responds, the caching proxy will store the response in the cache with the lifetime that was specified by the Cache-Control or Expires header and will serve the cached object to clients requesting it.
Cache hit flow
Once the object is stored in cache, subsequent requests will result in a cache hit, as illustrated in the diagram below:
As you can see, no connection to the web server is needed. This is by design and takes away the pressure from that origin web server while the caching proxy is serving the cached version of those origin responses.
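The miss and hit flows can be mimicked with a tiny in-memory cache written in Python. This is a teaching sketch, not how Varnish is implemented: the TinyCache class, the origin callable that returns a body and a time to live, and the injectable clock are all assumptions made for the sake of the example.

```python
import time
from typing import Callable, Dict, Tuple


class TinyCache:
    """Stores origin responses in memory until their time to live runs out."""

    def __init__(self, fetch_from_origin: Callable[[str], Tuple[bytes, int]],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}

    def get(self, key: str) -> Tuple[str, bytes]:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[1]:
            return "HIT", entry[0]  # served without contacting the origin
        body, ttl = self._fetch(key)  # cache miss: one origin fetch
        self._store[key] = (body, now + ttl)
        return "MISS", body


origin_calls = []


def origin(key: str) -> Tuple[bytes, int]:
    origin_calls.append(key)
    return b"about page", 3600  # body and a time to live of one hour


fake_now = [0.0]
cache = TinyCache(origin, clock=lambda: fake_now[0])
first = cache.get("example.com/about")
second = cache.get("example.com/about")
```

The first request is a miss that reaches the origin, the second one is a hit that is answered entirely from the cache.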
Cache revalidation flow
At some point the cached object will expire and the content will need to be revalidated with the origin web server.
This involves an origin fetch, just like a cache miss. But unlike the cache miss scenario, the cache can choose to serve the stale content while asynchronously revalidating with the origin.
The diagram below clarifies the revalidation flow:
If you pay close attention, you’ll see that the order of execution is different: the client response can be returned before the origin revalidation response is received.
The following Cache-Control header will enable asynchronous revalidation thanks to its stale-while-revalidate directive:
Cache-Control: public, s-maxage=3600, stale-while-revalidate=200
If we want to serve stale content when the origin web server is down, we could use the following Cache-Control header and leverage the stale-if-error directive:
Cache-Control: public, s-maxage=3600, stale-if-error=86400
Conditional revalidation flow
Revalidation can also be done conditionally. This means that the caching proxy will identify the version of the object through specific request headers, such as If-None-Match and If-Modified-Since.
The values of these headers come from the Etag and Last-Modified response headers that are part of the cached object.
If the latest version matches the version that is advertised by the proxy, the origin will acknowledge this and not send the full payload. A 304 Not Modified response is returned, the stale content is then considered fresh again and revalidation is paused until the content expires again.
A version matches if the Etag and If-None-Match values are identical or if the Last-Modified and If-Modified-Since values are identical:
If the versions differ, the full response is sent by the origin and the content is considered fresh again:
When the client itself sends a conditional request, the caching proxy can likewise return a 304 Not Modified response without the body, instead of returning the payload of the cached object.
Conditional revalidation allows backends to consume less bandwidth by only adding payload to the HTTP response if the content has changed. If the origin is optimized for conditional revalidation, CPU, memory and disk I/O consumption can also be reduced.
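Seen from the caching proxy, conditional revalidation boils down to two steps: building the conditional request headers from the cached object, and processing the origin’s answer. The Python sketch below illustrates both. The CachedObject class and the two functions are hypothetical names, and the network call to the origin is left out on purpose.

```python
from dataclasses import dataclass, field


@dataclass
class CachedObject:
    body: bytes
    headers: dict = field(default_factory=dict)


def conditional_headers(obj: CachedObject) -> dict:
    """Build the request headers for a conditional revalidation."""
    request_headers = {}
    if "Etag" in obj.headers:
        request_headers["If-None-Match"] = obj.headers["Etag"]
    if "Last-Modified" in obj.headers:
        request_headers["If-Modified-Since"] = obj.headers["Last-Modified"]
    return request_headers


def apply_revalidation(obj: CachedObject, status: int, headers: dict, body: bytes) -> CachedObject:
    """Turn the origin's answer into the object that is fresh again."""
    if status == 304:
        # Same version: keep the stored body, refresh the metadata.
        return CachedObject(obj.body, {**obj.headers, **headers})
    # New version: replace the stored object with the full response.
    return CachedObject(body, dict(headers))


stale = CachedObject(b"v1", {"Etag": '"abc"', "Last-Modified": "Mon, 08 Nov 2021 18:28:00 GMT"})
request_headers = conditional_headers(stale)
unchanged = apply_revalidation(stale, 304, {"Cache-Control": "max-age=300"}, b"")
replaced = apply_revalidation(stale, 200, {"Etag": '"def"'}, b"v2")
```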