Basic HTTP caching

Installing and configuring Varnish does not by itself solve all your caching problems, so it is worth understanding how HTTP caching works. This section gives a short introduction to web caching in general.

Every web service has its own requirements as to what to cache and what not to cache. Further down, you will find a checklist for Varnish caching.

As you get to know Varnish better, you will see that as a web caching system it can serve as a proxy cache as well as a reverse proxy cache.

Two main reasons why you might consider caching:

  • To reduce network traffic load on servers
  • To reduce response time to users (making your service look fast)

Caching terminology

Caching manager: The integration resource that manages the caching components.

Origin server: Where the original content lives. When the cache does not have the requested content, it asks the origin server for it and passes the response on to the client.

Cache hit ratio: This is a measurement of the ratio of requests that were met through the cache against the total number of requests.

A high cache hit ratio means that most of the content was served from the cache. A cache miss means that the content could not be served from the cache and had to be fetched from the origin server.
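
As a small illustration of the definition above, the ratio can be computed from hit and miss counters. The function name and the sample numbers are made up for this sketch; they are not taken from Varnish.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return the fraction of requests that were served from the cache."""
    total = hits + misses
    if total == 0:
        # No requests yet, so there is nothing to measure.
        return 0.0
    return hits / total


# Hypothetical counters: 900 requests served from cache, 100 from the origin.
print(cache_hit_ratio(900, 100))  # 0.9
```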

Freshness of data: Whether an object in the cache is still recent enough to be served to a client without checking with the origin server.

Stale content: Content that is judged to be too old to serve as it is. In general, every object in the cache has a time to live (TTL) value; once that expires, the object is stale.

Validation: Checking with the origin server whether a stale object in the cache still matches the current content. If it does, the object is treated as fresh again and its TTL is renewed.

Invalidation: Removing objects from the cache so that they are no longer served, typically because they are stale or the content has changed at the origin server.
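
The terms above can be tied together in a minimal in-memory sketch. It assumes a simple model in which each cached object carries a TTL and an ETag-style validator. The class and function names are hypothetical and do not mirror how Varnish stores objects internally.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CachedObject:
    body: str
    etag: str         # validator used to compare with the origin's copy
    stored_at: float  # time the object was stored or last validated, in seconds
    ttl: float        # time to live, in seconds

    def is_fresh(self, now: float) -> bool:
        """An object is fresh while its age is below its TTL."""
        return (now - self.stored_at) < self.ttl


def validate(obj: CachedObject, origin_etag: str, now: float) -> bool:
    """Compare a stale object with the origin's current validator.

    If they match, the object is reused and its TTL clock restarts.
    """
    if obj.etag == origin_etag:
        obj.stored_at = now
        return True
    return False


def invalidate(cache: Dict[str, CachedObject], key: str) -> None:
    """Remove an object so it can no longer be served from the cache."""
    cache.pop(key, None)


cache = {"/logo.png": CachedObject("image-bytes", etag='"v1"', stored_at=0.0, ttl=60.0)}
obj = cache["/logo.png"]
print(obj.is_fresh(now=30.0))   # True: 30 s old, TTL is 60 s
print(obj.is_fresh(now=90.0))   # False: the object is now stale
print(validate(obj, origin_etag='"v1"', now=90.0))  # True: TTL renewed
print(obj.is_fresh(now=100.0))  # True again after validation
invalidate(cache, "/logo.png")
print("/logo.png" in cache)     # False
```

Passing `now` explicitly instead of reading the system clock keeps the example deterministic.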

What is web caching?

Caching on the web is based on the HTTP protocol. The goal of web caching is to provide a balance between web traffic and web performance by caching relevant data based on client demands.

Common caching steps

  1. A request is sent with a header to the server.
  2. The response’s header tells whether to cache or not.
  3. If the request is authenticated or secure, it won’t be cached by shared caches.
  4. An object is considered fresh if:
    • it has a TTL value that is still valid OR
    • it changes infrequently, so the cache estimates that it is still fresh (for example, from its Last-Modified time)
  5. If the object is stale, the origin server is asked to validate it, that is, to tell the cache whether its copy is still good.
  6. When the network is down, the cache can serve stale responses without checking
    with the origin server.
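
Steps 2 and 3 can be sketched as a small decision function for a shared cache. This is a simplified illustration that looks only at the Cache-Control response header and the Authorization request header. The function names are hypothetical, headers are assumed to be plain dictionaries with conventionally capitalized keys, and a real cache applies more rules than this (status codes, request methods, heuristic freshness and so on).

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into a directive -> argument map."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(request_headers: Dict[str, str],
                           response_headers: Dict[str, str]) -> bool:
    """Simplified check: may a shared cache store this response?"""
    cc = parse_cache_control(response_headers.get("Cache-Control", ""))
    # The response header says not to store it, or not in shared caches.
    if "no-store" in cc or "private" in cc:
        return False
    # Authenticated requests are only stored when the response opts in.
    if "Authorization" in request_headers:
        return any(d in cc for d in ("public", "must-revalidate", "s-maxage"))
    return True


print(shared_cache_may_store({}, {"Cache-Control": "max-age=60"}))           # True
print(shared_cache_may_store({}, {"Cache-Control": "private, max-age=60"}))  # False
print(shared_cache_may_store({"Authorization": "Basic abc"},
                             {"Cache-Control": "max-age=60"}))               # False
```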

Pros of caching

Caching is found at almost every level of a packet’s journey from one host to another. The best-known advantage of caching is high performance through the use of proxy servers.

  • Analyzing web access patterns helps you match caching to client demand
  • Caching offers significantly higher speed on the web
  • Reduces server overhead and bandwidth use
  • Fast access to cached (most visited) resources through faster reloads and delivery
  • Proxy servers can help with protocol translation (for different browsers and devices)

Cons of caching

Caching can lead to adverse consequences if not properly considered.

  • Mixing up users and displaying the wrong user’s information (very risky! e.g. subscription emails displaying all client emails)
  • Delivering stale objects (e.g. old news even when there is an update in the database)
  • Slow performance if object is not in cache
  • Cache logs can contain user-specific information (a risk to user privacy)

Therefore when making cache decisions please take all of your company policies into consideration.

Defining cache-control policy

To define an optimal cache-control policy, there are some things to consider.

Is the response from the server reusable?

If it is not reusable, do not store it (no-store). If it is reusable, does it need to be revalidated every time?

If it must be revalidated every time, it may be stored but has to be validated with the origin server before each reuse (no-cache). Either way, can it be cached by intermediate caches?

If it can be cached by intermediate caches, it is public information; if not, it is private user data.

Decide which private and public information should be cached. The next things to decide include:

  • maximum cache lifetime (max-age)
  • whether to add an ETag header
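
The decisions above map roughly onto Cache-Control directives. The sketch below shows one way to turn them into a header value. The function names and parameters are hypothetical, and it relies on the standard meaning of the directives: `no-store` forbids storing, `no-cache` allows storing but requires validation before each reuse, and `public` or `private` say whether shared caches may keep the response. The ETag helper hashes the body, which is only one of several possible ways to produce a validator.

```python
import hashlib
from typing import Optional


def cache_control_for(reusable: bool,
                      revalidate_each_time: bool,
                      shared_ok: bool,
                      max_age: Optional[int] = None) -> str:
    """Build a Cache-Control value from the policy decisions."""
    if not reusable:
        return "no-store"
    directives = []
    if revalidate_each_time:
        directives.append("no-cache")
    directives.append("public" if shared_ok else "private")
    if max_age is not None:
        directives.append(f"max-age={max_age}")
    return ", ".join(directives)


def make_etag(body: bytes) -> str:
    """Derive a quoted validator from the response body (assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


print(cache_control_for(False, False, False))        # no-store
print(cache_control_for(True, False, True, 86400))   # public, max-age=86400
print(cache_control_for(True, True, False))          # no-cache, private
```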

Your caching checklist

What to cache and what not to cache?

Cache-friendly content

Content that does not change frequently.

  • Style sheets (CSS) and static HTML theme templates
  • JavaScript files (including scripts used for AJAX)
  • Media files and downloaded content
  • Specific branding, logos, images that don’t change

Cache-unfriendly content

Content that should never be cached!

  • Any kind of personal information, such as login and authentication data
  • Any kind of sensitive data
  • Any user-specific content

Cache or not

Content that needs analysis before deciding on caching

  • Cookies
  • Frequently changing assets, such as images, JavaScript and CSS
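
As a rough illustration, the checklist can be expressed as a first-pass classifier. The extension list, the function name and the three return values are assumptions made for this sketch. It cannot tell whether an asset changes frequently, so treat the result as a starting point for analysis rather than a caching policy.

```python
import os
from typing import Dict

# Assumed set of file extensions for content that rarely changes.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".svg", ".woff2"}


def caching_hint(path: str, request_headers: Dict[str, str]) -> str:
    """Return "never", "cache" or "analyze" for a request."""
    # Authenticated requests are treated as user-specific content.
    if "Authorization" in request_headers:
        return "never"
    extension = os.path.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "cache"
    # Everything else, including requests carrying cookies, needs a closer look.
    return "analyze"


print(caching_hint("/static/site.css", {}))                       # cache
print(caching_hint("/account", {"Authorization": "Basic abc"}))   # never
print(caching_hint("/news", {"Cookie": "session=abc"}))           # analyze
```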