Varnish Clustering vs Varnish High Availability
In this tutorial we’ll compare Varnish Clustering with VHA (Varnish High Availability) across five dimensions:
- Pull or Push
- Resiliency
- Latency
- Setup
- Scalability
By default, in a Varnish setup with independent caches, the origin server may receive one request per cache node even for the same cacheable object.
Our cluster.vcl file solves this problem through prescriptive sharding, ensuring only one origin request per object cluster-wide.
VHA, in contrast, achieves consistency via preemptive cache replication across nodes.
The details follow below, but in nearly all production cases cluster.vcl is the preferred solution.
1. Pull or Push
The first dimension we’ll use to compare clustering versus VHA is whether the solution uses a Pull-based model or a Push-based model.
Clustering (Pull-based)
The clustering solution that Varnish Enterprise offers operates using a Pull-based model. This means that clustering happens naturally: as requests reach a Varnish node, cluster.vcl works out where to route the request and which Varnish node is considered the primary node for this piece of content.
Our clustering solution guarantees cluster-wide request coalescing: only one request goes to the origin for each object and the primary Varnish node for that object is the only node that accesses the origin.
The pull-based implementation of our clustering solution also reduces inter-node traffic and cache bloat caused by rarely accessed objects.
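The idea of one primary node per object can be illustrated with a short sketch. The Python below is not how cluster.vcl is implemented. It assumes rendezvous (highest-random-weight) hashing as a stand-in for the sharding scheme, and the node names and function names are hypothetical.

```python
import hashlib


def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def primary_for(key: str, nodes: list[str]) -> str:
    """Every node computes the same winner, so no coordination is needed."""
    return max(nodes, key=lambda n: score(n, key))


def handle_miss(local: str, key: str, nodes: list[str]) -> str:
    """Return where a cache miss on `local` is fetched from.

    Only the primary node goes to the origin; every other node
    pulls the object from the primary instead.
    """
    primary = primary_for(key, nodes)
    return "origin" if primary == local else primary


if __name__ == "__main__":
    cluster = ["varnish-a", "varnish-b", "varnish-c"]  # hypothetical nodes
    for node in cluster:
        print(node, "->", handle_miss(node, "/video/segment-1.ts", cluster))
```

Because every node derives the same primary from the same inputs, exactly one node in the cluster is allowed to contact the origin for a given object, which is the property the text describes as cluster-wide request coalescing.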
Varnish High Availability (Push-based)
VHA (Varnish High Availability) uses a Push-based model where the replication of objects is explicitly triggered when a cache miss occurs on a Varnish node.
In VHA, every cacheable request is broadcast to all nodes, even if the object is never requested again, which makes it less efficient.
VHA also does not guarantee a single origin request per object, making it unsuitable for live HTTP-based video streaming or low-TTL use cases.
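The difference in inter-node traffic can be sketched with a toy model. The Python below is a simplification under stated assumptions: every object is cacheable, TTLs and evictions are ignored, the primary node per object is given as a plain dictionary, and all names are hypothetical. It counts node-to-node object transfers only, not origin fetches.

```python
def pull_transfers(trace, primary_of):
    """Pull model: a non-primary node fetches from the primary on its first miss."""
    cached = set()  # (node, key) pairs already held in cache
    transfers = 0
    for node, key in trace:
        if (node, key) in cached:
            continue  # cache hit, nothing moves between nodes
        cached.add((node, key))
        primary = primary_of[key]
        if node != primary:
            transfers += 1  # one primary -> node transfer
            cached.add((primary, key))  # the primary now holds the object too
    return transfers


def push_transfers(trace, nodes):
    """Push model: the first miss on a key replicates it to every other node."""
    replicated = set()  # keys already broadcast to the whole cluster
    transfers = 0
    for _node, key in trace:
        if key in replicated:
            continue  # already present everywhere
        replicated.add(key)
        transfers += len(nodes) - 1
    return transfers


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    primary_of = {"/rare": "a", "/hot": "a"}
    trace = [("a", "/rare"), ("a", "/hot"), ("b", "/hot"), ("c", "/hot")]
    print("pull:", pull_transfers(trace, primary_of))
    print("push:", push_transfers(trace, nodes))
```

In this model the rarely requested object costs nothing under the pull approach, while the push approach copies it to every node regardless of whether it is ever requested there.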
2. Resiliency
The second dimension of comparison is resiliency: how do both solutions behave when a failure occurs?
Clustering
Our clustering solution is efficiently resilient: objects are spread across the cluster and when one node fails, only a portion of the objects is lost.
However, the architecture of our clustering solution is designed in such a way that node failure doesn’t require lost objects to be fetched directly from the origin: on node failure, missing objects are replicated from the primary Varnish node until the replication target is met.
Moderately popular objects quickly replicate to target nodes and different objects may have different replication targets.
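To illustrate why a node failure only affects part of the object set, here is a minimal sketch. It assumes rendezvous hashing with a per-object replication target, which is an illustrative choice and not a description of the actual cluster.vcl internals. The node names, the `target` parameter, and the function names are hypothetical.

```python
import hashlib


def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key: str, nodes: list[str], target: int) -> list[str]:
    """Rank nodes for this key; the first `target` nodes hold a copy."""
    ranked = sorted(nodes, key=lambda n: score(n, key), reverse=True)
    return ranked[:target]


def replicas_after_failure(key: str, nodes: list[str], failed: str, target: int) -> list[str]:
    """Recompute the replica set once `failed` has left the cluster."""
    survivors = [n for n in nodes if n != failed]
    return replicas_for(key, survivors, target)


if __name__ == "__main__":
    cluster = ["varnish-a", "varnish-b", "varnish-c", "varnish-d"]
    key = "/img/logo.png"
    print("before:", replicas_for(key, cluster, target=2))
    print("after: ", replicas_after_failure(key, cluster, "varnish-b", target=2))
```

In this sketch, objects that did not live on the failed node keep exactly the same replica set. Objects that did lose a copy keep their surviving replica and gain one new node, which can be filled from a surviving copy rather than from the origin.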
Varnish High Availability
Our VHA solution is inefficiently resilient: because every node caches every object, it can tolerate the loss of all but one node and still retain all objects.
There is, however, a limit to this node failure tolerance: beyond a certain point, the surviving nodes become overwhelmed.
3. Latency
The third dimension of comparison is latency: how long does it take to replicate objects to other Varnish nodes?
Let’s see how both solutions stack up.
Clustering
The Varnish Clustering solution uses self-routing to intelligently find the right node to fetch the content from when a cache miss takes place.
This self-routing mechanism may introduce slightly higher latency:
- Cache misses often go through an extra hop to the primary node.
- This can be mitigated by increasing replication targets for hot content.
Reducing latency at the cluster level requires low inter-node communication latency.
Varnish High Availability
Our VHA solution can have lower latency in some cases:
- VHA broadcasts pre-warm the cache on other Varnish nodes, which means no extra hop is needed.
- Because cache misses in a VHA setup trigger a broadcast to replicate content on all Varnish nodes, nodes can be warmed before receiving traffic.
But if multiple nodes miss simultaneously, each of them must fetch the object from the origin server, increasing latency.
4. Setup
The fourth dimension of comparison is the setup: how complex is a Varnish Clustering setup compared to a Varnish High Availability setup?
Clustering
Setting up a Varnish cluster is simple and straightforward:
- There is no identity management required.
- The entire configuration resides in a single cluster.vcl file, which can be included and is loaded via a VMOD.
- The Varnish nodes that are part of the cluster are defined as either static or dynamic backends, using DNS-based service discovery.
Varnish High Availability
Setting up VHA is complex and error-prone: the replication part is handled by a separate service called Varnish Broadcaster. If that service fails, VHA no longer works.
The Varnish Broadcaster that powers VHA depends on a configuration file that contains a static inventory of all nodes to broadcast to. Changes to the inventory require the configuration file to be changed, and the broadcaster to be reloaded.
In dynamic setups where Varnish nodes come and go, an extra service is required to perform service discovery: the Varnish Discovery service. This is yet another service that needs to be managed and yet another service that can cause failure.
With VHA, mistakes are easy to make, hard to debug, and can go unnoticed.
5. Scalability
The fifth and final dimension on which to compare Varnish Clustering and Varnish High Availability is scalability: how easy is it to scale the solution when traffic surges occur or when the catalog of cached objects grows?
Clustering
The Varnish Clustering solution scales efficiently:
- Memory and disk capacity can be scaled horizontally.
- Hot objects can be replicated to all nodes to reduce inter-node traffic.
- Long-tail objects are selectively replicated to reduce storage bloat.
- Uses efficient Varnish backend fetches for replication.
Varnish High Availability
Unfortunately, VHA scales poorly:
- VHA does not horizontally scale memory or disk.
- VHA broadcasts every cacheable request to all nodes, increasing traffic and bloat.
- VHA relies on the less efficient vmod_http module for broadcasts.
- VHA can experience high CPU usage.
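The capacity argument can be made concrete with back-of-the-envelope arithmetic. The Python below is an idealized model, not a sizing tool: it assumes objects are spread evenly, ignores storage overhead, and uses a hypothetical average replica count per object for the clustered case.

```python
def clustered_capacity_gb(node_count: int, per_node_gb: float, avg_replicas: float) -> float:
    """Unique content a sharded cluster can hold.

    Total storage is divided by the average number of copies per object,
    so adding nodes adds room for more distinct objects.
    """
    if avg_replicas < 1:
        raise ValueError("each cached object needs at least one copy")
    return node_count * per_node_gb / avg_replicas


def fully_replicated_capacity_gb(node_count: int, per_node_gb: float) -> float:
    """Unique content when every node holds every object.

    The node count does not matter: the cluster holds what one node holds.
    """
    return per_node_gb


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes with 100 GB of cache each.
    print(clustered_capacity_gb(4, 100, avg_replicas=2))
    print(fully_replicated_capacity_gb(4, 100))
```

Under these assumptions, adding nodes to a sharded cluster increases the amount of distinct content that fits in cache, while adding nodes to a fully replicated setup only adds more copies of the same content.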