Clustering vs VHA


This document compares Clustering with VHA (Varnish High Availability) across five dimensions: Pull/Push, Resiliency, Latency, Set Up, and Scaling.

By default, in a Varnish setup with independent caches, the origin server may receive one request per cache node even for the same cacheable object. Cluster.vcl solves this problem through prescriptive sharding, ensuring only one origin request per object cluster‑wide. VHA, in contrast, achieves consistency via preemptive cache replication across nodes. The sections below give the details; in nearly all production cases, Cluster.vcl is the preferred solution.

1. Pull/Push

Clustering (Pull-based)

Guarantees cluster-wide request coalescing: only one request goes to the origin.
Reduces inter-node traffic and cache bloat caused by rarely accessed objects.
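
The routing idea behind pull-based sharding can be sketched as follows. This is a minimal illustration that uses rendezvous (highest-random-weight) hashing, which is an assumption chosen for clarity and not necessarily what Cluster.vcl does internally. The node names and the `score` and `primary_node` helpers are hypothetical.

```python
import hashlib

def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, object key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def primary_node(nodes: list[str], key: str) -> str:
    """Pick the node with the highest weight for this key.

    Every node computes the same answer, so misses for one key
    converge on a single node instead of each node asking the origin.
    """
    return max(nodes, key=lambda n: score(n, key))

nodes = ["cache-a", "cache-b", "cache-c"]  # hypothetical node names
key = "/video/segment-42.ts"               # hypothetical object key

# Whichever node receives the client request, it agrees on the same primary,
# so only that primary needs to fetch the object from the origin.
primary = primary_node(nodes, key)
print(primary)
```

Because the choice depends only on the node list and the object key, no coordination between nodes is needed to agree on where a miss should be sent.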

VHA (Push-based)

Does not guarantee a single origin request per object, making it unsuitable for live streaming or low-TTL use cases.
Every cacheable object is broadcast to all nodes, even if it is never requested again.

2. Resiliency

Clustering

Efficiently resilient.
Moderately popular objects quickly replicate to target nodes.
On node failure, missing objects are replicated from the primary node until the replication target is met.
Different objects may have different replication targets.
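
Per-object replication targets and recovery after a node failure can be modelled with a short sketch. It assumes rendezvous hashing as the placement rule, which is an illustrative choice and not a description of the actual Clustering implementation. The node names, object keys, target values, and the `replica_nodes` helper are all hypothetical.

```python
import hashlib

def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, object key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def replica_nodes(nodes: list[str], key: str, target: int) -> list[str]:
    """Return the `target` highest-ranked nodes for this key; the first is the primary."""
    ranked = sorted(nodes, key=lambda n: score(n, key), reverse=True)
    return ranked[:target]

nodes = ["cache-a", "cache-b", "cache-c", "cache-d"]  # hypothetical node names
hot_key = "/hot/index.html"                           # hypothetical popular object

# A popular object with a replication target of 3.
before = replica_nodes(nodes, hot_key, target=3)

# Simulate losing one of the non-primary replicas.
failed = before[1]
survivors = [n for n in nodes if n != failed]
after = replica_nodes(survivors, hot_key, target=3)

# A long-tail object can use a lower target and live on a single node.
long_tail = replica_nodes(nodes, "/archive/report-2009.pdf", target=1)

print(before, after, long_tail)
```

In this model the surviving replicas keep their copies and exactly one new node is added to bring the object back to its target, so a failure only moves the objects that were stored on the failed node.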

VHA

Inefficiently resilient.
Every node caches every object.
Can tolerate the loss of all but one node and still retain all objects.
Limited node failure tolerance in practice: beyond a point, the surviving nodes become overwhelmed.

3. Latency

Clustering

Self-routing may introduce slightly higher latency:
- Cache-misses often go through an extra hop to the primary node.
- Can be mitigated by increasing replication targets for hot content.
Requires low inter-node communication latency.

VHA

Can have lower latency in some cases:
- Broadcasts pre-warm other nodes: no extra hop needed.
- Nodes can be warmed before receiving traffic.
But if multiple nodes miss simultaneously, each must fetch from the origin, increasing latency.

4. Set Up

Clustering

Simple and straightforward.
No identity management required.
Entire configuration resides in VCL via a VMOD.
Nodes defined as static or dynamic backends.

VHA

Complex and error-prone.
Mistakes are easy to make, hard to debug, and can go unnoticed.
Discovery and broadcaster are separate services.
Nodes defined in a separate file.

5. Scaling

Clustering

Scales efficiently.
Horizontally scales memory and disk capacity.
Hot objects can be replicated to all nodes to reduce inter-node traffic.
Long-tail objects are selectively replicated to reduce storage bloat.
Uses efficient Varnish backend fetches for replication.

VHA

Poor scalability.
Does not horizontally scale memory or disk.
Broadcasts every cacheable request to all nodes, increasing traffic and bloat.
Relies on less efficient vmod_http for broadcasts.
High CPU usage.
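
The difference in storage scaling can be made concrete with some back-of-the-envelope arithmetic. The sketch below is a simplified model: the node counts, per-node sizes, and average replica count are hypothetical inputs, and it ignores object size variation and metadata overhead.

```python
def vha_unique_capacity_gb(node_count: int, per_node_gb: float) -> float:
    """With full replication every node stores every object,
    so adding nodes adds copies rather than room for new objects."""
    return per_node_gb

def sharded_unique_capacity_gb(node_count: int, per_node_gb: float,
                               avg_replicas: float) -> float:
    """With sharding, total storage is divided by the average
    number of copies kept per object."""
    return node_count * per_node_gb / avg_replicas

# Hypothetical cluster: 100 GB of cache per node, 2 copies per object on average.
for n in (4, 8):
    print(n, vha_unique_capacity_gb(n, 100), sharded_unique_capacity_gb(n, 100, 2))
```

Under these assumptions, doubling the node count doubles the amount of unique content a sharded cluster can hold, while the fully replicated setup stays at the capacity of a single node.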
