Caching authenticated Artifactory requests with Varnish Enterprise

Artifactory by JFrog is an artifact management solution that houses and manages build artifacts and external software dependencies. These artifacts and dependencies can be software libraries, packages, executables, container images, scripts, or any other file that is needed during the development and delivery of your software.

Why cache artifacts from Artifactory?

Artifactory usually scales well, but some setups grow so fast that the Artifactory servers themselves become a bottleneck. This can slow down automated systems and, in some cases, even cause outages.

The symptoms may include:

  • Reduced responsiveness
  • Increased latency
  • Outages

Using a cache in front of your Artifactory servers can offload pressure from these systems, increase performance, and improve developer efficiency. This can lower your costs while allowing you to continue benefiting from the great Artifactory features.

If you’re using the SaaS version of Artifactory, placing Varnish closer to your developers and systems will also reduce the associated cloud egress charges.

Why Varnish Enterprise?

While Varnish Cache, the open-source version of Varnish, is perfectly capable of caching the output of your Artifactory requests, it lacks specific features to maximize the efficiency of the cache.

Varnish Enterprise offers the Massive Storage Engine (MSE), a proprietary storage engine that is capable of caching large volumes of data. In an Artifactory setup with potentially terabytes worth of artifacts, MSE can cache all these assets without compromising on performance.

The native TLS capabilities of Varnish Enterprise deliver higher throughput than TLS that is terminated by a separate proxy in front of Varnish Cache. Although we strongly believe in Hitch as a dedicated TLS proxy for Varnish Cache, native TLS in Varnish Enterprise will outperform it at larger scale.

Another important feature of Varnish Enterprise that increases the efficiency of Artifactory caching is vmod_http. This VMOD allows you to trigger out-of-band HTTP requests to Artifactory, which we use to perform pre-flight authorization calls.

In the next section, we’ll share the custom VCL code we use to increase the efficiency of authorized Artifactory requests. This custom VCL code leverages vmod_http and ensures we can cache the authorizations as well as the artifacts themselves.

Custom Artifactory VCL

The full VCL code is available in the toolbox repository. Rules can be added to include or exclude specific URL patterns from the cache. Let’s start by breaking down the code into snippets and explaining what each snippet does.
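As a rough illustration of such a rule, the sketch below bypasses the cache for a URL pattern. The pattern itself is a hypothetical example and not taken from the toolbox VCL; adjust it to the paths you want to exclude. It assumes this subroutine is defined before the vcl_recv logic discussed below, so that the Authorization header is still intact when the request is passed to Artifactory.

```vcl
sub vcl_recv {
	# Hypothetical exclusion rule: never cache requests to this path.
	# The "^/artifactory/api/" pattern is an assumption for illustration.
	if (req.url ~ "^/artifactory/api/") {
		return (pass);
	}
}
```

VCL concatenates multiple definitions of the same subroutine in the order in which they appear, which is why the position of this rule matters.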

Pre-flight authorization

The main capability of the Artifactory VCL is the ability to perform pre-flight authorization requests.

The code that enables this lives inside the vcl_recv subroutine, as you can see in the snippet below:

sub vcl_recv {
	unset req.http.X-Authorization;
	unset req.http.X-Client-Authorized;
	unset req.http.X-Method;

	# Authorize GET request by looping a HEAD request through Varnish
	if (req.http.Authorization && req.method == "GET") {
		http.init(0);
		http.req_set_url(0, http.varnish_url(req.url));
		http.req_copy_headers(0);
		http.req_set_method(0, "HEAD");
		http.req_send(0);
		http.resp_wait(0);
		if (http.resp_get_status(0) != 200) {
			return (synth(403));
		}

		set req.http.X-Client-Authorized = "true";
	}

	# Stow away the HEAD request method
	if (req.method == "HEAD") {
		set req.http.X-Method = "HEAD";
	}

	# We stow away the Authorization header to avoid return (pass) in builtin.vcl.
	# This is safe because GET requests are authorized with a HEAD request loop,
	# and HEAD requests make the X-Authorization header a part of the cache key.
	if (req.http.Authorization) {
		set req.http.X-Authorization = req.http.Authorization;
		unset req.http.Authorization;
	}
}

We first check whether the incoming request is a GET request that contains an Authorization header. This implies that the client is making an authenticated request for an artifact in Artifactory.

To be sure that the request is allowed, the VCL code performs a pre-flight authorization request to Artifactory. This is done before checking whether the requested artifact is stored in the cache.

We ask Artifactory whether the request may be served by sending a HEAD request, which keeps the exchange lightweight. This request is sent to Artifactory through Varnish, so that the response is cached and the authorization for the same user-object pair doesn’t have to be verified over and over again.

If the response has a 200 status code, we set the X-Client-Authorized header value to true. Any other status code will result in a synthetic 403 response being returned by Varnish.
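The synthetic 403 response can be adjusted in vcl_synth. The sketch below is not part of the toolbox VCL; it assumes you don't want clients or intermediate proxies to store the rejection, and only adds a header before the built-in VCL generates the response body.

```vcl
sub vcl_synth {
	# Assumption: rejections from the pre-flight check should not be
	# stored by clients or downstream caches.
	if (resp.status == 403) {
		set resp.http.Cache-Control = "no-store";
	}
}
```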

Because Varnish automatically converts HEAD requests into GET requests when performing a backend fetch, we’ll store the HEAD request method in a custom X-Method request header for later use.

We apply similar logic to the Authorization header; because Varnish doesn’t cache private content out-of-the-box, we stow away the value of the Authorization header in a custom X-Authorization header to be used later.
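The vcl_recv logic calls functions from vmod_http, so the VCL file has to import that VMOD before the subroutine is defined. A minimal preamble could look like the sketch below. The backend hostname and port are placeholders and need to be replaced with the details of your own Artifactory server.

```vcl
vcl 4.1;

# vmod_http provides the http.* functions used for the pre-flight request
import http;

# Placeholder backend: hostname and port are assumptions for illustration
backend default {
	.host = "artifactory.example.com";
	.port = "8081";
}
```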

Caching the authorization header

As the Varnish built-in VCL states, only the URL and the value of the Host header are used to compose the lookup hash that identifies an object in the cache. So by default, Varnish has no awareness of the Authorization header or the request method, and you need to tell it that these are important too:

sub vcl_hash {
	if (req.http.X-Client-Authorized != "true") {
		hash_data(req.http.X-Authorization);
	}
	if (req.http.X-Method) {
		hash_data(req.http.X-Method);
	}
}

If the X-Client-Authorized header isn’t set to true, the user isn’t authorized yet and we need to cache the authorization.

In that case, the values of the custom X-Authorization and X-Method request headers are added to the lookup hash. This ensures that the result of the lightweight HEAD request that authorizes the user is cached on a per-user basis.

Once X-Client-Authorized is set to true, the user is already authorized and we can store the artifact in the cache without creating unnecessary cache variations that include the X-Authorization header.
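How long a cached authorization remains valid depends on the TTL of the HEAD response. If the caching headers sent by Artifactory don't result in a suitable lifetime, the TTL could be capped in vcl_backend_response, as in the sketch below. This is not part of the toolbox VCL, and the 60 second value is an arbitrary assumption. The check on bereq.method relies on the request method being restored to HEAD before the backend fetch, as described in the next section.

```vcl
sub vcl_backend_response {
	# Only pre-flight authorization fetches use the HEAD method
	if (bereq.method == "HEAD" && beresp.status == 200) {
		# Assumed value: re-verify each authorization after 60 seconds
		set beresp.ttl = 60s;
		# Don't serve an expired authorization while revalidating
		set beresp.grace = 0s;
	}
}
```

A shorter TTL means that revoked permissions take effect sooner, at the cost of more HEAD requests reaching Artifactory.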

Ensuring the right headers are added during a backend fetch

The final piece of the puzzle is to ensure that we restore the headers that Varnish altered along the way, prior to performing a backend fetch.

As explained earlier, Varnish turns HEAD requests into GET requests for the sake of efficiency; a HEAD is just a GET without the payload, so Varnish figures it can perform a GET backend fetch, store the response in the cache for later use, and only serve the headers to the requesting client that used the HEAD request method in the first place.

In our use case, that doesn’t work because we deliberately want these pre-flight authorization requests to be as lightweight as possible. We don’t want to risk filling up the cache with payload we didn’t need because we only cared about the status code.

That’s why we need to restore the request method to HEAD. The same applies to the Authorization header; we initially stripped it off, because the built-in VCL bypasses the cache for private content.

Here’s the VCL code to do that:

sub vcl_backend_fetch {
	# Restore the Authorization header.
	if (bereq.http.X-Authorization) {
		set bereq.http.Authorization = bereq.http.X-Authorization;
		unset bereq.http.X-Authorization;
	}

	# Restore the original request method (this is changed by varnish core).
	if (bereq.http.X-Method == "HEAD") {
		set bereq.method = "HEAD";
		unset bereq.http.X-Method;
	}
}

  • If the X-Authorization header is set, restore the Authorization header for the backend fetch to Artifactory and assign the value of X-Authorization.
  • If the value of the X-Method header is HEAD, restore the request method to HEAD to ensure Artifactory doesn’t return any payload for authorized requests.