Caching RPM, Python, and PHP with Varnish Enterprise

This tutorial demonstrates how to set up Varnish Enterprise to cache RPM, Python, and PHP packages. By the end of this tutorial you should be able to use one machine and one VCL configuration for all three of these package types.

The general flow of this process is: identify where your local machine stores the location it retrieves these packages from, change that location to point to your Varnish Enterprise instance, and then set the original location as a backend for Varnish.

For the sake of space, some of the outputs or logs in this tutorial have been shortened, but the commands should all be clear.

To verify Varnish is caching as you test this, it may be helpful to open a second window and run:

sudo varnishncsa -b -q "BereqMethod eq 'GET'"

This will show you all backend fetches made by Varnish. After the initial fetch, you should see that removing packages and reinstalling them does not change this command’s output, because they are being served from cache. With that said, let’s get started with RPM.
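As a complementary check from the client side, here is a minimal sketch that tallies how requests were handled. It assumes you feed it varnishncsa output formatted with -F '%{Varnish:handling}x %U', so that the first field of each line is the handling (hit, miss, and so on). The function name summarize_handling is hypothetical.

```shell
# Tally client requests by how Varnish handled them (hit, miss, pass, ...).
# Expects lines on stdin whose first field is the handling, for example from:
#   sudo varnishncsa -d -F '%{Varnish:handling}x %U'
# summarize_handling is a hypothetical helper name, not a Varnish tool.
summarize_handling() {
    awk '{ count[$1]++ } END { for (h in count) print h, count[h] }' | sort
}
```

For example, sudo varnishncsa -d -F '%{Varnish:handling}x %U' | summarize_handling prints one line per handling type with its count. A growing hit count after reinstalling a package suggests the cache is doing its job.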

RPM

First, let’s assume you already have Varnish Enterprise installed. With that done, running yum update should show all needed package updates, including those for the distribution itself, in my case AlmaLinux:

Upgraded:
almalinux-gpg-keys-9.5-2.el9.x86_64   almalinux-release-9.5-2.el9.x86_64                                    
almalinux-repos-9.5-2.el9.x86_64

What we want to do now is change where our machine looks for these packages so that it asks our Varnish Enterprise instance instead. Since we’re running this test on a single DigitalOcean instance, we will just use localhost:6081; if you are implementing this yourself, change that as needed.

We can see where these package locations are configured by listing the repository files:

ls /etc/yum.repos.d/

This is the output:

almalinux-appstream.repo  almalinux-highavailability.repo  almalinux-rt.repo       epel-cisco-openh264.repo     varnish-plus-60.repo
almalinux-baseos.repo     almalinux-nfv.repo               almalinux-sap.repo      epel-testing.repo
almalinux-crb.repo        almalinux-plus.repo              almalinux-saphana.repo  epel.repo
almalinux-extras.repo     almalinux-resilientstorage.repo  droplet-agent.repo      varnish-enterprise-6.0.repo

We can take a look at these files with cat and notice that all of our almalinux-{something}.repo files have a line or lines like:

mirrorlist=https://mirrors.almalinux.org/mirrorlist/$releasever/baseos-source

What we want to do is edit this line each time it appears in the desired files to point to Varnish, or in our case http://localhost:6081.

Looking at /etc/yum.repos.d/almalinux-baseos.repo, we can see this should be changed to look like:

[baseos]
name=AlmaLinux $releasever - BaseOS
mirrorlist=http://localhost:6081/mirrorlist/$releasever/baseos
# baseurl=https://repo.almalinux.org/almalinux/$releasever/BaseOS/$basearch/os/
enabled=1
gpgcheck=1
countme=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-AlmaLinux-9
metadata_expire=86400
enabled_metadata=1

[baseos-debuginfo]
name=AlmaLinux $releasever - BaseOS - Debug
mirrorlist=http://localhost:6081/mirrorlist/$releasever/baseos-debug
# baseurl=https://repo.almalinux.org/vault/$releasever/BaseOS/debug/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-AlmaLinux-9
metadata_expire=86400
enabled_metadata=0

[baseos-source]
name=AlmaLinux $releasever - BaseOS - Source
mirrorlist=http://localhost:6081/mirrorlist/$releasever/baseos-source
# baseurl=https://repo.almalinux.org/vault/$releasever/BaseOS/Source/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-AlmaLinux-9
metadata_expire=86400
enabled_metadata=0
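Editing every mirrorlist line by hand gets tedious across many .repo files, so the change above can also be scripted. The following is a sketch only: the function name point_repos_at_varnish and the VARNISH_URL variable are assumptions introduced for illustration, and the sed expression only touches lines that start with mirrorlist=https://mirrors.almalinux.org/.

```shell
# Address of the Varnish instance. This default is an assumption taken from
# the tutorial's test setup; override it for your own environment.
VARNISH_URL="${VARNISH_URL:-http://localhost:6081}"

# Rewrite AlmaLinux mirrorlist URLs in the given .repo files so they point
# at Varnish. Commented-out baseurl lines are left alone, and sed keeps a
# .bak copy of every file it edits.
# point_repos_at_varnish is a hypothetical helper name.
point_repos_at_varnish() {
    sed -i.bak \
        "s|^mirrorlist=https://mirrors.almalinux.org/|mirrorlist=${VARNISH_URL}/|" \
        "$@"
}
```

From a root shell this could be invoked as point_repos_at_varnish /etc/yum.repos.d/almalinux-*.repo. Because a .bak copy is kept next to each file, reverting is a matter of moving the backups into place.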

Finally, we want to edit our default.vcl file and set up our backend to pull from https://mirrors.almalinux.org/ like so:

vcl 4.1;

import goto;

backend default none;

sub vcl_init {
    new rpm = goto.dns_director("https://mirrors.almalinux.org/", ip_version = ipv4);
}

sub vcl_recv {
    unset req.http.cache-control;
    unset req.http.pragma;
}

sub vcl_backend_fetch {
    if (bereq.http.User-Agent ~ "libdnf") {
        set bereq.backend = rpm.backend();
    }
    unset bereq.http.host;
}

sub vcl_backend_response {
    # No Last-Modified header? Just use the current time
    if (!beresp.http.last-modified) {
        set beresp.http.last-modified = now;
    }
    if (beresp.status == 200) {
        set beresp.ttl = 1h;
        set beresp.grace = 1s;
        set beresp.keep = 1y;
    } else {
        set beresp.ttl = 5s;
        set beresp.grace = 0s;
    }
}

After restarting Varnish, we can run yum update a few times and confirm with sudo varnishlog -d that Varnish is serving the responses from cache:

*   << Request  >> 65540
-   ReqMethod      GET
-   ReqURL         /mirrorlist/9/baseos
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: localhost:6081
-   ReqHeader      User-Agent: libdnf (AlmaLinux 9.5; generic; Linux.x86_64)
-   VCL_call       RECV
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit            32771 3096.861751 1.000000 31536000.000000
-   VCL_call       HIT
-   VCL_return     deliver
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     X-Varnish: 65540 32771
-   RespHeader     Age: 503
-   RespHeader     Via: 1.1 varnish (Varnish/6.0)
-   VCL_call       DELIVER
-   VCL_return     deliver
-   End
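The Hit record in this log is worth a closer look. Its four numbers appear to be the cached object's ID, the remaining TTL, the grace period, and the keep period, all in seconds. This reading is consistent with the VCL: a 1h TTL minus an Age of 503 leaves roughly 3097 seconds, grace is 1 second, and keep is one year (31536000 seconds). The sketch below pulls those fields out of varnishlog output; the function name explain_hits is hypothetical, and the field interpretation is an assumption based on the values shown here.

```shell
# Print a readable summary of every Hit record in varnishlog output.
# Assumes the record layout seen in this tutorial:
#   -   Hit   <object id> <remaining ttl> <grace> <keep>
# explain_hits is a hypothetical helper name.
explain_hits() {
    awk '$1 == "-" && $2 == "Hit" {
        printf "object=%s ttl_left=%.0fs grace=%.0fs keep=%.0fs\n", $3, $4, $5, $6
    }'
}
```

Running sudo varnishlog -d | explain_hits on the log above would report object=32771 with about 3097 seconds of TTL left.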

Python

Python packages are installed with the pip package manager. We need to edit our VCL a bit so that we can also cache these packages:

vcl 4.1;

import goto;

backend default none;

sub vcl_init {
    new rpm = goto.dns_director("https://mirrors.almalinux.org/", ip_version = ipv4);
    new python = goto.dns_director("https://pypi.org/simple", ip_version = ipv4);
}

sub vcl_recv {
    unset req.http.cache-control;
    unset req.http.pragma;
}

sub vcl_backend_fetch {
    if (bereq.http.User-Agent ~ "libdnf") {
        set bereq.backend = rpm.backend();
    }
    if (bereq.http.User-Agent ~ "^pip/") {
        set bereq.backend = python.backend();
    }
    unset bereq.http.host;
}

sub vcl_backend_response {
    # No Last-Modified header? Just use the current time
    if (!beresp.http.last-modified) {
        set beresp.http.last-modified = now;
    }
    if (beresp.status == 200) {
        set beresp.ttl = 1h;
        set beresp.grace = 1s;
        set beresp.keep = 1y;
    } else {
        set beresp.ttl = 5s;
        set beresp.grace = 0s;
    }
}

With that done, we have to reload our VCL through varnishreload before we can go ahead and install python3 and pip:

sudo varnishreload
sudo yum install python3 python3-pip

And then we can check the versions with:

$ python3 --version
Python 3.9.21
$ pip --version
pip 21.3.1 from /usr/lib/python3.9/site-packages/pip (python 3.9)

Now we want to route pip through Varnish. We can do so temporarily with a command-line option. First, uninstall a package in case it was already installed (note that this output has been shortened):

$ pip uninstall numpy
Found existing installation: numpy 2.0.2
Uninstalling numpy-2.0.2:
  Would remove:
    /usr/local/bin/f2py
    /usr/local/bin/numpy-config
    /usr/local/lib64/python3.9/site-packages/numpy-2.0.2.dist-info/*
Proceed (Y/n)?
  Successfully uninstalled numpy-2.0.2

And then reinstalling it like so:

pip install numpy --index-url http://localhost:6081/simple --verbose

If we uninstall and reinstall for a second time, we can see the cache hit in the logs like so:

*   << Request  >> 32776
-   ReqMethod      GET
-   ReqURL         /simple/numpy/
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: localhost:6081
-   ReqHeader      User-Agent: pip/21.3.1 {"ci":null,"cpu":"x86_64","distro":{"id":"Teal Serval","libc":{"lib":"glibc","version":"2.34"},"name":"AlmaLinux","version":"9.5"},"implementation":{"name":"CPython","version":"3.9.21"},"installer":{"name":"pip","version":"21.3.1"},"openssl_version":"OpenSSL 3.2.2 4 Jun 2024","python":"3.9.21","setuptools_version":"53.0.0","system":{"name":"Linux","release":"5.14.0-284.11.1.el9_2.x86_64"}}
-   VCL_call       RECV
-   VCL_return     hash
-   ReqUnset       Accept-Encoding: gzip, deflate
-   ReqHeader      Accept-Encoding: gzip
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit            32771 523.957584 10.000000 0.000000
-   VCL_call       HIT
-   VCL_return     deliver
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     X-Varnish: 32776 32771
-   RespHeader     Age: 76
-   RespHeader     Via: 1.1 varnish (Varnish/6.0)
-   VCL_call       DELIVER
-   VCL_return     deliver
-   End

To make our machine permanently use Varnish, we need to create a directory and a configuration file as shown below:

mkdir ~/.pip
nano ~/.pip/pip.conf

And put the following content in ~/.pip/pip.conf:

[global]
index-url = http://localhost:6081/simple
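If you would rather script this step than open an editor, the same file can be written non-interactively. This is a sketch under the tutorial's assumptions: the function name write_pip_conf is hypothetical, and the default directory and index URL are simply the values used above.

```shell
# Write a pip.conf that points pip at Varnish.
#   $1: directory to hold pip.conf (defaults to ~/.pip, as in this tutorial)
#   $2: index URL (defaults to the tutorial's localhost Varnish address)
# write_pip_conf is a hypothetical helper name.
write_pip_conf() {
    conf_dir="${1:-$HOME/.pip}"
    index_url="${2:-http://localhost:6081/simple}"
    mkdir -p "$conf_dir"
    printf '[global]\nindex-url = %s\n' "$index_url" > "$conf_dir/pip.conf"
}
```

Calling write_pip_conf with no arguments reproduces the file shown above. pip also ships a pip config set global.index-url command that may achieve the same result, though it may write to a different per-user configuration path.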

After we uninstall numpy again, we can reinstall without forcing it to use Varnish like before:

$ pip install numpy
Looking in indexes: http://localhost:6081/simple
Collecting numpy
Installing collected packages: numpy
Successfully installed numpy-2.0.2

We will now see another HIT in sudo varnishlog -d.

PHP

PHP packages are installed with the Composer package manager. But before we can install Composer, we need to make sure we have the right dependencies installed:

sudo yum install php php-cli php-common php-mysqlnd php-fpm -y

To verify the install, you can check the PHP version by running php -v:

$ php -v
PHP 8.0.30 (cli) (built: Aug  3 2023 17:13:08) ( NTS gcc x86_64 )
Copyright (c) The PHP Group
Zend Engine v4.0.30, Copyright (c) Zend Technologies
    with Zend OPcache v8.0.30, Copyright (c), by Zend Technologies

Then install Composer:

php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
php composer-setup.php
sudo mv composer.phar /usr/local/bin/composer

Let’s make a directory for a project and initialize it:

mkdir php-test
cd php-test/
composer init

By running cat composer.json, you’ll get the following output:

{
    "name": "brian/php-test",
    "description": "test",
    "type": "project",
    "minimum-stability": "stable",
    "require": {
        "guzzlehttp/guzzle": "^7.9"
    }
}

With that done, we want to update the VCL to pull from the backend shown by composer config --list --global:

[repositories.packagist.org.type] composer
[repositories.packagist.org.url] https://repo.packagist.org

The VCL should now look like:

vcl 4.1;

import goto;

backend default none;

sub vcl_init {
    new rpm = goto.dns_director("https://mirrors.almalinux.org/", ip_version = ipv4);
    new python = goto.dns_director("https://pypi.org/simple", ip_version = ipv4);
    new php = goto.dns_director("https://repo.packagist.org", ip_version = ipv4);
}

sub vcl_recv {
    unset req.http.cache-control;
    unset req.http.pragma;
}

sub vcl_backend_fetch {
    # Route each request to exactly one backend based on its User-Agent.
    # The branches are chained so that RPM requests are not overwritten
    # by the final else.
    if (bereq.http.User-Agent ~ "libdnf") {
        set bereq.backend = rpm.backend();
    } else if (bereq.http.User-Agent ~ "^pip/") {
        set bereq.backend = python.backend();
    } else {
        # Everything else (Composer in this tutorial) goes to Packagist
        set bereq.backend = php.backend();
    }
    unset bereq.http.host;
}

sub vcl_backend_response {
    # No Last-Modified header? Just use the current time
    if (!beresp.http.last-modified) {
        set beresp.http.last-modified = now;
    }
    if (beresp.status == 200) {
        set beresp.ttl = 1h;
        set beresp.grace = 1s;
        set beresp.keep = 1y;
    } else {
        set beresp.ttl = 5s;
        set beresp.grace = 0s;
    }
}
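The intent of the routing in vcl_backend_fetch is that every request reaches exactly one backend, chosen by its User-Agent header. To make that decision easier to reason about, here is the same logic sketched as a shell function. The name pick_backend is hypothetical; the backend names rpm, python, and php are the ones defined in vcl_init.

```shell
# Shell model of the intended User-Agent routing. This only illustrates
# the decision order; Varnish itself evaluates the VCL, not this function.
# pick_backend is a hypothetical helper name.
pick_backend() {
    case "$1" in
        *libdnf*) echo rpm ;;     # unanchored match, like ~ "libdnf"
        pip/*)    echo python ;;  # anchored at the start, like ~ "^pip/"
        *)        echo php ;;     # everything else, e.g. Composer
    esac
}
```

Feeding it the User-Agent strings from the logs in this tutorial shows which backend each client should land on, for example pick_backend 'Composer/2.8.8 (Linux; PHP 8.0.30)' yields php.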

After adjusting default.vcl and reloading the VCL with sudo varnishreload, you can point Composer’s global Packagist repository at Varnish using:

composer config --global repositories.packagist.org '{"type":"composer", "url":"http://localhost:6081"}'

When you run composer config --list --global again, you’ll get the following output:

[repositories.packagist.org.type] composer
[repositories.packagist.org.url] http://localhost:6081
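When this check needs to run in a script, the repository URL can be extracted from that listing rather than read by eye. The following is a small sketch that assumes the two-column output format shown above; the function name packagist_repo_url is hypothetical.

```shell
# Extract the Packagist repository URL from the output of
#   composer config --list --global
# Assumes each line has the form "[key] value", as shown in this tutorial.
# packagist_repo_url is a hypothetical helper name.
packagist_repo_url() {
    awk '$1 == "[repositories.packagist.org.url]" { print $2 }'
}
```

For instance, composer config --list --global | packagist_repo_url should print http://localhost:6081 once the change above has been applied.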

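Composer blocks plain HTTP repository URLs by default (its secure-http setting defaults to true). Because our Varnish URL uses http rather than https, allow it with: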

composer config --global secure-http false

With our machine configured and the VCL in place, we can pull the required packages with:

composer require guzzlehttp/guzzle

We should see our packages being delivered. We can also run sudo varnishlog -d to see the requests coming through Varnish.

To see cache hits, we can clear the objects from the project, and then pull them again:

$ rm -rf vendor/ composer.lock
$ composer clear-cache
Cache directory does not exist (cache-vcs-dir):
$ composer require guzzlehttp/guzzle

Now when we do a sudo varnishlog -d we should see cache hits like so:

*   << Request  >> 229505
-   Begin          req 229495 rxreq
-   ReqMethod      GET
-   ReqURL         /p2/guzzlehttp/streams.json
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: localhost:6081
-   ReqHeader      Accept: */*
-   ReqHeader      Accept-Encoding: deflate, gzip, br
-   ReqHeader      Connection: keep-alive
-   ReqHeader      User-Agent: Composer/2.8.8 (Linux; 5.14.0-284.11.1.el9_2.x86_64; PHP 8.0.30; cURL 7.76.1)
-   ReqHeader      X-Forwarded-For: ::1
-   VCL_call       RECV
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit            294980 3101.351442 1.000000 31536000.000000
-   VCL_call       HIT
-   VCL_return     deliver
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespHeader     X-Varnish: 229505 294980
-   RespHeader     Age: 498
-   RespHeader     Via: 1.1 varnish (Varnish/6.0)
-   VCL_call       DELIVER
-   VCL_return     deliver
-   End

For more questions or assistance, please reach out to your Account Manager.