Varnish monitoring
There are multiple tools and metrics available to monitor a Varnish installation. This tutorial provides information on important counters that will assist in monitoring vital aspects of a Varnish installation.
Several commands give access to additional information. Each option is explained the first time it is encountered, but you can find a more complete description in the man page of each tool.
varnishlog
varnishlog presents transaction logs in the most verbose form available, which makes it a prime source of information for debugging.
Save all the current logs into a file:
# -d: dump all the logs present in the buffer, then exit
# -g raw: don't group records (lines) by transactions, just grab everything
# -w FILE: don't print logs, save them to FILE instead
varnishlog -d -g raw -w tmp/varnishlog.raw
Slow client responses:
# -g request: group logs by request, including dependencies.
# Useful to link a client request to a backend
# -q QUERY: only show transactions matching QUERY, in this case,
# responses that took more than a second to be
# delivered
varnishlog -d -g request -q "Timestamp:Resp[2] > 1.0"
Slow backend responses (more than one second to read):
varnishlog -d -g request -q "Timestamp:Beresp[2] > 1.0"
Requests that spent any time on the waiting list:
varnishlog -d -g request -q "Timestamp:Waitinglist[2] > 0.0"
Backend failures, 5XX responses, and slow responses, with rate limiting:
# -R RATE: limit the output rate, here to at most 100 transactions per minute
varnishlog -d -g request -q "RespStatus ~ '^5' or Timestamp:Resp[3] > 10.0 or Error" -R 100/1m
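The timing queries above share one pattern: a Timestamp record, an event label, and a field index. In a Timestamp record, field 1 is the absolute time, field 2 is the time elapsed since the start of the transaction, and field 3 is the time since the previous timestamp. As a minimal sketch, the helper below builds such a query for an arbitrary event and threshold. The name slow_query is hypothetical and not part of Varnish.

```shell
# slow_query EVENT SECONDS
# Print a VSL query matching transactions whose EVENT timestamp was logged
# more than SECONDS after the start of the transaction (field 2).
# slow_query is an illustrative helper, not a Varnish command.
slow_query() {
    printf 'Timestamp:%s[2] > %s\n' "$1" "$2"
}

# Example (requires a running Varnish instance):
#   varnishlog -d -g request -q "$(slow_query Resp 0.5)"
```

Keeping the threshold as an argument makes it easy to tighten or relax the definition of "slow" without retyping the query.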
varnishadm
The varnishadm utility establishes a CLI connection to varnishd (the Varnish daemon).
The following commands are useful for troubleshooting a Varnish instance via varnishadm:
Show an overview of Varnish runtime parameters:
varnishadm param.show
Display the ban list, containing the ban expressions that are used to invalidate the cache:
varnishadm ban.list
Show the last recorded panic, if any:
varnishadm -- panic.show
Display the health of the various backends in Varnish:
varnishadm -- backend.list -p
varnishstat
The varnishstat utility collects and displays the counters and metrics of a Varnish instance, accumulated since startup.
Run varnishstat once and exit:
# -1: print the counters to stdout, instead of using the interactive interface
varnishstat -1
varnishstat also accepts filters, which can be applied as follows:
# -f GLOB: only show counters whose names match GLOB
varnishstat -1 -f 'MAIN.*'
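For scripting, it is often enough to pull a single value out of this output. The sketch below assumes the usual varnishstat -1 line layout (counter name, value, per-second rate, description) and reads that output from stdin. The function name counter_value is hypothetical.

```shell
# counter_value NAME
# Read `varnishstat -1` output on stdin and print the value (second column)
# of the counter called NAME. Exits non-zero if the counter is not present.
# counter_value is an illustrative helper, not a Varnish command.
counter_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END        { exit !found }
    '
}

# Example (requires a running Varnish instance):
#   varnishstat -1 -f 'MAIN.*' | counter_value MAIN.client_req
```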
Important Counters
Here’s a selection of important counters, but you can check the varnish-counters man page for the full listing.
man varnish-counters
MAIN COUNTERS (MAIN.*)
client_req
Number of parsable client requests received.
cache_hit
Number of cache hits.
cache_miss
Number of cache misses.
threads_limited
Number of times more threads were needed, but limit was reached in a thread pool.
n_object
Number of HTTP objects (headers + body, if present) in the cache.
bans
Number of all bans in the system, including bans superseded by newer bans and bans already checked by the ban-lurker.
fetch_failed
Number of backend content fetches that failed.
sess_queued
Number of sessions queued because no thread was immediately available. If this counter keeps growing, consider increasing the thread_pool_min parameter.
sess_dropped
Number of sessions dropped because varnishd hit the maximum thread queue length. Consider increasing the thread_queue_limit parameter to drop fewer sessions.
exp_mailed
Number of objects mailed to expiry thread for handling.
exp_received
Number of objects received by expiry thread for handling.
threads
Total number of threads being used by Varnish.
n_lru_nuked
Number of least recently used (LRU) objects thrown out to make room for new objects. If this is zero, there is no reason to enlarge your cache. Otherwise, your cache is evicting objects due to space constraints. In this case, consider increasing the size of your cache.
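A figure commonly derived from these counters is the cache hit ratio, cache_hit / (cache_hit + cache_miss). The sketch below computes it from varnishstat -1 output read on stdin. It is a simplification: it only considers these two counters, and because counters accumulate since startup, the result is a lifetime average rather than a current rate. The function name hit_ratio is hypothetical.

```shell
# hit_ratio
# Read `varnishstat -1` output on stdin and print the cache hit ratio as a
# percentage with two decimals. Exits non-zero when there are no hits or
# misses to compute a ratio from.
# hit_ratio is an illustrative helper, not a Varnish command.
hit_ratio() {
    awk '
        $1 == "MAIN.cache_hit"  { hit = $2 }
        $1 == "MAIN.cache_miss" { miss = $2 }
        END {
            total = hit + miss
            if (total == 0) exit 1
            printf "%.2f\n", 100 * hit / total
        }
    '
}

# Example (requires a running Varnish instance):
#   varnishstat -1 -f 'MAIN.cache_*' | hit_ratio
```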
MSE COUNTERS (MSE.*)
mse.c_bytes
Bytes allocated.
mse.c_freed
Bytes freed.
mse.g_alloc
Allocations outstanding.
mse.g_bytes
Bytes outstanding.
mse.g_space
Bytes available.
mse.insert_timeout
Number of inserts that timed out.
mse.n_lru_nuked
Number of LRU nuked objects.
mse.n_lru_moved
Number of LRU move operations.
mse.c_memcache_hit
Stored objects cache hits.
mse.c_memcache_miss
Stored objects cache misses.
mse.g_ykey_keys
Number of YKeys registered.
mse.c_ykey_purged
Number of objects purged with YKey.
SMA COUNTERS (SMA.*)
g_bytes
Number of bytes allocated from the storage.
g_space
Number of bytes left in the storage.
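Together, g_bytes and g_space describe how full a storage is: used / (used + free). The sketch below computes that percentage from varnishstat -1 output read on stdin. It assumes the counters appear as SMA.<name>.g_bytes and SMA.<name>.g_space, with s0 as the storage name in the example; adjust the name to match your configuration. The function name sma_usage is hypothetical.

```shell
# sma_usage STORE
# Read `varnishstat -1` output on stdin and print how full the SMA storage
# called STORE is, as a percentage with one decimal.
# Exits non-zero if the storage reports neither used nor free bytes.
# sma_usage is an illustrative helper, not a Varnish command.
sma_usage() {
    awk -v store="$1" '
        $1 == ("SMA." store ".g_bytes") { used = $2 }
        $1 == ("SMA." store ".g_space") { avail = $2 }
        END {
            total = used + avail
            if (total == 0) exit 1
            printf "%.1f\n", 100 * used / total
        }
    '
}

# Example (requires a running Varnish instance; "s0" is an assumed name):
#   varnishstat -1 -f 'SMA.*' | sma_usage s0
```

A storage that stays close to 100% while n_lru_nuked keeps increasing suggests the cache is too small for the working set.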