Advanced cache invalidation strategies

“There are only two hard things in Computer Science: cache invalidation and naming things.” -- Phil Karlton

Today, I'll write about the harder of the two: cache invalidation. This used to be fairly straightforward. Most websites had a simple mapping between their content repository and their URL structure, which made it easy to purge URLs when the content changed.

Today, the situation for many sites is a bit more complex. It is not uncommon for an article to be presented in six or ten different variants, under multiple URLs. Some sites operate with separate URL spaces for desktop, mobile and tablet. In addition, an article might be partially referenced on other pages: front or section pages might refer to a large number of articles, and the article itself might have content from other articles embedded in it. Keeping track of all these relations can be a pain.

One possible solution for this has been to use the ban function in Varnish. You do this by having the content presentation system list the article IDs on the object in a header, like this:

X-Article-IDs: 39474, 39232,38223,32958

Then you can issue a ban matching on an article ID, and it will invalidate every object whose header lists that ID:

ban obj.http.x-article-ids ~ "(^|\D)$ID(\D|$)"
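
If you want the content management system to trigger this over HTTP rather than through the management interface, you can expose a custom method in VCL. The following is only a minimal sketch, assuming Varnish 4 VCL syntax and a hypothetical X-Ban-Article-Id request header carrying the ID:

acl purgers { "localhost"; }

sub vcl_recv {
    if (req.method == "BAN") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Not allowed"));
        }
        # Build the same expression as above from the ID supplied in the
        # (hypothetical) X-Ban-Article-Id header.
        ban("obj.http.x-article-ids ~ (^|\D)" +
            req.http.x-ban-article-id + "(\D|$)");
        return (synth(200, "Ban added"));
    }
}

Anything that can send an HTTP request can then invalidate by article ID, for example with curl -X BAN -H "X-Ban-Article-Id: 39232" against the cache.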

Very nice and very powerful. However, there is a flip side to bans in Varnish: they are quite expensive. Every ban you issue will be matched against every object in the cache that is older than the ban itself. This happens either at delivery time or in a background worker thread commonly referred to as the ban lurker. So, in terms of scalability we're looking at N x M, where N is the number of objects in memory and M is the number of bans: with a million cached objects and a thousand outstanding bans, that is up to a billion regex matches.

This is not a problem if you only issue a ban from time to time, but if you are dealing with a lot of them and you have many objects in memory, they will require a lot of CPU time. In addition, there are locks that will be subject to contention. For instance, it might take several seconds on a busy site to acquire the lock necessary to issue a ban.

Enter hashtwo (also known as hashninja)

Hashtwo was developed earlier this year to solve the scalability issues we've seen with bans. It maintains a second hash, alongside the main hash that links objects and content in Varnish. You can put whatever you want into it, and it will maintain a many-to-many relationship between keys and objects.

So, in the example above, the content presentation layer could emit the same list of articles in a header, and hashtwo will maintain pointers from article IDs to cached objects. The hash is pretty efficient and adds very little overhead.

Then you can easily create an HTTP method, similar to the one you use for PURGE, that takes an article ID as input and kills all the objects that reference that article.
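
As a rough sketch of how the pieces could fit together (this is not the actual hashtwo VCL interface; the vmod name and the add_keys()/purge() calls below are hypothetical placeholders, written in Varnish 4 VCL syntax), it might look something like this:

import hashtwo;    # hypothetical vmod name, for illustration only

acl invalidators { "localhost"; }

sub vcl_backend_response {
    # The content presentation layer lists the articles it used, e.g.
    # "X-Article-IDs: 39474, 39232, 38223, 32958". Register each ID as a
    # secondary key pointing at this object.
    hashtwo.add_keys(beresp.http.x-article-ids);    # hypothetical call
}

sub vcl_recv {
    if (req.method == "PURGEARTICLE") {
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        # Kill every cached object that registered this article ID.
        hashtwo.purge(req.http.x-article-id);        # hypothetical call
        return (synth(200, "Purged"));
    }
}

The important difference from bans is that the lookup goes straight to the affected objects through the secondary hash, rather than testing an expression against every object in the cache.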

The hashtwo module is only available to our Gold and Platinum subscribers. If you have a subscription and you want to give it a spin, send an email to our support team and we'll help you get started.

Oh, and if you have an opinion on the name, we'd like to hear your view here in the comments. We're kind of split between hashtwo, hashninja, gatlingpurge and order66. We don't mind a little bikeshedding from time to time. :-)

Photo is (c) 2009 Jeyhun Pashayev and used under a CC license.

Comments

This is a very interesting topic. I'm still learning how to use Varnish because I plan to use it in production, but I have a small concern: how do I ban the old list(s) of articles when I publish a new article? Of course, this new article has never appeared in any of those lists before.

I'll use your example:

content presentation list:
X-Article-IDs: 39474, 39232,38223,32958

Then I create a new article; let's say its ID is 45000 and it should appear in the above "content presentation list". How do I ban or invalidate that list?

Thanks.

Per Buer:

That is a really good question. There is no perfect solution to this, as there is no way for Varnish to know that there is a relationship between the existing lists and the new article.

If you, instead of having article ID relationships, have relationships based on "tags", things might get easier, as you then have a way to invalidate based on future relationships. Or you could use both, of course.

Cross-posting from http://www.smashingmagazine.com/2014/04/23/cache-invalidation-strategies..., since Smashing Magazine is slow at approving comments:

Nice article, Per — thanks for getting this in front of a big audience — the more people are aware, the faster the web will become :)

I found it particularly interesting that you specifically called out tagging ("Tagging Content For Bans") … because that is precisely what Drupal 8 will ship with!
Drupal 8 sets "cache tags" on each "page part": the cache tags associated with the things (entities) that were used to generate that "page part" (an article, a user, an image gallery, a comment…). As the page is assembled, the cache tags are bubbled up, which allows Drupal 8 to send a comprehensive X-Drupal-Cache-Tags header (see https://drupal.org/node/2222835 for details).
This is precisely what you've described with X-depends-on: 3483 4376 32095 28372, but slightly more advanced: X-Drupal-Cache-Tags: node:5 user:33 file:57 taxonomy_term:42 taxonomy_term:1337.

I have one (broad) question for you: how does this scale? Could you share numbers around that? Because surely, you will have more data and insight on this than anyone else!
For example, which is the primary bottleneck: the number of tags per response (e.g. >100), or the number of unique tags across all responses cached by Varnish? (Say 10 million.) Or maybe even something else, such as a high ban frequency being problematic? (For example: Drupal 8 will require a ban for every new comment that is posted, and comment frequency varies greatly by site, of course.)

Over at https://www.varnish-software.com/blog/advanced-cache-invalidation-strate..., you say:

Hashtwo was developed earlier this year to solve the scalability issues we've seen with bans. It maintains a second hash, alongside the main hash that links objects and content in Varnish. You can put whatever you want into it, and it will maintain a many-to-many relationship between keys and objects.

Can you share some performance numbers on how Hashtwo improves upon regular bans?

Per Buer:

So cool to hear this is going into D8. :-)

Ban processing is dependent on two things: the number of objects in cache and the number of bans. So, if you have only a few thousand objects in cache, you can basically ban all you like. Same if you only do a few bans - it won't matter.

As long as the bans do not rely on req.*, so the ban lurker can chug along and dispose of most of them, I think it would scale fairly well. I think a site like nytimes.com or wapo could easily be powered by bans for cache invalidation as long as the user-generated content is kept out of the mix. It's fairly easy to spot when it happens: Varnish will start using CPU, response times will increase into millisecond territory, and the counters for regex matches will go to 1M/sec or above.
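
To illustrate the difference, here is a minimal sketch, assuming Varnish 4 VCL syntax and a hypothetical X-Ban-Url request header; a ban that only references obj.* can be handled by the lurker, while one that references req.* can only be tested when a client request hits the object:

sub vcl_backend_response {
    # Store the request URL on the object so bans can be expressed
    # purely in terms of obj.* and stay lurker-friendly.
    set beresp.http.x-url = bereq.url;
}

sub vcl_recv {
    if (req.method == "BAN") {
        # Lurker-friendly: references only obj.*, so the background
        # ban lurker can test it and retire it over time.
        ban("obj.http.x-url ~ " + req.http.x-ban-url);

        # By contrast, ban("req.url ~ ...") depends on req.* and can
        # only be evaluated at delivery time, so it lingers on the list.

        return (synth(200, "Ban added"));
    }
}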

Hashtwo, or Hashninja as we've renamed it now, makes things scale by using a hash structure to keep the tag references in. I think lookups in the hash structure we use are O(1). So it removes the need for ban matching at the cost of a tiny bit of memory.

We migrated a major site with user-contributed content from bans to hashninja and the response time dropped from 15ms to below 1ms.

It's already in! :)

If I understand you correctly, you're saying:

  1. 1 million invalidations per second is not a problem ("will go into 1M/sec or above" looks a bit ambiguous to me).
  2. bans will scale fine for the ~99% smallest sites, and even when that ~1% of most dynamic sites gets in trouble, it's only relative trouble, because 15 ms is still pretty fast.

I'd cry tears of joy if Drupal would serve most pages in <=15 ms! Looks like hashninja is only needed for the biggest, most complex, most dynamic websites out there. It's of course awesome to know that this exists (and if a website needs it, they should be earning enough to pay you guys to get to sub-millisecond response times!) — thanks for sharing your insight!

Per Buer:

1M matches per second is not a problem; note that these are matches, not invalidations. Every ban must match against all objects in cache that are older than the ban, so there are quite a few matches to be done if you have, say, 20M objects in memory.

But, yes, bans will scale fine for 99%. Sites with user-contributed content might run into problems at volume, yes.