

Jan 26, 2016 | 5 Minute Read

How to Manage Drupal RESTful Cache Invalidation


Introduction

Previously, we discussed how to set up Drupal RESTful cache and how it works for Drupal with the RESTful module. In this piece, we’ll walk through a sound caching strategy, the very same one we used for Legacy.com. Note: regarding all things Drupal RESTful cache, most of what’s here applies specifically to version 1.x.

Before we look into our caching implementation, let’s go over what causes latency in a typical Memcached-backed RESTful setup.

Memcache Stampeding

Memcache stampeding occurs when a very high load of concurrent requests arrives for the same cache key just after it has expired. Every one of those requests misses the cache and tries to rebuild the same item, so the site slows down momentarily under a burst of reads and writes against your database.

To prevent this, the Memcache module provides the following recommended settings:

[php]$conf['lock_inc'] = 'sites/all/modules/memcache/memcache-lock.inc';
$conf['memcache_stampede_protection'] = TRUE;[/php]

The `lock_inc` setting defines which locking mechanism to use; here it points to the lock implementation provided by the Memcache module. The other setting indicates that we want the Memcache module to implement stampede protection. This creates a semaphore lock in Memcache, preventing database write requests from mounting.

This stampede protection lock can cause a delay for Drupal RESTful cache endpoints when the request is waiting for the lock to release. We fixed this in our custom caching strategy.
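To make the idea concrete, here is a minimal single-process sketch of lock-based stampede protection. It is not the Memcache module's actual code: the class `StampedeGuardedCache`, its methods, and the in-memory arrays are all hypothetical stand-ins, and the current time is passed in explicitly so the behavior is easy to follow.

```php
<?php
// Hypothetical illustration only; none of these names come from the Memcache module.
final class StampedeGuardedCache
{
    private $items = array(); // key => array('value' => mixed, 'expires' => int)
    private $locks = array(); // key => TRUE while a rebuild is in progress

    public function set($key, $value, $expires)
    {
        $this->items[$key] = array('value' => $value, 'expires' => $expires);
    }

    // Returns TRUE if the caller now owns the rebuild lock for $key.
    public function acquireLock($key)
    {
        if (!empty($this->locks[$key])) {
            return FALSE;
        }
        $this->locks[$key] = TRUE;
        return TRUE;
    }

    public function releaseLock($key)
    {
        unset($this->locks[$key]);
    }

    // Fetches $key; when the item has expired, only the lock holder runs $rebuild.
    public function get($key, $rebuild, $ttl, $now)
    {
        $item = isset($this->items[$key]) ? $this->items[$key] : NULL;
        if ($item !== NULL && $item['expires'] > $now) {
            return $item['value']; // Fresh hit, no lock needed.
        }
        if ($this->acquireLock($key)) {
            try {
                $value = call_user_func($rebuild);
                $this->set($key, $value, $now + $ttl);
            } finally {
                $this->releaseLock($key);
            }
            return $value;
        }
        // Someone else is rebuilding: serve the stale copy if one exists.
        return $item !== NULL ? $item['value'] : NULL;
    }
}
```

In this sketch a request that loses the race gets the stale value straight away. An implementation that waits for the lock to be released instead is where the delay described above comes from.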

Writing Your Authentication Mechanism

RESTful ships with cookie, basic (username and password supplied in request headers), and token-based authentication schemes. The RESTful module also allows us to add our own custom authentication scheme.

We opted for the custom authentication route for two reasons:

1. There was a business requirement for API users to use different tokens on a per-site and per-application basis. This drove the decision to write our own token management code.

2. The other reason was convenience. We created a user entity catering specifically to the API consumer and gave it a dedicated set of permissions. We had to authenticate the request as the "API user" to ensure that there weren't any permission leaks, like API consumers viewing unpublished content.

It is important to note that RESTful allows configuring different authentication mechanisms for different endpoints. Producing this mechanism for use with applications doesn’t prevent us from using a username and password basic authentication method in the future.

When executing performance tests against the Drupal content management system (CMS), there was noticeable latency in both cached and uncached API requests. We soon found that the token-based RESTful authentication manager was executing a write to the MySQL database on each request.

Since each request was made with an authentication token, each access was logged as the API user’s last access timestamp in Drupal’s users table. Needless to say, writing the last access timestamp for the website API user was not an intended use case, so we wrote our own authentication manager.

Fortunately, with the Drupal RESTful module, extending the base `RestfulAuthenticationManager` is straightforward. RESTful "injects" the authentication manager object while constructing the plugin, so all we had to do was create our custom authentication manager and inject it in our plugin constructor:

[php]class MyCustomAuthenticationManager extends RestfulAuthenticationManager {
// customize it...
}[/php]

And inside the plugin’s constructor:

[php]public function __construct(array $plugin, RestfulAuthenticationManager $auth_manager = NULL, DrupalCacheInterface $cache_controller = NULL, $langcode = NULL) {
  parent::__construct($plugin);
  $this->authenticationManager = $auth_manager ? $auth_manager : new MyCustomAuthenticationManager();
}[/php]

This performance optimization for our Drupal RESTful cache implementation drove our latency down across all requests as we stopped unnecessary database writes in every request.

This is one of the many instances where the RESTful module allowed almost infinite customization in any part of the API workflow, thanks to `ctools` plugins and some sane OO architecture.

Selective Purging

Every time an entity is created, updated, or deleted (the write operations in CRUD), any payload across any endpoint can change. This implies that we can't have very rigid caching for all of the API endpoints. Instead, we have to create a solution where the API resource specifically clears the Drupal RESTful cache where the entity is served from.

In other words, we needed some form of intelligent caching, where, for instance, if we change any taxonomy terms, only the cached payloads of taxonomy term related endpoints will be invalidated.

For ease of discussion, let’s call this "selective purging."

Here's how cache invalidation for specific endpoints can be implemented:

[php]/**
 * Clears the rendered cache for the given RESTful plugins.
 *
 * @param array $plugins_to_clear
 *   Plugin names to invalidate. An empty array clears every endpoint.
 */
function clear_restful_payload_cache($plugins_to_clear = array()) {
  $plugins = restful_get_restful_plugins();

  foreach ($plugins as $plugin) {
    // Skip plugins that were not requested, unless we are clearing everything.
    if (!empty($plugins_to_clear) && !in_array($plugin['name'], $plugins_to_clear)) {
      continue;
    }
    $handler = restful_get_restful_handler($plugin['resource'], $plugin['major_version'], $plugin['minor_version']);
    $handler->clearResourceRenderedCache();
  }
}[/php]

The `$plugins_to_clear` parameter contains an array of names of the RESTful plugins whose caches need to be invalidated. If it is passed as an empty array, all the endpoints are flushed.
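The filter condition in that function can be tried out without a Drupal bootstrap. The sketch below pulls it into a standalone helper that returns the plugin definitions the purge loop would act on. The helper name `select_plugins_to_clear()` and the sample plugin definitions are hypothetical; only the array keys (`name`, `resource`, `major_version`, `minor_version`) come from the code above.

```php
<?php
/**
 * Returns the plugin definitions that the purge loop would act on.
 * Hypothetical helper for illustration; it is not part of the RESTful module.
 */
function select_plugins_to_clear(array $plugins, array $plugins_to_clear = array())
{
    if (empty($plugins_to_clear)) {
        return $plugins; // An empty filter means "flush every endpoint".
    }
    $selected = array();
    foreach ($plugins as $key => $plugin) {
        if (in_array($plugin['name'], $plugins_to_clear, TRUE)) {
            $selected[$key] = $plugin;
        }
    }
    return $selected;
}

// Made-up definitions, shaped like the arrays the purge loop reads.
$sample_plugins = array(
    'articles' => array('name' => 'articles', 'resource' => 'articles', 'major_version' => 1, 'minor_version' => 0),
    'tag' => array('name' => 'tag', 'resource' => 'tag', 'major_version' => 1, 'minor_version' => 0),
    'landing' => array('name' => 'landing_page__1.0', 'resource' => 'landing_page', 'major_version' => 1, 'minor_version' => 0),
);

$only_tag = select_plugins_to_clear($sample_plugins, array('tag'));
$everything = select_plugins_to_clear($sample_plugins);
```

A dry run like this is a cheap way to check which endpoints a given list of names would hit before wiring it into an entity hook.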

Here's an example of selective purging in action:

[php]/**
 * Implements hook_entity_update().
 */
function mymodule_entity_update($entity, $type) {
  $plugins = array(
    'tag',
    'articles',
  );
  $wrapper = entity_metadata_wrapper($type, $entity);
  $bundle = $wrapper->getBundle();
  // Invalidate two more endpoints if the bundle is "landing page".
  if ($bundle == 'landing_page') {
    array_push($plugins, 'landing_page__1.0', 'panels_display__1.0');
  }
  clear_restful_payload_cache($plugins);
}[/php]
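As more bundles need their own endpoints, the `if` branches in the hook can grow unwieldy. One possible alternative, sketched here under the assumption that the mapping is static, is a lookup table from bundle to endpoint names. The function `endpoints_for_bundle()` is hypothetical; the bundle and endpoint names are the ones from the example above.

```php
<?php
/**
 * Returns the endpoint names to invalidate for an entity bundle.
 * Hypothetical helper; the names mirror the hook example above.
 */
function endpoints_for_bundle($bundle)
{
    // Endpoints that are invalidated on every entity update.
    $always = array('tag', 'articles');
    // Extra endpoints keyed by bundle machine name.
    $per_bundle = array(
        'landing_page' => array('landing_page__1.0', 'panels_display__1.0'),
    );
    $extra = isset($per_bundle[$bundle]) ? $per_bundle[$bundle] : array();
    // De-duplicate in case a bundle repeats one of the default endpoints.
    return array_values(array_unique(array_merge($always, $extra)));
}

$landing_endpoints = endpoints_for_bundle('landing_page');
$article_endpoints = endpoints_for_bundle('article');
```

The hook body then reduces to passing `endpoints_for_bundle($bundle)` to `clear_restful_payload_cache()`.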

Now that we can selectively purge endpoints, we need not worry about Memcache stampede issues.

Selective purging lets us invalidate and rebuild our cache intelligently, instead of doing a full cache invalidation every time a payload changes. This enables us to opt out of stampede protection for specific cache bins, using the configuration below in `settings.php`:

[php]$conf['memcache_stampede_protection_ignore'] = array(
'cache_menu',
'cache_path',
'cache_variable',
);[/php]

This also fixes the latency in requests due to Memcached's lock wait semaphore.

The Many Layers Of Drupal RESTful Cache

With Legacy.com, one of our main priorities when choosing web server infrastructure was a caching strategy that would reduce the load on Drupal and, therefore, reduce costs. With Acquia Cloud, in addition to Memcache, the Varnish cache layer serves cached payloads for a majority of Legacy.com’s traffic.

The Varnish cache is cleared using the Acquia Purge module whenever content is updated within the Drupal CMS. Acquia Purge does a clean sweep of all the API endpoints that are configured based on path aliases, or applies any selective purge logic we have configured. Through `_acquia_purge_service()`, we have the option of doing a selective purge using the Acquia Purge API:

[php]$queue = _acquia_purge_service();
$queue->addPaths(array('api/v1.0/features?category=' . $category_name));
$queue->process();[/php]
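Category names that contain spaces or ampersands need URL encoding before they are placed in a query string. The sketch below builds the list of paths for several categories with PHP's `http_build_query()`. The helper `build_feature_purge_paths()` is hypothetical; only the `api/v1.0/features` path and the `category` parameter come from the example above. Whether the encoded form matches the URL that Varnish actually cached depends on how clients build their requests, so treat this as a starting point.

```php
<?php
/**
 * Builds purge paths for a list of category names.
 * Hypothetical helper; it only prepares strings and does not call Acquia Purge.
 */
function build_feature_purge_paths(array $category_names)
{
    $paths = array();
    foreach ($category_names as $name) {
        // http_build_query() URL-encodes spaces, ampersands, and similar characters.
        $paths[] = 'api/v1.0/features?' . http_build_query(array('category' => $name));
    }
    // Drop duplicates so the same path is not queued twice.
    return array_values(array_unique($paths));
}

$paths = build_feature_purge_paths(array('news', 'news', 'Arts & Culture'));
```

The resulting array can be handed to `addPaths()` in place of the single hand-built path.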

Currently, we are working on an end user facing UI for managing RESTful purging per the above process—it will be open sourced soon.

Setting Cache Headers

A proper RESTful application should serve responses with cache-related metadata in the response headers. This is particularly useful for CDNs. `Cache-Control` and `Expires` are the most notable of these headers. `Cache-Control` specifies who can cache the response and for how long. A typical `Cache-Control` header looks like this:

Cache-Control: public, max-age=900

`Expires` specifies the date and time after which a response is considered stale:

Expires: Mon, 04 Jan 2016 13:19:40 GMT

RESTful allows setting these headers with a call to `setHttpHeaders()`.

Here's a code snippet that sets both headers:

[php]// Fall back to 0 (no caching) when the variable has not been set.
$max_age = (int) variable_get('page_cache_maximum_age', 0);
$cache_header = "public, max-age={$max_age}";
$this->setHttpHeaders('Cache-Control', $cache_header);
$this->setHttpHeaders('Expires', gmdate('D, d M Y H:i:s', REQUEST_TIME + $max_age) . ' GMT');[/php]
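The header values in the snippet above can be computed and checked outside Drupal. This standalone sketch takes the lifetime and the current timestamp as arguments; the function name `build_cache_headers()` is hypothetical, and `$now` stands in for Drupal's `REQUEST_TIME`.

```php
<?php
/**
 * Builds Cache-Control and Expires values for a given lifetime in seconds.
 * Hypothetical helper; $now stands in for Drupal's REQUEST_TIME.
 */
function build_cache_headers($max_age, $now)
{
    // Treat negative or non-numeric lifetimes as "do not cache".
    $max_age = max(0, (int) $max_age);
    return array(
        'Cache-Control' => "public, max-age={$max_age}",
        // gmdate() formats in UTC, which is what an HTTP date requires.
        'Expires' => gmdate('D, d M Y H:i:s', $now + $max_age) . ' GMT',
    );
}

// 15 minutes after 13:04:40 UTC on 4 January 2016.
$headers = build_cache_headers(900, gmmktime(13, 4, 40, 1, 4, 2016));
// $headers['Cache-Control'] is "public, max-age=900"
// $headers['Expires'] is "Mon, 04 Jan 2016 13:19:40 GMT"
```

Keeping both headers derived from the same `$max_age` avoids the two drifting apart, which would leave caches to pick between conflicting lifetimes.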

These can also be configured on a per resource basis by setting these values in the .inc file for the resource.

Shortly after we deployed a RESTful 1.x based setup to production, RESTful 2.x was released. Though everything discussed here applies to 1.x, the concepts are the same for 2.x. Don't worry about the new release too much, as the 1.x version is still supported and used in production. Later on, we'll compare the two versions and see what 2.x additionally has to offer.

About the Author
Lakshmi Narasimhan Parthasarathy, Axelerant Alumni