Any reasonably large corporation today has multiple needs for its web properties:
- Stay relevant and fresh with modern technologies and trends.
- Leverage the modern content management capabilities of a mature system like Drupal.
- Reduce the total cost of ownership and maintenance costs associated with building and running this system.
One of the ways a large organization with multiple web properties can offset maintenance costs is Drupal's multisite system. Multisite is one of the oldest features of the CMS and is still popular in its niche. It allows developers to build multiple, mostly similar sites from a single codebase.
Building such a system is one thing; running it is another. Even a regular Drupal system can get complicated quickly, and a multisite, decoupled solution can get very complex. Here, we will talk about how we handled these requirements during a recent project for a large hospitality company.
What did we need to build?
We have two primary hosting needs: the backend and the frontend. The backend, a multisite Drupal instance, is hosted on Acquia Site Factory, which is designed around multisite and allows site owners to spin up new websites as required. A detailed walkthrough of setting up a Drupal website there is beyond the scope of this article and is covered in Acquia's extensive documentation on Site Factory.
Drupal's multisite makes the hosting requirements for the backend simple, allowing us to run multiple sites from a single instance (but multiple databases). The frontend is not so simple. We need to run as many instances as the number of web properties (websites). It helps to develop all of these websites on a common stack so that it is easier to share the infrastructure resources wherever possible.
We also wanted our websites (frontend) to be "serverless", i.e., we shouldn't have to manage any servers to deliver the frontend assets. For a decoupled Drupal setup, we need two things in this model:
- A static web application, the frontend itself, built and deployed as static assets.
- An API endpoint which is available to that static web application to get data from Drupal.
The first part is beyond the scope of this article. We will mainly talk about the second part and the infrastructure required to run the entire system. Since we need this infrastructure set up multiple times (as we have multiple frontends), we will use Terraform to create these resources on multiple cloud providers to serve different regions.
This is how we did it.
Let's get down to the nuts and bolts. We'll start by defining our requirements.
- The frontend application needs to be deployed to multiple cloud providers.
- This means we need to create the corresponding infrastructure for multiple cloud providers.
- The application should be served in a serverless fashion. Since the static web assets are, well, static, we can just put them on S3 or similar storage with a CDN in front.
- The application needs to talk to Drupal, but not directly, so we need a proxy. Since everything is stateless, a shared proxy could not tell which site a request belongs to, which means we need a separate proxy for each site.
- There are surely workarounds here, but keeping the proxies separate simplifies both the infrastructure and the proxy itself.
There's a lot going on here. Let's take it one step at a time.
Infrastructure for static web assets
This is simple. All we need is an S3 bucket (or similar storage) and a CDN distribution with its origin set to that bucket. In our case, the static assets are the result of an Angular build of our frontend code. As previously mentioned, that is beyond the scope of this article and we won’t get into details here. Anything that can run completely client-side will work.
Various providers offer options to control how the files are served, ranging from simple settings like specifying the index file and the 404 handler to making the bucket or object storage private so that visitors cannot access the files without going through the CDN. Many settings are possible here, but we won’t go too deep into them.
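As a rough sketch, the AWS side of this setup might look like the following Terraform fragment. The bucket name and resource labels are hypothetical; this is an illustration of the pattern, not our exact configuration.

```hcl
# Hypothetical sketch: a private S3 bucket holding the built frontend,
# served only through a CloudFront distribution via an origin access identity.
resource "aws_s3_bucket" "frontend" {
  bucket = "example-site1-dev-frontend" # hypothetical name
  acl    = "private"                    # visitors must go through the CDN
}

resource "aws_cloudfront_origin_access_identity" "frontend" {
  comment = "Lets CloudFront (and only CloudFront) read the bucket"
}

resource "aws_cloudfront_distribution" "frontend" {
  enabled             = true
  default_root_object = "index.html" # the SPA's index file

  origin {
    domain_name = aws_s3_bucket.frontend.bucket_regional_domain_name
    origin_id   = "frontend-s3"

    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.frontend.cloudfront_access_identity_path
    }
  }

  default_cache_behavior {
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    target_origin_id       = "frontend-s3"
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }
}
```

The Aliyun equivalent uses OSS and Alibaba Cloud CDN resources, but the module can be written with the same overall shape.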
The endpoint for the content API
Unless you’re doing an SSR build or a static build like Gatsby (the lines are blurred now anyway), you need an endpoint to access the API. Since we have different environments, we can’t hardcode this URL in the front-end application. Nor can we read it from something like an environment variable, as the application runs entirely in the browser. This means we need to inject the URL at build time.
In Angular, we do this by modifying the environment.prod.ts file just before the build. Jenkins takes care of this for most environments; we only needed to do it manually for the production environment (which we could have automated as well, but that’s another story).
The proxy for the content API
While we could access the Drupal server for content directly, we wanted a layer in between to isolate the content server (Drupal). This even allowed us to use an IP whitelist on the Drupal server for additional security. All content is retrieved through a proxy, which gives us some additional flexibility. One of the biggest wins is that the proxy can be placed behind a CDN, giving us caching at the edge: an instant win for performance and availability. Further, on Amazon Web Services (AWS) we can add it as a second origin on the same CloudFront distribution as the static files, so we don’t have to worry about CORS at all.
The proxy in our case is implemented as a Lambda function on AWS and a Function Compute instance on Aliyun. We implemented the function on Node.js 8, which is supported on both cloud providers, and used TypeScript for additional type checking and for consistency with the rest of the team, which was working on the Angular websites.
The proxy is configured with environment variables pointing to the relevant URLs for the Drupal server (depending on the environment). We could have done this with a single service, but then we would have had to identify the site and environment somehow. Instead, we went with a simpler solution: a Lambda or Function Compute instance for each site’s environment. There is no idle cost for a serverless function (only invocations are charged). The code for all of these instances is identical; the only difference is the endpoint URL, made available via an environment variable.
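In Terraform, a per-site proxy function along these lines might be declared as follows. The function name, variable names, and the `DRUPAL_BASE_URL` environment variable are hypothetical stand-ins for whatever naming scheme a real project uses.

```hcl
# Hypothetical sketch: one proxy Lambda per site and environment.
# The handler code (the zip artifact) is identical everywhere;
# only the Drupal endpoint in the environment variables differs.
variable "proxy_role_arn" {
  description = "IAM role the proxy function runs as"
}

variable "drupal_base_url" {
  description = "Drupal endpoint this proxy forwards to (differs per site and environment)"
}

resource "aws_lambda_function" "content_proxy" {
  function_name = "site1-dev-content-proxy" # hypothetical name
  filename      = "proxy.zip"               # same build artifact for every site
  handler       = "index.handler"
  runtime       = "nodejs8.10"              # Node.js 8, supported on both providers at the time
  role          = var.proxy_role_arn

  environment {
    variables = {
      DRUPAL_BASE_URL = var.drupal_base_url
    }
  }
}
```

An API gateway resource in front of this function completes the picture; the Aliyun module declares a Function Compute service with the same variables.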
The challenge with so many function instances is managing them, which brings us to the next section.
We need to maintain very similar infrastructure resources, as described above, for each site and environment. In our case, we built five websites, each with four environments, and three of these sites were also deployed on Aliyun. This brings the total number of systems to 32. That’s 32 CDN distributions, 32 buckets (or 64 if we count log buckets), 32 Lambda functions, 32 API gateways (which invoke the Lambda functions), and several other supporting resources such as IAM roles, permissions, etc.
It is impractical to manage this manually, which means we need a way to automate it. Terraform is a very good fit for something like this. Terraform allows us to write similar scripts for AWS and Aliyun: they are still separate modules, but they share the same structure and the same “language”. There are platform-specific tools like CloudFormation, but those would give us very different-looking “scripts” for different providers. There is value in having a similar structure, in terms of readability, even when the providers differ.
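Schematically, that structure lets the same interface be instantiated once per site, environment, and provider. The module paths, variable names, and URL below are hypothetical, but they illustrate how the AWS and Aliyun modules stay parallel:

```hcl
# Hypothetical layout: one reusable module per provider, invoked with
# per-site, per-environment variables. The AWS and Aliyun modules contain
# different resources, but expose the same inputs and structure.
module "site1_dev_aws" {
  source          = "./modules/aws-site"
  site_name       = "site1"
  environment     = "dev"
  drupal_base_url = "https://dev-site1.example.acsitefactory.com" # hypothetical
}

module "site1_dev_aliyun" {
  source          = "./modules/aliyun-site"
  site_name       = "site1"
  environment     = "dev"
  drupal_base_url = "https://dev-site1.example.acsitefactory.com" # hypothetical
}
```

In practice we generated the per-site variable values with a script rather than hardcoding a block per site.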
Terraform is fairly new in its space and many best practices are still being defined. Invoking a Terraform run is particularly challenging when there are many parameters to pass in. In our case, we wrote a simple bash script that generates a variable file which we pass to the Terraform run.
During the initial development, while the infrastructure was still being refactored, we ran Terraform as part of our CI pipeline. Eventually, this was unnecessary and we just had to run the Terraform modules once to prepare the environment. The IDs of all the resources created this way were set in a configuration file accessible to all Jenkins runs. This lets us deploy the front-end code from the CI builds.
Terraform maintains the state of all the infrastructure, and since we used the same modules to target distinct environments, we had to make sure that each state was kept distinct. This is another thing the bash script handles for us seamlessly. Such tasks can prove disastrous when done by hand; it is always a good idea to automate whatever we can.
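One common way to keep states distinct (the exact mechanism here is an assumption; our bash script did the equivalent) is a separate remote-state key per site and environment, supplied as a partial backend configuration at init time. The bucket name and key layout below are hypothetical:

```hcl
# Hypothetical partial backend configuration, e.g. site1-dev.backend.hcl,
# selected by a wrapper script via:
#   terraform init -backend-config=site1-dev.backend.hcl
bucket = "example-terraform-state"     # shared state bucket
key    = "site1/dev/terraform.tfstate" # unique key per site + environment
region = "us-east-1"
```

Because the key differs per site and environment, the same modules can be applied 32 times without the runs clobbering each other's state.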
These are some of our learnings from this project.
This project brought with it its fair share of complexity as well as learning. One of the most important learnings we took away from this relates to the team structure.
> It is not surprising that a decoupled solution is efficient not when we have efficient communication between the software systems, but when we have efficient communication among the teams.
This means it is more important for the teams to agree on an API definition than on implementation details.
More specific to the infrastructure: contemporary practices like Infrastructure as Code (IaC) may not fit into traditional workflows that have matured over time. Before implementing cutting-edge techniques such as running Terraform from a CI pipeline, there must be buy-in across the operations and development teams. Specifically, questions of permissions and access to resources must be discussed and agreed upon. For example, creating resources with granular access also involves creating IAM roles specific to that environment, which means Terraform needs to run with an IAM role that has permission to create other IAM roles.
Also, cutting-edge technologies may not be supported by all providers. In our case, Aliyun support in Terraform was very basic: we had to do several things manually, such as setting environment variables and configuring the API gateway backend. Even the Aliyun CLI tool didn’t support setting this configuration at the time, though it does at the time of writing this article.
This article did not cover everything in detail; it presented an overview. My intention is to cover the individual details in separate articles. Meanwhile, I suggest you watch this session from DrupalCon Nashville that describes a very similar decoupled setup.