Drupal is a popular web-based content management system designed for small to large enterprises with needs such as complex workflows, multilingual content, and enterprise integrations. An increasing number of organizations move to Drupal from their current systems every year and with richer features being added to Drupal 9 and planned for 10, the growth will only accelerate. This means that migrations to Drupal remain an ever-popular topic.
Drupal provides a powerful and flexible migration framework that allows us to “write” migrations in a declarative fashion.
The migration framework supports a variety of sources and the ability to specify custom sources and destinations. Furthermore, the framework provides a powerful pipelined transformation process that allows us to map source content to destination fields declaratively.
Thanks to this framework, migration is more of a business challenge rather than a technical one. The overall process (or workflow) of the migration may differ depending on various business needs and attributes of the current (source) system. Depending on the type of migration, we may plan to reuse in-built migrations (in core or contrib), selectively adapt migrations from different sources, or entirely write new migrations. Further, depending on the source, we may choose to migrate incrementally or at one-time.
The Drupal migration framework is composable, which is why it can be used flexibly in many scenarios. The basic building entity (not to be confused with Drupal entities) is called, migration. Each migration is responsible for bringing over one discrete piece of content from the source to the destination. This definition is more technical than a business one as a “discrete piece” of content is determined by Drupal’s internal content model and may not match what you might expect as an editor.
For example, an editor may see a page as a discrete piece of content, but the page content type may have multiple files or paragraph fields or term references, each of which has to be migrated separately. In this case, we would have a separate migration for files, paragraph fields, and so on, and then eventually for the page itself. The benefit of defining migrations this way is that it allows the migrate framework to handle each of these pieces of the content itself, providing features like mapping IDs and handling rollbacks.
Correspondingly, a migration specifies a source, a destination, and a series of process mappings that define the transformations that a piece of content may go through while being mapped from a source field to a destination field. These are called plugins (because of their internal implementation). We might use different source plugins depending on the source system with the common ones provided by Drupal core (for sources such as Drupal, SQL databases, etc.).
There are dozens of contributed modules available for other systems such as WordPress, CSV files, etc. Similarly, process plugins are diverse and influential in allowing a variety of transformations on the content within the declarative framework. On the other hand, destination plugins are limited because they only deal with Drupal entities.
The Drupal migrate framework supports incremental migrations as long as the source system can identify a certain “highwater mark” which indicates if the content has changed since a recent migration.
A common “highwater mark” is a timestamp indicating when the content was last updated.
If such a field is not present in the source, we could devise another such field as long as it indicates (linearly) that a source content has changed. If such a field cannot be found, then the migration cannot be run incrementally, but other optimizations are still available to avoid a repeat migration.
Handling dependencies and updates
The migrate framework does support dependencies between different migrations, but there are instances where there might be dependencies between two content entities in the same migration. In most cases, the migrate framework can transparently handle this by creating what are known as “stubs.” In more complex cases, we can override this behavior to gain more adequate control on stub creation.
As discussed in the previous section, it is better to use “highwater marks” to handle updates but may not be available in some cases. For these, the migrate framework stores a hash of the source content to track if the migration should be run. Again, this is handled transparently in most cases but can be overridden when required.
Rollbacks and error management
As long as we follow the defined best practices for the migrate framework, it handles fancier features such as rollbacks, migration lookups, and error handling. Migrate maintains a record of each content piece migrated for every migration, its status, hashes, and highwater marks. It uses this information to direct future migrations (and updates) and even allow rollbacks of migrations.
Understanding the content
Another important part of the equation is the way content is generated. Is it multilingual? Is it user-generated content? Can content be frozen/paused during migration? Do we need to migrate the revision history, if available? Should we be cleaning up the content? Should we ignore certain content?
Most of these requirements may not be simple to implement, depending on the content source. For example, the source content may not have any field to indicate how the content is updated and in those cases, an incremental migration may not be possible. Further, if it’s impossible to track updates to source content using simple hashes, we may have to either ignore updates or update all content on every migration. Depending on the size of the source and transformations on the content, this may not be possible and we have to fall back to a one-time migration.
The capabilities of the source dictate the overall migration strategy.
Filtering content is relatively easy. Regardless of the source, we can filter or restructure the content within the migration process or in a custom wrapper on a source plugin. These requirements may not significantly impact the migration strategy.
Refactoring the content structure
A migration can, of course, be a straightforward activity where we map the source content to the destination content types. However, a migration is often a wonderful opportunity to rethink the content strategy and information flow from the perspective of end-users, editors, and other stakeholders. As business needs change, there is a good chance that the current representation of the content may not provide for an ideal workflow for editors and publishers.
Therefore, it is essential to look at the intent of the site and user experience it provides to redefine what content types make sense then and in the near future. At this stage, we also look at common traits that distinguish the content we want to refactor and write mappings accordingly. With this, we can alter the content structure to split or combine content types, split or combine fields, transform free-flowing content to have more structure, and so on. The possibilities are endless, and most of these are simple to implement.
Furthermore, in many cases, the effort involved in actually writing the migration is not significantly different.
Drupal to Drupal migration
This is usually the most straightforward scenario. The core Drupal migrate framework already includes source plugins necessary for reading the database of an older version of Drupal (6 or 7). In fact, if the intention is to upgrade to Drupal 8 or 9 from Drupal 6 or 7, then the core provides migrations to migrate configuration as well as content. This means that we don’t even need to build a new Drupal site in many simple cases. It is simply a question of setting up a new Drupal 8 (or 9) website and running the upgrade.
Drupal is often not used for simple cases or for any non-trivial site needs rebuilding.
A typical example is “views,” which are not covered by migrations. Similarly, page manager pages, panels, etc., need to be rebuilt as they cannot be migrated. Further, Drupal 8 has brought improvements, and updated techniques to build sites and the only option, in that case, is to build the functionality with the new techniques.
In some cases, it is possible to bring over the configuration selectively and remove the features you want to rebuild using a different system (or remove them altogether). This mix-and-match approach enables us to rebuild the Drupal site rapidly and also use the migrations provided in core to migrate the content itself. Furthermore, many contributed modules augment or support the core migration, which means that Drupal core can transparently migrate certain content belonging to contributed modules as well (this often happens in the case of field migrations). If the modules don’t support a migration path at all, this would need to be considered separately, similar to migration from another system (as explained in the next section).
Incremental migrations are simpler to write in case of Drupal-to-Drupal migration as the source system is Drupal and it supports all the metadata fields such as timestamps of content creation or updates. This information is available to the migrate framework, which can use it to enable incremental migrations. If the content is stored in a custom source within the legacy Drupal system and it does not have timestamps, a one-time migration may have to be written in that case. See the previous section on incremental migrations for more details.
While Drupal-to-Drupal migrations can be very straightforward and even simple, it is worth looking into refactoring the content structure to reflect the current business needs and editorial workflow in a better way. See the section on “Refactoring the content structure” for more details.
Migration to Drupal from another system
Migrating from another popular system (such as WordPress) is often accomplished by using popular contrib modules. For instance, there are multiple contrib modules for migrating from WordPress, each of which migrates from a different source or provides different functionalities. Similarly, contrib modules for other systems may offer a simple way to define migration sources.
Alternatively, the migrate framework can directly use the database source to retrieve the content. Drupal core provides a source that can read all MySQL compatible sources and there are contributed modules that allow reading from other databases such as MSSQL.
Similar to the Drupal migration source, features such as incremental migrations, dependencies, and update tracking may be used here as long as their conditions are satisfied. These are covered in detail in earlier sections.
Check out the case study that outlines migrating millions of content items to Drupal from another system.
Migration from unconventional sources
Some systems are complex enough to present a challenge during migration, even with the sophistication of source plugins and custom queries. Or there may be times when the content is not conventionally accessible. In such scenarios, it may be more convenient to have an intermediate format for content such as a CSV file, XML file, or similar formats. These source plugins may not be as flexible as a SQL source plugin (as advanced queries or filtering may not be possible over a CSV data source). However, with migrate’s other features, it is still possible to write powerful migrations.
Due to limitations of such sources, some of the strategies such as incremental migration may not be as seamless; nevertheless, in most cases, they are still possible with some work and automation.
An extreme case is when content is not available in any structured format at all, even as CSVs. One common scenario is when the source is not a system per se, but just a collection of HTML files or an existing web site. These cases are even more challenging as extracting the content from the HTML could be difficult. These migrations need higher resilience and extensive testing. In fact, if the HTML files vary significantly in their markup (it’s expected when the files are hand-edited), it may not be worth trying to automate this. Most people prefer manual migration in this case.
Picking a strategy
Wherever possible, we would like to pick all the bells and whistles afforded to us by the migrate framework, but, as discussed previously, a lot of it depends on the source. We would like every migration to be discrete, efficient with incremental migration support, track updates, and directly use the actual source of the content. However, for all of this to be possible, certain attributes of the source content system must be met as explained in the “Understanding the content” section.
The good news is that we often find a way to solve the problem and it is almost always possible to find workarounds using the Drupal migrate framework.
Hussain Abbas, Director of PHP & Drupal Services
Hussain is a calm ambivert who'll surprise you with his sense of humor (and sublime cooking skills). Our resident sci-fi and fantasy fanatic.