Migrations: Writing “id_map” plugins

Tags: Drupal 8


Recently, on a project, we needed to import content from CSV files. We used Drupal’s Migration API for this, as it provides a fully featured framework for importing and migrating content.

During the migration, we wanted to find out, based on one of its fields, whether a given content item had already been migrated.

To do this, we need to look up that field value on the destination site’s nodes and, if a node with that value exists, return its node ID.

We implemented this in two steps:

  1. Implementing our own ID map plugin.
  2. Telling the migration templates (or migration config) to use the newly created plugin.

What are ID Map plugins?

Migration’s ID Map plugins keep track of the relationship between source content and migrated (destination) content.

Typically, when we write a migration, we declare a source ID: a unique ID in the source database, CSV file, or any other source format.

In the migration template below, notice the “keys” attribute in the “source” block.

source:
  plugin: csv
  path: 'public://example.csv'
  header_row_count: 1
  keys:
    - 'Item-ID'
  fields:
    Item-ID: Unique identifier for each page.
    Page name: Name of the page.
    Page category: Page category.

Here, “keys” specifies the unique ID in the source CSV file. The ID Map plugin then maintains a relationship between each “Item-ID” value and the nid of the node it was migrated to. By default, this relationship is stored in database tables whose names start with migrate_map_.

For example, if you have a migration with the ID node_product, the table name will be something like migrate_map_node_product.
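The following PHP sketch models what such a table records for the node_product example. It is not Drupal code. The InMemoryIdMap class and its methods are hypothetical. The sourceid1/destid1 column names follow the real map tables, but core's tables also store extra data such as a source ID hash and the row status.

```php
<?php

// Hypothetical, simplified stand-in for a migrate_map_* table; not Drupal code.
final class InMemoryIdMap
{
    private string $migrationId;

    /** @var array<int, array<string, mixed>> Rows of sourceidN/destidN values. */
    private array $rows = [];

    public function __construct(string $migrationId)
    {
        $this->migrationId = $migrationId;
    }

    // Core builds the table name from the migration ID (it also normalizes
    // and truncates it); this sketch only prepends the prefix.
    public function tableName(): string
    {
        return 'migrate_map_' . $this->migrationId;
    }

    // Records that one source item (e.g. ['Item-ID' => 'P-100']) was
    // migrated to the given destination IDs (e.g. [42], a node ID).
    public function saveMapping(array $sourceIds, array $destinationIds): void
    {
        $row = [];
        $i = 1;
        foreach ($sourceIds as $value) {
            $row['sourceid' . $i] = $value;
            $i++;
        }
        $i = 1;
        foreach ($destinationIds as $value) {
            $row['destid' . $i] = $value;
            $i++;
        }
        $this->rows[] = $row;
    }

    public function rows(): array
    {
        return $this->rows;
    }
}

$map = new InMemoryIdMap('node_product');
$map->saveMapping(['Item-ID' => 'P-100'], [42]);
echo $map->tableName(), PHP_EOL; // migrate_map_node_product
```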

Implementing ID Map plugin

Looking at the default “id_map” plugin, Drupal\migrate\Plugin\migrate\id_map\Sql, you will come across a few methods that are useful here: lookupSourceId(), lookupDestinationId(), and lookupDestinationIds(). lookupSourceId() finds the source IDs for given destination ID values. The lookupDestination*() methods work the other way round: they find the destination IDs for given source ID values.

As we want to check whether the current item has already been migrated, we need lookupDestinationId().
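Here is a standalone sketch of the two lookup directions over a few hypothetical map rows. The function names are made up for illustration. What matters is the return shapes, which are modeled on core's Sql plugin. A destination lookup yields one array of destination values per match. A source lookup yields the source IDs keyed by source field name, assumed here to be 'Item-ID'.

```php
<?php

// Hypothetical map-table rows; a real migrate_map_* table has more columns.
$rows = [
    ['sourceid1' => 'P-100', 'destid1' => 42],
    ['sourceid1' => 'P-101', 'destid1' => 43],
];

// Source ID -> destination IDs. Like core's lookupDestinationIds(), it returns
// one array of destination ID values per matching row, e.g. [[42]].
function lookup_destination_ids(array $rows, string $sourceId): array
{
    $results = [];
    foreach ($rows as $row) {
        if ($row['sourceid1'] === $sourceId) {
            $results[] = [$row['destid1']];
        }
    }
    return $results;
}

// Destination ID -> source IDs, the reverse direction (cf. lookupSourceId()),
// keyed by the source field name; 'Item-ID' is an assumption of this sketch.
function lookup_source_id(array $rows, int $destinationId): array
{
    foreach ($rows as $row) {
        if ($row['destid1'] === $destinationId) {
            return ['Item-ID' => $row['sourceid1']];
        }
    }
    return [];
}
```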

To implement our own plugin, we create a new plugin class in our module’s namespace. The class should extend the “Sql” class (Drupal\migrate\Plugin\migrate\id_map\Sql). That way, we inherit all the remaining functionality of the default Sql id_map plugin.

Let’s say your module is my_module. We'll name our plugin class “ContentUpdate”.

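Create a new file called “ContentUpdate.php” in the Drupal\my_module\Plugin\migrate\id_map namespace (that is, src/Plugin/migrate/id_map/ContentUpdate.php inside the module). Give the class a @PluginID("content_update") annotation. The migration template uses that plugin ID to refer to the class.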

Override the “lookupDestinationId()” method given below.

public function lookupDestinationId(array $source_id_values) {
   $results = $this->lookupDestinationIds($source_id_values);
   return $results ? reset($results) : [];
}
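The expression in that override picks the first match. The standalone sketch below repeats the same expression, using a hypothetical function name. It shows why lookupDestinationIds() should return nested arrays such as [[12]]. With an array return type, a flat list like [12, 15] would make reset() hand back the integer 12 and raise a TypeError. Drupal's destination plugins likewise expect an array of destination ID values.

```php
<?php

// Same expression as the lookupDestinationId() override: the first match
// from a lookupDestinationIds()-style result, or an empty array if none.
function first_destination_id(array $results): array
{
    return $results ? reset($results) : [];
}
```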

 
Next, override the lookupDestinationIds() method in the same “ContentUpdate.php” class. This is where our business logic decides whether the content has already been migrated.

Implement your business logic there to find the node ID for a given source value. A partial implementation is given below.

public function lookupDestinationIds(array $source_id_values) {
  if (empty($source_id_values)) {
    return [];
  }

  // Check whether an existing node already has the same page unique ID.
  if (isset($source_id_values['Item-ID'])) {
    // Instead of calling \Drupal::entityTypeManager() directly, the entity
    // type manager should ideally be injected via dependency injection.
    $nodes = \Drupal::entityTypeManager()->getStorage('node')->loadByProperties([
      'field_page_unique_id' => $source_id_values['Item-ID'],
    ]);
    // loadByProperties() returns an array keyed by node ID, which is empty
    // (never NULL) when nothing matches.
    if (!empty($nodes)) {
      // Use the same shape as the base class: one array of destination ID
      // values per match, e.g. [[12]].
      return array_map(function ($nid) {
        return [$nid];
      }, array_keys($nodes));
    }
  }

  // No matching node was found: fall back to the base class, which looks
  // the source IDs up in the migrate_map_* table.
  return parent::lookupDestinationIds($source_id_values);
}

This way, we can look up existing IDs on the destination site. Refer to this gist for the entire snippet.
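The self-contained PHP sketch below shows the lookup order in isolation. It checks existing nodes by field value first, then falls back to the base class's map-table lookup. SqlIdMapStub and ContentUpdateSketch are hypothetical stand-ins for core's Sql class and our ContentUpdate plugin. The $nodes array (node ID => field_page_unique_id value) plays the role of loadByProperties().

```php
<?php

// Hypothetical stand-in for core's Sql ID map: consults the map table only.
class SqlIdMapStub
{
    protected array $mapRows;

    public function __construct(array $mapRows)
    {
        $this->mapRows = $mapRows;
    }

    public function lookupDestinationIds(array $source_id_values): array
    {
        $results = [];
        foreach ($this->mapRows as $row) {
            if ($row['sourceid1'] === ($source_id_values['Item-ID'] ?? null)) {
                $results[] = [$row['destid1']];
            }
        }
        return $results;
    }
}

// Mirrors the ContentUpdate idea: look for existing nodes by field value,
// then fall back to the parent's map-table lookup.
class ContentUpdateSketch extends SqlIdMapStub
{
    // Hypothetical node data: node ID => field_page_unique_id value.
    private array $nodes;

    public function __construct(array $mapRows, array $nodes)
    {
        parent::__construct($mapRows);
        $this->nodes = $nodes;
    }

    public function lookupDestinationIds(array $source_id_values): array
    {
        if (empty($source_id_values)) {
            return [];
        }
        if (isset($source_id_values['Item-ID'])) {
            // Rough equivalent of loadByProperties(['field_page_unique_id' => ...]).
            $nids = array_keys($this->nodes, $source_id_values['Item-ID'], true);
            if (!empty($nids)) {
                return array_map(function ($nid) {
                    return [$nid];
                }, $nids);
            }
        }
        return parent::lookupDestinationIds($source_id_values);
    }
}

$map = new ContentUpdateSketch(
    [['sourceid1' => 'P-200', 'destid1' => 99]],
    [12 => 'P-100', 15 => 'P-150']
);
```

The override only changes where IDs are looked up. Everything else stays with the parent class, which is the reason for extending Sql in the first place.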

Use the new plugin in migration templates

Now that we have created a new plugin, it’s time to use this in our migration templates.

In our migration templates, we can specify the id_map plugin using the “idMap” key, as the following snippet shows:

id: node_product
label: "Product Content"
migration_tags:
  - content_import

# Notice the ID map plugin here.
idMap:
  plugin: content_update
# ID map specification ends.

source:
  plugin: csv
  path: 'public://example.csv'
  header_row_count: 1
  keys:
    - 'Item-ID'
  fields:
    Item-ID: Item ID
    first_name: First Name
process:
  type:
    plugin: default_value
    default_value: product

# Further mapping items.
Refer to this gist for the full migration template.

Conclusion

We looked at one use case here, but ID Map plugins have several others. For example, suppose you are migrating data from a non-SQL database and the source key is large (bigger than 3072 bytes). In that case, you might need to alter the map table’s schema and widen the source ID column. Matt Glaman covers this in a blog post.
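The limit is expressed in bytes, so multibyte characters reach it sooner than the character count suggests. Below is a rough pre-check. It assumes the limit applies to the key’s UTF-8 byte length, and the helper name is hypothetical.

```php
<?php

// Hypothetical helper: does a source key exceed a byte limit such as 3072?
// strlen() counts bytes, not characters.
function key_exceeds_limit(string $key, int $limitBytes = 3072): bool
{
    return strlen($key) > $limitBytes;
}

$asciiKey = str_repeat('a', 1600);           // 1600 characters, 1600 bytes
$accentedKey = str_repeat("\u{00E9}", 1600); // 1600 characters, 3200 bytes in UTF-8
```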

Mohit Aghera, Back-end Developer
Posted on Mar 6, 2018 7:49:37 PM

Offline, if he's not wandering around the city with family or friends, you can find him in his home studio painting away.