Complex importers with Drupal 7

When you want to get complex content into Drupal, there is really only one valid choice: Migrate. Feeds works fine for simple imports, but any real complexity quickly calls for Migrate, unless you want to write every aspect of your migration by hand, and you shouldn’t.

The example migrations, alongside the Migrate documentation, are a good starting point for understanding the fundamentals. What we’ll discuss here is the non-obvious stuff.

Understand how Migrate operates

To optimize and run your migrations effectively, it’s vital to understand when each Migrate phase is executed.

For instance, preImport() and postImport() are not run once when a migration completes but rather on each batch. That’s one of the reasons the documentation recommends against running Migrate in the GUI or in batch mode. But watch out: if your CLI PHP configuration sets a max_execution_time, drush will obey it and you will experience the same issues as in the GUI.

Plan your migration

You should take a critical look at the data you are receiving and evaluate whether preprocessing it outside Migrate could yield faster results.

For example, a single XML file spanning gigabytes of data will take ages to render migrate-status, unless you tell computeCount() not to compute at all. Working directly from such a file is still a valid approach if you can process your data directly from it and have lots of RAM. We ran into one such case and wrote a script that converts the large XML file into a separate SQL database, which not only solved performance and stability issues but also made manual debugging significantly easier.
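A conversion script of that kind could look roughly like the sketch below. It is an illustration rather than the original script: the <item> element, its id attribute and title child, and the items table are hypothetical names. It streams the file with XMLReader, so memory use stays flat regardless of file size, and writes the rows through PDO.

```php
<?php
/**
 * Streams <item> elements from a large XML file into a SQL table.
 *
 * The element, attribute and column names are assumptions made for
 * this example; adapt them to your own source data.
 *
 * @return int Number of items written.
 */
function xml_to_sql($xml_path, PDO $db) {
  $db->exec('CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, title TEXT)');
  $insert = $db->prepare('REPLACE INTO items (id, title) VALUES (:id, :title)');

  $reader = new XMLReader();
  if (!$reader->open($xml_path)) {
    throw new RuntimeException("Cannot open $xml_path");
  }

  // Advance to the first <item>; only one item is held in memory at a time.
  while ($reader->read() && $reader->name !== 'item') {
  }

  $count = 0;
  $db->beginTransaction();
  while ($reader->name === 'item') {
    $item = new SimpleXMLElement($reader->readOuterXml());
    $insert->execute(array(
      ':id' => (string) $item['id'],
      ':title' => (string) $item->title,
    ));
    $count++;
    // Jump to the next sibling <item>, skipping the current subtree.
    $reader->next('item');
  }
  $db->commit();
  $reader->close();

  return $count;
}
```

The resulting table can then serve as a plain SQL source for the migration, and you can inspect individual records with ordinary queries while debugging.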

Also consider splitting your migrations. While you can work with stubs to link dependent elements, you can often get faster results by simply running one migration and then the dependent migration in sequence, especially if their data structures differ significantly and the dependent data is not present in the same context.

Multilingual support

Assuming your content is multilingual, you are likely using Entity Translation nowadays, and you’ll need a patch for that. You’ll then need to declare the language property and fill the content accordingly. The patch thread will help you along, but in general you need to add the field mappings and declare the languages:

// Constructor: map the field values and their language subfield.
$this->addFieldMapping('title_field', 'title_field');
$this->addFieldMapping('title_field:language', 'languages');

// prepareRow(): provide language codes and values in the same order.
$row->languages = array('de', 'en');
$row->title_field = array($your_title['de'], $your_title['en']);

As you can see, the data you feed in is just a pair of arrays: each value in title_field is matched by position to the language code in the array mapped to :language. You can either structure your source data so the row already arrives in this shape, or restructure it in prepareRow().
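If your source delivers translations keyed by language code, the restructuring step can be a small helper. The following is a minimal sketch in plain PHP; split_translations() is a hypothetical name, not part of Migrate.

```php
<?php
/**
 * Splits array('de' => 'Hallo', 'en' => 'Hello') into two parallel lists
 * whose entries line up by position.
 *
 * Hypothetical helper for illustration; not part of the Migrate API.
 */
function split_translations(array $values_by_language) {
  return array(
    'languages' => array_keys($values_by_language),
    'values' => array_values($values_by_language),
  );
}

// Inside prepareRow() this might be used as:
// $parts = split_translations($your_title);
// $row->languages = $parts['languages'];
// $row->title_field = $parts['values'];
```

Deriving both arrays from the same keyed array keeps languages and values in sync even when a translation is missing for some rows.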

Updating content

By default, Migrate assumes that you want to get your content into Drupal, perhaps not all at once, and possibly repeatedly by rolling back and importing again.

If you want to update existing content continuously, Migrate will do most of the work for you. However, it’s important to understand which mechanisms exist.

First, there is --update, which forces an update of every item you iterate over. You might not want to do that unless you are able to reduce the list of items itself. We did that for a MigrateListJSON source: we added a lastModified attribute to the list request and set its value in a variable in postImport(). That way we avoided querying thousands of items on each run just to see whether they had changed, since the list itself didn’t contain that information. In those cases Migrate reports a nonsensical count of available items (e.g. -354), but (re)importing works just fine.
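Reducing the list in this way can be sketched as a small URL builder. This is an assumption-laden illustration: the lastModified query parameter, its timestamp format, and the helper name list_url_since() are hypothetical and depend entirely on what your list endpoint accepts.

```php
<?php
/**
 * Appends a "changed since" filter to a list URL.
 *
 * The lastModified parameter name and the Unix timestamp format are
 * assumptions about the remote API, not something Migrate defines.
 */
function list_url_since($base_url, $last_run) {
  if (empty($last_run)) {
    // First run: no filter, fetch the full list.
    return $base_url;
  }
  $separator = (strpos($base_url, '?') === FALSE) ? '?' : '&';
  return $base_url . $separator . http_build_query(array('lastModified' => (int) $last_run));
}

// In Drupal 7 the timestamp could be read with variable_get() when the
// list URL is built, and stored with variable_set() in postImport().
```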

If you have an SQL migration, you might be able to rely on last_imported to detect a change. However, we have found the hash option to be far more reliable in practice.
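The idea behind hash-based change detection can be sketched in plain PHP. This illustrates the concept only and is not Migrate’s implementation; row_hash() and row_has_changed() are hypothetical helper names.

```php
<?php
/**
 * Builds a fingerprint of a source row.
 *
 * Keys are sorted first so that column order does not affect the hash.
 */
function row_hash(array $row) {
  ksort($row);
  return md5(serialize($row));
}

/**
 * Tells whether a row differs from the version seen on the last import.
 *
 * $stored_hash is the fingerprint saved during the previous run.
 */
function row_has_changed(array $row, $stored_hash) {
  return row_hash($row) !== $stored_hash;
}
```

Unlike a timestamp comparison, this does not depend on the source maintaining a trustworthy modification date: any change in the row content yields a different hash.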

Finally, don’t get confused by the highwater mark. It only lets your migration quickly find the last “highest” imported item so that it can resume from there. It is not sufficient for determining what you need to import, and you are most often better off being explicit about this via a hash or by reducing your query set as described above.

Debugging tips

Last but not least, here are four tips for debugging your migrations.

  1. The most important is this: plan your debugging. Especially if you are importing a subset of data from a source, it’s important to be able to say “we skipped this entry because x” and not just shrug because prepareRow() returned false.
  2. Use --idlist="1234" to import just one or more items and quickly debug a corner case without updating thousands of items (especially useful with MigrateListJSON). Watch out that you don’t write --idlist 1234: for some reason Migrate ignores that but doesn’t tell you.
  3. Configure your IDE to debug drush migrations with xdebug (e.g. search for “xdebug phpstorm drush”), so that you can inspect the state of your individual migration.
  4. Don’t overlook the migration messages; they often point to incomplete records or misunderstood values.
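For the first tip, one way to make skips explainable is to compute an explicit reason before returning false from prepareRow(). The sketch below is plain PHP under stated assumptions: skip_reason() and the title and status fields are hypothetical, and how you record the reason (log file, migration message) is up to your setup.

```php
<?php
/**
 * Returns a human-readable reason for skipping a row, or NULL to import it.
 *
 * The field names checked here are examples only.
 */
function skip_reason(array $row) {
  if (empty($row['title'])) {
    return 'missing title';
  }
  if (isset($row['status']) && $row['status'] === 'draft') {
    return 'source item is a draft';
  }
  return NULL;
}

// Inside prepareRow() this might be used as:
// $reason = skip_reason((array) $row);
// if ($reason !== NULL) {
//   // Record $reason together with the source ID, then skip the row.
//   return FALSE;
// }
```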
