8/19/2014 - 1:10 PM

Some tips for handling data migrations in Rails

Data Migrations in Rails Apps

If you need to manipulate existing data when your code is deployed, there are two main ways to do it:

  1. Create a rake task to migrate the data after the code is deployed. This is ideal for more complex data migrations.
  2. Use ActiveRecord models in a migration. This is acceptable for smaller data manipulations.

Regardless of the method you use, make sure to test your migrations before submitting them.

Data Migrations in Models

The problem with using models in migrations is that the migration can error out if the model logic changes later, which is a big pain when deploying to production. However, a rake task can be overkill for a simple manipulation. Here are some ways to minimize the risk of updating data in migrations.

Avoid ActiveRecord

SQL doesn't care about validations and all the other logic that comes with ActiveRecord models, so executing a raw query can be less error prone. However, raw SQL is dangerous for the same reason: nothing checks the data you write, so review and test the query carefully.
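As a rough sketch of the raw SQL approach, the helper below issues a single UPDATE through any object that responds to execute, such as the migration itself or ActiveRecord::Base.connection. The names backfill_modem_status and RecordingConnection are hypothetical, and the WHERE status IS NULL guard is an assumption added here so the statement is safe to rerun. The stand-in connection exists only so the sketch runs outside Rails.

```ruby
# Hypothetical helper: backfill the new column with one SQL statement,
# bypassing ActiveRecord models, validations, and callbacks entirely.
def backfill_modem_status(connection)
  connection.execute(<<~SQL)
    UPDATE modems
    SET status = 'active'
    WHERE status IS NULL
  SQL
end

# Stand-in for a database connection so the sketch runs without Rails.
# It only records the statements it is asked to execute.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql.strip
  end
end

conn = RecordingConnection.new
backfill_modem_status(conn)
puts conn.statements.first
```

Inside a migration, the same statement could be passed straight to execute in the up method, after the add_column call.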

Stub Out Models

Stubbing out a model in your migrations has two main advantages:

  1. Guards against the case where a model is removed from the codebase but is still being called in a migration.
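  2. Prevents validations from being run and eliminates overhead from associations.

class AddStatusToModem < ActiveRecord::Migration
  # Stub model: none of the validations, callbacks, or associations from app/models.
  class Modem < ActiveRecord::Base
  end

  def up
    add_column :modems, :status, :string

    # Reload column metadata so the stub sees the new status column.
    Modem.reset_column_information

    Modem.find_each do |modem|
      modem.status = 'active'
      modem.save!
    end
  end

  def down
    remove_column :modems, :status
  end
end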

The call to reset_column_information ensures that the Modem model is updated and has access to the new status column.

If you are going to use models in your migrations, this is how it should be done.

Data Migrations in Rake Tasks

Complex data migrations are better handled in a rake task that you run after the code is deployed.

To create a custom rake task:

rails g task data_migration set_modem_status

Then populate it with your data migration:

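namespace :data_migration do
  desc "Sets the default modem status"
  task set_modem_status: :environment do
    # Leave created_at/updated_at untouched on the records we update.
    ActiveRecord::Base.record_timestamps = false

    Modem.find_each do |modem|
      begin
        modem.status = 'active'
        modem.save!
      rescue StandardError => e
        # Assumption: report the record id so the record can be fixed later.
        puts "Error updating #{modem.id}: #{e.message}"
      end
    end

    ActiveRecord::Base.record_timestamps = true
  end
end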

There are a few notable things about this task:

  1. Setting ActiveRecord::Base.record_timestamps = false prevents ActiveRecord from updating the timestamps on all the records we are touching.
  2. Wrapping each update in a begin/rescue/end block gives us the opportunity to catch errors and report them so we can handle problematic records later.
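The two points above can be sketched as a pair of small helpers. This is only an illustration under some assumptions: with_setting and migrate_each are hypothetical names, not part of Rails, and the Settings and FakeModem structs stand in for ActiveRecord::Base and the Modem model so the sketch runs on its own. The ensure clause is a variation on the task above: it restores the setting even if the block raises.

```ruby
# Hypothetical helper: override a setting for the duration of a block and
# always restore the previous value, even if the block raises.
def with_setting(target, name, value)
  original = target.public_send(name)
  begin
    target.public_send("#{name}=", value)
    yield
  ensure
    target.public_send("#{name}=", original)
  end
end

# Hypothetical helper: run the block for each record and collect failures
# instead of aborting on the first error.
def migrate_each(records)
  failures = []
  records.each do |record|
    begin
      yield record
    rescue StandardError => e
      failures << [record, e.message]
    end
  end
  failures
end

# Stand-ins so the sketch runs outside Rails.
Settings = Struct.new(:record_timestamps)
FakeModem = Struct.new(:id, :status)

settings = Settings.new(true)
modems = [FakeModem.new(1), FakeModem.new(2), FakeModem.new(3)]

failures = with_setting(settings, :record_timestamps, false) do
  migrate_each(modems) do |modem|
    raise "simulated save failure" if modem.id == 2
    modem.status = 'active'
  end
end

failures.each do |modem, message|
  puts "Error updating modem #{modem.id}: #{message}"
end
```

In a real task the calls would presumably look like with_setting(ActiveRecord::Base, :record_timestamps, false) and migrate_each(Modem.find_each), since find_each returns an enumerator when called without a block.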

Stubbing out models can also help minimize the chance of failure in rake tasks.

Testing Data Migrations

First, pull down a dump of the production database with rake repl, then run your migrations and verify everything looks right.