For one of our high-profile client projects, the decision was made to migrate databases to PostgreSQL and retire the multiple MySQL and Oracle databases currently being used. The migration was a strategic decision at the agency CIO level to move to open source databases. Since we build a lot of open source applications for this client, it was apt that we were asked to do this work.
Since there were many databases to migrate, we decided to phase the migration. The Oracle database was the biggest we had (I can say had now), so we started smaller, with several MySQL databases. One MySQL database wasn’t enabled in production most of the time, which ended up being handy: it let us migrate something to production as quickly as possible to verify firewall rules and whatnot, ensuring that we could read from, and write to, the new PostgreSQL VMs from an app on webservers in a DMZ. Anywho, onward…
To migrate MySQL, we used pgloader 3.4.1 (it was installed when the work started and I didn’t want to introduce additional variables into the migration process by upgrading it). Like any other piece of software, it works well when it works.
Note: if you want to exclude specific database tables, there is a line commented out in the data migration config, but you should probably RTFM. If you don’t have tables to exclude, RTFM.
I digressed again, anyway…
Two example pgloader config files are included with the names changed to protect the innocent (or something along those lines):
pgloader.conf.mysql_database_name.schema (if your application ORM isn’t smart enough to build the new database schema in PostgreSQL, no judgement, it just depends what your starting point is)
Triggers: you’re probably better off re-writing them by hand. Hopefully, the triggers you have in MySQL are standard enough that you can write some code to generate them for PostgreSQL, given a list of table names and relatively few different types of trigger templates. You want the data to be migrated as identically as possible, so you’ll want triggers disabled during the data migration anyway.
- Rewrite them.
- Put them in a SQL file.
- Recreate them after migrating.
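As a hedged illustration of the rewrite, a common MySQL-style “set updated_at on every update” trigger becomes a function plus a trigger in PostgreSQL (the table and column names here are hypothetical):

```sql
-- Hypothetical PostgreSQL version of a MySQL BEFORE UPDATE trigger.
-- PostgreSQL triggers call a trigger function rather than
-- holding the body inline.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_set_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW EXECUTE PROCEDURE set_updated_at();
```

Given a list of table names, only the second statement varies per table, which is what makes generating these from templates feasible.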
Database Sequences: handled in the schema migration.
Stored Procedures: didn’t have any to migrate. I’m not a fan; whatever could go in a stored procedure should probably live in application logic instead, which also keeps business logic out of the database layer. Obviously, there are exceptions for performance reasons. In part 2 of this blog, we’ll talk about having to rewrite a function and stored procedure by hand because ora2pg couldn’t translate them from Oracle to PostgreSQL. That shouldn’t be a surprise: if you depend on stored procedures, you’ll probably end up translating them by hand.
Views: didn’t have any to migrate.
Materialized views: again… didn’t have any to migrate… also not a fan… also more in part 2.
The migration steps:
1. Take parts of the application offline to prevent any writes to the source database.
2. SSH to the PostgreSQL database host.
3. Become the postgres Linux user:
   sudo su - postgres
4. Migrate the schema (or let your application ORM recreate the schema, or load it from a schema dump taken after a non-production PostgreSQL migration. I don’t want to tell you how to live your life.)
5. Migrate the data (disabling triggers makes this more accurate and dropping indexes makes it faster; @see the data migration config).
6. Recreate triggers:
   psql (postgres_database_name) < triggers.sql
7. Connect to the database to rename the schema, if desired, and set the search path:
   ALTER SCHEMA (mysql_database_name_lowercase) RENAME TO (postgres_schema_name);
   ALTER DATABASE (postgres_database_name) SET search_path TO (postgres_schema_name);
8. Run grants (I’m glossing over grants because you really should RTFM):
   \i grants.sql at the psql prompt.
9. Rebuild and deploy the application with any updates necessary for talking to the new PostgreSQL database (PostgreSQL drivers, updated authentication secrets, etc.).
10. Verify something that’s relevant. In our case, we compared counts from every table and a few records here and there in detail.
11. Verify that indexes have moved over.
12. Verify that constraints have moved over and are working.
13. Turn the app back on.
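To make the verification steps concrete, here’s the sort of thing we mean, sketched against PostgreSQL’s catalogs (the schema name is a placeholder; run the information_schema equivalents on the MySQL side and diff the output):

```sql
-- Approximate per-table row counts (n_live_tup is an estimate;
-- use SELECT count(*) per table when you want exact numbers).
SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;

-- Indexes that made it over.
SELECT tablename, indexname FROM pg_indexes
WHERE schemaname = 'postgres_schema_name' ORDER BY tablename;

-- Constraints, with type (p = primary key, f = foreign key,
-- u = unique, c = check).
SELECT conrelid::regclass AS table_name, conname, contype
FROM pg_constraint ORDER BY 1, 2;
```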
A few asides…
It’s difficult to say how much verification in step 10 is enough.
There were a few gotchas in our experience:
- Migrating various datetime fields from MySQL, which didn’t include timezones, caused some weirdness. We ended up changing all of them to timestamp(0) with time zone in PostgreSQL. If you end up having to run manual SQL to clean up data post-migration, I recommend adding a column and adjusting values in the new column. After verifying the data looks right, drop the old column and rename the new one.
- PostgreSQL turned out to be case sensitive in a few places where MySQL isn’t (e.g. account creation and user authentication using email addresses). Hopefully, your app has enough automated unit and/or functional testing lying around that this won’t cause regression in production.
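A sketch of that add-a-column cleanup pattern, plus one way to paper over the case-sensitivity difference (the table, column, and timezone names are all hypothetical):

```sql
-- Datetime cleanup: add a timezone-aware column, backfill it,
-- verify, then swap. 'America/New_York' is a placeholder.
ALTER TABLE events ADD COLUMN created_at_tz timestamptz(0);
UPDATE events SET created_at_tz = created_at AT TIME ZONE 'America/New_York';
-- ...verify created_at_tz looks right, then:
ALTER TABLE events DROP COLUMN created_at;
ALTER TABLE events RENAME COLUMN created_at_tz TO created_at;

-- Case sensitivity: compare emails via lower() and back it with
-- an expression index so the lookup stays fast,
-- e.g. WHERE lower(email) = lower('Someone@Example.com')
CREATE INDEX users_lower_email_idx ON users (lower(email));
```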
Tune in, uh, sometime later, for Part 2: Oracle to Postgres migration. It’s a little more involved but has the same results.