11/29/2016 - 10:55 AM

Git friendly MySQL Dump

Git friendly MySQL Dump

With the default mysqldump format, each record dumped will generate an individual INSERT command in the dump file (i.e., the sql file), each on its own line. This is perfect for source control (e.g., svn, git, etc.) as it makes the diff and delta resolution much finer, and ultimately results in a more efficient source control process. However, for significantly sized tables, executing all those INSERT queries can potentially make restoration from the sql file prohibitively slow.

Using the --extended-insert option fixes the multiple INSERT problem by wrapping all the records into a single INSERT command on a single line in the dumped sql file. However, the source control process becomes very inefficient. The entire table contents is represented on a single line in the sql file, and if a single character changes anywhere in that table, source control will flag the entire line (i.e., the entire table) as the delta between versions. And, for large tables, this negates many of the benefits of using a formal source control system.

So ideally, for efficient database restoration, in the sql file, we want each table to be represented by a single INSERT. For an efficient source control process, in the sql file, we want each record in that INSERT command to reside on its own line.

My solution to this is the following back-up script:


cd my_git_directory/

ARGS="--host=myhostname --user=myusername --password=mypassword --opt --skip-dump-date --order-by-primary"
/usr/bin/mysqldump $ARGS --database mydatabase | sed 's$VALUES ($VALUES\n($g' | sed 's$),($),\n($g' > mydatabase.sql

git fetch origin master
git merge origin/master
git add mydatabase.sql
git commit -m "Daily backup."
git push origin master
The result is a sql file INSERT command format that looks like:

(r1c1value, r1c2value, r1c3value),
(r2c1value, r2c2value, r2c3value),
(r3c1value, r3c2value, r3c3value);
Some notes:

password on the command line ... I know, not secure, different discussion.
--opt: Among other things, turns on the --extended-insert option (i.e., one INSERT per table).
--skip-dump-date: mysqldump normally puts a date/time stamp in the sql file when created. This can become annoying in source control when the only delta between versions is that date/time stamp. The OS and source control system will date/time stamp the file and version. Its not really needed in the sql file.
The git commands are not central to the fundamental question (formatting the sql file), but shows how I get my sql file back into source control, something similar can be done with svn. When combining this sql file format with your source control of choice, you will find that when your users update their working copies, they only need to move the deltas (i.e., changed records) across the internet, and they can take advantage of diff utilities to easily see what records in the database have changed.
If you're dumping a database that resides on a remote server, if possible, run this script on that server to avoid pushing the entire contents of the database across the network with each dump.
If possible, establish a working source control repository for your sql files on the same server you are running this script from; check them into the repository from there. This will also help prevent having to push the entire database across the network with every dump.