MySQL database migration: latin1 to utf8 conversion

Until version 4.1, MySQL tables were encoded with the latin1 character set. By default, the character set is now utf8. This is a good thing in terms of non-latin character support, but if you're upgrading from an older database you may run into a lot of character encoding problems.

I've recently migrated several older databases from 4.0 to MySQL 5 and converted a few tables from latin1 to utf8, and I've put together a few tricks that might help you through the same. So far, I've basically run into three main issues: converting a table from the latin1 charset to utf8, upgrading a whole database set that was properly exported with mysqldump, and (the worst) migrating an entire set of database's MYI and MYD files that weren't properly exported.

Converting a latin1 table to utf8
If you've already migrated successfully to the new MySQL version, but you have an older latin1 encoded table and you simply want to change it's character set to utf8, it's a fairly simple exercise. First, export the table with mysqldump:

mysqldump -u username -p database --default-character-set=latin1 table > tableoutput.sql

First make a back up of that file and just edit it to adjust two things. Find the line that contains "SET NAMES latin1" and change that to "SET NAMES utf8". Then, look at the table definition and change "DEFAULT CHARSET=latin1" to "DEFAULT CHARSET=utf8". Finally, reimport the table:

cat tableoutput.sql | mysql --default-character-set=utf8 -u username -p database

When it's imported back in, it will properly be created as a utf8 table and you shouldn't have any character encoding problems. You can do this for a whole database with the same technique and using a search and replace to switch latin1 to utf8.

Upgrading a whole database set that was properly exported with mysqldump
This is a pretty simple scenario. You've exported your entire database from the previous MySQL version with something like this:

mysqldump --default-character-set=latin1 -u username -p database > dboutput.sql

You can then import it either as a latin1 table, or convert it to utf8 as we did above. Just make sure to adjust the "SET NAMES" and "CHARACTER SET" values appropriately, and set the "--default-character-set" parameter to utf8 or latin1, as appropriate.

Note that the default character set is now utf8 unless you change it in the my.cnf file. I've been on systems where it's been set to latin1 for compatability reasons, however, so you can't just assume this. Make sure that all of your database code (in your php, perl, whatever) issues the "SET NAMES utf8" SQL statement before issuing any further SQL commands. This will ensure that what you set in INSERTs and receive in SELECTs isn't mis-encoded. If you are using a legacy latin1 table, make sure you do the same, but with the latin1 setting.

Migrating an entire set of database's MYI and MYD files that weren't properly exported
I was a bonehead the other day and didn't properly export one of my server's databases before upgrading it to MySQL 5. After upgrading, all of my databases were corrupted, including the mysql database, meaning I couldn't even log in. The mysql_upgrade command didn't fix things properly, and it was all because MySQL assumed my tables from that older version were encoded in utf8.

It was, like, a bummer.

I don't know if there is an easier way, but this is how I managed to transition all those MYI and MYD files to the new database, without reverting to the older version and exporting properly.

First, you need to change the mysql config so that when it loads your old databases it assumes they are latin1 encoded. To do this, edit the my.cnf file. Find all the lines that say "default-character-set=utf8" and change them all to say "default-character-set=latin1".

Start up mysqld and have it ignore permissions:

mysqld_safe --skip-grant-tables &

Now, run "mysql_upgrade". When it finishes, do a SELECT on the mysql.user table. You should see that usernames and encoded passwords haven't been truncated in half and replaced with gobbledeguk.

Kill and restart the mysql server. You should be able to log in, meaning the mysql users table is, in fact, uncorrupted. You'll find that all your other tables are fine as well. One thing that you might notice is that the mysql_upgrade script will have updated the mysql databases to use utf8. All of your other databases are still in latin1, however, so you'll need to convert them to utf8 (if you desire) using the instructions above. Also, if you don't require legacy support for older applications, you should probably go back to the my.cnf file and change the default character set back to utf8 for everything.

Your thoughts
From this point forward, everyone should be using utf8 across the board, but it seems like making the jump is more painful than it should be. If you know any good MySQL character encoding tips and tricks, help out your fellow hacker. Share them in comments!


Recent Entries

Comments

Oldest comments listed first.

Posted by: headache on December 18, 2007 at 7:33 AM

Very useful compilation of encoding tricks. I have been searching all over google for solution,however this article solved my problems. Keep the good work.


Posted by: dalu on May 12, 2008 at 4:57 AM

Do you remember if you had databases with different charsets?

like hebrew, chinese, german
or were all your databases in one charset?


Posted by: Lior Iluz on September 5, 2008 at 9:24 PM

I think you are the first to really point out the solution. excellent, thank you.


Posted by: JJ on September 17, 2008 at 11:23 PM

That saved my live.


Posted by: CombatWombat on October 28, 2008 at 2:53 AM

I have just completed this process, with a slightly different set of steps, including a SED script to whack all those weird characters back into line, and the necessary adjustments for your MYSQL server.
http://combatwombat.7doves.com/2008/10/26/mysql-latin1-to-utf8-issues


Leave a comment


Subscribe to MAKE!Subscribe to MAKE Magazine!

Subscribe today, save 42% and get web access to MAKE free. MAKE Digital Edition is available only to subscribers.

$34.95 / 1 year
(4 Quarterly Issues)

Subscribe now


Void your warranty, violate a user agreement, fry a circuit, blow a fuse, poke an eye out. Make: The risk-takers, the doers, the makers of things... Welcome to Make: Online!


CRAFT Maker Shed Maker Faire MAKE television
Holiday Gift Guides from MAKE
Gifts for Dads
Science and Chemistry
Gifts Under $20
More guides: Santa Claus Machines, Geek Toys for Grown Up Girls & Boys


Check out all of the episodes of Make: television

Alex Rider Dream Gadget Contest
Make: Science Room

Connect with MAKE

Be a MAKE fan on Facebook MAKE on Facebook
Visit our Facebook page and become a fan of MAKE!
MAKE on Twitter MAKE on Twitter
Follow our MAKE tweets!
MAKE Flickr Pool MAKE on Flickr
Join our MAKE Flickr Pool!
    make_tips on Twitter




    Maker SHED

    Advertise here with FM.

    Why advertise on MAKE?
    Read what folks are saying about us!

    Click here to advertise on MAKE!



    Subscribe to MAKE Magazine!

    Make: Online authors!

    Gareth BranwynGareth Branwyn
    Senior Editor


    Phillip TorronePhillip Torrone
    Senior Editor
    | AIM | Twitter


    Becky SternBecky Stern
    Associate Editor
    | AIM | Twitter


    Marc de VinckMarc de Vinck
    Contributing Writer
    | AIM | Twitter


    John ParkJohn Park
    Contributing Writer
    | Twitter


    Sean RaganSean Ragan
    Contributing Writer
    | Twitter


    Matt MetsMatt Mets
    Contributing Writer
    | AIM | Twitter


    Dale DoughertyDale Dougherty
    Editor & Publisher
    | Twitter


    Shawn ConnallyShawn Connally
    Managing Editor
    | Twitter


    Goli MohammadiGoli Mohammadi
    Associate Managing Editor

    Kip KayKip Kay
    Weekend Projects
    | AIM | Twitter


    Collin CunninghamCollin Cunningham
    Contributing Writer
    | AIM | Twitter

    Adam FlahertyAdam Flaherty
    Contributing Writer
    | AIM | Twitter



    More contributors: Mark Frauenfelder (Editor-in-Chief, MAKE magazine), Kipp Bradford (Technical Consultant/Writer), Chris Connors (Education), Diana Eng (Guest Author), Peter Horvath (Intern), Brian Jepson (O'Reilly Media), Robert Bruce Thompson (Science Room)

    Suggest a Site!

    Current Podcast

    itunesdl.gif Weekend Project: Beetlebot Simple robot from your parts bin that avoids obstacles. Thanks go to Jerome Demers for the original article in MAKE, Volume 12. To download the Beetlebot video, click here or subscribe in iTunes. Check out the complete Beetlebot article... More...

    Get the Make: Online sent via email
    Enter your email to receive Make: Online each day:



    MAKE Fascination video series brought to you by Dow

    Make: Education
    MAKE: en EspaƱol MAKE: Japan
    Important please read


    Subscribe to MAKE Magazine!

    Recent Posts from the Craft: Blog