Pragmatism in the real world

UTF-8, PHP and MySQL

Everyone else probably already knows this stuff, but I hit an issue today to that took a while to sort out. Fortunately, some kind folks on IRC helped me, but as it’s embarrassing to ask for help on the same issue twice, I’m writing down what I’ve learned!

The problem

Get a £ character stored to MySQL, retrieved and then displayed without any weird characters in front of it using UTF-8.

The solution

Make sure that you are using UTF-8 everywhere!

The browser:

<?php header("Content-type: text/html; charset=utf-8"); ?>

You can also use a meta tag that is redundant in theory:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Also, note that the the <form> element has an 'accept-charset' attribute which should also be set:

<form accept-charset="utf-8" ...>

MySQL:

Make sure that your table’s collation is utf8_general_ci and that all string fields within the table also have the utf8_general_ci collation.

And here’s the really important bit: make sure your client connection is also using UTF-8:

For mysql:

mysql_set_charset('utf8');

or for mysqli:

mysqli_set_charset('utf8');

or execute the SQL immediately after connection:

SET NAMES UTF8;

or for PDO:

$handle = new PDO("mysql:host=localhost;dbname=dbname",
    'username', 'password', 
    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));

or for Zend_Db:

$params = array(
    'host' => 'localhost',
    'username' => 'username',
    'password' => 'password',
    'dbname' => 'dbname',
    'driver_options' => array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES UTF8;');
);
$db = Zend_Db::factory('PDO_MYSQL', $params);

Note that in PHP 5.3.0 and 5.3.1, you cannot use the PDO::MYSQL_ATTR_INIT_COMMAND => ‘SET NAMES UTF8;’ option for PDO as it doesn’t work! See bug 47224 for details.

For a Zend Framework application that uses Zend_Application, add this to your ini file:

resources.db.params.charset = utf8

Now everything works as expected!

(as long as you don’t have an output filter on your view that’s too clever for its own good…)

Also, read About using UTF-8 fields in MySQL by Joshua Thijssen.

47 thoughts on “UTF-8, PHP and MySQL

  1. Ah yeah – that didn't come up today, but I've come across that one before too.

    Thanks for commenting – no doubt it'll help me when I come back here in 6 months time :)

    Regards,

    Rob…

  2. Diego,

    Thanks for the tip :)

    Kreoton,

    untested, but try:
    db.params.driver_options.0.PDO::MYSQL_ATTR_INIT_COMMAND = "SET NAMES UTF8;"

    Regards,

    Rob…

    (Update: don't use the above!)

  3. Kreoton & Rob:

    I don't think

    db.params.driver_options.0.PDO::MYSQL_ATTR_INIT_COMMAND = "SET NAMES UTF8;"

    will work. I'm using

    db.params.driver_options.1002 = "SET NAMES utf8"

    and that's working like a charm (1002 is the integer value for PDO::MYSQL_ATTR_INIT_COMMAND).

    Best regards

    Stefan

  4. @4,7

    db.params.driver_options.1002 = "SET NAMES utf8"

    will indeed work. I don't think there's any way to use the constants.

    When dealing with utf8 you may also need to set the encoding of your script files, especially if the £ is in that file, and not the database.

  5. Stefan and David,

    Well, it was a complete guess :)

    I trust your information as you've actually used it!

    Regards,

    Rob…

  6. I don't think I've met a web developer who hasn't had encoding issues. It's good to have all these points listed.

    I just want to know why all the latest software releases aren't defaulting to utf-8?!

  7. I'm sure there are many reasons. I know that if utf8 had been the default when I started programming (and using iso-8859-1 for my webpages) it would have caused problems. Probably for compatibility reasons like these

  8. Don't forget to use UTF-8 compatible functions in PHP for string comparison.

    Most string operations won't work for UTf-8 encoded strings because a utf-8 char is 4 bytes where ascii is 1 byte.

  9. Some more:

    – do not write BOM (Byte Order Mark) when saving your PHP files in UTF-8
    – make shure you are using multibyte aware functions (mb_…)
    – make shure your regular expression use the unicode modifier, and that the input to the expression is a valid utf-8 sequence

    If you have access to the servers configuration files, you can tell apache to append the correct header to your scripts and tell MySQL to use utf8 for all the connections.

  10. If you work with Zend Framework, MySQLi as Driver and load your config out of an INI file you can use it this way:

    database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"

    OR according to the example above

    $params = array(
    'host' => 'localhost',
    'username' => 'username',
    'password' => 'password',
    'dbname' => 'dbname',
    'driver_options' => array(MYSQLI_INIT_COMMAND => 'SET NAMES UTF8;');
    );
    $db = Zend_Db::factory('Mysqli', $params);

  11. How come this is the first time I hear about this?

    Dammit I've been writing é replace functions for years!!

  12. I'm not sure #20 is correct, I don't think you can use PHP constants in Ini Config files.

    Simply use

    database.params.driver_options.1002 = "SET NAMES utf8"

    rather than

    database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"

    1002 is the value of MYSQLI_INIT_COMMAND

  13. David,

    You can use PHP constants in the value part of an INI file, but I agree that I don't think you can you use them in the key part.

    Regards,

    Rob…

  14. Yup. This is the first problem, that every Czech programmer faces, when he learns PHP :) Coz Windows uses crazy Windows-1250 encoding.

    $db->query('SET NAMES UTF8'); also works for Zend_Db

    Also you should use "locale" collation (i.e.: utf8_czech_ci for czech)… otherwise it sorts chars based on their binary value rather then natural language sorting.

  15. Hello,

    To me
    =====
    database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
    =====
    doesn't work, but database.params.driver_options.1002 = "SET NAMES utf8" is working perfectly ;-)

  16. David (#22), you're mixing up two things. As Salz` in #20 said, MYSQLI_INIT_COMMAND would be the constant to use for the *mysqli* extension. Its numerical value is 3 in contrast with PDO::MYSQL_ATTR_INIT_COMMAND which you should use with PDO and has a numerical value of 1002. But that mysqli constant itself actually also works. Maybe the only reason the PDO constant doesn't work is because of the colons?

    In any case, you won't need either of those anymore as of Zend Framework 1.8 at least. You can use:

    database.params.charset = utf8

  17. I don't know why but
    ;resources.db.params.driver_options.1002 = "SET NAMES UTF8;"
    works for my fine but
    database.params.charset = utf8
    NOT
    Do you know why?

  18. You rock, just saying, cause you saved me hours of trying to figure out where I was going wrong :)

  19. Your tip about mysql_set_charset('utf8') just transformed my workflow. None of all three of my otherwise excellent O'Reilly manuals for PHP (PHP and MySQL, PHP Cookbook, or Programming PHP) even mention this function.
    I thought I was doing everything correctly but the one missing piece was that PEAR was not set up to handle UTF-8 encoding.
    So simple!

  20. Beware when using:
    resources.db.params.charset
    and
    resources.db.params.driver_options.1002

    When you have the first, the second is overwritten when calling _connect. That sucks and it's most likely an ugly bug, as PDO::MYSQL_ATTR_INIT_COMMAND can be used for more than just setting the charset (which is my case)

  21. Hi,

    Thank you so much for this post. I have spent almost all day trying to figure out how to get correct Lithuanian characters from my MySQL database correctly showing on my php website. Finally, I only needed to add this extra line after making the connection to database:

    mysql_set_charset('utf8');

    Thanks again, you saved a lot of my time. ;)

  22. Hi Rob,
    Today I started a new project and i had to deal with the same issue. My short solution is indeed:

    $view->headMeta()->appendHttpEquiv('Content-Type', 'text/html;charset=utf-8');
    (obviously you need to echo $this->headMeta() in your layout.phtml)

    This is the only place in my code where I define utf-8 charset encoding and database saving is totally OK. And it saves me lots of typing and thus headaches ;)

    greetz from Holland

  23. I've been looking for this answer for a few weeks now. I tried the php header() like you pointed out and it fixed the problem.

    I appreciate this concise blog!

  24. Nice post, some real gems in here. Is there a PostgreSQL equivalent of PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")?

  25. Although this problem should be ancient history, legacy code keeps it alive…

    If you want to check what character set the browser is actually reporting, enter document.characterSet into the console. It turns out that header("Content-type: text/html; charset=utf-8") was the only way to sort out my "windows-1252" problem.

Comments are closed.