UTF8, PHP and MySQL

Everyone else probably already knows this stuff, but I hit an issue today to that took a while to sort out. Fortunately, some kind folks on IRC helped me, but as it's embarrassing to ask for help on the same issue twice, I'm writing down what I've learned!

The problem

Get a £ character stored to MySQL, retrieved and then displayed without any weird characters in front of it using UTF8.

The solution

Make sure that you are using UTF8 everywhere!

The browser:


<?php header("Content-type: text/html; charset=utf-8"); ?>

You can also use a meta tag that is redundant in theory:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Also, note that the the <form> element has an 'accept-charset' attribute which should also be set:

<form accept-charset="utf-8" ...>

MySQL:

Make sure that your table's collation is utf8_general_ci and that all string fields within the table also have the utf8_general_ci collation.

And here's the really important bit: make sure your client connection is also using UTF-8:

For mysql:

mysql_set_charset('utf8');

or for mysqli:

mysqli_set_charset('utf8');

or execute the SQL immediately after connection:

SET NAMES UTF8;

or for PDO:

$handle = new PDO("mysql:host=localhost;dbname=dbname",
'username', 'password',
array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));

or for Zend_Db:

$params = array(
'host' => 'localhost',
'username' => 'username',
'password' => 'password',
'dbname' => 'dbname',
'driver_options' => array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES UTF8;');
);
$db = Zend_Db::factory('PDO_MYSQL', $params);

Note that in PHP 5.3.0 and 5.3.1, you cannot use the PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES UTF8;' option for PDO as it doesn't work! See bug 47224 for details.

For a Zend Framework application that uses Zend_Application, add this to your ini file:
resources.db.params.charset = utf8

Now everything works as expected!

(as long as you don't have an output filter on your view that's too clever for its own good...)

Also, read About using UTF-8 fields in MySQL by Joshua Thijssen.

46 Responses to “UTF8, PHP and MySQL”

  1. 1 Robert

    Hi Rob!

    Believe it or not, there's one more thing that can ruin the fun (learned the hard way): Apache's AddDefaultCharset http://httpd.apache.org/docs/2.0/mod/core.html#AddDefaultCharset
    I know it's disabled on most servers, but I had the "luck" to come across on a server which had it set to Latin-1 (ISO-8859-1), overwriting every other setting.

    Thought to mention it :)

    Cheers,
    Robert

  2. 2 Rob...

    Ah yeah - that didn't come up today, but I've come across that one before too.

    Thanks for commenting - no doubt it'll help me when I come back here in 6 months time :)

    Regards,

    Rob...

  3. 3 Diego Henrique

    Rob, in ZF i use MySQLi driver and i set my connection charset in this way:

    $dbAdapterMySQLi->getConnection()->set_charset("utf8");

    :)

  4. 4 kreoton

    How to configure this for Zend_Db if I use config.ini file?

  5. 5 Rob...

    Diego,

    Thanks for the tip :)

    Kreoton,

    untested, but try:
    db.params.driver_options.0.PDO::MYSQL_ATTR_INIT_COMMAND = "SET NAMES UTF8;"

    Regards,

    Rob...

    (Update: don't use the above!)

  6. 6 Dave Marshall

    A couple of slightly different methods here for Zend_DB

    http://zfsnippets.com/snippets/view/id/13

  7. 7 Stefan Gehrig

    Kreoton & Rob:

    I don't think

    db.params.driver_options.0.PDO::MYSQL_ATTR_INIT_COMMAND = "SET NAMES UTF8;"

    will work. I'm using

    db.params.driver_options.1002 = "SET NAMES utf8"

    and that's working like a charm (1002 is the integer value for PDO::MYSQL_ATTR_INIT_COMMAND).

    Best regards

    Stefan

  8. 8 David Caunt

    @4,7

    db.params.driver_options.1002 = "SET NAMES utf8"

    will indeed work. I don't think there's any way to use the constants.

    When dealing with utf8 you may also need to set the encoding of your script files, especially if the £ is in that file, and not the database.

  9. 9 Rob...

    Stefan and David,

    Well, it was a complete guess :)

    I trust your information as you've actually used it!

    Regards,

    Rob...

  10. 10 Kieran Hall

    I don't think I've met a web developer who hasn't had encoding issues. It's good to have all these points listed.

    I just want to know why all the latest software releases aren't defaulting to utf-8?!

  11. 11 David Caunt

    I'm sure there are many reasons. I know that if utf8 had been the default when I started programming (and using iso-8859-1 for my webpages) it would have caused problems. Probably for compatibility reasons like these

  12. 12 Jani Hartikainen

    Good reference post! Bookmarked in case I ever need this =)

  13. 13 Tom

    Don't forget to use UTF-8 compatible functions in PHP for string comparison.

    Most string operations won't work for UTf-8 encoded strings because a utf-8 char is 4 bytes where ascii is 1 byte.

  14. 14 kreoton

    Thanks for Rob, Stefan Gehrig and David Caunt I would test this when I back home.

  15. 15 Gerard Roche

    bookmarked too, thanks.

  16. 16 nasumi

    Don't forget to save your PHP-Files in UTF-8, in case you put your content directly in your PHP-Files.

  17. 17 Goran Jurić

    Some more:

    - do not write BOM (Byte Order Mark) when saving your PHP files in UTF-8
    - make shure you are using multibyte aware functions (mb_...)
    - make shure your regular expression use the unicode modifier, and that the input to the expression is a valid utf-8 sequence

    If you have access to the servers configuration files, you can tell apache to append the correct header to your scripts and tell MySQL to use utf8 for all the connections.

  18. 18 Rob...

    Good tips, thanks.

    Rob...

  19. 19 idont

    Also a really important article to read if you want to convert your database character set correctly (And avoid the wrong how-to's on the web):

    http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/

  20. 20 Salz`

    If you work with Zend Framework, MySQLi as Driver and load your config out of an INI file you can use it this way:

    database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"

    OR according to the example above

    $params = array(
    'host' => 'localhost',
    'username' => 'username',
    'password' => 'password',
    'dbname' => 'dbname',
    'driver_options' => array(MYSQLI_INIT_COMMAND => 'SET NAMES UTF8;');
    );
    $db = Zend_Db::factory('Mysqli', $params);

  21. 21 Dave

    How come this is the first time I hear about this?

    Dammit I've been writing é replace functions for years!!

  22. 22 David Caunt

    I'm not sure #20 is correct, I don't think you can use PHP constants in Ini Config files.

    Simply use

    database.params.driver_options.1002 = "SET NAMES utf8"

    rather than

    database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"

    1002 is the value of MYSQLI_INIT_COMMAND

  23. 23 Rob...

    David,

    You can use PHP constants in the value part of an INI file, but I agree that I don't think you can you use them in the key part.

    Regards,

    Rob...

  24. 24 Tomas Fejfar

    Yup. This is the first problem, that every Czech programmer faces, when he learns PHP :) Coz Windows uses crazy Windows-1250 encoding.

    $db->query('SET NAMES UTF8'); also works for Zend_Db

    Also you should use "locale" collation (i.e.: utf8_czech_ci for czech)... otherwise it sorts chars based on their binary value rather then natural language sorting.

  25. 25 BEREGU

    Hello,

    To me
    =====
    database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
    =====
    doesn't work, but database.params.driver_options.1002 = "SET NAMES utf8" is working perfectly ;-)

  26. 26 Cygal

    Many thanks! mysql_set_charset("utf-8"); worked for me.

  27. 27 Jorrit

    David (#22), you're mixing up two things. As Salz` in #20 said, MYSQLI_INIT_COMMAND would be the constant to use for the *mysqli* extension. Its numerical value is 3 in contrast with PDO::MYSQL_ATTR_INIT_COMMAND which you should use with PDO and has a numerical value of 1002. But that mysqli constant itself actually also works. Maybe the only reason the PDO constant doesn't work is because of the colons?

    In any case, you won't need either of those anymore as of Zend Framework 1.8 at least. You can use:

    database.params.charset = utf8

  28. 28 David

    #27 - Correct, I was confused. 1.8 is king :)

  29. 29 zombux

    thanks to your site, I fixed encoding on mine. wonderful.

  30. 30 Salz`

    Response to: #22,#23,#25

    =====
    database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
    =====

    Works perfekt, cause of line 303->313 in http://framework.zend.com/svn/framework/standard/trunk/library/Zend/Db/Adapter/Mysqli.php

    especially:
    ====
    $option = @constant(strtoupper($option));
    ====

    Oh, and this is true since many version of Zend Framework.

  31. 31 greg606

    I don't know why but
    ;resources.db.params.driver_options.1002 = "SET NAMES UTF8;"
    works for my fine but
    database.params.charset = utf8
    NOT
    Do you know why?

  32. 32 Danne

    In newer Zend Framework just use

    database.params.charset = utf8

    to set the charset in the ini file.

  33. 33 Joe Devon

    greg606 You must be on #ZF prior to version 1.8

  34. 34 Kevin

    Thank You!! It was so frustrating.

  35. 35 Jose

    Thank you all!!!

    In my case, the line
    database.params.charset = utf8
    worked perfect

  36. 36 Matt

    You rock, just saying, cause you saved me hours of trying to figure out where I was going wrong :)

  37. 37 dan

    thanks I was having problems with Ñs

  38. 38 Zac Imboden

    Your tip about mysql_set_charset('utf8') just transformed my workflow. None of all three of my otherwise excellent O'Reilly manuals for PHP (PHP and MySQL, PHP Cookbook, or Programming PHP) even mention this function.
    I thought I was doing everything correctly but the one missing piece was that PEAR was not set up to handle UTF-8 encoding.
    So simple!

  39. 39 Jean

    Thanks for this tip.

  40. 40 Pascal

    It works like a charm, Thanks Rob, thanks everybody!

  41. 41 Esteban

    Beware when using:
    resources.db.params.charset
    and
    resources.db.params.driver_options.1002

    When you have the first, the second is overwritten when calling _connect. That sucks and it's most likely an ugly bug, as PDO::MYSQL_ATTR_INIT_COMMAND can be used for more than just setting the charset (which is my case)

  42. 42 indrek

    Hi,

    Thank you so much for this post. I have spent almost all day trying to figure out how to get correct Lithuanian characters from my MySQL database correctly showing on my php website. Finally, I only needed to add this extra line after making the connection to database:

    mysql_set_charset('utf8');

    Thanks again, you saved a lot of my time. ;)

  43. 43 bassgun

    Hi Rob,
    Today I started a new project and i had to deal with the same issue. My short solution is indeed:

    $view->headMeta()->appendHttpEquiv('Content-Type', 'text/html;charset=utf-8');
    (obviously you need to echo $this->headMeta() in your layout.phtml)

    This is the only place in my code where I define utf-8 charset encoding and database saving is totally OK. And it saves me lots of typing and thus headaches ;)

    greetz from Holland

  44. 44 Mathieu

    Don't forget to save your files according to the charset you want to user either ;-)

    It's a common mistake among beginners...

    http://web-o-blog.blogspot.com/2011/04/utf-8-with-phpmysql.html

  45. 45 StevenF

    I've been looking for this answer for a few weeks now. I tried the php header() like you pointed out and it fixed the problem.

    I appreciate this concise blog!

  46. 46 John D

    Nice post, some real gems in here. Is there a PostgreSQL equivalent of PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")?

The views expressed in these comments are not the views of the publisher. However, we believe in the rights of others to express their legitimate views and concerns. Any legitimate complaint emailed to rob@akrabat.com will be seriously considered and the post reviewed as desirable and necessary.