UTF8, PHP and MySQL
Everyone else probably already knows this stuff, but I hit an issue today to that took a while to sort out. Fortunately, some kind folks on IRC helped me, but as it's embarrassing to ask for help on the same issue twice, I'm writing down what I've learned!
The problem
Get a £ character stored to MySQL, retrieved and then displayed without any weird characters in front of it using UTF8.
The solution
Make sure that you are using UTF8 everywhere!
The browser:
<?php header("Content-type: text/html; charset=utf-8"); ?>
You can also use a meta tag that is redundant in theory:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Also, note that the the <form> element has an 'accept-charset' attribute which should also be set:
<form accept-charset="utf-8" ...>
Mysql:
Make sure that your table's collation is utf8_general_ci and that all string fields within the table also have the utf8_general_ci collation.
And here's the really important bit: make sure your client connection is also using UTF-8:
For mysql:
mysql_set_charset('utf8');
or for mysqli:
mysqli_set_charset('utf8');
or execute the SQL immediately after connection:
SET NAMES UTF8;
or for PDO:
$handle = new PDO("mysql:host=localhost;dbname=dbname", 'username', 'password', array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
or for Zend_Db:
$params = array( 'host' => 'localhost', 'username' => 'username', 'password' => 'password', 'dbname' => 'dbname', 'driver_options' => array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES UTF8;'); ); $db = Zend_Db::factory('PDO_MYSQL', $params);
Note that in PHP 5.3.0 and 5.3.1, you cannot use the PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES UTF8;' option for PDO as it doesn't work! See bug 47224 for details.
Now everything works as expected!
(as long as you don't have an output filter on your view that's too clever for its own good...)

March 18th, 2009 at 20:46 #
Hi Rob!
Believe it or not, there's one more thing that can ruin the fun (learned the hard way): Apache's AddDefaultCharset http://httpd.apache.org/docs/2.0/mod/core.html#AddDefaultCharset
I know it's disabled on most servers, but I had the "luck" to come across on a server which had it set to Latin-1 (ISO-8859-1), overwriting every other setting.
Thought to mention it :)
Cheers,
Robert
March 18th, 2009 at 20:50 #
Ah yeah - that didn't come up today, but I've come across that one before too.
Thanks for commenting - no doubt it'll help me when I come back here in 6 months time :)
Regards,
Rob...
March 18th, 2009 at 22:15 #
Rob, in ZF i use MySQLi driver and i set my connection charset in this way:
$dbAdapterMySQLi->getConnection()->set_charset("utf8");
:)
March 19th, 2009 at 06:22 #
How to configure this for Zend_Db if I use config.ini file?
March 19th, 2009 at 06:57 #
Diego,
Thanks for the tip :)
Kreoton,
untested, but try:
db.params.driver_options.0.PDO::MYSQL_ATTR_INIT_COMMAND = "SET NAMES UTF8;"Regards,
Rob...
(Update: don't use the above!)
March 19th, 2009 at 08:43 #
A couple of slightly different methods here for Zend_DB
http://zfsnippets.com/snippets/view/id/13
March 19th, 2009 at 10:18 #
Kreoton & Rob:
I don't think
db.params.driver_options.0.PDO::MYSQL_ATTR_INIT_COMMAND = "SET NAMES UTF8;"
will work. I'm using
db.params.driver_options.1002 = "SET NAMES utf8"
and that's working like a charm (1002 is the integer value for PDO::MYSQL_ATTR_INIT_COMMAND).
Best regards
Stefan
March 19th, 2009 at 10:39 #
@4,7
db.params.driver_options.1002 = "SET NAMES utf8"
will indeed work. I don't think there's any way to use the constants.
When dealing with utf8 you may also need to set the encoding of your script files, especially if the £ is in that file, and not the database.
March 19th, 2009 at 11:08 #
Stefan and David,
Well, it was a complete guess :)
I trust your information as you've actually used it!
Regards,
Rob...
March 19th, 2009 at 11:54 #
I don't think I've met a web developer who hasn't had encoding issues. It's good to have all these points listed.
I just want to know why all the latest software releases aren't defaulting to utf-8?!
March 19th, 2009 at 12:19 #
I'm sure there are many reasons. I know that if utf8 had been the default when I started programming (and using iso-8859-1 for my webpages) it would have caused problems. Probably for compatibility reasons like these
March 19th, 2009 at 13:00 #
Good reference post! Bookmarked in case I ever need this =)
March 19th, 2009 at 13:57 #
Don't forget to use UTF-8 compatible functions in PHP for string comparison.
Most string operations won't work for UTf-8 encoded strings because a utf-8 char is 4 bytes where ascii is 1 byte.
March 19th, 2009 at 14:24 #
Thanks for Rob, Stefan Gehrig and David Caunt I would test this when I back home.
March 19th, 2009 at 16:13 #
bookmarked too, thanks.
March 20th, 2009 at 08:39 #
Don't forget to save your PHP-Files in UTF-8, in case you put your content directly in your PHP-Files.
March 21st, 2009 at 11:03 #
Some more:
- do not write BOM (Byte Order Mark) when saving your PHP files in UTF-8
- make shure you are using multibyte aware functions (mb_...)
- make shure your regular expression use the unicode modifier, and that the input to the expression is a valid utf-8 sequence
If you have access to the servers configuration files, you can tell apache to append the correct header to your scripts and tell MySQL to use utf8 for all the connections.
March 21st, 2009 at 11:33 #
Good tips, thanks.
Rob...
March 23rd, 2009 at 02:02 #
Also a really important article to read if you want to convert your database character set correctly (And avoid the wrong how-to's on the web):
http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/
March 23rd, 2009 at 12:58 #
If you work with Zend Framework, MySQLi as Driver and load your config out of an INI file you can use it this way:
database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
OR according to the example above
$params = array(
'host' => 'localhost',
'username' => 'username',
'password' => 'password',
'dbname' => 'dbname',
'driver_options' => array(MYSQLI_INIT_COMMAND => 'SET NAMES UTF8;');
);
$db = Zend_Db::factory('Mysqli', $params);
March 24th, 2009 at 04:48 #
How come this is the first time I hear about this?
Dammit I've been writing é replace functions for years!!
April 7th, 2009 at 11:11 #
I'm not sure #20 is correct, I don't think you can use PHP constants in Ini Config files.
Simply use
database.params.driver_options.1002 = "SET NAMES utf8"
rather than
database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
1002 is the value of MYSQLI_INIT_COMMAND
April 8th, 2009 at 05:44 #
David,
You can use PHP constants in the value part of an INI file, but I agree that I don't think you can you use them in the key part.
Regards,
Rob...
April 9th, 2009 at 10:16 #
Yup. This is the first problem, that every Czech programmer faces, when he learns PHP :) Coz Windows uses crazy Windows-1250 encoding.
$db->query('SET NAMES UTF8'); also works for Zend_Db
Also you should use "locale" collation (i.e.: utf8_czech_ci for czech)... otherwise it sorts chars based on their binary value rather then natural language sorting.
April 13th, 2009 at 01:19 #
Hello,
To me
=====
database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
=====
doesn't work, but database.params.driver_options.1002 = "SET NAMES utf8" is working perfectly ;-)
April 23rd, 2009 at 21:11 #
Many thanks! mysql_set_charset("utf-8"); worked for me.
May 8th, 2009 at 12:27 #
David (#22), you're mixing up two things. As Salz` in #20 said, MYSQLI_INIT_COMMAND would be the constant to use for the *mysqli* extension. Its numerical value is 3 in contrast with PDO::MYSQL_ATTR_INIT_COMMAND which you should use with PDO and has a numerical value of 1002. But that mysqli constant itself actually also works. Maybe the only reason the PDO constant doesn't work is because of the colons?
In any case, you won't need either of those anymore as of Zend Framework 1.8 at least. You can use:
database.params.charset = utf8
May 8th, 2009 at 13:25 #
#27 - Correct, I was confused. 1.8 is king :)
May 19th, 2009 at 06:18 #
thanks to your site, I fixed encoding on mine. wonderful.
May 22nd, 2009 at 20:16 #
Response to: #22,#23,#25
=====
database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
=====
Works perfekt, cause of line 303->313 in http://framework.zend.com/svn/framework/standard/trunk/library/Zend/Db/Adapter/Mysqli.php
especially:
====
$option = @constant(strtoupper($option));
====
Oh, and this is true since many version of Zend Framework.
August 12th, 2009 at 10:33 #
I don't know why but
;resources.db.params.driver_options.1002 = "SET NAMES UTF8;"
works for my fine but
database.params.charset = utf8
NOT
Do you know why?
October 2nd, 2009 at 07:18 #
In newer Zend Framework just use
database.params.charset = utf8
to set the charset in the ini file.
November 5th, 2009 at 17:13 #
greg606 You must be on #ZF prior to version 1.8
November 23rd, 2009 at 00:46 #
Thank You!! It was so frustrating.
February 5th, 2010 at 11:27 #
Thank you all!!!
In my case, the line
database.params.charset = utf8
worked perfect