UTF-8, PHP and MySQL
Everyone else probably already knows this stuff, but I hit an issue today to that took a while to sort out. Fortunately, some kind folks on IRC helped me, but as it’s embarrassing to ask for help on the same issue twice, I’m writing down what I’ve learned!
The problem
Get a £ character stored to MySQL, retrieved and then displayed without any weird characters in front of it using UTF-8.
The solution
Make sure that you are using UTF-8 everywhere!
The browser:
<?php header("Content-type: text/html; charset=utf-8"); ?>
You can also use a meta tag that is redundant in theory:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Also, note that the the <form> element has an 'accept-charset' attribute which should also be set:
<form accept-charset="utf-8" ...>
MySQL:
Make sure that your table’s collation is utf8_general_ci and that all string fields within the table also have the utf8_general_ci collation.
And here’s the really important bit: make sure your client connection is also using UTF-8:
For mysql:
mysql_set_charset('utf8');
or for mysqli:
mysqli_set_charset('utf8');
or execute the SQL immediately after connection:
SET NAMES UTF8;
or for PDO:
$handle = new PDO("mysql:host=localhost;dbname=dbname", 'username', 'password', array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
or for Zend_Db:
$params = array( 'host' => 'localhost', 'username' => 'username', 'password' => 'password', 'dbname' => 'dbname', 'driver_options' => array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES UTF8;'); ); $db = Zend_Db::factory('PDO_MYSQL', $params);
Note that in PHP 5.3.0 and 5.3.1, you cannot use the PDO::MYSQL_ATTR_INIT_COMMAND => ‘SET NAMES UTF8;’ option for PDO as it doesn’t work! See bug 47224 for details.
For a Zend Framework application that uses Zend_Application, add this to your ini file:
resources.db.params.charset = utf8
Now everything works as expected!
(as long as you don’t have an output filter on your view that’s too clever for its own good…)
Also, read About using UTF-8 fields in MySQL by Joshua Thijssen.
Hi Rob!
Believe it or not, there's one more thing that can ruin the fun (learned the hard way): Apache's AddDefaultCharset http://httpd.apache.org/docs/2.0/mod/core.html#AddDefaultCharset
I know it's disabled on most servers, but I had the "luck" to come across on a server which had it set to Latin-1 (ISO-8859-1), overwriting every other setting.
Thought to mention it :)
Cheers,
Robert
Ah yeah – that didn't come up today, but I've come across that one before too.
Thanks for commenting – no doubt it'll help me when I come back here in 6 months time :)
Regards,
Rob…
Rob, in ZF i use MySQLi driver and i set my connection charset in this way:
$dbAdapterMySQLi->getConnection()->set_charset("utf8");
:)
How to configure this for Zend_Db if I use config.ini file?
Diego,
Thanks for the tip :)
Kreoton,
untested, but try:
db.params.driver_options.0.PDO::MYSQL_ATTR_INIT_COMMAND = "SET NAMES UTF8;"Regards,
Rob…
(Update: don't use the above!)
A couple of slightly different methods here for Zend_DB
http://zfsnippets.com/snippets/view/id/13
Kreoton & Rob:
I don't think
db.params.driver_options.0.PDO::MYSQL_ATTR_INIT_COMMAND = "SET NAMES UTF8;"
will work. I'm using
db.params.driver_options.1002 = "SET NAMES utf8"
and that's working like a charm (1002 is the integer value for PDO::MYSQL_ATTR_INIT_COMMAND).
Best regards
Stefan
@4,7
db.params.driver_options.1002 = "SET NAMES utf8"
will indeed work. I don't think there's any way to use the constants.
When dealing with utf8 you may also need to set the encoding of your script files, especially if the £ is in that file, and not the database.
Stefan and David,
Well, it was a complete guess :)
I trust your information as you've actually used it!
Regards,
Rob…
I don't think I've met a web developer who hasn't had encoding issues. It's good to have all these points listed.
I just want to know why all the latest software releases aren't defaulting to utf-8?!
I'm sure there are many reasons. I know that if utf8 had been the default when I started programming (and using iso-8859-1 for my webpages) it would have caused problems. Probably for compatibility reasons like these
Good reference post! Bookmarked in case I ever need this =)
Don't forget to use UTF-8 compatible functions in PHP for string comparison.
Most string operations won't work for UTf-8 encoded strings because a utf-8 char is 4 bytes where ascii is 1 byte.
Thanks for Rob, Stefan Gehrig and David Caunt I would test this when I back home.
bookmarked too, thanks.
Don't forget to save your PHP-Files in UTF-8, in case you put your content directly in your PHP-Files.
Some more:
– do not write BOM (Byte Order Mark) when saving your PHP files in UTF-8
– make shure you are using multibyte aware functions (mb_…)
– make shure your regular expression use the unicode modifier, and that the input to the expression is a valid utf-8 sequence
If you have access to the servers configuration files, you can tell apache to append the correct header to your scripts and tell MySQL to use utf8 for all the connections.
Good tips, thanks.
Rob…
Also a really important article to read if you want to convert your database character set correctly (And avoid the wrong how-to's on the web):
http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/
If you work with Zend Framework, MySQLi as Driver and load your config out of an INI file you can use it this way:
database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
OR according to the example above
$params = array(
'host' => 'localhost',
'username' => 'username',
'password' => 'password',
'dbname' => 'dbname',
'driver_options' => array(MYSQLI_INIT_COMMAND => 'SET NAMES UTF8;');
);
$db = Zend_Db::factory('Mysqli', $params);
How come this is the first time I hear about this?
Dammit I've been writing é replace functions for years!!
I'm not sure #20 is correct, I don't think you can use PHP constants in Ini Config files.
Simply use
database.params.driver_options.1002 = "SET NAMES utf8"
rather than
database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
1002 is the value of MYSQLI_INIT_COMMAND
David,
You can use PHP constants in the value part of an INI file, but I agree that I don't think you can you use them in the key part.
Regards,
Rob…
Yup. This is the first problem, that every Czech programmer faces, when he learns PHP :) Coz Windows uses crazy Windows-1250 encoding.
$db->query('SET NAMES UTF8'); also works for Zend_Db
Also you should use "locale" collation (i.e.: utf8_czech_ci for czech)… otherwise it sorts chars based on their binary value rather then natural language sorting.
Hello,
To me
=====
database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
=====
doesn't work, but database.params.driver_options.1002 = "SET NAMES utf8" is working perfectly ;-)
Many thanks! mysql_set_charset("utf-8"); worked for me.
David (#22), you're mixing up two things. As Salz` in #20 said, MYSQLI_INIT_COMMAND would be the constant to use for the *mysqli* extension. Its numerical value is 3 in contrast with PDO::MYSQL_ATTR_INIT_COMMAND which you should use with PDO and has a numerical value of 1002. But that mysqli constant itself actually also works. Maybe the only reason the PDO constant doesn't work is because of the colons?
In any case, you won't need either of those anymore as of Zend Framework 1.8 at least. You can use:
database.params.charset = utf8
#27 – Correct, I was confused. 1.8 is king :)
thanks to your site, I fixed encoding on mine. wonderful.
Response to: #22,#23,#25
=====
database.params.driver_options.MYSQLI_INIT_COMMAND = "SET NAMES utf8"
=====
Works perfekt, cause of line 303->313 in http://framework.zend.com/svn/framework/standard/trunk/library/Zend/Db/Adapter/Mysqli.php
especially:
====
$option = @constant(strtoupper($option));
====
Oh, and this is true since many version of Zend Framework.
I don't know why but
;resources.db.params.driver_options.1002 = "SET NAMES UTF8;"
works for my fine but
database.params.charset = utf8
NOT
Do you know why?
In newer Zend Framework just use
database.params.charset = utf8
to set the charset in the ini file.
greg606 You must be on #ZF prior to version 1.8
Thank You!! It was so frustrating.
Thank you all!!!
In my case, the line
database.params.charset = utf8
worked perfect
You rock, just saying, cause you saved me hours of trying to figure out where I was going wrong :)
thanks I was having problems with Ñs
Your tip about mysql_set_charset('utf8') just transformed my workflow. None of all three of my otherwise excellent O'Reilly manuals for PHP (PHP and MySQL, PHP Cookbook, or Programming PHP) even mention this function.
I thought I was doing everything correctly but the one missing piece was that PEAR was not set up to handle UTF-8 encoding.
So simple!
Thanks for this tip.
It works like a charm, Thanks Rob, thanks everybody!
Beware when using:
resources.db.params.charset
and
resources.db.params.driver_options.1002
When you have the first, the second is overwritten when calling _connect. That sucks and it's most likely an ugly bug, as PDO::MYSQL_ATTR_INIT_COMMAND can be used for more than just setting the charset (which is my case)
Hi,
Thank you so much for this post. I have spent almost all day trying to figure out how to get correct Lithuanian characters from my MySQL database correctly showing on my php website. Finally, I only needed to add this extra line after making the connection to database:
mysql_set_charset('utf8');
Thanks again, you saved a lot of my time. ;)
Hi Rob,
Today I started a new project and i had to deal with the same issue. My short solution is indeed:
$view->headMeta()->appendHttpEquiv('Content-Type', 'text/html;charset=utf-8');
(obviously you need to echo $this->headMeta() in your layout.phtml)
This is the only place in my code where I define utf-8 charset encoding and database saving is totally OK. And it saves me lots of typing and thus headaches ;)
greetz from Holland
Don't forget to save your files according to the charset you want to user either ;-)
It's a common mistake among beginners…
http://web-o-blog.blogspot.com/2011/04/utf-8-with-phpmysql.html
I've been looking for this answer for a few weeks now. I tried the php header() like you pointed out and it fixed the problem.
I appreciate this concise blog!
Nice post, some real gems in here. Is there a PostgreSQL equivalent of PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")?
Although this problem should be ancient history, legacy code keeps it alive…
If you want to check what character set the browser is actually reporting, enter document.characterSet into the console. It turns out that header("Content-type: text/html; charset=utf-8") was the only way to sort out my "windows-1252" problem.