Last week I went skiing for a couple of days at Bolognola. We had a great time, but it all ended in tragedy (or could have)! Check out the pictures and the funny video. (more...)
Using Unicode characters when designing a website (or a multilingual desktop application) is usually considered a good thing. By default, when designing a standard Windows client application or a PHP website, all output is encoded using a limited set of characters. Usually this is ISO-8859-1 in countries using Latin characters (it covers the German umlauts and the accented letters of the Romance languages), or a close variant such as Windows-1252 that adds a few more characters, like the euro symbol, which ISO-8859-1 does not include.
This usually works pretty well, without hassle for the programmer... at least until you have output (or input) in multiple languages (like localized language files or comments on a website). In this case, you'll have to deal - sooner or later - with a character encoding scheme that supports more characters than the default one.
Usually, the best choice is UTF-8, a variable-length Unicode encoding that is backward compatible with ASCII. Plain ASCII characters (including the unaccented Latin letters) take only 1 byte, accented letters take 2 bytes, and more complex characters take 3 or 4 (check out Wikipedia for more info).
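To make the byte counts concrete, here is a minimal sketch. The sample characters and the `utf8ByteLengths` helper are my own illustration, not part of any library; the strings are written as byte escapes so the result does not depend on how this file itself is saved.

```php
<?php
// Byte escapes keep the samples independent of this file's own encoding.
$samples = array(
    'a (U+0061)'       => "a",
    'e acute (U+00E9)' => "\xC3\xA9",
    'euro (U+20AC)'    => "\xE2\x82\xAC",
);

// Hypothetical helper: strlen() counts bytes, not characters, so it shows
// how many bytes UTF-8 needs for each sample.
function utf8ByteLengths(array $strings)
{
    $lengths = array();
    foreach ($strings as $label => $s) {
        $lengths[$label] = strlen($s);
    }
    return $lengths;
}

foreach (utf8ByteLengths($samples) as $label => $bytes) {
    echo $label . ': ' . $bytes . " byte(s)\n";
}
```

The plain letter needs 1 byte, the accented letter 2, and the euro symbol 3.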
The main problem with PHP is that the current version (v. 5) does not natively support Unicode strings. String operations (like "substr" and so on) work on bytes rather than characters, so they don't work reliably with UTF-8 strings (the same goes for the standard C runtime functions). But since PHP basically pushes all output to the web server without looking at its encoding, if you write ("echo") UTF-8 encoded strings, they will be shown correctly in the browser.
Just remember to use the standard string functions only on ASCII strings and to use the multibyte-aware functions (such as the mb_* family from the mbstring extension) on the rest. Hopefully, the upcoming PHP 6 will support UTF-8 encoded strings automatically.
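A short sketch of the difference, assuming the mbstring extension is installed (it is optional, so check your PHP build). The sample word and variable names are only for illustration.

```php
<?php
// "caffè" in UTF-8: the final "è" is the two bytes C3 A8.
$word = "caff\xC3\xA8";

$byteCount = strlen($word);              // counts bytes
$charCount = mb_strlen($word, 'UTF-8');  // counts characters

// substr() cuts after 5 bytes, which splits the "è" in half.
$broken = substr($word, 0, 5);

// mb_substr() cuts after 5 characters, so the "è" stays whole.
$intact = mb_substr($word, 0, 5, 'UTF-8');

echo $byteCount . ' bytes, ' . $charCount . " characters\n";
echo 'substr result is valid UTF-8: '
    . (mb_check_encoding($broken, 'UTF-8') ? 'yes' : 'no') . "\n";
echo 'mb_substr result is valid UTF-8: '
    . (mb_check_encoding($intact, 'UTF-8') ? 'yes' : 'no') . "\n";
```

The byte-based cut leaves a dangling lead byte at the end of the string, which is exactly the kind of thing that shows up as a box or a question mark in the browser.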
Also remember to tell the browser that the page you're outputting is encoded in UTF-8, for instance by sending this header before any other output:
header('Content-Type: text/html; charset=UTF-8');
MySQL fortunately supports Unicode encoding and it's really easy to set everything up correctly. Just define the correct "CHARSET" when creating your tables:
CREATE TABLE `name` (
... columns ...
) DEFAULT CHARSET=utf8;
This is a problem I only recently solved on this website. Fact is, the source data is encoded in UTF-8, PHP outputs everything correctly, the output data is correctly interpreted as UTF-8, but all special characters still show up as boxes or question marks in the browser!
The problem lies in how PHP and MySQL communicate: the connection between PHP and the MySQL server actually sends and receives text strings (your SQL queries and their results), which of course are encoded in some way. It turns out they are usually encoded with the default character set of the connection (which is usually latin1, not Unicode!).
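To see why this ends in boxes and question marks, here is a rough simulation. It assumes the mbstring extension and uses ISO-8859-1 as a stand-in for MySQL's latin1; the `toLatin1` helper is hypothetical and only mimics what a conversion to a latin1 connection would do to UTF-8 data, it is not what MySQL runs internally.

```php
<?php
// Make the replacement character explicit: "?" for anything that has
// no equivalent in the target encoding.
mb_substitute_character(0x3F);

// Hypothetical helper: mimics results being converted from UTF-8 to latin1.
function toLatin1($utf8)
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$eGrave = "\xC3\xA8";  // "è" as UTF-8
$alpha  = "\xCE\xB1";  // Greek alpha as UTF-8

$sentE     = toLatin1($eGrave);  // becomes the single byte E8
$sentAlpha = toLatin1($alpha);   // has no latin1 equivalent, gets replaced

// The page is declared as UTF-8, but a lone E8 byte is not valid UTF-8,
// so the browser cannot display it.
$isValid = mb_check_encoding($sentE, 'UTF-8');

echo bin2hex($sentE) . "\n";
echo $sentAlpha . "\n";
echo ($isValid ? 'valid' : 'invalid') . " UTF-8\n";
```

Characters that exist in latin1 arrive as single bytes the browser cannot decode as UTF-8, and everything else arrives as a literal question mark.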
To solve the issue, simply put this MySQL query after you open the database connection and before you execute any other query:
mysql_query("SET NAMES 'utf8'");
It is that simple, and it solved every encoding problem I ever had!
Perhaps you already have a database full of data and used the default (latin1) encoding to store it. When INSERTing a new row in the database with UTF-8 encoded strings, your data is likewise corrupted and then not shown correctly when you fetch it again.
The easiest way to convert the whole database (if it isn't too big) is:
Of course this won't work if you have 8 GB of data on your server, but hey!
Just remember to INSERT new data using UTF-8 (the browser will usually send you the user's input in the same encoding as the page they loaded).
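As an alternative that works in place, MySQL also has an ALTER TABLE ... CONVERT TO CHARACTER SET statement. This is a different technique from the one described above, and the sketch below only builds the statements for a list of table names; the `buildConversionQueries` helper and the table names are made up for the example. It assumes the stored data really is latin1: if UTF-8 bytes were already written into latin1 columns, a direct conversion would garble them further, so back up first.

```php
<?php
// Hypothetical helper: builds one ALTER TABLE statement per table name.
// $charset is inserted as-is, so pass only a trusted value such as 'utf8'.
function buildConversionQueries(array $tables, $charset = 'utf8')
{
    $queries = array();
    foreach ($tables as $table) {
        // Double any backtick so the name stays a valid quoted identifier.
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $queries[] = "ALTER TABLE $quoted CONVERT TO CHARACTER SET $charset";
    }
    return $queries;
}

// Example table names, purely illustrative.
foreach (buildConversionQueries(array('posts', 'comments')) as $sql) {
    echo $sql . ";\n";
}
```

Each generated statement can then be run through the same database connection you use for the rest of your queries.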
Busy week! Exams are approaching and I keep working on my useless side-projects when I should be studying...
Anyway, I found the time to update my pet project "OnTopReplica". You can download it directly from this site (it should update itself automatically). Enjoy!
And by the way, I learned some really interesting stuff about application settings and XML serialization: I'll post it soon.