Vigil@nce: PHP, incorrect decoding of utf8_decode
November 2010 by Vigil@nce
This bulletin was written by Vigil@nce: http://vigilance.fr/
SYNTHESIS OF THE VULNERABILITY
When an application uses the utf8_decode() or xml_utf8_decode()
functions, some malformed UTF-8 sequences are incorrectly decoded,
so an attacker can, for example, bypass a filter.
– Severity: 2/4
– Creation date: 02/11/2010
DESCRIPTION OF THE VULNERABILITY
The PHP utf8_decode() and xml_utf8_decode() functions decode UTF-8
character sequences.
The UTF-8 encoding represents a Unicode character on one or more
bytes, depending on the number of significant bits of its value:
– 1 to 7 bits : 0xxxxxxx
– 8 to 11 bits : 110xxxxx 10xxxxxx
– 12 to 16 bits : 1110xxxx 10xxxxxx 10xxxxxx
– 17 to 21 bits : 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
UTF-8 limits the encoding to 4 bytes and forbids using more bytes
than necessary to represent a character (overlong encoding).
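
The layout above can be sketched in a few lines. This is only an
illustration in Python, not PHP code; the function name utf8_encode
is hypothetical, and surrogate values are not handled specially.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point using the shortest UTF-8 form."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if code_point < 0x80:
        # 1 to 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 8 to 11 bits: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point < 0x10000:
        # 12 to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 17 to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```

Because each branch is selected by the size of the value, the
result always uses the smallest number of bytes, which is the
shortest-form rule stated above.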
However, the utf8_decode() and xml_utf8_decode() functions do not
correctly decode malformed encodings of 17 to 21 bits (4-byte
sequences). A valid character is thus generated, instead of the
unknown ('?') character.
When an application uses the utf8_decode() or xml_utf8_decode()
functions, these malformed UTF-8 sequences are therefore decoded
to valid characters, so an attacker can, for example, bypass a
filter.
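
The following sketch illustrates this class of flaw. It is written
in Python and is not the PHP implementation. The bulletin does not
state which malformed sequences are affected, so the example
assumes an overlong 4-byte form of '<' (F0 80 80 BC) as one
possible case. Both function names are hypothetical.

```python
def decode_four_byte_naive(seq: bytes) -> str:
    """Lenient decoder: extracts the payload bits without validation."""
    if len(seq) != 4:
        return "?"
    cp = (((seq[0] & 0x07) << 18) | ((seq[1] & 0x3F) << 12)
          | ((seq[2] & 0x3F) << 6) | (seq[3] & 0x3F))
    return chr(cp) if cp <= 0x10FFFF else "?"


def decode_four_byte_strict(seq: bytes) -> str:
    """Strict decoder: returns '?' for any malformed 4-byte sequence."""
    # The lead byte must match 11110xxx.
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        return "?"
    # Every continuation byte must match 10xxxxxx.
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        return "?"
    cp = (((seq[0] & 0x07) << 18) | ((seq[1] & 0x3F) << 12)
          | ((seq[2] & 0x3F) << 6) | (seq[3] & 0x3F))
    # Reject overlong forms (value fits in fewer bytes) and
    # values beyond the Unicode range.
    if cp < 0x10000 or cp > 0x10FFFF:
        return "?"
    return chr(cp)


# Assumed attack input: an overlong 4-byte encoding of '<'.
payload = bytes([0xF0, 0x80, 0x80, 0xBC])

# A filter that searches the raw bytes for '<' finds nothing.
filter_sees_bracket = b"<" in payload

lenient_result = decode_four_byte_naive(payload)
strict_result = decode_four_byte_strict(payload)
```

Here filter_sees_bracket is False, yet the lenient decoder returns
'<', while the strict decoder returns '?'. A filter applied before
a lenient decoding step can therefore be bypassed; applying the
filter after decoding, or rejecting malformed input, avoids this.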
ACCESS TO THE COMPLETE VIGIL@NCE BULLETIN
http://vigilance.fr/vulnerability/PHP-incorrect-decoding-of-utf8-decode-10092