
$#%#! UTF-8 in Python

This is not a post about using UTF-8 properly in Python, but about doing evil, evil things.

Python dutifully respects the $LANG environment variable on the terminal. It turns out that a lot of the time this variable is totally wrong: it’s set to something like C even though the terminal actually uses UTF-8.
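To make the failure concrete, here is a minimal sketch of what happens to a non-ASCII character under each encoding. It assumes that LANG=C ends up meaning plain ASCII, which is the usual case; the names SMILEY and try_encode are made up for this illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative only: SMILEY and try_encode are hypothetical names.
SMILEY = u"\u263A"  # WHITE SMILING FACE


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Assumption: the C locale maps to ASCII, which has no smiley.
ascii_result = try_encode(SMILEY, "ascii")

# A UTF-8 terminal would have been happy with these three bytes.
utf8_result = try_encode(SMILEY, "utf-8")
```

Here ascii_result is None because the ASCII codec refuses the character, while utf8_result holds the bytes a UTF-8 terminal expects.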

The problem is that there is no easy way to change a file’s encoding after it’s open. Well, not until this horrible hack! The following code (Python 2 only, since it relies on the PyFile_SetEncoding C API function) forces the output encoding of stdout to UTF-8 even if Python was started with LANG=C.

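# License: MIT
# Python 2 only: PyFile_SetEncoding is not part of the Python 3 C API.

# With LANG=C this first attempt fails with a UnicodeEncodeError.
try:
    print u"\u263A"
except Exception, e:
    print e

import sys
# Show the encoding Python picked up from the environment.
print sys.stdout.encoding

# Call the C API directly to change the encoding of the already-open file.
from ctypes import pythonapi, py_object, c_char_p
PyFile_SetEncoding = pythonapi.PyFile_SetEncoding
PyFile_SetEncoding.argtypes = (py_object, c_char_p)
# PyFile_SetEncoding returns 1 on success and 0 on failure.
if not PyFile_SetEncoding(sys.stdout, "UTF-8"):
    raise ValueError("could not set the encoding of sys.stdout")

# The same print now succeeds and writes the smiley as UTF-8 bytes.
try:
    print u"\u263A"
except Exception, e:
    print e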
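For contrast, the less evil and more conventional route is to wrap the byte stream in a UTF-8 writer instead of patching the open file. The sketch below is not the hack above, just a comparison point; it uses an in-memory io.BytesIO as a stand-in for stdout so the result can be inspected, and the names raw, utf8_out and written are made up for this illustration.

```python
import codecs
import io

# Stand-in for the byte stream behind sys.stdout (assumption for the demo).
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it wraps the
# byte stream so that unicode text is encoded as UTF-8 on every write.
utf8_out = codecs.getwriter("utf-8")(raw)
utf8_out.write(u"\u263A")

# The bytes that actually reached the underlying stream.
written = raw.getvalue()
```

In a real Python 2 script the equivalent line would be sys.stdout = codecs.getwriter("utf-8")(sys.stdout). Note that this rebinds sys.stdout to a wrapper rather than changing the encoding of the file object itself, so anything still holding a reference to the original file keeps the old behaviour. That is exactly the limitation the ctypes hack sidesteps.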
