unicode
Original source: http://farmdev.com/talks/unicode/
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10: ordinal not in range(128)
s.decode(encoding)
<type 'str'> to <type 'unicode'>
u.encode(encoding)
<type 'unicode'> to <type 'str'>
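A minimal sketch of the round trip, written so that it should run unchanged on Python 2.7 and Python 3. The sample byte string (the UTF-8 encoding of "Ivan Krstić") is an assumption chosen for illustration; the names `raw`, `text`, `round_trip` and `ascii_error` are likewise hypothetical. Decoding the same bytes as ASCII fails at position 10, which matches the error message quoted above.

```python
# Assumed sample data: "Ivan Krstić" encoded as UTF-8 (12 bytes, 11 characters).
raw = b'Ivan Krsti\xc4\x87'

# decode: byte string -> unicode text
text = raw.decode('utf-8')

# encode: unicode text -> byte string
round_trip = text.encode('utf-8')

# Decoding with the wrong codec raises UnicodeDecodeError:
# byte 0xc4 at position 10 is outside the ASCII range.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
```

The exception object carries the details separately: `ascii_error.start` is the offending position and `ascii_error.encoding` is the codec that failed.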
unicode() vs decode(): the unicode() constructor accepts objects other than strings, whereas decode() is a string method. For a byte string, however, unicode(s, encoding) and s.decode(encoding) are mostly equivalent. Source: Stack Overflow.
.encode([encoding], [errors='strict'])
The errors parameter is the same as the parameter of the unicode() constructor, with one additional possibility; as well as 'strict', 'ignore', and 'replace', you can also pass 'xmlcharrefreplace', which uses XML’s character references. Python Docs. Example: u.encode('ascii', 'replace')
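To make the four error handlers concrete, here is a small sketch that encodes one non-ASCII string to ASCII with each of them. The sample string and the variable names are assumptions for illustration; the code should behave the same on Python 2.7 and Python 3.

```python
# Assumed sample text: the last character (U+0107) has no ASCII equivalent.
text = u'Ivan Krsti\u0107'

# 'replace' substitutes a question mark for each unencodable character.
replaced = text.encode('ascii', 'replace')            # -> b'Ivan Krsti?'

# 'ignore' silently drops unencodable characters.
ignored = text.encode('ascii', 'ignore')              # -> b'Ivan Krsti'

# 'xmlcharrefreplace' writes a numeric XML character reference instead.
xml_refs = text.encode('ascii', 'xmlcharrefreplace')  # -> b'Ivan Krsti&#263;'

# 'strict' (the default) raises UnicodeEncodeError.
try:
    text.encode('ascii')
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc
```

'replace' and 'ignore' lose information, so the original text cannot be recovered from the result; 'xmlcharrefreplace' keeps it, but only for consumers that understand XML or HTML character references.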
# Converts obj to a unicode object if it is a byte string;
# unicode objects and non-string objects are returned unchanged.
def to_unicode_or_bust(obj, encoding='utf-8'):
    if isinstance(obj, basestring):
        if not isinstance(obj, unicode):
            obj = unicode(obj, encoding)
    return obj
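The helper above relies on `basestring` and `unicode`, which exist only in Python 2. As a hedged sketch, the same idea can be expressed with a check against `bytes`, which is an alias of `str` on Python 2.7 and the byte type on Python 3. The name `to_text_or_bust` is hypothetical and not part of the original talk.

```python
def to_text_or_bust(obj, encoding='utf-8'):
    """Decode byte strings to text; return every other object unchanged."""
    # On Python 2.7 `bytes` is `str`, so this matches byte strings only
    # and leaves unicode objects alone, like the original helper.
    if isinstance(obj, bytes):
        obj = obj.decode(encoding)
    return obj


decoded = to_text_or_bust(b'Ivan Krsti\xc4\x87')   # byte string is decoded
already_text = to_text_or_bust(u'Ivan Krsti\u0107')  # text passes through
not_a_string = to_text_or_bust(42)                  # other types pass through
```

Calling such a helper at every input boundary means the rest of the program only ever handles text, which is the point of the original function.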
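import codecs
# Open for writing: unicode objects passed to f.write() are encoded as UTF-8
f = codecs.open('/tmp/ivan_utf8.txt', 'w', encoding='utf-8')
# Open for reading: f.read() decodes the UTF-8 bytes and returns unicode objects
f = codecs.open('/tmp/ivan_utf8.txt', 'r', encoding='utf-8')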
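A sketch of a full write and read cycle with codecs.open. It uses a temporary file instead of the /tmp path above so that it is self-contained; the sample text and variable names are assumptions for illustration.

```python
import codecs
import os
import tempfile

# Assumed sample text containing one non-ASCII character.
text = u'Ivan Krsti\u0107'

# Create a scratch file so the example does not depend on a fixed path.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# Write: the text is encoded to UTF-8 on the way out.
f = codecs.open(path, 'w', encoding='utf-8')
f.write(text)
f.close()

# Read: the UTF-8 bytes are decoded back to text on the way in.
f = codecs.open(path, 'r', encoding='utf-8')
read_back = f.read()
f.close()

# Inspect the raw bytes on disk to confirm the encoding that was used.
with open(path, 'rb') as raw_file:
    raw_bytes = raw_file.read()

os.remove(path)
```

The program itself never calls encode() or decode(); the file object does the conversion at the boundary.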