Python: solving unicode hell with unidecode

I have been working on ways to flatten text to ASCII: ā -> a, ñ -> n, etc.

unidecode has been fantastic for this.

# -*- coding: utf-8 -*-
from unidecode import unidecode

print(unidecode(u"ā, ī, ū, ś, ñ"))
print(unidecode(u"estado de são paulo"))

produces:

a, i, u, s, n
estado de sao paulo

However, I can't duplicate this result with data read from an input file.

Content of the test.txt file:

ā, ī, ū, ś, ñ
estado de são paulo

# -*- coding: utf-8 -*-
from unidecode import unidecode

# plain open() yields undecoded byte strings on Python 2
with open("test.txt", 'r') as inf:
    for line in inf:
        print unidecode(line.strip())

produces:

A, A<<, A<<, A, A+-
estado de sAPSo paulo

and:

RuntimeWarning: Argument <type 'str'> is not an unicode object. Passing an encoded string will likely have unexpected results.
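
A likely explanation for both the warning and the garbled output, sketched with the standard library only: under Python 2, a file opened with plain open() yields byte strings, and each accented character in a UTF-8 file occupies two bytes. If every byte is then treated as its own character, ī turns into the pair Ä«, which would transliterate to A<<. The helper name misread_as_latin1 is made up for illustration, and Latin-1 is an assumption about how the individual bytes end up being interpreted.

```python
# -*- coding: utf-8 -*-

def misread_as_latin1(text):
    """Encode text as UTF-8, then decode the bytes one by one as Latin-1."""
    raw = text.encode("utf-8")      # the bytes as they are stored in test.txt
    return raw.decode("latin-1")    # one character per byte (assumed misreading)


for ch in u"āīūśñ":
    # each of these characters takes two bytes in UTF-8
    print(repr(ch.encode("utf-8")), repr(misread_as_latin1(ch)))
```

If this reading is right, the data in the file is fine and only the decoding step is missing.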

Question: how can I read these lines in as unicode so that I can pass them to unidecode?

import codecs
with codecs.open("test.txt", 'r', 'utf-8') as inf:
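
For completeness, here is a minimal sketch of the whole read loop with decoding applied. It uses io.open rather than codecs.open (a substitution on my part; both decode while reading, and io.open behaves the same on Python 2 and 3). read_unicode_lines is a hypothetical helper name, not something unidecode provides, and the file is assumed to be UTF-8 encoded.

```python
# -*- coding: utf-8 -*-
import io


def read_unicode_lines(path, encoding="utf-8"):
    """Yield each line of the file as decoded text, stripped of whitespace.

    Hypothetical helper for illustration; assumes the file is UTF-8.
    """
    # io.open decodes bytes to text as it reads
    with io.open(path, "r", encoding=encoding) as inf:
        for line in inf:
            yield line.strip()


# Usage with unidecode (third-party, assumed to be installed):
#     from unidecode import unidecode
#     for line in read_unicode_lines("test.txt"):
#         print(unidecode(line))
```

Because every line reaches unidecode as text rather than as encoded bytes, the RuntimeWarning should no longer be triggered.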

Sp