Saturday, June 27, 2015

Encoding Error When Writing a String to a File in Python

I'm trying to write the HTML source returned by Google to a file in Python 3.4:

#coding=utf-8
try:
    from urllib.request import Request, urlopen  # Python 3
    from urllib.parse import quote_plus
except ImportError:
    from urllib2 import Request, urlopen  # Python 2
    from urllib import quote_plus

useragent = 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'

#Generate URL
url = 'http://ift.tt/1Icd5wI'
query = input('Google It! :')
full_url = url + quote_plus(query)  # escape spaces and non-ASCII characters in the query


#Request Data
data = Request(full_url)
data.add_header('User-Agent', useragent)
dataRequested = urlopen(data).read()
dataRequested = str(dataRequested.decode('utf-8'))


print(dataRequested)

#Write Data Into File
with open('Google - ' + query + '.html', 'w') as file:  # no encoding given, so the locale default is used
    file.write(dataRequested)

It prints the string correctly, but writing it to the file fails with:

file.write(dataRequested)
UnicodeEncodeError: 'cp950' codec can't encode character '\u200e' in position 97658: illegal multibyte sequence
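The traceback points at the cause: when open() is called without an encoding argument, Python 3 uses the locale's preferred encoding, which here appears to be cp950 (inferred from the codec named in the error, typical of Traditional Chinese Windows). U+200E is the invisible LEFT-TO-RIGHT MARK and has no cp950 mapping, so file.write() raises. The following minimal sketch reproduces the failure and shows the usual fix of passing encoding='utf-8' to open(); the sample string and the file name google_result.html are hypothetical. In Python 2, io.open accepts the same encoding argument.

```python
import locale
import os
import tempfile

# Hypothetical sample containing U+200E LEFT-TO-RIGHT MARK, the character from the traceback
html = 'before\u200eafter'

# open() without an encoding argument falls back to this locale-dependent default
print('default text encoding:', locale.getpreferredencoding(False))

# Reproduce the failure explicitly with the cp950 codec
try:
    html.encode('cp950')
except UnicodeEncodeError as exc:
    print('cp950 cannot encode it:', exc.reason)

# Fix: name an encoding that can represent every Unicode character
path = os.path.join(tempfile.gettempdir(), 'google_result.html')  # hypothetical file name
with open(path, 'w', encoding='utf-8') as f:
    f.write(html)

# Reading back with the same encoding returns the original text
with open(path, encoding='utf-8') as f:
    print(f.read() == html)
```

Applied to the question's code, this amounts to adding encoding='utf-8' to the open() call that writes the HTML file.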

I tried changing how the response is decoded, but that didn't help. I also tried replacing \u200e, but then other characters raise the same encoding error.
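Replacing characters one by one cannot keep up, because a page can contain many characters that cp950 lacks. Two alternatives are sketched here, assuming the response body is UTF-8 (the same assumption the question's decode('utf-8') call makes): write the undecoded bytes in binary mode, or, if a cp950 file is really required, let the codec substitute HTML character references via errors='xmlcharrefreplace'. The sample bytes and output file names are hypothetical stand-ins for the real urlopen(data).read() result.

```python
import os
import tempfile

# Hypothetical stand-in for urlopen(data).read(): raw UTF-8 bytes from the server
raw = '<p>\u200e中文</p>'.encode('utf-8')

out_dir = tempfile.gettempdir()

# Option 1: write the bytes untouched in binary mode, so no text encoding is involved
bytes_path = os.path.join(out_dir, 'google_raw.html')  # hypothetical file name
with open(bytes_path, 'wb') as f:
    f.write(raw)

# Option 2: keep cp950, but replace unmappable characters with
# HTML numeric character references (U+200E becomes &#8206;)
text = raw.decode('utf-8')
cp950_path = os.path.join(out_dir, 'google_cp950.html')  # hypothetical file name
with open(cp950_path, 'w', encoding='cp950', errors='xmlcharrefreplace') as f:
    f.write(text)
```

Option 1 is probably the safer choice for saving a web page: the bytes stay consistent with whatever charset the page declares in its own meta tag, while a cp950 file could contradict a UTF-8 declaration.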
