Note that the data you get back from urlopen is a bytes object: calling read() on the response it returns yields bytes, not a string. This is because urlopen does not try to determine the encoding of the byte stream it receives from the HTTP server. In general, a program decodes the returned bytes object into a string once it has determined or guessed the appropriate encoding.
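One place to look for the encoding is the charset parameter of the server's Content-Type header. The sketch below is one way to use it, not something urlopen does for you. The helper names decode_body and fetch_text are hypothetical, and falling back to UTF-8 when no charset is sent is an assumption. It relies on response.headers being an http.client.HTTPMessage, a subclass of email.message.Message, which provides get_content_charset().

```python
import email.message
from urllib.request import urlopen


def decode_body(raw: bytes, headers: email.message.Message, default: str = "utf-8") -> str:
    """Decode a response body using the charset from its Content-Type header.

    Hypothetical helper: falls back to `default` (an assumed UTF-8) when the
    server sends no charset parameter. An unknown charset name raises LookupError.
    """
    charset = headers.get_content_charset(failobj=default)
    return raw.decode(charset)


def fetch_text(url: str) -> str:
    """Hypothetical helper: fetch `url` with urlopen and return the body as a str."""
    with urlopen(url) as response:
        raw = response.read()  # read() always returns bytes
        return decode_body(raw, response.headers)
```

Many servers omit the charset parameter, so the fallback (or another source of encoding information) matters in practice.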
The W3C document at http://www.w3.org/International/O-charset lists the various ways in which an (X)HTML or XML document can specify its encoding.
Since the python.org website declares UTF-8 as its encoding in a meta tag, we will use UTF-8 to decode the bytes object.
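Reading the encoding from a meta tag can also be automated. The sketch below assumes the declaration appears in the first 1024 bytes and uses a simplified regular expression instead of a real HTML parser. It handles both `<meta charset="...">` and the older `http-equiv="Content-Type"` form. The names sniff_meta_charset and decode_html are hypothetical, and the UTF-8 fallback is an assumption.

```python
import re
from typing import Optional

# Simplified pattern (not a full HTML parser): matches both
#   <meta charset="utf-8">
#   <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def sniff_meta_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Return the lowercased charset declared in a meta tag, or None.

    Only the first `limit` bytes are scanned (assumed to contain the <head>).
    """
    match = _META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_html(raw: bytes, default: str = "utf-8") -> str:
    """Decode HTML bytes using the meta-tag charset, else the assumed default."""
    charset = sniff_meta_charset(raw) or default
    return raw.decode(charset)
```

With a page fetched as in the text, a call such as `decode_html(urlopen("http://www.python.org/").read())` would pick up the declared UTF-8 rather than hard-coding it.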