
Commit aca8fd7
committed

Documentation updates for the urllib package. Modified the documentation for
the renamed modules: urllib, urllib2 -> urllib.request, urllib.error;
urlparse -> urllib.parse; RobotParser -> urllib.robotparser. Updated tutorial
references and other module references (http.client.rst, ftplib.rst,
contextlib.rst). Updated the examples in the urllib2 HOWTO. Addresses
Issue 3142.

1 parent d11a443 commit aca8fd7

File tree

12 files changed

+565
-593
lines changed


Doc/howto/urllib2.rst

Lines changed: 69 additions & 66 deletions
@@ -1,6 +1,6 @@
-************************************************
-HOWTO Fetch Internet Resources Using urllib2
-************************************************
+*****************************************************
+HOWTO Fetch Internet Resources Using urllib package
+*****************************************************
 
 :Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
 
@@ -24,14 +24,14 @@ Introduction
 
 A tutorial on *Basic Authentication*, with examples in Python.
 
-**urllib2** is a `Python <http://www.python.org>`_ module for fetching URLs
+**urllib.request** is a `Python <http://www.python.org>`_ module for fetching URLs
 (Uniform Resource Locators). It offers a very simple interface, in the form of
 the *urlopen* function. This is capable of fetching URLs using a variety of
 different protocols. It also offers a slightly more complex interface for
 handling common situations - like basic authentication, cookies, proxies and so
 on. These are provided by objects called handlers and openers.
 
-urllib2 supports fetching URLs for many "URL schemes" (identified by the string
+urllib.request supports fetching URLs for many "URL schemes" (identified by the string
 before the ":" in URL - for example "ftp" is the URL scheme of
 "ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP).
 This tutorial focuses on the most common case, HTTP.
@@ -40,43 +40,43 @@ For straightforward situations *urlopen* is very easy to use. But as soon as you
 encounter errors or non-trivial cases when opening HTTP URLs, you will need some
 understanding of the HyperText Transfer Protocol. The most comprehensive and
 authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
-not intended to be easy to read. This HOWTO aims to illustrate using *urllib2*,
+not intended to be easy to read. This HOWTO aims to illustrate using *urllib*,
 with enough detail about HTTP to help you through. It is not intended to replace
-the :mod:`urllib2` docs, but is supplementary to them.
+the :mod:`urllib.request` docs, but is supplementary to them.
 
 
 Fetching URLs
 =============
 
-The simplest way to use urllib2 is as follows::
+The simplest way to use urllib.request is as follows::
 
-    import urllib2
-    response = urllib2.urlopen('http://python.org/')
+    import urllib.request
+    response = urllib.request.urlopen('http://python.org/')
     html = response.read()
 
-Many uses of urllib2 will be that simple (note that instead of an 'http:' URL we
+Many uses of urllib will be that simple (note that instead of an 'http:' URL we
 could have used an URL starting with 'ftp:', 'file:', etc.). However, it's the
 purpose of this tutorial to explain the more complicated cases, concentrating on
 HTTP.
 
 HTTP is based on requests and responses - the client makes requests and servers
-send responses. urllib2 mirrors this with a ``Request`` object which represents
+send responses. urllib.request mirrors this with a ``Request`` object which represents
 the HTTP request you are making. In its simplest form you create a Request
 object that specifies the URL you want to fetch. Calling ``urlopen`` with this
 Request object returns a response object for the URL requested. This response is
 a file-like object, which means you can for example call ``.read()`` on the
 response::
 
-    import urllib2
+    import urllib.request
 
-    req = urllib2.Request('http://www.voidspace.org.uk')
-    response = urllib2.urlopen(req)
+    req = urllib.request.Request('http://www.voidspace.org.uk')
+    response = urllib.request.urlopen(req)
     the_page = response.read()
 
-Note that urllib2 makes use of the same Request interface to handle all URL
+Note that urllib.request makes use of the same Request interface to handle all URL
 schemes. For example, you can make an FTP request like so::
 
-    req = urllib2.Request('ftp://example.com/')
+    req = urllib.request.Request('ftp://example.com/')
 
 In the case of HTTP, there are two extra things that Request objects allow you
 to do: First, you can pass data to be sent to the server. Second, you can pass
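The migrated snippet above can be exercised without any network traffic, since a ``Request`` records its URL and method before ``urlopen`` is ever called. A minimal Python 3 sketch, reusing the diff's illustrative URL:

```python
import urllib.request

# Build a Request without opening it -- no network access needed.
req = urllib.request.Request('http://www.voidspace.org.uk')

# The object records the URL and, with no data attached, defaults to GET.
print(req.full_url)      # the URL passed in
print(req.get_method())  # 'GET'
```

Calling ``urllib.request.urlopen(req)`` would then perform the actual fetch.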
@@ -94,28 +94,28 @@ your browser does when you submit a HTML form that you filled in on the web. Not
 all POSTs have to come from forms: you can use a POST to transmit arbitrary data
 to your own application. In the common case of HTML forms, the data needs to be
 encoded in a standard way, and then passed to the Request object as the ``data``
-argument. The encoding is done using a function from the ``urllib`` library
-*not* from ``urllib2``. ::
+argument. The encoding is done using a function from the ``urllib.parse`` library
+*not* from ``urllib.request``. ::
 
-    import urllib
-    import urllib2
+    import urllib.parse
+    import urllib.request
 
     url = 'http://www.someserver.com/cgi-bin/register.cgi'
     values = {'name' : 'Michael Foord',
               'location' : 'Northampton',
               'language' : 'Python' }
 
-    data = urllib.urlencode(values)
-    req = urllib2.Request(url, data)
-    response = urllib2.urlopen(req)
+    data = urllib.parse.urlencode(values)
+    req = urllib.request.Request(url, data)
+    response = urllib.request.urlopen(req)
     the_page = response.read()
 
 Note that other encodings are sometimes required (e.g. for file upload from HTML
 forms - see `HTML Specification, Form Submission
 <http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
 details).
 
-If you do not pass the ``data`` argument, urllib2 uses a **GET** request. One
+If you do not pass the ``data`` argument, urllib.request uses a **GET** request. One
 way in which GET and POST requests differ is that POST requests often have
 "side-effects": they change the state of the system in some way (for example by
 placing an order with the website for a hundredweight of tinned spam to be
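One wrinkle the diff does not mention: in current Python 3, ``urlencode`` returns ``str`` while a ``Request`` wants its POST body as ``bytes``, so an explicit ``.encode()`` is needed. A runnable sketch of the form-posting example (the server URL is illustrative; nothing is actually sent):

```python
import urllib.parse
import urllib.request

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# urlencode() returns a str; a Request body must be bytes in Python 3,
# so encode it before attaching it to the Request.
data = urllib.parse.urlencode(values).encode('ascii')
req = urllib.request.Request('http://www.someserver.com/cgi-bin/register.cgi',
                             data)

# Attaching data switches the request method from GET to POST.
print(req.get_method())  # 'POST'
```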
@@ -127,18 +127,18 @@ GET request by encoding it in the URL itself.
 
 This is done as follows::
 
-    >>> import urllib2
-    >>> import urllib
+    >>> import urllib.request
+    >>> import urllib.parse
     >>> data = {}
    >>> data['name'] = 'Somebody Here'
     >>> data['location'] = 'Northampton'
     >>> data['language'] = 'Python'
-    >>> url_values = urllib.urlencode(data)
+    >>> url_values = urllib.parse.urlencode(data)
     >>> print(url_values)
     name=Somebody+Here&language=Python&location=Northampton
     >>> url = 'http://www.example.com/example.cgi'
     >>> full_url = url + '?' + url_values
-    >>> data = urllib2.open(full_url)
+    >>> data = urllib.request.open(full_url)
 
 Notice that the full URL is created by adding a ``?`` to the URL, followed by
 the encoded values.
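A note on the last added line: ``urllib.request`` has no module-level ``open`` function, so that call would need to be ``urllib.request.urlopen``. The query-string construction itself can be verified entirely offline:

```python
import urllib.parse

data = {'name': 'Somebody Here', 'location': 'Northampton', 'language': 'Python'}
url_values = urllib.parse.urlencode(data)
full_url = 'http://www.example.com/example.cgi' + '?' + url_values

# parse_qs recovers the original values back out of the encoded query string.
parsed = urllib.parse.urlparse(full_url)
print(urllib.parse.parse_qs(parsed.query)['name'])  # ['Somebody Here']
```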
@@ -150,7 +150,7 @@ We'll discuss here one particular HTTP header, to illustrate how to add headers
 to your HTTP request.
 
 Some websites [#]_ dislike being browsed by programs, or send different versions
-to different browsers [#]_ . By default urllib2 identifies itself as
+to different browsers [#]_ . By default urllib identifies itself as
 ``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
 numbers of the Python release,
 e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
@@ -160,8 +160,8 @@ pass a dictionary of headers in. The following example makes the same
 request as above, but identifies itself as a version of Internet
 Explorer [#]_. ::
 
-    import urllib
-    import urllib2
+    import urllib.parse
+    import urllib.request
 
     url = 'http://www.someserver.com/cgi-bin/register.cgi'
     user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
@@ -170,9 +170,9 @@ Explorer [#]_. ::
               'language' : 'Python' }
     headers = { 'User-Agent' : user_agent }
 
-    data = urllib.urlencode(values)
-    req = urllib2.Request(url, data, headers)
-    response = urllib2.urlopen(req)
+    data = urllib.parse.urlencode(values)
+    req = urllib.request.Request(url, data, headers)
+    response = urllib.request.urlopen(req)
     the_page = response.read()
 
 The response also has two useful methods. See the section on `info and geturl`_
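The header-spoofing example can likewise be checked before anything is sent: a ``Request`` stores its headers with capitalized names, and those can be read back directly (values are the same illustrative ones as in the diff):

```python
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}

req = urllib.request.Request(url, headers=headers)

# Header names are stored in capitalized form ('User-agent'), and
# get_header/has_header expect that spelling.
print(req.has_header('User-agent'))  # True
print(req.get_header('User-agent'))  # the spoofed agent string
```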
@@ -182,7 +182,7 @@ which comes after we have a look at what happens when things go wrong.
 Handling Exceptions
 ===================
 
-*urlopen* raises ``URLError`` when it cannot handle a response (though as usual
+*urllib.error* raises ``URLError`` when it cannot handle a response (though as usual
 with Python APIs, builtin exceptions such as ValueError, TypeError etc. may also
 be raised).
 
@@ -199,9 +199,9 @@ error code and a text error message.
 
 e.g. ::
 
-    >>> req = urllib2.Request('http://www.pretend_server.org')
-    >>> try: urllib2.urlopen(req)
-    >>> except URLError, e:
+    >>> req = urllib.request.Request('http://www.pretend_server.org')
+    >>> try: urllib.request.urlopen(req)
+    >>> except urllib.error.URLError, e:
     >>> print(e.reason)
     >>>
     (4, 'getaddrinfo failed')
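One thing the new doctest carries over unchanged is the Python 2 ``except URLError, e:`` spelling; Python 3 only accepts ``except URLError as e:``. A self-contained sketch with the corrected syntax, triggering a ``URLError`` offline (an unknown URL scheme never reaches the network):

```python
import urllib.request
import urllib.error

# Python 3 spells the except clause with ``as``, not a comma.
# A made-up URL scheme makes urlopen raise URLError without any traffic.
try:
    urllib.request.urlopen('foo://example.invalid/')
except urllib.error.URLError as e:
    reason = str(e.reason)

print(reason)
```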
@@ -214,7 +214,7 @@ Every HTTP response from the server contains a numeric "status code". Sometimes
 the status code indicates that the server is unable to fulfil the request. The
 default handlers will handle some of these responses for you (for example, if
 the response is a "redirection" that requests the client fetch the document from
-a different URL, urllib2 will handle that for you). For those it can't handle,
+a different URL, urllib.request will handle that for you). For those it can't handle,
 urlopen will raise an ``HTTPError``. Typical errors include '404' (page not
 found), '403' (request forbidden), and '401' (authentication required).
 
@@ -305,12 +305,12 @@ dictionary is reproduced here for convenience ::
 When an error is raised the server responds by returning an HTTP error code
 *and* an error page. You can use the ``HTTPError`` instance as a response on the
 page returned. This means that as well as the code attribute, it also has read,
-geturl, and info, methods. ::
+geturl, and info, methods as returned by the ``urllib.response`` module::
 
-    >>> req = urllib2.Request('http://www.python.org/fish.html')
+    >>> req = urllib.request.Request('http://www.python.org/fish.html')
     >>> try:
-    >>>     urllib2.urlopen(req)
-    >>> except URLError, e:
+    >>>     urllib.request.urlopen(req)
+    >>> except urllib.error.URLError, e:
     >>>     print(e.code)
     >>>     print(e.read())
     >>>
@@ -334,7 +334,8 @@ Number 1
 ::
 
 
-    from urllib2 import Request, urlopen, URLError, HTTPError
+    from urllib.request import Request, urlopen
+    from urllib.error import URLError, HTTPError
     req = Request(someurl)
     try:
         response = urlopen(req)
@@ -358,7 +359,8 @@ Number 2
 
 ::
 
-    from urllib2 import Request, urlopen, URLError
+    from urllib.request import Request, urlopen
+    from urllib.error import URLError
     req = Request(someurl)
     try:
         response = urlopen(req)
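Under Python 3 the "Number 1" template above fills out as follows (``fetch`` and its scheme-less test URL are hypothetical; ``HTTPError`` must be caught before ``URLError`` because it is a subclass of it):

```python
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

def fetch(someurl):
    """Return the page bytes, or None on failure -- 'Number 1' pattern."""
    req = Request(someurl)
    try:
        response = urlopen(req)
    except HTTPError as e:
        # HTTPError must come first: it is a subclass of URLError.
        print("The server couldn't fulfill the request:", e.code)
        return None
    except URLError as e:
        print("We failed to reach a server. Reason:", e.reason)
        return None
    return response.read()

# An unknown scheme exercises the URLError branch with no network access.
result = fetch('foo://example.invalid/')
```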
@@ -377,7 +379,8 @@ info and geturl
 ===============
 
 The response returned by urlopen (or the ``HTTPError`` instance) has two useful
-methods ``info`` and ``geturl``.
+methods ``info`` and ``geturl`` and is defined in the module
+``urllib.response``.
 
 **geturl** - this returns the real URL of the page fetched. This is useful
 because ``urlopen`` (or the opener object used) may have followed a
@@ -397,7 +400,7 @@ Openers and Handlers
 ====================
 
 When you fetch a URL you use an opener (an instance of the perhaps
-confusingly-named :class:`urllib2.OpenerDirector`). Normally we have been using
+confusingly-named :class:`urllib.request.OpenerDirector`). Normally we have been using
 the default opener - via ``urlopen`` - but you can create custom
 openers. Openers use handlers. All the "heavy lifting" is done by the
 handlers. Each handler knows how to open URLs for a particular URL scheme (http,
@@ -466,24 +469,24 @@ The top-level URL is the first URL that requires authentication. URLs "deeper"
 than the URL you pass to .add_password() will also match. ::
 
     # create a password manager
-    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
+    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
 
     # Add the username and password.
     # If we knew the realm, we could use it instead of ``None``.
     top_level_url = "http://example.com/foo/"
     password_mgr.add_password(None, top_level_url, username, password)
 
-    handler = urllib2.HTTPBasicAuthHandler(password_mgr)
+    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
 
     # create "opener" (OpenerDirector instance)
-    opener = urllib2.build_opener(handler)
+    opener = urllib.request.build_opener(handler)
 
     # use the opener to fetch a URL
     opener.open(a_url)
 
     # Install the opener.
-    # Now all calls to urllib2.urlopen use our opener.
-    urllib2.install_opener(opener)
+    # Now all calls to urllib.request.urlopen use our opener.
+    urllib.request.install_opener(opener)
 
 .. note::
 
@@ -505,46 +508,46 @@ not correct.
 Proxies
 =======
 
-**urllib2** will auto-detect your proxy settings and use those. This is through
+**urllib.request** will auto-detect your proxy settings and use those. This is through
 the ``ProxyHandler`` which is part of the normal handler chain. Normally that's
 a good thing, but there are occasions when it may not be helpful [#]_. One way
 to do this is to setup our own ``ProxyHandler``, with no proxies defined. This
 is done using similar steps to setting up a `Basic Authentication`_ handler : ::
 
-    >>> proxy_support = urllib2.ProxyHandler({})
-    >>> opener = urllib2.build_opener(proxy_support)
-    >>> urllib2.install_opener(opener)
+    >>> proxy_support = urllib.request.ProxyHandler({})
+    >>> opener = urllib.request.build_opener(proxy_support)
+    >>> urllib.request.install_opener(opener)
 
 .. note::
 
-    Currently ``urllib2`` *does not* support fetching of ``https`` locations
-    through a proxy. However, this can be enabled by extending urllib2 as
+    Currently ``urllib.request`` *does not* support fetching of ``https`` locations
+    through a proxy. However, this can be enabled by extending urllib.request as
     shown in the recipe [#]_.
 
 
 Sockets and Layers
 ==================
 
-The Python support for fetching resources from the web is layered. urllib2 uses
-the http.client library, which in turn uses the socket library.
+The Python support for fetching resources from the web is layered.
+urllib.request uses the http.client library, which in turn uses the socket library.
 
 As of Python 2.3 you can specify how long a socket should wait for a response
 before timing out. This can be useful in applications which have to fetch web
 pages. By default the socket module has *no timeout* and can hang. Currently,
-the socket timeout is not exposed at the http.client or urllib2 levels.
+the socket timeout is not exposed at the http.client or urllib.request levels.
 However, you can set the default timeout globally for all sockets using ::
 
     import socket
-    import urllib2
+    import urllib.request
 
     # timeout in seconds
     timeout = 10
     socket.setdefaulttimeout(timeout)
 
-    # this call to urllib2.urlopen now uses the default timeout
+    # this call to urllib.request.urlopen now uses the default timeout
     # we have set in the socket module
-    req = urllib2.Request('http://www.voidspace.org.uk')
-    response = urllib2.urlopen(req)
+    req = urllib.request.Request('http://www.voidspace.org.uk')
+    response = urllib.request.urlopen(req)
 
 
 -------
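The global-timeout recipe above still works; note that since Python 2.6/3.0, ``urlopen`` also accepts a per-call ``timeout`` argument, which is usually preferable to the process-wide default. A sketch that sets and then restores the default without touching the network:

```python
import socket

# Set the process-wide default timeout (in seconds), as in the example.
socket.setdefaulttimeout(10)
applied = socket.getdefaulttimeout()
print(applied)  # 10.0

# Restore the library default (block forever) so other code in the
# process is not affected by this demonstration.
socket.setdefaulttimeout(None)
```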

Doc/library/contextlib.rst

Lines changed: 2 additions & 2 deletions
@@ -98,9 +98,9 @@ Functions provided:
 And lets you write code like this::
 
     from contextlib import closing
-    import urllib
+    import urllib.request
 
-    with closing(urllib.urlopen('http://www.python.org')) as page:
+    with closing(urllib.request.urlopen('http://www.python.org')) as page:
         for line in page:
             print(line)
 
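``closing`` works with any object exposing a ``close()`` method, so the pattern can be demonstrated offline with an ``io.StringIO`` standing in for the ``urlopen`` response:

```python
from contextlib import closing
import io

# A StringIO stands in for the urlopen() response; closing() guarantees
# close() is called when the with-block exits.
with closing(io.StringIO('line one\nline two\n')) as page:
    lines = [line for line in page]

print(lines)        # ['line one\n', 'line two\n']
print(page.closed)  # True
```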

Doc/library/fileformats.rst

Lines changed: 0 additions & 1 deletion
@@ -13,7 +13,6 @@ that aren't markup languages or are related to e-mail.
 
    csv.rst
    configparser.rst
-   robotparser.rst
    netrc.rst
    xdrlib.rst
    plistlib.rst
plistlib.rst

Doc/library/ftplib.rst

Lines changed: 3 additions & 3 deletions
@@ -13,9 +13,9 @@
 This module defines the class :class:`FTP` and a few related items. The
 :class:`FTP` class implements the client side of the FTP protocol. You can use
 this to write Python programs that perform a variety of automated FTP jobs, such
-as mirroring other ftp servers. It is also used by the module :mod:`urllib` to
-handle URLs that use FTP. For more information on FTP (File Transfer Protocol),
-see Internet :rfc:`959`.
+as mirroring other ftp servers. It is also used by the module
+:mod:`urllib.request` to handle URLs that use FTP. For more information on FTP
+(File Transfer Protocol), see Internet :rfc:`959`.
 
 Here's a sample session using the :mod:`ftplib` module::
 
