urlopen - handling password protected sites (HTTP Error 401)

Here is an explanation of how to handle password protected sites. Similar approaches can also be used to handle or detect other types of exceptions, such as broken links.

***************************************************************
Password protected sites (handling HTTP Error 401 -- authentication required)

Two functions Python offers for opening a web page, urlopen() and urlretrieve(), are very convenient and hide the details of the HTTP protocol from the programmer. However, they have one feature that is unacceptable for a crawler: when we try to open a password protected site, they prompt the user for a username and password. To override this rather strange way of handling authentication errors, we need to know how urlopen() and urlretrieve() are organized internally.

If we look at the source code of the urllib module, we will see that both urlopen() and urlretrieve() use a helper class, FancyURLopener. Basically, all urlopen() and urlretrieve() do is call the open() and retrieve() methods of this helper class. Nothing prevents us from using URL openers of our own, as sketched below.
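
A minimal sketch of this idea, written in the Python 2 style that matches the note's urllib module (in Python 3 the class lives in urllib.request and is deprecated). We subclass FancyURLopener and override its prompt_user_passwd() method, the documented hook that FancyURLopener calls when a server answers with HTTP Error 401 and basic authentication is needed, so that it raises an exception instead of prompting on the terminal. The class name CrawlerOpener and the example URL are made-up placeholders.

import urllib

class CrawlerOpener(urllib.FancyURLopener):
    """A URL opener that never prompts for credentials.

    FancyURLopener calls prompt_user_passwd() when a server replies
    '401 -- authentication required'; raising here lets a crawler
    detect and skip password protected pages instead of blocking
    on keyboard input.
    """
    def prompt_user_passwd(self, host, realm):
        raise IOError("authentication required", host, realm)

opener = CrawlerOpener()
try:
    # open() is what urllib.urlopen() calls internally.
    page = opener.open("http://www.example.com/protected/index.html")
    data = page.read()
    page.close()
except IOError, err:
    # Raised by our override for password protected pages, and by
    # urllib itself for network-level failures such as unreachable hosts.
    print "Skipping page:", err

Since urlretrieve() goes through the same helper class, the same subclass also covers opener.retrieve(), and raising IOError keeps the failure in the family of errors a crawler already has to catch, so no separate handling is needed.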