Using urllib2 in Python to get content from web pages

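First, import urllib2 by adding the line ‘import urllib2’ at the beginning of your Python file. Note that urllib2 is a Python 2 module; in Python 3 its functionality was split into urllib.request and urllib.error.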
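The rest of these notes use Python 2. If the same script also has to run under Python 3, one common pattern (a sketch, not part of the original notes) is to try the Python 2 import first and fall back to the Python 3 modules:

```python
try:
    # Python 2: urlopen and URLError live in urllib2.
    from urllib2 import urlopen, URLError
except ImportError:
    # Python 3: urllib2 was split into urllib.request and urllib.error.
    from urllib.request import urlopen
    from urllib.error import URLError
```

With this import in place, call urlopen(...) directly instead of urllib2.urlopen(...).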

Opening a URL

import urllib2

myurl = urllib2.urlopen('http://aamnah.com')

urlopen() returns a file-like response object for the page http://aamnah.com, and that object is now stored in the variable ‘myurl’. The page content itself has not been read yet.

At this point, if you print myurl, it outputs something like <addinfourl at 4420156736 whose fp = <socket._fileobject object at 0x10759c7d0>> instead of the page content. This is Python's description of the response object stored in your variable; the addresses will differ on every run.
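To see what that object is without reading the page body, here is a minimal Python 3 sketch (assuming Python 3, where urlopen lives in urllib.request; the function name describe_response is hypothetical). It opens the URL in a with-block with a timeout, so the connection is closed afterwards, and reports the response class, the final URL after any redirects, and the Content-Type header:

```python
# Assumes Python 3: urllib2.urlopen became urllib.request.urlopen,
# and urllib2.URLError became urllib.error.URLError.
from urllib.error import URLError
from urllib.request import urlopen


def describe_response(url, timeout=10):
    """Open url and describe the object urlopen() returns (hypothetical helper).

    Returns (class name, final URL, Content-Type header), or None if the
    URL cannot be opened. The page body itself is never read here.
    """
    try:
        # The with-block closes the response even if an error occurs.
        with urlopen(url, timeout=timeout) as response:
            return (
                type(response).__name__,
                response.geturl(),  # final URL after any redirects
                response.headers.get("Content-Type"),
            )
    except URLError as exc:
        # Raised for unreachable hosts and unknown URL schemes;
        # HTTPError (e.g. a 404) is a subclass of URLError.
        print("Could not open", url, "-", exc.reason)
        return None
```

Calling describe_response('http://aamnah.com') would return a tuple whose first item is 'HTTPResponse', since that is the class Python 3 uses for http:// responses.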

# Read the whole response; readlines() returns a list of lines
contents = myurl.readlines()
print contents

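This actually ‘reads’ the contents of the web page and prints them. Because readlines() returns a list with one string per line, print shows a Python list; call myurl.read() instead if you want the whole page as a single string. A response can only be read once, so a second read() or readlines() call on the same object returns an empty result.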
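In Python 3, read() and readlines() return bytes rather than str, so the content usually needs decoding before you work with it as text. A minimal sketch, assuming Python 3 and using the charset from the Content-Type header with a UTF-8 fallback (the helper names read_page and read_lines are hypothetical):

```python
from urllib.request import urlopen


def read_page(url, timeout=10):
    """Return the whole page body as one decoded string (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3
        # Prefer the charset the server declares; fall back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


def read_lines(url, timeout=10):
    """Return the page as a list of decoded lines, like readlines() (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        # Strip the trailing newline from each line after decoding.
        return [line.decode(charset, errors="replace").rstrip("\r\n")
                for line in response.readlines()]
```

errors="replace" swaps undecodable bytes for a placeholder character instead of raising, which is a pragmatic choice for scraping pages whose declared charset is not always accurate.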

How to Parse HTML

  • how to turn a web page into plain text
  • how to extract only the links from the page
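Both tasks can be sketched with the standard library's HTML parser, assuming Python 3 (in Python 2 the same class is HTMLParser.HTMLParser). The class and function names below are hypothetical; the text extractor skips <script> and <style> contents, and the link extractor collects each <a href> and resolves it against an optional base URL:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> (hypothetical)."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url (hypothetical)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def html_to_text(html):
    """Return the page's text; chunks are joined by single spaces, so spacing is approximate."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def extract_links(html, base_url=""):
    """Return the list of link targets found in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

To use these on a live page, pass them the page source as a string, for example the decoded result of myurl.read(), together with the page's URL as base_url so relative links like /notes become absolute.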