My program won't do anything until I've opened the URLs in a browser!

    from urllib2 import urlopen
    from gzip import GzipFile
    from cStringIO import StringIO
    import re

    def download(url):
        # Fetch the page; decompress it if it starts with the gzip magic bytes.
        s = urlopen(url).read()
        if s[:2] == '\x1f\x8b':  # assume it's gzipped data
            with GzipFile(mode='rb', fileobj=StringIO(s)) as ifh:
                s = ifh.read()
        return s

    # Collect the place links from the Richmond listing page.
    s = download('http://www.locationary.com/place/en/US/Virginia/Richmond-page20/?ACTION_TOKEN=NumericAction')
    findLoc = re.compile(r'http://www\.locationary\.com/place/en/US/Virginia/Richmond/.{1,100}\.jsp')
    findLocL = re.findall(findLoc, s)

    for i in range(0, 25):
        # Download each place page and look for its yellowpages.com link.
        b = download(findLocL[i])
        findYP = re.compile(r'http://www\.yellowpages\.com/.{1,100}\d{1,100}')
        findYPL = re.findall(findYP, b)
        for c in range(1):
            print findYPL[c]

This is the error it gives me:

    Traceback (most recent call last):
      File "C:\Users\Robert\Documents\j-a-c-o-b\locationary.py", line 65, in <module>
        print findYPL[c]
    IndexError: list index out of range

However, when I open Google Chrome, open all of the links (called "findLocL[i]" or "b" in the program), and then run the program, it works...
Why does this happen?
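
The traceback is consistent with `re.findall` returning an empty list for at least one page, since indexing an empty list raises `IndexError`. That would be the case whenever the page served to the script contains no yellowpages.com link. Here is a minimal sketch of checking for a match before indexing. The helper `first_match` and the sample HTML strings are hypothetical and are not part of the original script.

```python
import re

# Same pattern as in the script, written as a raw string.
FIND_YP = re.compile(r'http://www\.yellowpages\.com/.{1,100}\d{1,100}')


def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    matches = pattern.findall(text)
    return matches[0] if matches else None


# Made-up sample pages (assumptions, not real responses from the site).
page_with_link = '<a href="http://www.yellowpages.com/richmond-va/mip/example-123">YP</a>'
page_without_link = '<p>No listing data yet</p>'

found = first_match(FIND_YP, page_with_link)
missing = first_match(FIND_YP, page_without_link)  # None instead of an IndexError
```

In the loop, printing `findLocL[i]` whenever the helper returns `None` would show which pages lack the link when fetched by the script, which may help narrow down why opening them in a browser first changes the result.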