The program won't do anything until I open the URLs in a browser!

Author: admin

Date: 22/11/29 18:01:03


import urllib
from urllib2 import urlopen
from gzip import GzipFile
from cStringIO import StringIO
import re
import urllib2
 
def download(url):
    s = urlopen(url).read()
    if s[:2] == '\x1f\x8b': # assume it's gzipped data
        with GzipFile(mode='rb', fileobj=StringIO(s)) as ifh:
            s = ifh.read()
    return s
 
s = download('http://www.locationary.com/place/en/US/Virginia/Richmond-page20/?ACTION_TOKEN=NumericAction')
 
findLoc = re.compile('http://www\.locationary\.com/place/en/US/Virginia/Richmond/.{1,100}\.jsp')
 
findLocL = re.findall(findLoc,s)
 
for i in range(0,25):
 
    def download(url):
        s = urlopen(url).read()
        if s[:2] == '\x1f\x8b': # assume it's gzipped data
            with GzipFile(mode='rb', fileobj=StringIO(s)) as ifh:
                s = ifh.read()
        return s
 
    b = download(findLocL[i])
 
    findYP = re.compile('http://www\.yellowpages\.com/.{1,100}\d{1,100}')
 
    findYPL = re.findall(findYP,b)
 
    for c in range(1):
 
        print findYPL[c]
 
 

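This is the error it gives me:

Traceback (most recent call last):
  File "C:\Users\Robert\Documents\j-a-c-o-b\locationary.py", line 65, in <module>
    print findYPL[c]
IndexError: list index out of range

However, when I open Google Chrome, load all of the links (called "findLocL[i]" or "b" in the program), and then run the program, it works...
Why does this happen?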

# Answer 1

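Hi,
By the way, you don't need to define download twice.
The IndexError occurs because findYPL is empty whenever the regex finds no match in a page, so findYPL[0] does not exist. You could change the line "for c in range(1):" to: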


for c in range(len(findYPL)):

Or perhaps this, if you only want the first match:


for c in range(min([len(findYPL),1])):

Perhaps better still:


for c in findYPL:
    print c
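
To make the difference between these variants concrete, here is a small sketch. It is written for Python 3 (the code in the post is Python 2), and it uses a made-up empty list in place of findYPL to imitate a page where the regex matched nothing. The helper first_or_none is a hypothetical name, not something from the original program.

```python
# Hypothetical stand-in for findYPL when the regex matched nothing.
findYPL = []


def first_or_none(items):
    """Return the first item, or None when the list is empty."""
    return items[0] if items else None


# range(1) always yields index 0, so indexing an empty list fails.
try:
    for c in range(1):
        print(findYPL[c])
except IndexError as exc:
    print("range(1) failed:", exc)

# range(min(len(findYPL), 1)) yields no indices for an empty list,
# so the loop body is simply skipped.
for c in range(min(len(findYPL), 1)):
    print(findYPL[c])

# Iterating over the list directly is also a no-op when it is empty.
for link in findYPL:
    print(link)

# A helper makes the "first match or nothing" intent explicit.
print(first_or_none(findYPL))
```

Only the first loop raises; the other two do nothing when there are no matches, which hides the error but also hides the fact that the page contained no link.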
 

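That may help. Of course, I don't know exactly what you are trying to achieve, so if you give us some background we may be able to help further. That's how forums generally work. We aren't mind readers.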
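
As a rough sketch of the "define it once" point, the gzip handling and the link search can each live in a single function that the loop calls. This version is written for Python 3 rather than the Python 2 of the post, so it works on bytes and uses io.BytesIO instead of cStringIO. The names maybe_gunzip and yellowpages_links and the sample page are assumptions made for illustration. Nothing is downloaded here; in Python 3 the bytes would come from urllib.request.urlopen(url).read().

```python
import gzip
import io
import re

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

# Same pattern as in the post, compiled once as a bytes pattern.
YP_PATTERN = re.compile(rb"http://www\.yellowpages\.com/.{1,100}\d{1,100}")


def maybe_gunzip(data):
    """Decompress data if it starts with the gzip magic bytes."""
    if data[:2] == GZIP_MAGIC:
        with gzip.GzipFile(mode="rb", fileobj=io.BytesIO(data)) as fh:
            return fh.read()
    return data


def yellowpages_links(page):
    """Return every yellowpages.com link in a (possibly gzipped) page."""
    return YP_PATTERN.findall(maybe_gunzip(page))


# Hypothetical page content, compressed to imitate a gzipped response.
sample = gzip.compress(
    b'<a href="http://www.yellowpages.com/richmond-va/mip/example-123">x</a>'
)

links = yellowpages_links(sample)
if links:
    print(links[0])
else:
    # An empty result means the page had no such link; printing part of
    # the page at this point can show what the server actually returned.
    print("no yellowpages link found")
```

Checking for an empty result before indexing makes it visible when a page comes back without the expected link, which is the situation that raised the IndexError.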
