Using an HTTP proxy with urllib2


    Questions to verify

    • How to send requests through a proxy
    • Whether setting a proxy affects urllib2 globally

    Test program

    api.py

    # -*- coding: utf-8 -*-
    
    import urllib2
    
    
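    def get_rsp(url):
        """Fetch url with the module-level urllib2.urlopen() and print a summary."""
        response = urllib2.urlopen(url)
        status_code = response.code
        content = response.read(300)  # first 300 bytes only
        print "status code: %s" % status_code
        print "content: %s" % content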
    

    proxy.py

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    
    import urllib2
    import api
    
    
    PROXY = "222.176.112.31:80"
    TEST_URL = "http://www.sunzhongwei.com"
    TIMEOUT = 5
    
    
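    def access_with_proxy():
        """Install a proxy-aware opener, then fetch TEST_URL through it."""
        proxy_handler = urllib2.ProxyHandler({"http": PROXY})
        opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
        # Custom User-Agent so proxied requests stand out in the server log.
        opener.addheaders = [
            ("User-Agent", "proxy tester"),
        ]
        # install_opener() replaces the module-wide default opener used by
        # every later urllib2.urlopen() call in this process.
        urllib2.install_opener(opener)
        response = urllib2.urlopen(TEST_URL, timeout=TIMEOUT)
        status_code = response.code
        content = response.read(300)
        print "status code: %s" % status_code
        print "content: %s" % content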
    
    
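    def access():
        """Fetch TEST_URL with whatever opener urllib2 currently uses."""
        response = urllib2.urlopen(TEST_URL, timeout=TIMEOUT)
        status_code = response.code
        content = response.read(300)
        print "status code: %s" % status_code
        print "content: %s" % content
    
    
    def call_api():
        """Fetch TEST_URL via another module that never configures a proxy itself."""
        api.get_rsp(TEST_URL)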
    
    
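    if __name__ == '__main__':
        print "--------1---------"
        access()             # before install_opener(): direct request
        print "--------2---------"
        access_with_proxy()  # installs the proxy opener, then requests
        print "--------3---------"
        call_api()           # another module, after install_opener()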
    

    Nginx log on the remote server

    xxx.xxx.xxx.xx - - [15/Apr/2016:11:02:18 +0800] "GET / HTTP/1.1" 200 63499 "-" "Python-urllib/2.7"
    222.176.112.17 - - [15/Apr/2016:11:02:19 +0800] "GET / HTTP/1.1" 200 88679 "-" "proxy tester"
    222.176.112.17 - - [15/Apr/2016:11:02:19 +0800] "GET / HTTP/1.1" 200 88679 "-" "proxy tester"
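
    To compare the source IP and User-Agent of each request without reading the log by eye, the lines can be parsed with a regular expression. This is a minimal sketch that assumes the default Nginx "combined" log format shown above; the names LOG_PATTERN, parse_log_line and summarize are hypothetical helpers, not part of the original test program.

```python
import re

# Assumed layout: the default Nginx "combined" access-log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Return (source IP, User-Agent) pairs for the lines that parse."""
    parsed = (parse_log_line(line) for line in lines)
    return [(entry["ip"], entry["agent"]) for entry in parsed if entry]
```

    Feeding the three log lines to summarize() gives one (IP, User-Agent) pair per request, which makes it easy to check which requests carried the proxy's User-Agent.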
    

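    The log shows that:

    • urllib2.install_opener() does affect urllib2 globally: the third request, made by api.get_rsp() in another module, also carried the "proxy tester" User-Agent and came from the proxy side
    • The source IP seen by the server is not necessarily the configured proxy IP: the proxy was set to 222.176.112.31, but the log records 222.176.112.17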
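
    If the global effect is not wanted, one option is to skip install_opener() and call the opener's own open() method. The following is a sketch under that assumption; build_proxy_opener and fetch_via_proxy are hypothetical names, and the import fallback is only there so the same code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2, as in the article
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module


def build_proxy_opener(proxy, user_agent="proxy tester"):
    """Build an opener that routes HTTP requests through `proxy`.

    Nothing is installed globally, so urllib2.urlopen() keeps its default behaviour.
    """
    proxy_handler = urllib2.ProxyHandler({"http": proxy})
    opener = urllib2.build_opener(proxy_handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_via_proxy(url, proxy, timeout=5):
    """Fetch `url` through `proxy` and return (status code, first 300 bytes)."""
    opener = build_proxy_opener(proxy)
    response = opener.open(url, timeout=timeout)
    try:
        return response.getcode(), response.read(300)
    finally:
        response.close()
```

    With this approach only code that is handed the opener uses the proxy; other modules such as api.py keep calling urllib2.urlopen() directly.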

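    Also note that:

    • The proxy may cache responses for the same URL
    • Each proxy probably limits the request rate (for example, requests sent in quick succession fail with socket.error: [Errno 104] Connection reset by peer)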
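
    Both points can be handled defensively on the client side. The sketch below is an assumption about how one might do that, not something the test program above does: fetch_with_retry pauses and retries after a socket-level error, and no_cache_request adds standard HTTP headers that ask a compliant cache to revalidate. Both function names and the retry parameters are hypothetical.

```python
import socket
import time

try:
    import urllib2  # Python 2, as in the article
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module


def fetch_with_retry(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on socket-level errors with a growing pause.

    `fetch` is any callable taking a URL or Request, e.g. opener.open.
    The last error is re-raised once all attempts have failed.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except socket.error:  # an alias of OSError on Python 3
            if attempt == retries - 1:
                raise
            sleep(delay * (attempt + 1))  # wait longer after each failure


def no_cache_request(url):
    """Build a request asking intermediate caches to revalidate the response."""
    headers = {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    return urllib2.Request(url, headers=headers)
```

    A call such as fetch_with_retry(opener.open, no_cache_request(url)) combines the two; whether a given proxy honours the headers or tolerates the chosen delay depends on that proxy.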

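    About the author 🌱

    I am a developer from Yantai, Shandong. If you have a topic you would like to discuss or a software development need, feel free to add me on WeChat (zhongwei), or follow my personal WeChat official account "大象工具".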