Using an HTTP proxy with urllib2

Updated: 2016-04-15 Views: 10291 Category: Python

Questions to verify

  • How to send requests through a proxy
  • Whether setting a proxy affects urllib2 globally

Test programs

api.py

# -*- coding: utf-8 -*-

import urllib2


def get_rsp(url):
    response = urllib2.urlopen(url)
    status_code = response.code
    content = response.read(300)
    print "status code: %s" % status_code
    print "content: %s" % content

proxy.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-


import urllib2
import api


PROXY = "222.176.112.31:80"
TEST_URL = "http://www.sunzhongwei.com"
TIMEOUT = 5


def access_with_proxy():
    proxy_handler = urllib2.ProxyHandler({"http": PROXY})
    opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
    opener.addheaders = [
        ("User-Agent", "proxy tester"),
    ]
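    # install_opener() makes this opener the default for every later
    # urllib2.urlopen() call in the process, including calls in other modules.
    urllib2.install_opener(opener)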
    response = urllib2.urlopen(TEST_URL, timeout=TIMEOUT)
    status_code = response.code
    content = response.read(300)
    print "status code: %s" % status_code
    print "content: %s" % content


def access():
    response = urllib2.urlopen(TEST_URL, timeout=TIMEOUT)
    status_code = response.code
    content = response.read(300)
    print "status code: %s" % status_code
    print "content: %s" % content


def call_api():
    api.get_rsp(TEST_URL)


if '__main__' == __name__:
    print "--------1---------"
    access()
    print "--------2---------"
    access_with_proxy()
    print "--------3---------"
    call_api()

Nginx log on the remote server

xxx.xxx.xxx.xx - - [15/Apr/2016:11:02:18 +0800] "GET / HTTP/1.1" 200 63499 "-" "Python-urllib/2.7"
222.176.112.17 - - [15/Apr/2016:11:02:19 +0800] "GET / HTTP/1.1" 200 88679 "-" "proxy tester"
222.176.112.17 - - [15/Apr/2016:11:02:19 +0800] "GET / HTTP/1.1" 200 88679 "-" "proxy tester"
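
To compare the client IP and User-Agent of each request without reading the log by eye, the lines can be parsed with a small helper. This is only a sketch: it assumes the log uses the layout shown above (Nginx's default "combined" format), and `LOG_PATTERN` and `parse_log_line` are names made up for this example.

```python
import re

# Assumed layout: ip - user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None
```

Calling `parse_log_line()` on the second log line above gives a dict whose `ip` is `222.176.112.17` and whose `agent` is `proxy tester`.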

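From the Nginx log we can see:

  • The proxy setting does affect urllib2 globally: after urllib2.install_opener(), request 3 (made inside api.py, which never configures a proxy) also went through the proxy and carried the "proxy tester" User-Agent.
  • The source IP the server sees is not necessarily the configured proxy IP: the proxy was set to 222.176.112.31, but the log shows 222.176.112.17.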
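
If the global side effect is not wanted, one option is to skip install_opener() and call the opener's own open() method, so that only requests made through that opener use the proxy. The sketch below assumes the same proxy address as proxy.py; `build_proxy_opener` and `fetch_via_proxy` are hypothetical helper names, and the import fallback is only there so the same code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2 as urlreq          # Python 2, as used in this post
except ImportError:
    import urllib.request as urlreq   # Python 3 equivalent module

PROXY = "222.176.112.31:80"  # proxy address taken from proxy.py


def build_proxy_opener(proxy, user_agent="proxy tester"):
    """Return an opener that sends http:// requests through `proxy`.

    The opener is not installed globally, so plain urlopen() calls
    elsewhere in the process keep their default behaviour.
    """
    opener = urlreq.build_opener(urlreq.ProxyHandler({"http": proxy}))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_via_proxy(url, proxy=PROXY, timeout=5):
    """Fetch `url` through the proxy; return (status code, first 300 bytes)."""
    response = build_proxy_opener(proxy).open(url, timeout=timeout)
    try:
        return response.getcode(), response.read(300)
    finally:
        response.close()
```

With this approach, `fetch_via_proxy("http://www.sunzhongwei.com")` would go through the proxy, while api.get_rsp() would still connect directly.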

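Also worth noting:

  • A proxy may cache responses for the same URL.
  • Each proxy probably limits request frequency (for example, back-to-back requests failed with socket.error: [Errno 104] Connection reset by peer).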
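
One way to cope with occasional connection resets is to wait and retry with a growing delay. The following is a minimal sketch, not something the tests above used: `call_with_retry` is a hypothetical helper, and the retry count and delays are arbitrary assumptions that would need tuning for a real proxy.

```python
import socket
import time


def call_with_retry(func, retries=3, delay=1.0, sleep=time.sleep):
    """Call func(); on a socket or I/O error, wait and try again.

    The wait doubles after each failed attempt (1s, 2s, 4s, ...).
    Once `retries` attempts have failed, the last error is re-raised.
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except (socket.error, IOError):
            if attempt == retries:
                raise
            sleep(delay)
            delay *= 2
```

For example, `call_with_retry(lambda: urllib2.urlopen(TEST_URL, timeout=TIMEOUT))` would retry the proxied request up to three times before giving up.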

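About the author 🌱

I'm a developer from Yantai, Shandong. If you'd like to discuss a topic or have a software development need, feel free to add me on WeChat: zhongwei.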