Handling request timeouts in requests, with a summary of its exceptions

2019-07-18 61 likes 老董笔记

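  Sometimes a web page loads very slowly and the browser spins for a long time before anything shows up. The same thing happens when you request a page from code. For this reason the requests module provides a timeout parameter that sets how long to wait, in seconds; once that time is exceeded, a requests.exceptions.Timeout exception is raised.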

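  PS: as a rule, all crawler code should pass the timeout parameter. Without it, a single request can wait for the server's response indefinitely and block the whole program, which then looks as if it has hung.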

# -*- coding: utf-8 -*-
import requests

url = 'https://github.com'
try:
  # An absurdly small timeout, chosen only to force the exception
  requests.get(url, timeout=0.0001)
except requests.exceptions.Timeout as e:
  print(e)
# Output:
HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to github.com timed out. (connect timeout=0.0001)'))
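
  Since every request should carry a timeout, one way to avoid forgetting it is to set a default once on a Session. The following is a minimal sketch, not something from the original article: the class name TimeoutSession and the 5 second default are assumptions made for illustration.

```python
import requests


class TimeoutSession(requests.Session):
    """Session that applies a default timeout unless the caller passes one."""

    def __init__(self, timeout=5):
        super().__init__()
        # Seconds; a hypothetical default, tune it for the sites you crawl.
        self.timeout = timeout

    def request(self, method, url, **kwargs):
        # Fill in the timeout only when the caller did not set it explicitly.
        kwargs.setdefault("timeout", self.timeout)
        return super().request(method, url, **kwargs)
```

  With this in place, TimeoutSession().get(url) waits at most the default, while TimeoutSession().get(url, timeout=1) still overrides it for a single call.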

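  timeout is not a limit on the total time taken to download the response. A single value applies both to establishing the connection and to each wait for data from the server: an exception is raised if the server has not responded within timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests does not time out.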
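
  requests also accepts the timeout as a (connect, read) tuple, which makes the two waits explicit. The sketch below assumes a helper named fetch and the values 3.05 and 10 seconds, none of which come from the original article; it only shows how the two limits map onto the ConnectTimeout and ReadTimeout exceptions.

```python
import requests


def fetch(url, connect_timeout=3.05, read_timeout=10):
    """Return the body text, or None if either timeout is exceeded."""
    try:
        # First value: time allowed to establish the connection.
        # Second value: longest silence allowed between bytes from the server.
        r = requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        print("could not connect within", connect_timeout, "seconds")
    except requests.exceptions.ReadTimeout:
        print("server sent nothing for more than", read_timeout, "seconds")
    else:
        return r.text
    return None
```

  Because the read timeout restarts whenever data arrives, a large file that keeps trickling in can still take much longer than read_timeout in total.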

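  All sorts of things can go wrong when accessing the web: the site's server is down, the domain has expired and no longer resolves, a sensitive foreign site cannot be reached from inside China, the network connection drops, and so on. A timeout is therefore not the only exception you can hit when requesting a page. The others are introduced briefly below.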

  Overview of requests exceptions

  ConnectionError: raised by Requests when a network problem occurs (for example a DNS lookup failure or a refused connection).

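# -*- coding: utf-8 -*-
import requests

try:
  requests.get('https://google.com', timeout=5)
except requests.exceptions.ConnectionError as e:
  print(e)
# Output (Chinese-locale Windows; the message reads "An existing connection was forcibly closed by the remote host"):
('Connection aborted.', ConnectionResetError(10054, '远程主机强迫关闭了一个现有的连接。', None, 10054, None))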

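  Timeout: raised when a request times out. The ConnectTimeoutError that appears in the printed message is the underlying urllib3 error; what requests raises is requests.exceptions.Timeout, or more specifically one of its subclasses, ConnectTimeout or ReadTimeout.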

# -*- coding: utf-8 -*-
import requests

try:
  # Tiny timeout, chosen only to force the exception
  requests.get('https://baidu.com', timeout=0.00001)
except requests.exceptions.Timeout as e:
  print(e)
# Output:
HTTPSConnectionPool(host='baidu.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to baidu.com timed out. (connect timeout=1e-05)'))

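  TooManyRedirects: raised when a request exceeds the configured maximum number of redirects. (In my tests, many of the ways I tried to set the maximum number of redirects did not take effect.)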
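
  One route that does appear in the requests API is the max_redirects attribute of a Session (the library default is 30); requests.get() itself takes no max_redirects argument, which may explain some of the attempts that had no effect. The sketch below is an illustration under that assumption, and the helper name fetch_with_redirect_limit is hypothetical.

```python
import requests


def fetch_with_redirect_limit(url, max_redirects=3, timeout=5):
    """Return the final response, or None if the redirect limit is hit."""
    with requests.Session() as session:
        # The limit lives on the Session, not on the individual request.
        session.max_redirects = max_redirects
        try:
            return session.get(url, timeout=timeout)
        except requests.exceptions.TooManyRedirects as e:
            print("gave up after", max_redirects, "redirects:", e)
            return None
```

  Passing allow_redirects=False to a single request is a different tool: it stops requests from following redirects at all rather than capping how many it follows.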

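# -*- coding: utf-8 -*-
import requests

# my_header was not defined in the original; this value is only a placeholder
my_header = {'User-Agent': 'Mozilla/5.0'}

# Fetch a page, retrying up to `retry` more times on any requests exception
def get_html(url, retry=1):
    try:
        r = requests.get(url=url, headers=my_header, timeout=5)
    except requests.exceptions.RequestException as e:
        print(e)
        if retry > 0:
            # Return the retry's result; without `return` it would be discarded
            return get_html(url, retry - 1)
        return None
    else:
        html = r.text
        return html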
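
  As an alternative to a hand-written retry function, retries can be delegated to the transport layer by mounting an HTTPAdapter configured with urllib3's Retry class. This is a different technique from the recursive helper above, sketched under assumptions: the function name make_retrying_session, the retry count, the backoff factor and the list of status codes are all illustrative choices, not values from the original article.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Build a Session whose adapters retry failed requests themselves."""
    retry = Retry(
        total=total,                            # overall cap on retries
        backoff_factor=backoff_factor,          # waits grow between attempts
        status_forcelist=(500, 502, 503, 504),  # also retry these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to both schemes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

  A timeout still has to be passed on each call, for example make_retrying_session().get(url, timeout=5), because the retry policy does not set one.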

  All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException, so you can catch them all through this base class.

# -*- coding: utf-8 -*-
import requests

try:
  # Tiny timeout, chosen only to force the exception
  requests.get('https://360.com', timeout=0.00001)
except requests.exceptions.RequestException as e:
  print(e)
# Output:
HTTPSConnectionPool(host='360.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to 360.com timed out. (connect timeout=1e-05)'))

  You can also catch it with Python's built-in Exception class, although that will swallow errors unrelated to requests as well.

# -*- coding: utf-8 -*-
import requests

try:
  # Tiny timeout, chosen only to force the exception
  requests.get("https://360.com", timeout=0.00001)
except Exception as e:
  print(e)
# Output:
HTTPSConnectionPool(host='360.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to 360.com timed out. (connect timeout=1e-05)'))
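
  When different failures need different handling, the exception classes discussed in this article can be combined in one try statement, ordered from most specific to most general. The following is a sketch under assumptions: the function name describe_failure and the returned labels are made up for illustration. Timeout is listed before ConnectionError because ConnectTimeout inherits from both.

```python
import requests


def describe_failure(url, timeout=5):
    """Return a short label describing how the request ended."""
    try:
        requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return "timeout"
    except requests.exceptions.TooManyRedirects:
        return "too many redirects"
    except requests.exceptions.ConnectionError:
        # DNS failure, refused or reset connection, and similar.
        return "connection error"
    except requests.exceptions.RequestException:
        # Anything else that requests itself raises.
        return "other requests error"
    return "ok"
```

  If the RequestException clause came first, none of the more specific clauses would ever run, since Python uses the first matching except clause.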
