
Passing URL parameters in requests GET requests, and invalid (None) parameters

2019-07-18 · 老董笔记

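  Many websites use URLs that carry parameters (http://www.xxx.com/get?key1=val1&key2=val2). For example, if you search Baidu for www.python66.com, the URL of the results page is a long string, yet a subset of its parameters is enough to reach the same page, e.g. https://www.baidu.com/s?tn=50000021_hao_pg&word=python66.com. How do you request a URL with parameters like this using requests?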
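If you start from a URL copied out of the browser, the standard library can split it into a base URL and a dictionary of parameters, which is the form the `params` argument of requests expects. This is a minimal sketch that assumes each key appears only once in the query string (a repeated key would overwrite the earlier one in the dictionary); the helper name `split_query` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl


def split_query(url):
    """Split a URL into (base_url, params_dict).

    Hypothetical helper for illustration; assumes no repeated keys,
    because a later duplicate key overwrites an earlier one in the dict.
    """
    parts = urlsplit(url)
    # Rebuild the URL without its query string and fragment
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # keep_blank_values=True preserves keys such as "a=" that have empty values
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base, params


base, params = split_query("https://www.baidu.com/s?tn=50000021_hao_pg&word=python66.com")
print(base)    # https://www.baidu.com/s
print(params)  # {'tn': '50000021_hao_pg', 'word': 'python66.com'}
```

The resulting `base` and `params` can then be passed straight to `requests.get(base, params=params)`.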

  1. How to pass URL parameters

From the official documentation: You often want to send some sort of data in the URL’s query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g. httpbin.org/get?key=val. Requests allows you to provide these arguments as a dictionary of strings, using the params keyword argument. As an example, if you wanted to pass key1=value1 and key2=value2 to httpbin.org/get, you would use the following code:


# -*- coding: utf-8 -*-
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://httpbin.org/get", params=payload)
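Conceptually, `params` turns the dictionary into a query string and appends it to the URL. The sketch below models that step with the standard library's `urllib.parse.urlencode`. It is a simplification under stated assumptions, not requests' actual implementation (which also handles None values, lists, byte strings and URL normalisation), and the name `add_params` is hypothetical.

```python
from urllib.parse import urlencode


def add_params(url, params):
    """Append params to url as a query string (simplified model of params=).

    Hypothetical helper; assumes the URL has no #fragment.
    """
    query = urlencode(params)  # e.g. key1=value1&key2=value2; spaces become '+'
    if not query:
        return url
    # If the URL already has a query string, join with '&'; otherwise start with '?'
    separator = "&" if "?" in url else "?"
    return url + separator + query


print(add_params("http://httpbin.org/get", {'key1': 'value1', 'key2': 'value2'}))
# http://httpbin.org/get?key1=value1&key2=value2
```

The dictionary order is preserved in the query string, which is why the output lists `key1` before `key2`.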

  Likewise, the Baidu search results page for python66.com can be requested like this:

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.44',
}
# The same two parameters as in the search-results URL above
payload = {'tn': '50000021_hao_pg', 'word': 'python66.com'}

# Search results are served under the /s path, not the site root
r = requests.get("https://www.baidu.com/s", params=payload, headers=headers)

  2. What URL parameters really are

# -*- coding: utf-8 -*-
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.44',
}
payload = {'tn': '50000021_hao_pg', 'word': 'python66.com'}

r = requests.get("https://www.baidu.com/s", params=payload, headers=headers)
# r.url is the URL that was finally requested (after any redirects)
print(r.url)
# Expected output, if Baidu does not redirect:
# https://www.baidu.com/s?tn=50000021_hao_pg&word=python66.com

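  As the output shows, passing parameters this way is no different from requesting the fully assembled URL directly; requests simply builds the query string for you internally. So the Baidu search example above can also be written like this: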

# -*- coding: utf-8 -*-
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.44',
}

# Equivalent request: the query string is written into the URL by hand
r = requests.get("https://www.baidu.com/s?tn=50000021_hao_pg&word=python66.com", headers=headers)
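One practical difference between the two styles: with a hand-written URL you are responsible for percent-encoding the values yourself (for example a non-ASCII search word), whereas a dictionary is encoded for you. The standard-library sketch below assumes UTF-8 encoding (the default of `urlencode`) and uses a made-up search word for illustration; it shows that building the query from a dictionary gives the same string as encoding each value by hand with `quote_plus`.

```python
from urllib.parse import urlencode, quote_plus

# Illustrative parameters; the non-ASCII search word is an assumption for the demo
params = {'tn': '50000021_hao_pg', 'word': 'python教程'}

# Query string built from the dictionary (what params= does conceptually)
from_dict = urlencode(params)

# The same query string assembled by hand, encoding each value explicitly
by_hand = "tn=" + quote_plus("50000021_hao_pg") + "&word=" + quote_plus("python教程")

print(from_dict)             # tn=50000021_hao_pg&word=python%E6%95%99%E7%A8%8B
print(from_dict == by_hand)  # True
```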

  3. Invalid parameters (values of None)

From the official documentation: Note that any dictionary key whose value is None will not be added to the URL’s query string.

  In other words, giving a dictionary key the value None is equivalent to leaving that parameter out of the URL altogether.

# -*- coding: utf-8 -*-
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.44',
}
# 'word' is None, so requests leaves it out of the query string
payload = {'tn': '50000021_hao_pg', 'word': None}

r = requests.get("https://www.baidu.com", params=payload, headers=headers)
print(r.url)
# Expected output, if Baidu does not redirect:
# https://www.baidu.com/?tn=50000021_hao_pg
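The None rule can be modelled in a few lines: a key whose value is None is skipped before encoding, while an empty string is not dropped and is still sent as `key=`. This is a minimal standard-library sketch that assumes plain string values only (requests itself also expands list values into repeated keys); `encode_params` is a hypothetical name, not requests' internal function.

```python
from urllib.parse import urlencode


def encode_params(params):
    """Build a query string, dropping keys whose value is None.

    Hypothetical helper that mimics the documented requests behaviour;
    it handles plain string values only.
    """
    kept = {key: value for key, value in params.items() if value is not None}
    return urlencode(kept)


print(encode_params({'tn': '50000021_hao_pg', 'word': None}))  # tn=50000021_hao_pg
print(encode_params({'tn': '50000021_hao_pg', 'word': ''}))    # tn=50000021_hao_pg&word=
```

This makes None a convenient way to switch an optional parameter off: keep one dictionary and set the value to None when the parameter is not needed.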
