Python multithreaded Baidu mobile rank checker: one-to-one keyword and URL lookup
老董 (我爱我家 real-estate SEO), 2020-08-24
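An earlier post covered rank checking on Baidu's PC results. Checking rank on Baidu's mobile (mo) results is no harder, because the principle is the same. This script checks rank one-to-one: each keyword is paired with one specific URL. The thread count defaults to 1. Baidu's anti-scraping measures are stricter than they used to be, so 1 thread is the safest setting. [When several threads write to the same file, the writes need a lock, otherwise the data may get scrambled.]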
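The locking mentioned in the note above can be sketched roughly as follows. This is only an illustration and not part of the original script: the names `write_lock`, `save_result`, `worker`, and `demo`, the file name `demo_rank.txt`, and the example.com URLs are all assumptions made for the example.

```python
import os
import tempfile
import threading

# Hypothetical names for illustration; the original script has no lock.
write_lock = threading.Lock()


def save_result(f, kwd, url, rank):
    """Write one tab-separated result line while holding the lock."""
    line = '{0}\t{1}\t{2}\n'.format(kwd, url, rank)
    with write_lock:  # only one thread may write at a time
        f.write(line)


def worker(f, rows):
    """Write every (keyword, url, rank) row assigned to this thread."""
    for kwd, url, rank in rows:
        save_result(f, kwd, url, rank)


def demo(path=None, n_threads=4, rows_per_thread=50):
    """Write from several threads, then return the lines found in the file."""
    if path is None:
        path = os.path.join(tempfile.gettempdir(), 'demo_rank.txt')
    with open(path, 'w', encoding='utf-8') as f:
        threads = []
        for i in range(n_threads):
            rows = [('kwd{0}-{1}'.format(i, j), 'https://example.com/', j)
                    for j in range(rows_per_thread)]
            t = threading.Thread(target=worker, args=(f, rows))
            threads.append(t)
            t.start()
        # Wait for all writers before the file is closed
        for t in threads:
            t.join()
    with open(path, encoding='utf-8') as f:
        return f.read().splitlines()
```

Calling `demo()` writes 200 lines from four threads. Because each line is written while the lock is held, every line should come back with exactly three tab-separated fields.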
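1. kwd_url.txt holds one keyword/URL pair per line, separated by a tab (paste straight from Excel); every URL must include http or https
2. http and https are treated as different URLs
3. http://aaa/bbb/ and http://aaa/bbb are treated as different URLs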
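The three rules above can be illustrated with a small sketch. The helpers `parse_line` and `same_url` are hypothetical names introduced for this example and do not appear in the script. The script itself simply splits each line and compares URLs with `==`. The validation shown here is an added assumption about how malformed lines might be rejected.

```python
def parse_line(line):
    """Split one line of kwd_url.txt into (keyword, url); None if malformed."""
    parts = line.rstrip('\r\n').split('\t')
    if len(parts) != 2:
        return None
    kwd, url = parts[0].strip(), parts[1].strip()
    # Rule 1: the URL must carry an explicit http or https scheme
    if not kwd or not url.startswith(('http://', 'https://')):
        return None
    return kwd, url


def same_url(a, b):
    """Strict comparison: scheme and trailing slash both matter (rules 2 and 3)."""
    return a == b


sample = '北京二手车出售\thttps://m.renrenche.com/bj/ershouche/\n'
print(parse_line(sample))
print(same_url('http://aaa/bbb/', 'http://aaa/bbb'))    # False
print(same_url('http://aaa/bbb/', 'https://aaa/bbb/'))  # False
```

Since the comparison is a plain string match, the URL in kwd_url.txt has to be written exactly as it should appear in the search results.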
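# -*- coding: utf-8 -*-
"""
One-to-one keyword/URL rank check; only the top ten results are examined.
kwd_url.txt: one keyword/URL pair per line, separated by a tab (paste straight from Excel); every URL must include http or https
http and https are treated as different URLs
http://aaa/bbb/ and http://aaa/bbb are treated as different URLs
"""
import requests
from pyquery import PyQuery as pq
import threading
import queue
import gc
import json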


class BdmoRank(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)

    # Read the txt file and queue the keyword/URL pairs to check
    @staticmethod
    def read_txt(filepath):
        q = queue.Queue()
        with open(filepath, encoding='utf-8') as fin:
            for line in fin:
                # Each line is: keyword<TAB>url
                kwd_url = line.strip().split('\t')
                q.put(kwd_url)
        return q

    # Fetch the SERP source for a keyword
    def get_html(self, url, retry=2):
        try:
            r = requests.get(url=url, headers=user_agent, timeout=5)
        except Exception as e:
            print('Failed to fetch page source', url, e)
            if retry > 0:
                # Retry and pass the result back to the caller
                return self.get_html(url, retry - 1)
            return None
        else:
            html = r.text
            return html

    # Collect the data-log attribute of each result div in the SERP source
    def get_data_logs(self, html):
        data_logs = []
        # '百度' (Baidu) in the page serves as a check that a result page came back
        if html and '百度' in html:
            doc = pq(html)
            try:
                div_list = doc('.c-result').items()
            except Exception as e:
                print('Failed to extract result divs', e)
            else:
                for div in div_list:
                    data_log = div.attr('data-log')
                    if data_log is not None:
                        data_logs.append(data_log)
        return data_logs

    # Check whether the URL ranks on the first page
    def check_include(self, url, data_logs=None):
        rank = None
        for data_log in data_logs or []:
            # JSON strings must use double quotes
            data_log = json.loads(data_log.replace("'", '"'))
            if url == data_log['mu']:
                rank = data_log['order']
                return url, rank
        return url, rank

    # Thread body
    def run(self):
        while 1:
            kwd_url = q.get()
            try:
                kwd = kwd_url[0]
                url_check = kwd_url[1]
                url = "https://m.baidu.com/s?ie=utf-8&word={0}".format(kwd)
                html = self.get_html(url)
                data_logs = self.get_data_logs(html)
                url, rank = self.check_include(url_check, data_logs)
                print(kwd, url, rank)
                # One tab-separated line per pair: keyword, url, rank
                f.write(kwd + '\t' + url + '\t' + str(rank) + '\n')
                del kwd
                del url_check
                gc.collect()
            except Exception as e:
                print(e)
            finally:
                q.task_done()


if __name__ == "__main__":
    user_agent = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Mobile Safari/537.36'}
    q = BdmoRank.read_txt('kwd_url.txt')
    f = open('bdmo_rank1.txt', 'w', encoding='utf-8')
    # Number of threads
    for i in range(1):
        t = BdmoRank()
        t.daemon = True
        t.start()
    q.join()
    f.flush()
    f.close()
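
The rank lookup at the heart of the script can be tried in isolation. The sketch below follows the same idea as `check_include`: each `data-log` value is a JSON-like string with single quotes, and the script reads the `mu` (URL) and `order` (position) fields from it. The function name `find_rank` and the sample strings are invented for illustration. Real `data-log` values from Baidu may contain other fields or a different layout.

```python
import json


def find_rank(target_url, data_logs):
    """Return the 'order' of the first entry whose 'mu' equals target_url, else None."""
    for raw in data_logs:
        try:
            # JSON requires double quotes, so swap the single quotes first
            info = json.loads(raw.replace("'", '"'))
        except ValueError:
            # Skip entries that are not valid JSON after the swap
            continue
        if info.get('mu') == target_url:
            return info.get('order')
    return None


# Invented sample values; not taken from a real result page
sample_logs = [
    "{'order': '1', 'mu': 'https://m.example.com/a/'}",
    "{'order': '2', 'mu': 'https://m.example.com/b/'}",
    "not json",
]

print(find_rank('https://m.example.com/b/', sample_logs))  # 2
print(find_rank('https://m.example.com/b', sample_logs))   # None
```

Unlike the script, this version skips entries that fail to parse instead of raising, which is a design choice of the sketch rather than the original behavior. The quote swap would also corrupt any value that itself contains an apostrophe.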
Sample output (keyword, URL, rank; None means the URL was not found on the first page):
北京二手车出售 https://m.renrenche.com/bj/ershouche/ 2
鞍山二手宝沃 https://m.renrenche.com/cn/baowo_baowoBX7/ None
鞍山二手北汽新能源 https://m.renrenche.com/cn/beiqixinnengyuan/jishou/ None
鞍山二手北汽威旺 https://m.renrenche.com/as/ None
鞍山华泰新能源二手车报价 https://m.renrenche.com/cn/huataixinnengyuan/ 5
The code above is the complete multithreaded Baidu mobile one-to-one keyword/URL rank checker. If you run into problems, please let me know.
Note from python编程网: when reposting, please credit the source, www.python66.com.