Extracting a URL's registered domain and domain suffix in Python
Lao Dong (老董), 2020-04-25
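Python's urlparse module (urllib.parse in Python 3) can parse the host name out of a URL, but it has a gap: it cannot extract the URL's registered (top-level) domain, because it has no idea which trailing labels of the host form the suffix. For this kind of extraction you can use the third-party tld module, which is installed with pip: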
```shell
pip install tld
```
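The urlparse limitation described above is easy to see. urllib.parse.urlparse only returns the whole host name, and the obvious shortcut of keeping its last two labels works for www.python66.com but breaks for multi-label suffixes such as co.uk. The helper naive_registered_domain and the host www.example.co.uk are hypothetical, used here only for illustration.

```python
from urllib.parse import urlparse

def naive_registered_domain(url):
    """Hypothetical naive helper: keep the last two labels of the host name."""
    host = urlparse(url).hostname  # None when the URL has no scheme
    return ".".join(host.split(".")[-2:])

# urlparse gives the full host, with no notion of domain vs. suffix
print(urlparse("http://www.python66.com/a?b=1").hostname)  # www.python66.com
print(naive_registered_domain("http://www.python66.com"))   # python66.com
print(naive_registered_domain("http://www.example.co.uk"))  # co.uk (wrong: should be example.co.uk)
```

Telling example.co.uk apart from co.uk requires a list of known suffixes, which is what tld supplies (it is based on the Public Suffix List).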
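The code below uses tld to get a URL's domain name without the suffix (domain), the domain suffix (extension, also available as suffix), the domain including the suffix, i.e. the first-level domain (fld), and the subdomain part (subdomain).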
Note: the URL must include a scheme such as http or https; otherwise get_tld raises an error, as shown below:
```python
# -*- coding: utf-8 -*-
import tld

url = 'www.python66.com'
obj = tld.get_tld(url, as_object=True)
```
```
Traceback (most recent call last):
  File "D:/pyscript/py3script/python66/test/a.py", line 6, in <module>
    obj = tld.get_tld(url,as_object=True)
  File "D:\python3install\lib\site-packages\tld\utils.py", line 490, in get_tld
    parser_class=parser_class
  File "D:\python3install\lib\site-packages\tld\utils.py", line 328, in process_url
    raise TldBadUrl(url=url)
tld.exceptions.TldBadUrl: Is not a valid URL www.python66.com!
```
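If your input may lack a scheme, one option is to add one before calling get_tld. The helper ensure_scheme below is a hypothetical sketch that assumes http is an acceptable default scheme for such inputs.

```python
def ensure_scheme(url, default_scheme="http"):
    """Hypothetical helper: return url unchanged if it has a scheme, else prepend one."""
    if "://" in url:
        return url
    # lstrip('/') also handles protocol-relative URLs such as '//www.python66.com'
    return f"{default_scheme}://{url.lstrip('/')}"

print(ensure_scheme("www.python66.com"))          # http://www.python66.com
print(ensure_scheme("https://www.python66.com"))  # https://www.python66.com
```

With this, tld.get_tld(ensure_scheme('www.python66.com'), as_object=True) receives 'http://www.python66.com', which tld parses without error. Recent versions of tld also accept a fix_protocol=True argument to get_tld that does the same thing; check the documentation for the version you have installed.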
1. An ordinary domain
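```python
# -*- coding: utf-8 -*-
import tld

url = 'http://www.python66.com'
obj = tld.get_tld(url, as_object=True)  # return a result object instead of a plain string
print(obj.domain)     # domain without the suffix
print(obj.extension)  # domain suffix
print(obj.fld)        # first-level domain: domain + suffix
print(obj.subdomain)  # subdomain part
print(obj.suffix)     # same value as extension
```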
```
python66
com
python66.com
www
com
```
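The five printed values fit together: fld is domain and suffix joined by a dot, and prefixing the subdomain gives back the full host. The sketch below shows that relationship with a hypothetical helper, rebuild, under the assumption that an empty string means "no subdomain".

```python
def rebuild(subdomain, domain, suffix):
    """Hypothetical helper: reassemble (fld, host) from the split parts."""
    fld = f"{domain}.{suffix}"
    # an empty subdomain means the host is just the first-level domain
    host = f"{subdomain}.{fld}" if subdomain else fld
    return fld, host

# values printed for http://www.python66.com
print(rebuild("www", "python66", "com"))  # ('python66.com', 'www.python66.com')
```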
2. A subdomain with many levels
```python
# -*- coding: utf-8 -*-
import tld

url = 'http://www.python66.com.cn.uk'
obj = tld.get_tld(url, as_object=True)
print(obj.domain)
print(obj.extension)
print(obj.fld)
print(obj.subdomain)
print(obj.suffix)
```
```
cn
uk
cn.uk
www.python66.com
uk
```
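The output shows that tld treated only uk as the suffix: com.cn.uk and cn.uk are evidently not in its suffix list, so cn becomes the domain and everything to its left becomes the subdomain. The sketch below illustrates the Public Suffix List convention of preferring the longest matching suffix, using a hypothetical three-entry SUFFIXES set. It is a simplification (no wildcard or exception rules), not tld's actual implementation.

```python
# Hypothetical mini suffix list; tld itself uses the full Public Suffix List
SUFFIXES = {"com", "uk", "co.uk"}

def longest_suffix(host, suffixes=SUFFIXES):
    """Return the longest known suffix at the end of host, or None."""
    labels = host.split(".")
    # start with every label except the first, then drop labels from the left,
    # so longer (more specific) suffixes are tried before shorter ones
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return candidate
    return None

print(longest_suffix("www.python66.com.cn.uk"))  # uk
print(longest_suffix("www.example.co.uk"))       # co.uk
```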
3. A domain with an uncommon suffix (if the suffix is obscure and not recorded in the tld library's suffix list, get_tld raises an error)
```python
# -*- coding: utf-8 -*-
import tld

url = 'http://www.anjuke.co.ui'
obj = tld.get_tld(url, as_object=True)
print(obj.domain)
print(obj.extension)
print(obj.fld)
print(obj.subdomain)
print(obj.suffix)
```
```
Traceback (most recent call last):
  File "D:/pyscript/py3script/python66/test/a.py", line 6, in <module>
    obj = tld.get_tld(url,as_object=True)
  File "D:\python3install\lib\site-packages\tld\utils.py", line 490, in get_tld
    parser_class=parser_class
  File "D:\python3install\lib\site-packages\tld\utils.py", line 378, in process_url
    raise TldDomainNotFound(domain_name=domain_name)
tld.exceptions.TldDomainNotFound: Domain www.anjuke.co.ui didn't match any existing TLD name!
```
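When processing many URLs you usually want to skip unparseable ones rather than stop. With tld itself you can catch tld.exceptions.TldDomainNotFound and tld.exceptions.TldBadUrl (the exceptions in the tracebacks above), or pass fail_silently=True so that get_tld returns None. The stdlib-only sketch below mimics that "None instead of an exception" behaviour with a hypothetical helper, split_or_none, and a hypothetical three-entry suffix set; it is not how tld is implemented and will reject many hosts that the real Public Suffix List accepts.

```python
from urllib.parse import urlsplit

# Hypothetical mini suffix list standing in for the Public Suffix List
SUFFIXES = {"com", "uk", "co.uk"}

def split_or_none(url, suffixes=SUFFIXES):
    """Split a URL's host into (subdomain, domain, suffix); None if it cannot."""
    host = urlsplit(url).hostname
    if not host:
        return None  # e.g. no scheme, so urlsplit found no host
    labels = host.split(".")
    # try the longest candidate suffix first
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in suffixes:
            return ".".join(labels[:i - 1]), labels[i - 1], suffix
    return None  # unknown suffix, such as co.ui

print(split_or_none("http://www.python66.com"))  # ('www', 'python66', 'com')
print(split_or_none("http://www.anjuke.co.ui"))  # None
```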
Note from python编程网 (python66.com): when reposting, please credit www.python66.com as the source.