
Extracting the registered domain and domain suffix from a URL in Python

Lao Dong, 2020-04-25

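  Python's urlparse module (urllib.parse in Python 3) can split a URL into its components, but it has one limitation: it cannot tell you which part of the host name is the registered domain. For that kind of extraction you can use the tld module, which is installed with pip.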

	  pip install tld
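
  To make the urlparse limitation concrete, here is a minimal sketch that uses only the standard library. The helper name naive_last_two and the example.co.uk URL are made up for illustration and are not part of the original article. It shows that urlparse only returns the whole host name, and that the obvious workaround of keeping the last two labels breaks on multi-part suffixes.

```python
from urllib.parse import urlparse


def naive_last_two(url):
    """Hypothetical helper: keep only the last two labels of the host name."""
    host = urlparse(url).hostname  # e.g. 'www.python66.com'
    return '.'.join(host.split('.')[-2:])


parsed = urlparse('http://www.python66.com/path?x=1')
print(parsed.scheme)    # http
print(parsed.hostname)  # www.python66.com (the whole host, not the registered domain)

# The naive rule happens to work for a single-label suffix such as com ...
print(naive_last_two('http://www.python66.com'))    # python66.com

# ... but returns only the suffix when the suffix itself has two labels.
print(naive_last_two('http://www.example.co.uk'))   # co.uk
```

  Getting this right needs a list of known suffixes, which is what a library such as tld supplies.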

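  The code below uses this module to get four things from a URL: the registered domain without its suffix, the domain suffix, the registered domain with its suffix, and the subdomain part (without the suffix).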

  Note: the URL must include a scheme such as http or https; otherwise an error is raised, as follows:

  Is not a valid URL www.python66.com!

# -*- coding: utf-8 -*-

import tld

url = 'www.python66.com'
obj = tld.get_tld(url,as_object=True)

Traceback (most recent call last):
  File "D:/pyscript/py3script/python66/test/a.py", line 6, in <module>
    obj = tld.get_tld(url,as_object=True)
  File "D:\python3install\lib\site-packages\tld\utils.py", line 490, in get_tld
    parser_class=parser_class
  File "D:\python3install\lib\site-packages\tld\utils.py", line 328, in process_url
    raise TldBadUrl(url=url)
tld.exceptions.TldBadUrl: Is not a valid URL www.python66.com!
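
  One way to avoid the error above is to normalise the input before handing it to tld. The following is a minimal sketch, not something from the tld library itself: ensure_scheme is a hypothetical helper name, and defaulting to http is an assumption.

```python
import re

# A scheme is a letter followed by letters, digits, '+', '.' or '-', then '://'.
_SCHEME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')


def ensure_scheme(url, default='http'):
    """Hypothetical helper: return url unchanged if it has a scheme, else add one."""
    url = url.strip()
    if _SCHEME_RE.match(url):
        return url
    if url.startswith('//'):           # protocol-relative URL such as //host/path
        return f'{default}:{url}'
    return f'{default}://{url}'


print(ensure_scheme('www.python66.com'))          # http://www.python66.com
print(ensure_scheme('https://www.python66.com'))  # https://www.python66.com
```

  The result can then be passed to tld.get_tld as in the examples that follow. Recent releases of tld also appear to accept a fix_protocol=True argument for the same purpose; check the version you have installed before relying on it.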


  1. An ordinary domain name

# -*- coding: utf-8 -*-

import tld

url = 'http://www.python66.com'
obj = tld.get_tld(url,as_object=True)

print(obj.domain)
print(obj.extension)
print(obj.fld)
print(obj.subdomain)
print(obj.suffix)


python66
com
python66.com
www
com


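  2. A subdomain with many levels (the matched suffix here is uk, so the label to its left, cn, is treated as the domain and everything before it as the subdomain)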

# -*- coding: utf-8 -*-

import tld


url = 'http://www.python66.com.cn.uk'
obj = tld.get_tld(url,as_object=True)

print(obj.domain)
print(obj.extension)
print(obj.fld)
print(obj.subdomain)
print(obj.suffix)

cn
uk
cn.uk
www.python66.com
uk
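
  The result above can look surprising, so here is a simplified sketch of suffix matching that reproduces it. This is not the tld library's actual implementation: SUFFIXES is a tiny made-up set, split_host is a hypothetical helper, and real suffix lists also contain wildcard and exception rules that are ignored here.

```python
# Tiny, hypothetical suffix set for illustration only.
SUFFIXES = {'com', 'cn', 'uk', 'com.cn', 'co.uk'}


def split_host(host, suffixes=SUFFIXES):
    """Split a host name into subdomain, domain, suffix and fld.

    Tries the longest candidate suffix first and returns None when no
    known suffix matches.
    """
    labels = host.lower().split('.')
    # Start at 1 so that at least one label is left over for the domain.
    for i in range(1, len(labels)):
        candidate = '.'.join(labels[i:])
        if candidate in suffixes:
            domain = labels[i - 1]
            return {
                'subdomain': '.'.join(labels[:i - 1]),
                'domain': domain,
                'suffix': candidate,
                'fld': f'{domain}.{candidate}',
            }
    return None


print(split_host('www.python66.com'))
print(split_host('www.python66.com.cn.uk'))
print(split_host('www.anjuke.co.ui'))  # None: 'ui' is not in the suffix set
```

  Because cn.uk is not a known suffix while uk is, the match stops at uk and cn becomes the domain. A host whose suffix is not in the set at all yields no match, which corresponds to the error case in the next example.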



  3. A domain with an unusual suffix (if the suffix you use is obscure and the tld library has no record of it, an error is raised)

  didn't match any existing TLD name!

# -*- coding: utf-8 -*-

import tld

url = 'http://www.anjuke.co.ui'
obj = tld.get_tld(url,as_object=True)
print(obj.domain)
print(obj.extension)
print(obj.fld)
print(obj.subdomain)
print(obj.suffix)

Traceback (most recent call last):
  File "D:/pyscript/py3script/python66/test/a.py", line 6, in <module>
    obj = tld.get_tld(url,as_object=True)
  File "D:\python3install\lib\site-packages\tld\utils.py", line 490, in get_tld
    parser_class=parser_class
  File "D:\python3install\lib\site-packages\tld\utils.py", line 378, in process_url
    raise TldDomainNotFound(domain_name=domain_name)
tld.exceptions.TldDomainNotFound: Domain www.anjuke.co.ui didn't match any existing TLD name!

