Source: python中国网  Date: 2020-04-25

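  Python's urlparse module (urllib.parse in Python 3) can parse a URL, but it has one limitation: it cannot extract the top-level domain from the host name. For more involved extraction you can use the third-party tld module, which is installed with pip: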

	  pip install tld
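
  To make the limitation concrete, here is a minimal sketch that uses only the standard library (Python 3 naming assumed). It shows that the parser hands back the host name as a single string and leaves the split into subdomain, domain and suffix to the caller.

```python
from urllib.parse import urlparse

parts = urlparse('http://www.python66.com')

# urlparse separates the protocol from the host name...
print(parts.scheme)    # http
print(parts.hostname)  # www.python66.com

# ...but the host name stays one string. Splitting on dots gives labels,
# with no indication of which ones form the suffix.
labels = parts.hostname.split('.')
print(labels)          # ['www', 'python66', 'com']
```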

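  The code below uses this module to get four parts of a URL: the domain name (without the suffix), the domain suffix, the first-level domain (with the suffix), and the subdomain (without the suffix).

  Note: the URL must include a protocol such as http or https; otherwise an error is raised, as follows: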

  Is not a valid URL www.python66.com!

# -*- coding: utf-8 -*-

import tld

url = 'www.python66.com'  # no protocol, so get_tld rejects it
obj = tld.get_tld(url,as_object=True)

Traceback (most recent call last):
  File "D:/pyscript/py3script/python66/test/a.py", line 6, in <module>
    obj = tld.get_tld(url,as_object=True)
  File "D:\python3install\lib\site-packages\tld\utils.py", line 490, in get_tld
    parser_class=parser_class
  File "D:\python3install\lib\site-packages\tld\utils.py", line 328, in process_url
    raise TldBadUrl(url=url)
tld.exceptions.TldBadUrl: Is not a valid URL www.python66.com!
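
  One way to avoid this error is to add a default protocol before calling get_tld. The helper below is a minimal sketch and not part of tld: the name ensure_scheme and the default of http are assumptions made for illustration. (get_tld also accepts a fix_protocol=True argument that serves the same purpose.)

```python
import re

# A protocol prefix such as "http://" or "https://" at the start of the string.
_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')


def ensure_scheme(url, default='http'):
    """Return url unchanged if it starts with a protocol, else prepend one."""
    if _SCHEME.match(url):
        return url
    # Also covers protocol-relative input such as "//www.python66.com".
    return default + '://' + url.lstrip('/')


print(ensure_scheme('www.python66.com'))          # http://www.python66.com
print(ensure_scheme('https://www.python66.com'))  # https://www.python66.com
```

  The returned string can then be passed to tld.get_tld in the same way as the URLs in the examples that follow.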


  1. An ordinary domain name

# -*- coding: utf-8 -*-

import tld

url = 'http://www.python66.com'
obj = tld.get_tld(url,as_object=True)

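print(obj.domain)     # domain name without the suffix
print(obj.extension)  # domain suffix
print(obj.fld)        # first-level domain, suffix included
print(obj.subdomain)  # subdomain, without domain and suffix
print(obj.suffix)     # same value as obj.extension

  Output:
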
python66
com
python66.com
www
com


  2. A domain with several subdomain levels

# -*- coding: utf-8 -*-

import tld


url = 'http://www.python66.com.cn.uk'
obj = tld.get_tld(url,as_object=True)

print(obj.domain)
print(obj.extension)
print(obj.fld)
print(obj.subdomain)
print(obj.suffix)

  Output:

cn
uk
cn.uk
www.python66.com
uk



  3. A domain with an unusual suffix (if the suffix you use is obscure and the tld library has no record of it, an error is raised)

  didn't match any existing TLD name!

# -*- coding: utf-8 -*-

import tld

url = 'http://www.anjuke.co.ui'
obj = tld.get_tld(url,as_object=True)
print(obj.domain)
print(obj.extension)
print(obj.fld)
print(obj.subdomain)
print(obj.suffix)

Traceback (most recent call last):
  File "D:/pyscript/py3script/python66/test/a.py", line 6, in <module>
    obj = tld.get_tld(url,as_object=True)
  File "D:\python3install\lib\site-packages\tld\utils.py", line 490, in get_tld
    parser_class=parser_class
  File "D:\python3install\lib\site-packages\tld\utils.py", line 378, in process_url
    raise TldDomainNotFound(domain_name=domain_name)
tld.exceptions.TldDomainNotFound: Domain www.anjuke.co.ui didn't match any existing TLD name!
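
  The results in examples 2 and 3 both follow from looking the host name up in a list of known suffixes. The sketch below illustrates that idea with a longest-suffix match. It is not tld's actual implementation: KNOWN_SUFFIXES is a tiny hypothetical stand-in for the full list the library records, split_host is a made-up name, and special rules in real suffix lists are ignored.

```python
# Hypothetical, hand-picked suffix set, for illustration only.
KNOWN_SUFFIXES = {'com', 'cn', 'com.cn', 'uk', 'co.uk'}


def split_host(host):
    """Split host into (subdomain, domain, suffix) by longest known suffix."""
    labels = host.split('.')
    # Try the longest candidate suffix first, always leaving at least
    # one label in front of it to serve as the domain.
    for i in range(1, len(labels)):
        suffix = '.'.join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            domain = labels[i - 1]
            subdomain = '.'.join(labels[:i - 1])
            return subdomain, domain, suffix
    raise ValueError("%s didn't match any known suffix" % host)


print(split_host('www.python66.com'))        # ('www', 'python66', 'com')
print(split_host('www.python66.com.cn.uk'))  # ('www.python66.com', 'cn', 'uk')

try:
    split_host('www.anjuke.co.ui')
except ValueError as exc:
    print(exc)  # www.anjuke.co.ui didn't match any known suffix
```

  In example 2, "cn.uk" is not a recorded suffix but "uk" is, so "cn" becomes the domain and everything before it becomes the subdomain. In example 3, no candidate matches at all, which corresponds to the TldDomainNotFound error.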

