publicsuffixlist

Public Suffix List parser implementation for Python 2.5+/3.x.

  • Compliant with the PSL test data.
  • Supports IDN (Unicode or Punycode).
  • Supports Python 2.5+ and Python 3.x.
  • Ships with a built-in PSL and update scripts.
  • Written in pure Python, with no library dependencies.

Install

publicsuffixlist can be installed via pip or pip3.

$ sudo pip install publicsuffixlist

If you are on an older distribution (RHEL/CentOS 6.x), you may need to update pip itself before installing.

$ sudo pip install -U pip

Usage

from publicsuffixlist import PublicSuffixList

psl = PublicSuffixList()
# uses built-in PSL file

psl.publicsuffix("www.example.com")   # "com"
# longest public suffix part

psl.privatesuffix("www.example.com")  # "example.com"
# shortest domain assigned for a registrant

psl.privatesuffix("com") # None
# None if no private (non-public) part found


psl.publicsuffix("www.example.unknownnewtld") # "unknownnewtld"
# new TLDs are valid public suffix by default

psl.publicsuffix(u"www.example.香港")   # u"香港"
# accept unicode

psl.publicsuffix("www.example.xn--j6w193g") # "xn--j6w193g"
# accept punycoded IDNs by default
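
To make the results above easier to reason about, here is a minimal sketch of PSL-style matching. It is not the library's implementation: RULES is a tiny hand-written rule set rather than the real list, and public_suffix and private_suffix are hypothetical helpers that only mirror the behaviour shown in the examples (longest matching rule wins, wildcard and exception rules are honoured, unknown TLDs are treated as public suffixes).

```python
# Illustrative only: a tiny hand-written rule set, not the real PSL.
RULES = {"com", "uk", "co.uk", "*.ck", "!www.ck"}


def public_suffix(domain, rules=RULES):
    """Return the public suffix of `domain` under `rules` (sketch)."""
    labels = domain.lower().split(".")
    best = 1  # implicit default rule: an unknown TLD is a public suffix
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if "!" + candidate in rules:
            # Exception rule: the suffix is the rule minus its leftmost label.
            return ".".join(labels[i + 1:])
        wildcard = ".".join(["*"] + labels[i + 1:])
        if candidate in rules or wildcard in rules:
            # Keep the rule that matches the most labels.
            best = max(best, len(labels) - i)
    return ".".join(labels[-best:])


def private_suffix(domain, rules=RULES):
    """Return the public suffix plus one more label, or None if there is none."""
    labels = domain.lower().split(".")
    n = len(public_suffix(domain, rules).split("."))
    if len(labels) <= n:
        return None
    return ".".join(labels[-(n + 1):])


print(public_suffix("www.example.com"))     # com
print(private_suffix("www.example.com"))    # example.com
print(private_suffix("com"))                # None
print(public_suffix("www.example.co.uk"))   # co.uk
```

The sketch scans every suffix of the input on each call, which keeps it short; a real parser would normally index the rules for faster lookup.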

The latest PSL can be passed as a file-like, line-iterable object.

with open("latest_psl.dat", "rb") as f:
    psl = PublicSuffixList(f)
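
As a rough illustration of what "line-iterable" means here, the sketch below reads PSL-formatted lines from any iterable, whether an open file, an in-memory buffer, or a plain list. read_rules is a hypothetical helper, not part of publicsuffixlist. It assumes only the published PSL file format: lines starting with // are comments, blank lines are skipped, and each rule is read up to the first whitespace.

```python
import io


def read_rules(lines):
    """Collect PSL rules from any iterable of lines (bytes or str)."""
    rules = []
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        rules.append(line.split()[0])  # a rule ends at the first whitespace
    return rules


# An in-memory byte stream behaves like a file opened with "rb".
sample = io.BytesIO(b"// comment\ncom\n\n*.ck\n!www.ck\n")
print(read_rules(sample))  # ['com', '*.ck', '!www.ck']

# A plain list of strings is line-iterable too.
print(read_rules(["com\n", "co.uk trailing text is ignored\n"]))  # ['com', 'co.uk']
```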

Works with both Python 2.x and 3.x.

$ python -m publicsuffixlist.test
...............
----------------------------------------------------------------------
Ran 15 tests in 2.898s

OK
$ python3 -m publicsuffixlist.test
...............
----------------------------------------------------------------------
Ran 15 tests in 2.562s

OK

Drop-in compat code to replace publicsuffix

# from publicsuffix import PublicSuffixList
from publicsuffixlist.compat import PublicSuffixList

psl = PublicSuffixList()
psl.suffix("www.example.com")   # return "example.com"
psl.suffix("com")               # return ""

Limitation

publicsuffixlist does NOT provide domain name validation. In the DNS protocol, most 8-bit characters are valid in a domain name label. ICANN-compliant registries do not accept domain names containing _ (underscore), but hostnames may contain one (DMARC records, for example).

Users need to confirm that input is valid for their own context.
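
One possible pre-check, sketched under stated assumptions: looks_like_hostname is a hypothetical helper, not part of publicsuffixlist. It assumes ASCII (already punycoded) input and applies the common letters-digits-hyphen convention with the usual 63-character label and 253-character name limits. The allow_underscore flag is an illustrative switch for contexts such as DMARC record names.

```python
import re

# Letters, digits and hyphens; no leading or trailing hyphen; 1 to 63 characters.
_LDH_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)\Z")
# Same shape, but underscores are also allowed (e.g. "_dmarc").
_UNDERSCORE_LABEL = re.compile(r"(?!-)[a-z0-9_-]{1,63}(?<!-)\Z")


def looks_like_hostname(name, allow_underscore=False):
    """Return True if `name` passes a basic syntactic check (sketch)."""
    name = name.lower()
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if not name or len(name) > 253:
        return False
    pattern = _UNDERSCORE_LABEL if allow_underscore else _LDH_LABEL
    return all(pattern.match(label) for label in name.split("."))


print(looks_like_hostname("www.example.com"))                            # True
print(looks_like_hostname("_dmarc.example.com"))                         # False
print(looks_like_hostname("_dmarc.example.com", allow_underscore=True))  # True
```

Whether underscores, a trailing dot, or non-ASCII input should be accepted depends on the application, so treat these rules as a starting point rather than a standard.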

License

  • This module is licensed under Mozilla Public License 2.0.
  • Public Suffix List maintained by Mozilla Foundation is licensed under Mozilla Public License 2.0.
  • PSL testcase dataset is public domain (CC0).
