validators's Introduction

validators - Python Data Validation for Humans™

Python has all kinds of data validation tools, but every one of them seems to require defining a schema or form. I wanted to create a simple validation library where validating a simple value does not require defining a form or a schema.

>>> import validators
>>> 
>>> validators.email('[email protected]')
True

Python 3.8 reaches EOL in October 2024.

validators's People

Contributors

adrienthiery, automationator, darkdragon-001, dependabot[bot], imperosol, jjjjw, jpvanhal, khink, kingbuzzman, kvesteri, lconceicao, mondeja, msamsami, piewpiew, prousso, reahaas, riconnon, rossmacarthur, shaunpud, shouhei, simonit, timb07, timonpeng, tpatja, tswfi, vanzhiganov, vphilippon, vuolter, woodruffw, yozachar

validators's Issues

Incorrect E-Mail validation

Hello guys,

I think something is wrong with e-mail validation. Can we brainstorm this issue?

Python 3.6.1 (default, Apr  4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> validators.email("[email protected]")
True
>>> validators.email("[email protected]")
True
>>> validators.email("[email protected]<script>q")
True
>>> validators.email("test@testqqqmq<script>q")
ValidationFailure(func=email, args={'value': 'test@testqqqmq<script>q', 'whitelist': None})

URL validator fails for malformed URL when public=True

Hi,

>>> validators.url("http://10.0.0.1", public=True)
ValidationFailure(func=url, args={'value': 'http://10.0.0.1', 'public': True})

So far so good. But:

>>> validators.url("foo://10.0.0.1", public=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-13>", line 2, in url
  File "/home/vagrant/data/pyrest/flask/lib/python3.4/site-packages/validators/utils.py", line 81, in wrapper
    value = func(*args, **kwargs)
  File "/home/vagrant/data/pyrest/flask/lib/python3.4/site-packages/validators/url.py", line 95, in url
    if match_result.groupdict()['private_ip']:
AttributeError: 'NoneType' object has no attribute 'groupdict'

An easy fix would be to modify the end of url.py this way:

     match_result = pattern.match(value)

-    if match_result.groupdict()['private_ip']:
-        return False
-
-    return match_result
+    return match_result and not match_result.groupdict()['private_ip']
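
The crash happens because `re.match` returns `None` for input the pattern does not cover. A minimal sketch of the guarded check, using a deliberately simplified stand-in pattern (`PATTERN` and `is_public_url` are hypothetical names, not the library's real regex or API):

```python
import re

# Simplified stand-in for the real URL regex: http(s) scheme plus an
# optional named group that captures a 10.x.x.x private address.
PATTERN = re.compile(
    r"^https?://(?:(?P<private_ip>10(?:\.\d{1,3}){3})|[a-z0-9.-]+)$",
    re.IGNORECASE,
)


def is_public_url(value):
    """Return True only if value matches and is not a private address."""
    match = PATTERN.match(value)
    if match is None:  # unsupported scheme or malformed input
        return False
    return match.group("private_ip") is None
```

Checking for `None` before touching the match object keeps a malformed URL on the normal "invalid" path instead of raising `AttributeError`.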

url validator fails for 'localhost'

I'm trying to use validators.url() with addresses on localhost, and it fails. For example:

In [8]: validators.url('http://localhost')
Out[8]: ValidationFailure(func=url, args={'public': False, 'value': 'http://localhost'})

Is this intentional?

Text string that causes validators.domain to lock at 100% CPU usage

The following string makes validators 0.12.2 lock up at 100% CPU indefinitely.

crazy_string='p.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.'

Logs:

root@db8fc13972fc:/myapp# pip list | grep validators
validators         0.12.2     
root@db8fc13972fc:/myapp# python --version
Python 2.7.15
root@db8fc13972fc:/myapp# python
Python 2.7.15 (default, May  5 2018, 03:27:20) 
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> crazy_string='p.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.'
>>> validators.domain(crazy_string)

(The validators.domain(crazy_string) never ends. If I go to htop, the python process is at 100% CPU)

With version 0.12.1 this doesn't happen.

Logs:

root@db8fc13972fc:/myapp# pip list | grep validators
validators         0.12.1     
root@db8fc13972fc:/myapp# python
Python 2.7.15 (default, May  5 2018, 03:27:20) 
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> crazy_string='p.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.'
>>> validators.domain(crazy_string)
ValidationFailure(func=domain, args={'value': 'p.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.'})

The environment for tests was python:2.7-stretch Docker image from 2 days ago.

Temporary fix:
uninstall 0.12.2 and install 0.12.1
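
Hangs like this are typical of catastrophic backtracking in a regex with nested quantifiers. One way to sidestep that class of problem, sketched here as an assumption rather than the fix that actually shipped, is to split on dots and match each label with a small bounded pattern. `LABEL`, `TLD` and `is_domain` are hypothetical names.

```python
import re

# One label: 1-63 chars, alphanumeric at both ends, hyphens allowed inside.
LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)
# TLD: letters only, or a punycode A-label.
TLD = re.compile(r"[a-z]{2,63}|xn--[a-z0-9-]{1,59}", re.IGNORECASE)


def is_domain(value):
    """Validate label by label; the work grows linearly with input length."""
    if len(value) > 253:
        return False
    labels = value.split(".")
    if len(labels) < 2:
        return False
    *subdomains, tld = labels
    return (all(LABEL.fullmatch(label) for label in subdomains)
            and TLD.fullmatch(tld) is not None)
```

Because every label is matched on its own, a long run of `wo.` segments costs one short match per segment, and an empty label (from a trailing or doubled dot) is rejected immediately.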

URL validator fails with one digit port

The URL validator fails when validating URLs with a one-digit port:

>>> validators.url("http://google.com/test")
True
>>> validators.url("http://google.com:80/test")
True
>>> validators.url("http://google.com:10/test")
True
>>> validators.url("http://google.com:9/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:9/test'})
>>> validators.url("http://google.com:8/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:8/test'})
>>> validators.url("http://google.com:7/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:7/test'})
>>> validators.url("http://google.com:2/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:2/test'})
>>> validators.url("http://google.com:1/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:1/test'})

I think the problem is in this line, which forces the port number to have 2 to 5 digits:
https://github.com/kvesteri/validators/blob/8cf1e8fb5ed3af3d428b0230c50d63d55dd0939a/validators/url.py#L45
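
As an illustration of a port check that does not depend on digit count, here is a sketch built on the standard library's `urllib.parse.urlsplit` instead of the library's regex. `has_valid_port` is a hypothetical helper, not part of validators.

```python
from urllib.parse import urlsplit


def has_valid_port(url):
    """Return True if the URL has no port or a port in the range 1-65535."""
    try:
        # .port raises ValueError for non-numeric or out-of-range ports
        port = urlsplit(url).port
    except ValueError:
        return False
    return port is None or 1 <= port <= 65535
```

With this approach `http://google.com:9/test` and `http://google.com:80/test` are treated the same way, since the check is numeric rather than based on length.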

IPv4 formatted IP address returning True on ipv6

Looks like 0.12.0 introduced support for IPv4-compatible and IPv4-mapped addresses with #56.

But it looks like any IPv4 address will now return True on IPv6, regardless of whether it has been prefixed to be IPv6 compatible.

>>> import validators
>>> validators.ipv6('::192.0.2.128')
True
>>> validators.ipv6('192.0.2.128')
True
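
For comparison, the standard library's `ipaddress` module draws the distinction the report expects. A small sketch (`is_ipv6` is a hypothetical helper, not the library's implementation):

```python
import ipaddress


def is_ipv6(value):
    """Return True only for strings that parse as an IPv6 address."""
    try:
        ipaddress.IPv6Address(value)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True
```

Here `'::192.0.2.128'` is accepted because it carries the IPv6 prefix, while the bare `'192.0.2.128'` is rejected.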

Unicode character problem

I have an encoded domain like "xn----gtbspbbmkef.xn--p1ai".
It is a valid domain, but validators returns ValidationFailure for its decoded form:

a = idna.decode('xn----gtbspbbmkef.xn--p1ai')
validators.domain(a)
ValidationFailure(func=domain, args={'value': 'доктор-ост.рф'})
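
A possible workaround, sketched under the assumption that the validator only understands ASCII labels, is to convert the Unicode form back to its A-label form before validating. This uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; `to_a_label` is a hypothetical helper.

```python
def to_a_label(domain):
    """Encode a Unicode domain name to its ASCII (punycode) form."""
    return domain.encode("idna").decode("ascii")


print(to_a_label("доктор-ост.рф"))
```

Plain ASCII names pass through unchanged, so the helper can be applied to every input before it reaches the validator.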

Another problem: subdomains containing underscores

validators.domain('victor_caffarena.tripod.com')
ValidationFailure(func=domain, args={'value': 'victor_caffarena.tripod.com'})

Valid domain returns an invalid error

Given the valid domain name "ktbooks.1.v77.faidns.com", I received an invalid-domain error. A label between two dots may consist of a single digit, so the pattern could be changed like this:

import re

pattern = re.compile(
    r'^(?:(([a-zA-Z0-9]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|'  # domain pt.1
    r'([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|'  # domain pt.2
    r'([a-zA-Z0-9][-_a-zA-Z0-9]{0,61}[a-zA-Z0-9]))\.)+'  # domain pt.3, then a literal dot
    r'([a-zA-Z]{2,13}|(xn--[a-zA-Z0-9]{2,30}))$'  # TLD
)

validate if a URL is an image

I have a feature request: validate a URL if it's an image or not.

Rationale: I'm writing a scraper that extracts all links. From this list I want to filter out just the image links. I think it could be done simply by analyzing the URL string: if it contains .jpg or .png or .gif then it's an image.

Edit: here is my current solution

IMAGE_EXTENSIONS = ['jpg', 'jpeg', 'png', 'gif']    # can be extended

def is_image(path):
    """
    Path can be a URL or a local file. Decide if it's an image or not.
    """
    path = path.lower()
    for x in IMAGE_EXTENSIONS:
        if x in path:
            return True
    #
    return False
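
A substring test like the one above also matches names such as `png-guide.html`. A slightly stricter sketch, offered as one possible variant, compares only the file extension of the path component. It assumes URLs or POSIX-style local paths.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}  # can be extended


def is_image(path):
    """Decide by file extension whether a URL or local path is an image."""
    # urlsplit drops the query string and fragment; for a plain local
    # path the whole string ends up in .path unchanged.
    suffix = PurePosixPath(urlsplit(path).path).suffix
    return suffix.lower() in IMAGE_EXTENSIONS
```

This is still only a guess based on the name. It says nothing about what the server actually returns.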

Domain validator

I would love to see a domain validator. It could probably be split out from the e-mail validator.

Something along the lines of:

^(?!\-)(?:[a-zA-Z\d\-]{0,62}[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$

domain validator validates domain with double dots.

$ python
Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators

>>> print validators.__version__
0.11.2
>>> validators.domain("ietf..org")
ValidationFailure(func=domain, args={'value': 'ietf..org'})
>>> validators.domain("www..ietf.org")
True
>>> 
$ host www..ietf.org
host: 'www..ietf.org' is not a legal name (unexpected end of input)
$ dig www..ietf.org
dig: 'www..ietf.org' is not a legal name (unexpected end of input)

Double dots, which seem to be illegal in DNS names [1], are not rejected unless they are between the last two labels.

[1] RFC 1912:

DNS domain names consist of "labels" separated by single dots.

fresh installation fails

Important: Could this be fixed and released in a patch version so that our users don't run into this issue? We have a FOSS Asia stall where we expect a lot of new users.

If you like, I can submit a patch.

Hi!
I'm a maintainer of coala - https://github.com/coala-analyzer/coala and we recently started using your package in coala, because it's awesome.

We just did a release yesterday and when doing some basic testing, we found a bug in your package.

You use setuptools in setup.py but don't depend on it in install_requires. Hence, if I don't have setuptools already installed, I'm unable to install your package.

$ python -c "import setuptools"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'setuptools'
$ pip install validators
Collecting validators
  Using cached validators-0.10.tar.gz
Could not import setuptools which is required to install from a source distribution.
Please install setuptools.
You are using pip version 8.0.2, however version 8.1.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

At coala we normally add setuptools>=19.2 to our install_requires and it works fine.
For example, this is what installing pyprint (one of our tools) without setuptools does:

$ pip install PyPrint
Collecting PyPrint
  Using cached PyPrint-0.2.3-py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): termcolor~=1.1.0 in /home/ajk/.pyenv/versions/3.5.1/lib/python3.5/site-packages (from PyPrint)
Requirement already satisfied (use --upgrade to upgrade): colorama~=0.3.6 in /home/ajk/.pyenv/versions/3.5.1/lib/python3.5/site-packages (from PyPrint)
Collecting setuptools>=19.2 (from PyPrint)
  Using cached setuptools-20.3-py2.py3-none-any.whl
Installing collected packages: setuptools, PyPrint
Successfully installed PyPrint-0.2.3 setuptools-20.3
You are using pip version 8.0.2, however version 8.1.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

You can check out pyprint at https://github.com/coala-analyzer/pyprint

Url validator mishandles upper case and repeated hyphens.

I claim both of these should work:

Case-insensitivity in hostnames:

>>> validators.url('http://www.GOOGLE.com/')
ValidationFailure(func=url, args={'public': False, 'value': 'http://www.GOOGLE.com/'})

Ref: https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names

Multiple sequential hyphens:

>>> validators.url('http://xn--bcher-kva.ch/')
ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--bcher-kva.ch/'})

Ref: https://en.wikipedia.org/wiki/Internationalized_domain_name#Example_of_IDNA_encoding
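
Both cases can be covered by a case-insensitive pattern whose labels allow runs of hyphens in the middle. A rough sketch, not the library's actual regex (`HOSTNAME` and `is_hostname` are hypothetical names, and the TLD part is intentionally loose):

```python
import re

# Each label: alphanumeric at both ends, any mix of letters, digits and
# hyphens in between (so "xn--..." and "a--b" are fine). re.IGNORECASE
# makes upper-case hostnames match as well.
HOSTNAME = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9-]{2,63}",
    re.IGNORECASE,
)


def is_hostname(value):
    """Return True if value looks like a dotted hostname."""
    return HOSTNAME.fullmatch(value) is not None
```

Under this pattern `www.GOOGLE.com` and `xn--bcher-kva.ch` are both accepted, while a label that starts or ends with a hyphen is still rejected.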

validator.url('http://127.0.0.1:8080/') fails

Version: 0.10.2

Wondering if this is a new behavior? I didn't have a chance to look at the code, though..

from validators import url

if url('http://127.0.0.1:8080/'):
    print("success")
else:
    print("failure")
failure

URL validation arbitrarily supports FTP, but not other protocols

A URL can have any of an arbitrary number of protocols, but only HTTP, HTTPS and FTP are supported. If we have to arbitrarily restrict which protocols are allowed, we should at least follow the principle of least surprise and only allow HTTP(S). FTP is a weird addition.

Alternatively, supported protocols could be supplied.
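
To make the suggestion concrete, a caller-supplied scheme list could look roughly like this. `has_allowed_scheme` and its `schemes` parameter are hypothetical and not an existing validators option.

```python
from urllib.parse import urlsplit


def has_allowed_scheme(url, schemes=("http", "https")):
    """Return True if the URL's scheme is one of the allowed schemes."""
    # urlsplit normalises the scheme to lower case.
    return urlsplit(url).scheme in schemes
```

Defaulting to HTTP(S) follows the principle of least surprise, and callers who need FTP can opt in with `schemes=("http", "https", "ftp")`.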

Test for 3.7 is failing to start

The test for Python 3.7 does not even manage to download the Python interpreter on Travis.

I think this is because Python 3.7 needs a workaround with sudo: true on Travis to work properly. I can create a PR for this if you want help.

domain('foo.bar/baz') validates as 'True'

# bin/pip3 show validators
Name: validators
Version: 0.12.0
...
# bin/python3
Python 3.6.1 (default, Apr  4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from validators import domain
>>> domain('foo.bar.baz')
True
>>> domain('foo/bar.baz')
ValidationFailure(func=domain, args={'value': 'foo/bar.baz'})
>>> domain('foo.bar/baz')
True
>>> domain('foo.bar/')
ValidationFailure(func=domain, args={'value': 'foo.bar/'})
>>>

Using Deprecated function in inspect

/usr/local/lib/python3.5/dist-packages/validators/utils.py:43: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
inspect.getargspec(func)[0],
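
A sketch of how the argument lookup might be written with `inspect.signature`. The body of `func_args_as_dict` below is an assumption about what the helper is meant to do, not the library's code, and `email` is a stand-in function.

```python
import inspect


def func_args_as_dict(func, args, kwargs):
    """Map positional and keyword arguments to their parameter names."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # include parameters the caller left out
    return dict(bound.arguments)


def email(value, whitelist=None):  # stand-in with a validator-like signature
    return "@" in value


print(func_args_as_dict(email, ("user@example.com",), {}))
```

`inspect.signature` is available from Python 3.3 and `apply_defaults` from 3.5, so code that must still run on Python 2 would need a fallback.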

Valid domain name taken as invalid

The following domain name is valid, but validators treats it as invalid:

>>> validators.domain("0-1-0-0-1-0-0-0-1-0-1-1-0-1-1-1-1-0-1-1-1-0-0-0-1-1-1-1-1-1-1-.0-0-0-0-0-0-0-0-0-0-0-0-0-60-0-0-0-0-0-0-0-0-0-0-0-0-0.info")
ValidationFailure(func=domain, args={'value': '0-1-0-0-1-0-0-0-1-0-1-1-0-1-1-1-1-0-1-1-1-0-0-0-1-1-1-1-1-1-1-.0-0-0-0-0-0-0-0-0-0-0-0-0-60-0-0-0-0-0-0-0-0-0-0-0-0-0.info'})

Methods using regex ignore a trailing \n.

I think it would be better to change the end anchor in those regular expressions from $ to \Z.

$ pip freeze | grep validators
validators==0.12.2
$ python
Python 3.6.5 (default, Jun 16 2018, 01:20:19)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> validators.domain("""example.com
... """)
True
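
The difference between the two anchors can be reproduced with a toy pattern (the pattern is illustrative, not the library's domain regex):

```python
import re

value = "example.com\n"

# `$` also matches just before a trailing newline, so this succeeds.
loose = re.match(r"^[a-z.]+$", value)

# `\Z` matches only at the very end of the string, so this fails.
strict = re.match(r"^[a-z.]+\Z", value)

# re.fullmatch requires the whole string to match, with no anchors needed.
full = re.fullmatch(r"[a-z.]+", value)

print(loose is not None, strict is not None, full is not None)
# prints: True False False
```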

URL without http

It would be nice to have an option to validate a URL without http(s) at the beginning.

Valid URLs detected as invalid

These two URLs should be valid, but validators doesn't like the double hyphen (--), and it doesn't like the domain name xn--p1ai either.

>>> validators.url("http://pharma--partners.com/bfayz/shit.exe")
ValidationFailure(func=url, args={'public': False, 'value': 'http://pharma--partners.com/bfayz/shit.exe'})
>>> validators.url("http://xn--k1acdflk8dk.xn--p1ai/daa4wb/")
ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--k1acdflk8dk.xn--p1ai/daa4wb/'})

validators.url fails any URL whose FQDN includes consecutive hyphens (e.g. IDNA A-labels)

As the title implies, validators.url chokes on URLs that contain a domain, hostname, or TLD with two or more consecutive hyphens. The issue is most troublesome when it involves URLs containing valid IDNs in A-label form:

In [1]: import validators
In [2]: validators.url('http://xn--j1ail.xn--p1ai')
Out[2]: ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--j1ail.xn--p1ai'})

This failure is caused by the fact that the regex for validators.url only allows for repetition of hyphens as part of larger groups within the host and domain name sections. These groups must begin with a non-hyphen character, thus preventing sequential hyphens. For the TLD section no such group even exists; hyphens aren't permitted at all. The relevant portion of the regex is found on lines 36-41 of url.py:

# host name
u"(?:(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
# domain name
u"(?:\.(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)*"
# TLD identifier
u"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"

The issue also occurs when processing URLs of valid domains that have consecutive hyphens in their name. While such domain names are less common and may be frowned upon by certain registries, they are still technically valid according to the RFC. Here are the dig and whois results for one such domain:

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @8.8.8.8 online--trading.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31443
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;online--trading.com.		IN	A

;; ANSWER SECTION:
online--trading.com.	899	IN	A	195.110.124.133

;; Query time: 167 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Apr 03 15:03:25 PDT 2018
;; MSG SIZE  rcvd: 64
Domain Name: ONLINE--TRADING.COM
Registry Domain ID: 2171387112_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.register.it
Registrar URL: http://www.register.it
Updated Date: 2017-10-06T18:54:58Z
Creation Date: 2017-10-06T18:54:58Z
Registry Expiry Date: 2018-10-06T18:54:58Z
Registrar: Register.it SPA
Registrar IANA ID: 168
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +39.5520021555
Domain Status: ok https://icann.org/epp#ok
Name Server: NS1.REGISTER.IT
Name Server: NS2.REGISTER.IT
DNSSEC: unsigned
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/

It's arguable whether domains like this should pass validators.url since they're somewhat of an edge case for everyday users. It may not be worth letting potentially erroneous URLs through just to prevent a few oddball domains from failing validation. The IDNA A-labels are a different story though -- those should absolutely pass without requiring the user to convert them beforehand. Python's built-in IDNA decoder cannot properly convert IDNA domains that are contained within URLs, so it's fairly onerous to expect the user to do that before using validators.url.

Modifying the regex to match anything that follows the IDNA A-label format is not an ideal solution since invalid A-labels can be generated using valid characters (e.g. "xn--aaaa"). Since the existing regex already checks for the Unicode characters used by IDNA U-labels, I think the ideal solution would be to isolate and convert possible IDNA hostnames before reassembling the URL and matching it against the existing regex. I've made a version of url.py that should make this fairly painless; expect my PR shortly.
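
The isolate-convert-reassemble idea could look something like the following. This is a sketch using only the standard library, not the contents of the announced PR. `hostname_to_unicode` is a hypothetical name, and Python's built-in `idna` codec follows the older IDNA 2003 rules.

```python
from urllib.parse import urlsplit, urlunsplit


def hostname_to_unicode(url):
    """Decode punycode labels in the URL's hostname, leaving the rest as is."""
    parts = urlsplit(url)
    host = parts.hostname  # lower-cased by urlsplit; None if there is no host
    if host is None:
        return url
    try:
        decoded = host.encode("ascii").decode("idna")
    except UnicodeError:  # non-ASCII input, or a label the codec rejects
        return url
    # Swap only the host inside netloc so user info and port are preserved.
    netloc = parts.netloc.replace(host, decoded, 1)
    return urlunsplit(parts._replace(netloc=netloc))
```

The converted URL can then be matched against the existing regex, which already covers the Unicode range used by U-labels. One limitation of this sketch: a hostname written in upper case is not found by the `replace` call and is left untouched.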

Simplify implementations of UUID and URL validators.

A few thoughts that I think would help improve flexibility of your code:

For the UUID validator

Instead of using a pattern matcher which doesn’t accept valid strings without hyphens (e.g. "1219ffcf45c78964b04a3290cf84183a"), why not just use Python’s own uuid module?

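from uuid import UUID

from validators.utils import validator

@validator
def uuid(value):
    # Import UUID by name: this function shadows the uuid module.
    try:
        return UUID(value) is not None
    except ValueError:
        return False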

For the URL validator

Again an impressive funk of patterns 😉 Why not use the rfc3987 module, like so:

import rfc3987
from validators.utils import validator

@validator
def url(value, public=False):
    try:
        result = rfc3987.parse(value)
    except ValueError:
        return False
    # Do `public` stuff here.
    return bool(result)

Note that this module also gives you a whole heap of regexes which you could use here instead of duplicating code.

And…

To simplify the above code, you could also move the exception handling into the decorator:

def validator(func, *args, **kwargs):
    def wrapper(func, *args, **kwargs):
        try:
            value = func(*args, **kwargs)
            if not value:
                return ValidationFailure(
                    func, func_args_as_dict(func, args, kwargs)
                )
            return True
        except Exception:  # In the above two cases, ValueError.
            return ValidationFailure(func, func_args_as_dict(func, args, kwargs))
    return decorator(wrapper, func)

validators.email issue with `"`

""tavi.ivat."@yahoo.com"

In [1]: import validators
In [2]: validators.email('"john.doe."@yahoo.com')
Out[2]: True

This doesn't look right.

Improvements to URL validation

A few recommendations:

  • include re.IGNORECASE for URL validation
  • URL validation does not handle internationalized TLDs at the moment (as it only expects [a-z] in TLDs)
  • URL validation does not handle port numbers in netlocs

validators.url hang

When validating the following url, the call gets blocked and CPU spikes to 100%

$ pip freeze | grep validator
validators==0.12.0
$ python
Python 2.7.10 (default, Feb  7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> validators.url('http://172.20.201.135-10.10.10.1656172.20.11.80-10.10.10.1746172.16.9.13-192.168.17.68610.10.10.226-192.168.17.64610.10.10.226-192.168.17.63610.10.10.226-192.168.17.62610.10.10.226-192.168.17.61610.10.10.226-192.168.17.60610.10.10.226-192.168.17.59610.10.10.226-192.168.17.58610.10.10.226-192.168.17.57610.10.10.226-192.168.17.56610.10.10.226-192.168.17.55610.10.10.226-192.168.17.54610.10.10.226-192.168.17.53610.10.10.226-192.168.17.52610.10.10.226-192.168.17.51610.10.10.195-10.10.10.2610.10.10.194-192.168.17.685172.20.11.52-10.10.10.195510.10.10.226-192.168.17.50510.10.10.186-172.20.11.1510.10.10.165-198.41.0.54192.168.84.1-192.168.17.684192.168.222.1-192.168.17.684172.20.11.52-10.10.10.174410.10.10.232-172.20.201.198410.10.10.228-172.20.201.1983192.168.17.135-10.10.10.1423192.168.17.135-10.10.10.122310.10.10.224-172.20.201.198310.10.10.195-172.20.11.1310.10.10.160-172.20.201.198310.10.10.142-192.168.17.1352192.168.22.207-10.10.10.2242192.168.17.66-10.10.10.1122192.168.17.135-10.10.10.1122192.168.17.129-10.10.10.1122172.20.201.198-10.10.10.2282172.20.201.198-10.10.10.2242172.20.201.1-10.10.10.1652172.20.11.2-10.10.10.1412172.16.8.229-12.162.170.196210.10.10.212-192.168.22.133')

Control+C breaking the execution shows the following stacks:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-18>", line 2, in url
  File "/Users/librah/virtualenv/w4cs/lib/python2.7/site-packages/validators/utils.py", line 78, in wrapper
    value = func(*args, **kwargs)
  File "/Users/librah/virtualenv/w4cs/lib/python2.7/site-packages/validators/url.py", line 108, in url
    result = pattern.match(value)
KeyboardInterrupt

Domain Validator - International Domains

I think that the domain validator does not work for all cases of international domains, e.g. thepiratebay.xn--fiqs8s is a working domain but the validator says it is invalid.

I believe the following change to the regex pattern should fix the issue:

pattern = re.compile(
r'^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|'
r'([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|'
r'([a-zA-Z0-9][-_.a-zA-Z0-9]{0,61}[a-zA-Z0-9])).'
r'([a-zA-Z]{2,13}|[a-zA-Z0-9-]{2,30}.[a-zA-Z0-9]{2,3})$'
)

Cheers

URL validation fails on Russian punycode TLD

>>> url('http://президент.рф/')
ValidationFailure(func=url, args={'public': False, 'value': 'http://\xd0\xbf\xd1\x80\xd0\xb5\xd0\xb7\xd0\xb8\xd0\xb4\xd0\xb5\xd0\xbd\xd1\x82.\xd1\x80\xd1\x84/'})
>>> url('http://xn--d1abbgf6aiiy.xn--p1ai/')
ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--d1abbgf6aiiy.xn--p1ai/'})

Yet these are valid URLs, see https://en.wikipedia.org/wiki/.%D1%80%D1%84

The domain has an ASCII representation of xn--p1ai derived as Punycode

Why does msn.comm validate?

I have the following code:

import validators

url = 'msn.comm'

if (validators.domain(url)) or (validators.url(url)):
    print("VALIDATION PASSED")
else:
    print("Please insert a domain name in this format: example.com")

And msn.comm validates. Why is that?
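
One likely explanation is that the check is purely syntactic: `comm` is made of letters and has a plausible length, so it looks like a TLD. Catching such typos would need a lookup against real TLDs. A minimal sketch, assuming a hand-maintained set; `KNOWN_TLDS` and `has_known_tld` are hypothetical, and a complete list would have to come from a registry source such as IANA:

```python
KNOWN_TLDS = {"com", "org", "net", "info"}  # illustrative subset only


def has_known_tld(domain):
    """Return True if the last label is in the known-TLD set."""
    return domain.rsplit(".", 1)[-1].lower() in KNOWN_TLDS
```

Such a check would be applied in addition to `validators.domain`, since it says nothing about the rest of the name.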
