python-validators / validators

Python Data Validation for Humans™.

License: MIT License

Python 98.17% PowerShell 0.97% Shell 0.86%

validators's Introduction

validators - Python Data Validation for Humans™

Python has all kinds of data validation tools, but every one of them seems to require defining a schema or form. I wanted to create a simple validation library where validating a simple value does not require defining a form or a schema.

>>> import validators
>>> 
>>> validators.email('[email protected]')
True

Resources

Python 3.8 reaches EOL in October 2024.


validators's Issues

Incorrect E-Mail validation

Hello guys,

I think something is wrong with e-mail validation. Can we brainstorm this issue?

Python 3.6.1 (default, Apr  4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> validators.email("[email protected]")
True
>>> validators.email("[email protected]")
True
>>> validators.email("[email protected]<script>q")
True
>>> validators.email("test@testqqqmq<script>q")
ValidationFailure(func=email, args={'value': 'test@testqqqmq<script>q', 'whitelist': None})

url validator fails for malformed url when public=True

Hi,

>>> validators.url("http://10.0.0.1", public=True)
ValidationFailure(func=url, args={'value': 'http://10.0.0.1', 'public': True})

So far so good. But:

>>> validators.url("foo://10.0.0.1", public=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-13>", line 2, in url
  File "/home/vagrant/data/pyrest/flask/lib/python3.4/site-packages/validators/utils.py", line 81, in wrapper
    value = func(*args, **kwargs)
  File "/home/vagrant/data/pyrest/flask/lib/python3.4/site-packages/validators/url.py", line 95, in url
    if match_result.groupdict()['private_ip']:
AttributeError: 'NoneType' object has no attribute 'groupdict'

An easy fix would be to modify the end of url.py this way:

     match_result = pattern.match(value)

-    if match_result.groupdict()['private_ip']:
-        return False
-
-    return match_result
+    return match_result and not match_result.groupdict()['private_ip']
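The traceback comes from calling groupdict() on the None that pattern.match() returns when nothing matches. Here is a minimal sketch of the guarded check. The regex is a deliberately simplified stand-in and the is_public_url name is made up for illustration; neither is the library's actual code.

```python
import re

# Simplified stand-in for the URL pattern (assumption for illustration only):
# it accepts http(s) and flags 10.x.x.x hosts through the named group private_ip.
pattern = re.compile(
    r"^https?://(?:(?P<private_ip>10(?:\.\d{1,3}){3})|[a-z0-9.-]+)/?$"
)


def is_public_url(value):
    match_result = pattern.match(value)
    # pattern.match() returns None when nothing matches, so check that first.
    if match_result is None:
        return False
    # A non-empty private_ip group means the host is a private address.
    return not match_result.groupdict()["private_ip"]
```

With this guard, an unsupported scheme such as foo:// is simply reported as invalid instead of raising AttributeError.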

[email protected]|whatisthis

This doesn't look right (again):

In [4]: validators.email('[email protected]|whatisthis')
Out[4]: True

In [5]: validators.__version__
Out[5]: '0.10.1'

url validator fails for 'localhost'

I'm trying to use validators.url() with addresses on localhost, and it fails. For example:

In [8]: validators.url('http://localhost')
Out[8]: ValidationFailure(func=url, args={'public': False, 'value': 'http://localhost'})

Is this intentional?

Text string that causes validators.domain to lock at 100% CPU usage

The following string causes validators 0.12.2 to lock up at 100% CPU forever.

crazy_string='p.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.'

Logs:

root@db8fc13972fc:/myapp# pip list | grep validators
validators         0.12.2     
root@db8fc13972fc:/myapp# python --version
Python 2.7.15
root@db8fc13972fc:/myapp# python
Python 2.7.15 (default, May  5 2018, 03:27:20) 
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> crazy_string='p.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.'
>>> validators.domain(crazy_string)

(The validators.domain(crazy_string) never ends. If I go to htop, the python process is at 100% CPU)

With version 0.12.1 this doesn't happen.

Logs:

root@db8fc13972fc:/myapp# pip list | grep validators
validators         0.12.1     
root@db8fc13972fc:/myapp# python
Python 2.7.15 (default, May  5 2018, 03:27:20) 
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> crazy_string='p.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.'
>>> validators.domain(crazy_string)
ValidationFailure(func=domain, args={'value': 'p.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.wo.'})

The environment for tests was python:2.7-stretch Docker image from 2 days ago.

Temporary fix:
uninstall 0.12.2 and install 0.12.1

URLs with invalid characters in userinfo part are incorrectly validated

According to RFCs 3986 and 3987 (for IRIs), certain characters aren't permitted in the userinfo part (the optional username:password@ bit). Currently the URL validator doesn't check those characters, so invalid URLs such as http://example.com/">[email protected] are returned as valid.

validator fails with url containing double hyphens

validator 11.1 fails to validate a URL with double hyphens, for example:

http://my--domain.com

However, there are many sites that use double dashes in their domain names.

Thanks for the python module -- I use it regularly.

validators.domain('a......b.com') returns True

I guess this is an error.

domain validator fails for äöü

If a domain contains German letters like ä, ö or ü, which would be valid, the validator fails.

URL validator fails with one digit port

URL validator fails when validating urls with one digit port

>>> validators.url("http://google.com/test")
True
>>> validators.url("http://google.com:80/test")
True
>>> validators.url("http://google.com:10/test")
True
>>> validators.url("http://google.com:9/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:9/test'})
>>> validators.url("http://google.com:8/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:8/test'})
>>> validators.url("http://google.com:7/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:7/test'})
>>> validators.url("http://google.com:2/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:2/test'})
>>> validators.url("http://google.com:1/test")
ValidationFailure(func=url, args={'public': False, 'value': 'http://google.com:1/test'})

I think the problem is in this line, which forces the port number to have 2 to 5 digits:
https://github.com/kvesteri/validators/blob/8cf1e8fb5ed3af3d428b0230c50d63d55dd0939a/validators/url.py#L45
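To illustrate the effect of the quantifier, here is a small sketch with two hypothetical port fragments. They are not copied from url.py; they only show how {2,5} rejects a one-digit port while {1,5} accepts it.

```python
import re

# Hypothetical port fragments, written only to show the effect of the quantifier.
port_two_to_five = re.compile(r"^:\d{2,5}$")
port_one_to_five = re.compile(r"^:\d{1,5}$")


def port_matches(pattern, port):
    """Return True when the ':<digits>' fragment matches the given pattern."""
    return pattern.match(port) is not None
```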

IPv4 formatted IP address returning True on ipv6

Looks like 0.12.0 introduces support for IPv4-compatible and IPv4-mapped addresses with #56.

But it looks like any IPv4 address will now return True on IPv6, regardless of whether it has been prefixed to be IPv6 compatible.

>>> import validators
>>> validators.ipv6('::192.0.2.128')
True
>>> validators.ipv6('192.0.2.128')
True
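For comparison, the standard library's ipaddress module draws the line where I would expect it. This is only a cross-check sketch and not how validators.ipv6 is implemented; the is_ipv6 helper name is my own.

```python
import ipaddress


def is_ipv6(value):
    """Return True only for strings that the stdlib parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(value)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True
```

A bare dotted quad is rejected, while the same address with a :: or ::ffff: prefix is accepted.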

validators.url Internationalisation validation fails

Please refer to some internationalisation examples:
https://mathiasbynens.be/demo/url-regex

Underscores are also part of the internationalisation process for hostnames (not just domain names).

For example, this is a valid hostname:

http://adobe_photoshop.es.downloadastro.com/

Unicode character problem

I have an encoded domain like "xn----gtbspbbmkef.xn--p1ai".
It is a valid domain, but validators returns ValidationFailure.

a = idna.decode('xn----gtbspbbmkef.xn--p1ai')
validators.domain(a)
ValidationFailure(func=domain, args={'value': 'доктор-ост.рф'})

Another problem: underscores in subdomains

validators.domain('victor_caffarena.tripod.com')
ValidationFailure(func=domain, args={'value': 'victor_caffarena.tripod.com'})

valid domain returns an invalid error

When I input the valid domain name "ktbooks.1.v77.faidns.com", I received an invalid domain error. A domain label can consist of a single digit between two dots. So the pattern could be changed like this:

import re

pattern = re.compile(
    r'^(?:(([a-zA-Z0-9]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|'  # domain pt.1
    r'([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|'  # domain pt.2
    r'([a-zA-Z0-9][-_a-zA-Z0-9]{0,61}[a-zA-Z0-9]))\.)+'  # domain pt.3
    r'([a-zA-Z]{2,13}|(xn--[a-zA-Z0-9]{2,30}))$'  # TLD
)

invalid email addresses but validators return True

>>> import validators
>>> print validators.email('うえあいお@email.com')
True

If you check うえあいお@address.com with an email checker, it is reported as an invalid email address.

Thanks,
Ramin

validate if a URL is an image

I have a feature request: validate a URL if it's an image or not.

Rationale: I'm writing a scraper that extracts all links. From this list I want to filter out just the image links. I think it could be done simply by analyzing the URL string: if it contains .jpg or .png or .gif then it's an image.

Edit: here is my current solution

IMAGE_EXTENSIONS = ['jpg', 'jpeg', 'png', 'gif']    # can be extended

def is_image(path):
    """
    Path can be a URL or a local file. Decide if it's an image or not.
    """
    path = path.lower()
    for x in IMAGE_EXTENSIONS:
        if x in path:
            return True
    #
    return False
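One caveat with the substring test above: it also fires when "png" or "gif" appears anywhere else in the URL. A slightly stricter sketch, assuming it is enough to look at the extension of the path component (the has_image_extension name is hypothetical):

```python
import os
from urllib.parse import urlsplit

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}  # can be extended


def has_image_extension(path):
    """Decide by file extension whether a URL or local path looks like an image."""
    # urlsplit drops the query string and fragment; for a plain local path
    # the whole string ends up in .path.
    url_path = urlsplit(path).path
    extension = os.path.splitext(url_path)[1].lower()
    return extension in IMAGE_EXTENSIONS
```

This still only inspects the string. It says nothing about what the server actually returns for that URL.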

A false positive URL

'http://w..com' should be false but it returns true.

Domain validator

I would love to see a domain validator. It could probably be split out from the e-mail validator.

Something along the lines of:

^(?!\-)(?:[a-zA-Z\d\-]{0,62}[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$

Allows commas in domain validator

I receive a True response when pushing in a domain that has a comma.

EG: jgp.com,br

domain validator validates domain with double dots.

$ python
Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators

>>> print validators.__version__
0.11.2
>>> validators.domain("ietf..org")
ValidationFailure(func=domain, args={'value': 'ietf..org'})
>>> validators.domain("www..ietf.org")
True
>>> 
$ host www..ietf.org
host: 'www..ietf.org' is not a legal name (unexpected end of input)
$ dig www..ietf.org
dig: 'www..ietf.org' is not a legal name (unexpected end of input)

Double dots, which seem to be illegal in dns names [1], are not rejected unless they are between the last two labels.

[1] RFC 1912:

DNS domain names consist of "labels" separated by single dots.
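Since labels are separated by single dots, a double dot always produces an empty label, which is cheap to detect before any regex runs. A rough sketch of such a pre-check follows. The has_empty_label helper is hypothetical, and tolerating one trailing root dot is my assumption.

```python
def has_empty_label(name):
    """Return True when splitting on dots yields an empty label (e.g. 'www..ietf.org')."""
    # A single trailing dot denotes the DNS root and is tolerated here (assumption).
    if name.endswith("."):
        name = name[:-1]
    return any(label == "" for label in name.split("."))
```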

feature request: validators.uri()

it may be worth adding a separate method, validators.uri()

https://en.wikipedia.org/wiki/Uniform_Resource_Identifier

validators.url() could then be validators.uri.url() (while retaining validators.url() for backwards compatibility).

this would also allow file paths to be validated, e.g.

validators.uri.file('file:///tmp/file.txt')

IRC connections:

validators.uri.irc('irc://irc.freenode.org:6667/#somechannel')

etc.

For official specification, see RFC3986 § 3.

Support for mailto: and tel: protocols ?

Not sure if these protocols are in the URL RFC, but we're tripping over this at the moment...

url validator fails on 1st and last IP address

ValueError: "http://x.x.x.255:83/" is not a valid url

This is a valid URL. Networks are not always /24.
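To make the point concrete, here is a sketch using the standard library's ipaddress module with an example /23 network chosen purely for illustration. In such a network, addresses ending in .255 or .0 can be ordinary hosts.

```python
import ipaddress

# Example network chosen for illustration: 10.0.0.0/23 spans 10.0.0.0 - 10.0.1.255.
network = ipaddress.ip_network("10.0.0.0/23")
hosts = set(network.hosts())  # excludes only the network and broadcast addresses


def is_host_address(value):
    """Return True when the address is a usable host inside the example network."""
    return ipaddress.ip_address(value) in hosts
```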

validators.url() fails for alternate ports

validators.url() fails if an alternate port is specified, e.g.

validators.url('http://sub.domain.tld:8080/index.php')

fresh installation fails

Important: Could this be fixed and released in a patch version so that our users don't run into this issue? We have a FOSS Asia stall where we expect a lot of new users.

If you like, I can submit a patch.

Hi!
I'm a maintainer of coala - https://github.com/coala-analyzer/coala and we recently started using your package in coala, because it's awesome.

We just did a release yesterday and when doing some basic testing, we found a bug in your package.

You use setuptools in setup.py but don't depend on it in install_requires. Hence, if I don't have setuptools already installed, I'm unable to install your package.

$ python -c "import setuptools"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'setuptools'
$ pip install validators
Collecting validators
  Using cached validators-0.10.tar.gz
Could not import setuptools which is required to install from a source distribution.
Please install setuptools.
You are using pip version 8.0.2, however version 8.1.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

At coala we normally add setuptools>=19.2 to our install_requires and it works fine.
For example, this is what installing pyprint (one of our tools) without setuptools does:

$ pip install PyPrint
Collecting PyPrint
  Using cached PyPrint-0.2.3-py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): termcolor~=1.1.0 in /home/ajk/.pyenv/versions/3.5.1/lib/python3.5/site-packages (from PyPrint)
Requirement already satisfied (use --upgrade to upgrade): colorama~=0.3.6 in /home/ajk/.pyenv/versions/3.5.1/lib/python3.5/site-packages (from PyPrint)
Collecting setuptools>=19.2 (from PyPrint)
  Using cached setuptools-20.3-py2.py3-none-any.whl
Installing collected packages: setuptools, PyPrint
Successfully installed PyPrint-0.2.3 setuptools-20.3
You are using pip version 8.0.2, however version 8.1.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

You can check out pyprint at https://github.com/coala-analyzer/pyprint

Url validator mishandles upper case and repeated hyphens.

I claim both of these should work:

Case-insensitivity in hostnames:

>>> validators.url('http://www.GOOGLE.com/')
ValidationFailure(func=url, args={'public': False, 'value': 'http://www.GOOGLE.com/'})

Ref: https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names

Multiple sequential hyphens:

>>> validators.url('http://xn--bcher-kva.ch/')
ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--bcher-kva.ch/'})

Ref: https://en.wikipedia.org/wiki/Internationalized_domain_name#Example_of_IDNA_encoding

validator.url('http://127.0.0.1:8080/') fails

Version: 0.10.2

Wondering if this is a new behavior? I didn't have a chance to look at the code, though..

from validators import url

if url('http://127.0.0.1:8080/'):
    print("success")
else:
    print("failure")

failure

URL validation arbitrarily supports FTP, but not other protocols

A URL can have any of an arbitrary number of protocols, but only HTTP, HTTPS and FTP are supported. If we have to arbitrarily restrict which protocols are allowed, we should at least follow the principle of least surprise and only allow HTTP(S). FTP is a weird addition.

Alternatively, supported protocols could be supplied.
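A rough sketch of what a caller-supplied protocol list could look like. The has_allowed_scheme helper and its schemes parameter are hypothetical and do not exist in validators; the check only looks at the scheme and would sit in front of the existing validation.

```python
from urllib.parse import urlsplit


def has_allowed_scheme(value, schemes=("http", "https")):
    """Return True when the URL's scheme is one of the caller-supplied schemes."""
    # urlsplit lowercases the scheme, so the comparison is case-insensitive.
    return urlsplit(value).scheme in schemes
```

The default keeps the least surprising behaviour (HTTP and HTTPS only), and anyone who needs FTP can opt in explicitly.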

Test for 3.7 is failing to start

The test for Python 3.7 does not even manage to download the Python interpreter on Travis.

I think this is because Python 3.7 requires a workaround with sudo: true on Travis to work properly. I can create a PR for this if you want help.

Magnet validator

It would be nice to have a magnet link validator.

Some examples in: https://es.wikipedia.org/wiki/Magnet

IPv6 unspecified address is not accepted

:: is the IPv6 equivalent of 0.0.0.0 and is used to specify which IPv6 address a server is bound to.

The validator should accept that address.

cf. https://tools.ietf.org/html/rfc4291#section-2.5.2

domain('foo.bar/baz') validates as 'True'

# bin/pip3 show validators
Name: validators
Version: 0.12.0
...
# bin/python3
Python 3.6.1 (default, Apr  4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from validators import domain
>>> domain('foo.bar.baz')
True
>>> domain('foo/bar.baz')
ValidationFailure(func=domain, args={'value': 'foo/bar.baz'})
>>> domain('foo.bar/baz')
True
>>> domain('foo.bar/')
ValidationFailure(func=domain, args={'value': 'foo.bar/'})
>>>

Using Deprecated function in inspect

/usr/local/lib/python3.5/dist-packages/validators/utils.py:43: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
inspect.getargspec(func)[0],
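A possible replacement based on inspect.signature(), as the warning suggests. The helper name func_args_as_dict is borrowed from a snippet in another issue on this page; the body below is my own sketch and not the code in utils.py.

```python
import inspect


def func_args_as_dict(func, args, kwargs):
    """Map positional and keyword arguments to parameter names, defaults included."""
    # inspect.signature() replaces the deprecated inspect.getargspec().
    bound = inspect.signature(func).bind(*args, **kwargs)
    # Fill in parameters the caller did not pass (e.g. public=False).
    bound.apply_defaults()
    return dict(bound.arguments)
```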

Valid domain name taken as invalid

The following domain name is valid, but validators treats it as invalid:

>>> validators.domain("0-1-0-0-1-0-0-0-1-0-1-1-0-1-1-1-1-0-1-1-1-0-0-0-1-1-1-1-1-1-1-.0-0-0-0-0-0-0-0-0-0-0-0-0-60-0-0-0-0-0-0-0-0-0-0-0-0-0.info")
ValidationFailure(func=domain, args={'value': '0-1-0-0-1-0-0-0-1-0-1-1-0-1-1-1-1-0-1-1-1-0-0-0-1-1-1-1-1-1-1-.0-0-0-0-0-0-0-0-0-0-0-0-0-60-0-0-0-0-0-0-0-0-0-0-0-0-0.info'})

Methods using regex ignore the last \n .

I think it would be better to change those regular expressions from $ to \Z.

$ pip freeze | grep validators
validators==0.12.2
$ python
Python 3.6.5 (default, Jun 16 2018, 01:20:19)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> validators.domain("""example.com
... """)
True
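The difference is easy to reproduce with plain re. In Python, $ also matches just before a trailing newline, while \Z matches only at the very end of the string. The two patterns below are simplified stand-ins and not the library's domain regex.

```python
import re

# Simplified stand-in patterns (not the library's regex); only the end anchor differs.
dollar = re.compile(r"^[a-z0-9.-]+$")
end_of_string = re.compile(r"^[a-z0-9.-]+\Z")


def accepts(pattern, value):
    """Return True when the pattern matches the whole value from the start."""
    return pattern.match(value) is not None
```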

url without http

It would be nice if there were an option to validate a URL without http(s) at the beginning.

Valid URLs detected as invalid

These two URLs should be valid, but validators doesn't like the double hyphen -- and it doesn't like the domain name xn--p1ai either.

>>> validators.url("http://pharma--partners.com/bfayz/shit.exe")
ValidationFailure(func=url, args={'public': False, 'value': 'http://pharma--partners.com/bfayz/shit.exe'})
>>> validators.url("http://xn--k1acdflk8dk.xn--p1ai/daa4wb/")
ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--k1acdflk8dk.xn--p1ai/daa4wb/'})

ValidationFailure on real domains with accent characters

I have a known legitimate domain that contains the German accented character ä. This domain is registered, has a public whois record, etc.

Running it through validators.domain(target) returns a ValidationFailure.

Apparently such domains are valid since 2003: https://www.iis.se/english/domains/se/idn/

Is there an epoch validator?

validators.url fails any URL whose FQDN includes consecutive hyphens (e.g. IDNA A-labels)

As the title implies, validators.url chokes on URLs that contain a domain, hostname, or TLD with two or more consecutive hyphens. The issue is most troublesome when it involves URLs containing valid IDNs in A-label form:

In [1]: import validators
In [2]: validators.url('http://xn--j1ail.xn--p1ai')
Out[2]: ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--j1ail.xn--p1ai'})

This failure is caused by the fact that the regex for validators.url only allows for repetition of hyphens as part of larger groups within the host and domain name sections. These groups must begin with a non-hyphen character, thus preventing sequential hyphens. For the TLD section no such group even exists; hyphens aren't permitted at all. The relevant portion of the regex is found on lines 36-41 of url.py:

# host name
u"(?:(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
# domain name
u"(?:\.(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)*"
# TLD identifier
u"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"

The issue also occurs when processing URLs of valid domains that have consecutive hyphens in their name. While such domain names are less common and may be frowned upon by certain registries, they are still technically valid according to the RFC. Here are the dig and whois results for one such domain:

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @8.8.8.8 online--trading.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31443
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;online--trading.com.		IN	A

;; ANSWER SECTION:
online--trading.com.	899	IN	A	195.110.124.133

;; Query time: 167 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Apr 03 15:03:25 PDT 2018
;; MSG SIZE  rcvd: 64

Domain Name: ONLINE--TRADING.COM
Registry Domain ID: 2171387112_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.register.it
Registrar URL: http://www.register.it
Updated Date: 2017-10-06T18:54:58Z
Creation Date: 2017-10-06T18:54:58Z
Registry Expiry Date: 2018-10-06T18:54:58Z
Registrar: Register.it SPA
Registrar IANA ID: 168
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +39.5520021555
Domain Status: ok https://icann.org/epp#ok
Name Server: NS1.REGISTER.IT
Name Server: NS2.REGISTER.IT
DNSSEC: unsigned
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/

It's arguable whether domains like this should pass validators.url since they're somewhat of an edge case for everyday users. It may not be worth letting potentially erroneous URLs through just to prevent a few oddball domains from failing validation. The IDNA A-labels are a different story though -- those should absolutely pass without requiring the user to convert them beforehand. Python's built-in IDNA decoder cannot properly convert IDNA domains that are contained within URLs, so it's fairly onerous to expect the user to do that before using validators.url.

Modifying the regex to match anything that follows the IDNA A-label format is not an ideal solution since invalid A-labels can be generated using valid characters (e.g. "xn--aaaa"). Since the existing regex already checks for the Unicode characters used by IDNA U-labels, I think the ideal solution would be to isolate and convert possible IDNA hostnames before reassembling the URL and matching it against the existing regex. I've made a version of url.py that should make this fairly painless; expect my PR shortly.
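To make the "isolate and convert" idea concrete, here is a rough sketch built on the standard library only. It is not the version of url.py mentioned above. The decode_idna_host name is made up, and leaving IPv6 literals and undecodable hosts untouched is my assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def decode_idna_host(url):
    """Return the URL with an A-label hostname converted to its Unicode form."""
    parts = urlsplit(url)
    # Separate optional userinfo and port so that only the hostname is decoded.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if hostport.startswith("["):
        return url  # IPv6 literal: nothing to decode
    host, colon, port = hostport.partition(":")
    try:
        decoded = host.encode("ascii").decode("idna")
    except UnicodeError:
        return url  # hostname is not pure ASCII or is not a decodable A-label
    netloc = userinfo + at + decoded + colon + port
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The reassembled URL can then be matched against the existing regex, which already allows the Unicode range used by U-labels.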

Simplify implementations of UUID and URL validators.

A few thoughts that I think would help improve flexibility of your code:

For the UUID validator

Instead of using a pattern matcher which doesn’t accept valid strings without hyphens (e.g. "1219ffcf45c78964b04a3290cf84183a"), why not just use Python’s own uuid module?

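import uuid as _uuid  # aliased so the validator below does not shadow the module

from validators.utils import validator

@validator
def uuid(value):
    try:
        return _uuid.UUID(value) is not None
    except ValueError:
        return False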

For the URL validator

Again an impressive funk of patterns 😉 Why not use the rfc3987 module, like so:

@validator
def url(value, public=False):
    try:
        result = rfc3987.parse(value)
        # Do `public` stuff here.
    except ValueError:
        return False

Note that this module also gives you a whole heap of regexes which you could use here instead of duplicating code.

And…

To simplify the above code, you could also move the exception handling into the decorator:

def validator(func, *args, **kwargs):
    def wrapper(func, *args, **kwargs):
        try:
            value = func(*args, **kwargs)
            if not value:
                return ValidationFailure(
                    func, func_args_as_dict(func, args, kwargs)
                )
            return True
        except Exception:  # In the above two cases, ValueError.
            return ValidationFailure(func, func_args_as_dict(func, args, kwargs))
    return decorator(wrapper, func)

validators.email issue with `"`

""tavi.ivat."@yahoo.com"

In [1]: import validators
In [2]: validators.email('"john.doe."@yahoo.com')
Out[2]: True

This doesn't look right.

Improvements to URL validation

A few recommendations:

include re.IGNORECASE for URL validation
URL validation does not handle internationalized TLDs at the moment (as it only expects [a-z] in TLDs)
URL validation does not handle port numbers in netlocs
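For the first recommendation, a small sketch of what re.IGNORECASE changes. The hostname pattern is a hypothetical stand-in that, like the current regex, only lists lowercase letters.

```python
import re

# Hypothetical hostname pattern that only lists lowercase letters.
host_pattern = r"^[a-z0-9-]+(?:\.[a-z0-9-]+)+\Z"
case_sensitive = re.compile(host_pattern)
case_insensitive = re.compile(host_pattern, re.IGNORECASE)


def host_matches(pattern, host):
    """Return True when the hostname matches the given compiled pattern."""
    return pattern.match(host) is not None
```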

validators.url hang

When validating the following url, the call gets blocked and CPU spikes to 100%

$ pip freeze | grep validator
validators==0.12.0
$ python
Python 2.7.10 (default, Feb  7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import validators
>>> validators.url('http://172.20.201.135-10.10.10.1656172.20.11.80-10.10.10.1746172.16.9.13-192.168.17.68610.10.10.226-192.168.17.64610.10.10.226-192.168.17.63610.10.10.226-192.168.17.62610.10.10.226-192.168.17.61610.10.10.226-192.168.17.60610.10.10.226-192.168.17.59610.10.10.226-192.168.17.58610.10.10.226-192.168.17.57610.10.10.226-192.168.17.56610.10.10.226-192.168.17.55610.10.10.226-192.168.17.54610.10.10.226-192.168.17.53610.10.10.226-192.168.17.52610.10.10.226-192.168.17.51610.10.10.195-10.10.10.2610.10.10.194-192.168.17.685172.20.11.52-10.10.10.195510.10.10.226-192.168.17.50510.10.10.186-172.20.11.1510.10.10.165-198.41.0.54192.168.84.1-192.168.17.684192.168.222.1-192.168.17.684172.20.11.52-10.10.10.174410.10.10.232-172.20.201.198410.10.10.228-172.20.201.1983192.168.17.135-10.10.10.1423192.168.17.135-10.10.10.122310.10.10.224-172.20.201.198310.10.10.195-172.20.11.1310.10.10.160-172.20.201.198310.10.10.142-192.168.17.1352192.168.22.207-10.10.10.2242192.168.17.66-10.10.10.1122192.168.17.135-10.10.10.1122192.168.17.129-10.10.10.1122172.20.201.198-10.10.10.2282172.20.201.198-10.10.10.2242172.20.201.1-10.10.10.1652172.20.11.2-10.10.10.1412172.16.8.229-12.162.170.196210.10.10.212-192.168.22.133')

Control+C breaking the execution shows the following stacks:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-18>", line 2, in url
  File "/Users/librah/virtualenv/w4cs/lib/python2.7/site-packages/validators/utils.py", line 78, in wrapper
    value = func(*args, **kwargs)
  File "/Users/librah/virtualenv/w4cs/lib/python2.7/site-packages/validators/url.py", line 108, in url
    result = pattern.match(value)
KeyboardInterrupt

URL validator does not match ips ending with 0

WARNING: This IP hosts malicious software

The URL 'http://5[.]196[.]190[.]0/' should return True when validated. Although IPs ending in 0 could be network addresses, that's not always the case.

Domain Validator - International Domains

I think that the domain validator does not work for all cases of international domains, e.g. thepiratebay.xn--fiqs8s is a working domain but the validator says it is invalid.

I located a change in the code that I believe should fix the issue, in the regex pattern:

pattern = re.compile(
r'^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|'
r'([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|'
r'([a-zA-Z0-9][-_.a-zA-Z0-9]{0,61}[a-zA-Z0-9])).'
r'([a-zA-Z]{2,13}|[a-zA-Z0-9-]{2,30}.[a-zA-Z0-9]{2,3})$'
)

Cheers

Deprecation warning inspect.getargspec() is deprecated

I am getting the following deprecation warning.

DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()

Debian package

Hello,

Here is my Debian package of your library:

https://github.com/VitexSoftware/python-validators.deb

Thank you for your work!

URL validation fails on Russian punycode TLD

>>> url('http://президент.рф/')
ValidationFailure(func=url, args={'public': False, 'value': 'http://\xd0\xbf\xd1\x80\xd0\xb5\xd0\xb7\xd0\xb8\xd0\xb4\xd0\xb5\xd0\xbd\xd1\x82.\xd1\x80\xd1\x84/'})
>>> url('http://xn--d1abbgf6aiiy.xn--p1ai/')
ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--d1abbgf6aiiy.xn--p1ai/'})

Yet these are valid URLs, see https://en.wikipedia.org/wiki/.%D1%80%D1%84

The domain has an ASCII representation of xn--p1ai derived as Punycode

Why msn.comm validates?

I have the following code:

import validators

url = 'msn.comm'

if (validators.domain(url)) or (validators.url(url)):
    print("VALIDATION PASSED")
else:
    print("Please insert a domain name in this format: example.com")

And the msn.comm validates. Why is that?