As the title implies, validators.url chokes on URLs that contain a domain, hostname, or TLD with two or more consecutive hyphens. The issue is most troublesome when it involves URLs containing valid IDNs in A-label form:
In [1]: import validators
In [2]: validators.url('http://xn--j1ail.xn--p1ai')
Out[2]: ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--j1ail.xn--p1ai'})
This failure is caused by the fact that the regex for validators.url only allows for repetition of hyphens as part of larger groups within the host and domain name sections. These groups must begin with a non-hyphen character, thus preventing sequential hyphens. For the TLD section no such group even exists; hyphens aren't permitted at all. The relevant portion of the regex is found on lines 36-41 of url.py:
# host name
u"(?:(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
# domain name
u"(?:\.(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)*"
# TLD identifier
u"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"
The issue also occurs when processing URLs of valid domains that have consecutive hyphens in their name. While such domain names are less common and may be frowned upon by certain registries, they are still technically valid according to the RFC. Here are the dig and whois results for one such domain:
; <<>> DiG 9.10.3-P4-Ubuntu <<>> @8.8.8.8 online--trading.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31443
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;online--trading.com. IN A
;; ANSWER SECTION:
online--trading.com. 899 IN A 195.110.124.133
;; Query time: 167 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Apr 03 15:03:25 PDT 2018
;; MSG SIZE rcvd: 64
Domain Name: ONLINE--TRADING.COM
Registry Domain ID: 2171387112_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.register.it
Registrar URL: http://www.register.it
Updated Date: 2017-10-06T18:54:58Z
Creation Date: 2017-10-06T18:54:58Z
Registry Expiry Date: 2018-10-06T18:54:58Z
Registrar: Register.it SPA
Registrar IANA ID: 168
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +39.5520021555
Domain Status: ok https://icann.org/epp#ok
Name Server: NS1.REGISTER.IT
Name Server: NS2.REGISTER.IT
DNSSEC: unsigned
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
It's arguable whether domains like this should pass validators.url since they're somewhat of an edge case for everyday users. It may not be worth letting potentially erroneous URLs through just to prevent a few oddball domains from failing validation. The IDNA A-labels are a different story though -- those should absolutely pass without requiring the user to convert them beforehand. Python's built-in IDNA decoder cannot properly convert IDNA domains that are contained within URLs, so it's fairly onerous to expect the user to do that before using validators.url.
Modifying the regex to match anything that follows the IDNA A-label format is not an ideal solution since invalid A-labels can be generated using valid characters (e.g. "xn--aaaa"). Since the existing regex already checks for the Unicode characters used by IDNA U-labels, I think the ideal solution would be to isolate and convert possible IDNA hostnames before reassembling the URL and matching it against the existing regex. I've made a version of url.py that should make this fairly painless; expect my PR shortly.