lpeg_patterns

A collection of LPeg patterns.

Use cases

  • Strict validation of user input
  • Searching free-form input

Modules

core

A small module implementing commonly used rules from RFC-5234 appendix B.1

  • ALPHA (pattern)
  • BIT (pattern)
  • CHAR (pattern)
  • CR (pattern)
  • CRLF (pattern)
  • CTL (pattern)
  • DIGIT (pattern)
  • DQUOTE (pattern)
  • HEXDIG (pattern)
  • HTAB (pattern)
  • LF (pattern)
  • LWSP (pattern)
  • OCTET (pattern)
  • SP (pattern)
  • VCHAR (pattern)
  • WSP (pattern)
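
As a rough sketch of how the core rules compose, the snippet below builds a two-hex-digit rule from `HEXDIG`. It assumes the module is loaded as `lpeg_patterns.core` (following the `lpeg_patterns.http` naming used elsewhere in this document) and that LPeg is installed; `hex_byte` is a hypothetical name used only for illustration.

```lua
local lpeg = require "lpeg"
local core = require "lpeg_patterns.core" -- assumed module path

-- Hypothetical rule built from the core patterns: exactly two hex digits,
-- anchored with lpeg.P(-1) so that trailing input makes the match fail.
local hex_byte = core.HEXDIG * core.HEXDIG * lpeg.P(-1)

print(hex_byte:match("7F")) -- position after the match when it succeeds
print(hex_byte:match("7G")) -- nil, because "G" is not a hex digit
```

The core patterns carry no captures of their own, so a successful match returns the position just past the matched text.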

IPv4

  • IPv4address (pattern): parses an IPv4 address in dotted decimal notation. On success, returns the address as an IPv4 object
  • IPv4_methods (table):
    • unpack (function): returns the IPv4 address as a series of four 8-bit numbers
    • binary (function): returns the IPv4 address as a 4-byte binary string
  • IPv4_mt (table): metatable given to IPv4 objects
    • __index (table): IPv4_methods
    • __tostring (function): returns the IPv4 address in dotted decimal notation

IPv4 "dotted decimal notation" in this document refers to "strict" form (see RFC-6943 section 3.1.1) unless otherwise noted.
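
A minimal usage sketch for the IPv4 module, assuming it is loaded as `lpeg_patterns.IPv4`. The pattern is anchored with `lpeg.P(-1)` here because an unanchored pattern may succeed on a prefix of the input; the sample addresses are illustrative.

```lua
local lpeg = require "lpeg"
local IPv4 = require "lpeg_patterns.IPv4" -- assumed module path

-- Require the whole subject to be an address, not just a prefix of it.
local anchored = IPv4.IPv4address * lpeg.P(-1)

local addr = anchored:match("192.0.2.1")
if addr then
  print(tostring(addr))  -- dotted decimal form, via the __tostring metamethod
  print(addr:unpack())   -- the four octets as numbers
  print(#addr:binary())  -- length of the binary string form
end

-- An out-of-range octet should not match the anchored pattern.
print(anchored:match("192.0.2.256"))
```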

IPv6

  • IPv6address (pattern): parses an IPv6 address
  • IPv6addrz (pattern): parses an IPv6 address with optional "ZoneID" (see RFC-6874)
  • IPv6_methods (table): methods available on IPv6 objects
    • unpack (function): returns the IPv6 address as a series of eight 16-bit numbers, optionally followed by the zoneid
    • binary (function): returns the IPv6 address as a 16-byte binary string
    • setzoneid (function): set the zoneid of this IPv6 address
  • IPv6_mt (table): metatable given to IPv6 objects
    • __tostring (function): returns the IPv6 address as a valid IPv6 string
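
The IPv6 module can be exercised in much the same way. This is a sketch that assumes the module path `lpeg_patterns.IPv6`; the exact textual form produced by `tostring` is not specified above, so the example prints it without asserting a particular spelling.

```lua
local lpeg = require "lpeg"
local IPv6 = require "lpeg_patterns.IPv6" -- assumed module path

-- Anchor the pattern so the whole subject must be an IPv6 address.
local anchored = IPv6.IPv6address * lpeg.P(-1)

local addr = anchored:match("2001:db8::1")
if addr then
  print(tostring(addr))  -- some valid textual form of the address
  print(#addr:binary())  -- length of the binary string form
  print((addr:unpack())) -- first 16-bit group only
end
```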

uri

Parses URIs as described in RFC-3986.

  • uri (pattern): on success, returns a table with the following fields (similar to LuaSocket):
    • scheme
    • userinfo
    • host
    • port
    • path
    • query
    • fragment
  • absolute_uri (pattern): similar to uri, but does not permit fragments
  • uri_reference (pattern): similar to uri, but permits relative URIs
  • relative_part (pattern): matches a relative URI not including query and fragment; data is held in named group captures "userinfo", "host", "port", "path"
  • scheme (pattern): matches the scheme portion of a URI
  • userinfo (pattern): matches the userinfo portion of a URI
  • host (pattern): matches the host portion of a URI
  • IP_literal (pattern): matches an IP based host portion of a URI. Capture is an IPv4, IPv6 or IPvFuture object
  • port (pattern): matches the port portion of a URI
  • authority (pattern): matches the authority portion of a URI; data is held in named group captures of "userinfo", "host", "port"
  • path (pattern): matches the path portion of a URI. Captures nil for the empty path.
  • segment (pattern): matches a path segment (a piece of a path without a /)
  • query (pattern): matches the query portion of a URI
  • fragment (pattern): matches the fragment portion of a URI
  • sane_uri (pattern): a variant that shouldn't match things that people would not normally consider URIs, e.g. URIs without a hostname
  • sane_host (pattern): a variant that shouldn't match things that people would not normally consider valid hosts.
  • sane_authority (pattern): a variant that shouldn't match things that people would not normally consider valid hosts.
  • pct_encoded (pattern): matches a percent encoded octet, produces a capture of the normalised form.
  • sub_delims (pattern): the set of subcomponent delimiters
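
The following sketch parses a URI and prints the fields listed above. It assumes the module path `lpeg_patterns.uri`; the sample URI is illustrative, and the exact type or normalisation of each field (for example, whether `port` is a number) is not asserted here.

```lua
local lpeg = require "lpeg"
local uri_patts = require "lpeg_patterns.uri" -- assumed module path

-- Anchor the pattern so that trailing garbage is rejected.
local anchored = uri_patts.uri * lpeg.P(-1)

local parsed = anchored:match("http://user@example.com:8080/a/b?x=1#top")
if parsed then
  local fields = { "scheme", "userinfo", "host", "port", "path", "query", "fragment" }
  for _, field in ipairs(fields) do
    print(field, parsed[field])
  end
end
```

For user-facing validation, `sane_uri` may be a better starting point than `uri`, since the full RFC-3986 grammar accepts many strings that do not look like URIs to most people.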

email

  • mailbox (pattern): the mailbox format: matches either name_addr or an addr-spec.
  • name_addr (pattern): the name and address format, i.e. Display Name<local_part@domain>. Has captures of the local_part and the domain. Captures the display name in the named capture "display".
  • email (pattern): also known as an "addr-spec"; follows RFC-5322 section 3.4.1. Has captures of the local_part and the domain. Be careful when reconstructing the email address from the captures; you may need escaping.
  • local_part (pattern): the bit before the @ in an email address
  • domain (pattern): the bit after the @ in an email address
  • email_nocfws (pattern): a variant that doesn't allow for comments or folding whitespace
  • local_part_nocfws (pattern): the bit before the @ in an email address; no comments or folding whitespace allowed.
  • domain_nocfws (pattern): the bit after the @ in an email address; no comments or folding whitespace allowed.
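
A short sketch of validating an address and reading the two captures, assuming the module path `lpeg_patterns.email`. The address is a made-up example.

```lua
local lpeg = require "lpeg"
local email_patts = require "lpeg_patterns.email" -- assumed module path

-- Anchor the pattern so the whole subject must be an addr-spec.
local anchored = email_patts.email * lpeg.P(-1)

-- The pattern yields two captures: the local part and the domain.
local local_part, domain = anchored:match("alice@example.com")
print(local_part, domain)

-- A string without an "@" should not match.
print(anchored:match("not-an-address"))
```

As noted above, joining the captures back together with "@" is not always safe, because a local part may need quoting or escaping.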

http

These patterns should be considered to have unstable APIs.

  • DAV (pattern)
  • Depth (pattern)
  • Destination (pattern)
  • If (pattern)
  • Lock_Token (pattern)
  • Overwrite (pattern)
  • TimeOut (pattern)
  • SLUG (pattern)
  • DASL (pattern)
  • Accept_Patch (pattern)
  • Link (pattern)
  • Set_Cookie (pattern)
  • Cookie (pattern)
  • Content_Disposition (pattern)
  • Origin (pattern)
  • Sec_WebSocket_Accept (pattern)
  • Sec_WebSocket_Key (pattern)
  • Sec_WebSocket_Extensions (pattern)
  • Sec_WebSocket_Protocol_Client (pattern)
  • Sec_WebSocket_Protocol_Server (pattern)
  • Sec_WebSocket_Version_Client (pattern)
  • Sec_WebSocket_Version_Server (pattern)
  • Schedule_Reply (pattern)
  • Schedule_Tag (pattern)
  • If_Schedule_Tag_Match (pattern)
  • Strict_Transport_Security (pattern)
  • X_Frame_Options (pattern)
  • Accept_Datetime (pattern)
  • Memento_Datetime (pattern)
  • request_line (pattern)
  • field_name (pattern)
  • field_value (pattern)
  • header_field (pattern)
  • OWS (pattern)
  • RWS (pattern)
  • BWS (pattern)
  • token (pattern)
  • qdtext (pattern)
  • quoted_string (pattern)
  • comment (pattern)
  • Content_Length (pattern)
  • Transfer_Encoding (pattern)
  • chunk_ext (pattern)
  • TE (pattern)
  • Trailer (pattern)
  • request_target (pattern)
  • Host (pattern)
  • Via (pattern): captures are a list of tables with fields .protocol, .by and .comment
  • Connection (pattern)
  • Upgrade (pattern): captures are a list of strings containing protocol or protocol/version
  • IMF_fixdate (pattern)
  • Content_Encoding (pattern)
  • Content_Type (pattern)
  • Content_Language (pattern)
  • Content_Location (pattern)
  • Expect (pattern)
  • Max_Forwards (pattern)
  • Accept (pattern)
  • Accept_Charset (pattern)
  • Accept_Encoding (pattern)
  • Accept_Language (pattern)
  • From (pattern)
  • Referer (pattern)
  • User_Agent (pattern)
  • Date (pattern): capture is a table in the same format as used by os.time
  • Location (pattern)
  • Retry_After (pattern): capture is either a table describing an absolute time in the same format as used by os.time, or a relative time as a number of seconds
  • Vary (pattern)
  • Allow (pattern)
  • Server (pattern)
  • Last_Modified (pattern): capture is a table in the same format as used by os.time
  • ETag (pattern)
  • If_Match (pattern)
  • If_None_Match (pattern)
  • If_Modified_Since (pattern): capture is a table in the same format as used by os.time
  • If_Unmodified_Since (pattern): capture is a table in the same format as used by os.time
  • Accept_Ranges (pattern)
  • Range (pattern)
  • If_Range (pattern): capture is either an entity_tag or a table in the same format as used by os.time
  • Content_Range (pattern)
  • Age (pattern)
  • Cache_Control (pattern): captures are grouped into key/value pairs (where a directive with no value has a value of true)
  • Expires (pattern): capture is a table in the same format as used by os.time
  • Pragma (pattern)
  • Warning (pattern)
  • WWW_Authenticate (pattern)
  • Authorization (pattern)
  • Proxy_Authenticate (pattern)
  • Proxy_Authorization (pattern)
  • Forwarded (pattern)
  • Public_Key_Pins (pattern)
  • Public_Key_Pins_Report_Only (pattern)
  • Hobareg (pattern)
  • Authentication_Info (pattern)
  • Proxy_Authentication_Info (pattern)
  • ALPN (pattern)
  • CalDAV_Timezones (pattern)
  • Alt_Svc (pattern)
  • Alt_Used (pattern)
  • Expect_CT (pattern)
  • Referrer_Policy (pattern)
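
As an illustration of the date-valued headers, the sketch below matches a `Date` header value and reads the resulting table. It assumes the patterns are exposed on the `lpeg_patterns.http` module and, since these APIs are described as unstable, it should be treated as an example rather than a guarantee.

```lua
local lpeg = require "lpeg"
local http = require "lpeg_patterns.http" -- assumed to expose the patterns listed above

-- Anchor the pattern so the whole header value must be a date.
local anchored = http.Date * lpeg.P(-1)

-- The capture is described as a table in the format used by os.time.
local t = anchored:match("Sun, 06 Nov 1994 08:49:37 GMT")
if t then
  print(t.year, t.month, t.day, t.hour, t.min, t.sec)
end
```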

phone

  • phone (pattern): includes detailed checking for:
    • USA phone numbers using the NANP

language

Patterns for definitions from RFC-4646 Section 2.1

  • langtag (pattern): Capture is a table with the language tag decomposed into components:
    • language
    • extlang (optional)
    • script (optional)
    • region (optional)
    • variant (optional): an array
    • extension (optional): a dictionary from singleton to value
    • privateuse (optional): an array
  • privateuse (pattern): captures an array
  • Language_Tag (pattern): captures the whole language tag
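
A sketch of decomposing a language tag with `langtag`, assuming the module path `lpeg_patterns.language`. The tag "zh-Hant-TW" is just a sample input.

```lua
local lpeg = require "lpeg"
local language = require "lpeg_patterns.language" -- assumed module path

-- Anchor the pattern so the whole subject must be a language tag.
local anchored = language.langtag * lpeg.P(-1)

local tag = anchored:match("zh-Hant-TW")
if tag then
  -- Optional components are absent (nil) when the tag does not contain them.
  print(tag.language, tag.script, tag.region)
end
```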

lpeg_patterns's People

Contributors

daurnimator

lpeg_patterns's Issues

High memory usage

$ /usr/bin/time -f "Max RSS %M" lua -e 'require"lpeg_patterns.http"'
Max RSS 26028

email: limit length?

From RFC 3696

In addition to restrictions on syntax, there is a length limit on
email addresses. That limit is a maximum of 64 characters (octets)
in the "local part" (before the "@") and a maximum of 255 characters
(octets) in the domain part (after the "@") for a total length of 320
characters. Systems that handle email should be prepared to process
addresses which are that long, even though they are rarely
encountered.

See also: https://www.rfc-editor.org/errata_search.php?rfc=3696&eid=1690
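
One way the limits quoted above could be applied after a successful match is sketched below in plain Lua. `within_length_limits` is a hypothetical helper, not part of lpeg_patterns; the defaults are the figures from the quoted RFC 3696 text, and they are parameters because the linked erratum revises the total-length figure.

```lua
-- Hypothetical helper, not part of lpeg_patterns.
-- Checks the octet lengths of an already-parsed local part and domain.
local function within_length_limits(local_part, domain, limits)
  limits = limits or {}
  local max_local = limits.local_part or 64 -- local-part limit from the quoted text
  local max_domain = limits.domain or 255   -- domain limit from the quoted text
  local max_total = limits.total or 320     -- total from the quoted text
  local total = #local_part + 1 + #domain   -- +1 for the "@"
  return #local_part <= max_local
    and #domain <= max_domain
    and total <= max_total
end

print(within_length_limits("alice", "example.com"))             -- true
print(within_length_limits(string.rep("a", 65), "example.com")) -- false
```

Keeping the check separate from the grammar means the patterns themselves stay purely syntactic, and callers can choose whichever limits they want to enforce.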
