Giter Site home page Giter Site logo

jhs-106's People

Contributors

jsyrjala avatar tomimas avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

jsyrjala

jhs-106's Issues

core.clj/parse: another regexp bomb on invalid input

Hi,

The fix in #1 left the same bug open for exploitation: A ':' passes the "allowed characters" filter and there are even tests asserting addresses like "Gregorius IX:n tie" can be parsed.

Yet suitable placement of ':' sends the parser off its way to computational hell again, as observable with

(parse "aaaaaaaaa aaaaaaaaa aaaaaaaaa 13:")

I suppose technically it's the same issue as #1. Any suggestions? The ':' character doesn't need to be the last one either, so naive tail-stripping of weirdness won't work.

core.clj/parse: regexp bomb on invalid input

Hi!

We have application that, among other things, uses this library to parse end user input.

Because endusers are what they are, this input sometimes contains invalid data, like email addresses, for reasons unknown.

Trying to parse such an "address" causes regexp bomb in our application, taking very long time to process and eventually leaving server unresponsive once several such requests have come in.

As this library is meant for handling address parsing, it would be great if it could handle invalid input data more gracefully, either by rejecting invalid input altogether, or by using more efficient regexp for parsing it.

I believe problem is in rather complex regexp in reg.clj/street, but did not (atleast yet) go and actually see what it does.

Test case to demonstrate problem:

diff --git a/test/jhs_106/core_test.clj b/test/jhs_106/core_test.clj
index c6fbeb6..4b9211c 100644
--- a/test/jhs_106/core_test.clj
+++ b/test/jhs_106/core_test.clj
@@ -547,6 +547,9 @@
                    :stairway "\u00C4"
                    :apartment "13\u00F6"}} (simple-parse "Gregorius IX:n tie 12-14 rak. 2 \u00C4 13\u00F6"))))

+(deftest parse-should-handle-invalid-input-without-eternal-loop
+  (parse "This is email address with some text [email protected]"))
+
 (deftest should-unabbreviate-streetname
   (doseq [v abbreviations]
     (is (= (str "Rosvo" (name (key v))) (unabbreviate (str "Rosvo" (val v)))) (str (val v) " => " (name (key v))))

Lein test takes very, very long time to execute (so long that I did not have time to wait for it and killed it after 10 minutes) on 2.3Ghz Intel Core i7.

Thread dump:

"main" #1 prio=5 os_prio=31 tid=0x00007fba05002000 nid=0x1303 runnable [0x000000010c581000]
   java.lang.Thread.State: RUNNABLE
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4602)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4794)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4801)
    at java.util.regex.Pattern$Prolog.match(Pattern.java:4741)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Matcher.match(Matcher.java:1270)
    at java.util.regex.Matcher.matches(Matcher.java:604)
    at clojure.core$re_matches.invoke(core.clj:4424)
    at jhs_106.core$street_line.invoke(core.clj:35)
    at jhs_106.core$parse.invoke(core.clj:92)
    at jhs_106.core_test$fn__671.invoke(core_test.clj:551)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.