Comments (20)

BurntSushi commented on August 11, 2024

This isn't a bad idea. It used to be ASCII only. But clearly, that is also an extreme. I often find myself defining new types that restrict the set of possible values to make debugging easier.

See also: #77

from quickcheck.

shepmaster commented on August 11, 2024

Copying my idea from #77:

I could conceive of a string with only ASCII as being "smaller" than one with ASCII + ASCII punctuation. Then it could "grow" to include more common Unicode, and then "grow" towards uncommon characters.
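This "growing" order could be sketched roughly as follows. It is a minimal, std-only Rust sketch under stated assumptions, not quickcheck's implementation. The tier boundaries are arbitrary: ASCII alphanumerics first, then ASCII punctuation and space, then U+00A0..=U+024F as a stand-in for "common" Unicode, then any scalar value. One new tier unlocks for every 10 units of `size`, and the hypothetical `Rng` xorshift type stands in for quickcheck's own generator.

```rust
/// Tiny xorshift64 PRNG (hypothetical stand-in for quickcheck's generator).
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Value in 0..n (n > 0); modulo bias is ignored for brevity.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

const ALNUM: &[u8] = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
const PUNCT: &[u8] = b" !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

/// Picks a char whose "tier" grows with `size`: small sizes stay in
/// ASCII alphanumerics, and larger sizes unlock rarer characters.
fn tiered_char(rng: &mut Rng, size: usize) -> char {
    let max_tier = (size / 10).min(3) as u64; // assumed: one tier per 10 units
    match rng.below(max_tier + 1) {
        0 => ALNUM[rng.below(ALNUM.len() as u64) as usize] as char,
        1 => PUNCT[rng.below(PUNCT.len() as u64) as usize] as char,
        // Assumed "common" range: Latin-1 Supplement through Latin Extended-B.
        2 => char::from_u32(0xA0 + rng.below(0x250 - 0xA0) as u32).unwrap(),
        // Any Unicode scalar value; retry when we land on a surrogate.
        _ => loop {
            if let Some(c) = char::from_u32(rng.below(0x11_0000) as u32) {
                break c;
            }
        },
    }
}

/// A string of length at most `size`, built from tiered chars.
fn tiered_string(rng: &mut Rng, size: usize) -> String {
    let len = rng.below(size as u64 + 1) as usize;
    (0..len).map(|_| tiered_char(rng, size)).collect()
}

fn main() {
    let mut rng = Rng(0x2545_F491_4F6C_DD1D);
    for size in [5, 15, 25, 40] {
        println!("size {:>2}: {:?}", size, tiered_string(&mut rng, size));
    }
}
```

Because small sizes can only produce plain ASCII, a shrinker that reduces `size` would also push a failing input towards readable characters. That is the property the comment above is after.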

vi commented on August 11, 2024

@shepmaster, maybe first the uncommon/tricky ones (zero-width characters, BOM, left-to-right marks), then the numerous common ones?

Each Unicode codepoint has equal weight (the time required to test with it), but unequal usefulness (the probability that this codepoint catches some bug).

shepmaster commented on August 11, 2024

It's an interesting question: what is the most useful order in which to iterate through test cases? I'd think most people using quickcheck would want it to find things they haven't thought of (that's certainly true for me!). However, once something is found, we want it reduced to something we can wrap our brains around.

I think that "simple ASCII" will often be the easy-to-understand group of characters. The problem is that different uses of Strings will have different "tricky" bits. Perhaps your area of code is more likely to have issues with BOMs, and mine with control characters. I doubt there's One True Order.

vi commented on August 11, 2024

But I expect that a problem will rarely be specific to exactly, for example, U+12345 CUNEIFORM SIGN URU TIMES KI (𒍅) and not to other high-plane characters. Yet including all high-plane characters significantly increases the testing space, and they vastly outnumber the more useful characters. So for "lesser" strings you can keep just one high-plane character.

Consider these options:

Option | Weight | Usefulness notes
Don't include high-plane characters in the "easy set" | smallest | Won't catch the corresponding problems
Include just one high-plane character in the "easy set" | small | Likely to catch problems with such characters
Include all high-plane characters in the "easy set" | big | Only slightly more likely to catch such problems than the previous row

Small character classes (control characters, whitespace) should be included entirely.
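Here is one way to turn this proposal into code. The function `easy_set` is hypothetical and not part of quickcheck. It keeps small classes whole, using std's `char::is_control` and `char::is_whitespace` to enumerate them, adds printable ASCII, and lets each huge class contribute a single representative. The representatives chosen here (U+00E9 and U+12345) are arbitrary assumptions.

```rust
/// Hypothetical "easy set" of characters for small test cases.
fn easy_set() -> Vec<char> {
    // Small classes are included entirely: every control character (Cc)
    // and every character with the Unicode White_Space property.
    let mut set: Vec<char> = (0..=0x10_FFFFu32)
        .filter_map(char::from_u32)
        .filter(|c| c.is_control() || c.is_whitespace())
        .collect();
    // Printable ASCII (excluding space, which is already present).
    set.extend((0x21u8..0x7F).map(char::from));
    // Huge classes get one representative each (arbitrary picks).
    set.push('\u{E9}'); // é: a non-ASCII BMP letter
    set.push('\u{12345}'); // 𒍅: a single high-plane character
    set.sort_unstable();
    set.dedup();
    set
}

fn main() {
    let set = easy_set();
    let high_plane = set.iter().filter(|&&c| (c as u32) > 0xFFFF).count();
    println!("{} chars in the easy set, {} from high planes", set.len(), high_plane);
}
```

Picking uniformly from this set gives every whitespace and control character a fair chance. The whole astral range costs only a single slot, which matches the "small" weight row of the table.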

BurntSushi commented on August 11, 2024

I somewhat feel like the obvious behavior for String is the current behavior: any Unicode codepoint is fair game.

With that said, there's no reason why quickcheck couldn't define a few other newtypes around String that correspond to useful subsets of Unicode. (If we go that route, I would prefer to keep the number of such types in quickcheck proper very small.)
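A newtype of this kind might look like the following std-only sketch. `AsciiString` and its `generate` method are hypothetical. In quickcheck proper, the type would implement the `Arbitrary` trait instead, and the tiny `Rng` here only stands in for quickcheck's generator.

```rust
/// Tiny xorshift64 PRNG (hypothetical stand-in for quickcheck's generator).
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical newtype: a String restricted to printable ASCII.
#[derive(Debug, Clone, PartialEq)]
struct AsciiString(String);

impl AsciiString {
    /// Generates a printable-ASCII string of length at most `size`.
    fn generate(rng: &mut Rng, size: usize) -> Self {
        let len = (rng.next_u64() % (size as u64 + 1)) as usize;
        let s = (0..len)
            // 0x20 + 0..=94 covers U+0020 (space) through U+007E (~).
            .map(|_| char::from(0x20 + (rng.next_u64() % 95) as u8))
            .collect();
        AsciiString(s)
    }
}

fn main() {
    let mut rng = Rng(0x9E37_79B9_7F4A_7C15);
    // A property test would destructure the newtype to get at the String.
    let AsciiString(s) = AsciiString::generate(&mut rng, 20);
    println!("{:?}", s);
}
```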

vi commented on August 11, 2024

What does "fair game" mean? In my idea, any codepoint can appear, but the probabilities should be drastically different.

A "useful subset" may fail to find a problem no matter how long it runs.

Option | Speed | Immediate results | Long-term results
All codepoints, uniformly distributed (current) | slow | few | all
"Useful subset" | fast | moderate | moderate
All codepoints, non-uniformly distributed (proposed) | medium | moderate | all

For example, if a function breaks only when fed a string with three spaces in a row, I expect that to be found fast. If a function only breaks when fed three 𒍅s in a row, that is expected to be found more slowly (because a space is much more likely to be involved in bugs than some arbitrary character).
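The "proposed" row above can be sketched as a weighted choice between pools, where every codepoint stays possible but with very different probabilities. This std-only sketch is an illustration under assumptions, not the code that was eventually merged. The 40/30/20/10 weights, the hypothetical `TRICKY` list, and the helper names are all invented for the example.

```rust
/// Tiny xorshift64 PRNG (hypothetical stand-in for quickcheck's generator).
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Value in 0..n (n > 0); modulo bias is ignored for brevity.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Hypothetical list of characters that often break string handling.
const TRICKY: &[char] = &[
    '\0', '\t', '\n', '\r', ' ', '\u{7F}', '\u{85}', '\u{A0}',
    '\u{200B}', // ZERO WIDTH SPACE
    '\u{200D}', // ZERO WIDTH JOINER
    '\u{200E}', // LEFT-TO-RIGHT MARK
    '\u{202E}', // RIGHT-TO-LEFT OVERRIDE
    '\u{2028}', // LINE SEPARATOR
    '\u{FEFF}', // BYTE ORDER MARK
    '\u{FFFD}', // REPLACEMENT CHARACTER
];

/// Uniform scalar value below `limit`, skipping surrogates by retrying.
fn any_char_below(rng: &mut Rng, limit: u32) -> char {
    loop {
        if let Some(c) = char::from_u32(rng.below(limit as u64) as u32) {
            return c;
        }
    }
}

/// Non-uniform char: every scalar value is possible, but cheap-to-read
/// and bug-prone characters are drawn far more often.
fn weighted_char(rng: &mut Rng) -> char {
    match rng.below(10) {
        0..=3 => char::from(0x20 + rng.below(95) as u8), // 40%: printable ASCII
        4..=6 => TRICKY[rng.below(TRICKY.len() as u64) as usize], // 30%: tricky
        7..=8 => any_char_below(rng, 0x1_0000), // 20%: anything in the BMP
        _ => any_char_below(rng, 0x11_0000), // 10%: any scalar value
    }
}

fn main() {
    let mut rng = Rng(0xDEAD_BEEF);
    let s: String = (0..16).map(|_| weighted_char(&mut rng)).collect();
    println!("{:?}", s);
}
```

Unlike a "useful subset", the last arm keeps every codepoint reachable, so long runs can still find the three-𒍅s bug. Meanwhile, most draws land on characters that are both likely to matter and easy to read in a failure report.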

BurntSushi commented on August 11, 2024

@vi Ah, I see, I misunderstood. I think I'm fine with a smarter impl of String.

vi commented on August 11, 2024

@BurntSushi, maybe a smarter impl of char instead? Would you be OK with an arbitrary char not being uniformly distributed, so that some characters are preferred?

BurntSushi commented on August 11, 2024

I think that might be OK.

vi commented on August 11, 2024

Can such logic also be applied to u32 and friends (making values like 0, 1, 2, -1, 0x80000000 more common)?

[0,0,0,0] can probably trigger more bugs than [1582149423,1582149423,1582149423,1582149423].
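Applied to integers, the same idea might look like the minimal std-only sketch below. The `INTERESTING_U32` list and the 50% bias are assumptions for illustration. Because `u32` has no -1, its two's-complement counterpart `u32::MAX` stands in for it. A `Vec<u32>` built from such values contains edge cases like `0` far more often than a uniform generator would.

```rust
/// Tiny xorshift64 PRNG (hypothetical stand-in for quickcheck's generator).
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Value in 0..n (n > 0); modulo bias is ignored for brevity.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Edge values that tend to expose bugs (hypothetical selection).
/// u32::MAX plays the role of -1.
const INTERESTING_U32: &[u32] = &[0, 1, 2, 0x7FFF_FFFF, 0x8000_0000, u32::MAX - 1, u32::MAX];

/// Half the time returns an interesting edge value, otherwise a uniform u32.
fn biased_u32(rng: &mut Rng) -> u32 {
    if rng.below(2) == 0 {
        INTERESTING_U32[rng.below(INTERESTING_U32.len() as u64) as usize]
    } else {
        rng.next_u64() as u32 // low 32 bits of the PRNG output
    }
}

fn main() {
    let mut rng = Rng(0x0123_4567_89AB_CDEF);
    let v: Vec<u32> = (0..4).map(|_| biased_u32(&mut rng)).collect();
    println!("{:?}", v);
}
```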

BurntSushi commented on August 11, 2024

Sounds like a good idea to me!

BurntSushi commented on August 11, 2024

I wonder if it'd be worth looking at what other ports of quickcheck do. Does the Haskell quickcheck do anything fancy like this? If not, did they consider it?

vi commented on August 11, 2024

Asked on IRC.

http://haddock.stackage.org/lts-3.17/QuickCheck-2.8.1/src/Test-QuickCheck-Arbitrary.html#line-471

kadoban> _Vi: So it basically only ever picks characters between 0 and 255, and it's biased towards 0 to 128

FranklinChen commented on August 11, 2024

This is a known crappy Haskell QuickCheck default that has bitten many people. The standard workaround is http://hackage.haskell.org/package/quickcheck-unicode

vi commented on August 11, 2024

Shall I try submitting a pull request for this, making it generate chars somewhat like the aforementioned quickcheck-unicode (but with some emphasis on whitespace, special characters and specific tricky Unicode characters)?

BurntSushi commented on August 11, 2024

@vi That would be lovely!

BurntSushi commented on August 11, 2024

Done with PR #116 in commit faed60d. Thanks @vi!

vi commented on August 11, 2024

QuickCheck's string generator's motto should be "I love characters you hate".

BurntSushi commented on August 11, 2024

@vi Haha, I like it!
