Comments (20)
This isn't a bad idea. It used to be ASCII only. But clearly, that is also an extreme. I often find myself defining new types that restrict the set of possible values to make debugging easier.
See also: #77
from quickcheck.
Copying my idea from #77:
I could conceive of a string with only ASCII as being "smaller" than one with ASCII + ASCII punctuation. Then it could "grow" to include more common Unicode, then "grow" towards uncommon.
@shepmaster, maybe start with the uncommon/tricky characters (zero-width things, BOM, left-to-right marks), then the numerous common ones?
Each Unicode codepoint has equal weight (time required to test with it), but unequal usefulness (probability that this codepoint catches some bug).
It's an interesting question: what is the most useful order in which to iterate through test cases? I'd think most people using quickcheck would want it to find things that they haven't thought of (at least it's true for me!). However, once something is found, we want it reduced to something that we can wrap our brains around.
I think that "simple ASCII" will often be the easy-to-understand group of characters. The problem is going to be that different usages of `String`s will have different "tricky" bits. Perhaps your area of code is more likely to have issues with BOMs, but mine with control characters. I'd doubt there's One True Order.
But I expect that a problem will rarely come up with, for example, character U+12345 CUNEIFORM SIGN URU TIMES KI (𒍅) exactly (and not with other high-plane characters). Yet including all high-plane characters significantly increases the testing space, and they outnumber the more useful characters. So for "lesser" strings you can keep just one high-plane character.
Imagine the table:
| Option | Weight | Usefulness notes |
|---|---|---|
| Don't include high-plane characters in "easy set" | smallest | Won't catch respective problems |
| Include just one high-plane character in "easy set" | small | Likely to catch the problem with such characters |
| Include all high-plane characters in "easy set" | big | Only slightly more chance to catch such problems compared to the previous row |
Small character classes (control characters, whitespace) should be included entirely.
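The "easy set" described above could be built roughly as follows. This is only a sketch, not quickcheck's implementation: `easy_char_pool` is a hypothetical helper name, and taking printable ASCII as the base set and U+12345 as the single high-plane representative are assumptions made for illustration.

```rust
/// Builds a small "easy" pool of characters (hypothetical helper, not part
/// of quickcheck): small classes are included whole, while the huge class
/// of high-plane characters is represented by a single member.
fn easy_char_pool() -> Vec<char> {
    // Control characters and whitespace are small classes, so take all of
    // them. Every such character lies in the Basic Multilingual Plane.
    let mut pool: Vec<char> = (0u32..=0xFFFF)
        .filter_map(char::from_u32)
        .filter(|c| c.is_control() || c.is_whitespace())
        .collect();

    // Printable ASCII without the space, which the whitespace filter above
    // has already added.
    pool.extend((0x21u8..=0x7E).map(char::from));

    // One representative stands in for all high-plane characters
    // (assumed choice: CUNEIFORM SIGN URU TIMES KI).
    pool.push('\u{12345}');

    pool
}
```

A generator for "lesser" strings would then pick characters from this pool, so adding the one high-plane character costs a single extra entry rather than hundreds of thousands.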
I somewhat feel like the obvious behavior for `String` is the current behavior: any Unicode codepoint is fair game.
With that said, there's no reason why `quickcheck` couldn't define a few other newtypes around `String` that correspond to useful subsets of Unicode. (If we go that route, I would prefer to keep the number of such types in `quickcheck` proper very small.)
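To make the newtype idea concrete, here is a minimal sketch of one such wrapper. `AsciiString` and its methods are hypothetical names, not types quickcheck is known to provide. To use it in property tests one would also implement quickcheck's `Arbitrary` trait for it, which is left out here so the sketch has no dependencies.

```rust
/// A string guaranteed to contain only ASCII (hypothetical newtype for
/// illustration, not an existing quickcheck type).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct AsciiString(String);

impl AsciiString {
    /// Wraps `s` if every character is ASCII, otherwise returns `None`.
    pub fn new(s: &str) -> Option<Self> {
        if s.is_ascii() {
            Some(AsciiString(s.to_owned()))
        } else {
            None
        }
    }

    /// Keeps only the ASCII characters of `s` and drops everything else.
    pub fn from_lossy(s: &str) -> Self {
        AsciiString(s.chars().filter(char::is_ascii).collect())
    }

    /// Borrows the underlying string.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

A property that takes `AsciiString` instead of `String` states in its signature which subset of Unicode it is meant to handle.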
What does "fair game" mean? In my idea, any codepoint can appear, but the probabilities should be drastically different.
A useful subset may fail to find a problem even if run long enough.
| Option | Speed | Immediate results | Long-term results |
|---|---|---|---|
| All codepoints uniformly distributed (current) | slow | few | all |
| "Useful subset" | fast | moderate | moderate |
| All codepoints, but not uniformly distributed (proposed) | medium | moderate | all |
For example, if a function breaks only when fed a string with three spaces in a row, I expect that to be found fast. If a function breaks only when fed three 𒍅s in a row, I expect that to be found more slowly (because a space is much more likely to trigger bugs than some arbitrary character).
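A non-uniform generator along these lines might look like the sketch below. Everything in it is an assumption made for illustration: `XorShift` is a tiny stand-in random source (not quickcheck's `Gen`), `biased_char` is a hypothetical name, and the 50/15/15/20 weights and the list of tricky characters are arbitrary.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// Hypothetical stand-in for a real random source; the seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..n`. The modulo introduces a slight bias,
    /// which is acceptable for an illustration.
    fn below(&mut self, n: u32) -> u32 {
        ((self.next_u64() >> 32) as u32) % n
    }
}

/// Any code point can come out, but likely troublemakers come out far more
/// often. The weights are an arbitrary assumption.
fn biased_char(rng: &mut XorShift) -> char {
    const WHITESPACE: [char; 4] = [' ', '\t', '\n', '\r'];
    // Zero-width space, BOM, left-to-right mark, right-to-left override, NUL.
    const TRICKY: [char; 5] = ['\u{200B}', '\u{FEFF}', '\u{200E}', '\u{202E}', '\0'];

    match rng.below(100) {
        // 50%: printable ASCII, U+0020..=U+007E.
        0..=49 => char::from(0x20 + rng.below(0x5F) as u8),
        // 15%: common whitespace.
        50..=64 => WHITESPACE[rng.below(WHITESPACE.len() as u32) as usize],
        // 15%: hand-picked tricky characters.
        65..=79 => TRICKY[rng.below(TRICKY.len() as u32) as usize],
        // 20%: uniform over all valid code points. Surrogates are rejected
        // by `char::from_u32`, so retry until a valid one appears.
        _ => loop {
            if let Some(c) = char::from_u32(rng.below(0x11_0000)) {
                break c;
            }
        },
    }
}
```

With weights like these a run of three spaces shows up quickly, while three 𒍅s in a row remain possible but take far longer to hit, which matches the "medium speed, all long-term results" row of the table.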
@vi Ah, I see, I misunderstood. I think I'm fine with a smarter impl of `String`.
@BurntSushi, maybe a smarter impl of `char`? Would you be OK if arbitrary `char` values were not uniformly distributed and favored some characters?
I think that might be OK.
Can such logic also be applied to `u32` and friends (making things like `0,1,2,-1,0x80000000` more popular)?
Probably `[0,0,0,0]` can trigger more bugs than `[1582149423,1582149423,1582149423,1582149423]`.
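The same idea for integers could be sketched as follows. This is an illustration rather than quickcheck's actual behavior: `XorShift` is a hypothetical stand-in random source, `biased_i32` is a made-up name, and returning a special value a quarter of the time is an assumed ratio.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// Hypothetical stand-in for a real random source; the seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..n` (slightly biased by the modulo).
    fn below(&mut self, n: u32) -> u32 {
        ((self.next_u64() >> 32) as u32) % n
    }
}

/// Returns a boundary value about a quarter of the time (assumed ratio)
/// and a uniformly random `i32` otherwise.
fn biased_i32(rng: &mut XorShift) -> i32 {
    // The bit pattern 0x80000000 read as an i32 is i32::MIN.
    const SPECIAL: [i32; 6] = [0, 1, 2, -1, i32::MIN, i32::MAX];

    if rng.below(4) == 0 {
        SPECIAL[rng.below(SPECIAL.len() as u32) as usize]
    } else {
        (rng.next_u64() >> 32) as i32
    }
}
```

Vectors built from such values contain runs like `[0,0,0,0]` far more often than a uniform generator would produce them.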
Sounds like a good idea to me!
I wonder if it'd be worth looking at what other ports of quickcheck do. Does the Haskell quickcheck do anything fancy like this? If not, did they consider it?
Asked on IRC.
http://haddock.stackage.org/lts-3.17/QuickCheck-2.8.1/src/Test-QuickCheck-Arbitrary.html#line-471
kadoban> _Vi: So it basically only ever picks characters between 0 and 255, and it's biased towards 0 to 128
This is a known crappy Haskell QuickCheck default that has bitten many people. The standard workaround is http://hackage.haskell.org/package/quickcheck-unicode
Shall I try submitting a pull request for this, making it generate chars a bit like the aforementioned quickcheck-unicode (but with some emphasis on whitespace, special characters and specific tricky Unicode characters)?
@vi That would be lovely!
Done with PR #116 in commit faed60d. Thanks @vi!
QuickCheck's string generator's motto should be "I love characters you hate".
@vi Haha, I like it!