Giter Site home page Giter Site logo

discrimination's Introduction

discrimination

Hackage Build Status

This package provides linear time sorting, partitioning, and joins for a wide array of Haskell data types. This work is based on a "final encoding" of the ideas presented in multiple papers and talks by Fritz Henglein.

By adopting a final encoding we can enjoy many instances for standard classes, lawfully, without quotienting.

Contact Information

Contributions and bug reports are welcome!

Please feel free to contact me through github or on the #haskell IRC channel on irc.freenode.net.

-Edward Kmett

discrimination's People

Contributors

acowley avatar awpr avatar ekmett avatar gwils avatar markus1189 avatar phadej avatar ryanglscott avatar sshine avatar supki avatar treeowl avatar twhitehead avatar ysangkok avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

discrimination's Issues

Raise containers bound?

discrimination currently wants containers < 0.6. This is causing issues with stuff specific to GHC 8.6 (in my case, ghc-typelits-knownnat).

"Group" Does Not Respect Eq

Given, eg

newtype NewType = NewType (Int,Int)
instance Eq NewType where
    NewType (a,_) == NewType (b,_) = a == b

we have
> Data.List.group [NewType (0,0), NewType (0,1)]
[[NewType (0,0), NewType (0,1)]]
but
> Data.Discrimination.group [NewType (0,0), NewType (0,1)]
[[NewType (0,0)], [NewType (0,1)]]
Would it be possible to change the behavior to match Data.List?

Grouping instance for Float/Double

I'm in need of a Grouping instance for Double. I'm not particularly worried about what happens with the different representations of NaNs, or with negative zeros, so I think I can get away with this:

import GHC.Float (castFloatToWord32, castDoubleToWord64)

instance Grouping Float where grouping = contramap castFloatToWord32 grouping
instance Grouping Double where grouping = contramap castDoubleToWord64 grouping

Any comments about my implementation are more than welcome, because have been cargo-culting examples from elsewhere in the code and can't say that I fully understand what I am doing.

I'd guess that groupingWord64 would work in my orphan instances (instead of grouping) if it was exported.

The functions from GHC.Float were introduced in GHC 8.4 by #4092, and Stack Overflow has a question that may contain solutions for earlier GHCs (but I haven't actually tried any of them).

Is hashing correct?

Currently hashing will group together all values which produce the same hash, but that doesn't mean they're the same value - It feels like hashing should have some other constraint or take another argument to discriminate things which hash to the same value (even if it's just compare or something which uses an internal Map, as these should generally be very small).

There may not be a need to change the definition, just some more documentation to make it clear that values which are grouped are not necessarily distinct. If that's the "fix" that's needed, I'm happy to make the PR.

Negative, zero, and Integers over one byte are broken

0 :: Integer

The handling of 0 :: Integer is currently broken as sizeInBaseInteger claims it takes one byte to encode while exportIntegerToMutableByteArray actually writes zero bytes (no bits is one of the encoding of zero).

This means comparisons of 0 :: Integer amount to comparisons of one byte of random memory. Sometimes they are equal and sometimes they are not.

Prelude Data.Discrimination> disc grouping [(0,1),(0,2)]
[[1],[2]]
Prelude Data.Discrimination> disc grouping [(0,1),(0,2)]
[[1],[2]]
Prelude Data.Discrimination> disc grouping [(0,1),(0,2)]
[[1,2]]
Prelude Data.Discrimination> disc grouping [(0,1),(0,2)]
[[1],[2]]

-1 :: Integer

As per the documentation, exportIntegerToMutableByteArray function does not include the sign in its dump. This means negative integers are currently equivalent to positive integers.

Prelude Data.Discrimination> disc grouping [(1,1),(-1,2)]
[[1,2]]

QuickLook preparation breaks discrimination.

src/Data/Discrimination/Class.hs:23:10: error:
    • Couldn't match type: forall b1. [(a, b1)] -> [[b1]]
                     with: [(a, b)] -> [[b]]
      Expected: Sort a -> [(a, b)] -> [[b]]
        Actual: Sort a -> forall b. [(a, b)] -> [[b]]
    • In the expression: runSort
      In an equation for ‘disc’: disc = runSort
      In the instance declaration for ‘Discriminating Sort’
    • Relevant bindings include
        disc :: Sort a -> [(a, b)] -> [[b]]
          (bound at src/Data/Discrimination/Class.hs:23:3)
   |
23 |   disc = runSort

I suspect we'll need to make revisions to restrict base from above for existing releases.

Grouping nested recursive types?

So I have the following type

data Atom v = Variable v
            | Term [Atom v]
    deriving stock (Eq, Read, Data, Typeable, Generic, Generic1, Functor, Foldable, Traversable)
instance Grouping v => Grouping (Atom v)

Calling group always hangs, even with an empty list, and no matter the type of v.

Is this a case where I must write a custom instance?

discrimination-0.4 fails to build with GHC 9.0.1 and 9.2.1

src/Data/Discrimination/Grouping.hs:169:100: warning: [-Wdeprecations]
    In the use of ‘sizeInBaseInteger’
    (imported from GHC.Integer.GMP.Internals):
    Deprecated: "Use integerSizeInBase# instead"
    |
169 |   p@(MutablePrimArray mba) :: MutablePrimArray RealWorld Word8 <- newPrimArray (fromIntegral $ W# (sizeInBaseInteger i 256#))
    |                                                                                                    ^^^^^^^^^^^^^^^^^

src/Data/Discrimination/Grouping.hs:170:8: warning: [-Wdeprecations]
    In the use of ‘exportIntegerToMutableByteArray’
    (imported from GHC.Integer.GMP.Internals):
    Deprecated: "Use integerToMutableByteArray# instead"
    |
170 |   _ <- exportIntegerToMutableByteArray i mba 0## 1#
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[5 of 7] Compiling Data.Discrimination.Sorting ( src/Data/Discrimination/Sorting.hs, dist/build/Data/Discrimination/Sorting.o, dist/build/Data/Discrimination/Sorting.dyn_o )
[6 of 7] Compiling Data.Discrimination.Class ( src/Data/Discrimination/Class.hs, dist/build/Data/Discrimination/Class.o, dist/build/Data/Discrimination/Class.dyn_o )

src/Data/Discrimination/Class.hs:23:10: error:
    • Couldn't match type: forall b1. [(a, b1)] -> [[b1]]
                     with: [(a, b)] -> [[b]]
      Expected: Sort a -> [(a, b)] -> [[b]]
        Actual: Sort a -> forall b. [(a, b)] -> [[b]]
    • In the expression: runSort
      In an equation for ‘disc’: disc = runSort
      In the instance declaration for ‘Discriminating Sort’
    • Relevant bindings include
        disc :: Sort a -> [(a, b)] -> [[b]]
          (bound at src/Data/Discrimination/Class.hs:23:3)
   |
23 |   disc = runSort
   |          ^^^^^^^
cabal: Failed to build discrimination-0.4.

Possibly related to simplified subsumption?!

As a Hackage trustee, I've revised v0.4 to prevent users from encountering this error: https://hackage.haskell.org/package/discrimination-0.4/revisions/

"conquer" in Divisible instance of Sort is wrong

> sortingCompare (Just 1) (Just 2 :: Maybe Int)
EQ

Because:

> runSort sorting [(Just 1, 1), (Just 2 :: Maybe Int, 2)]
[[],[1],[2]]

Because:

> runSort conquer []
[[]]

Because it's implemented as 'return . fmap snd', which assumes there's at least one element in the list.

This also gives poor results for sorting lists:

> runSort sorting [(replicate 1000 'a', 2)]
[[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], ... [],[],[],[],[],[],[],[],[],[],[],[],[],[2]]

Sorts inspect more of the input than necessary to sort single-element lists.

> runSort sorting [(repeat 'a', 2)]
-- doesn't terminate

> runSort sorting [(repeat 'a', 2), (repeat 'b', 4)]
-- doesn't terminate

Because sorts don't have short-circuiting cases like the clauses sdisc _ [] = []; sdisc _ [(_, v)] = [[v]] in http://www.diku.dk/hjemmesider/ansatte/henglein/papers/henglein2011a.pdf, sorting actually forces the entire structure of the input (or, as much of it as the given Sort is capable of inspecting): in the latter case, the first Char sort gives [[("aaaa...", 2)], [("bbbb...", 4)]], so the second step should be able to simply quit sorting and return [[2],[4]]. Instead it repeatedly unconses another char off the list and sorts by it.

This can be fixed by adding and using a smart-constructor mkSort f = Sort $ \xs -> case xs of { [] -> []; [(_,v)] -> [[v]]; _ -> f xs }, but there might be a cleaner way.

With the above smart-constructor:

> runSort sorting [(repeat 'a', 2)]
[[2]]
> runSort sorting [(repeat 'a', 2), (repeat 'b', 4)]
[[2],[4]]

Incidentally this would also fix #3.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.