Giter Site home page Giter Site logo

kowainik / typerep-map Goto Github PK

View Code? Open in Web Editor NEW
99.0 7.0 17.0 192 KB

⚡️Efficient implementation of Map with types as keys

Home Page: https://kowainik.github.io/projects/typerep-map

License: Mozilla Public License 2.0

Haskell 100.00%
typerep-map haskell dependent-map cache arrays binary-search dmap hacktoberfest

typerep-map's Introduction

typerep-map

logo

GitHub CI Hackage MPL-2.0 license

typerep-map introduces TMap and TypeRepMap — data structures like Map, but where types serve as keys, and values have the types specified in the corresponding key spots.

For the more details on the implementation see the following blog post:

Usage example

ghci> import Data.TMap

ghci> tm = insert True $ one (42 :: Int)

ghci> size tm
2

ghci> res = lookup tm

ghci> res :: Maybe Int
Just 42

ghci> res :: Maybe Bool
Just True

ghci> res :: Maybe String
Nothing

ghci> lookup (insert "hello" tm) :: Maybe String
Just "hello"

ghci> member @Int tm
True

ghci> tm' = delete @Int tm

ghci> member @Int tm'
False

Benchmarks

Tables below contain comparision with DMap TypeRep of ten lookup operations on structure with size 10^4:

ghc-8.2.2 ghc-8.4.3 ghc-8.8.3 ghc-8.10.1
DMap TypeRep 517.5 ns 779.9 ns 1.559 μs 1.786 μs
typerep-map 205.3 ns 187.2 ns 190.1 ns 169.1 ns
ghc-8.2.2 ghc-8.4.3
DMap 8.2.2 DMap 8.4.3
TMap 8.2.2 TMap 8.4.3

typerep-map's People

Contributors

brandonhamilton avatar chrispenner avatar chshersh avatar dependabot[bot] avatar ilyakooo0 avatar int-index avatar kmicklas avatar mniip avatar qnikst avatar vrom911 avatar yu-zh avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

typerep-map's Issues

Add 'hoist'

hoist :: (forall x. f x -> g x) -> TypeRepMap f -> TypeRepMap g

This is basically a map over the elements. The associated keys are not changed in any way.

Having hoist means that we can declare TypeRepMap an instance of MFunctor from mmorph, although I'm not sure we should add this dependency.

Refactor (Fingerpring, Any, Any) to a data type

It would be better to have a special data type for this. And also special functions to convert to this datatype. This should improve code readability. And, possible, performance, if we add {-# UNPACK #-} pragmas where possible

Fail to buid against GHCJS(or maybe other 32bit platform)

Build fail with ghcjs-8.6.0.1, due to type mismatch. ltWord# & eqWord# accept Word# type, but Word# type is not always Word64#.

Below is part of of build message:

[1 of 3] Compiling Data.TypeRepMap.Internal ( src/Data/TypeRepMap/Internal.hs, dist/build/Data/TypeRepMap/Internal.js_o )

src/Data/TypeRepMap/Internal.hs:328:69: error:
    • Couldn't match expected type ‘GHC.Prim.Word#’
                  with actual type ‘GHC.Prim.Word64#’
    • In the first argument of ‘ltWord#’, namely ‘a’
      In the expression: a `ltWord#` valA
      In the expression:
        case a `ltWord#` valA of
          0#
            -> case a `eqWord#` valA of
                 0# -> go (2# *# i +# 2#)
                 _ -> let ... in ...
          _ -> go (2# *# i +# 1#)
    |

Data.TypeRepMap.toList causes segfaulting

Hi,
I am experiencing a bug when writing some code using Data.TypeRepMap (version 0.3.0 on hackage, ghc version 8.6.3); I've distilled the problem to its essence below (load this file in ghci and read the comments):

{-# language ConstraintKinds, GADTs #-}

module CrashTypeRepMap where

import Data.TypeRepMap
import GHC.Exts (toList, fromList)

data C a = C a deriving Show

-- Being unsatisfied with the Show instance for TypeRepMap, I want to bundle
-- show constraints in the map
data Constrained c f a = c a => Constrained (f a)
type Structure = TypeRepMap (Constrained Show C)

-- I intend to use this function to do the printing:
helper :: WrapTypeable (Constrained Show C) -> String
helper (WrapTypeable (Constrained c)) = show c

-- An example of structure I'll use in the repl below
ex_structure :: Structure
ex_structure = fromList
  [ WrapTypeable (Constrained (C 3   :: C Int))
  , WrapTypeable (Constrained (C 'a' :: C Char))]

Now let's test some expressions in the repl:

-- >>> :t ex_structure
-- ex_structure :: Structure

-- >>> toList ex_structure
-- [ e8fa2f3ef0863b7b3e43faa8f94caea7
-- , e8fa2f3ef0863b7b3e43faa8f94caea7
-- ]

Aside question: why are these two fingerprint equal? Is this normal?

You can calculate the type of this expression, but evaluating it crashes ghci:

-- >>> :t map helper (toList ex_structure)
-- map helper (toList ex_structure) :: [String]

-- >>> map helper (toList ex_structure)
-- CRASHES
-- /tmp/nix-shell-24560-0/rc: line 1: 25210 Segmentation fault      ghci

Do you know what causes this problem? Did I not understand how toList/fromList were supposed to be used?

Add an `Eq` instance on GHC 8.6+

GHC 8.6 offers quantified constraints, meaning we can try to do something like this:

instance (forall a. Typeable a => Eq (f a)) => Eq (TypeRepMap f)

Consider using `Array`

Right now we have

 , anys :: V.Vector Any

which under the hood is

data Vector a = Vector {-# UNPACK #-} !Int
                       {-# UNPACK #-} !Int
                       {-# UNPACK #-} !(Array a)
        deriving ( Typeable )

and I don't think we need the two Ints that come with it. We can cut the memory footprint by using Array directly.

Note that this is Array from the primitive package, not from the array package.

Stack Haddock

On a fresh master checkout, running stack haddock yields:

haddock: internal error: ./z-typerep-map-z-typerep-extra-impls-0.3.0/z-typerep-map-z-typerep-extra-impls.haddock: openBinaryFile: does not exist (No such file or directory)

It seems like there might be something wrong with the cabal file, having two libraries in the same repository? Not entirely sure, I'm not very familiar with cabal. However, unfortunately, this is a dependency of co-log, and as we're trying to use it, we can no longer generate haddock docs of our lib & dependencies since it crashes (with the above error as well) on this dependency.

Do you have any idea what might be wrong?

Offer an API instantiated to Identity

type TMap = TypeRepMap Identity

one :: Typeable a => a -> TMap
lookup :: Typeable a => TMap -> Maybe a
...

This could be useful to avoid wrapping/unwrapping.

toPairs :: TypeRepMap f -> [(SomeTypeRep, WrapTypeable f)]

Currently we can get the first half of this via keys and the second via toList, but I see no permanent guarantee that those have the same ordering (it's implied by the internal structure, but we're explicitly not supposed to rely on that).

Motivation: if your f includes an existential wrapper witnessing a typeclass, you can map across the elements of TypeRepMap f and generate a monomorphic result. Paired with the keys, this can become a regular Map. My immediate use case is a ToJSON instance:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
module Main where

import Control.Arrow ((***))
import qualified Data.Map as M
import Data.Aeson
import Data.Functor.Compose
import Data.Functor.Identity
import Data.TypeRepMap

-- With the proposed function, these imports go away
import Type.Reflection (SomeTypeRep(..))
import Data.TypeRepMap.Internal (toTriples, anyToTypeRep, wrapTypeable, fromAny)
import GHC.Types (Any)

-- proposed addition to Data.TypeRepMap
toPairs :: TypeRepMap f -> [(SomeTypeRep, WrapTypeable f)]
toPairs = map toPair . toTriples
  where
  toPair :: (a, Any, Any) -> (SomeTypeRep, WrapTypeable f)
  toPair (_, v, k) =
    ( SomeTypeRep (anyToTypeRep k)
    , wrapTypeable (anyToTypeRep k) (fromAny v)
    )

-- motivation
data Aesonic a where
  Aesonic :: ToJSON a => a -> Aesonic a

instance ToJSON (Aesonic a) where
  toJSON (Aesonic a) = toJSON a
  toEncoding (Aesonic a) = toEncoding a

type Aesonic1 f = Compose f Aesonic

wrapTypeableToJSON :: ToJSON1 f => WrapTypeable (Aesonic1 f) -> Value
wrapTypeableToJSON (WrapTypeable x) = toJSON1 $ getCompose x

instance ToJSON1 f => ToJSON (TypeRepMap (Aesonic1 f)) where
  toJSON = toJSON . M.fromList . fmap go . toPairs
    where go = show *** wrapTypeableToJSON @f

-- proof of concept
aesonic :: TypeRepMap Aesonic
aesonic = insert (Aesonic (5::Int)) $ one (Aesonic True)

lifted :: TypeRepMap (Aesonic1 Identity)
lifted = hoist (Compose . Identity) aesonic

main :: IO ()
main = print $ encode lifted
-- >>> main
-- "{\"Int\":5,\"Bool\":true}"

`CacheMap` lookup doesn't work properly with `ghc-8.0.2`

I've noticed that benchmarks were failing on the ghc-8.0.2. After some investigations, it was figured out that it could be happening because of some bugs in ghc-8.0.2.
I was testing it on a small example with the bigMap of size 10. It was failing on lookups for Proxy 1 and Proxy 4. cachedBinarySearch doesn't return the index on that queries, though it works fine on all other ghc versions.

Experiments are in this branch: https://github.com/kowainik/typerep-map/tree/vrom911/8.0.2-bug

I'm thinking of giving up the support of 8.0.2 and have 8.6 instead as soon as ghc-typelits-knownnat releases the newest version on Hackage (support of 8.6 is already in master).

Add 'adjust' and 'adjustWithKey'

I can implement it using hoistWithKey, but a dedicated implementation can be much more efficient (with a single lookup instead of an entire traversal).

Add 'unionWith' and 'union'

unionWith :: (forall x. f x -> f x -> f x) -> TypeRepMap f -> TypeRepMap f -> TypeRepMap f
union = unionWith const

Add Backpack interface for typerep-map

Especially useful when we will expose Map-based solution. This will also allow us to refactor tests and benchmarks. We don't need to implement all functions for every solution, for non-interesting one in the internal library we can just have error "Not supported by this implementation!".

Issues on refactoring benchmarks and tests will be open only after this issue is done.

Improve performance of `insert` and `delete` function

Currently performance of insert and delete is O(n log n) because those functions sort the list each time. But I guess it's possible to implement O(n). Though, it will be really painful.

Here are the steps:

  1. Transform from tree-layout back to ordered-array layout: O(n).
  2. Insert element to proper place by shifting array: O(n).
  3. Transform back to tree layout: O(n).

But this issue should wait until we have benchmarks for insert and delete implemented:

  • Blocked by: #56

hoistWithKey

hoistWithKey :: (forall x. Typeable x => f x -> g x) -> TypeRepMap f -> TypeRepMap g

Adding this function will require adding another vector that will store TypeReps:

data TypeRepMap (f :: k -> Type) =
  TypeRepMap
    { fingerprintAs :: {-# UNPACK #-} !(PrimArray Word64) -- ^ first components of key fingerprints
    , fingerprintBs :: {-# UNPACK #-} !(PrimArray Word64) -- ^ second components of key fingerprints
    , anys          :: {-# UNPACK #-} !(Array Any)        -- ^ values stored in the map
    , keys          :: {-# UNPACK #-} !(Array Any)        -- ^ typerep keys
}

In hoistWithKey we can access a corresponding element in keys, cast it to TypeRep a and then apply https://hackage.haskell.org/package/base-4.11.1.0/docs/Type-Reflection.html#v:withTypeable to get the Typeable instance.

I'm hoping it won't affect the performance of lookups: the cachedBinarySearch function will stay unchanged.

Implementation plan:

  1. add the keys field, check that the benchmarks show no change
  2. implement hoistWithKey. it should be probably available only on 8.2+ because it relies on the new TypeRep for withTypeable

Use cache-friendly binary search

You can try to use different implementation of binary search algorithm (though still very simple). You can use cache-friendly binary search:

This can improve performance of lookup. I guess you can create another module and implement it there. In order to implement this algorithm you need to perform two steps:

  1. Implement Vector initialization from sorted list.
  2. Implement cache-friendly binary search.

2 is simple, because algo itself is simple and there's C code in blog post. Though, 1 is a little bit more difficult...

  1. First, you need to sort list.
  2. Then you need to create possibly perfectly-balanced binary tree from this sorted list. Introduce your own
data Tree a = Leaf | Node a (Tree a) (Tree a)
  1. Convert Tree to list using level-order.
  2. Finally, just create Vector from this list.

So, probably initialization of Vector becomes slower. But AFAIU, this is not important.

The only drawback: if first halves of Fingerpint are equal, you can't just increment index by 1 i + 1 to search for the next key. But, assuming this is not important we can forget about this :) Or actually compare by second part of Fingerptint in case first part is equal. You can just print number of unique first halves of Fingeprints in benchmark first (using just print) to see how many different elements there.

Remove 'containers' from dependencies

To remove containers from dependencies we need to rewrite two functions:

  • unionWith
  • fromSortedList

unionWith

Rewrite this to something more efficient:

  1. Allocate new mutable vectors of length n+m.
  2. Traverse input vectors simultaneously, copying elements in the right order. We should be able to take advantage of the fact that the elements are in a particular order. For regular sorted vectors it's very simple, for our cache-friendly order it can be tricky, but should be possible (I hope)
  3. While doing this, count the actual amount of elements copied (it can be less than n+m due to conflicts). Shrink the mutable vector.
  4. Freeze and return.

fromSortedList

Rewrite into the following way:

  1. Allocate two mutable arrays of length n: one for original list and second one for result.
  2. Rewrite current algorithm inside monad and change array in-place.
  3. Freeze and return.

Explore different lookup algorithm: Bucket-based lookup (like in HashMap)

Currently lookup uses cache optimized binary search. Performance of this algorithm is already quite good! But we can explore different search algorithms using the fact that Fingerprint is basically Word128.

For example, we can introduce buckets of size 16 elements correspondingly to first 4 bits of Fingerprint. If we take fingerprint, we can look at first 4 bits of this fingerprint and decide in which bucket it should go. Then we can take second 4 bits and continue recursively. So we will have structure something like this (not final form):

data Tree
    = Empty
    | Leaf Int  -- index inside array of Any
    | Node (Array Tree)  -- array of size 16

In HashMap structure is more complicated (probably because it's more optimized) but we can specialize it to our case:

data HashMap k v
    = Empty
    | BitmapIndexed !Bitmap !(A.Array (HashMap k v))
    | Leaf !Hash !(Leaf k v)
    | Full !(A.Array (HashMap k v))
    | Collision !Hash !(A.Array (Leaf k v))

So in worst case we will walk for this tree on depth 32 (128 / 4) but assuming that Fingerprints are very random this worst-case scenario should never happen. And we can increase speed by increasing size of array from 16 to something like 128 but our structure will consume more memory (unless we use IntMap instead of Array).

Unpack fields

Instead of

data TypeRepMap (f :: k -> Type) = TypeRepMap
    { fingerprintAs :: Unboxed.Vector Word64
    , fingerprintBs :: Unboxed.Vector Word64
    , anys          :: V.Vector Any
    }

why not

data TypeRepMap (f :: k -> Type) = TypeRepMap
    { fingerprintAs :: {-# UNPACK #-} !(Unboxed.Vector Word64)
    , fingerprintBs :: {-# UNPACK #-} !(Unboxed.Vector Word64)
    , anys          :: {-# UNPACK #-} !(V.Vector Any)
    }

Module structure and documentation

  • It's not nice to send people to README.md in the package description, just copy the info
  • It's not nice to send the people to TypeRepMap from TMap, just copy the info
  • A better module structure: Data.TMap, Data.TypeRepMap, Data.TypeRepMap.Internal. In particular, things like fingerprintAs should be exported only from the .Internal module.
  • Module descriptions provide little info

I'll take care of this.

Add 'keys', improve Show instance

Now that we store TypeRep, there are some improvements that come to my mind:

  • in the Show instance, display actual TypeRep values instead of their fingerprints
  • add a function keys :: TypeRepMap f -> [SomeTypeRep]

Feature request: hoistM and toList/fromList

Hi, I would like to have additional helpers for the library:

-- Usecase: operations with map in custom monads
hoistM :: Applicative t => (forall x. f x -> t (g x)) -> TypeRepMap f -> t (TypeRepMap g) 

-- I don't sure that code below is the best design, but it should represent the main idea 
-- Usecase: serelisation/deserialisation for aeson 
newtype Wrapper f = Wrapper { unWrapper :: forall x . f x }
toList :: TypeRepMap f -> [Wrapper f]
fromList :: [Wrapper f] -> TypeRepMap f

Support GHC-8.8.1

TODO:

  • Update Travis config
  • Add -Wmissing-deriving-strategies to .cabal file ghc options

Export `containers`-based solution

Probably do it only after multiple public library support is added to cabal.

Also, maybe it will become possible to abstract interface with Backpack and remove code duplication in benchmarks and test-suites? 🤔

Bump up QuickCheck version

Current constraints:

  • QuickCheck >= 2.11.3 && < 2.13

But the latest version if 2.13 so we should relax constraints for typerep-map. If no code changes required then also just make a revision on Hackage. Needed for Stackage.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.