For a walkthrough of what this package provides with examples of common operations see the containers introduction.
See containers
on Hackage for
full API documentation.
See CONTRIBUTING.md.
Assorted concrete container types
Home Page: https://hackage.haskell.org/package/containers
For a walkthrough of what this package provides with examples of common operations see the containers introduction.
See containers
on Hackage for
full API documentation.
See CONTRIBUTING.md.
I am sorely tempted to use IntMap as a backing for some C-interop code. The foreign library uses 64-bit handles (guaranteed to be equivalent to int64_t
) but IntMap only guarantees Haskell's platform-sized Int
compatibility. I don't want to restrict my library to only operating on x86-64, so I will have to use HashMap. I doubt the cost will be substantial, but I thought you would like to know that it's a use case that users might like to use IntMap but need a certain sized integer guaranteed.
I admit I haven't looked into the IntMap
implementation to know if I've overlooked something.
The Applicative
operations allow the size of a tree to be calculated before it is built. We can take advantage of this by building the tree from the outside in, pushing deferred computations down the middle and offering efficient access to the ends. But then things break as soon as someone indexes or splits near the end of the sequence, because these operations calculate positions from left to right, rather than from the outside in. I suspect this is an artifact of the design in the paper being intended for arbitrary monoids, which may lack subtraction. Tree sizes have subtraction, so we should use it! Without any doubt, we should change this for index
, and I think we should probably also change it for splitAt
. I'm not sure about other operations.
I've created this ticket because I consider this quite important for the GHC 7.10.1 release
The current Foldable
class in GHC HEAD provides the methods shown below, however containers
does only implement a very small subset of those. The containers
release to be bundled with GHC 7.10.1 ought to address this issue (while staying compatible with previous GHCs, obviously)
class Foldable (t :: * -> *) where
Data.Foldable.fold :: Monoid m => t m -> m
foldMap :: Monoid m => (a -> m) -> t a -> m
foldr :: (a -> b -> b) -> b -> t a -> b
foldl :: (b -> a -> b) -> b -> t a -> b
foldr1 :: (a -> a -> a) -> t a -> a
foldl1 :: (a -> a -> a) -> t a -> a
-- since GHC 7.6 / base-4.6
Data.Foldable.foldr' :: (a -> b -> b) -> b -> t a -> b
Data.Foldable.foldl' :: (b -> a -> b) -> b -> t a -> b
-- since GHC 7.10 / base-4.8
Data.Foldable.toList :: Foldable t => t a -> [a]
null :: Foldable t => t a -> Bool
length :: Foldable t => t a -> Int
elem :: (Foldable t, Eq a) => a -> t a -> Bool
maximum :: (Foldable t, Ord a) => t a -> a
minimum :: (Foldable t, Ord a) => t a -> a
sum :: Num a => t a -> a
product :: (Foldable t, Num a) => t a -> a
-- Defined in ‘Data.Foldable’
If you don't have time to work on this yourself in time for the upcoming GHC 7.10.1 RC in a couple of weeks, please let me know.
The new zipWith
and mapWithFunction
, unlike the old versions, do not rebalance. Furthermore, I'm becoming more and more confident that my new idea for reimplementing <*>
will work out, and very well, but by its nature has the potential, under certain circumstances, to produce very badly balanced sequences. The combination of these factors suggests to me that we should implement a rebalance
function, which would be equivalent to this:
rebalance :: Seq a -> Seq a
rebalance xs = zipWith' (replicate (length xs) ()) xs
but would probably be written to use a splitBuild
function I've played around with a bit (a generalization of fromFunction
).
Hi,
The provided tree api seems a little small. After doing the usual workflow of defining my own version of, for example, subtreeat :: Eq a => a -> Tree a -> Maybe (Tree a)
, only to later find it defined somewhere at hackage, my question is:
Does it make sense to include some of the functions defined in:
http://hackage.haskell.org/package/debian-3.81/docs/Debian-Apt-Dependencies.html#g:3
and
http://hackage.haskell.org/package/hledger-lib-0.22.1/docs/Hledger-Utils.html#v:subtreeat
As functions of Data.Tree, or should that be left for a separate package?
My imediate suggestions would be:
treefilter :: (a -> Bool) -> Tree a -> Tree a
subtreeat :: Eq a => a -> Tree a -> Maybe (Tree a)
subtreeinforest :: Eq a => a -> [Tree a] -> Maybe (Tree a)
Or am I missing some easier way of avoiding these functions?
I propose to add GHC.Exts.IsList
instances for containers, something like:
instance (Ord k) => IsList (Map k v) where
type Item (Map k v) = (k,v)
fromList = Data.Map.fromList
toList = Data.Map.toList
I'm not sure if there's anything to be done about this, but I think the splitting approach may not be giving quite as much as I had hoped for zipping. In particular, I think that while we're getting O(n)
total times, as we should, we're probably getting O((log (min{i,n-i]))^2)
times for immediate access rather than O(log(min{i,n-i}))
. Specifically, using splitAt
to produce a naive "divide and conquer" implementation of index
seems to give times that look like log n + log (n/2) + ...
, which comes out to around (log n)^2
. This is not bad, and it's certainly better than the previous O(n)
, but it's not what I'd hoped for.
The total times are still fine because, although I'm too tired to understand why right now,
log n + 2*log (n/2) + 4* log (n/4) + 8*log (n/8) + ... + 0
seems to be bounded above (by 2 for base 2 logarithms).
If these half-asleep calculations are right, I need to fix the commentary for splitMap
.
I'm currently using it to implement enumFromTo
-like functions for Seq
, which demonstrate the same sort of performance advantage for incremental use as my proposed zipWith
change. splitTraverseSeq
would also be great for conversion from Array
and Vector
types. As it appears to be generally useful, I think we should export it in some form. We probably don't want to export a Splittable
class, however, so we'll need to do something to work around that.
It just seems weird to have foldrWithIndex
and foldlWithIndex
but not foldMapWithIndex
or traverseWithIndex
. If I figure out the right way to implement the strict folds, I may want indexed versions of those too.
Elem
, Node
, Digit
, and FingerTree
are all instances of Foldable
, but we currently only use their foldMap
, foldr
, and foldl
methods. GHC does not know this, and automatically generates default code for all the unused methods. I see three ways around this:
foldrElem
, foldrNode
, etc. instead of making these Foldable
instances. This is sane and makes some things easier to understand, but it's a bit awkward.Foldable
for these types.undefined
.When I try to compile Data.Sequence
, I now get an error that it can't find containers.hs
.
I'd like an operation that does a lookup, and if the element isn't there does an insert. The crucial difference compared to insertLookupWithKey is that if the key is found there is no need to rebuild the tree. So it could have a type like
insertOrLookup :: k -> a -> Map k a -> Either (Map k a) a
It should be implementable quite efficiently by doing a CPS of the function.
The documentation for Data.IntMap.deleteMin says "An error is thrown if the IntMap is already empty. Note, this is not the same behavior Map. " But this is no longer correct as of containers-0.5; no error is thrown. I suggest that this sentence is replaced by "Returns an empty map if the map is empty. ", the same documentation as for Data.Map.
The documentation states, the implementation of Map is left-biased. I think, this property shall hold for values and keys. For example:
module Main where
import Data.Function (on)
import Data.Map.Lazy
main :: IO ()
main = do
print
$ insert (OrdOnFst ('a', 'b')) ()
$ singleton (OrdOnFst ('a', 'a')) ()
print
$ alter id (OrdOnFst ('a', 'b'))
$ singleton (OrdOnFst ('a', 'a')) ()
newtype OrdOnFst a b = OrdOnFst {unOrdOnFst :: (a, b)} deriving Show
instance Eq a => Eq (OrdOnFst a b) where
(==) = (==) `on` (fst . unOrdOnFst)
instance Ord a => Ord (OrdOnFst a b) where
compare = compare `on` (fst . unOrdOnFst)
actual output:
fromList [(OrdOnFst {unOrdOnFst = ('a','b')},())]
fromList [(OrdOnFst {unOrdOnFst = ('a','a')},())]
expected output:
fromList [(OrdOnFst {unOrdOnFst = ('a','b')},())]
fromList [(OrdOnFst {unOrdOnFst = ('a','b')},())]
In this example, insert
works left-biased as expected, but alter
does not. By looking at the source code of Data.Map.Base
, I suspect there are other functions suffering from the same problem, e.g., update
and updateWithKey
.
I'm writing this issue to inform you of the latest warnings visible when building w/ GHC HEAD (7.7):
libraries/containers/Data/Set/Base.hs:1187:1: Warning:
Local definition of ‛join’ clashes with a future Prelude name - this will become an error in GHC 7.10, under the Applicative-Monad Proposal.
libraries/containers/Data/IntSet/Base.hs:1165:1: Warning:
Local definition of ‛join’ clashes with a future Prelude name - this will become an error in GHC 7.10, under the Applicative-Monad Proposal.
libraries/containers/Data/IntSet/Base.hs:1196:17: Warning:
In the use of ‛bitSize’ (imported from Data.Bits):
Deprecated: "Use bitSizeMaybe or finiteBitSize instead"
libraries/containers/Data/Graph.hs:287:10: Warning:
‛SetM’ is an instance of Monad but not Applicative - this will become an error in GHC 7.10, under the Applicative-Monad Proposal.
There is a list of problems with the current Map
and IntMap
modification interfaces:
Because of the above I have very often found myself in requirement for the following function:
withItem ::
(Ord k) =>
k ->
(Maybe i -> (r, Maybe i)) ->
Map k i -> (r, Map k i)
withItem k f m =
let
item = Map.lookup k m
(r, item') = f item
m' = Map.update (const item') k m
in (r, m')
It covers all the imaginable scenarios of modification operations: delete, update, replace, - yet it also provides one with ability to extract the modified data and not only. The problem is that this implementation involves a repeated lookup for the same item: first with lookup
, then with update
- but the containers library exposes no functionality to get around that. So I suggest to implement an efficient version of withItem
in the library.
This function turns out to be far more generalized than any of the currently present in the library, so it can become a basic building block for all sorts of modifying functions, including all the already existing ones, e.g.:
alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a
alter f k = snd . withItem k (\i -> ((), f i))
delete :: Ord k => k -> Map k a -> Map k a
delete k = snd . withItem k (const ((), Nothing))
updateLookupWithKey ::
(Ord k) =>
(k -> a -> Maybe a) ->
k ->
Map k a -> (Maybe a, Map k a)
updateLookupWithKey f k =
withItem k $ \i -> case i of
Just i -> case f k i of
Nothing -> (Just i, Nothing)
Just i' -> (Just i', Just i')
_ -> (Nothing, Nothing)
You can see how easy it makes to achieve any sort of specialized functionality. So, besides the evident benefits, this function can also become a replacement for a whole list of confusing specialized ones, thus greatly lightening the library.
You might have also noticed how this function is based around the standard a -> (b, a)
pattern of the State
monad, thus making it easily composable with it using the state
and runState
functions.
Summarizing, my suggestions are:
withItem
for lazy and strict versions of Map
and IntMap
.updateLookupWithKey
example implementation shows how this change can be benefitial.insertWith
, insertWithKey
, insertLookupWithKey
, adjust
, adjustWithKey
, update
, updateWithKey
, updateLookupWithKey
, alter
.GHC.Base
decrees
{-# RULES "map/coerce" map coerce = coerce #-}
This rule applies equally well to any legitimate Functor
instance (any Functor
instance that obeys the functor laws), including various ones in containers
.
In effect, we have two static arguments, mappend
and mempty
, that we pass down the FingerTree
spine in a dictionary. The paper about the static argument transformation suggests that as soon as there are two static arguments, pulling them out tends to improve performance. I fear, however, that we may not be able to do so without ScopedTypeVariables
. Barring that, we should think again about whether we should define foldMap
or just use the foldr
definition, and do some benchmarking. We should also consider the merits of an independent fold
definition.
This package currently offers Map, IntMap, Set, IntSet, Seq, and Tree implementations. As far as I can tell, none of these depend on each other except that Tree depends on Seq, and there is no significant core of shared code. Splitting the package and repo should, I think, make it faster to build/upgrade new/experimental versions since you'd only be compiling the container that changed instead of rebuilding the whole lot of them.
These functions are sorely missing from the Set API, but are present in the Map API. Is there a reason why these functions aren't implemented?
Hi,
I wanted to use drawTree
on a Tree Int
but it does not work for the drawing functions only accept Tree String
instances. In my opinion this is too strict and can easily be generalized to Show s => Tree s
. I made that change in a fork here: jonasc@e650dda
I would be happy to create a pull request for this commit.
Data.Set
lacks any analogues to the Traversable
operations.
Obviously, the current Data.Map.Strict
and Data.IntMap.Strict
interfaces need to stick around for backwards compatibility, but they seem to be fundamentally broken:
Functor
, Traversable
, and (apparently) Data
instances don't match their claimed strictness, because they share the same type as the lazy versions.Strict
modules and getting a map from someone else has to assume the map might contain suspensions and therefore forcing its elements could generate exceptions or fail to terminate.I think there are two sensible ways to make really strict versions:
Wrap up the map implementations in newtype
s and give them their own instances as appropriate. This is certainly the simplest approach. Its advantages:
strictToLazy
function (unwrapping the newtype
), a simple lazyToStrict
function based on deepseq
, and a cheap unsafeLazyToStrict
function (the newtype
wrapper).Write completely new strict map types with strict fields. I imagine the function implementations, in most or all cases, would have exactly the same text as the lazy ones, so a bit of CPP hackery could probably reduce the source bloat, but not the binary bloat. Its advantages:
It seems to me that the conservative approach would be pretty easy to swallow.
There are indexed functions for lookupIndex
, findIndex
, deleteIndex
and elemAt
in Data.Set
as well as Data.Map
(with additional updateAt
). Please add similar functions to Data.IntSet
and Data.IntMap
if possible. I'd help me considerably. Thank you.
At present, figuring out whether an index is left, middle, or right may require inspecting (and perhaps forcing) up to eight different Elem
or Node
values, which could be anywhere in the heap. If we add a size field to each Digit
constructor, we can skip all of these. The downside, of course, is an extra word per Digit
, which I conjecture is an entirely reasonable price to pay.
It would be super handy to get some instances on the Data.Graph.SCC type
Read,Show,Eq,Traversable,Foldable
if possible
Typeable,Data,Generic
containers/Data/Set/Base.hs defines foldlStrict
, which looks an awful lot like the usual foldl'
. Is there a reason for using this rather than the one from Data.List
or Data.Foldable
? If so, that should be documented in the source. If not, that should be changed.
Hello,
I have recently stumbled upon some code that indicated that Data.Map.insert
has linear time complexity (w.r.t the number of elements in the map).
I have created this criterion benchmark (compiled as ghc --make -O2 SetInsert.hs
) and the results (included) seem to validate my concerns. I would not rule out the possibility that the benchmark is bad or that there is a bug in criterion.
My system: Linux kernel 3.8.6-1, GHC 7.6.2, containers 0.5.2.1
Data.Sequence
, at least, uses its own Id
type. This looks like it's about the same as Identity
from Data.Functor.Identity
in base
. Is there a reason not to use that? If not, I can give it a whirl.
I just tried updating GHC's container submodule to 0.5.6.0 (b9e4e22) but now a couple of testcases in GHC's testsuite simply hang (and get killed by timeouts)
Here's an excerpt:
=====> sequence001(normal) 4362 of 4363 [0, 10, 1]
cd ../../libraries/containers/tests-ghc && '/home/hvr/Haskell/GHC/ghc/bindisttest/install dir/bin/ghc' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-warn-tabs -fno-ghci-history -o sequence001 sequence001.hs -package containers >sequence001.comp.stderr 2>&1
cd ../../libraries/containers/tests-ghc && ./dataintset001 </dev/null >dataintset001.run.stdout 2>dataintset001.run.stderr
cd ../../libraries/containers/tests-ghc && ./sequence001 </dev/null >sequence001.run.stdout 2>sequence001.run.stderr
Wrong exit code (expected 0 , actual 1 )
Stdout:
Stderr:
sequence001: <<loop>>
*** unexpected failure for sequence001(normal)
Timeout happened...killing process...
Wrong exit code (expected 0 , actual 99 )
Stdout:
11
Stderr:
*** unexpected failure for T3333(normal)
[Errno 21] Is a directory: './ghci/linking/.'
[Errno 22] Invalid argument: './ghci/linking/.'
and here's the summary
Unexpected results from:
TEST="print026 package01e ghci037 sequence001 T706 T3333 ghcilink002 ghcilink003 ghcilink004 ghcilink005 ghcilink006 T8333 ghciprog004 hpc_ghc_ghci T3171 T9872b T9872c T9872a T5837"
OVERALL SUMMARY for test run started at Mon Dec 15 23:59:09 2014 CET
0:13:22 spent to go through
4363 total tests, which gave rise to
16859 test cases, of which
12834 were skipped
55 had missing libraries
3884 expected passes
67 expected failures
1 caused framework failures
0 unexpected passes
15 unexpected failures
4 unexpected stat failures
Unexpected failures:
../../libraries/containers/tests-ghc sequence001 [bad exit code] (normal)
../../libraries/hpc/tests/ghc_ghci hpc_ghc_ghci [bad exit code] (normal)
driver T706 [bad exit code] (normal)
ghci.debugger/scripts print026 [bad exit code] (ghci)
ghci/linking T3333 [bad exit code] (normal)
ghci/linking ghcilink002 [bad exit code] (normal)
ghci/linking ghcilink003 [bad exit code] (normal)
ghci/linking ghcilink004 [bad exit code] (normal)
ghci/linking ghcilink005 [bad exit code] (normal)
ghci/linking ghcilink006 [bad exit code] (normal)
ghci/prog004 ghciprog004 [bad exit code] (normal)
ghci/scripts ghci037 [bad exit code] (normal)
ghci/should_run T3171 [bad exit code] (normal)
package package01e [stderr mismatch] (normal)
th T8333 [bad exit code] (normal)
Unexpected stat failures:
perf/compiler T5837 [stat not good enough] (normal)
perf/compiler T9872a [stat not good enough] (normal)
perf/compiler T9872b [stat not good enough] (normal)
perf/compiler T9872c [stat not good enough] (normal)
Reasonably recent GHC versions support GADTs, which allow class constraints on constructors. This suggests something like the following type for Set
:
data Set a where
Bin :: Ord a => {-# UNPACK #-} !Size -> !a -> !(Set a) -> !(Set a) -> Set a
Tip :: Set a
I would speculate that a little CPP could let Set
use this definition or something similar (and the O(log n) implementation of Data.Foldable.elem
that goes with it) where supported with minimal internal or external breakage.
After a bit of prodding, Ross Paterson kindly offered a technique for building a Seq
with a predetermined size. This should probably allow us to implement <*>
more efficiently, avoiding the incremental rebuilding currently used. I'm going to try to do that, but I'm opening this ticket to keep track.
I have tried, but I have been unable to condense this into a small example.. so please forgive me for describing what's going on in Cloud Haskell instead. I have the following function (https://github.com/haskell-distributed/distributed-process/blob/ceb7e5ae9f617da20f8e0920f0630a35849bf4fa/distributed-process/src/Control/Distributed/Process/Node.hs#L505):
ncEffectDied :: Identifier -> DiedReason -> NC ()
ncEffectDied ident reason = do
(affectedLinks, unaffectedLinks) <- gets (splitNotif ident . (^. links))
(affectedMons, unaffectedMons) <- gets (splitNotif ident . (^. monitors))
-- Do stuff with affectedLinks, affectedMons
modify' $ (links ^= unaffectedLinks) . (monitors ^= unaffectedMons)
NC
is a state monad around IO
. So we get the state from the state monad, extract two maps, partition those maps using splitNotif
(see below), then do something with the first half and finally store the other half back in the state (links
and monitors
are Accessor
s). Some important remarks:
The state is defined as
data NCState = NCState
{
_links :: !(Map Identifier (Set ProcessId))
, _monitors :: !(Map Identifier (Set (ProcessId, MonitorRef)))
, ...
}
where, importantly, we have strict references to the maps from the state.
The maps do not contain Map
s themselves.
We use modify'
to update the state, which is a custom, strict, version of the standard modify
:
modify' :: MonadState s m => (s -> s) -> m ()
modify' f = StateT.get >>= \s -> StateT.put $! f s
Now here's the rub. splitNotif
is defined as
splitNotif ident = Map.partitionWithKey (\k !v -> impliesDeathOf ident k)
With this definition, my code runs in constant space. If however I remove the bang on v
so that the predicate is no longer strict in the map values, I end up with a Map
-typed space leak. I don't understand why this happens, nor have I been able to reproduce this in smaller test cases (though Ian speculated that that might be because the GC could optimize the smaller test cases but not the bigger one).
I don't know if this is a bug in partitionWithKey
or not -- either way, if you understand why this happens, I'd really like to know.
The haddocks for IntMap
(including the unreleased 0.5.0.0
version) claim that the runtime complexity of (for example) findMin
is O(log n). The implementation looks to me like O(min(W, log n)), since it's just a direct downwards traversal of the tree until a leaf is found. Are the docs wrong or am I missing something?
Currently, fromList
just snocs list elements on one by one. If we have Deep n pr m (One a)
and we're appending elements one at a time, we first build a Two
, and then a Three
, and then finally a Four
constructor. I imagine we're losing more just with this shuffle than we're (potentially) gaining with foldr/build
fusion. I have to wonder if there's a better way to build a sequence from left to right. We could try something simple to begin with, like maybe
fromList = Seq . fromList' empty where
fromList' xs [] = xs
fromList' (Deep n pr m (One a)) xs =
case xs of
(x1:x2:x3:xs') -> fromList' (Deep (n+3) pr (m `snocTree` (Node3 3 a (Elem x1) (Elem x2))) (One (Elem x3))) xs'
[x1,x2] -> Deep (n+2) pr (m `snocTree` (Node2 2 a (Elem x1))) (One (Elem x2))
[x] -> Deep n pr m (Two a (Elem x))
fromList' (Deep n pr m (Two a b)) xs =
case xs of
(x1:x2:xs') -> fromList' (Deep (n+2) pr (m `snocTree` (Node3 3 a b (Elem x1))) (One (Elem x2))) xs'
....
In base 4.8, (*>)
will be implemented (by default) using (<$)
, for which Seq
, at least, has an optimized implementation. But this won't help (>>)
unless it's defined explicitly.
I think it would be useful to have a function similar to Data.Array.IArray.accum, where the target type isn't the same as the input type.
fromListWithDefault :: Ord k => (a' -> a -> a') -> a' -> [(k, a)] -> Map k a'
An obvious application would be
Data.Map.fromListWithDefault (:) [] [(1,2), (1,3), (2,3)]
instead of
let f (a,b) = (a,[b]) in Data.Map.fromListWith (++) $ fmap f [(1,2), (1,3), (2,3)]
In Data.Set, can member and nonMember please be declared infix 4, like Data.List.elem?
I guess the same goes for IntSet, Map, and IntMap as well.
Also, binary set operations such as union and intersection should be declared infixl 5, analogous to (++) (but associating to the left because of the hedge union algorithm).
Rationale: For lists, one can write
a+b
elemc++d && e+f
elem g++h
.
Analogously, the same notation should be possible for sets:
a+b
Set.memberc
Set.uniond && e+f
Set.memberg
Set.union h
.
Suggested fix: in Data.Set, Data.IntSet, Data.Map, Data.IntMap:
infix 4 member
infix 4 nonMember
infixl 5 union
infixl 5 intersection
infix 5 difference
Thanks!
I've noticed that two data structures are still lacking NFData instances, namely Data.Sequence.Seq
and Data.Graph.SCC
.
The polymorphic recursion in Data.Sequence
makes it hard to modify, limits flexibility in certain ways, and appears to confuse GHC's optimizer. One possible alternative is to switch from a nested type representation to a GADT one, using phantom types to keep track of balance information. I've converted <>
, viewl
, viewr
, <|
, |>
, index
, fmap
, foldMap
, traverse
, and splitAt
to use such a representation. You can see the result here: https://github.com/treeowl/GADTSeq/blob/master/Data/Sequence.hs
I believe Data.IntMap
should be able to support
fromFunction :: Int -> Int -> (Int -> a) -> IntMap a
where the first two arguments bound the domain. Alternatively, there might be some fancy way to use a domain made up of multiple ranges.
Looking at http://hackage.haskell.org/package/containers-0.5.2.1/docs/src/Data-Set-Base.html, I read:
-- [Note: Order of constructors]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- The order of constructors of Set matters when considering performance.
-- Currently in GHC 7.0, when type has 2 constructors, a forward conditional
-- jump is made when successfully matching second constructor. Successful match
-- of first constructor results in the forward jump not taken.
-- On GHC 7.0, reordering constructors from Tip | Bin to Bin | Tip
-- improves the benchmark by up to 10% on x86.
In #ghc:
nh2: thoughtpolice: another question about what you said though: if I only have two constructors, why do we need to "check" the second branch at all?
nh2: if there are two things and it's not the first one then sure it's the second one, isn't it?
Ghoul__: nh2: obviously that's optimized away
nh2: Ghoul__: not sure if I understand - those comments in containers sound like it isn't
Ghoul__: nh2: that optimization will be blazingly obvious at cmm
Ghoul__: I'd be really really surprised if it's not done
nh2: Ghoul__: well then either you will be surprised or what Data.Set says is wrong and the comment about the 10% performance increase is not accurate / outdated
thoughtpolice: OK, so i'm not privvy on the details today exactly, but pattern matching is actually not really an if statement - it's a jump table. but i imagine this can have an impact on performance in certain cases if we can scrutinize on a fast path
thoughtpolice: the easiest way is to just try it and run the benchmarks IMO. the new code generator may have changed things
leroux: thoughtpolice: nofib?
thoughtpolice: no, the benchmarks for containers
It would be great to find out whether this optimization is (still?) not done by GHC / if the 10% speed-up still occur. If it is, we can add it to the comment; if not then we probably have a cheap opportunity to make a lot of programs faster.
><
, replicate
, replicateA
, *>
, and maybe other operations are all capable of making sequences that are like the Tardis: bigger on the inside. Any of these (especially when iterated) can easily make a sequence whose size cannot be represented by an Int
, at which point nothing will work right, and operations will either produce crazy errors or silently give wrong answers. The solution is to check for this in all operations that expand sequences, from <|
on up. While we're at it, we may also want to switch from Int
to Word
size representations internally.
Would it make sense to add this?
Analog to insertLookupWithKey
from Data.Map, saving one traversal.
insertLookup :: Ord a => a -> Set a -> (Bool, Set a)
or
insertLookup :: Ord a => a -> Set a -> Maybe (Set a)
As is often the case, different definitions of splitAt
have different balances of useful laziness and efficiency in strict contexts. I would argue that the definition in Data.Sequence.splitAt
is inefficient but does not offer really useful laziness.
Data.Sequence.splitAt
has semantics that initially look similar to the ones the Haskell Report specifies for Data.List.splitAt
:
splitAt n xs = (take n xs, drop n xs)
This immediately implies that splitAt undefined undefined
is (undefined, undefined)
rather than undefined
, which is a little bit lazy, but not really usefully so.
The Report does not directly specify the laziness of take
or drop
, but the standard Prelude definition of take
is lazy in the list argument:
take n _ | n <= 0 = []
This, however, is not the case for Data.Sequence.take
:
take i = fst . splitAt i
splitAt i (Seq xs) = (Seq l, Seq r)
where (l, r) = split i xs
split :: Int -> FingerTree (Elem a) ->
(FingerTree (Elem a), FingerTree (Elem a))
split i Empty = i `seq` (Empty, Empty)
split i xs
| size xs > i = (l, consTree x r)
| otherwise = (xs, Empty)
where Split l x r = splitTree i xs
In particular,
take 0 undefined = undefined
Note also that the above definition does not make i <= 0
a special case, so if fakefoo = drop 0 foo
,
reallyUnsafePtrEquality# foo fakefoo = 0#
GHC's version of Data.List.splitAt
does something a bit non-standard with Data.List.splitAt
: it is strict in its Int
argument but lazy in its list argument, so
splitAt undefined [] = undefined
but
splitAt 0 undefined = ([], undefined)
The Report semantics strike me as a bit silly, since laziness in the split point doesn't really seem to help in any way I can personally think of. I think we should consider just using the GHC semantics:
splitAt i s@(Seq xs)
| i <= 0 = (empty, s)
| otherwise = case xs of
Empty -> (empty, empty)
_ | i < size xs -> case splitTree i xs of
Split l x r -> (Seq l, Seq (consTree x r))
| otherwise -> (s, empty)
The type of Set.map
is (Ord a, Ord b) => (a -> b) -> Set a -> Set b
but the Ord a
constraint is unused. I suggest that its type should be changed to Ord b => (a -> b) -> Set a -> Set b
. It will make the function easier to reason about and may possibly improve performance.
I know the function is deprecated, so this is mostly just here as a note:
Before 0.5, Data.Map.insertWith' didn't force the arguments to the function before applying it - just the result of applying the function. In 0.5, both the arguments and the result of the function are forced.
I would expect, that mappend is also used for overlapping values:
ghci Data.Map Data.Monoid> Sum 5 <> Sum 3
Sum {getSum = 8}
ghci Data.Map Data.Monoid> fromList [(42, Sum 5)] <> fromList [(42, Sum 3)]
fromList [(42,Sum {getSum = 5})]
ghci Data.Map Data.Monoid> Product 5 <> Product 3
Product {getProduct = 15}
ghci Data.Map Data.Monoid> fromList [(1, Product 5)] <> fromList [(1, Product 3)]
fromList [(1,Product {getProduct = 5})]
ghci Data.Map Data.Monoid> "Hello" <> " World"
"Hello World"
ghci Data.Map Data.Monoid> fromList [("en", "Hello"), ("de", "Hallo")] <> fromList [("en", " world"), ("de", " Welt")]
fromList [("de","Hallo"),("en","Hello")]
Is there any reason, why this Monoid instance was implemented this way?
Prelude> import Data.Map.Lazy as L
Prelude L> import Data.Map.Strict as S
Prelude L S> let m = S.fromList [(1, ())]
Prelude L S> S.size $ fmap undefined m
1
Prelude L S> S.size $ S.map undefined m
*** Exception: Prelude.undefined
As there shoule only be one instance for a given type, it is not possible to make the the functor instance for Data.Map.Strict to be strict without newtyping. Is there a reason for this behavior?
Sometimes, as in a replicate
-optimized Seq
, m >> n
or m *> n
will be much better than m >>= \_ -> n
; in these cases, GHC will not know that they are the same. We can add GHC rewrite rules to attempt to do the right thing as appropriate.
See also #62
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.