haskell / containers

Assorted concrete container types

Home Page: https://hackage.haskell.org/package/containers

Haskell 99.42% Makefile 0.14% Perl 0.05% Shell 0.01% Python 0.27% Batchfile 0.07% C 0.04%

containers's People

Contributors

alexfmpe, andreaspk, bodigrim, donsbot, dreixel, ekmett, ericson2314, foxik, hvr, igfoo, int-e, josephcsible, jwaldmann, konsumlamm, m-renaud, mchakravarty, meooow25, mikeplus64, nomeata, oisdk, peti, phadej, rrnewton, ryanglscott, sjakobi, tibbe, tomjaguarpaw, totbwf, treeowl, wrengr


containers's Issues

A single general modification function for Map and IntMap proposal

There are several problems with the current Map and IntMap modification interfaces:

  • they are filled with quirky, overly specialized functions
  • functions with the same name do not behave consistently across the modules
  • they still don't cover some important use cases

Because of the above, I have very often found myself needing the following function:

withItem ::
  (Ord k) =>
  k ->
  (Maybe i -> (r, Maybe i)) ->
  Map k i -> (r, Map k i)
withItem k f m = 
  let
    item = Map.lookup k m
    (r, item') = f item
    m' = Map.alter (const item') k m  -- alter (not update) so insertion works too
  in (r, m')

It covers all the imaginable modification scenarios (delete, update, insert, replace), and it also lets the caller extract a result from the modification. The problem is that this implementation looks up the same key twice: first with lookup, then again with alter. The containers library exposes no functionality to get around that, so I suggest implementing an efficient version of withItem in the library.

This function turns out to be far more general than any currently present in the library, so it can become a basic building block for all sorts of modifying functions, including all the existing ones, e.g.:

alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a
alter f k = snd . withItem k (\i -> ((), f i))

delete :: Ord k => k -> Map k a -> Map k a
delete k = snd . withItem k (const ((), Nothing))

updateLookupWithKey :: 
  (Ord k) => 
  (k -> a -> Maybe a) -> 
  k -> 
  Map k a -> (Maybe a, Map k a)
updateLookupWithKey f k = 
  withItem k $ \i -> case i of
    Just i -> case f k i of
      Nothing -> (Just i, Nothing)
      Just i' -> (Just i', Just i')
    _ -> (Nothing, Nothing)

You can see how easy it becomes to implement any sort of specialized functionality. So, besides the evident benefits, this function could also replace a whole list of confusing specialized ones, greatly lightening the library.

You might also have noticed that this function is built around the standard s -> (a, s) pattern of the State monad, which makes it easy to compose with State code using the state and runState functions.
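For illustration, a minimal sketch of that composition, assuming the withItem above is in scope (State and state come from the transformers/mtl packages, not from containers):

import Control.Monad.State (State, state)
import Data.Map (Map)

-- Wrap withItem as a State action over the map.
withItemState :: Ord k => k -> (Maybe i -> (r, Maybe i)) -> State (Map k i) r
withItemState k f = state (withItem k f)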

Summarizing, my suggestions are:

  1. Implement an efficient version of withItem for lazy and strict versions of Map and IntMap.
  2. Change the order of parameters from "lambda -> key" to "key -> lambda". The updateLookupWithKey example implementation above shows how this change can be beneficial.
  3. Begin the deprecation process of the following functions: insertWith, insertWithKey, insertLookupWithKey, adjust, adjustWithKey, update, updateWithKey, updateLookupWithKey, alter.

Try to improve <*> for Seq

After a bit of prodding, Ross Paterson kindly offered a technique for building a Seq with a predetermined size. This should probably allow us to implement <*> more efficiently, avoiding the incremental rebuilding currently used. I'm going to try to do that, but I'm opening this ticket to keep track.

Optimize (>>) for `Seq`

In base 4.8, (*>) will be implemented (by default) using (<$), for which Seq, at least, has an optimized implementation. But this won't help (>>) unless it's defined explicitly.
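Concretely, the explicit definition meant here is a single method in Seq's Monad instance; shown as a fragment, not a complete instance:

  (>>) = (*>)  -- with the base-4.8 default, (*>) goes through Seq's optimized (<$)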

Add IsList instances

I propose to add GHC.Exts.IsList instances for containers, something like:

instance (Ord k) => IsList (Map k v) where
  type Item (Map k v) = (k,v)
  fromList = Data.Map.fromList
  toList   = Data.Map.toList
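With such an instance and the OverloadedLists extension, list literals can be used directly at Map type. A hypothetical usage sketch, assuming the instance above exists:

{-# LANGUAGE OverloadedLists #-}

import Data.Map (Map)

-- The list literal is desugared through IsList's fromList.
m :: Map Char Int
m = [('a', 1), ('b', 2)]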

Add actually-strict versions of `Map` and `IntMap`

Obviously, the current Data.Map.Strict and Data.IntMap.Strict interfaces need to stick around for backwards compatibility, but they seem to be fundamentally broken:

  1. Their Functor, Traversable, and (apparently) Data instances don't match their claimed strictness, because they share the same type as the lazy versions.
  2. For the same reason, anyone using the Strict modules and getting a map from someone else has to assume the map might contain suspensions and therefore forcing its elements could generate exceptions or fail to terminate.

I think there are two sensible ways to make really strict versions:

Conservative

Wrap up the map implementations in newtypes and give them their own instances as appropriate. This is certainly the simplest approach (a minimal sketch follows the list below). Its advantages:

  1. Very little new code is needed.
  2. The package source and binary grow only slightly.
  3. It costs almost no extra time to maintain.
  4. It supports a trivial strictToLazy function (unwrapping the newtype), a simple lazyToStrict function based on deepseq, and a cheap unsafeLazyToStrict function (the newtype wrapper).
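A minimal sketch of the conservative approach, with illustrative (not proposed) names; NFData and deepseq come from the deepseq package:

import Control.DeepSeq (NFData, deepseq)
import Data.Map (Map)

newtype StrictMap k a = StrictMap (Map k a)

-- Unwrapping is free.
strictToLazy :: StrictMap k a -> Map k a
strictToLazy (StrictMap m) = m

-- Cheap but unchecked: the caller promises the values are already forced.
unsafeLazyToStrict :: Map k a -> StrictMap k a
unsafeLazyToStrict = StrictMap

-- Force everything before wrapping.
lazyToStrict :: (NFData k, NFData a) => Map k a -> StrictMap k a
lazyToStrict m = m `deepseq` StrictMap m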

Radical

Write completely new strict map types with strict fields. I imagine the function implementations, in most or all cases, would have exactly the same text as the lazy ones, so a bit of CPP hackery could probably reduce the source bloat, but not the binary bloat. Its advantages:

  1. There's no need to carefully force everything by hand, and hope you didn't miss anything, because the data constructors will do the job themselves.
  2. Any retrieved element is known to be forced, and GHC can probably take advantage of that to get a minor efficiency gain. It may or may not be able to improve strictness analysis and unbox more things.

It seems to me that the conservative approach would be pretty easy to swallow.

implementation of Map is not always left-biased for keys

The documentation states that the implementation of Map is left-biased. I think this property should hold for both values and keys. For example:

module Main where

import Data.Function (on)
import Data.Map.Lazy


main :: IO ()
main = do
  print
    $ insert    (OrdOnFst ('a', 'b')) ()
    $ singleton (OrdOnFst ('a', 'a')) ()
  print
    $ alter id  (OrdOnFst ('a', 'b'))
    $ singleton (OrdOnFst ('a', 'a')) ()


newtype OrdOnFst a b = OrdOnFst {unOrdOnFst :: (a, b)} deriving Show

instance Eq a => Eq (OrdOnFst a b) where
  (==) = (==) `on` (fst . unOrdOnFst)

instance Ord a => Ord (OrdOnFst a b) where
  compare = compare `on` (fst . unOrdOnFst)

actual output:

fromList [(OrdOnFst {unOrdOnFst = ('a','b')},())]
fromList [(OrdOnFst {unOrdOnFst = ('a','a')},())]

expected output:

fromList [(OrdOnFst {unOrdOnFst = ('a','b')},())]
fromList [(OrdOnFst {unOrdOnFst = ('a','b')},())]

In this example, insert is left-biased as expected, but alter is not. Looking at the source code of Data.Map.Base, I suspect other functions suffer from the same problem, e.g., update and updateWithKey.

Monoid instance for Map acts strangely

I would expect that mappend is also used to combine overlapping values:

ghci Data.Map Data.Monoid> Sum 5 <> Sum 3
Sum {getSum = 8}
ghci Data.Map Data.Monoid> fromList [(42, Sum 5)] <> fromList [(42, Sum 3)]
fromList [(42,Sum {getSum = 5})]

ghci Data.Map Data.Monoid> Product 5 <> Product 3
Product {getProduct = 15}
ghci Data.Map Data.Monoid> fromList [(1, Product 5)] <> fromList [(1, Product 3)]
fromList [(1,Product {getProduct = 5})]

ghci Data.Map Data.Monoid> "Hello" <> " World"
"Hello World"
ghci Data.Map Data.Monoid> fromList [("en", "Hello"), ("de", "Hallo")] <> fromList [("en", " world"), ("de", " Welt")]
fromList [("de","Hallo"),("en","Hello")]

Is there any reason why this Monoid instance was implemented this way?
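For reference, the element-combining behavior expected above is already available explicitly through unionWith:

ghci Data.Map Data.Monoid> unionWith (<>) (fromList [(42, Sum 5)]) (fromList [(42, Sum 3)])
fromList [(42,Sum {getSum = 8})]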

Consider adding fromFunction to Data.IntMap

I believe Data.IntMap should be able to support

fromFunction :: Int -> Int -> (Int -> a) -> IntMap a

where the first two arguments bound the domain. Alternatively, there might be some fancy way to use a domain made up of multiple ranges.
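A naive reference implementation, assuming an inclusive domain [lo .. hi] (a real version would presumably build the tree directly, as Data.Sequence.fromFunction does):

import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap

fromFunction :: Int -> Int -> (Int -> a) -> IntMap a
fromFunction lo hi f = IntMap.fromDistinctAscList [(i, f i) | i <- [lo .. hi]]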

New operation: insertOrLookup

I'd like an operation that does a lookup, and if the element isn't there does an insert. The crucial difference compared to insertLookupWithKey is that if the key is found there is no need to rebuild the tree. So it could have a type like

insertOrLookup :: Ord k => k -> a -> Map k a -> Either (Map k a) a

It should be implementable quite efficiently by writing the function in continuation-passing style.
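A naive two-pass reference implementation of the proposed semantics; the point of the proposal is to achieve this in a single traversal:

import Data.Map (Map)
import qualified Data.Map as Map

insertOrLookup :: Ord k => k -> a -> Map k a -> Either (Map k a) a
insertOrLookup k v m = case Map.lookup k m of
  Just old -> Right old                -- key present: nothing rebuilt
  Nothing  -> Left (Map.insert k v m)  -- key absent: insert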

Data.Sequence.splitAt semantics are unfortunate

As is often the case, different definitions of splitAt have different balances of useful laziness and efficiency in strict contexts. I would argue that the definition in Data.Sequence.splitAt is inefficient but does not offer really useful laziness.

Data.Sequence.splitAt has semantics that initially look similar to the ones the Haskell Report specifies for Data.List.splitAt:

splitAt n xs = (take n xs, drop n xs)

This immediately implies that splitAt undefined undefined is (undefined, undefined) rather than undefined, which is a little bit lazy, but not really usefully so.

The Report does not directly specify the laziness of take or drop, but the standard Prelude definition of take is lazy in the list argument:

take n _      | n <= 0 =  []

This, however, is not the case for Data.Sequence.take:

take i = fst . splitAt i

splitAt i (Seq xs) = (Seq l, Seq r)
  where (l, r) = split i xs

split :: Int -> FingerTree (Elem a) ->
    (FingerTree (Elem a), FingerTree (Elem a))
split i Empty = i `seq` (Empty, Empty)
split i xs
  | size xs > i = (l, consTree x r)
  | otherwise = (xs, Empty)
  where Split l x r = splitTree i xs

In particular,

take 0 undefined = undefined

Note also that the above definition does not special-case i <= 0, so given fakefoo = drop 0 foo,

reallyUnsafePtrEquality# foo fakefoo = 0#

i.e., drop 0 rebuilds the sequence instead of returning the original.

GHC's version of Data.List.splitAt does something a bit non-standard: it is strict in its Int argument but lazy in its list argument, so

splitAt undefined [] = undefined

but

splitAt 0 undefined = ([], undefined)

The Report semantics strike me as a bit silly, since laziness in the split point doesn't really seem to help in any way I can personally think of. I think we should consider just using the GHC semantics:

splitAt i s@(Seq xs)
  | i <= 0 = (empty, s)
  | otherwise = case xs of
                  Empty -> (empty, empty)
                  _ | i < size xs -> case splitTree i xs of
                                      Split l x r -> (Seq l, Seq (consTree x r))
                    | otherwise -> (s, empty)

Missing instances on Data.Graph.SCC

It would be super handy to get some instances for the Data.Graph.SCC type:

Read, Show, Eq, Traversable, Foldable

and, if possible:

Typeable, Data, Generic
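A sketch of what these could look like as standalone derivations, assuming none of the instances exist yet (Data.Graph exports the SCC constructors, so this can be tried outside containers; Typeable comes for free in modern GHC):

{-# LANGUAGE DeriveDataTypeable, DeriveFoldable, DeriveFunctor,
             DeriveGeneric, DeriveTraversable, StandaloneDeriving #-}

import Data.Data (Data)
import Data.Graph (SCC(..))
import GHC.Generics (Generic)

deriving instance Show a => Show (SCC a)
deriving instance Read a => Read (SCC a)
deriving instance Eq a   => Eq (SCC a)
deriving instance Functor SCC
deriving instance Foldable SCC
deriving instance Traversable SCC
deriving instance Generic (SCC a)
deriving instance Data a => Data (SCC a)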

Data.Tree additional functions

Hi,
The provided Tree API seems a little small. After going through the usual workflow of defining my own version of, for example, subtreeat :: Eq a => a -> Tree a -> Maybe (Tree a), only to later find it defined somewhere on Hackage, my question is:

Does it make sense to include some of the functions defined in:
http://hackage.haskell.org/package/debian-3.81/docs/Debian-Apt-Dependencies.html#g:3
and
http://hackage.haskell.org/package/hledger-lib-0.22.1/docs/Hledger-Utils.html#v:subtreeat
As functions of Data.Tree, or should that be left for a separate package?

My immediate suggestions would be:
treefilter :: (a -> Bool) -> Tree a -> Tree a
subtreeat :: Eq a => a -> Tree a -> Maybe (Tree a)
subtreeinforest :: Eq a => a -> [Tree a] -> Maybe (Tree a)

Or am I missing some easier way of avoiding these functions?
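For concreteness, minimal sketches of the two subtree lookups, in the usual camelCase (treefilter is omitted because its intended semantics are less obvious):

import Control.Applicative ((<|>))
import Data.Tree (Tree(..))

-- Find the first subtree whose root equals the given label (preorder).
subtreeAt :: Eq a => a -> Tree a -> Maybe (Tree a)
subtreeAt x t@(Node v cs)
  | v == x    = Just t
  | otherwise = subtreeInForest x cs

subtreeInForest :: Eq a => a -> [Tree a] -> Maybe (Tree a)
subtreeInForest x = foldr ((<|>) . subtreeAt x) Nothing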

Export something like splitTraverseSeq

I'm currently using it to implement enumFromTo-like functions for Seq, which demonstrate the same sort of performance advantage for incremental use as my proposed zipWith change. splitTraverseSeq would also be great for conversion from Array and Vector types. As it appears to be generally useful, I think we should export it in some form. We probably don't want to export a Splittable class, however, so we'll need to do something to work around that.

Consider splitting this package and repo

This package currently offers Map, IntMap, Set, IntSet, Seq, and Tree implementations. As far as I can tell, none of these depend on each other except that Tree depends on Seq, and there is no significant core of shared code. Splitting the package and repo should, I think, make it faster to build/upgrade new/experimental versions since you'd only be compiling the container that changed instead of rebuilding the whole lot of them.

Documentation of IntMap.deleteMin/Max out of date wrt empty maps

The documentation for Data.IntMap.deleteMin says "An error is thrown if the IntMap is already empty. Note, this is not the same behavior Map." But this is no longer correct as of containers-0.5; no error is thrown. I suggest replacing this sentence with "Returns an empty map if the map is empty.", matching the documentation for Data.Map.

Some Seq operations are too strict

The Applicative operations allow the size of a tree to be calculated before it is built. We can take advantage of this by building the tree from the outside in, pushing deferred computations down the middle and offering efficient access to the ends. But then things break as soon as someone indexes or splits near the end of the sequence, because these operations calculate positions from left to right, rather than from the outside in. I suspect this is an artifact of the design in the paper being intended for arbitrary monoids, which may lack subtraction. Tree sizes have subtraction, so we should use it! Without any doubt, we should change this for index, and I think we should probably also change it for splitAt. I'm not sure about other operations.

Remove superfluous 'Ord a' constraint on 'Set.map'

The type of Set.map is (Ord a, Ord b) => (a -> b) -> Set a -> Set b but the Ord a constraint is unused. I suggest that its type should be changed to Ord b => (a -> b) -> Set a -> Set b. It will make the function easier to reason about and may possibly improve performance.

infix declaration for Set.member and Set.notMember

In Data.Set, can member and notMember please be declared infix 4, like Data.List.elem?
I guess the same goes for IntSet, Map, and IntMap as well.

Also, binary set operations such as union and intersection should be declared infixl 5, analogous to (++) (but associating to the left because of the hedge union algorithm).

Rationale: For lists, one can write

a+b `elem` c++d && e+f `elem` g++h

Analogously, the same notation should be possible for sets:

a+b `Set.member` c `Set.union` d && e+f `Set.member` g `Set.union` h

Suggested fix: in Data.Set, Data.IntSet, Data.Map, Data.IntMap:

infix 4 member
infix 4 notMember
infixl 5 union
infixl 5 intersection
infix 5 difference

Thanks!

Add Data.Set.[elemAt|deleteAt]

These functions are sorely missing from the Set API, but are present in the Map API. Is there a reason why these functions aren't implemented?
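Naive reference implementations, going through the sorted list (the real versions would use the size annotations to index in O(log n), as Data.Map does):

import qualified Data.Set as Set

elemAt :: Int -> Set.Set a -> a
elemAt i = (!! i) . Set.toAscList

deleteAt :: Ord a => Int -> Set.Set a -> Set.Set a
deleteAt i s = Set.delete (elemAt i s) s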

Try to make fromList a little better

Currently, fromList just snocs list elements one by one. If we have Deep n pr m (One a) and we're appending elements one at a time, we first build a Two, then a Three, and finally a Four constructor. I imagine we're losing more with this shuffle than we're (potentially) gaining with foldr/build fusion. I have to wonder if there's a better way to build a sequence from left to right. We could try something simple to begin with, like maybe

fromList = Seq . fromList' empty where
  fromList' xs [] = xs
  fromList' (Deep n pr m (One a)) xs =
    case xs of
      (x1:x2:x3:xs') -> fromList' (Deep (n+3) pr (m `snocTree` (Node3 3 a (Elem x1) (Elem x2))) (One (Elem x3))) xs'
      [x1,x2] -> Deep (n+2) pr (m `snocTree` (Node2 2 a (Elem x1))) (One (Elem x2))
      [x] -> Deep (n+1) pr m (Two a (Elem x))
  fromList' (Deep n pr m (Two a b)) xs =
    case xs of
      (x1:x2:xs') -> fromList' (Deep (n+2) pr (m `snocTree` (Node3 3 a b (Elem x1))) (One (Elem x2))) xs'
      ....

fromListWithDefault

I think it would be useful to have a function similar to Data.Array.IArray.accum, where the target type isn't the same as the input type.

fromListWithDefault :: Ord k => (a' -> a -> a') -> a' -> [(k, a)] -> Map k a'

An obvious application would be

Data.Map.fromListWithDefault (flip (:)) [] [(1,2), (1,3), (2,3)]

instead of

let f (a,b) = (a,[b]) in Data.Map.fromListWith (++) $ fmap f [(1,2), (1,3), (2,3)]
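A naive reference implementation, assuming (as in IArray.accum) that the accumulator is the combining function's first argument:

import Data.List (foldl')
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

fromListWithDefault :: Ord k => (a' -> a -> a') -> a' -> [(k, a)] -> Map k a'
fromListWithDefault f z =
  foldl' (\m (k, v) -> Map.alter (Just . (`f` v) . fromMaybe z) k m) Map.empty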

Space leak in Data.Map.partitionWithKey?

I have tried, but I have been unable to condense this into a small example, so please forgive me for describing what's going on in Cloud Haskell instead. I have the following function (https://github.com/haskell-distributed/distributed-process/blob/ceb7e5ae9f617da20f8e0920f0630a35849bf4fa/distributed-process/src/Control/Distributed/Process/Node.hs#L505):

ncEffectDied :: Identifier -> DiedReason -> NC ()
ncEffectDied ident reason = do
  (affectedLinks, unaffectedLinks) <- gets (splitNotif ident . (^. links))
  (affectedMons,  unaffectedMons)  <- gets (splitNotif ident . (^. monitors))

  -- Do stuff with affectedLinks, affectedMons

  modify' $ (links ^= unaffectedLinks) . (monitors ^= unaffectedMons)

NC is a state monad around IO. So we get the state from the state monad, extract two maps, partition those maps using splitNotif (see below), then do something with the first half and finally store the other half back in the state (links and monitors are Accessors). Some important remarks:

  1. The state is defined as

     data NCState = NCState 
      {
        _links    :: !(Map Identifier (Set ProcessId))
      , _monitors :: !(Map Identifier (Set (ProcessId, MonitorRef)))
      , ...
      }
    

    where, importantly, we have strict references to the maps from the state.

  2. The maps do not contain Maps themselves.

  3. We use modify' to update the state, which is a custom, strict, version of the standard modify:

    modify' :: MonadState s m => (s -> s) -> m ()
    modify' f = StateT.get >>= \s -> StateT.put $! f s
    

Now here's the rub. splitNotif is defined as

splitNotif ident = Map.partitionWithKey (\k !v -> impliesDeathOf ident k) 

With this definition, my code runs in constant space. If however I remove the bang on v so that the predicate is no longer strict in the map values, I end up with a Map-typed space leak. I don't understand why this happens, nor have I been able to reproduce this in smaller test cases (though Ian speculated that that might be because the GC could optimize the smaller test cases but not the bigger one).

I don't know if this is a bug in partitionWithKey or not -- either way, if you understand why this happens, I'd really like to know.

Set has the "wrong" type for GHC

Reasonably recent GHC versions support GADTs, which allow class constraints on constructors. This suggests something like the following type for Set:

data Set a where
   Bin :: Ord a => {-# UNPACK #-} !Size -> !a -> !(Set a) -> !(Set a) -> Set a
   Tip :: Set a

I would speculate that a little CPP could let Set use this definition or something similar (and the O(log n) implementation of Data.Foldable.elem that goes with it) where supported with minimal internal or external breakage.
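A sketch of the O(log n) membership test the GADT above makes possible: the Ord dictionary is recovered by matching on Bin, so the function itself needs no constraint.

elemSet :: a -> Set a -> Bool
elemSet _ Tip = False
elemSet x (Bin _ y l r) =
  case compare x y of   -- compare is available thanks to the Bin match
    LT -> elemSet x l
    EQ -> True
    GT -> elemSet x r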

Add foldMapWithIndex and traverseWithIndex

It just seems weird to have foldrWithIndex and foldlWithIndex but not foldMapWithIndex or traverseWithIndex. If I figure out the right way to implement the strict folds, I may want indexed versions of those too.
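Naive sketches in terms of existing exports (an efficient version would fuse the index pass into the fold or traversal):

import Data.Foldable (fold)
import Data.Sequence (Seq, mapWithIndex)
import Data.Traversable (sequenceA)

foldMapWithIndex :: Monoid m => (Int -> a -> m) -> Seq a -> m
foldMapWithIndex f = fold . mapWithIndex f

traverseWithIndex :: Applicative f => (Int -> a -> f b) -> Seq a -> f (Seq b)
traverseWithIndex f = sequenceA . mapWithIndex f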

Consider alternative sequence representations

The polymorphic recursion in Data.Sequence makes it hard to modify, limits flexibility in certain ways, and appears to confuse GHC's optimizer. One possible alternative is to switch from a nested type representation to a GADT one, using phantom types to keep track of balance information. I've converted <>, viewl, viewr, <|, |>, index, fmap, foldMap, traverse, and splitAt to use such a representation. You can see the result here: https://github.com/treeowl/GADTSeq/blob/master/Data/Sequence.hs

Implement new `Foldable` methods

I've created this ticket because I consider it quite important for the GHC 7.10.1 release.

The current Foldable class in GHC HEAD provides the methods shown below; however, containers only implements a very small subset of them. The containers release to be bundled with GHC 7.10.1 ought to address this (while staying compatible with previous GHCs, obviously):

class Foldable (t :: * -> *) where
  Data.Foldable.fold :: Monoid m => t m -> m
  foldMap :: Monoid m => (a -> m) -> t a -> m
  foldr :: (a -> b -> b) -> b -> t a -> b
  foldl :: (b -> a -> b) -> b -> t a -> b
  foldr1 :: (a -> a -> a) -> t a -> a
  foldl1 :: (a -> a -> a) -> t a -> a

  -- since GHC 7.6 / base-4.6
  Data.Foldable.foldr' :: (a -> b -> b) -> b -> t a -> b
  Data.Foldable.foldl' :: (b -> a -> b) -> b -> t a -> b

  -- since GHC 7.10 / base-4.8
  Data.Foldable.toList :: Foldable t => t a -> [a]
  null :: Foldable t => t a -> Bool
  length :: Foldable t => t a -> Int
  elem :: (Foldable t, Eq a) => a -> t a -> Bool
  maximum :: (Foldable t, Ord a) => t a -> a
  minimum :: (Foldable t, Ord a) => t a -> a
  sum :: Num a => t a -> a
  product :: (Foldable t, Num a) => t a -> a
    -- Defined in ‘Data.Foldable’

If you don't have time to work on this yourself in time for the upcoming GHC 7.10.1 RC in a couple of weeks, please let me know.

Data.Sequence does not check for `Int` overflow

><, replicate, replicateA, *>, and maybe other operations are all capable of making sequences that are like the Tardis: bigger on the inside. Any of these (especially when iterated) can easily make a sequence whose size cannot be represented by an Int, at which point nothing will work right, and operations will either produce crazy errors or silently give wrong answers. The solution is to check for this in all operations that expand sequences, from <| on up. While we're at it, we may also want to switch from Int to Word size representations internally.
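A sketch of the kind of guard intended for the internal size arithmetic (the name is hypothetical); sizes are nonnegative, so a negative sum means the Int wrapped around:

checkedAdd :: Int -> Int -> Int
checkedAdd m n
  | s < 0     = error "Data.Sequence: size overflow"
  | otherwise = s
  where s = m + n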

Generalize the drawing functions of Data.Tree

Hi,

I wanted to use drawTree on a Tree Int, but it does not work, since the drawing functions only accept Tree String. In my opinion this is too restrictive and can easily be generalized to Show s => Tree s. I made that change in a fork here: jonasc@e650dda
I would be happy to create a pull request for this commit.
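The generalization amounts to composing with fmap; a sketch (drawTreeShow is a hypothetical name):

import Data.Tree (Tree, drawTree)

drawTreeShow :: Show a => Tree a -> String
drawTreeShow = drawTree . fmap show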

Add rebalance to Data.Sequence

The new zipWith and mapWithFunction, unlike the old versions, do not rebalance. Furthermore, I'm becoming more and more confident that my new idea for reimplementing <*> will work out, and very well, but by its nature has the potential, under certain circumstances, to produce very badly balanced sequences. The combination of these factors suggests to me that we should implement a rebalance function, which would be equivalent to this:

rebalance :: Seq a -> Seq a
rebalance xs = zipWith (\_ x -> x) (replicate (length xs) ()) xs

but would probably be written to use a splitBuild function I've played around with a bit (a generalization of fromFunction).

Think some more about foldMap

In effect, we have two static arguments, mappend and mempty, that we pass down the FingerTree spine in a dictionary. The paper about the static argument transformation suggests that as soon as there are two static arguments, pulling them out tends to improve performance. I fear, however, that we may not be able to do so without ScopedTypeVariables. Barring that, we should think again about whether we should define foldMap or just use the foldr definition, and do some benchmarking. We should also consider the merits of an independent fold definition.
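For reference, the static argument transformation on a simple example: f and z never change across recursive calls, so a local worker stops passing them. Doing the same for foldMap over the FingerTree is what seems to require ScopedTypeVariables:

foldrSAT :: (a -> b -> b) -> b -> [a] -> b
foldrSAT f z = go
  where
    go []     = z          -- z is static: captured, not passed
    go (x:xs) = f x (go xs)  -- likewise f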

Splitting may be too slow

I'm not sure if there's anything to be done about this, but I think the splitting approach may not be giving quite as much as I had hoped for zipping. In particular, I think that while we're getting O(n) total times, as we should, we're probably getting O((log (min {i, n-i}))^2) times for immediate access rather than O(log (min {i, n-i})). Specifically, using splitAt to produce a naive "divide and conquer" implementation of index seems to give times that look like log n + log (n/2) + ..., which comes out to around (log n)^2. This is not bad, and it's certainly better than the previous O(n), but it's not what I'd hoped for.

The total times are still fine because, although I'm too tired to understand why right now,

log n + 2*log (n/2) + 4* log (n/4) + 8*log (n/8) + ... + 0

seems to be bounded above by 2n for base-2 logarithms (substituting j = log n - k rewrites the sum as n times the sum of j/2^j over all j, and that sum converges to 2).

If these half-asleep calculations are right, I need to fix the commentary for splitMap.

Consider using Data.Functor.Identity

Data.Sequence, at least, uses its own Id type. This looks like it's about the same as Identity from Data.Functor.Identity in base. Is there a reason not to use that? If not, I can give it a whirl.

Implement fmap/coerce rules for GHC

GHC.Base decrees

{-# RULES "map/coerce" map coerce = coerce #-}

This rule applies equally well to any legitimate Functor instance (any Functor instance that obeys the functor laws), including various ones in containers.
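The analogous rule for, e.g., Data.Map's value functor would look something like the sketch below; the exact phasing needed for the rule to survive inlining is a detail to work out:

import Data.Coerce (coerce)
import qualified Data.Map

{-# RULES
"Map.map/coerce" Data.Map.map coerce = coerce
 #-}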

Insert appears to have linear time complexity

Hello,
I have recently stumbled upon some code indicating that Data.Map.insert has linear time complexity (w.r.t. the number of elements in the map).

I have created this criterion benchmark (compiled as ghc --make -O2 SetInsert.hs) and the results (included) seem to validate my concerns. I would not rule out the possibility that the benchmark is bad or that there is a bug in criterion.
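(The reporter's benchmark is not reproduced here; the following is only a generic sketch of the shape such a measurement with criterion might take:)

import Criterion.Main
import qualified Data.Map as Map

main :: IO ()
main = defaultMain
  [ bench ("insert into map of size " ++ show n) $
      whnf (Map.insert n n) (Map.fromList [(i, i) | i <- [1 .. n]])
  | n <- [1000, 10000, 100000 :: Int]
  ]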

My system: Linux kernel 3.8.6-1, GHC 7.6.2, containers 0.5.2.1

GHC 7.7 warnings

I'm writing this issue to inform you of the latest warnings visible when building w/ GHC HEAD (7.7):

libraries/containers/Data/Set/Base.hs:1187:1: Warning:
    Local definition of ‛join’ clashes with a future Prelude name - this will become an error in GHC 7.10, under the Applicative-Monad Proposal.

libraries/containers/Data/IntSet/Base.hs:1165:1: Warning:
    Local definition of ‛join’ clashes with a future Prelude name - this will become an error in GHC 7.10, under the Applicative-Monad Proposal.

libraries/containers/Data/IntSet/Base.hs:1196:17: Warning:
    In the use of ‛bitSize’ (imported from Data.Bits):
    Deprecated: "Use bitSizeMaybe or finiteBitSize instead"

libraries/containers/Data/Graph.hs:287:10: Warning:
    ‛SetM’ is an instance of Monad but not Applicative - this will become an error in GHC 7.10, under the Applicative-Monad Proposal.

No explanation for use of local foldlStrict instead of foldl'

containers/Data/Set/Base.hs defines foldlStrict, which looks an awful lot like the usual foldl'. Is there a reason for using this rather than the one from Data.List or Data.Foldable? If so, that should be documented in the source. If not, that should be changed.
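For reference, the definition in question is essentially the usual strict left fold:

foldlStrict :: (a -> b -> a) -> a -> [b] -> a
foldlStrict f = go
  where
    go z []     = z
    go z (x:xs) = let z' = f z x in z' `seq` go z' xs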

Stop generating unused instance code

Elem, Node, Digit, and FingerTree are all instances of Foldable, but we currently only use their foldMap, foldr, and foldl methods. GHC does not know this, and automatically generates default code for all the unused methods. I see three ways around this:

  1. Use specialized functions foldrElem, foldrNode, etc. instead of making these Foldable instances. This is sane and makes some things easier to understand, but it's a bit awkward.
  2. Use a miniature version of Foldable for these types (sketched below).
  3. Set all unused class methods to undefined.
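A sketch of option 2; the class and method names are hypothetical:

class MiniFoldable t where
  mfoldMap :: Monoid m => (a -> m) -> t a -> m
  mfoldr   :: (a -> b -> b) -> b -> t a -> b
  mfoldl   :: (b -> a -> b) -> b -> t a -> b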

Consider adding size to `Digit`

At present, figuring out whether an index is left, middle, or right may require inspecting (and perhaps forcing) up to eight different Elem or Node values, which could be anywhere in the heap. If we add a size field to each Digit constructor, we can skip all of these. The downside, of course, is an extra word per Digit, which I conjecture is an entirely reasonable price to pay.
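Roughly, the proposed change (the current Digit stores no size and must measure its elements):

data Digit a
  = One   {-# UNPACK #-} !Int a
  | Two   {-# UNPACK #-} !Int a a
  | Three {-# UNPACK #-} !Int a a a
  | Four  {-# UNPACK #-} !Int a a a a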

An Int64Map/Word64Map guaranteed to be 64-bits wide on all platforms?

I am sorely tempted to use IntMap as a backing for some C-interop code. The foreign library uses 64-bit handles (guaranteed to be equivalent to int64_t), but IntMap only guarantees Haskell's platform-sized Int. I don't want to restrict my library to operating only on x86-64, so I will have to use HashMap. I doubt the cost will be substantial, but I thought you would like to know about this use case: users may want to use IntMap but need an integer of a guaranteed width.

I admit I haven't looked into the IntMap implementation to know if I've overlooked something.

Attempt to rewrite (>>=) applications when appropriate

Sometimes, as in a replicate-optimized Seq, m >> n or m *> n will be much better than m >>= \_ -> n; in these cases, GHC will not know that they are the same. We can add GHC rewrite rules to attempt to do the right thing as appropriate.

See also #62
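The kind of rule intended, as a sketch; in practice it would have to target Seq's concrete method implementations, since rules written against class selectors rarely survive GHC's early inlining:

import Data.Sequence (Seq)

{-# RULES
"Seq/bind-const" forall (m :: Seq a) n.
    m >>= (\_ -> n) = m *> n
 #-}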

Does the order of constructors really matter?

Looking at http://hackage.haskell.org/package/containers-0.5.2.1/docs/src/Data-Set-Base.html, I read:

-- [Note: Order of constructors]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- The order of constructors of Set matters when considering performance.
-- Currently in GHC 7.0, when type has 2 constructors, a forward conditional
-- jump is made when successfully matching second constructor. Successful match
-- of first constructor results in the forward jump not taken.
-- On GHC 7.0, reordering constructors from Tip | Bin to Bin | Tip
-- improves the benchmark by up to 10% on x86.

In #ghc:

nh2: thoughtpolice: another question about what you said though: if I only have two constructors, why do we need to "check" the second branch at all?
nh2: if there are two things and it's not the first one then sure it's the second one, isn't it?
Ghoul__: nh2: obviously that's optimized away
nh2: Ghoul__: not sure if I understand - those comments in containers sound like it isn't
Ghoul__: nh2: that optimization will be blazingly obvious at cmm
Ghoul__: I'd be really really surprised if it's not done
nh2: Ghoul__: well then either you will be surprised or what Data.Set says is wrong and the comment about the 10% performance increase is not accurate / outdated
thoughtpolice: OK, so i'm not privvy on the details today exactly, but pattern matching is actually not really an if statement - it's a jump table. but i imagine this can have an impact on performance in certain cases if we can scrutinize on a fast path
thoughtpolice: the easiest way is to just try it and run the benchmarks IMO. the new code generator may have changed things
leroux: thoughtpolice: nofib?
thoughtpolice: no, the benchmarks for containers

It would be great to find out whether this optimization is (still?) not done by GHC, and whether the 10% speed-up still occurs. If it is, we can add that to the comment; if not, then we probably have a cheap opportunity to make a lot of programs faster.

Functor instance for Data.Map.Strict is not strict

Prelude> import Data.Map.Lazy as L
Prelude L> import Data.Map.Strict  as S
Prelude L S> let m = S.fromList [(1, ())]
Prelude L S> S.size  $ fmap undefined m
1
Prelude L S> S.size  $ S.map undefined m
*** Exception: Prelude.undefined

As there should only be one instance for a given type, it is not possible to make the Functor instance for Data.Map.Strict strict without newtyping. Is there a reason for this behavior?

containers-0.5.6.0 causes GHC testsuite tests to hang

I just tried updating GHC's containers submodule to 0.5.6.0 (b9e4e22), but now a couple of test cases in GHC's testsuite simply hang (and get killed by timeouts).

Here's an excerpt:

=====> sequence001(normal) 4362 of 4363 [0, 10, 1] 
cd ../../libraries/containers/tests-ghc && '/home/hvr/Haskell/GHC/ghc/bindisttest/install   dir/bin/ghc' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-warn-tabs -fno-ghci-history -o sequence001 sequence001.hs  -package containers  >sequence001.comp.stderr 2>&1
cd ../../libraries/containers/tests-ghc && ./dataintset001    </dev/null >dataintset001.run.stdout 2>dataintset001.run.stderr
cd ../../libraries/containers/tests-ghc && ./sequence001    </dev/null >sequence001.run.stdout 2>sequence001.run.stderr
Wrong exit code (expected 0 , actual 1 )
Stdout:

Stderr:
sequence001: <<loop>>

*** unexpected failure for sequence001(normal)
Timeout happened...killing process...
Wrong exit code (expected 0 , actual 99 )
Stdout:
11

Stderr:

*** unexpected failure for T3333(normal)
[Errno 21] Is a directory: './ghci/linking/.'
[Errno 22] Invalid argument: './ghci/linking/.'

and here's the summary

Unexpected results from:
TEST="print026 package01e ghci037 sequence001 T706 T3333 ghcilink002 ghcilink003 ghcilink004 ghcilink005 ghcilink006 T8333 ghciprog004 hpc_ghc_ghci T3171 T9872b T9872c T9872a T5837"

OVERALL SUMMARY for test run started at Mon Dec 15 23:59:09 2014 CET
 0:13:22 spent to go through
    4363 total tests, which gave rise to
   16859 test cases, of which
   12834 were skipped

      55 had missing libraries
    3884 expected passes
      67 expected failures

       1 caused framework failures
       0 unexpected passes
      15 unexpected failures
       4 unexpected stat failures

Unexpected failures:
   ../../libraries/containers/tests-ghc  sequence001 [bad exit code] (normal)
   ../../libraries/hpc/tests/ghc_ghci    hpc_ghc_ghci [bad exit code] (normal)
   driver                                T706 [bad exit code] (normal)
   ghci.debugger/scripts                 print026 [bad exit code] (ghci)
   ghci/linking                          T3333 [bad exit code] (normal)
   ghci/linking                          ghcilink002 [bad exit code] (normal)
   ghci/linking                          ghcilink003 [bad exit code] (normal)
   ghci/linking                          ghcilink004 [bad exit code] (normal)
   ghci/linking                          ghcilink005 [bad exit code] (normal)
   ghci/linking                          ghcilink006 [bad exit code] (normal)
   ghci/prog004                          ghciprog004 [bad exit code] (normal)
   ghci/scripts                          ghci037 [bad exit code] (normal)
   ghci/should_run                       T3171 [bad exit code] (normal)
   package                               package01e [stderr mismatch] (normal)
   th                                    T8333 [bad exit code] (normal)

Unexpected stat failures:
   perf/compiler  T5837 [stat not good enough] (normal)
   perf/compiler  T9872a [stat not good enough] (normal)
   perf/compiler  T9872b [stat not good enough] (normal)
   perf/compiler  T9872c [stat not good enough] (normal)

/cc @treeowl and @foxik
