kowainik / slist Goto Github PK

View Code? Open in Web Editor NEW

46.0 6.0 6.0 81 KB

♾️ Sized list

Home Page: https://kowainik.github.io/projects/slist

License: Mozilla Public License 2.0

Haskell 100.00%

data-structure list sized haskell hacktoberfest

slist's Introduction

slist

⚠️ Caution: this is a very opinionated library. There is no intention to replace the standard list data type. We are aware of every design decision we made for this package, and we are taking responsibility for that design. If you find it inappropriate, please, consider to use another library instead, that would fulfil your requirements.

This package introduces sized list data type — Slist. The data type has the following shape:

data Slist a = Slist
    { sList :: [a]
    , sSize :: Size
    }

As you can see that along with the familiar list, it contains Size field that represents the size of the structure. Slists can be finite or infinite, and this is expressed with Size.

data Size
    = Size Int
    | Infinity

⚠️ Caution: Int is used for the size by design. We had to make some trade-offs to provide the better (as we think) interface for users. For more details on the choice, see the Haddock documentation for the Size data type.

This representation of the list gives some additional advantages. Getting the length of the list is the "free" operation (runs in O(1)). This property helps to improve the performance for a bunch of functions like take, drop, at, etc. But also it doesn't actually add any overhead on the existing functions.

Also, this allows to write a number of safe functions like safeReverse, safeHead, safeLast, safeIsSuffixOf, etc.

Comparison

Check out the comparison table between lists and slists performance.

Function	list (finite)	list (infinite)	Slist (finite)	Slist (infinite)
`length`	`O(n)`	<hangs>	`O(1)`	`O(1)`
`safeLast`	`O(n)`	<hangs>	`O(n)`	`O(1)`
`init`	`O(n)`	<works infinitely>	`O(n)`	`O(1)`
`take`	`O(min i n)`	`O(i)`	`0 < i < n`: `O(i)`; otherwise: `O(1)`	`O(i)`
`at`	`O(min i n)` (run-time exception)	`O(i)` (run-time exception)	`0 < i < n`: `O(i)`; otherwise: `O(1)`	`O(i)`
`safeStripPrefix`	`O(m)`	`O(m)` (can hang)	`O(m)`	`O(m)`

Potential usage cases

When you ask the size of the list too frequently.
When you need to convert to data structures that require to know the list size in advance for allocating an array of the elements. Example: Vector data structure.
When you need to serialise lists.
When you need to control the behaviour depending on the finiteness of the list.
When you need a more efficient or safe implementation of some functions.

slist's People

Contributors

Stargazers

Watchers

Forkers

waynee95 rohitdm97 rszczers mstruebing e7dal tomjaguarpaw

slist's Issues

Test failures on ghc 9.4.2

Not sure what it's all about but here it is:

running tests
Running 2 test suites...
Test suite slist-doctest: RUNNING...
src/Slist.hs:386: failure in expression `head mempty'
expected: *** Exception: Prelude.head: empty list
 but got: *** Exception: Prelude.head: empty list
          CallStack (from HasCallStack):
          ^
            error, called at libraries/base/GHC/List.hs:1646:3 in base:GHC.List
            errorEmptyList, called at libraries/base/GHC/List.hs:85:11 in base:GHC.List
            badHead, called at libraries/base/GHC/List.hs:81:28 in base:GHC.List
            head, called at src/Slist.hs:391:8 in main:Slist

src/Slist.hs:416: failure in expression `last mempty'
expected: *** Exception: Prelude.last: empty list
 but got: *** Exception: Prelude.last: empty list
          CallStack (from HasCallStack):
          ^
            error, called at libraries/base/GHC/List.hs:1646:3 in base:GHC.List
            errorEmptyList, called at libraries/base/GHC/List.hs:156:13 in base:GHC.List
            lastError, called at libraries/base/GHC/List.hs:151:29 in base:GHC.List
            last, called at src/Slist.hs:425:8 in main:Slist

src/Slist.hs:1477: failure in expression `unsafeAt (-1) sl'
expected: *** Exception: Prelude.!!: negative index
 but got: *** Exception: Prelude.!!: negative index
          CallStack (from HasCallStack):
          ^
            error, called at libraries/base/GHC/List.hs:1371:12 in base:GHC.List
            negIndex, called at libraries/base/GHC/List.hs:1375:17 in base:GHC.List
            !!, called at src/Slist.hs:1485:30 in main:Slist

Examples: 227  Tried: 225  Errors: 0  Failures: 3

Thanks

Add `chunksOf`

What do you think about adding the following function?

chunksOf :: Int -> Slist a -> Slist (Slist a)

For ordinary lists it would work like this:

ghci> chunksOf 3 [0..7]
[[0,1,2], [3,4,5], [6,7]]

But since we have access to the list length, we can implement a more efficient version of chunksOf.

Add 'partitionOn'

Currently, slist has the following function:

partition :: forall a. (a -> Bool) -> Slist a -> (Slist a, Slist a)

But sometimes it's desirable to change the types of lists, and the function like below can be helpful:

partitionOn :: (a -> Either b c) -> Slist a -> (Slist b, Slist c)

The name can be different 🙂 But I wanted something like this for ordinary lists so many times...

Property tests

For type class instances laws checks.

Add generic operations

https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-List.html#g:26

Create patterns for smooth pattern-matching

Add `mapMaybe`

Use main default branch

Setup GitHub Actions

Serializable representations with leading length

This is more of a comment for something to keep in mind in future work, rather than an issue about something that exists now in the code. I mention it now because this potential issue is easier to prevent than to correct afterwards =)

I noticed that in the internal representation for Slist, the size comes after the list itself. This is fine, there's nothing wrong with this in principle. However, when the time comes to implement serializations for values of type Slist x, such as Binary or Serialize or Serialise or ToJSON or whatever, if the serialized representations include the size of the list, please make sure the size comes before the list content, and not after. This can make parsing significantly more performant, since the expected length can be verified before even attempting to parse the entirety of the list.

Cool library!

Build with Stack

In order to upload to Stackage in the future

Support for GHC 9.2

Now build fails due to base version constraints.

Support GHC-8.8.1

Support GHC-8.10

Add `cons`

Render issue on hackage page

Not sure if this is already known, but on the hackage page the rendering of the table is broken. It seems that hackage does not support tables.

I found this compromise.

It's not a huge issue, just noticed it.

len's maxBound?

Thanks for the cool library @vrom911! I really like the docs!

One thing I think should be changed, because it could otherwise have catastrophic consequences, is the fact that len, length and genericLength return maxBound on infinite lists. There are at least a couple of problems with this.

First, that there's no way to tell apart, just by looking at this length value, a list of length maxBound from an infinite list. If this length value was to be used to decide, say, how many resources to allocate for some work, then maxBound will surely deplete all resources. This could be addressed by returning, say, -1 rather than maxBound. But…

Second, a program checking the length of a Slist would erroneously succeed where one using [] would diverge. This is not good. As unfortunate as diverging code is, a diverging program that prevents us from accomplishing the wrong thing is better than one that doesn't. This can be addressed by having the length of an infinite list result in error "Tried to obtain the length on an infinite Slist", which is still diverging code, but rather than hanging forever it aborts right away, which is definitely an improvement over length for plain lists.

Obviously, people are encouraged to use size rather than len, length or genericLength, but many times, particularly when code is implemented in terms of Foldable, referring to length will be unavoidable.

My recommendation is to change len and friends so that they loudly fail with error, rather than silently succeed with the wrong result.

Keep up the good work! I look forward to using this library!

[RFC] Sized Difference List

Because of how ordinary lists are implemented in Haskell, left-associative appending of such lists is very slow. The following takes much more time than it should:

((((l1 ++ l2) ++ l3) ++ l4) ++ ...)

Optimal appending order is the following:

l1 ++ (l2 ++ (l3 ++ (l4 ++ ...))))

There's a trick called Difference List which automatically rearranges list appending to get an optimal order. The trick is representing a list in a different way: instead of storing [a], you store [a] -> [a]. It's implemented in the following packages: https://hackage.haskell.org/package/dlist

This technique is very useful if you don't have control of how lists are appending (for example, when appending inside some recursive call).

I thought, that maybe slist package could provide a data type like SDList (similar to DList) — sized difference list to optimize appending. The problem with DList is since it represents the list as a function, you can't get any info from this function for free. With SDList you should be able to store size and get at list size, so I'm thinking about data type like this:

data SDList a = SDList Size (Slist a -> Slist a)

Do you think it's worth having such a structure in the library? I guess you don't need to implement bazillion functions for this structure at first, it will be enough to have a few basic functions, because mostly you're interested in appending such difference lists and converting from and to Slist.

NonEmpty Slist

Create another data structure similar to NonEmpty implemented through the ordinary list. Something like this:

data SNonEmpty a = a :| Slist a

[RFC] Strict append

Sometimes, I want to do <> on lists but add sizes strictly to avoid space leaks. The size field of the Slist data structure is intentionally lazy to make some functions work and not hang on some corner cases (or maybe to have a better lazy semantics). So having a separate function like append' probably won't hurt 🙂

I also often use concatMap to combine a lot of lists, so having concatMap' will be nice as well 👍

Both strict append' and concatMap' can be faster my skipping empty lists. This is especially relevant when you want to append a lot of lists of size 0 or 1.

Alternatively, we can change the definition of <> itself, but dealing with laziness is difficult, so probably we can introduce a few separate functions.

Add faster `sortWith`

Support GHC-9.6

Error: setup: Encountered missing or private dependencies:
base >=4.10.1.0 && <4.18

Support GHC-9.0

[__1] rejecting: base-4.15.0.0/installed-4.15.0.0 (conflict: slist =>
base>=4.10.1.0 && <4.15)

Implement fromRange function

This can be useful as an alternative to [from .. to] syntax in lists. So I propose to add function with the following type:

fromRange :: Int -> Int -> Slist Int
fromRange from to = ...

Probably this function can be generalized further to work not only with Int 🤔

Support GHC-9.4

[RFC] Integration with `containers`

While using slist, I've noticed that sometimes I want to convert Map/Set to slist more efficiently than using the slist function. I'm thinking about adding integration with containers and providing functions:

mapToVals :: Map k v -> Slist v
mapToKeys :: Map k v -> Slist k
mapToPairs :: Map k v -> Slist (k, v)

setToSlist :: Set a -> Slist a

If we depend on containers, we also can provide a more efficient versions of some helpful functions:

ordNub :: Ord a => Slist a -> Slist a
countUnique :: Ord a => Slist a -> Int  -- number of unique elements