
collection-strawman's Introduction

This is Scala 2! Welcome!

This is the home of the Scala 2 standard library, compiler, and language spec.

For Scala 3, visit scala/scala3.

How to contribute

Issues and bug reports for Scala 2 are located in scala/bug. That tracker is also where new contributors may find issues to work on: good first issues, help wanted.

For coordinating broader efforts, we also use the scala/scala-dev tracker.

To contribute here, please open a pull request from your fork of this repository.

Be aware that we can't accept additions to the standard library, only modifications to existing code. Binary compatibility forbids adding new public classes or public methods. Additions are made to scala-library-next instead.

We require that you sign the Scala CLA before we can merge any of your work, to protect Scala's future as open source software.

The general workflow is as follows.

  1. Find/file an issue in scala/bug (or submit a well-documented PR right away!).
  2. Fork the scala/scala repo.
  3. Push your changes to a branch in your forked repo. For coding guidelines, go here.
  4. Submit a pull request to scala/scala from your forked repo.

For more information on building and developing the core of Scala, read the rest of this README, especially for setting up your machine!

Get in touch!

In order to get in touch with other Scala contributors, join the #scala-contributors channel on the Scala Discord chat, or post on contributors.scala-lang.org (Discourse).

If you need some help with your PR at any time, please feel free to @-mention anyone from the list below, and we will do our best to help you out:

username talk to me about...
@lrytz back end, optimizer, named & default arguments, reporters
@retronym 2.12.x branch, compiler performance, weird compiler bugs, lambdas
@SethTisue getting started, build, CI, community build, Jenkins, docs, library, REPL
@dwijnand pattern matcher, MiMa, partest
@som-snytt warnings/lints/errors, REPL, compiler options, compiler internals, partest
@Ichoran collections library, performance
@viktorklang concurrency, futures
@sjrd interactions with Scala.js
@NthPortal library, concurrency, scala.math, LazyList, Using, warnings
@bishabosha TASTy reader
@joroKr21 higher-kinded types, implicits, variance

P.S.: If you have some spare time to help out around here, we would be delighted to add your name to this list!

Branches

Target the oldest branch you would like your changes to end up in. We periodically merge forward from older release branches (e.g. 2.12.x) to newer ones (e.g. 2.13.x).

If your change is difficult to merge forward, you may be asked to also submit a separate PR targeting the newer branch.

If your change is version-specific and shouldn't be merged forward, put [nomerge] in the PR name.

If your change is a backport from a newer branch and thus doesn't need to be merged forward, put [backport] in the PR name.

Choosing a branch

Most changes should target 2.13.x. We are increasingly reluctant to target 2.12.x unless there is a special reason (e.g. if an especially bad bug is found, or if there is commercial sponsorship).

The 2.11.x branch is now inactive and no further 2.11.x releases are planned (unless unusual, unforeseeable circumstances arise). You should not target 2.11.x without asking maintainers first.

Repository structure

Most importantly:

scala/
+--build.sbt                 The main sbt build definition
+--project/                  The rest of the sbt build
+--src/                      All sources
   +---/library              Scala Standard Library
   +---/reflect              Scala Reflection
   +---/compiler             Scala Compiler
+--test/                     The Scala test suite
   +---/files                Partest tests
   +---/junit                JUnit tests
   +---/scalacheck           ScalaCheck tests
+--spec/                     The Scala language specification

but also:

scala/
   +---/library-aux          Scala Auxiliary Library, for bootstrapping and documentation purposes
   +---/interactive          Scala Interactive Compiler, for clients such as an IDE (aka Presentation Compiler)
   +---/intellij             IntelliJ project templates
   +---/manual               "man" (manual) pages for Scala's runner scripts
   +---/partest              Scala's internal parallel testing framework
   +---/partest-javaagent    Partest's helper java agent
   +---/repl                 Scala REPL core
   +---/repl-frontend        Scala REPL frontend
   +---/scaladoc             Scala's documentation tool
   +---/scalap               Scala's class file decompiler
   +---/testkit              Scala's unit-testing kit
+--admin/                    Scripts for the CI jobs and releasing
+--doc/                      Additional licenses and copyrights
+--scripts/                  Scripts for the CI jobs and releasing
+--tools/                    Scripts useful for local development
+--build/                    Build products
+--dist/                     Build products
+--target/                   Build products

Get ready to contribute

Requirements

You need the following tools:

  • Java SDK. The baseline version is 8 for both 2.12.x and 2.13.x. It is almost always fine to use a later SDK such as 11 or 15 for local development. CI will verify against the baseline version.
  • sbt

macOS and Linux work. Windows may work if you use Cygwin. Community help with keeping the build working on Windows and documenting any needed setup is appreciated.

Tools we use

We are grateful for the following OSS licenses:

Build setup

Basics

During ordinary development, a new Scala build is built by the previously released version, known as the "reference compiler" or, slangily, as "STARR" (stable reference release). Building with STARR is sufficient for most kinds of changes.

However, a full build of Scala is bootstrapped. Bootstrapping has two steps: first, build with STARR; then, build again using the freshly built compiler, leaving STARR behind. This guarantees that every Scala version can build itself.

If you change the code generation part of the Scala compiler, your changes will only show up in the bytecode of the library and compiler after a bootstrap. Our CI does a bootstrapped build.

Bootstrapping locally: To perform a bootstrap, run restarrFull within an sbt session. This will build and publish the Scala distribution to your local artifact repository and then switch sbt to use that version as its new scalaVersion. You may then revert back with reload. Note restarrFull will also write the STARR version to buildcharacter.properties so you can switch back to it with restarr without republishing. This will switch the sbt session to use the build-restarr and target-restarr directories instead of build and target, which avoids wiping out classfiles and incremental metadata. IntelliJ will continue to be configured to compile and run tests using the starr version in versions.properties.

For history on how the current scheme was arrived at, see https://groups.google.com/d/topic/scala-internals/gp5JsM1E0Fo/discussion.

Building with fatal warnings: To make warnings in the project fatal (i.e. turn them into errors), run set Global / fatalWarnings := true in sbt (replace Global with the name of a module—such as reflect—to only make warnings fatal for that module). To disable fatal warnings again, either reload sbt, or run set Global / fatalWarnings := false (again, replace Global with the name of a module if you only enabled fatal warnings for that module). CI always has fatal warnings enabled.

Using the sbt build

Once you've started an sbt session you can run one of the core commands:

  • compile compiles all sub-projects (library, reflect, compiler, scaladoc, etc)
  • scala / scalac run the REPL / compiler directly from sbt (accept options / arguments)
  • enableOptimizer reloads the build with the Scala optimizer enabled. Our releases are built this way. Enable this when working on compiler performance improvements. When the optimizer is enabled the build will be slower and incremental builds can be incorrect.
  • setupPublishCore runs enableOptimizer and configures a version number based on the current Git SHA. Often used as part of bootstrapping: sbt setupPublishCore publishLocal && sbt -Dstarr.version=<VERSION> testAll
  • dist/mkBin generates runner scripts (scala, scalac, etc) in build/quick/bin
  • dist/mkPack creates a build in the Scala distribution format in build/pack
  • junit/test runs the JUnit tests; junit/testOnly *Foo runs a subset
  • scalacheck/test runs the ScalaCheck tests; use testOnly to run a subset
  • partest runs partest tests (accepts options, try partest --help)
  • publishLocal publishes a distribution locally (can be used as scalaVersion in other sbt projects; see the example after this list)
    • Optionally set baseVersionSuffix := "bin-abcd123-SNAPSHOT" where abcd123 is the git hash of the revision being published. You can also use something custom like "bin-mypatch". This changes the version number from 2.13.2-SNAPSHOT to something more stable (2.13.2-bin-abcd123-SNAPSHOT).
    • Note that the -bin string marks the version as binary compatible. Using it in sbt will cause the scalaBinaryVersion to be 2.13. If the version is not binary compatible, we recommend using -pre, e.g., 2.14.0-pre-abcd123-SNAPSHOT.
    • Optionally set ThisBuild / Compile / packageDoc / publishArtifact := false to skip generating / publishing API docs (speeds up the process).
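For example, a sketch of that workflow (the suffix and version numbers are illustrative):

$ sbt
> set baseVersionSuffix := "bin-mypatch-SNAPSHOT"
> publishLocal

and then, in the sbt session of the project that should use the freshly published build:

> set scalaVersion := "2.13.2-bin-mypatch-SNAPSHOT"
> console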

If a command results in an error message like a module is not authorized to depend on itself, it may be that a global sbt plugin is causing a cyclical dependency. Try disabling global sbt plugins (perhaps by temporarily commenting them out in ~/.sbt/1.0/plugins/plugins.sbt).

Sandbox

We recommend keeping local test files in the sandbox directory which is listed in the .gitignore of the Scala repo.

Incremental compilation

Note that sbt's incremental compilation is often too coarse for the Scala compiler codebase and re-compiles too many files, resulting in long build times (check sbt#1104 for progress on that front). In the meantime you can:

  • Use IntelliJ IDEA for incremental compiles (see IDE Setup below) - its incremental compiler is a bit less conservative, but usually correct.

IDE setup

We suggest using IntelliJ IDEA (see src/intellij/README.md).

Metals may also work, but we don't yet have instructions or sample configuration for that. A pull request in this area would be exceedingly welcome. In the meantime, we are collecting guidance at scala/scala-dev#668.

In order to use IntelliJ's incremental compiler:

  • run dist/mkBin in sbt to get a build and the runner scripts in build/quick/bin
  • run "Build" - "Make Project" in IntelliJ

Now you can edit and build in IntelliJ and use the scripts (compiler, REPL) to directly test your changes. You can also run the scala, scalac and partest commands in sbt. Enable "Ant mode" (explained above) to prevent sbt's incremental compiler from re-compiling (too many) files before each partest invocation.

Coding guidelines

Our guidelines for contributing are explained in CONTRIBUTING.md. It contains useful information on our coding standards, testing, documentation, how we use git and GitHub, and how to get your code reviewed.

You may also want to check out the following resources:

Scala CI


Once you submit a PR your commits will be automatically tested by the Scala CI.

Our CI setup is always evolving. See scala/scala-dev#751 for more details on how things currently work and how we expect they might change.

If you see a spurious failure on Jenkins, you can post /rebuild as a PR comment. The scabot README lists all available commands.

If you'd like to test your patch before having everything polished for review, you can have Travis CI build your branch (make sure you have a fork and have Travis CI enabled for branch builds on it first, and then push your branch). Also feel free to submit a draft PR. In case your draft branch contains a large number of commits (that you didn't clean up / squash yet for review), consider adding [ci: last-only] to the PR title. That way only the last commit will be tested, saving some energy and CI-resources. Note that inactive draft PRs will be closed eventually, which does not mean the change is being rejected.

CI performs a compiler bootstrap. The first task, validatePublishCore, publishes a build of your commit to the temporary repository https://scala-ci.typesafe.com/artifactory/scala-pr-validation-snapshots. Note that this build is not yet bootstrapped; its bytecode is built using the current STARR. The version number is 2.13.2-bin-abcd123-SNAPSHOT where abcd123 is the commit hash. For binary incompatible builds, the version number is 2.14.0-pre-abcd123-SNAPSHOT.

You can use Scala builds in the validation repository locally by adding a resolver and specifying the corresponding scalaVersion:

$ sbt
> set resolvers += "pr" at "https://scala-ci.typesafe.com/artifactory/scala-pr-validation-snapshots/"
> set scalaVersion := "2.12.2-bin-abcd123-SNAPSHOT"
> console

"Nightly" builds

The Scala CI builds nightly download releases and publishes them to https://scala-ci.typesafe.com/artifactory/scala-integration/ .

Using a nightly build in sbt is explained in this Stack Overflow answer.

Although we casually refer to these as "nightly" builds, they aren't actually built nightly, but "mergely". That is to say, a build is published for every merged PR.

Scala CI internals

The Scala CI runs as a Jenkins instance on scala-ci.typesafe.com, configured by a chef cookbook at scala/scala-jenkins-infra.

The build bot that watches PRs, triggers testing builds and applies the "reviewed" label after an LGTM comment is in the scala/scabot repo.

Community build

The Scala community build is an important method for testing Scala releases. A community build can be launched for any Scala commit, even before the commit's PR has been merged. That commit is then used to build a large number of open-source projects from source and run their test suites.

To request a community build run on your PR, just ask in a comment on the PR and a Scala team member (probably @SethTisue) will take care of it. (details)

Community builds run on the Scala Jenkins instance. The jobs are named ..-integrate-community-build. See the scala/community-builds repo.

collection-strawman's People

Contributors

allanrenucci, biboudis, epronovost, esarbe, ichoran, jiminhsieh, julienrf, kelebra, lptk, lrytz, marcelocenerine, msteindorfer, nicolasstucki, nthportal, odd, odersky, olafurpg, optician, pathikrit, pnf, psycho7, sethtisue, shawjef3, sjrd, smarter, stephennancekivell, szeiger, tim-ruhland, tpolecat, xavier-fernandez


collection-strawman's Issues

Exceptions and errors encountered during running of time benchmarks

When running the time benchmarks while evaluating the performance of #52 some exceptions and errors for the other collection types occurred.

The time benchmarks were in this case run with:
charts -f 1 -bm avgt -bs 1 -w 100ms -r 500ms -to 30s -t max -wi 10 -i 10

The exceptions encountered were:

  • LazyListBenchmark.foldLeft for sizes 1, 7, 17, 282, 7890: java.util.NoSuchElementException: next on empty iterator
  • LazyListBenchmark.foldRight for sizes 1, 7, 17, 282: java.util.NoSuchElementException: next on empty iterator
  • LazyListBenchmark.foreach for sizes 1, 17: java.util.NoSuchElementException: next on empty iterator
  • LazyListBenchmark.uncons for size 1: java.util.NoSuchElementException: next on empty iterator

The errors encountered were:

  • LazyListBenchmark.foldLeft for size 32911: java.lang.StackOverflowError
  • LazyListBenchmark.foldRight for size 32911: java.lang.StackOverflowError
  • ListBenchmark.uncons for sizes 1, 7890, 32911: java.lang.StackOverflowError
  • ListBenchmark.foldLeft for sizes 7890, 32911: java.lang.StackOverflowError
  • ListBenchmark.foldRight for sizes 7890, 32911: java.lang.StackOverflowError
  • ArrayBufferBenchmark.foldLeft for sizes 7890, 32911: java.lang.StackOverflowError
  • ArrayBufferBenchmark.foldRight for sizes 7890, 32911: java.lang.StackOverflowError
  • ListBufferBenchmark.foldLeft for sizes 7890, 32911: java.lang.StackOverflowError
  • ListBufferBenchmark.foldRight for sizes 7890, 32911: java.lang.StackOverflowError

The stack overflow errors could be avoided by increasing the stack size (while a tail recursive implementation would avoid the problem altogether).
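For reference, here is a generic tail-recursive left fold over a cons list, the kind of implementation that sidesteps the stack growth (a sketch, not the strawman's actual code):

import scala.annotation.tailrec

@tailrec
def foldLeft[A, B](xs: List[A], z: B)(op: (B, A) => B): B = xs match {
  case head :: tail => foldLeft(tail, op(z, head))(op)
  case Nil          => z
}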

Add missing operations

Add methods from the existing collections library that are missing in strawman-collections.

Map functional combinators

Last year I opened a PR against Scala 2.12 that aimed to add a subset of join-like operations to the Map operations set:

scala/scala#5353

After some discussion it was clear that a more functional interface would fit better with the current Map collections interface, as join methods belong to the foreign (and not so abstract) realm of database operations.

That was thoroughly synthesized by @szeiger's comment, which proposed the following operation set:

def zipByKeyWith[W, X](that: Map[K, W])(f: (V, W) => X): Map[K, X]
def zipByKey[W](that: Map[K, W]): Map[K, (V, W)] = zipByKeyWith(that)(_ -> _)

And, to be able to handle keys that are present in only one of the two maps:

def mergeByKeyWith[W, X](that: Map[K, W])(f: PartialFunction[(Option[V], Option[W]), X]): Map[K, X]
def mergeByKey[W](that: Map[K, W]): Map[K, (Option[V], Option[W])] = mergeByKeyWith(that)(PartialFunction(identity))

merge* would then be the way to provide left/right/full-outer join logical operations.

How could join operations be expressed using this API?

(NOTE: This is an application example, not part of the requested functionality)

  • Inner join:
def join[K, VA, VB](a: Map[K, VA], b: Map[K, VB]): Map[K, (VA, VB)] =
    a zipByKey b
  • Left outer join:
def leftOuterJoin[K, VA, VB](a: Map[K, VA], b: Map[K, VB]): Map[K, (VA, Option[VB])] =
    a.mergeByKeyWith(b) { case (Some(va), optVb) => va -> optVb }
  • Right outer join:
def rightOuterJoin[K, VA, VB](a: Map[K, VA], b: Map[K, VB]): Map[K, (Option[VA], VB)] = 
    a.mergeByKeyWith(b) { case (optVa, Some(vb)) => optVa -> vb }
  • Full outer join:
def fullOuterJoin[K, VA, VB](a: Map[K, VA], b: Map[K, VB]): Map[K, (Option[VA], Option[VB])] = 
    a.mergeByKey(b)

I think mergeByKey* operations are important as they cover all outer cases. In fact, zipByKeyWith implementation could be:

def zipByKeyWith[W, X](that: Map[K, W])(f: (V, W) => X): Map[K, X] = 
    mergeByKeyWith(that) { case (Some(va), Some(vb)) => f(va, vb) }

But I wouldn't use that implementation since, provided you're zipping only matching keys, you could avoid scrutinising that key set.
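For illustration only, a library-agnostic sketch of mergeByKeyWith as an extension method on the standard immutable Map (the placement inside the collections library would of course differ):

implicit class MergeByKeyOps[K, V](val self: Map[K, V]) extends AnyVal {
  def mergeByKeyWith[W, X](that: Map[K, W])(
      f: PartialFunction[(Option[V], Option[W]), X]): Map[K, X] = {
    // Visit the union of both key sets and keep the keys on which f is defined.
    val keys = self.keySet ++ that.keySet
    keys.iterator.flatMap { k =>
      f.lift((self.get(k), that.get(k))).map(k -> _)
    }.toMap
  }
}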

Have some sort of HasBuilder type class ?

Is there any chance to have some sort of type class that would provide features similar to:

trait HasBuilder[F[_]] {
  def newBuilder[A]: Builder[A, F[A]]
}

With default instances for all applicable collection types?

I know of at least one concrete use case: type class based decoding libraries. Let's take the example of a very simple CSV decoding library. One would write the decoding type classes as follows:

// Decodes a single CSV cell
trait CellDecoder[A] {
  def decode(cell: String): Try[A]
}

// Decode an entire CSV row
trait RowDecoder[A] {
  def decode(row: Seq[String]): Try[A]
}

With CanBuildFrom, it's easy to provide an instance of RowDecoder for all collection types:

import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder
import scala.util.{Success, Try}

implicit def cbfDecoder[A, F[_]](implicit da: CellDecoder[A], cbf: CanBuildFrom[Nothing, A, F[A]]): RowDecoder[F[A]] =
  new RowDecoder[F[A]] {
    override def decode(row: Seq[String]): Try[F[A]] =
      row.foldLeft[Try[Builder[A, F[A]]]](Success(cbf.apply())) { (acc, cell) =>
        for {
          builder <- acc
          a <- da.decode(cell)
        } yield builder += a
      }.map(_.result())
  }

I know of at least three libraries that use this mechanism:

I understand CanBuildFrom is going to disappear, and no mechanism is planned to emulate the above behaviour. The HasBuilder type class I'm suggesting would replace it, and is vastly simpler (and less tied to the core implementation of the collection API) than CBF.

It would ideally be part of the standard collection API, although I can't think of anything that would prevent third party libraries from providing that feature externally.
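To make the comparison concrete, here is a hedged sketch of how the RowDecoder instance above could be written against the proposed HasBuilder instead of CanBuildFrom (the type class shape is the one suggested in this issue; everything else is hypothetical):

import scala.util.Try

implicit def hasBuilderDecoder[A, F[_]](implicit da: CellDecoder[A], hb: HasBuilder[F]): RowDecoder[F[A]] =
  new RowDecoder[F[A]] {
    override def decode(row: Seq[String]): Try[F[A]] =
      row.foldLeft(Try(hb.newBuilder[A])) { (acc, cell) =>
        for {
          builder <- acc
          a <- da.decode(cell)
        } yield builder += a
      }.map(_.result())
  }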

HashSet should be parametrized on a hashable type class

Tree-style collections (TreeMap, TreeSet) take an instance of Ordering[A] as a parameter. This is necessary because there's only universal equality on the JVM, but not universal "less than". The nice "side effect" of that is that I can pass in custom instances when constructing collections.

On the other hand, hash-style collections (HashMap, HashSet) do not take an instance of Hash[A] as a parameter, for I believe two reasons:

  1. there is universal equality and hashing on the JVM
  2. the type class doesn't exist yet

If I want to use a different hash code implementation, I can either implement a new collection type or use awkward newtype-style wrappers. If the collections were parametrized on a Hash instance, I could pass in a custom instance.

Note that this proposal is unaffected by global instance uniqueness. I'm merely suggesting establishing feature parity between hash-style and tree-style collections. Whether or not you should pass in multiple different instances per type is an entirely different discussion.
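For concreteness, a minimal sketch of what such a type class and its use could look like (the name Hash and its exact shape are illustrative, not settled anywhere):

trait Hash[A] {
  def hash(a: A): Int
  def eqv(a: A, b: A): Boolean
}

object Hash {
  // Default instance falling back to the JVM's universal equality/hashing,
  // preserving today's behaviour when no custom instance is supplied.
  implicit def universal[A]: Hash[A] = new Hash[A] {
    def hash(a: A): Int = a.##
    def eqv(a: A, b: A): Boolean = a == b
  }
}

// A hash-based collection would then take the instance as a constructor
// parameter, mirroring how TreeSet/TreeMap take an Ordering:
//   class HashSet[A](...)(implicit val hashing: Hash[A]) extends ...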

Regression in SortedMap

I commented directly on the commit that introduced the regression but I think it’s easier to keep track of the issue if I create a proper issue…

I didn’t notice that when I first reviewed the code but I think there is a regression here.

The updated and + methods are overridden in order to refine the return type CC so that it is SortedMap instead of Map (which we would have otherwise, because we inherit MapLike[K, V, Map]).

This is a problem because it means we will have to carefully refine the return type of all the operations inherited from Map; furthermore, the operator aliases can no longer be made final.

I initially fixed this problem in #56 by introducing (yet…) another abstraction named MapValuePolyTransforms, whose purpose was to model the return type of transformation operations that return a Map collection with the same key type but a different value type (like this updated method).

To adapt this solution to the new design, we would have to introduce a fifth type parameter to the MapOps trait:

trait MapOps[K, +V, +CC[X, Y], +VC[Y], +C] {
  def updated[V0 >: V](key: K, value: V0): VC[V0]
}

Where VC means “constructor for a map with a different type of values”.

In practice this is how the Map trait would instantiate these type parameters:

trait Map[K, +V] extends MapOps[K, V, Map, [V0] => Map[K, V0], Map[K, V]]

As you can see, one problem is that we have no support for partial type application (I wish I could replace [V0] => Map[K, V0] with just Map[K, _]), and, furthermore, in Scala we don’t have type lambdas, which means that we would have to define something like this:

trait PartialAp[TC2[X, Y], A] {
  type TC[B] = TC2[A, B]
}

And then:

trait Map[K, +V] extends MapOps[K, V, Map, PartialAp[Map, K]#TC, Map[K, V]]

Any thoughts, @odersky, @Ichoran, @DarkDimius, @szeiger?

I’m thinking of an alternative design that would not have all these type parameters, and, consequently, that would not hit these issues… (but it will have other trade offs…)

Allow offline mode with Dotty

sbt offline mode is currently broken because of the way we resolve Dotty:

$ sbt -offline
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=1024M; support was removed in 8.0
[info] Loading global plugins from /Users/szeiger/.sbt/0.13/plugins/project
[info] Loading global plugins from /Users/szeiger/.sbt/0.13/plugins
[info] Loading project definition from /Users/szeiger/code/collection-strawman/project
Fetching latest Dotty nightly version (requires an internet connection)...
java.net.UnknownHostException: repo1.maven.org
       	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
       	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
       	at java.net.Socket.connect(Socket.java:589)
       	at java.net.Socket.connect(Socket.java:538)
       	at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
       	at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
       	at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
       	at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
       	at sun.net.www.http.HttpClient.New(HttpClient.java:308)
       	at sun.net.www.http.HttpClient.New(HttpClient.java:326)
       	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
       	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
       	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:989)
       	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:987)
       	at java.security.AccessController.doPrivileged(Native Method)
       	at java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:782)
       	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:986)
       	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
       	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1513)
       	at sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:90)
       	at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1433)
       	at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1431)
       	at java.security.AccessController.doPrivileged(Native Method)
       	at java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:782)
       	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1430)
       	at java.net.URL.openStream(URL.java:1045)
       	at scala.io.Source$.fromURL(Source.scala:140)
       	at scala.io.Source$.fromURL(Source.scala:130)
       	at com.felixmulder.dotty.plugin.DottyPlugin$autoImport$.dottyLatestNightlyBuild(DottyPlugin.scala:18)
       	at $54d02c4975ea6c60a7d6$$anonfun$$sbtdef$1.apply(/Users/szeiger/code/collection-strawman/build.sbt:3)
       	at $54d02c4975ea6c60a7d6$$anonfun$$sbtdef$1.apply(/Users/szeiger/code/collection-strawman/build.sbt:3)
       	at sbt.Init$Value$$anonfun$apply$13.apply(Settings.scala:609)
       	at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$constant$1.apply(INode.scala:163)
       	at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$constant$1.apply(INode.scala:163)
       	at sbt.EvaluateSettings$MixedNode.evaluate0(INode.scala:175)
       	at sbt.EvaluateSettings$INode.evaluate(INode.scala:135)
       	at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$submitEvaluate$1.apply$mcV$sp(INode.scala:69)
       	at sbt.EvaluateSettings.sbt$EvaluateSettings$$run0(INode.scala:78)
       	at sbt.EvaluateSettings$$anon$3.run(INode.scala:74)
       	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       	at java.lang.Thread.run(Thread.java:745)

Collections meeting 06/27

Topics for the next meeting, feel free to add more:

  • Should we provide buffering transformations on Iterator (as implemented in #96) or only on Iterable (as in the 2.12 collections)?
  • Empty parameter list vs no parameter list? We should make a decision based on the direction Dotty will take. Also consider Java interop.
  • Implicit builders #112 vs #114.
  • Type inference issue for collect (see #117)
  • Merge Vector (#115)
  • Merge Traversable and Iterable operations (#119 and #120)
  • Runtime error with Dotty only in #48
  • Non-laziness of #::

Separating interfaces and implementations

Context

XxxOps types provide convenient default implementations for most methods. Some of them rely on protected coll, fromIterable and fromSpecificIterable members (example). Often the coll member’s type is refined so that more complex operations can be implemented (see here, and there). For convenience, this member is implemented in Iterable, as follows:

def coll: this.type = this

Problems

  • When we refine the type of coll to be C or CC[A], we then have to put upper bounds on these C and CC type parameters. Then these upper bounds show up everywhere and make the code less readable (example);
  • The signature of fromIterable and fromSpecificIterable is impossible to implement for specialized Views (like IndexedView). See this comment for more details. A derived consequence of this is that transformation operations on IndexedView return a View unless they are explicitly overridden (example).

Proposal

  • Having traits with no implemented methods would make it possible to remove the bounds on the type parameters. (But that might just delay the problem because if we want to have reusable implementations inherited by concrete collections then those reusable implementations will have the bounds)
  • I think that another consequence will be that Java collection’s decorators and Views will be better integrated (e.g. it will be possible to make IndexedView[A] extend Seq[A], and it will be simpler to specialize its operations’ return type to IndexedView[A]).

I’m thinking of giving this idea a try. But before I start, do you have any comments?

Add "Generator" / "Foreachable" types?

I don't know if this is the right place for this discussion/proposal/idea, so feel free to close this issue if it's not.

I have a Geny library that provides a Generator type, which is basically the core of Traversable: a simple data-type based around .foreach, whose operations are fused/lazy until you call a terminal operation. It is essentially the push-based dual to the existing pull-based scala.Iterators, and is similar to what java.util.Stream provides.

It's currently lacking an abstract interface similar to TraversableOnce, but one could be added that would codify the .foreach method without all the existing fused/lazy transformation methods. For the purpose of this issue let's call this hypothetical interface Foreachable.

trait Foreachable[+A]{
  def foreach(f: A => Unit): Unit
}

Would there be any interest in adding a Foreachable interface or Generator data-type to the collections library?

Where would Foreachable be useful?

Every Iterator/Iterable would be a Foreachable, but not every Foreachable would be an Iterable.

I think having a pure trait that defines foreach would be useful, as there are tons of cases where you can define a foreach method but cannot realistically define an Iterator:

  • The renderTo(sb: StringBuilder) method of a JSON AST, which can be wrapped in an iterable, but needs some non-trivial optics/lenses/paths to make it work and probably at a significant performance cost

  • The renderTo(sb: StringBuilder) method of scalatags.Text.Frag template fragments, which could in theory be wrapped in an iterable, but at a terrible performance cost, as well as requiring quite an extensive rewrite of the internal implementation to make it happen.

  • Third-party SAX parsers, or other event-based recursive-descent parsers. These cannot be easily wrapped in an iterable.

  • The streaming output of the Scala compiler, coming from the scala.tools.nsc.Global object's def compile method. This could trivially be converted into a Traversable[String], but is impossible to convert to an Iterable[String] without extreme measures (described below)

(taken from a post on reddit I made that goes into more details why a Generator/Foreachable is not "just" an Iterable)

If I wanted to deal with the streaming results from any of these sources, I would not be able to use an Iterator/Iterable (see link above), but with a Foreachable interface I could apply that to all these data-sources and then transform/manipulate them generically just as you would transform/manipulate iterators generically. I think that would be nice.
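To make the "transform them generically" part concrete, here is a hedged sketch of how fused, lazy transformations can be layered on top of nothing but foreach, along the lines of what geny.Generator does (illustrative only, not the library's actual API):

trait Generator[+A] { self =>
  def foreach(f: A => Unit): Unit

  def map[B](g: A => B): Generator[B] = new Generator[B] {
    def foreach(f: B => Unit): Unit = self.foreach(a => f(g(a)))
  }

  def filter(p: A => Boolean): Generator[A] = new Generator[A] {
    def foreach(f: A => Unit): Unit = self.foreach(a => if (p(a)) f(a))
  }

  // A terminal operation: only here do elements actually flow through the
  // fused map/filter stages above.
  def toList: List[A] = {
    val buf = List.newBuilder[A]
    foreach(buf += _)
    buf.result()
  }
}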

Where would Generator be useful?

Apart from the broad generality of this Foreachable data-type (not sure you can really call it a collection, since it doesn't hold elements, but neither does Iterator) it is also useful because a fused/lazy implementation such as geny.Generator can guarantee cleanup operations run after the terminal operations are complete. For example, if I call:

val x: C[String] = readFileLines()
println(x.slice(10, 20).mkString("\n"))

To print the lines 10-20 of a particular file, if...

  • If C is an Iterator, we will leak the file handle

  • If C is a "self-closing-iterator" that is common in Scala libraries, it will still leak a file handle since those typically only close when the iterator completes. Even if you try to be clever and override slice (and take and init and ...) someone else using "raw" next/hasNext operations will likely end up leaking your file handle

  • If C is a more concrete collection (Vector, List, etc.) you can ensure that the file gets closed by the time readFileLines() returns, but it will end up loading the whole file into memory just to take the first 20 lines.

  • If C is a geny.Generator, the file will only get opened during the terminal operation mkString, which iterates-and-discards the first 10 lines, reads lines 10-20, joins them together before closing the file.


In conclusion, Foreachable/Generator provides iterator-like "streaming"/fusing/laziness, with broader and more general applicability than Iterator/Iterable (use cases described above) and the potential for deterministic cleanup (which is a perennial pain, e.g. in scala.io.Source, at least in my experience). Ammonite-Ops now uses geny.Generators as the return type for all its lazy/"streaming" operations, so it can ensure that the proper resources get released and you don't run out of file descriptors because you did read.iter(path).take(10) to read the first 10 lines of many different files.

The Scala 2.x incarnation of Traversable is not suitable for this task, because it brings along with it dozens (hundreds?) of concrete method implementations, most of which are eager/copying, build up the "entire" data structure in memory each time, and are not compatible with e.g. the concrete fused/lazy implementation of Generator I've linked above.

WDYT? Is this something worth continuing discussing in the context of this strawman?

Explore how this library can support Java collections.

People who use Scala to interoperate heavily with Java typically need to convert back and forth between Java and Scala collections, e.g. with JavaConverters. We should write converters for a couple of collections (maybe ArrayList and HashMap?) to shake out any pain points that might exist.
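As a starting point, a hedged sketch of one direction of such a conversion (Java to Scala), written against the standard library's Iterable for simplicity; the class name and its placement are hypothetical:

import java.{util => ju}

class JavaListView[A](underlying: ju.List[A]) extends Iterable[A] {
  // A cheap wrapper: no copying, just adapt the Java iterator to the Scala one.
  def iterator: Iterator[A] = new Iterator[A] {
    private val it = underlying.iterator()
    def hasNext: Boolean = it.hasNext
    def next(): A = it.next()
  }
}

// Usage:
//   val jl = new ju.ArrayList[Int]; jl.add(1); jl.add(2)
//   new JavaListView(jl).toList   // List(1, 2)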

Support for Multiversal Equality

With multiversal equality in Dotty (scala/scala3#1247), it would seem inconsistent not to use it for operations based on == such as contains and indexOf, which are currently pretty unsafe:

scala> List(1,2) contains 'ko
val res3: Boolean = false

It has been agreed on before that these methods could benefit from M-E. Indeed, one can write things like:

scala> def contains[A,B](ls: List[A], b: B)(implicit eq: Eq[A,B]) = ls.exists(_ == b)
scala> contains(List(1,2), 'ko)
-- Error: <console>:5:2 --------------------------------------------
5 |  contains(List(1,2), 'ko)
  |                         ^
  |         Values of types Int^ and Symbol^ cannot be compared with == or !=

In order to make that work with current Scala, we could have a single Eq implicit available there so that contains would always work (as before).

One thing I noticed that may or may not be a problem is that adding a subtyping constraint to one of the type parameters makes the code type check again...

scala> def contains[A,B >: A](ls: List[A], b: B)(implicit eq: Eq[A,B]) = ls.exists(_ == b)
scala> contains(List(1,2), 'ko)
res0: Boolean = false

Add NonEmpty data structure to the collections

There exists third-party code that integrates with the previous Scala collections to provide a non-empty collection, one whose type guarantees that it contains at least one element. This is useful in lots of cases, and so far only Scalaz has provided this functionality out of the box.

I think this could shine in public APIs, providing more type safety. I have just found a use case for this in the sbt API (sbt/sbt#3230) but we cannot use it because it's not in the official collections. We may end up porting the implementation over to sbt/sbt.
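For illustration, a minimal sketch of what such a type could look like (name and operations are hypothetical, not a concrete design):

final case class NonEmptyList[+A](head: A, tail: List[A]) {
  def toList: List[A] = head :: tail
  def map[B](f: A => B): NonEmptyList[B] = NonEmptyList(f(head), tail.map(f))
  def ++[B >: A](that: NonEmptyList[B]): NonEmptyList[B] =
    NonEmptyList(head, tail ++ that.toList)
}

object NonEmptyList {
  // Total by construction: the type itself proves there is at least one element.
  def fromList[A](as: List[A]): Option[NonEmptyList[A]] = as match {
    case h :: t => Some(NonEmptyList(h, t))
    case Nil    => None
  }
}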

@julienrf @szeiger What do you think?

Opt-In to partial methods on collections

I would like to propose we consider making unsafe collection operations opt-in. I understand the case for using them when you need performance, have reason to believe they will be safe, or plan on handling the error, so this is not a suggestion to remove them, just to make them opt-in.

@Ichoran has suggested a possible implementation through requiring the import of a NonTotal token to use the unsafe methods.

If this proposal is accepted I would be happy to do the work related to it as well as write a scalafix rewrite to add the imported NonTotal token where required to maintain backwards compatibility. With a nice message stating that the user needs to import scala.collection.NonTotal (or Partial?) it would be low overhead for users but still give a reminder to those new to deciding when it is acceptable to use the partial methods in collections.
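For illustration, a hedged sketch of the import-a-token idea (the names NonTotal and headUnsafe, and the extension-method packaging, are all hypothetical):

object unsafe {
  sealed trait NonTotal
  implicit object NonTotal extends NonTotal
}

implicit class PartialSeqOps[A](val xs: Seq[A]) extends AnyVal {
  // Only callable once the token is in implicit scope,
  // i.e. after `import unsafe.NonTotal`.
  def headUnsafe(implicit ev: unsafe.NonTotal): A = xs.head
}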

Views

Let's use this ticket for discussions about views. Some comments were already made in #7:

  • Should mutable views be removed entirely?
  • How about special mutable views for mutable collections?

My own 2ct on the topic so far: I never cared for mutable views (too confusing) but I like the idea of views as reified iterator operations, as currently implemented here. These could be useful in the context of both mutable and immutable collections. I would like to propose taking them one step further and making the evaluation order of view operations unspecified. If views are used as a lightweight abstraction for composing complex transformations, you don't generally care about side effects, but you always care about performance (otherwise you'd transform the underlying collections in multiple steps). Views could be a place where we allow operations to be fused for better performance.

FoldWhile

Recently I came across this code:

  import scala.annotation.tailrec
  import scala.collection.GenTraversableOnce

  implicit class GenTraversableOnceExtensions[A](val coll: GenTraversableOnce[A]) extends AnyVal {
    /**
     * Fold while the accumulation function returns Some. Stops on the first None.
     * @param initial initial element to start the accumulation on
     * @param accumulate accumulation function
     */
    def foldWhile[A1 >: A](initial: A1)
      (accumulate: (A1, A1) => Option[A1]): A1 =
      foldLeftWhile[A1](initial)(accumulate)

    /**
     * Fold while the accumulation function returns Some. Stops on the first None.
     * @param initial initial element to start the accumulation on
     * @param accumulate accumulation function
     */
    def foldLeftWhile[B](initial: B)
      (accumulate: (B, A) => Option[B]): B = {
      @tailrec
      def loop(it: Iterator[A], prev: B): B = {
        if (it.hasNext) {
          val next = it.next()
          val current = accumulate(prev, next)
          if (current.isEmpty) prev
          else loop(it, current.get)
        } else prev
      }
      loop(coll.toIterator, initial)
    }
  }

I've written multiple variations of an abort-early fold like this one.

I think it is a sufficiently common use case with collections that it should merit inclusion in the new redesign. Unfortunately I have no data to prove it is in fact a common use case.
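A quick usage example of foldLeftWhile as defined above: sum elements until adding the next one would exceed a cap (illustrative).

val xs = List(1, 2, 3, 4, 5)
val capped = xs.foldLeftWhile(0) { (acc, x) =>
  if (acc + x <= 10) Some(acc + x) else None
}
// capped == 10: the fold stops before adding 5, which would overshoot the cap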

LazyList.#:: not lazy due to parser rewriting of right-associative infix operators

Evaluating

  def bleh(i: Int) = {println(s"bleh $i"); i}
  val xs20 = bleh(1) #:: bleh(2) #:: bleh(3) #:: LazyList.empty

will immediately execute all the printlns, even though .force is never called and .evaluated remains false. If one defines a left-associative prepend with the same definition as #::, then

    val x21 = LazyList.empty.prepend(bleh(3)).prepend(bleh(2)).prepend(bleh(1))

will behave in a properly lazy manner.

This is no great mystery, as the parser converts the definition of xs20 to:

 val xs20 = {   <synthetic> <artifact> val x$68 = bleh(1);
    {  <synthetic> <artifact> val x$67 = bleh(2);
  {  <synthetic> <artifact> val x$66 = bleh(3);
  LazyList.empty.$hash$colon$colon(x$66)}.$hash$colon$colon(x$67)}.$hash$colon$colon(x$68)  };

This seems to be scala/bug#7333, which was closed as a duplicate of the somewhat less general scala/bug#1980

Concatenating empty with empty leads to failed assertion on Dotty

Given

import strawman.collection.immutable

the code

LazyList.empty ++ LazyList.empty

leads to a compiler assertion error when compiling with Dotty.

The problem can be avoided by adding a type ascription on any of the empty calls or by using the apply method instead of empty, e.g. neither

LazyList.empty[Nothing] ++ LazyList.empty

nor

LazyList.empty ++ LazyList()

will trigger the assertion error.

Only Seq, List and LazyList are affected (i.e. inheritors of Seq and LinearSeq).

The stack trace of the assertion failure is:

java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:156)
	at dotty.tools.dotc.core.ConstraintHandling.approximation(ConstraintHandling.scala:225)
	at dotty.tools.dotc.core.ConstraintHandling.instanceType(ConstraintHandling.scala:261)
	at dotty.tools.dotc.core.Types$TypeVar.instantiate(Types.scala:3149)
	at dotty.tools.dotc.typer.Inferencing$.op$$anonfun$28(Inferencing.scala:255)
	at scala.compat.java8.JProcedure2.apply(JProcedure2.java:18)
	at scala.compat.java8.JProcedure2.apply(JProcedure2.java:10)
	at dotty.tools.dotc.util.SimpleMap$Map3.foreachBinding(SimpleMap.scala:103)
	at dotty.tools.dotc.typer.Inferencing$.op$80(Inferencing.scala:256)
	at dotty.tools.dotc.typer.Inferencing$.interpolate$2(Inferencing.scala:221)
	at dotty.tools.dotc.typer.Inferencing$.interpolateUndetVars(Inferencing.scala:264)
	at dotty.tools.dotc.typer.Typer.op$91(Typer.scala:1735)
	at dotty.tools.dotc.typer.Typer.op$89(Typer.scala:1733)
	at dotty.tools.dotc.typer.Typer.adapt(Typer.scala:1732)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.ProtoTypes$FunProto.typedArgs$$anonfun$1$$anonfun$1(ProtoTypes.scala:229)
	at dotty.tools.dotc.typer.ProtoTypes$FunProto.cacheTypedArg(ProtoTypes.scala:215)
	at dotty.tools.dotc.typer.ProtoTypes$FunProto.typedArgs$$anonfun$1(ProtoTypes.scala:229)
	at dotty.tools.dotc.core.Decorators$ListDecorator$.loop$5(Decorators.scala:62)
	at dotty.tools.dotc.core.Decorators$ListDecorator$.mapconserve$extension(Decorators.scala:78)
	at dotty.tools.dotc.typer.ProtoTypes$FunProto.typedArgs(ProtoTypes.scala:229)
	at dotty.tools.dotc.typer.Applications.op$58(Applications.scala:1338)
	at dotty.tools.dotc.typer.Applications.resolveOverloaded(Applications.scala:1256)
	at dotty.tools.dotc.typer.Applications.op$62(Applications.scala:1242)
	at dotty.tools.dotc.typer.Applications.resolveOverloaded(Applications.scala:1195)
	at dotty.tools.dotc.typer.Typer.adaptOverloaded$1(Typer.scala:1795)
	at dotty.tools.dotc.typer.Typer.adaptInterpolated(Typer.scala:2080)
	at dotty.tools.dotc.typer.Typer.op$91(Typer.scala:1737)
	at dotty.tools.dotc.typer.Typer.op$89(Typer.scala:1733)
	at dotty.tools.dotc.typer.Typer.adapt(Typer.scala:1732)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.Typer.typedExpr(Typer.scala:1641)
	at dotty.tools.dotc.typer.Applications.op$65(Applications.scala:643)
	at dotty.tools.dotc.typer.Applications.realApply$1(Applications.scala:641)
	at dotty.tools.dotc.typer.Applications.typedApply(Applications.scala:741)
	at dotty.tools.dotc.typer.Typer.typedUnnamed$1(Typer.scala:1526)
	at dotty.tools.dotc.typer.Typer.typedUnadapted(Typer.scala:1572)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.Typer.typedExpr(Typer.scala:1641)
	at dotty.tools.dotc.typer.Typer.op$113(Typer.scala:608)
	at dotty.tools.dotc.typer.Typer.typedBlock(Typer.scala:600)
	at dotty.tools.dotc.typer.Typer.typedUnnamed$1(Typer.scala:1533)
	at dotty.tools.dotc.typer.Typer.typedUnadapted(Typer.scala:1572)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.Typer.typedExpr(Typer.scala:1641)
	at dotty.tools.dotc.typer.Typer.op$98(Typer.scala:1237)
	at dotty.tools.dotc.typer.Typer.typedDefDef(Typer.scala:1219)
	at dotty.tools.dotc.typer.Typer.typedNamed$1(Typer.scala:1514)
	at dotty.tools.dotc.typer.Typer.typedUnadapted(Typer.scala:1571)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.Typer.traverse$4(Typer.scala:1609)
	at dotty.tools.dotc.typer.Typer.typedStats(Typer.scala:1629)
	at dotty.tools.dotc.typer.Typer.op$124(Typer.scala:1319)
	at dotty.tools.dotc.typer.Typer.typedClassDef(Typer.scala:1267)
	at dotty.tools.dotc.typer.Typer.typedNamed$1(Typer.scala:1517)
	at dotty.tools.dotc.typer.Typer.typedUnadapted(Typer.scala:1571)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.Typer.traverse$4(Typer.scala:1609)
	at dotty.tools.dotc.typer.Typer.typedStats(Typer.scala:1629)
	at dotty.tools.dotc.typer.Typer.op$120(Typer.scala:1432)
	at dotty.tools.dotc.typer.Typer.typedPackageDef(Typer.scala:1419)
	at dotty.tools.dotc.typer.Typer.typedUnnamed$1(Typer.scala:1556)
	at dotty.tools.dotc.typer.Typer.typedUnadapted(Typer.scala:1572)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.Typer.traverse$4(Typer.scala:1620)
	at dotty.tools.dotc.typer.Typer.typedStats(Typer.scala:1629)
	at dotty.tools.dotc.typer.Typer.op$120(Typer.scala:1432)
	at dotty.tools.dotc.typer.Typer.typedPackageDef(Typer.scala:1419)
	at dotty.tools.dotc.typer.Typer.typedUnnamed$1(Typer.scala:1556)
	at dotty.tools.dotc.typer.Typer.typedUnadapted(Typer.scala:1572)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.Typer.traverse$4(Typer.scala:1620)
	at dotty.tools.dotc.typer.Typer.typedStats(Typer.scala:1629)
	at dotty.tools.dotc.typer.Typer.op$120(Typer.scala:1432)
	at dotty.tools.dotc.typer.Typer.typedPackageDef(Typer.scala:1419)
	at dotty.tools.dotc.typer.Typer.typedUnnamed$1(Typer.scala:1556)
	at dotty.tools.dotc.typer.Typer.typedUnadapted(Typer.scala:1572)
	at dotty.tools.dotc.typer.Typer.op$111(Typer.scala:1587)
	at dotty.tools.dotc.typer.Typer.typed(Typer.scala:1585)
	at dotty.tools.dotc.typer.Typer.typedExpr(Typer.scala:1641)
	at dotty.tools.dotc.typer.FrontEnd.typeCheck$$anonfun$1(FrontEnd.scala:64)
	at scala.compat.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
	at dotty.tools.dotc.typer.FrontEnd.monitor(FrontEnd.scala:32)
	at dotty.tools.dotc.typer.FrontEnd.typeCheck(FrontEnd.scala:68)
	at dotty.tools.dotc.typer.FrontEnd.runOn$$anonfun$7(FrontEnd.scala:93)
	at scala.compat.java8.JProcedure1.apply(JProcedure1.java:18)
	at scala.compat.java8.JProcedure1.apply(JProcedure1.java:10)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at dotty.tools.dotc.typer.FrontEnd.runOn(FrontEnd.scala:93)
	at dotty.tools.dotc.Run.$anonfun$$anonfun$13(Run.scala:82)
	at scala.compat.java8.JProcedure1.apply(JProcedure1.java:18)
	at scala.compat.java8.JProcedure1.apply(JProcedure1.java:10)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at dotty.tools.dotc.Run.compileUnits$$anonfun$1(Run.scala:90)
	at scala.compat.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
	at dotty.tools.dotc.util.Stats$.monitorHeartBeat(Stats.scala:76)
	at dotty.tools.dotc.Run.compileUnits(Run.scala:95)
	at dotty.tools.dotc.Run.compileSources(Run.scala:64)
	at dotty.tools.dotc.Run.compile(Run.scala:48)
	at dotty.tools.dotc.Driver.doCompile(Driver.scala:26)
	at dotty.tools.dotc.Driver.process(Driver.scala:124)
	at xsbt.CachedCompilerImpl.run(CompilerInterface.scala:63)
	at xsbt.CachedCompilerImpl.run(CompilerInterface.scala:53)
	at xsbt.CompilerInterface.run(CompilerInterface.scala:37)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:107)
	at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:53)
	at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47)
	at sbt.compiler.MixedAnalyzingCompiler$$anonfun$compileScala$1$1.apply$mcV$sp(MixedAnalyzingCompiler.scala:50)
	at sbt.compiler.MixedAnalyzingCompiler$$anonfun$compileScala$1$1.apply(MixedAnalyzingCompiler.scala:50)
	at sbt.compiler.MixedAnalyzingCompiler$$anonfun$compileScala$1$1.apply(MixedAnalyzingCompiler.scala:50)
	at sbt.compiler.MixedAnalyzingCompiler.timed(MixedAnalyzingCompiler.scala:74)
	at sbt.compiler.MixedAnalyzingCompiler.compileScala$1(MixedAnalyzingCompiler.scala:49)
	at sbt.compiler.MixedAnalyzingCompiler.compile(MixedAnalyzingCompiler.scala:64)
	at sbt.compiler.IC$$anonfun$compileInternal$1.apply(IncrementalCompiler.scala:160)
	at sbt.compiler.IC$$anonfun$compileInternal$1.apply(IncrementalCompiler.scala:160)
	at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:66)
	at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:64)
	at sbt.inc.IncrementalCommon.cycle(IncrementalCommon.scala:32)
	at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:72)
	at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:71)
	at sbt.inc.Incremental$.manageClassfiles(Incremental.scala:99)
	at sbt.inc.Incremental$.compile(Incremental.scala:71)
	at sbt.inc.IncrementalCompile$.apply(Compile.scala:54)
	at sbt.compiler.IC$.compileInternal(IncrementalCompiler.scala:160)
	at sbt.compiler.IC$.incrementalCompile(IncrementalCompiler.scala:138)
	at sbt.Compiler$.compile(Compiler.scala:155)
	at sbt.Compiler$.compile(Compiler.scala:141)
	at sbt.Defaults$.sbt$Defaults$$compileIncrementalTaskImpl(Defaults.scala:886)
	at sbt.Defaults$$anonfun$compileIncrementalTask$1.apply(Defaults.scala:877)
	at sbt.Defaults$$anonfun$compileIncrementalTask$1.apply(Defaults.scala:875)
	at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
	at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
	at sbt.std.Transform$$anon$4.work(System.scala:63)
	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
	at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
	at sbt.Execute.work(Execute.scala:237)
	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
	at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
	at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Arity of map method of Map collection type

val kvs: Map[Int, String] = null

// This one works fine
val kvs1 = kvs.map((k, v) => (v, k))
val kvs2: Map[String, Int] = kvs1

// This one does not
val kvs3 = for ((k, v) <- kvs) yield (v, k)
val kvs4: Map[String, Int] = kvs3
//                           ^
// type mismatch;
//   found   : strawman.collection.Iterable[(String, Int)]
//   required: strawman.collection.Map[String,Int]

This is because Map[K, V] extends Iterable[(K, V)], and thus inherits a map[B](f: ((K, V)) => B): Iterable[B] method, which is used by the for expression.

Map introduces an overload of map with the following signature: map[K2, V2](f: (K, V) => (K2, V2)): Map[K2, V2], but the fact that this overload takes a binary function as parameter rather than a unary function (whose argument type is a tuple) makes it impossible to use it in for expressions.

Make List.tail a final (val)

Hello,

I was just wondering whether this has ever been discussed for changing in the new implementation, sorry if it has been, but I can't find an issue:

final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B]

That var makes List a thread-unsafe data type, breaking users' expectations, because only final values guarantee that tl won't be seen as null after the constructor of :: has finished. Users IMO expect final values and thread safety. The ListBuffer trick is cool, but it's not worth it.

Define steps to reach a Go / No-Go decision on adopting this strawman

This proposal began as a strawman, a possible new design for collections. But we're starting to work on it pretty seriously. To avoid wasting our effort, we should define those things we need to check before we can decide whether the strategy is a success and we should attempt to replace the original collections library with this one. (We then have another go/no-go decision about whether we did well enough for us to be able to do that.)

I think we've already demonstrated these things adequately:

  • Superficially identical syntax for common operations
  • A reasonable hierarchy exists for doing things this way (may need tweaking, but it won't just fail)
  • Specialization can be added in where we want it
  • Both mutable and immutable base collections behave sensibly
  • Wrapping arrays as a strawman-collection works fine

Maybe I've just missed it but if not we should come up with a complete list of the things we want to check before we dive in with full effort. These occur to me right now (assuming I haven't missed any of the work which has already been done, which is possible):

  • Can we support String operations?
  • Can we implement Map elegantly? Are maps regular collections, or their own thing which gets boxed as a collection?
  • How do we handle things like BitSet#map and TreeMap#map?
  • Are there any hitches for weird collection types like PriorityQueue?

In particular, we should try to find the places where CBF is really pulling its weight and see what we can do in this system to copy the behavior, or at least do something sound.

Basic operations on mutable vs immutable collections

While working on a blog post about the new collections design I came up with this snippet and realized that we don't have a better solution so far:

var s1 = collection.immutable.Vector[Int](1)
s1 ++= Seq(2)
val s2 = collection.mutable.ArrayBuffer[Int](1)
s2 ++= Seq(2)

The first one is a compound assignment (reassigning the var s1), the second one a call to a real ++= method on the mutable buffer, as sketched below. Keeping inheritance hierarchies separate and having separate implementations doesn't help, because the two cases are already different and only appear the same syntactically.
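
To spell out the difference, here is roughly what the compiler does with the two lines above (a sketch; the rewriting of compound assignments on vars is standard Scala):

var s1 = collection.immutable.Vector[Int](1)
val s2 = collection.mutable.ArrayBuffer[Int](1)

// Vector has no ++= member, but s1 is a var, so the compound assignment
// `s1 ++= Seq(2)` is rewritten by the compiler to:
s1 = s1 ++ Seq(2)

// ArrayBuffer does define ++=, so `s2 ++= Seq(2)` is an ordinary method call
// that mutates the buffer in place:
s2.++=(Seq(2))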

Should we try to make these calls look different from each other (which creates additional migration effort)?

How could it be done? One option is to remove all methods ending in = from mutable collections on the grounds that if an operation is not a reassignment to a variable it shouldn't look like one. What should they be called instead? ++ is not an option if mutable and immutable collections share common supertypes. We'd have to keep mutable collections completely separate and require that users get a quasi-immutable view even for basic Iterable operations.

Provide descriptive names as alternatives to the most commonly used symbolic method names

As outlined in http://docs.scala-lang.org/style/naming-conventions.html the use of symbolic method names should be avoided.

I suggest implementing descriptive alternatives to the most commonly used symbolic method names in the Scala Collections API. I understand that this would add more methods to the API, but I strongly believe it would make the API more accessible to people without an extensive functional programming background.

As an example, Map already provides an updated method that is almost synonymous with +.

  • Similarly, ++ in Seq (or its superclass) could be called concat.
  • As another example, :+ in Seq could be called append.
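
A minimal sketch of how such aliases could be exposed without removing the symbolic forms (concat and append are just the names suggested above, not an existing API; an extension class is used purely for illustration):

implicit class DescriptiveSeqOps[A](val xs: Seq[A]) {
  def concat(ys: Seq[A]): Seq[A] = xs ++ ys  // descriptive alias for ++
  def append(x: A): Seq[A]       = xs :+ x   // descriptive alias for :+
}

Seq(1, 2).concat(Seq(3))  // Seq(1, 2, 3)
Seq(1, 2).append(3)       // Seq(1, 2, 3)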

Migration

We should spend some time in Q2 on automatic migration of user code to the new collections. The main tool will probably be ScalaFix, but JetBrains' Migrations API should also be considered.

Migration could be performed either by an IDE or by an automated tool. We need to run tests on some popular Scala code bases to see how much manual work is still required after the automatic migration.

Easy upcasting to leave out undesired overloads

I’m creating this issue to keep track of the idea proposed in #24 (comment) by @Ichoran.

I think it could be useful to have such methods; for instance, a TreeSet[A] could have an asSet: Set[A] method, making it easier to then call the right map overload (the one that does not require an implicit Ordering as a parameter).
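
A sketch of the idea (asSet is hypothetical here; today the same effect needs an explicit upcast or type ascription):

import scala.collection.immutable.{Set, TreeSet}

case class Item(n: Int)  // deliberately has no Ordering[Item]

val ts = TreeSet(1, 2, 3)

// Upcasting to Set makes overload resolution pick the plain map, which does
// not require an implicit Ordering for the result element type:
val items: Set[Item] = (ts: Set[Int]).map(Item(_))

// The proposed convenience method would let this read as:
// val items = ts.asSet.map(Item(_))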

Recommended way to produce collections

What’s the recommended way to produce collections?

The default collections are immutable collections. When we present how to use them, we mention the :: operator (for List) to construct a new collection. Constructing a List containing the first n Fibonacci numbers would then look like this:

def fibonacci(n: Int): List[Int] = {
  def loop(remaining: Int, prev: Int, prevPrev: Int, result: List[Int]): List[Int] =
    if (remaining > 0) {
      val value = prev + prevPrev
      loop(remaining - 1, value, prev, value :: result)
    } else result
  loop(n - 2, 1, 0, List(1, 0)).reverse
}

But then one might ask: would it be possible to construct the sequence in the right order from the beginning?

One way to do that is to use Builders:

def fibonacci(n: Int): List[Int] = {
  val builder = List.newBuilder[Int]
  builder += 0
  builder += 1
  def loop(remaining: Int, prev: Int, prevPrev: Int): Unit = {
    if (remaining > 0) {
      val value = prev + prevPrev
      builder += value
      loop(remaining - 1, value, prev)
    } else ()
  }
  loop(n - 2, 1, 0)
  builder.result()
}

But mixing immutable and mutable programming styles feels a bit awkward… We could go all-mutable, but that would contradict the general advice of going immutable first.

We can consider several alternative solutions, like having an immutable Tsil that would append elements instead of prepending them, but then the resulting collection would require precautions (e.g. we should avoid using head/tail decomposition on it).

Otherwise, we could use a collection implementation that provides good performance for both +: and :+, like Vector. But even that seems less than ideal…

I think that it would be great to provide a general API to efficiently construct all kind of collections. The user-facing API would use immutable style, but the underlying implementation would use specific mutable collections (e.g. ListBuffer to build a List, etc.).

It seems that what I am looking for is “unfolds”, but I’m wondering whether people have tried other alternatives or have any advice to address this problem. Does someone know a nice implementation of unfolds? (I mean, a lot of variants could make sense: should we discard the final state or not? Should we support producing several elements in a single step? Conversely, should we support not producing an element yet without stopping the process? Etc.)
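
For reference, here is a minimal sketch of one such unfold (one of the many possible variants mentioned above: it discards the final state and produces exactly one element per step; the name and placement are purely illustrative, not a proposal for the actual API):

def unfold[A, S](init: S)(step: S => Option[(A, S)]): List[A] = {
  val builder = List.newBuilder[A]  // mutable machinery hidden behind an immutable-looking API
  var state = init
  var next  = step(state)
  while (next.isDefined) {
    val (elem, nextState) = next.get
    builder += elem
    state = nextState
    next = step(state)
  }
  builder.result()
}

// The Fibonacci example, built in the right order and without a final reverse:
def fibonacci(n: Int): List[Int] =
  unfold((n, 0, 1)) { case (remaining, a, b) =>
    if (remaining > 0) Some((a, (remaining - 1, b, a + b))) else None
  }
// fibonacci(6) == List(0, 1, 1, 2, 3, 5)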

What should go in an iterator

This issue is intended to bundle the discussion of what should go in an iterator. We have had several discussions on PRs before, and Alexandru Nedelcu has written a blog post about some closely related issues:

https://alexn.org/blog/2017/01/16/iterator.html

There are two issues he brings up. I want to concentrate on the second one here. He writes:

Well, at this point you should be thinking that this violates the principles of OOP design. When Iterator comes with operations like map and filter, that are polymorphic and can be overridden, it is no longer just a minimal protocol for “iterating over things”, but a big, fat interface.

I think things are not so simple, and the alternatives he quotes are not necessarily better. The first question to ask is:

Do iterators need all these operations?

I think the answer is yes: there is lots of code out there that uses them, and users generally appreciate the convenience of having the same methods work everywhere. That's the strength of Scala collections.

[One side remark to illustrate: I have observed many newcomers using collections, and they were generally delighted by the consistency of the interface - until they hit sliding and grouped, which are methods available only on iterators, not on iterables. The delight turned into disgust: how could one let such an obvious inconsistency persist? So there is huge value in having the same, rich set of operations work on all collection-like types.]

The next question is how to implement these operations:

  1. We could just implement the methods directly or use a mixin trait, as we do now. Then the blog notes that all iterator-yielding methods would have to be overridden if we want them to return a richer iterator such as a CloseableIterator. There are two possible ways to address the problem:

    1. Ignore it. Leave map and friends as they are, returning an iterator. That means if you want
      a closeable iterator, don't use map and friends.
    2. Add machinery, as we have for Iterables, where map and friends automatically return the
      same kind of Iterable as the one on which they are called.
  2. We could use a "typeclass", i.e. a decorator value class to provide the methods externally. But I don't see how that solves the problem. If we add CloseableIterator then we have the following choices:

    1. Do nothing. Then map and friends will still return Iterators, exactly as before.
    2. Implement another decorator for CloseableIterator. Then we still have to re-implement the same number of transformations as before. Furthermore, we trade dynamic for static dispatch, which means we lose abstraction.

So I don't see the type class approach giving us any advantages over plain method definitions here.

My tendency would be to do 1.1., i.e. ignore the problem on the Iterator level. We have much better machinery now to deal with the problem using views. In fact, I can trivially write a CloseableView like this:

trait CloseableView[+A] extends View[A] with IterableLike[A, CloseableView] {
  def close(): Unit
}

And everything works out of the box. In particular, this works:

val xs: CloseableView[Int]
val ys = xs.map(_ + 1)
val zs: CloseableView[Int] = ys

This shows that the types of transforms are indeed preserved on CloseableViews. So, given that we have this nice machinery in views, maybe we should just encourage everyone to use them for fancy stuff and use iterators for basic functionality. In particular, when thinking about specialization, one should favor specializing views over specializing iterators.

Extension Methods without CanBuildFrom?

(Maybe I'm late to the party on this. I've been a bit checked out from mailing lists and such. Sorry if so!)

I'm curious about the future of ergonomic generic extension methods in the new collections library, at least if CanBuildFrom is removed entirely.

For instance, in our codebase, we've introduced a method flatCollect, which we frequently find useful. (I noticed collect is missing; I assume that's just because it's relatively uninteresting in terms of design.) It has the obvious meaning, with this signature:

def flatCollect[Result, B](f: PartialFunction[A, TraversableOnce[B]])
                              (implicit cbf: CanBuildFrom[Coll, B, Result]): Result 
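
For context, a usage sketch of what flatCollect does (flatCollect itself is the issue author's extension method, not a standard operation; its behaviour is equivalent to collect followed by flatten, done in one pass):

val xs = List(1, 2, 3, 4)

// With the extension method in scope:
//   xs.flatCollect { case n if n % 2 == 0 => List(n, n * 10) }
// would produce List(2, 20, 4, 40), the same as the two-pass form:
val twoPass = xs.collect { case n if n % 2 == 0 => List(n, n * 10) }.flatten
// twoPass == List(2, 20, 4, 40)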

We also have toMultiMap, which is an efficient implementation of xs.groupBy(_.a).eagerMapValues(_.map(_.b)) (as mentioned in #42 but using our own eagerMapValues because of frequent unintended consequences of the lazy variant).

implicit class ScEnrichColl[Coll <: Traversable[_], A, B](val __this: Coll)(implicit view: Coll <:< Traversable[(A, B)]) {
  def toMultiMap[Result](implicit cbf: CanBuildFrom[Coll, B, Result]): Map[A, Result] = ???
}
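
And a usage sketch of toMultiMap (again the issue author's own extension, shown only to illustrate the expected result):

val pairs = List(("a", 1), ("a", 2), ("b", 3))

// With the extension in scope:
//   pairs.toMultiMap[List[Int]]
// would give Map("a" -> List(1, 2), "b" -> List(3)), i.e. the strict equivalent of:
val viaGroupBy: Map[String, List[Int]] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
// viaGroupBy == Map("a" -> List(1, 2), "b" -> List(3))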

Maybe I'm being insufficiently imaginative, but it's hard for me to see how to implement them in the new approach while maintaining the same level of ergonomics as before. It's possible the reduction in complexity is worth it, but it makes me somewhat sad.

One answer is to use views and call to/FromIterable at the end, but I don't think that handles the toMultiMap case particularly nicely. Another is to keep CBF, but only for these cases. (This could largely be done via an external library.) Is there another option?

A more sensible `groupBy` API?

This is the result of some personal experimentation. Please let me know if it's not appropriate to post this here.

Introduction

The groupBy method is very powerful and pretty useful, but it often suffers from inefficiencies because of the way its API is specified:

  • It is supposed to return an immutable Map. While sometimes useful, it's often not what one needs. Many uses of groupBy end up mapping or simply iterating on the result, in which case having a Map early is counter-productive.
    Notice that the standard implementation of groupBy constructs a mutable.Map and then rebuilds it into an immutable.Map, a final step that is often redundant.
    Proposal: Make the method return an Iterable of the same type as the current collection, and let the user decide whether or not to call toMap on it. One can also write something like xs.view.groupBy(f), which returns an immutable view into the internal mutable.Map used by groupBy, avoiding any unnecessary intermediate container (note that the call to groupBy itself is nonetheless still strict).

  • A very common use-case of groupBy involves transforming the values after having grouped them. Doing this as a second step as in xs.groupBy(_.a).mapValues(_.map(_.b)) is wasteful as it creates needless intermediate collections.
    Proposal: Have an optional parameter to specify the value mapping to perform while grouping values. As long as this parameter has a default value, it should not even break the current API, as both groupBy(f) and groupBy(f,g) would be valid.
    Note: after implementing this, I discovered that user @balefrost had already mentioned the problem, and that it turns out that's how it's already done in C#.

  • With the new design based on the Iterable hierarchy, it is currently not possible to directly build the buckets of the grouped elements because there is no way to obtain a builder of the current collection. Although it should be possible eventually, I'm not even sure it makes much sense to have the buckets mirror the original collection type anyway. In this proposal groupBy simply returns buckets as View[V].

Implementation

I have implemented the traditional groupBy API (but it's currently suboptimal because of the lack of newBuilder), along with my proposal for a modified API called viewBy, the signature of which is:

def viewBy[K,V](fk: A => K, fv: A => V = (a:A) => a): C[(K, View[V])]

Benchmark

To show an example where one only wants to iterate over the result of groupBy, I implemented the following equivalent algorithms (where xs: Iterable[(Int, java.lang.String)]):

def groupBy(): Any = {
  var sum = 0
  xs.groupBy(_._1).mapValues(_.map(_._2)).foreach {
    case (k, v) => sum += k * v.foldLeft(0)(_ + _.length)
  }
}

def groupBy_sensible(): Any = {
  var sum = 0
  xs.viewBy(_._1, _._2).foreach {
    case (k, v) => sum += k * v.foldLeft(0)(_ + _.length)
  }
}

def groupBy_sensView(): Any = {
  var sum = 0
  xs.view.viewBy(_._1, _._2).foreach {
    case (k, v) => sum += k * v.foldLeft(0)(_ + _.length)
  }
}

My results are as follows:

[benchmark chart: groupby]

groupBy_sensible and groupBy_sensView are often significantly faster than the other implementations, up to twice as fast for smaller collections.
