Comments (18)
I'm not really familiar with the current way the strawman works, but the way I imagined this fitting in, relative to how things worked in Scala 2.x (let's call this imaginary world lihaoyi-strawman collections) would be:
-
TraversableOnce
stays more-or-less unchanged, withIterator
still inheriting from it -
Traversable
's concrete implementations ofmap
filter
etc (which in Scala 2.x all create concrete copies using builders) get moved down the hierarchy intoSeq
or some other trait. -
Traversable
is now left withforeach
, and abstract methods likemap
andfilter
. Thus in my head,Traversable
would become the same asForeachable
described in the initial issue. -
Seq
, or some other class down the hierarchy, override these to use their current (in Scala 2.x) builder-taking implementations -
Generator
is now a subclass ofTraversable
, and overrides the abstract methods likemap
andfilter
etc. with "lazy"/fusing implementations like those I've implemented ingeny.Generator
Thus, in lihaoyi-strawman collections, Traversable
is now a mostly-abstract interface similar to what TraversableOnce
is in Scala 2.x collections. The logic of implementing a lazy, fusing, possibly resource-clean-upping Traversable
is left to the Generator
implementation.
In lihaoyi-strawman collections,
-
If you want something with lazy/fusing operations that may need resource cleanup, you ask for a
Generator
. -
A method asking for a
Traversable
means it does not care whether you get aSeq
orIterable
orGenerator
-
A method asking for a
TraversableOnce
does not care if you get aSeq
orIterable
orGenerator
orIterator
Now, that's only what I had in my head as lihaoyi-collections, abstractly, using the Scala 2.x collections as a baseline. I'm not sure what parts of that would or wouldn't work in practice, nor do I know how it would fit into the "new world" of collection-strawman with your decorators and re-organized trait hierarchy and other things I'm not familiar with =P
from collection-strawman.
Having Traversable
and TraversableOnce
at the top of the hierarchy wouldn't hurt the new design. Ideally we'd keep them around deprecated for 2.13 to provide a smoother migration path but I don't think we have the necessary features to prevent deprecation warnings from leaking into all kinds of code that isn't deprecated.
from collection-strawman.
I would like to bump this discussion since I think the push vs pull discussion (or producer-owned vs consumer-owned stack) is something that concerns a lot of communities!
For instance, Rust went from push to pull in 2013 [1] and [2]. Recently claims for their inefficiencies re-emerged as discussed in Rust’s iterators are inefficient emphasizing that there is still room for improvement with internal iterators. Databases went from pull to push and Spark is doing as well [3]. Java has push, C# has pull-y LINQ and F# has also pull-y Seq
.
Push-based designs may be able to produce stellar performance in some cases when method inlining succeeds for cases of maps, filters, sum of squares demonstrating stream fusion (sufficiently small method sizes needed for inlining to succeed (relevant flags on the JVM MaxInlineSize
, MinInliningThreshold
, etc)). At the Java side, John Rose, on his perspectives on streams performance mail, wrote about j.u.stream
as a retrospection:
External iterators are easier to optimize.
(pull)
HotSpot are less good at internal iterators. If the original point of the user request fails to inline all the way into the internal looping part of the algorithm (a hidden "for" loop), the quality of the loop will be very poor. (Exception: With micro-benchmarks, the loop quality can be partially recovered by relying on a pure, clean profile. But with real system code, the hidden "for" loop will have a polluted profile.) This is the problem I have referred to as "loop customization", even though it applies to non-looping code as well (as long as there is a template that needs expanding in order to gain performance).
(push)
We can carefully study that and take under consideration his points 1, 3 and 4 for our benchmarks.
From my experience, when it comes to parallel loops (i.e., zip
) the implementation must transform both streams into pull-based in order to accommodate proper termination checks (in the presence oftake
operators, short-ranging the stream) and to maintain proper lazy characteristics.
Last week I created a benchmarking suite porting some of our benchmarks from our strymonas project (for the record it is a staged library based on a pull-based design with elements of push-based composition. Since we are not talking about a solution based on partial evaluation here I am not going into details).
I ported some of our benchmarks in stream-benchmarks just to motivate comparisons between the new and the old design for lazy View
s. My solely purpose is to initiate discussion. Here are some very quick results (without fixing my CPU clock and performing three forked-vm executions). Baselines are hand-written fused loops, semantically equivalent with pipelines. Views are the old ones, Strawman are the new ones. I have also included a pull- and a push-based reference implementation (which captures the essence of Java 8 streams as well) to compare them quickly side by side. JVM version (1.8.0_65-b17), scalac (2.12.1) and TieredCompilation off. I am using Long
integers everywhere.
We can discuss further anything you would recommend in terms of benchmarking. If you have any request, proposal, fix, etc for lazy collections benchmarks (for ArrayViews etc) let me know here #62.
from collection-strawman.
Example:
trait External[A] {
def hasNext: Boolean
def next: A
}
trait Internal[A] extends External[A] {
def foreach[U](f: A => U) { while(hasNext) f(next) }
}
from collection-strawman.
I think it would definitely be worth discussion. We have already started on reddit, but for more details here is the right place.
Here are some initial thoughts:
- There are fewer operations that make sense on Traversable than on Iterable
- Operations should have different implementations
- Both of these things mean that the "new" Traversable will be hard to integrate in a collection hierarchy: If we implement traversable ops as methods on Traversable, we have to override them all on Iterable, and we are back to the fat classes mess that we want to get away from. If we implement Traversable ops as decorators, we get a better fit, but it means that we lose runtime overriding so when the static type is Traversable, all operations use the Traversable implementations which might be less efficient than the Iterable ones.
- So maybe a separate type is better, even though it means it will be a huge pain to keep operations in sync between Iterable and Traversable.
from collection-strawman.
What operations would you define on Traversable
and TraversableOnce
? We'd either have to stick to a small set of methods or require lots of reimplementation for your Generator
type.
How do you distinguish between terminal and non-terminal operations? Java 8 Streams makes this very explicit but Scala collections don't. For example, does map
on a Traversable
return a lazy view or a proper collection?
from collection-strawman.
What operations would you define on Traversable and TraversableOnce? We'd either have to stick to a small set of methods or require lots of reimplementation for your Generator type.
I'd expect to see most of the terminal operations (mkString
, sum
, reduce
, fold
, ...) implemented on Traversable
/TraversableOnce
, since those can be implemented the same way on both Generator
and all the traditional collections using foreach
or similar. I'd expect to see the transformations (map
, filter
, collect
, ...) defined but not implemented, so they can be implemented in different ways (lazy vs eager-with-builder) on Generator
and traditional collections
How do you distinguish between terminal and non-terminal operations?
Good question. I don't have an answer here, but how does the distinction between scala.Iterator
/scala.Stream
and the rest of the collections work in collections-strawman? Iterator
/Stream
also have this "terminal operation" distinction where some things force (and even exhaust, for Iterator
) the computation while others just build up the transformation-chain. If there's some mechanism that collections-strawman uses to distinguish terminal operations for those two, maybe we could do the same thing
from collection-strawman.
I'd expect to see most of the terminal operations (
mkString
,sum
,reduce
,fold
, ...) implemented onTraversable
/TraversableOnce
, since those can be implemented the same way on bothGenerator
and all the traditional collections usingforeach
or similar.
Having basic implementations based on foreach
is what we do in the current collections library but we should avoid that in the new design because it is inefficient. It may be faster to have two separate implementations for Generator
and Iterable
but there's still the risk of losing optimizations because some calls are not guaranteed to be monomorphic anymore. We should have benchmarks to evaluate the performance impacts of such a change.
but how does the distinction between
scala.Iterator
/scala.Stream
and the rest of the collections work in collections-strawman?
It's not made explicit but seems to come down to using "Ops" classes for terminal operations (but there are also operations that do not force a result) and "MonoTransforms" / "PolyTransforms" classes for transformations (which are lazy for lazy collection types).
from collection-strawman.
I have a Fastring library, which depends on Traversable
as well.
from collection-strawman.
I guess a @inline final def foreach
on Iterable
would resolve the performance issue related to monomorphism. It's not necessary to delete Traversable
from collection-strawman.
Thanks @biboudis for your comment.
Can you also comment on the following?
Every Iterator/Iterable would be a Foreachable, but not every Foreachable would be an Iterable.
In other words, can we safely turn a pull-based collection into a push-based one, and vice versa?
Do you think it is a good idea to have one extend the other or should we model them with unrelated types?
from collection-strawman.
You can't go back and forth. External iteration is trivially internalizable. Internal iteration is inscrutable (save for things like CPS). You can fake the API with caching, in some cases, but it's a lousy solution.
from collection-strawman.
Nice question and accurate answer by @Ichoran! Totally agree.
Without some way to play the JITter yourself (like partial evaluation) you can't do that in an optimized way (maybe not with reasonable effort). You can go from push to pull with caching/buffering and you have to find an elegant and fast solution to short-circuiting on top of that! Check the still open issue in java world JDK-8075939 and also check that this is still unresolved.
Stream.of(new Long[]{1L,2L})
.flatMap(x -> Stream.iterate(0L, i -> i + 2).map(y -> x * y))
.iterator()
.hasNext();
}
Defunctionalizing Push Arrays shows a unified solution but this cost is noted!
Converting a Pull array to a Push array is cheap and is also subject to fusion. [..] Converting a Push array to a Pull array, however, requires computing and storing all elements to memory [..] Few such conversions is likely to be better.
from collection-strawman.
not happening for 2.13. perhaps the idea still has a future. regardless, closing the ticket since we are in the process of closing out this repo.
from collection-strawman.
@SethTisue Does "not happening" mean "not removal of Traversable
"?
from collection-strawman.
No, it means “no re-introduction of Traversable
”.
from collection-strawman.
Is there any difficulty to keep source-level backward compatibility here?
from collection-strawman.
Traversable
is defined as a deprecated alias to Iterable
.
from collection-strawman.
Related Issues (20)
- TraversableOnce is missing
- Rename `to` to `as` HOT 10
- `seq` operation has been removed HOT 3
- Rewrite `--` operation to `&~` or `diff` HOT 1
- Rewrite `retain` to `filterInPlace`
- ArrayOps does not have a `view` operation anymore HOT 1
- Modularize the migration rewrite rules HOT 1
- Reorganize repositories
- Bring back Iterator.duplicate
- Set.apply / Map.apply don't use SetN / MapN
- We need dev-oriented documentation for M4 HOT 2
- ArrayOps doesn’t provide a sliding operation
- Make scala.Iterable[+A] = scala.collection.immutable.Iterable[A] HOT 8
- Setup collections-compat release infra
- add `()` to `toArray` HOT 1
- resolve status of `scala.collection.immutable.useBaseline` HOT 6
- IterableOnce.scala: def toSet[B >: A]: immutable.Set[B] should be def toSet: immutable.Set[A] HOT 2
- Main criticism HOT 3
- close or transfer all open issues and archive the repository HOT 1
- Delete and archive Gitter room HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from collection-strawman.