Giter Site home page Giter Site logo

Comments (18)

lihaoyi avatar lihaoyi commented on June 15, 2024 2

I'm not really familiar with the current way the strawman works, but the way I imagined this fitting in, relative to how things worked in Scala 2.x (let's call this imaginary world lihaoyi-strawman collections) would be:

  • TraversableOnce stays more-or-less unchanged, with Iterator still inheriting from it

  • Traversable's concrete implementations of map filter etc (which in Scala 2.x all create concrete copies using builders) get moved down the hierarchy into Seq or some other trait.

  • Traversable is now left with foreach, and abstract methods like map and filter. Thus in my head, Traversable would become the same as Foreachable described in the initial issue.

  • Seq, or some other class down the hierarchy, override these to use their current (in Scala 2.x) builder-taking implementations

  • Generator is now a subclass of Traversable, and overrides the abstract methods like map and filter etc. with "lazy"/fusing implementations like those I've implemented in geny.Generator

Thus, in lihaoyi-strawman collections, Traversable is now a mostly-abstract interface similar to what TraversableOnce is in Scala 2.x collections. The logic of implementing a lazy, fusing, possibly resource-clean-upping Traversable is left to the Generator implementation.

In lihaoyi-strawman collections,

  • If you want something with lazy/fusing operations that may need resource cleanup, you ask for a Generator.

  • A method asking for a Traversable means it does not care whether you get a Seq or Iterable or Generator

  • A method asking for a TraversableOnce does not care if you get a Seq or Iterable or Generator or Iterator


Now, that's only what I had in my head as lihaoyi-collections, abstractly, using the Scala 2.x collections as a baseline. I'm not sure what parts of that would or wouldn't work in practice, nor do I know how it would fit into the "new world" of collection-strawman with your decorators and re-organized trait hierarchy and other things I'm not familiar with =P

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024 2

Having Traversable and TraversableOnce at the top of the hierarchy wouldn't hurt the new design. Ideally we'd keep them around deprecated for 2.13 to provide a smoother migration path but I don't think we have the necessary features to prevent deprecation warnings from leaking into all kinds of code that isn't deprecated.

from collection-strawman.

biboudis avatar biboudis commented on June 15, 2024 1

I would like to bump this discussion since I think the push vs pull discussion (or producer-owned vs consumer-owned stack) is something that concerns a lot of communities!

For instance, Rust went from push to pull in 2013 [1] and [2]. Recently claims for their inefficiencies re-emerged as discussed in Rust’s iterators are inefficient emphasizing that there is still room for improvement with internal iterators. Databases went from pull to push and Spark is doing as well [3]. Java has push, C# has pull-y LINQ and F# has also pull-y Seq.

Push-based designs may be able to produce stellar performance in some cases when method inlining succeeds for cases of maps, filters, sum of squares demonstrating stream fusion (sufficiently small method sizes needed for inlining to succeed (relevant flags on the JVM MaxInlineSize, MinInliningThreshold, etc)). At the Java side, John Rose, on his perspectives on streams performance mail, wrote about j.u.stream as a retrospection:

External iterators are easier to optimize.

(pull)

HotSpot are less good at internal iterators. If the original point of the user request fails to inline all the way into the internal looping part of the algorithm (a hidden "for" loop), the quality of the loop will be very poor. (Exception: With micro-benchmarks, the loop quality can be partially recovered by relying on a pure, clean profile. But with real system code, the hidden "for" loop will have a polluted profile.) This is the problem I have referred to as "loop customization", even though it applies to non-looping code as well (as long as there is a template that needs expanding in order to gain performance).

(push)

We can carefully study that and take under consideration his points 1, 3 and 4 for our benchmarks.

From my experience, when it comes to parallel loops (i.e., zip) the implementation must transform both streams into pull-based in order to accommodate proper termination checks (in the presence oftake operators, short-ranging the stream) and to maintain proper lazy characteristics.

Last week I created a benchmarking suite porting some of our benchmarks from our strymonas project (for the record it is a staged library based on a pull-based design with elements of push-based composition. Since we are not talking about a solution based on partial evaluation here I am not going into details).

I ported some of our benchmarks in stream-benchmarks just to motivate comparisons between the new and the old design for lazy Views. My solely purpose is to initiate discussion. Here are some very quick results (without fixing my CPU clock and performing three forked-vm executions). Baselines are hand-written fused loops, semantically equivalent with pipelines. Views are the old ones, Strawman are the new ones. I have also included a pull- and a push-based reference implementation (which captures the essence of Java 8 streams as well) to compare them quickly side by side. JVM version (1.8.0_65-b17), scalac (2.12.1) and TieredCompilation off. I am using Long integers everywhere.

preliminary

We can discuss further anything you would recommend in terms of benchmarking. If you have any request, proposal, fix, etc for lazy collections benchmarks (for ArrayViews etc) let me know here #62.

from collection-strawman.

Ichoran avatar Ichoran commented on June 15, 2024 1

Example:

trait External[A] {
  def hasNext: Boolean
  def next: A
}

trait Internal[A] extends External[A] {
  def foreach[U](f: A => U) { while(hasNext) f(next) }
}

from collection-strawman.

odersky avatar odersky commented on June 15, 2024

I think it would definitely be worth discussion. We have already started on reddit, but for more details here is the right place.

Here are some initial thoughts:

  • There are fewer operations that make sense on Traversable than on Iterable
  • Operations should have different implementations
  • Both of these things mean that the "new" Traversable will be hard to integrate in a collection hierarchy: If we implement traversable ops as methods on Traversable, we have to override them all on Iterable, and we are back to the fat classes mess that we want to get away from. If we implement Traversable ops as decorators, we get a better fit, but it means that we lose runtime overriding so when the static type is Traversable, all operations use the Traversable implementations which might be less efficient than the Iterable ones.
  • So maybe a separate type is better, even though it means it will be a huge pain to keep operations in sync between Iterable and Traversable.

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024

What operations would you define on Traversable and TraversableOnce? We'd either have to stick to a small set of methods or require lots of reimplementation for your Generator type.

How do you distinguish between terminal and non-terminal operations? Java 8 Streams makes this very explicit but Scala collections don't. For example, does map on a Traversable return a lazy view or a proper collection?

from collection-strawman.

lihaoyi avatar lihaoyi commented on June 15, 2024

What operations would you define on Traversable and TraversableOnce? We'd either have to stick to a small set of methods or require lots of reimplementation for your Generator type.

I'd expect to see most of the terminal operations (mkString, sum, reduce, fold, ...) implemented on Traversable/TraversableOnce, since those can be implemented the same way on both Generator and all the traditional collections using foreach or similar. I'd expect to see the transformations (map, filter, collect, ...) defined but not implemented, so they can be implemented in different ways (lazy vs eager-with-builder) on Generator and traditional collections

How do you distinguish between terminal and non-terminal operations?

Good question. I don't have an answer here, but how does the distinction between scala.Iterator/scala.Stream and the rest of the collections work in collections-strawman? Iterator/Stream also have this "terminal operation" distinction where some things force (and even exhaust, for Iterator) the computation while others just build up the transformation-chain. If there's some mechanism that collections-strawman uses to distinguish terminal operations for those two, maybe we could do the same thing

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024

I'd expect to see most of the terminal operations (mkString, sum, reduce, fold, ...) implemented on Traversable/TraversableOnce, since those can be implemented the same way on both Generator and all the traditional collections using foreach or similar.

Having basic implementations based on foreach is what we do in the current collections library but we should avoid that in the new design because it is inefficient. It may be faster to have two separate implementations for Generator and Iterable but there's still the risk of losing optimizations because some calls are not guaranteed to be monomorphic anymore. We should have benchmarks to evaluate the performance impacts of such a change.

but how does the distinction between scala.Iterator/scala.Stream and the rest of the collections work in collections-strawman?

It's not made explicit but seems to come down to using "Ops" classes for terminal operations (but there are also operations that do not force a result) and "MonoTransforms" / "PolyTransforms" classes for transformations (which are lazy for lazy collection types).

from collection-strawman.

Atry avatar Atry commented on June 15, 2024

I have a Fastring library, which depends on Traversable as well.

from collection-strawman.

Atry avatar Atry commented on June 15, 2024

I guess a @inline final def foreach on Iterable would resolve the performance issue related to monomorphism. It's not necessary to delete Traversable

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

Thanks @biboudis for your comment.

Can you also comment on the following?

Every Iterator/Iterable would be a Foreachable, but not every Foreachable would be an Iterable.

In other words, can we safely turn a pull-based collection into a push-based one, and vice versa?

Do you think it is a good idea to have one extend the other or should we model them with unrelated types?

from collection-strawman.

Ichoran avatar Ichoran commented on June 15, 2024

You can't go back and forth. External iteration is trivially internalizable. Internal iteration is inscrutable (save for things like CPS). You can fake the API with caching, in some cases, but it's a lousy solution.

from collection-strawman.

biboudis avatar biboudis commented on June 15, 2024

Nice question and accurate answer by @Ichoran! Totally agree.

Without some way to play the JITter yourself (like partial evaluation) you can't do that in an optimized way (maybe not with reasonable effort). You can go from push to pull with caching/buffering and you have to find an elegant and fast solution to short-circuiting on top of that! Check the still open issue in java world JDK-8075939 and also check that this is still unresolved.

Stream.of(new Long[]{1L,2L})
  .flatMap(x -> Stream.iterate(0L, i -> i + 2).map(y -> x * y))
  .iterator()	
  .hasNext();
}

Defunctionalizing Push Arrays shows a unified solution but this cost is noted!

Converting a Pull array to a Push array is cheap and is also subject to fusion. [..] Converting a Push array to a Pull array, however, requires computing and storing all elements to memory [..] Few such conversions is likely to be better.

from collection-strawman.

SethTisue avatar SethTisue commented on June 15, 2024

not happening for 2.13. perhaps the idea still has a future. regardless, closing the ticket since we are in the process of closing out this repo.

from collection-strawman.

Atry avatar Atry commented on June 15, 2024

@SethTisue Does "not happening" mean "not removal of Traversable"?

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

No, it means “no re-introduction of Traversable”.

from collection-strawman.

Atry avatar Atry commented on June 15, 2024

Is there any difficulty to keep source-level backward compatibility here?

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

Traversable is defined as a deprecated alias to Iterable.

from collection-strawman.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.