Giter Site home page Giter Site logo

Views about collection-strawman HOT 31 CLOSED

scala avatar scala commented on June 15, 2024 2
Views

from collection-strawman.

Comments (31)

odersky avatar odersky commented on June 15, 2024 3

I did a web search and it seems except for Scala, the name slice implies resource sharing. So, here's another possible naming strategy:

  • deprecate slice(start: Int, end: Int) and replace with subSequence or subseq (which do people prefer?)
  • Introduce a slice(range: Range) on those collections that can support it. These would include
    mutable.Seq and IndexedView. Non-view immutable collections should probably not support
    a slice method. You should do c.view.slice for clarity.

The result of slice would be collection.Slice and collection.mutable.Slice respectively. These types support some subset of all collection operations (essentially those operations that
maintain the element type and do not require re-indexing).

from collection-strawman.

Ichoran avatar Ichoran commented on June 15, 2024 1

Ah, yes, I did misunderstand. I agree that views should be maximally flexible in that regard; unlike iterators which should promise to be as lazy as possible, views can do whatever is "best" with regards to evaluation of sequential operations.

And I don't think evaluation order should be the same save on Seqs and SortedWhatevers where there is already a natural evaluation order.

But I think the "niche, 0.1% use case" evaluation is only because we're making it so by providing no support for this way of working. Views are a key aspect of, for instance, image processing operations in ImageJ; the conceptual equivalent of views with indexed ranges are used all over the place both in Java (e.g. java.util.Arrays.sort(xs, i0, iN)) and elsewhere (e.g. it's exceedingly common in Matlab).

One of the weaknesses of the current collections library is that it has relatively poor support for mutating operations. We needn't necessarily improve this greatly (people are used to it not being there), but it doesn't mean that the potential use cases are rare.

For instance, if one has a list of files and one wants to replace all the missing ones by default file names, you would normally do this with immutable collections:

xs.map(f => if (f.exists) f else default)

But although you could reassign everything in place mutably:

xs.mapInPlace(f => if (f.exists) f else default)

in some ways it's more natural to define a filtered view and fill it:

xs.viewWhere(! _.exists).fill(default)

Again, I'm not saying it's really important for us to provide this kind of functionality, but I don't think we should view this as for niche applications even if we have immutable alternatives as the standard now. (Except inasmuch as anything mutable is a niche application.)

from collection-strawman.

Ichoran avatar Ichoran commented on June 15, 2024 1

I think that mutable views would just be a completely different thing than immutable views, so I don't think they'd impact the hierarchy. We could call them something else also.

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

For reference, here is an example given by @Ichoran on gitter about fusion:

Just did a quick test and

def jmap(i: Iterator[Int], f: Int => Int, g: Int => Int) = i.map(f andThen g)

takes only 60% as long to run as does

def imap(i: Iterator[Int], f: Int => Int, g: Int => Int) = i.map(f).map(g)

because boxing is a lot of the overhead, and functions are specialized.

from collection-strawman.

DarkDimius avatar DarkDimius commented on June 15, 2024

@julienrf, it's not only because of boxing, it's also because of reduction total memory footprint by 2x.
If you try functions on non-primitives(which eliminates difference with boxing), you'll still see a speedup.

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

Still quoting @Ichoran:

If I use objects and just switch references around to which object I have, the jmap variant takes 75% as long as imap, so still a nice little win.
But using List instead of iterator slows things down by 5x, so you're right that avoiding intermediate collections can be the big win.

from collection-strawman.

Ichoran avatar Ichoran commented on June 15, 2024

Mutable views are nice to have for a variety of divide-and-conquer in-place algorithms. For instance, an in-place sort is easier to express if you don't have to worry about indices.

I don't think that dropping the order of evaluation is particularly useful for either mutable or immutable views. That is a different useful thing, but "I want to act on part of my collection without rebuilding it" is a different desire than "I want to parallelize my operations on this collection".

If we were to support only one of the two, we at least shouldn't call it a "view" because right now (and in most other languages I've seen) "view" means the former.

from collection-strawman.

SethTisue avatar SethTisue commented on June 15, 2024

Mutable views are nice to have for a variety of divide-and-conquer in-place algorithms. For instance, an in-place sort is easier to express if you don't have to worry about indices

I think this is a niche, 0.1% use case and we should drop it and not worry about it.

I don't think that dropping the order of evaluation is particularly useful for either mutable or immutable views. That is a different useful thing, but "I want to act on part of my collection without rebuilding it" is a different desire than "I want to parallelize my operations on this collection".

Not sure, but I think you might be misunderstanding Stefan's suggestion? It isn't about parallelization. He isn't suggesting leaving iteration order unspecified — allowing element 4 to be operated on before element 3, as can happen with parallel collections. Rather, he's suggesting to allow fusion. If you do .map(a).filter(b), it would be unspecified whether all the elements get mapped before any of them get filtered.

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024

Also note that evaluation order of views in the current strawman is already different from evaluation order of operations on the underlying collections. If you wanted the same evaluation order you would have to build intermediate results but the whole point of views is to avoid that. I am only suggesting that we do not guarantee the same evaluation order as that of Iterators, either.

from collection-strawman.

SethTisue avatar SethTisue commented on June 15, 2024

I think the "niche, 0.1% use case" evaluation is only because we're making it so by providing no support for this way of working

It isn't so much that I'm opposed to including support for it, the only thing I'm strongly opposed to is letting this class of use cases be something that drives top-level design decisions. If it can be accommodated without distorting the hierarchy+API, great. (Can it?)

from collection-strawman.

odersky avatar odersky commented on June 15, 2024

I would like to propose to take them one step further and make the evaluation order of view operations unspecified.

I don't think that's necessary. The specified evaluation order of views already demands fusing. An unspecified evaluation order could give you some extra facility, for instance if you want to batch some ops into segments before fusing. But I don't think views are the right place for that.

[EDIT] I might be convinced otherwise by benchmarks that show that there's a big win to be had by automatic batching.

I am also betting that we will have much better techniques to reason about effects in the future, which would let us optimize without compromising semantics.

from collection-strawman.

odersky avatar odersky commented on June 15, 2024

Regarding mutable views, one question is what operations would produce a view? If we do not want to re-index (and I think we should not), then it seems the only sensical operations are to change the start, the end, or the stride. So, instead of views we could use a general notion of slices. E.g., on mutable (indexed?) Seq[A]:

def slice(r: Range): Slice[A]

val xs: Array[Int] = ...
xs.slice(3 until xs.length)
xs.slice(5 to 1 by -2)

Normal, strict slices are collection transformers and would not be available on mutable collections. There's no danger of silently changing semantics because the new non-strict slice
takes a range as argument instead of two bounds.

WDYT?

from collection-strawman.

odersky avatar odersky commented on June 15, 2024

I just noted that there is a source of confusion for arrays.

xs.slice(1, 3)   

creates a new, strict array, or at least it did so far. But xs.slice(1 until 3) creates a view. Not sure what to do about that one yet - deprecate old-style slice on Array maybe?

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

@odersky Maybe we should suffix operations producing a view with “view” (e.g. xs.sliceView(1 to 5)), to make it clearer that further operations will be fused.

from collection-strawman.

odersky avatar odersky commented on June 15, 2024

@julienrf I agree sliceView would be clearer but it would also be longer and harder to remember. Also, it should not be ...View since a slice is not a view, but supports different operations. Maybe some other name? But I am not sure which.

from collection-strawman.

Ichoran avatar Ichoran commented on June 15, 2024

I am wary of having too many different derived collection types. So far we have immutable, mutable, iterator, immutable view, mutable view, and now we're going to have slice also? One of the primary reasons that the existing views are not used much--aside from surprising behavior that arose from the lack of a clear distinction between mutable and immutable views and inherent bugs due to LSP issues--was that they were buggy because it was hard to maintain so many different parallel implementations.

I would therefore try quite hard to keep slices and views the same thing. A slice would be one particular way to get a view. If we're not going to do that I would like to have examples of methods that are very important to have on one but must not be on the other. I can't think of any offhand.

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024

It's not clear to me why we would need separate types for slices and views. If we distinguish between mutable and immutable views, slice on a mutable collection could produce a mutable view and subsequence on an immutable collection would create an immutable view. Array.slice is a problem in this model because the operation we want on (mutable) arrays is slice (with the new meaning), so we'd have to reuse the name with a new meaning (which IMHO makes this a non-starter; a change like this is far too dangerous).

Are there any other opinions on the method names? If they are used in this way in other languages and libraries it probably makes sense to do the same (modulo the Array.slice problem) but I find them counter-intuitive. I expect slice to actually slice off the part that I want without referencing the whole collection, which means copying in the case of Array.slice. But maybe that's just because I've mostly worked with Scala for the last decade.

from collection-strawman.

odd avatar odd commented on June 15, 2024

How about def focus(r: Range): View[A]?

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

I would like to take advantage of this thread to ask whether we should keep View extend Iterable or not. The argument to not make View extend Iterable is that by doing so we have a clear separation between collections types whose transformations (filter, take, map, flatMap, etc.) are evaluated lazily (View) or eagerly (Iterable). Currently, taking an Iterable as parameter gives us no clue about the way transformation operations will be evaluated.

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024

The same is true for other lazy collections like Stream or LazyList, and these do not only extend Iterable but even Seq and LinearSeq. Maybe we should make the distinction between strict and non-strict collection types explicit with a marker trait instead of trying to assign meaning to the classes in the main hierarchy.

from collection-strawman.

odersky avatar odersky commented on June 15, 2024

I agree with Stefan, and I am also not sure whether we should have a marker trait or not. I would defer all that until we have dealt with parallel collections. The order of evaluation (strict/lazy/batched/parallel) can vary a lot and we should think carefully before we invest types to reflect that.

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

Yeah, ok, I agree with your arguments :)

from collection-strawman.

LPTK avatar LPTK commented on June 15, 2024

Re: choosing names for in-place mutable operations (from the other thread #7).

@odersky

The one thing I think is important is that we should not try to come up with creative new names for linear versions of immutable operations. My current favorite would be mapInPlace, filterInPlace, but if someone has a better idea, I am happy to change my preference.

What about appending _! to the names of immutable versions? It does seem a bit alien at first, but after having used that in my code for some time, I find it appropriate: it makes it exceedingly clear what are the operations to "watch out for" because of their effects, similar to how ??? catches the reader's eye. For example:

val arr = ArrayBuffer(1,2,3,4,5)
// clearly, I'm doing imperative transformations here:
arr.map_!(x => x + 1).filter_!( _ % 2 == 0 )
// or:
arr map_! (_ + 1) filter_! (_ % 2 == 0)

There is precedence for this in Ruby, where it's somewhat nicer because there is no need for the _. It would be nice if the lexer was modified and we could have:

arr.map!(_ + 1).filter!(_ % 2 == 0)

I would probably also push for marking methods not guaranteed to have the expected efficient implementation (such as size on List not being O(1)) with a _$ as in ls.size_$, marking option-returning methods with _? as in ls.head_?, and marking parallel operations with _|| as in val ys = xs map_|| f, but that would probably be going to far 😄

from collection-strawman.

Atry avatar Atry commented on June 15, 2024

SeqView is very dangerous at the moment because it may violate rules for java.lang.Object.equals.

https://issues.scala-lang.org/browse/SI-10201

from collection-strawman.

Atry avatar Atry commented on June 15, 2024

view == view should not force the view.

I expect view0 eq view1 and view0 == view1 are always same.

from collection-strawman.

Atry avatar Atry commented on June 15, 2024

seq != view should always be true, given we always expect seq != iterator, and view is simply a wrapper of Iterator.

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024

I agree. In the strawman we define equality at the level of Seq and Set. View is yet another basic collection type which would never be equal to a Seq or Set. Reflexivity is a special case, it should be trivial to fix in either 2.12 or 2.13.

from collection-strawman.

SolomonSun2010 avatar SolomonSun2010 commented on June 15, 2024

I really expect LazyList or Stream add these powerful and convenient methods :
repeat/cycle, repeatedly/generate, iterate, unfold, resource, reject,
like in Elixir and Clojure: https://hexdocs.pm/elixir/Stream.html#unfold/2
This symbol #:: is too low,strange to verbose,unreadable.
Also, I hope add some xxxIndexed like in Kotlin Sequence: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/index.html
I think, one of the programming trend is the Lazy Sequences .
ps: post here may right.

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024

My experiences with immutable.Seq/Iterable vs collection.Seq/Iterable vs Iterator from scala-library and scala-reflect so far:

  • Many Iterator methods are commonly used but not defined in the strawman. In many cases callers abstract over IterableOnce (or TraversableOnce which is now aliased to IterableOnce). We should at least have a shared implementation trait, possibly add methods back to IterableOnce. I am defining deprecated extension methods for now.

  • Consing for collection.Seq is sorely missing. We already have ++ and ++: in collection.IterableOps. We should also have +: and :+. These are only defined in immutable.SeqOps and can be easily emulated with ++ and ++: but this prevents collection implementations from properly optimizing them (e.g. List.+:).

from collection-strawman.

julienrf avatar julienrf commented on June 15, 2024

Many Iterator methods are commonly used but not defined in the strawman.

FWIW I added some of them in that PR: https://github.com/scala/collection-strawman/pull/200/files#diff-c0d7086690fd3b617f2eb319d5b832ce

Consing for collection.Seq is sorely missing.

In case you missed it: #205 (and #206).

from collection-strawman.

szeiger avatar szeiger commented on June 15, 2024

It looks like we've settled on a View design by now so I'm closing this ticket.

from collection-strawman.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.