
Sum of Int / Short / Byte breaks because of different return type (frameless, CLOSED)

typelevel commented on June 4, 2024

Sum of Int / Short / Byte breaks because of different return type

from frameless.

Comments (3)

bamine commented on June 4, 2024

@OlivierBlanvillain we should close the issue as it is solved by #65, right?


imarios commented on June 4, 2024

I agree with this issue. I have two related observations here. Let me show them with an example:

val e = TypedDataset.create[(Int, String, Long)]( (1,"a",2L) :: (2, "b", 4L) :: (2, "b", 1L) :: Nil )

// Summing an Int column fails:
e.select(sum(e('_1)))
<console>:25: error: could not find implicit value for parameter summable: frameless.functions.Summable[Int]
       e.select(sum(e('_1)))

// Spark's behavior when summing an Int column is to widen the result to bigint (a 64-bit Long)
scala> e.dataset.select(org.apache.spark.sql.functions.sum($"_1"))
res21: org.apache.spark.sql.DataFrame = [sum(_1): bigint]

First, I think we need to add proper implicits for the three types mentioned in the title (Int, Short, Byte). Second, I think we should stay faithful to the widening that Spark does. If you add a lot of Ints, it is better to have the result in a wider type. The idea is that Spark is used for "Big Data". To save disk space, you will often persist your small numeric values as Short/Int. However, when it's time to sum billions of rows, you want to collapse these values into a type that is far less likely to overflow. This is why Spark returns bigint (a 64-bit Long, not scala.BigInt) every time you sum an integral column. It might feel strange for a strictly typed system, but it makes perfect sense for the application.
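
To make the widening argument concrete, here is a minimal sketch in plain Scala, with no Spark or frameless dependency. The `WidenedSum` type class, its instances, and `sumAll` are hypothetical names made up for illustration; they are not frameless's actual API. The sketch only shows how an implicit could tie each small integral input type to a `Long` result, and why a 32-bit sum is risky.

```scala
// Hypothetical type class (NOT frameless's actual API): relates an input
// column type In to the wider type Out in which its sum is reported.
trait WidenedSum[In, Out] {
  def widen(in: In): Out
  def plus(a: Out, b: Out): Out
  def zero: Out
}

object WidenedSum {
  // Byte, Short and Int all widen to Long, mirroring Spark's bigint result.
  implicit val byteToLong: WidenedSum[Byte, Long] = instance(_.toLong)
  implicit val shortToLong: WidenedSum[Short, Long] = instance(_.toLong)
  implicit val intToLong: WidenedSum[Int, Long] = instance(_.toLong)

  // Helper that builds an instance from a conversion to Long.
  private def instance[In](f: In => Long): WidenedSum[In, Long] =
    new WidenedSum[In, Long] {
      def widen(in: In): Long = f(in)
      def plus(a: Long, b: Long): Long = a + b
      def zero: Long = 0L
    }
}

object WidenedSumDemo {
  // The result type Out is fixed by whichever implicit is found for In.
  def sumAll[In, Out](xs: Seq[In])(implicit w: WidenedSum[In, Out]): Out =
    xs.foldLeft(w.zero)((acc, x) => w.plus(acc, w.widen(x)))

  val ints: Seq[Int] = Seq(Int.MaxValue, 1)

  // 32-bit addition wraps around silently.
  val naive: Int = ints.sum

  // Each element is widened to Long before adding, so nothing wraps here.
  val widened: Long = sumAll(ints)
}
```

With this shape, the caller never chooses the result type: summing an `Int`, `Short` or `Byte` collection always yields a `Long`, which is the kind of type-level mapping the observations above ask for.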


OlivierBlanvillain commented on June 4, 2024

I agree with your comment, and I think we should simply fix this so that the return type of sum matches Spark's behaviour.
