Comments (3)

bamine commented on June 4, 2024

@OlivierBlanvillain we should close the issue as it is solved by #65, right?

from frameless.

imarios commented on June 4, 2024

I agree with this issue. I have two related observations here. Let me show them with an example:

val e = TypedDataset.create[(Int, String, Long)]( (1,"a",2L) :: (2, "b", 4L) :: (2, "b", 1L) :: Nil )

// Summing an Int column fails:
e.select(sum(e('_1)))
<console>:25: error: could not find implicit value for parameter summable: frameless.functions.Summable[Int]
       e.select(sum(e('_1)))

// When summing Ints, Spark widens the result to bigint (a 64-bit Long):
scala> e.dataset.select(org.apache.spark.sql.functions.sum($"_1"))
res21: org.apache.spark.sql.DataFrame = [sum(_1): bigint]

First, I think we need to add proper implicits for the three types mentioned in the title (Int, Short, Byte). Second, I think we should stay faithful to the widening that Spark does. If you add up a lot of Ints, it is better to get the result in a wider type. The idea is that Spark is used for "Big Data". To save disk space, you will often persist small numeric values as Short or Int. However, when it is time to sum billions of rows, you want to accumulate these values into a type that is far less likely to overflow. This is why Spark returns bigint (a 64-bit Long, as in the output above) whenever you sum an integral column. It might feel strange for a strictly typed system, but it makes perfect sense for the application.
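To make the overflow concern concrete, here is a minimal sketch in plain Scala (no Spark or frameless involved). The object name `OverflowDemo` is made up for illustration; it only shows why summing in the narrow type is risky and why widening first is safer.

```scala
object OverflowDemo {
  // Two Int values whose true sum does not fit in an Int.
  val values: Seq[Int] = Seq(Int.MaxValue, 1)

  // Summing in the Int domain silently wraps around.
  val narrow: Int = values.sum

  // Widening each element to Long before summing keeps the true total.
  val wide: Long = values.map(_.toLong).sum
}
```

Here `narrow` wraps around to `Int.MinValue`, while `wide` holds the expected 2147483648.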

OlivierBlanvillain commented on June 4, 2024

I agree with your comment and I think we should simply fix this so that the return type of sum matches Spark's behaviour.
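One way such a fix could look is a type class with two type parameters, where the second one records the widened result type. The sketch below is only an illustration under that assumption: `SummableTo`, `sumWide` and `WideningSumSketch` are hypothetical names, not frameless's actual API, and the code works on plain Scala collections rather than on a TypedDataset.

```scala
object WideningSumSketch {
  // Hypothetical type class: for an input type In, Out is the type a sum should return.
  trait SummableTo[In, Out] {
    def zero: Out
    def widen(in: In): Out
    def plus(a: Out, b: Out): Out
  }

  object SummableTo {
    // Helper that builds an instance whose result type is Long.
    private def toLong[In](f: In => Long): SummableTo[In, Long] =
      new SummableTo[In, Long] {
        val zero: Long = 0L
        def widen(in: In): Long = f(in)
        def plus(a: Long, b: Long): Long = a + b
      }

    // Integral inputs all widen to Long, mirroring the bigint result shown by Spark.
    implicit val byteToLong: SummableTo[Byte, Long] = toLong[Byte](_.toLong)
    implicit val shortToLong: SummableTo[Short, Long] = toLong[Short](_.toLong)
    implicit val intToLong: SummableTo[Int, Long] = toLong[Int](_.toLong)
    implicit val longToLong: SummableTo[Long, Long] = toLong[Long](identity)
  }

  // Out is not chosen by the caller; it is fixed by the implicit instance found for In.
  def sumWide[In, Out](xs: Seq[In])(implicit s: SummableTo[In, Out]): Out =
    xs.foldLeft(s.zero)((acc, x) => s.plus(acc, s.widen(x)))
}
```

With this shape, `WideningSumSketch.sumWide(Seq(Int.MaxValue, 1))` is typed as Long without any annotation at the call site, and a type without an instance still fails at compile time, just as the missing `Summable[Int]` does above.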