bkamins commented on August 20, 2024

This is the intended way to do it:

julia> combine(df, [:a, :b] .=> myextrema .=> x -> x .* ["_min", "_max"])
1×4 DataFrame
 Row │ a_min  a_max  b_min  b_max
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     10      4     13

You can even just do, e.g.:

julia> combine(df, [:a, :b] .=> Ref∘extrema .=> x -> x .* ["_min", "_max"])
1×4 DataFrame
 Row │ a_min  a_max  b_min  b_max
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     10      4     13
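
Here is a minimal pure-Julia sketch of what Ref∘extrema returns. It does not use DataFrames.jl, so it shows only the wrapping, not how combine then splits the tuple into the target columns. The names wrapped_extrema, v and r are illustrative. As I understand it, Ref matters because a bare (min, max) tuple looks like a two-element collection, while the Ref wrapper makes the whole tuple a single value:

```julia
# `∘` composes functions: (Ref ∘ extrema)(v) is the same as Ref(extrema(v)).
wrapped_extrema = Ref ∘ extrema

v = [3, 1, 10, 4]  # hypothetical sample column
r = wrapped_extrema(v)

# `extrema` returns a (min, max) tuple, which has two elements...
println(extrema(v))          # (1, 10)
println(length(extrema(v)))  # 2

# ...while `Ref` wraps that tuple in a zero-dimensional container
# holding exactly one element: the whole tuple.
println(length(r))           # 1
println(r[])                 # (1, 10)
```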

from dataframes.jl.

schlichtanders commented on August 20, 2024

Thank you very much - I couldn't find such an example in the documentation.

I still don't understand why your second version works 😅.

This approach has the disadvantage that one needs to replicate which fields the transformation function has. It looks flexible and easy to understand, which is really great, but it also looks like duplication.

bkamins commented on August 20, 2024
  1. It is documented that to produce multiple columns you have to pass either AsTable or a vector of column names as the target.
  2. It is documented that you can auto-generate the target column names using a function (to dynamically generate them). In this case the function takes source column names as input.
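
A small pure-Julia check of point 2, using the renaming function from the examples in this thread (the variable name rename_fun is illustrative). My understanding is that for a transformation such as :a => f => rename_fun, DataFrames.jl calls rename_fun with a one-element vector holding the source column name as a string, which is why broadcasting produces two target names:

```julia
# Renaming function from the examples: it takes a vector of source
# column names and returns a vector of target column names.
rename_fun = x -> x .* ["_min", "_max"]

# Broadcasting `*` (string concatenation) over the one-element vector ["a"]
# and the two suffixes yields one target name per suffix.
println(rename_fun(["a"]))  # ["a_min", "a_max"]
println(rename_fun(["b"]))  # ["b_min", "b_max"]
```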

> This approach has the disadvantage that one needs to replicate which fields the transformation function has.

Yes, this is a disadvantage. That is why I pointed out that you do not have to define these column names in the transformation function (see the example with Ref, which uses plain extrema and leaves the naming entirely to the function that generates the target column names).

We could allow a function that takes both the source column names and the names returned by the transformation function and combines them, but that seemed overly complex (i.e. the API would be hard for typical users to understand and learn). What I have given you is the most concise variant.

The variant that you want is also available and avoids the duplication; its disadvantage is that the code is longer (so I thought it was less interesting):

julia> using DataFrames

julia> df = DataFrame(a = 1:10, b = 4:13)
10×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6
   4 │     4      7
   5 │     5      8
   6 │     6      9
   7 │     7     10
   8 │     8     11
   9 │     9     12
  10 │    10     13

julia> function myextrema(a)
           ex = extrema(a[1])
           n = propertynames(a)[1]
           (; Symbol(n, "_min") => ex[1], Symbol(n, "_max") => ex[2])
       end
myextrema (generic function with 1 method)

julia> combine(df, AsTable.([:a, :b]) .=> myextrema .=> AsTable) 
1×4 DataFrame
 Row │ a_min  a_max  b_min  b_max
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     10      4     13
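
To see what myextrema receives, the sketch below mimics a single call without DataFrames.jl. With AsTable(:a) as the source, combine passes the function a NamedTuple of the selected columns, and AsTable as the target turns the fields of the returned NamedTuple into columns. The function name myextrema_sketch is hypothetical; it follows the same logic as myextrema but builds its result with the NamedTuple{names}(values) constructor:

```julia
# Same logic as myextrema, written with the NamedTuple constructor.
function myextrema_sketch(cols::NamedTuple)
    ex = extrema(cols[1])               # (min, max) of the first column
    n = propertynames(cols)[1]          # that column's name, e.g. :a
    outnames = (Symbol(n, "_min"), Symbol(n, "_max"))
    return NamedTuple{outnames}(ex)     # field names become target columns
end

# For AsTable(:a), combine passes a NamedTuple holding column :a;
# here we build such a NamedTuple by hand.
result = myextrema_sketch((a = 1:10,))
println(result)  # (a_min = 1, a_max = 10)
```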

schlichtanders commented on August 20, 2024

> 2. It is documented that you can auto-generate the target column names using a function (to dynamically generate them). In this case the function takes source column names as input.

Could an example be added to https://dataframes.juliadata.org/stable/man/working_with_dataframes/?
This was my source of truth, and I couldn't find it there.

bkamins commented on August 20, 2024

There is an example in the docstring: https://dataframes.juliadata.org/stable/lib/functions/#DataFrames.combine. We could also add something to the introductory manual. Could you propose what you would find most useful?

schlichtanders commented on August 20, 2024

I think an example just below the .=> example in the combine section would be nice:

julia> combine(df, names(df) .=> sum, names(df) .=> prod)
1×4 DataFrame
 Row │ A_sum  B_sum    A_prod  B_prod
     │ Int64  Float64  Int64   Float64
─────┼─────────────────────────────────
   1 │    10     10.0      24     24.0

# this is new:
julia> combine(df, names(df) .=> Ref∘extrema .=> (c -> c .* ["_min", "_max"]))

Probably with a little extra explanation of what Ref is doing here (I haven't entirely understood why it is needed yet).
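
A rough pure-Julia sketch of how the proposed example fits together. The helper mini_combine is hypothetical and is not how DataFrames.jl implements combine; for a one-row result it only mimics the steps discussed in this thread: .=> builds one source => function => renaming-function spec per column, the renaming function receives the source name in a vector, and the tuple unwrapped from the Ref is spread over the target names. The data (columns A and B) is an assumption chosen to match the sums and products in the example above:

```julia
# Hypothetical helper, NOT DataFrames.jl's implementation: it mimics how a
# `source => fun => renamer` spec yields one row of named results.
function mini_combine(data::AbstractDict, specs)
    out = Pair{String,Any}[]
    for (src, (fun, renamer)) in specs
        targets = renamer([src])     # e.g. ["A"] -> ["A_min", "A_max"]
        value = fun(data[src])[]     # unwrap the Ref to get the (min, max) tuple
        for (t, v) in zip(targets, value)
            push!(out, t => v)       # one target column per tuple element
        end
    end
    return out
end

# Assumed sample data, consistent with the sums and products shown earlier.
data = Dict("A" => [1, 2, 3, 4], "B" => [1.0, 2.0, 3.0, 4.0])

# `=>` is right-associative, so this builds one spec per column:
# ["A" => (f => renamer), "B" => (f => renamer)].
specs = ["A", "B"] .=> (Ref ∘ extrema) .=> (c -> c .* ["_min", "_max"])

result = mini_combine(data, specs)
println(result)
```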

bkamins commented on August 20, 2024

https://bkamins.github.io/julialang/2024/03/22/minicontainers.html

bkamins commented on August 20, 2024

See #3433 for an update of the manual. Of course, please comment if something is unclear or should be improved.

schlichtanders commented on August 20, 2024

looks especially good. Thank you for the detailed documentation improvement!
