Comments (3)

cmdlineluser commented on July 19, 2024

Is this a general issue with concat_list().flatten()?

df.group_by("a").agg(
   concat = pl.concat_list("c"),
   flatten = pl.concat_list("c").flatten(),
   count = pl.concat_list("c").flatten().count()
)
shape: (2, 4)
┌─────┬──────────────────┬──────────────┬───────┐
│ a   ┆ concat           ┆ flatten      ┆ count │
│ --- ┆ ---              ┆ ---          ┆ ---   │
│ str ┆ list[list[i64]]  ┆ list[i64]    ┆ u32   │
╞═════╪══════════════════╪══════════════╪═══════╡
│ a   ┆ [[null], [null]] ┆ [null, null] ┆ 2     │ # <- ???
│ b   ┆ [[3], [2], [1]]  ┆ [3, 2, 1]    ┆ 3     │
└─────┴──────────────────┴──────────────┴───────┘


deanm0000 commented on July 19, 2024

I suspect you think concat_list does something different from what it is actually meant to do. In short, it is a row-wise function: it combines the values of its input columns within each row into a single list. To get your expected result, you only need to do

df.with_columns(pl.col('c').count().over('a'))

Is there something in particular you're trying to get out of pl.concat_list('c').flatten()? In your two-context case, the first context effectively does nothing: concat_list puts each value of the column into a list, and flatten then takes it back out of the list. Your second context is just what I have above.
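To make the row-wise point concrete, here is a small pure-Python model of the behaviour described above. It is not the polars API: `concat_list_rowwise` and `flatten` are hypothetical helpers, and the sample columns are made up. Row i of the result collects the i-th value of every input column, so wrapping a single column and then flattening it just gives the original values back.

```python
from typing import List, Optional

Value = Optional[int]  # None plays the role of a polars null


def concat_list_rowwise(*columns: List[Value]) -> List[List[Value]]:
    """Model of a row-wise concat: row i becomes [col0[i], col1[i], ...]."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    return [list(row) for row in zip(*columns)]


def flatten(lists: List[List[Value]]) -> List[Value]:
    """Model of flatten: concatenate the inner lists in order."""
    return [item for inner in lists for item in inner]


# Hypothetical sample columns.
b = [10, 20, 30]
c = [None, 2, 1]

# Two columns: each row gets a two-element list.
print(concat_list_rowwise(c, b))  # [[None, 10], [2, 20], [1, 30]]

# One column: each value is wrapped in a one-element list,
# and flattening simply undoes the wrapping.
wrapped = concat_list_rowwise(c)
print(wrapped)                # [[None], [2], [1]]
print(flatten(wrapped) == c)  # True
```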

@cmdlineluser The way you present it makes more sense to me. When you call count on a list, it returns the size of the list regardless of null values, so you would have to do

df.group_by("a").agg(
   concat = pl.concat_list("c"),
   flatten = pl.concat_list("c").flatten(),
   count = pl.concat_list("c").flatten().drop_nulls().count()
)
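The distinction can be modelled in plain Python: counting every element of a group's flattened list gives the `count` column shown in the table in the first comment (nulls included), while dropping nulls first gives the number of non-null values. The input rows below are an assumption reconstructed from that table (group `a` holds two nulls in `c`, group `b` holds 3, 2, 1), and `group_values` and `summarize` are hypothetical helpers, not polars functions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical rows (a, c), reconstructed from the table in the first comment.
rows = [("a", None), ("a", None), ("b", 3), ("b", 2), ("b", 1)]


def group_values(pairs: Iterable[Tuple[str, Optional[int]]]) -> Dict[str, List[Optional[int]]]:
    """Collect the c values of each group, keeping first-seen group order."""
    groups: Dict[str, List[Optional[int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def summarize(pairs: Iterable[Tuple[str, Optional[int]]]) -> Dict[str, Tuple[int, int]]:
    """Per group: (element count including nulls, count after dropping nulls)."""
    result: Dict[str, Tuple[int, int]] = {}
    for key, values in group_values(pairs).items():
        concat = [[v] for v in values]                 # one single-element list per row
        flat = [v for inner in concat for v in inner]  # flatten the group's lists
        non_null = [v for v in flat if v is not None]  # drop nulls before counting
        result[key] = (len(flat), len(non_null))
    return result


print(summarize(rows))  # {'a': (2, 0), 'b': (3, 3)}
```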

I'm going to close this for now, but I can reopen it if I've missed something.


jesusestevez commented on July 19, 2024

Thanks a lot for your response!

Indeed, @cmdlineluser's example is clearer than mine. I now understand that the cause is the use of count on a list.

For context, we use polars as the computation engine of a parsing tool, where the user can input one or multiple columns, and we aim to cover both cases with a single query.
The output of

df.with_columns(
    pl.concat_list(pl.col("c"), pl.col("b"))
    .count()
    .over(pl.col("a"))
    .alias("result")
)

was working as expected, and the same was true when we focused only on column b. However, thanks to your explanation, I now understand that we should instead follow something along these lines:

df.with_columns(
    pl.concat_list(pl.col("c"), pl.col("b"))
    .flatten()
    .over(pl.col("a"), mapping_strategy="join")
    .list.eval(pl.element().count())
    .alias("result")
    )

This works as intended in all cases.
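For the one-or-many-columns use case, the intent of the final query can be sketched in plain Python, assuming the goal is the number of non-null values per group of `a`: concatenate the selected columns row by row, drop nulls, total the counts per group, and attach that total to every row of the group (roughly what the `mapping_strategy="join"` window aims at). This is a model, not polars itself: `count_per_group` and the sample data are hypothetical, and it returns a plain integer per row rather than the list column polars produces.

```python
from typing import Dict, List, Optional, Sequence


def count_per_group(keys: Sequence[str], *columns: Sequence[Optional[int]]) -> List[int]:
    """Count non-null values across all given columns within each key group,
    then broadcast that count back to every row of the group."""
    totals: Dict[str, int] = {}
    for i, key in enumerate(keys):
        # Row-wise concat of the selected columns, keeping only non-null values.
        row_values = [col[i] for col in columns if col[i] is not None]
        totals[key] = totals.get(key, 0) + len(row_values)
    return [totals[key] for key in keys]


# Hypothetical sample data: group keys plus two value columns.
a = ["x", "x", "y", "y", "y"]
b = [1, 2, 3, 4, 5]
c = [None, None, 3, 2, 1]

print(count_per_group(a, c))     # [0, 0, 3, 3, 3]
print(count_per_group(a, b))     # [2, 2, 3, 3, 3]
print(count_per_group(a, c, b))  # [2, 2, 6, 6, 6]
```

The same function handles one column or several, which is the property the parsing tool needs.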

Thanks!
