Comments (3)
Is this a general issue with concat_list().flatten()?
df.group_by("a").agg(
concat = pl.concat_list("c"),
flatten = pl.concat_list("c").flatten(),
count = pl.concat_list("c").flatten().count()
)
shape: (2, 4)
┌─────┬──────────────────┬──────────────┬───────┐
│ a ┆ concat ┆ flatten ┆ count │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ list[list[i64]] ┆ list[i64] ┆ u32 │
╞═════╪══════════════════╪══════════════╪═══════╡
│ a ┆ [[null], [null]] ┆ [null, null] ┆ 2 │ # <- ???
│ b ┆ [[3], [2], [1]] ┆ [3, 2, 1] ┆ 3 │
└─────┴──────────────────┴──────────────┴───────┘
from polars.
I suspect that you think concat_list does something different from what it's actually meant to do. In short, it's meant to be a row-wise function. To get your expected result, you only need to do
df.with_columns(pl.col('c').count().over('a'))
Is there something in particular you're trying to get out of pl.concat_list('c').flatten()? In your two-context case, the first context essentially does nothing, because you're putting a column into a list with concat_list and then taking it back out of the list with flatten. Your second context is just what I have above.
@cmdlineluser The way you present it makes more sense to me. When you call count on a list, it returns the size of the list irrespective of nulls, so you'd have to do
df.group_by("a").agg(
concat = pl.concat_list("c"),
flatten = pl.concat_list("c").flatten(),
count = pl.concat_list("c").flatten().drop_nulls().count()
)
I'm going to close this for now but I can reopen if I've missed something.
Thanks a lot for your response!
Indeed, @cmdlineluser's example is clearer than mine. I now understand that the cause is calling count on a list.
For context, we use polars as the computation engine of a parsing tool, where the user can input one or multiple columns, and we aim to cover both cases with our query.
The output of
df.with_columns(
pl.concat_list(pl.col("c"), pl.col("b"))
.count()
.over(pl.col("a"))
.alias("result")
)
was working as expected. The same was true if we focused only on column b.
However, thanks to your explanation, I now understand that we should instead follow something along these lines:
df.with_columns(
pl.concat_list(pl.col("c"), pl.col("b"))
.flatten()
.over(pl.col("a"), mapping_strategy="join")
.list.eval(pl.element().count())
.alias("result")
)
This works as intended for all cases.
Thanks!