Comments (6)
thanks for the report - this looks expected, I think you're asking for a non_existent
argument, see #11579
closing in favour of that one then
from polars.
@MarcoGorelli will you consider the proposal in #13758 at the same time? When you say in #11579
whereas ambiguous datetimes are a fact of life, non-existent ones indicate a data error which should probably be cleaned by the user
I agree with that, but there's no simple way to branch off the problematic part of the data and handle it separately, like if/else in Python. I do try to flag the data error with when-then-otherwise logic, so that replace_time_zone only has to handle the 'clean' branch, but no, all code paths need to work simultaneously. It is a pain to use when-then-otherwise logic when the code can potentially error or panic.
this is the suggestion I'm making there, so to have non_existent take 'null' or 'raise'
this is the suggestion I'm making there, so to have non_existent take 'null' or 'raise'
I am aware of that, and I am not complaining about replace_time_zone lacking that option here.
I am discussing the general case: what can users do when they have a use case where polars currently gives an error/panic?
Some problems are fixed by the polars team in a week, some take maybe a year, but if I need to use polars in production, I cannot wait even until tomorrow.
So, I am asking the polars team, in addition to the parallel version of when-then-otherwise, to also provide a version where exactly one branch is executed per row. Then, if I encounter some error/panic, I can have a quick workaround: filter the data and handle the cases that cause the error in a when-then-otherwise clause. When the problem is fixed by the polars team, I can just switch back to the official solution with minimal code change.
I'd suggest explaining this in #13758
In the specific case of non-existent datetimes there are two workarounds that I've used.
One is at the end of this.
The other is to create a clean range and then join to the unclean.
So suppose you have
unclean = pl.select(a=pl.datetime_range(pl.datetime(2022,3,13), pl.datetime(2022,3,14), '30m'))
Doing the following will produce an error
unclean.with_columns(pl.col('a').dt.replace_time_zone('America/Chicago'))
ComputeError: datetime '2022-03-13 02:00:00' is non-existent in time zone 'America/Chicago'. Non-existent datetimes are not yet supported
To get around that without looping with a try/except block (as above), you can do
clean = pl.select(
    b=pl.datetime_range(
        unclean.select(pl.col('a').min().dt.replace_time_zone("America/Chicago")).item(),
        unclean.select(pl.col('a').max().dt.replace_time_zone("America/Chicago")).item(),
        unclean['a'].diff().min(),
    )
).with_columns(
    a=pl.col('b').dt.replace_time_zone(None)
)
unclean.join(clean, on='a', how='left')
shape: (49, 2)
┌─────────────────────┬───────────────────────────────┐
│ a                   ┆ b                             │
│ ---                 ┆ ---                           │
│ datetime[μs]        ┆ datetime[μs, America/Chicago] │
╞═════════════════════╪═══════════════════════════════╡
│ 2022-03-13 00:00:00 ┆ 2022-03-13 00:00:00 CST       │
│ 2022-03-13 00:30:00 ┆ 2022-03-13 00:30:00 CST       │
│ 2022-03-13 01:00:00 ┆ 2022-03-13 01:00:00 CST       │
│ 2022-03-13 01:30:00 ┆ 2022-03-13 01:30:00 CST       │
│ 2022-03-13 02:00:00 ┆ null                          │
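For context on why 02:00 ends up as null in the join above: on 2022-03-13 clocks in America/Chicago jumped from 02:00 CST straight to 03:00 CDT, so wall-clock times in that hour never occurred. A minimal sketch of how such rows could be flagged ahead of time, using only the Python standard library rather than polars; the helper name `is_non_existent` is made up for this illustration, and it assumes the system has time zone data available to `zoneinfo`.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def is_non_existent(naive: datetime, tz_name: str) -> bool:
    """Return True if the naive wall-clock time falls in a DST gap of tz_name."""
    tz = ZoneInfo(tz_name)
    # Attach the zone, convert to UTC and back. A wall-clock time that really
    # occurred survives the round trip; one inside a gap comes back shifted.
    round_trip = naive.replace(tzinfo=tz).astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) != naive


gap = is_non_existent(datetime(2022, 3, 13, 2, 0), "America/Chicago")
before = is_non_existent(datetime(2022, 3, 13, 1, 30), "America/Chicago")
after = is_non_existent(datetime(2022, 3, 13, 3, 0), "America/Chicago")
```

Here `gap` is True while `before` and `after` are False. Ambiguous times (the repeated hour in autumn) are not flagged by this check, since they do exist.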