Comments (5)
Using Pyright/MyPy, you'll get a static type-checking warning if you try to pass object() to time_zone here. The only way this could perhaps be built on (from a type-checking perspective) is to maintain a literal of all possible time zones and use that instead of str.
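A minimal sketch of that literal idea, for illustration only. The names TimeZoneName, KNOWN_TIME_ZONES and is_known_time_zone are hypothetical and not part of polars, and the three zones listed are a tiny placeholder subset; a real version would have to enumerate and maintain every IANA name.

```python
from typing import Literal, get_args

# Hypothetical alias: a deliberately tiny subset of IANA names.
# A real literal would need every time zone and would need regular updates.
TimeZoneName = Literal["UTC", "Europe/London", "Asia/Kathmandu"]

# Literal types are not enforced at runtime, so expose the members
# for an explicit runtime check as well.
KNOWN_TIME_ZONES = frozenset(get_args(TimeZoneName))


def is_known_time_zone(time_zone: str) -> bool:
    """Return True if the string is one of the literal's members."""
    return time_zone in KNOWN_TIME_ZONES


def describe(time_zone: TimeZoneName) -> str:
    """Stand-in for a constructor that takes a time_zone argument."""
    return f"datetime[us, {time_zone}]"


# A type checker would be expected to reject describe("cabbage"),
# because "cabbage" is not a member of TimeZoneName.
print(describe("Europe/London"))
print(is_known_time_zone("cabbage"))
```

The trade-off is maintenance: the time zone database changes over time, so a hard-coded literal can drift out of date, which is probably why str is used instead.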
from polars.
thanks for reporting - the validation currently happens later:
In [9]: pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime(time_zone='cabbage'))
---------------------------------------------------------------------------
ComputeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime(time_zone='cabbage'))
File ~/scratch/.venv/lib/python3.11/site-packages/polars/series/series.py:312, in Series.__init__(self, name, values, dtype, strict, nan_to_null, dtype_if_empty)
309 raise TypeError(msg)
311 if isinstance(values, Sequence):
--> 312 self._s = sequence_to_pyseries(
313 name,
314 values,
315 dtype=dtype,
316 strict=strict,
317 nan_to_null=nan_to_null,
318 )
320 elif values is None:
321 self._s = sequence_to_pyseries(name, [], dtype=dtype)
File ~/scratch/.venv/lib/python3.11/site-packages/polars/_utils/construction/series.py:235, in sequence_to_pyseries(name, values, dtype, strict, nan_to_null)
225 if values_tz != "UTC" and dtype_tz is None:
226 warnings.warn(
227 "Constructing a Series with time-zone-aware "
228 "datetimes results in a Series with UTC time zone. "
(...)
233 stacklevel=find_stacklevel(),
234 )
--> 235 return s.dt.replace_time_zone(dtype_tz or "UTC")._s
236 return s._s
238 elif (
239 _check_for_numpy(value)
240 and isinstance(value, np.ndarray)
241 and len(value.shape) == 1
242 ):
File ~/scratch/.venv/lib/python3.11/site-packages/polars/series/utils.py:107, in call_expr.<locals>.wrapper(self, *args, **kwargs)
105 expr = getattr(expr, namespace)
106 f = getattr(expr, func.__name__)
--> 107 return s.to_frame().select_seq(f(*args, **kwargs)).to_series()
File ~/scratch/.venv/lib/python3.11/site-packages/polars/dataframe/frame.py:7906, in DataFrame.select_seq(self, *exprs, **named_exprs)
7883 def select_seq(
7884 self, *exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr
7885 ) -> DataFrame:
7886 """
7887 Select columns from this DataFrame.
7888
(...)
7904 select
7905 """
-> 7906 return self.lazy().select_seq(*exprs, **named_exprs).collect(_eager=True)
File ~/scratch/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py:1810, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager)
1807 if background:
1808 return InProcessQuery(ldf.collect_concurrently())
-> 1810 return wrap_df(ldf.collect())
ComputeError: unable to parse time zone: 'cabbage'. Please check the Time Zone Database for a list of available time zones
from polars.
Hi @MarcoGorelli, should we add some checks here:
polars/py-polars/polars/datatypes/classes.py (lines 461 to 463 in 2970c57)
similar to what's done here:
polars/py-polars/polars/_utils/convert.py (lines 163 to 194 in 2970c57)
Or should we leave the error as it is? Do you have any thoughts on this? If possible, I'd like to improve it.
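For illustration, an eager check of this kind could look roughly like the sketch below. It is not the code at either of the referenced locations: validate_time_zone is a hypothetical helper, and it assumes that only IANA names are accepted and that Python's standard zoneinfo module is an acceptable stand-in for whatever database polars consults on the Rust side.

```python
from typing import Optional
from zoneinfo import ZoneInfo


def validate_time_zone(time_zone: Optional[str]) -> Optional[str]:
    """Hypothetical helper: fail fast on an unknown time zone name.

    Assumption: zoneinfo's database is a reasonable proxy for the one
    used later during actual computation.
    """
    if time_zone is None:
        return None
    try:
        # ZoneInfoNotFoundError subclasses KeyError; malformed keys
        # (for example path-like strings) raise ValueError.
        ZoneInfo(time_zone)
    except (KeyError, ValueError) as exc:
        msg = f"unable to parse time zone: {time_zone!r}"
        raise ValueError(msg) from exc
    return time_zone


try:
    validate_time_zone("cabbage")
except ValueError as err:
    print(err)
```

Calling something like this when the dtype is constructed would surface the error at the point where the bad string is supplied, rather than when the first value is processed.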
from polars.
Related to this: it's a bit concerning that the invalid time zone can end up in Rust without it being validated:
import polars as pl
dtype = pl.Datetime(time_zone="invalid_time_zone")
s = pl.Series(dtype=dtype)
print(s)
shape: (0,)
Series: '' [datetime[μs, invalid_time_zone]]
[
]
from polars.
True, validation only kicks in if there's an actual datetime in the column. Maybe it can happen earlier.
from polars.