Comments (10)
I've got it. FYI you're not using the pyarrow path, so for the time being try use_pyarrow=True, which may be a bit faster and doesn't trigger this error. If you don't have it installed, run pip install pyarrow.
from polars.
I've opened a more general issue at #13438 that will address this problem.
I'm trying to hit the failing code path and having trouble without a minimal PR. Can you provide some of the expected datatypes of the files that are failing? How many columns do they have?
Ah, good point, let me see...
For the example above, the schemas are,
OrderedDict([('SomeId', Int32)])
OrderedDict([('SomeId', Int32)])
but then probably the problem is due to the files having different schemas, one having 2 extra columns; I hoped that explicitly specifying the columns would work around that problem (and in 0.19.5 that worked when using globs), but apparently not here.
One file has 3 extra String columns scattered, all past the SomeId; let me see if I can get an example isolated.
> For the example above, the schemas are,
> OrderedDict([('SomeId', Int32)]) OrderedDict([('SomeId', Int32)])
> but then probably the problem is due to the files having different schemas
Hang on, those schemas look identical, how are they different?
Yep, here you go:
import polars as pl

df1 = pl.DataFrame({
    'some-id': [1, 2, 3],
    'client': ['Alice Anders', 'Bob Baker', 'Charlie Chaplin'],
})
df2 = pl.DataFrame({
    'some-id': [4],
})
df1.write_parquet("/tmp/df1.parquet")
df2.write_parquet("/tmp/df2.parquet")
pl.read_parquet(["/tmp/df1.parquet", "/tmp/df2.parquet"], columns="some-id")
I'm adjusting the description, too.
It might look unreasonable, but then Polars 0.19.5 could do
import polars as pl

df1 = pl.DataFrame({
    'some-id': [1, 2, 3],
    'client': ['Alice Anders', 'Bob Baker', 'Charlie Chaplin'],
})
df2 = pl.DataFrame({
    'some-id': [4],
})
df1.write_parquet("/tmp/df1.parquet")
df2.write_parquet("/tmp/df2.parquet")
pl.read_parquet("/tmp/df*.parquet", columns="some-id")
successfully
(I do remember https://xkcd.com/1172/)
In 0.20.3 the glob example panics.
use_pyarrow=True solves my problem, thank you! It doesn't support globs though (again, nothing blocking me here, just filling in the details).