
Comments (10)

mcrumiller commented on June 4, 2024

I've got it. FYI, you're not using the pyarrow path, so for the time being try use_pyarrow=True, which may be a bit faster and doesn't trigger this error. If you don't have pyarrow installed, run pip install pyarrow.

mcrumiller commented on June 4, 2024

I've opened a more general issue at #13438 that will address this problem.

mcrumiller commented on June 4, 2024

I'm trying to hit the failing code path and am having trouble without a minimal reproducible example. Can you provide some of the expected data types of the files that are failing? How many columns do they have?

alf239 commented on June 4, 2024

Ah, good point, let me see...

For the example above, the schemas are,

OrderedDict([('SomeId', Int32)])
OrderedDict([('SomeId', Int32)])

But the problem is probably due to the files having different schemas, one of them having 2 extra columns. I hoped that explicitly specifying the columns would work around that (in 0.19.5 this worked when using globs), but apparently not here.

alf239 commented on June 4, 2024

One file has 3 extra String columns, scattered but all after SomeId; let me see if I can isolate an example.

mcrumiller commented on June 4, 2024

For the example above, the schemas are,

OrderedDict([('SomeId', Int32)])
OrderedDict([('SomeId', Int32)])

but then probably the problem is due to the files having different schemas

Hang on, those schemas look identical, how are they different?

alf239 commented on June 4, 2024

Yep, here you go:

import polars as pl

# df1 has an extra "client" column that df2 lacks
df1 = pl.DataFrame({
    'some-id': [1, 2, 3],
    'client': ['Alice Anders', 'Bob Baker', 'Charlie Chaplin'],
})
df2 = pl.DataFrame({
    'some-id': [4],
})

df1.write_parquet("/tmp/df1.parquet")
df2.write_parquet("/tmp/df2.parquet")

# Selecting only the shared column across both files still fails
pl.read_parquet(["/tmp/df1.parquet", "/tmp/df2.parquet"], columns="some-id")

I'm adjusting the issue description, too.

alf239 commented on June 4, 2024

It might look unreasonable, but Polars 0.19.5 ran the following successfully:

import polars as pl

df1 = pl.DataFrame({
    'some-id': [1, 2, 3],
    'client': ['Alice Anders', 'Bob Baker', 'Charlie Chaplin'],
})
df2 = pl.DataFrame({
    'some-id': [4],
})

df1.write_parquet("/tmp/df1.parquet")
df2.write_parquet("/tmp/df2.parquet")

# Glob over both files, selecting only the shared column
pl.read_parquet("/tmp/df*.parquet", columns="some-id")

(I do remember https://xkcd.com/1172/)

alf239 commented on June 4, 2024

In 0.20.3, the glob example panics.

alf239 commented on June 4, 2024

use_pyarrow=True solves my problem, thank you! It doesn't support globs, though (again, nothing blocking me here, just filling in the details).