BUG: read_parquet wrongly returns empty index if asked to read empty column list

batterseapower commented on June 28, 2024

BUG: read_parquet wrongly returns empty index if asked to read empty column list

Comments (2)

batterseapower commented on June 28, 2024

My guess, based on the observable behaviour, is that there are two separate bugs here:

1. Pandas has a bug in the code for pd.read_parquet in the columns=[] case.
2. pq.write_table has a bug when the Table it is asked to serialize is empty (no columns): it writes a parquet file with 0 rows rather than the true row count. The similar bug in DataFrame.to_parquet is a consequence of this pyarrow bug.

batterseapower commented on June 28, 2024

Actually, maybe the bug is purely in pyarrow since engine='fastparquet' fixes both things?

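import pandas as pd
import pyarrow.parquet as pq

# Write a 2-row frame with the default engine, then read it back with no columns via fastparquet.
pd.DataFrame(index=pd.RangeIndex(2), columns=['C', 'D']).to_parquet('temp.parquet')

# Prints 2
print(len(pd.read_parquet('temp.parquet', columns=[], engine='fastparquet').index))

# Write a 2-row frame with no columns via fastparquet, then read it back with pyarrow.
pd.DataFrame(index=pd.RangeIndex(2), columns=[]).to_parquet('temp.parquet', engine='fastparquet')

# Prints 2
print(pq.read_table('temp.parquet', columns=[]).num_rows)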
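The comparison above can be extended to every writer/reader engine pair. This is a rough sketch, not a definitive test: the function name index_lengths_by_engine and the sample data are hypothetical, only engines that are importable are tried, and any exception is recorded by name rather than raised.

```python
import importlib.util
import itertools
import os
import tempfile

import pandas as pd

# Parquet engines pandas can use; only those importable here are tried.
ENGINES = [e for e in ("pyarrow", "fastparquet") if importlib.util.find_spec(e)]


def index_lengths_by_engine(n_rows=2):
    """Map (writer, reader) to the index length read back with columns=[]."""
    # Hypothetical sample data with n_rows rows and two integer columns.
    df = pd.DataFrame({"C": range(n_rows), "D": range(n_rows)})
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for writer, reader in itertools.product(ENGINES, repeat=2):
            path = os.path.join(tmp, f"{writer}_{reader}.parquet")
            try:
                df.to_parquet(path, engine=writer)
                frame = pd.read_parquet(path, columns=[], engine=reader)
                results[(writer, reader)] = len(frame.index)
            except Exception as exc:  # record the failure instead of stopping
                results[(writer, reader)] = type(exc).__name__
    return results


results = index_lengths_by_engine()
for (writer, reader), outcome in sorted(results.items()):
    print(f"write={writer:<12} read={reader:<12} -> {outcome}")
```

Any pair whose outcome differs from the number of rows written points at the engine combination worth reporting upstream.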
