Comments (2)
My guess, based on the observable behaviour, is that there are two separate bugs here:
- Pandas has a bug in the pd.read_parquet code path for the columns=[] case.
- pq.write_table has a bug when the Table it is asked to serialize is empty (has no columns), causing it to write a parquet file with 0 rows rather than the true row count. The similar bug in DataFrame.to_parquet is a consequence of this pyarrow bug.
from pandas.
Actually, maybe the bug is purely in pyarrow since engine='fastparquet' fixes both things?
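import pandas as pd
import pyarrow.parquet as pq

# Write with the default engine, then read back with fastparquet
pd.DataFrame(index=pd.RangeIndex(2), columns=['C', 'D']).to_parquet('temp.parquet')
# Prints 2
print(len(pd.read_parquet('temp.parquet', columns=[], engine='fastparquet').index))
# Write with fastparquet, then read back with pyarrow directly
pd.DataFrame(index=pd.RangeIndex(2), columns=[]).to_parquet('temp.parquet', engine='fastparquet')
# Prints 2
print(pq.read_table('temp.parquet', columns=[]).num_rows)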