Comments (3)
Thanks for the suggestion. We would need to check whether the groups do indeed form a contiguous sequence. In addition, it would mean the type of index in the result is dependent on the values being grouped, which can make it hard to predict for users. For these reasons, I do not think we should be swapping out the index.
from pandas.
@rhshadrach Thanks for your quick response and for considering the suggestion. I believe the proposed optimization would be particularly useful when users group rows by an integer-dtype column whose values form a contiguous sequence. This is a common scenario, especially with large datasets, where a materialized integer index consumes far more memory than a RangeIndex, which stores only its start, stop, and step. Using RangeIndex in such cases could offer substantial memory savings and improve the efficiency of pandas on large data-processing tasks.
Besides, most users interact with the DataFrame contents rather than the index type, and I think most operations do not distinguish between Int64Index and RangeIndex, so the change would not affect users much. I also anticipate that implementing it would be relatively straightforward, so I think this would be an easy and useful improvement.
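To make the memory claim concrete, here is a rough sketch comparing the two index types. The size `n` is an arbitrary value chosen for illustration, and the exact byte counts depend on the pandas version and platform, so none are quoted here.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # arbitrary size, chosen only for illustration

# A materialized integer index stores every label explicitly.
int_index = pd.Index(np.arange(n, dtype="int64"))

# A RangeIndex only keeps start, stop, and step.
range_index = pd.RangeIndex(n)

int_bytes = int_index.memory_usage()
range_bytes = range_index.memory_usage()

print(f"integer index: {int_bytes} bytes")
print(f"RangeIndex:    {range_bytes} bytes")
```

Both objects hold the same labels, so the difference is purely in how they are stored.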
I agree with @rhshadrach: we don't want value-dependent behaviour if we can avoid it, and the improvement isn't worth the hassle in this case. You can cast the Index yourself if necessary.
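For anyone who wants to do the cast themselves, something like the following might work. This is only a sketch: `to_range_index_if_contiguous` is a hypothetical helper, not a pandas API, and it assumes an integer index with no missing values whose labels increase by exactly 1.

```python
import numpy as np
import pandas as pd


def to_range_index_if_contiguous(index: pd.Index) -> pd.Index:
    """Hypothetical helper: return an equivalent RangeIndex when the labels
    form a contiguous, ascending integer sequence; otherwise return the
    index unchanged."""
    if isinstance(index, pd.RangeIndex) or len(index) == 0:
        return index
    if not pd.api.types.is_integer_dtype(index.dtype):
        return index
    start = int(index[0])
    candidate = pd.RangeIndex(start, start + len(index), name=index.name)
    # Only swap when every label matches the candidate range exactly.
    if np.array_equal(index.to_numpy(), candidate.to_numpy()):
        return candidate
    return index


df = pd.DataFrame({"key": [0, 1, 2, 0, 1, 2], "val": [1, 2, 3, 4, 5, 6]})
result = df.groupby("key").sum()

# Opt in explicitly, so the index type never changes behind the user's back.
result.index = to_range_index_if_contiguous(result.index)
print(type(result.index).__name__)
```

Doing this at the call site keeps the result type of `groupby` predictable, while still letting the caller save memory when they know the keys are contiguous.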
Related Issues (20)
- BUG: Columns resulting from joining multiIndex dataFrame are incorrect in python 3.12.1
- Pyarrow will become a required dependency of pandas in the next major release of pandas
- BUG: `std` using `numpy.float32` dtype gives incorrect result on contant array.
- Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0), (to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries) but was not found to be installed on your system. If this would cause problems for you,BUILD:
- CI: excel tests started failing on windows
- BUG: Unexpected read_csv parse_dates behavior
- BUG: MultiIndex.factorize fails if index is 0-length
- Potential performance regression by commit 1e9cccc: PERF Allow RangeIndex.take to return a RangeIndex when possible
- BUG: pandas >= 2.0 parses 'May' in ambiguous way.
- BUG: Test failures on 32-bit x86 with pyarrow installed
- BUG: Subtraction fails with matching index of different name and type
- ENH: Better documentation or default behavior for GroupBy for columns with non-sortable values
- DOC: Clarify how groupby forms groups for mixed-value columns
- BUG: Binary op with datetime offset is resulting in incorrect results
- BUG: `sort_values` is sorting the index too when `ignore_index=False`
- BUG: ADBC Postgres writer incorrectly names the table
- Potential regression induced by PR #57328
- BUG: na_values dict form not working on index column
- BUG: .rolling-method is not working properly when called with a timedelta
- Split pandas package into pandas and pandas-core