Comments (2)
IIRC the error comes from numpy.
This is what I get:
File "/Users/thomasli/pandas/pandas/core/sample.py", line 153, in sample
return random_state.choice(obj_len, size=size, replace=replace, p=weights).astype(
File "numpy/random/mtrand.pyx", line 1001, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'
In that case, I think it's probably best to add an if check in your code before the call to sample, so that the sample size never exceeds the length of the dataframe.
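A minimal sketch of the check suggested above. The frame contents and the requested size `n` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: only 3 rows, fewer than the 10 we want to sample.
df = pd.DataFrame({"price": [5, 12, 20]})

n = 10
# Guard: never request more rows than the frame has, since
# sampling without replacement cannot exceed the population size.
if n > len(df):
    n = len(df)

sampled = df.sample(n)
```

Without the guard, `df.sample(10)` on this frame raises the `ValueError` shown in the traceback.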
Yeah, in production code I can of course add an if. But consider exploring in a notebook with chains like this (imagine this is not static code; I keep iterating on it with many filters):
(
df
.loc[lambda df: df.price.gt(10)]
.loc[lambda df: df.date.lt("2023-04")]
.sample(10)
)
Here you cannot put an if before the call, because there is no "before". I do this very often. I guess one solution would be to create a pipe function and do
(
df
.loc[lambda df: df.price.gt(10)]
.loc[lambda df: df.date.lt("2023-04")]
.pipe(sample_pipe, n=10)
)
with sample_pipe containing the if... but I still think pandas should support my use case 🤷‍♂️
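For reference, a minimal sketch of what such a `sample_pipe` helper might look like. The function body is an assumption (the thread only names it), and the example data is invented:

```python
import pandas as pd


def sample_pipe(df, n, **kwargs):
    """Sample up to n rows, capping n at the number of rows available.

    Hypothetical helper, not part of pandas. Extra keyword arguments
    are forwarded to DataFrame.sample.
    """
    return df.sample(min(n, len(df)), **kwargs)


# Invented example data.
df = pd.DataFrame(
    {
        "price": [5, 12, 20, 30],
        "date": pd.to_datetime(
            ["2023-01-10", "2023-02-01", "2023-05-01", "2023-03-15"]
        ),
    }
)

result = (
    df
    .loc[lambda df: df.price.gt(10)]
    .loc[lambda df: df.date.lt("2023-04")]
    .pipe(sample_pipe, n=10)
)
```

Only two rows survive the filters here, so the chain returns both of them instead of raising.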