Comments (8)
Install latest version:
pip install --upgrade tabgan
After upgrading tabgan, I am getting a memory error:
MemoryError Traceback (most recent call last)
in
6
7 # generate data
----> 8 new_train1, new_target1 = OriginalGenerator().generate_data_pipe(train, target, test, )
9 new_train1, new_target1 = GANGenerator().generate_data_pipe(train, target, test, )
10
~/anaconda3/envs/tab/lib/python3.7/site-packages/tabgan/abc_sampler.py in generate_data_pipe(self, train_df, target, test_df, deep_copy, only_adversarial, use_adversarial)
45 return generator.adversarial_filtering(new_train, new_target, test_df)
46 else:
---> 47 new_train, new_target = generator.generate_data(new_train, new_target, test_df)
48 new_train, new_target = generator.postprocess_data(new_train, new_target, test_df)
49 if use_adversarial:
~/anaconda3/envs/tab/lib/python3.7/site-packages/tabgan/sampler.py in generate_data(self, train_df, target, test_df)
99 train_df[self.TEMP_TARGET] = target
100 generated_df = train_df.sample(frac=(1 + self.pregeneration_frac * self.get_generated_shape(train_df)),
--> 101 replace=True, random_state=42)
102 generated_df = generated_df.reset_index(drop=True)
103 gc.collect()
~/anaconda3/envs/tab/lib/python3.7/site-packages/pandas/core/generic.py in sample(self, n, frac, replace, weights, random_state, axis)
5059 )
5060
-> 5061 locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
5062 return self.take(locs, axis=axis)
5063
mtrand.pyx in numpy.random.mtrand.RandomState.choice()
mtrand.pyx in numpy.random.mtrand.RandomState.randint()
_bounded_integers.pyx in numpy.random._bounded_integers._rand_int64()
MemoryError: Unable to allocate 236. GiB for an array with shape (31680120000,) and data type int64
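The size in the MemoryError follows from the `frac` argument in the failing call. With `replace=True`, pandas turns `frac` into a draw count of about `frac * len(train_df)` and asks NumPy for that many int64 row positions in a single array, as the `rs.choice(axis_length, size=n, ...)` frame shows. The sketch below uses hypothetical helpers `sampled_rows` and `index_gib` to reproduce the 236 GiB figure from the reported shape. Its second half rests on a loudly stated assumption: that `get_generated_shape(train_df)` returns a row count (input rows times a `gen_x_times` factor) rather than a ratio, with assumed values `pregeneration_frac=2` and `gen_x_times=1.1`. Under that assumption `frac` itself grows with the input size, so the number of draws grows quadratically.

```python
import numpy as np

GIB = 2**30  # numpy reports MemoryError sizes in GiB (2**30 bytes)
INT64_BYTES = np.dtype(np.int64).itemsize  # 8 bytes per sampled row position


def sampled_rows(n_rows: int, frac: float) -> int:
    """Rows that DataFrame.sample(frac=frac, replace=True) draws from an n_rows frame."""
    return round(frac * n_rows)  # pandas derives the draw count from frac this way


def index_gib(n_draws: int) -> float:
    """Size in GiB of the int64 position array that choice() must allocate."""
    return n_draws * INT64_BYTES / GIB


# Shape reported in the MemoryError: (31680120000,), dtype int64.
print(f"{index_gib(31_680_120_000):.1f} GiB")  # 236.0 GiB

# ASSUMPTION: get_generated_shape() returns a row count, so the failing line
#   frac = 1 + pregeneration_frac * get_generated_shape(train_df)
# scales with the input size. Both parameter values below are assumed.
pregeneration_frac, gen_x_times = 2, 1.1
for n_rows in (1_000, 10_000, 100_000):
    frac = 1 + pregeneration_frac * round(n_rows * gen_x_times)
    draws = sampled_rows(n_rows, frac)
    print(n_rows, draws, f"{index_gib(draws):.2f} GiB")
```

If the assumption holds, going from 10,000 to 100,000 input rows multiplies the index array by roughly 100, which would explain why a moderately sized training set exhausts memory before any data is generated.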
from gan-for-tabular-data.
I got results with positive integers, and I was also able to generate 100k samples. Thank you!
What is your hardware?
I am running it on Linux in a virtual machine. I also tried a Jupyter notebook launched from the Anaconda Prompt on Windows (CPU only).
@ParagPatil3 I have fixed the problem with negative values. It was caused by the heavy filter being skipped.
Secondly, I added some gc.collect() calls, which reduced memory usage. Please take a look.
If the memory problem still persists, please provide the full log.
Thanks for the full stack trace. There was a bug in sampling the original data; now it works well even with millions of rows as input.
By the way, you may pass deep_copy=False so the input DataFrames are not copied; it will cut memory usage roughly in half.
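The memory cost of copying the inputs can be illustrated with plain pandas. This is a minimal sketch under an assumption the thread does not confirm: that tabgan's `deep_copy` flag amounts to calling `DataFrame.copy(deep=True)` on the inputs. `prepare_inputs` is a hypothetical helper, not part of tabgan. A deep copy owns a second buffer as large as the original, so it roughly doubles the memory held for the inputs; skipping it reuses the caller's DataFrame, which saves that memory but lets in-place changes leak back to the caller.

```python
import numpy as np
import pandas as pd


def prepare_inputs(train_df: pd.DataFrame, deep_copy: bool) -> pd.DataFrame:
    """Hypothetical stand-in for a deep_copy switch: copy the input or reuse it."""
    return train_df.copy(deep=True) if deep_copy else train_df


train = pd.DataFrame({"x": np.arange(1_000_000, dtype=np.int64)})

copied = prepare_inputs(train, deep_copy=True)
shared = prepare_inputs(train, deep_copy=False)

# The deep copy holds its own 8 MB buffer, so two frames' worth of data now exist.
print(np.shares_memory(copied["x"].to_numpy(), train["x"].to_numpy()))  # False
print(copied.memory_usage(index=False).sum())  # 8000000

# Without copying, the "prepared" frame is the caller's own object: no extra
# memory, but any in-place change made later is visible to the caller too.
print(shared is train)  # True
```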
@ParagPatil3 Please reopen this as a new issue with the new details if the error still persists.
Related Issues (20)
- is it ok for regression type task?
- generated Cov is not that close
- all sample codes not working till epoch end
- second args in generate_data_pipe cannot be left None
- TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'
- training CTGAN stops in the middle (around 24%)
- Difference between OriginalGenerator and GANGenerator
- Getting this error when trying to install load
- check
- ContextualVersionConflict: (scikit-learn 1.0.2 (/usr/local/lib/python3.7/dist-packages), Requirement.parse('scikit-learn==0.23.2'), {'tabgan'})
- Dear Author, May I know the ctgan version for the installation? I am getting error. from ctgan import _CTGANSynthesizer ImportError: cannot import name '_CTGANSynthesizer'
- pip install scikit-learn version issue
- Mistake in Readme
- Some issues araised when running Tab-GAN: 1) Manage Categorical Variables. 2) Batch size problem
- Reproducibility issue
- ValueError: Input X contains NaN although NaN filtered
- IntCastingNaNError Despite No NaN values
- LGBMClassifier.fit() got an unexpected keyword argument 'early_stopping_rounds'
- Dependency issue with ForestDiffusion Generator
- TypeError w/ Boolean Data