Comments (8)
Hi @Satishkumar44, you can preserve certain columns and use them as seeds for generating the rest of each record, so you get exactly 1000 records with the same PKs. Here are some modifications you can make:
Just after you create the config_template and before you create the SyntheticDataBundle, you can extract a list of dictionaries that represent the "seed" values for each record.
I also replicated the training data 5 times here, which is optional, but in this case it helped generate results faster:
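import json
# One {"REF_ID": value} dict per original record; extracted before replicating so each key appears once
seed_data = json.loads(training_df[["REF_ID"]].to_json(orient="records"))
# Optional: replicate the training data 5 times and shuffle the rows
training_df = pd.concat([training_df] * 5).sample(frac=1)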
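To make the shape of seed_data concrete, here is a small sketch on a made-up three-row frame. The REF_ID values and the AMOUNT column are invented for illustration and are not taken from the attached data; only the seed extraction expression is the same as above.

```python
import json

import pandas as pd

# Hypothetical stand-in for training_df; the values are invented for illustration.
training_df = pd.DataFrame(
    {"REF_ID": [101, 102, 103], "AMOUNT": [9.5, 3.0, 7.25]}
)

# Same expression as in the comment above: one dict per row,
# containing only the column that should be preserved as a seed.
seed_data = json.loads(training_df[["REF_ID"]].to_json(orient="records"))

print(seed_data)  # [{'REF_ID': 101}, {'REF_ID': 102}, {'REF_ID': 103}]
```

Each dictionary becomes the fixed start of one generated record, which is why the list should have one entry per record you want back.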
When you create the SyntheticDataBundle, you specify which columns you want to preserve as seed fields using the header_prefix=["REF_ID"] param.
When you go to generate data, you can feed the list of seed fields in. The model will generate one record for each seed in the list:
bundle.generate(num_lines=nrows, max_invalid=nrows, seed_fields=seed_data, num_proc=1)
Please note that num_proc=1 must be set here, as only one CPU can currently be used for seed-based generation.
I tested this and attached my results.
synthetic.csv.zip
from gretel-synthetics.
While running bundle.generate() I am facing the error below. Can you help me out?
Please share more detail, ideally the full notebook. The error means you have too many seed fields specified; based on what I provided above, you should only have one.
I have shared my notebook; the data is the same as shared before.
Synthetic Data Generation.zip
It looks like a slightly older version of gretel-synthetics is running. Can you re-install it so that you are at version 0.15.2?
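One way to confirm which version is actually installed in the notebook's environment is sketched below. It uses only the Python standard library; the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed gretel-synthetics version, or None if it is not installed.
print(installed_version("gretel-synthetics"))
```

If this prints something other than 0.15.2, re-installing with pip pinned to that version and restarting the notebook kernel should bring the environment in line.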
Yes, after upgrading to 0.15.2 it works fine. But my issue is that when I try to generate 2000 rows from the same data without preserving the primary key, it generates duplicates in the primary key column (REF_ID). Is there any way to ensure the primary key column will not contain duplicates?
There's currently no way to avoid duplicates unless you specifically provide a list of seeds, as demonstrated above. Otherwise, you can drop the primary key column, generate records, and add the primary key values back in.
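The drop-and-re-add approach could look roughly like the sketch below. It covers only the pandas side: the frames, the AMOUNT column, the helper name reattach_primary_key, and the starting key 1001 are all invented for illustration, and the generated records are stubbed with a hand-written frame where the model output would normally go.

```python
import pandas as pd


def reattach_primary_key(synthetic_df, column="REF_ID", start=1):
    """Return a copy of synthetic_df with a fresh, duplicate-free key as the first column."""
    out = synthetic_df.reset_index(drop=True).copy()
    out.insert(0, column, list(range(start, start + len(out))))
    return out


# Hypothetical training data; the key column is dropped before training.
training_df = pd.DataFrame({"REF_ID": [101, 102], "AMOUNT": [9.5, 3.0]})
features_only = training_df.drop(columns=["REF_ID"])

# Stand-in for the generated records; in practice these come from the trained model.
synthetic_df = pd.DataFrame({"AMOUNT": [8.1, 2.9, 9.9, 3.3]})

# Assign new sequential keys, so duplicates cannot occur by construction.
result = reattach_primary_key(synthetic_df, start=1001)
print(result)
```

Sequential integers are just the simplest choice here; any scheme that produces unique values (for example UUIDs) would serve the same purpose.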
Thank you for your support.