Comments (8)

johntmyers commented on May 30, 2024

Hi @Satishkumar44, you can preserve certain columns and use them as seeds for generating the rest of each record, so you get exactly 1000 records with the same PKs. Here are some modifications you can make:

Just after you create the config_template and before you create the SyntheticDataBundle, you can extract a list of dictionaries that represent the "seed" values for each record.

I also replicated the training data 5 times here, which is optional but in this case helped generate results faster:

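import json

# One {"REF_ID": value} dict per original training row; extract before replicating.
seed_data = json.loads(training_df[["REF_ID"]].to_json(orient="records"))
# Optional: replicate the training data 5 times and shuffle the rows.
training_df = pd.concat([training_df] * 5).sample(frac=1)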
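
To make the shape of seed_data concrete, here is a small sketch on a toy DataFrame. The REF_ID values and the AMOUNT column are invented for illustration; only the seed extraction and replication statements come from the comment above.

```python
import json

import pandas as pd

# Toy stand-in for the real training data (all values are invented).
training_df = pd.DataFrame(
    {"REF_ID": [101, 102, 103], "AMOUNT": [9.5, 12.0, 7.25]}
)

# Extract the seeds BEFORE replicating, so there is exactly one seed per original row.
seed_data = json.loads(training_df[["REF_ID"]].to_json(orient="records"))
print(seed_data)  # [{'REF_ID': 101}, {'REF_ID': 102}, {'REF_ID': 103}]

# Optional replication: 5 copies of every row, shuffled.
training_df = pd.concat([training_df] * 5).sample(frac=1)
print(len(training_df), len(seed_data))  # 15 3
```

Because the seeds are taken before the replication step, the seed list keeps one entry per original record even though the training frame grows fivefold.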

When you create the SyntheticDataBundle, specify which columns you want to preserve as seed fields using the header_prefix=["REF_ID"] param.

When you go to generate data, you can feed the list of seed fields in. The model will generate one record for each seed in the list:

bundle.generate(num_lines=nrows, max_invalid=nrows, seed_fields=seed_data, num_proc=1)

Please note that num_proc=1 must be set here, as only one CPU can currently be used for seed-based generation.

I tested this and attached my results.
synthetic.csv.zip

from gretel-synthetics.

Satishkumar44 commented on May 30, 2024

While running bundle.generate() I am facing the error below. Can you help me out?
[image: generating]

johntmyers commented on May 30, 2024

Please share more detail, ideally the full notebook. The error means you have too many seed fields specified; based on what I provided above, you should only have one.

Satishkumar44 commented on May 30, 2024

I have shared my notebook. The data is the same as I shared before.
Synthetic Data Generation.zip

johntmyers commented on May 30, 2024

It looks like a slightly older version of gretel-synthetics is running. Can you reinstall it so that you are on version 0.15.2?
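
One way to confirm which version is actually running in the notebook is to query the installed package metadata. This is a minimal sketch using only the Python standard library (Python 3.8 or later); the helper name installed_version is hypothetical and not part of gretel-synthetics.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("gretel-synthetics"))
```

If the printed version is not 0.15.2, running pip install gretel-synthetics==0.15.2 and restarting the notebook kernel should pick up the requested release.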

Satishkumar44 commented on May 30, 2024

Yes, after upgrading to 0.15.2 it works fine. But my issue is that when I try to generate 2000 rows from the same data without preserving the primary key, the output contains duplicates in the primary key column (REF_ID). Is there any way to ensure the primary key column does not contain duplicates?

johntmyers commented on May 30, 2024

There's currently no way to avoid duplicates unless you specifically provide a list of seeds, as demonstrated above. Otherwise, you can drop the primary key column, generate records, and add the primary key values back in.
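
The drop-and-reattach approach can be sketched with pandas alone. The DataFrames below are invented stand-ins for generated records, the helper names are hypothetical, and using sequential integers as the new keys is an assumption; the comment above does not say how the replacement keys should be produced.

```python
import pandas as pd


def find_duplicate_keys(df, key_column):
    """Return the sorted key values that occur more than once."""
    dupes = df.loc[df[key_column].duplicated(keep=False), key_column]
    return sorted(dupes.unique().tolist())


def attach_primary_keys(df, key_column, start=1):
    """Return a copy of df with a fresh, unique integer key as the first column."""
    out = df.reset_index(drop=True).copy()
    out.insert(0, key_column, list(range(start, start + len(out))))
    return out


# Stand-in for records generated WITH the key column: note the repeated REF_ID.
generated_with_key = pd.DataFrame(
    {"REF_ID": [101, 102, 101, 103], "AMOUNT": [9.5, 12.0, 8.0, 7.25]}
)
print(find_duplicate_keys(generated_with_key, "REF_ID"))  # [101]

# Stand-in for records generated after REF_ID was dropped from the training data.
generated_without_key = generated_with_key.drop(columns=["REF_ID"])
synthetic_df = attach_primary_keys(generated_without_key, "REF_ID", start=1001)
print(synthetic_df["REF_ID"].is_unique)  # True
```

In a real run, REF_ID would be dropped from the training data before training, so the model never learns or emits the key, and the keys are assigned only after generation.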

Satishkumar44 commented on May 30, 2024

Thank you for your support.
