Comments (4)
How will we pick the default max_line_len? If I use a financial dataset as the reference for average field length, 2048 equates to 300 fields; if I use healthcare data, 2048 equates to 146 fields.
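To make the arithmetic behind those two figures explicit, here is a rough sketch. It assumes both figures refer to a 2048-character limit; the helper names `implied_chars_per_field` and `max_fields` are hypothetical and not part of gretel-synthetics.

```python
def implied_chars_per_field(max_line_len: int, n_fields: int) -> float:
    """Average characters available per field (delimiter included) on one line."""
    if n_fields <= 0:
        raise ValueError("n_fields must be positive")
    return max_line_len / n_fields


def max_fields(max_line_len: int, avg_field_len: float, delimiter_len: int = 1) -> int:
    """Rough number of fields that fit on one line.

    n fields need n * avg_field_len + (n - 1) * delimiter_len characters,
    so n <= (max_line_len + delimiter_len) / (avg_field_len + delimiter_len).
    """
    if avg_field_len <= 0:
        raise ValueError("avg_field_len must be positive")
    return int((max_line_len + delimiter_len) // (avg_field_len + delimiter_len))


if __name__ == "__main__":
    # Field counts quoted in the comment above (financial vs. healthcare).
    for name, n_fields in [("financial", 300), ("healthcare", 146)]:
        print(name, round(implied_chars_per_field(2048, n_fields), 2))
```

The same limit allows about half as many fields for the healthcare data, so a single default will fit some datasets far more tightly than others.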
I like the default epochs, and the recommended range seems dead on. Do we alter the epochs if either model peaks before their chosen epoch or looks like it's still improving when they end?
The difference between seq_length and max_line_len is confusing. Maybe add more explanation.
Should the field_delimiter default be None? If it's structured data and they don't specify one, do we try to auto-deduce it?
In vocab_size, this is really the max vocab size. The tokenizer may choose a much smaller number.
On DP, should we say the model eps and delta will be displayed once training completes, or let them figure that out?
Maybe start gen_temp with "This parameter is used to control the randomness of predictions by scaling the logits before applying softmax." then continue with the rest of your explanation.
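As an illustration of the suggested wording for gen_temp, here is a minimal sketch of temperature scaling. It assumes the temperature divides the logits before softmax, which is the usual convention; the function `softmax_with_temperature` is hypothetical and not the library's code.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the most likely token (more predictable output), while higher temperatures flatten the distribution (more random output).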
from gretel-synthetics.
The initial check-in for the docstrings is here:
gretel-synthetics/src/gretel_synthetics/config.py
Lines 31 to 100 in 5992209
Added Google-style docstrings to all config params, linked below.
@amysteier can you review?
gretel-synthetics/src/gretel_synthetics/config.py
Lines 31 to 100 in 838b3d8
@amysteier re: default epochs, we can add a Keras callback that monitors training loss or accuracy and stops training once it stops improving. Let's add this to the roadmap.
Re: field delimiter, I'd recommend keeping it as an optional parameter. It isn't strictly necessary to specify; it just improves synthetic data performance when you set it.
Re: vocab_size, gen_temp, and seq_length - good call-outs, will clarify in doc strings.
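The early-stopping idea for epochs could be sketched roughly as follows. This is a framework-free illustration of patience-based stopping, not the project's implementation; `EarlyStopper` and `epochs_run` are hypothetical names. Keras provides the same behaviour through `tf.keras.callbacks.EarlyStopping` (with `monitor`, `patience`, and `min_delta` arguments).

```python
from typing import Iterable


class EarlyStopper:
    """Signal a stop once the loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale_epochs = 0

    def update(self, loss: float) -> bool:
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def epochs_run(losses: Iterable[float], patience: int = 3, min_delta: float = 0.0) -> int:
    """Return how many epochs would run before early stopping triggers."""
    stopper = EarlyStopper(patience, min_delta)
    count = 0
    for loss in losses:
        count += 1
        if stopper.update(loss):
            break
    return count


if __name__ == "__main__":
    # Loss plateaus after epoch 3, so training stops three epochs later.
    print(epochs_run([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]))
```

This covers the case where a model peaks before the chosen epoch count. A model that is still improving when the epochs run out would need a higher epoch ceiling, which this sketch does not address.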