Comments (3)
Hi @sanketahegde, thanks for trying out DGAN and asking questions! In general, DGAN is quite good for biosignals when sufficient training data is available, but I know ECG data has very specific properties that need to be preserved. To get the most out of DGAN, I'd recommend thinking about the following items:
- What is an example? That is, how long are the sequences that DGAN independently generates (the max_sequence_len parameter)? Generally speaking, shorter sequences are easier. I'd try sequences that only contain 2-3 heartbeats (maybe even just 1) to start, and then expand to longer sequences once you have the shorter sequences working.
- How much data do you have? DGAN really excels when there are lots of training examples: 10k or more, and maybe even 100k+ to learn the intricacies of ECGs. So if you use 2 seconds of ECG samples as the example length, that's 100k 2-second snippets. With time series, you can use sliding windows to increase the number of training examples if you're splitting up longer sequences. But that may also make the learning task a bit harder if the training sequences start at very different points in the ECG period. Definitely experiment with different ways to construct the training data.
- Hyperparameters are absolutely key. It's great you've explored some hyperparameter tuning. I've found the most impactful parameters to explore are learning rates and epochs. Beyond finding the right order of magnitude, keep in mind that DGAN can be fairly sensitive to even 30% changes in these values, so a thorough exploration with grid search, or with a library like Optuna, can be really powerful. And of course having a good metric to optimize for is critical. There's not really a loss that can be used for early stopping with GANs, so using metrics related to ECGs would be best.
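The sliding-window idea from the second point can be sketched roughly as follows. This is only an illustration, not code from gretel-synthetics: the helper name make_windows, the 250 Hz sampling rate, and the sine wave standing in for an ECG recording are all assumptions. It also assumes the training array should be shaped (examples, time steps, features), so check that against the DGAN documentation for your version before passing it to train_numpy.

```python
import numpy as np


def make_windows(signal, window_len, stride):
    """Cut a 1-D signal into fixed-length windows taken every `stride` samples.

    Hypothetical helper for illustration; returns an array shaped
    (n_examples, window_len, 1), i.e. one feature per time step.
    """
    signal = np.asarray(signal, dtype=np.float32)
    if signal.ndim != 1:
        raise ValueError("expected a 1-D signal")
    if len(signal) < window_len:
        raise ValueError("signal is shorter than one window")
    starts = range(0, len(signal) - window_len + 1, stride)
    windows = np.stack([signal[s:s + window_len] for s in starts])
    # Add a trailing feature axis so each time step carries one value.
    return windows[:, :, np.newaxis]


# Toy stand-in for a recording: 10 seconds at an assumed 250 Hz.
fs = 250
recording = np.sin(2 * np.pi * 1.2 * np.arange(10 * fs) / fs)

# Stride equal to the window length: no overlap, fewer examples.
non_overlapping = make_windows(recording, window_len=2 * fs, stride=2 * fs)
# Smaller stride: more examples, but they start at different phases.
overlapping = make_windows(recording, window_len=2 * fs, stride=fs // 2)

print(non_overlapping.shape, overlapping.shape)
```

A smaller stride yields more examples from the same recording, at the cost mentioned above: the windows begin at different points in the heartbeat cycle. Aligning window starts to detected beats is one alternative worth comparing.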
Hope that provides some experiment ideas. And if you're willing to share a notebook or code snippet of how you're setting up the training data and the model, I'm happy to take a look to see if there are any more specific recommendations.
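As a starting point for the learning-rate and epoch exploration mentioned in the list above, here is a minimal grid-search sketch in plain Python. Everything in it is an assumption for illustration: the helper names, the center value of 1e-4, the epoch options, and especially placeholder_score, which stands in for "train DGAN with these settings, generate samples, and compute an ECG-specific metric". The grid is spaced in multiplicative steps of 1.3 to reflect the roughly 30% sensitivity described above.

```python
import itertools


def geometric_grid(center, factor, steps):
    """Return center * factor**k for k in -steps..steps (multiplicative spacing)."""
    return [center * factor ** k for k in range(-steps, steps + 1)]


def grid_search(score_fn, learning_rates, epoch_options):
    """Try every combination and return (best_score, best_params).

    Lower scores are treated as better.
    """
    best = None
    for lr, epochs in itertools.product(learning_rates, epoch_options):
        score = score_fn(lr, epochs)
        if best is None or score < best[0]:
            best = (score, {"learning_rate": lr, "epochs": epochs})
    return best


def placeholder_score(lr, epochs):
    # Hypothetical stand-in so the sketch runs on its own. In a real
    # experiment this would train a model with these values, generate
    # synthetic sequences, and return a domain metric (lower is better).
    return abs(lr - 1e-4) / 1e-4 + abs(epochs - 400) / 400


learning_rates = geometric_grid(center=1e-4, factor=1.3, steps=2)
epoch_options = [200, 400, 800]

best_score, best_params = grid_search(placeholder_score, learning_rates, epoch_options)
print(best_params)
```

The same score function could be handed to a library such as Optuna instead of an exhaustive grid once the number of parameters grows.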
from gretel-synthetics.
Hi @kboyd ,
Thank you very much for your detailed reply with suggestions.
As my work with DGAN is on hold, I shall try to apply your suggestions and update here if I get some better results.
This is an interesting discussion.
I have been running some basic experiments on my time series (TS) data using DGAN. My main goal is to create synthetic time series while keeping, as well as possible, the fidelity and flexibility properties of my data (i.e., as stated by the original authors of the method in their paper). However, there's no free lunch, and for my particular case, having ~2k to 2.5k data samples with max_sequence_len = 24 is the best I can do, due to the hourly resolution of my data. Hence, following the recommendations from @kboyd, I mostly rely on the third point (hyperparameter tuning) to enhance, as much as possible, the fidelity and flexibility of the synthetic samples.
Finally, does the DGAN implementation allow using a seed S to generate N samples, with a different seed each time? That is, assuming I have 2k new 24-hour synthetic TS samples, I would like to use a new seed S to generate a new set of 2k synthetic samples, and so on. I assume a new run of DGAN would approximate this behavior, right?
Comments/feedback on these questions would be appreciated.
Thanks!