gsbdbi / ds-wgan


Design of Simulations using WGAN

License: MIT License

Python 100.00%


ds-wgan's Issues

Categorical context variables

Is there an easy way to use/implement categorical context variables?
Is my understanding correct that all context_vars will implicitly be treated as continuous, such that I should (?) turn them into dummy variables manually beforehand if they are categorical (and take on more than 2 values)? Or is there a reason to prefer treating categorical context variables as continuous?
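
For readers with the same question, the manual workaround described above could look like the following minimal sketch. It uses only pandas and does not confirm how ds-wgan handles context variables internally. The column names and the `context_vars` list are illustrative assumptions.

```python
import pandas as pd

# Toy data; the column names are hypothetical.
df = pd.DataFrame({
    "race": [1, 2, 3, 2, 1],
    "age": [23.0, 41.0, 35.0, 52.0, 29.0],
})

# Expand the categorical column into one 0/1 indicator column per category.
dummies = pd.get_dummies(df["race"], prefix="race", dtype=float)
df_encoded = pd.concat([df.drop(columns="race"), dummies], axis=1)

# The indicator columns could then be listed as (continuous) context variables.
context_vars = ["age"] + list(dummies.columns)
print(context_vars)
```

Each category gets its own column, so no artificial ordering between categories is imposed on the model.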

Issues with the GAN simulation output

The GAN output for the welfare data has shifted category codes for almost all categorical variables. For example, race is 1, 2, or 3 in the original data, but it is 0, 1, or 2 in the generated data:

race_welfare	race_gan
1	0
2	1
3	2

Similarly, the numbering is off for categorical variables with many categories (below is just the first 20 categories for the indus80 variable):

indus_80_welfare	indus_80_gan
10	0
11	1
20	2
21	3
30	4
31	5
40	6
41	7
42	8
50	9
60	10
100	11
101	12
102	13
110	14
111	15
112	16
120	17
121	18
122	19
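
The tables above are consistent with the generated codes being positions in the sorted list of original labels. Under that assumption (not confirmed by the package), the original labels can be restored after generation. The sketch below uses toy data, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real column and the generated integer codes.
real = pd.Series([10, 11, 20, 21, 30, 11, 10], name="indus80")
generated_codes = pd.Series([0, 2, 4, 1, 3], name="indus80")

# Assumption: code k stands for the k-th smallest label in the real data.
labels = np.sort(real.unique())

# Look up each generated code in the sorted label array.
restored = pd.Series(labels[generated_codes.to_numpy()], name="indus80")
print(restored.tolist())
```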

Package installation error tutorial

When running the second line of the third block in the Google Colab notebook for the tutorial

!pip3 install git+https://github.com/gsbDBI/ds-wgan.git@package#egg=wgan

installation fails and gives the error

"Did not find a branch or tag 'package', assuming 'revision' or 'ref'".

Using only categorical variables

Currently a continuous variable is required for wgan training. Would it be possible to add a fix so that we can train with just categorical variables, i.e. leaving the list of continuous variables empty?

Feature request: generate data (after training) without access to real data (only the relevant summary statistics)

It would be really nice if it was possible to use the generator to generate data without access to the real data. In particular, to get the scaling/centering and variable names right in the deprocess function, it would be nice if it was possible to have a function of the package to save (and load) those aspects of the data wrapper that are truly needed (variable names and types, means/standard deviations/values for categorical variables?).
It seems like that should in principle be possible such that the user does not need to continue having access to/loading the real data when they want to generate artificial data?

(Simulating one very large data set once after training while the real data is still loaded isn't always a great option, in particular when considering very large samples)
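
To illustrate the request, here is a minimal sketch of what saving and reusing those summary statistics could look like. It is not part of the package: `extract_stats` and `deprocess` are hypothetical helpers, and the sketch assumes continuous variables are standardized and categorical variables are coded as positions in the sorted label list.

```python
import json
import pandas as pd

def extract_stats(df, continuous_vars, categorical_vars):
    """Collect only the statistics needed to undo the preprocessing."""
    return {
        "continuous": {
            c: {"mean": float(df[c].mean()), "std": float(df[c].std())}
            for c in continuous_vars
        },
        "categorical": {
            c: sorted(df[c].unique().tolist()) for c in categorical_vars
        },
    }

def deprocess(raw, stats):
    """Map standardized values and integer codes back to the original scale."""
    out = pd.DataFrame(index=raw.index)
    for c, s in stats["continuous"].items():
        out[c] = raw[c] * s["std"] + s["mean"]
    for c, labels in stats["categorical"].items():
        out[c] = [labels[k] for k in raw[c]]
    return out

# While the real data is available: compute and serialize the statistics.
real = pd.DataFrame({"income": [1.0, 2.0, 3.0], "race": [1, 2, 3]})
saved = json.dumps(extract_stats(real, ["income"], ["race"]))

# Later, without the real data: reload the statistics and deprocess raw output.
stats = json.loads(saved)
raw_generated = pd.DataFrame({"income": [0.0, 1.0], "race": [2, 0]})
generated = deprocess(raw_generated, stats)
print(generated)
```

The JSON string contains no individual records, only variable names, means, standard deviations, and category labels.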

Include versions in requirements.txt

The current requirements.txt file does not specify versions.

However, this line does not run with torch <= 1.0.0.

The reason is that the output of max is a tuple in versions 1.0.0 and older, but it is a namedtuple in version 1.1.0 and up. The line linked above relies on the output being a namedtuple.
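
The difference can be illustrated without torch. The sketch below uses a plain tuple and a `collections.namedtuple` as stand-ins for the two return types described above. `MaxResult` and `max_values` are hypothetical names, not torch API. Positional unpacking works for both, while attribute access only works for the namedtuple.

```python
from collections import namedtuple

# Stand-in for the newer return type (hypothetical name, not a torch class).
MaxResult = namedtuple("MaxResult", ["values", "indices"])

old_style = ([0.9, 0.7], [2, 0])           # plain tuple
new_style = MaxResult([0.9, 0.7], [2, 0])  # namedtuple

def max_values(result):
    # Positional unpacking does not depend on field names.
    values, _indices = result
    return values

print(max_values(old_style), max_values(new_style))
print(hasattr(old_style, "values"), hasattr(new_style, "values"))
```

Code written with positional unpacking would run on either side of the version boundary. Code written with attribute access would need a minimum version stated in requirements.txt.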

Feature request: integer data

It would be great if there was a way to simulate integer / ordered categorical data, say age in years. Treating it as a categorical variable seems to yield data sets where other variables are less smooth in age than desired, and it probably also increases the complexity of the training task (by turning each value into a dummy?). Treating it as a continuous variable requires rounding ex post, but ideally the rounding would happen even in training?
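
The ex post rounding mentioned above can be sketched as follows. This is only the post-processing workaround, not rounding during training, and the arrays are toy values.

```python
import numpy as np

# Toy stand-ins: observed integer ages and continuous generated values.
real_age = np.array([18, 25, 40, 67, 33])
generated_age = np.array([17.2, 24.6, 39.4, 70.3, 33.4])

# Round to the nearest integer, then clip to the range seen in the real data.
rounded = np.rint(generated_age).astype(int)
clipped = np.clip(rounded, real_age.min(), real_age.max())
print(clipped.tolist())
```

Clipping is optional. It keeps generated values inside the observed support, which may or may not be desirable for a given application.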

Y|X, t training failed

Hi,

I am trying to generate several Y's simultaneously (some are binary and some are continuous) based on X, t. It turned out that the training failed because the test error and training loss both blow up to infinity. It works if I generate only continuous Y's.

I include both my data and the jupyter notebook file that I use based on your colab example. Could you kindly have a look? Thanks!
example.zip
