
ds-wgan's People

Contributors

carolinthomas, evanmunro, halflearned, jonas-metzger

ds-wgan's Issues

Categorical context variables

Is there an easy way to use/implement categorical context variables?
Is my understanding correct that all context_vars are implicitly treated as continuous, so that I should (?) manually turn categorical ones that take more than two values into dummy variables beforehand? Or is there a reason to prefer treating categorical context variables as continuous?
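If every context variable is indeed treated as continuous, as the question suspects, one workaround is to one-hot encode categorical context variables with pandas before building the data wrapper. The sketch below assumes a pandas DataFrame input; the column names "region", "age", and "income" are hypothetical, and whether ds-wgan handles the resulting dummies well is an assumption, not something the package documents here.

```python
import pandas as pd

# Toy data; the column names are hypothetical.
df = pd.DataFrame({
    "region": [1, 2, 3, 2],          # categorical context variable with 3 values
    "age": [25.0, 40.0, 33.0, 51.0], # continuous context variable
    "income": [1.2, 3.4, 2.2, 5.0],  # outcome to be generated
})

# Expand the categorical context variable into one 0/1 column per category.
# Non-encoded columns come first, followed by region_1, region_2, region_3.
df_encoded = pd.get_dummies(df, columns=["region"], dtype=float)

# The dummy columns are then listed alongside the continuous context variables.
context_vars = ["age"] + [c for c in df_encoded.columns if c.startswith("region_")]
print(context_vars)  # ['age', 'region_1', 'region_2', 'region_3']
```

With three categories, one-hot encoding avoids imposing an artificial ordering (region 3 "greater than" region 1) that a single continuous column would imply.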

Issues with the GAN simulation output

The GAN output for the welfare data recodes almost all of the categorical variables. For example, race takes the values 1, 2, or 3 in the original data, but 0, 1, or 2 in the generated data:

race_welfare  race_gan
1             0
2             1
3             2

Similarly, the numbering is off for categorical variables with many categories (only the first 20 categories of the indus80 variable are shown below):

indus_80_welfare  indus_80_gan
10                0
11                1
20                2
21                3
30                4
31                5
40                6
41                7
42                8
50                9
60                10
100               11
101               12
102               13
110               14
111               15
112               16
120               17
121               18
122               19
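In both tables, the generated codes look like 0-based positions in the sorted list of distinct original values. Assuming that pattern holds for every categorical variable (an inference from the tables, not confirmed behavior of the package), the original labels can be restored after generation with a lookup. The helper names below are hypothetical.

```python
def build_code_map(original_values):
    """Map 0-based codes back to the original category labels.

    Assumes code i corresponds to the i-th smallest distinct original
    value, which matches the race and indus80 tables above.
    """
    categories = sorted(set(original_values))
    return {code: value for code, value in enumerate(categories)}


def decode(generated_codes, code_map):
    """Replace each generated code with its original label."""
    return [code_map[c] for c in generated_codes]


race_welfare = [1, 3, 2, 2, 1]
race_map = build_code_map(race_welfare)
print(decode([0, 2, 1], race_map))  # [1, 3, 2]

indus_sample = [10, 11, 20, 21, 30, 100]
indus_map = build_code_map(indus_sample)
print(decode([5, 0], indus_map))  # [100, 10]
```

Building the map from the real data requires that data at generation time; the label lists could instead be stored alongside the trained model.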

Package installation error tutorial

When running the second line of the third block in the Google Colab notebook for the tutorial

!pip3 install git+https://github.com/gsbDBI/ds-wgan.git@package#egg=wgan

installation fails and gives the error

"Did not find a branch or tag 'package', assuming 'revision' or 'ref'".

Using only categorical variables

Currently, at least one continuous variable is required for WGAN training. Would it be possible to add a fix so that we can train with only categorical variables, i.e., leaving the list of continuous variables empty?
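Until such a fix exists, one possible workaround (untested with ds-wgan and purely an assumption) is to add a placeholder continuous column of random noise so that the continuous-variable list is not empty, then drop that column from the generated data. The column names and the "_placeholder" name are hypothetical.

```python
import numpy as np
import pandas as pd

# Categorical-only data (hypothetical column names).
df = pd.DataFrame({"race": [1, 2, 3, 1], "married": [0, 1, 1, 0]})

# Hypothetical workaround: a pure-noise continuous column that carries no
# information about the other variables, only to satisfy the requirement
# that at least one continuous variable is present.
rng = np.random.default_rng(0)
df["_placeholder"] = rng.standard_normal(len(df))

continuous_vars = ["_placeholder"]
categorical_vars = ["race", "married"]

# After training and generating as usual, discard the placeholder, e.g.:
# df_generated = df_generated.drop(columns="_placeholder")
```

The generator spends some capacity modeling the noise column, so this is a stopgap rather than a substitute for native support.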

Feature request: generate data (after training) without access to real data (only the relevant summary statistics)

It would be really nice to be able to use the generator to generate data without access to the real data. In particular, to get the scaling/centering and variable names right in the deprocess function, the package could provide a function to save (and load) the parts of the data wrapper that are truly needed (variable names and types, means and standard deviations, values for categorical variables?).
In principle, that should make it possible for the user to generate artificial data without continued access to, or loading of, the real data, shouldn't it?

(Simulating one very large data set once after training while the real data is still loaded isn't always a great option, in particular when considering very large samples)
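A minimal sketch of what such save/load functionality could look like, using only the summary statistics listed above (means, standard deviations, category labels) serialized to JSON. The function names and the metadata layout are hypothetical and are not part of the ds-wgan API.

```python
import json


def save_metadata(path, continuous, categorical):
    """Write the statistics needed to deprocess generated data.

    `continuous` maps name -> (mean, std); `categorical` maps name -> list of
    original labels in code order. Both layouts are hypothetical.
    """
    meta = {
        "continuous": {k: {"mean": m, "std": s} for k, (m, s) in continuous.items()},
        "categorical": categorical,
    }
    with open(path, "w") as f:
        json.dump(meta, f)


def load_metadata(path):
    with open(path) as f:
        return json.load(f)


def deprocess_row(row, meta):
    """Undo standardization and map category codes back to labels."""
    out = {}
    for name, stats in meta["continuous"].items():
        out[name] = row[name] * stats["std"] + stats["mean"]
    for name, labels in meta["categorical"].items():
        out[name] = labels[row[name]]
    return out


# Save once while the real data is available...
save_metadata("wgan_meta.json", {"income": (50.0, 10.0)}, {"race": [1, 2, 3]})
# ...and later deprocess generator output without the real data.
meta = load_metadata("wgan_meta.json")
print(deprocess_row({"income": 0.5, "race": 2}, meta))  # {'income': 55.0, 'race': 3}
```

JSON keeps the file human-readable and avoids pickling the full data wrapper, which may hold the real data itself.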

Include versions in requirements.txt

The current requirements.txt file does not specify versions.

However, this line does not run with torch <= 1.0.0.

The reason is that the output of max is a tuple in versions 1.0.0 and older, but it is a namedtuple in version 1.1.0 and up. The line linked above relies on the output being a namedtuple.
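The incompatibility can be illustrated without torch by simulating the two return types: a plain tuple (torch <= 1.0.0, per the issue) and a namedtuple with `values` and `indices` fields (torch >= 1.1.0). Positional unpacking works with both, whereas attribute access fails on a plain tuple. The objects below are stand-ins, not real torch return values.

```python
from collections import namedtuple

# Simulated stand-ins for the result of torch.max(x, dim=...).
MaxResult = namedtuple("MaxResult", ["values", "indices"])
old_style = ((3.0, 5.0), (2, 0))                             # plain tuple
new_style = MaxResult(values=(3.0, 5.0), indices=(2, 0))     # namedtuple


def argmax_indices(result):
    # Unpacking by position works for a tuple and a namedtuple alike;
    # result.indices would raise AttributeError on a plain tuple.
    _, indices = result
    return indices


print(argmax_indices(old_style), argmax_indices(new_style))  # (2, 0) (2, 0)
```

Rewriting the affected line to unpack positionally would keep older torch versions working; alternatively, pinning `torch>=1.1.0` in requirements.txt would make the version requirement explicit.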

Feature request: integer data

It would be great if there were a way to simulate integer or ordered categorical data, such as age in years. Treating such a variable as categorical seems to yield data sets in which other variables vary less smoothly with age than desired, and it probably also increases the complexity of the training task (by turning each value into a dummy?). Treating it as continuous requires rounding ex post, but ideally the rounding would also happen during training?
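The ex-post workaround mentioned in the request can be sketched as rounding the continuous generator output and clipping it to a plausible range. The bounds below (18 to 99) are hypothetical; this does not make training itself aware of the integer constraint.

```python
def round_to_support(values, low, high):
    """Round continuous generator output to integers and clip to [low, high].

    Note that Python's round() rounds exact halves to the nearest even integer.
    """
    return [min(max(int(round(v)), low), high) for v in values]


generated_age = [17.6, 24.49, 65.2, 101.7]
print(round_to_support(generated_age, low=18, high=99))  # [18, 24, 65, 99]
```

Rounding during training would require changes inside the generator; one common technique is a straight-through estimator, which rounds in the forward pass and passes gradients through unchanged in the backward pass.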

Y|X, t training failed

Hi,

I am trying to generate several Y's simultaneously (some binary, some continuous) based on X and t. It turned out that training fails: the test error and the training loss both blow up to infinity. It works if I generate only continuous Y's.

I include both my data and the jupyter notebook file that I use based on your colab example. Could you kindly have a look? Thanks!
example.zip
