
Reproducing separation (tybalt issue, closed)

greenelab commented on June 12, 2024
Reproducing separation

from tybalt.

Comments (3)

gwaybio commented on June 12, 2024

Hi @zalky

Thanks for your interest in the paper and code! The code to reproduce figure 3B is in scripts/viz/feature_activation_plots.R and is discussed in #65. Essentially, I believe that the size of the points given above obscures separation.

At a more fundamental level, the sample activation scores (for example, those in encoded_rnaseq_onehidden_warmup_batchnorm.tsv) are inherently unstable: the results depend on the random initialization before training. We are currently working on this issue. For now, it would be incorrect to train a different Tybalt model and assume that the numerical designation of the latent encodings is consistent.

Thanks!
Greg


zalky commented on June 12, 2024

Hi Greg, thanks for the response!

I agree that given the non-deterministic training model you would see some variation in results, but I was a bit surprised by the degree of difference. However, I think I have a lead on what is going on:

I was loading the encoded data in encoded_rnaseq_onehidden_warmup_batchnorm.tsv with:

encoded_df = pd.read_table(encoded_file, index_col=0)

This is how the file is generally loaded throughout the Python code. Unfortunately, I could not get this to reproduce the figures. However, if I load the encoded data without specifying the index column:

encoded_rnaseq_df = pd.read_table(encoded_file)

Then I can successfully reproduce Fig. 3B exactly (sans clinic colours) for encodings 53 and 66:

[Screenshot, 2018-04-04: reproduced Fig. 3B plot]

But omitting index_col=0 loads the sample labels as the first column, in the position of the first encoding. When plotting one encoding against another, this means the encoding numbers will be off by one.

Going back, specifying index_col=0, and re-plotting encoding 52 vs 65 reproduces Fig. 3B. Re-training the model from scratch also produces figures much more in line with the paper, as long as you take into account the off-by-one encoding numbers.
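For anyone hitting the same thing, here is a minimal sketch of the shift. It assumes the columns are being picked by position; the tiny in-memory table, the sample_id header, and the three feature names are hypothetical stand-ins for the real encoded file, not its actual layout. pd.read_csv with sep="\t" is used here as the equivalent of pd.read_table.

```python
import io

import pandas as pd

# Hypothetical stand-in for the encoded TSV: a sample-label column followed
# by latent features named "1", "2", "3" (the real file has many more).
tsv = (
    "sample_id\t1\t2\t3\n"
    "sample_a\t0.1\t0.2\t0.3\n"
    "sample_b\t0.4\t0.5\t0.6\n"
)

# Sample labels become the row index; only encodings remain as columns.
with_index = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)

# Sample labels stay in the table as the first ordinary column.
without_index = pd.read_csv(io.StringIO(tsv), sep="\t")

print(list(with_index.columns))
print(list(without_index.columns))

# The same positional lookup lands on different encodings in the two frames:
# position k without the index corresponds to position k - 1 with it.
print(with_index.iloc[:, 1].name)
print(without_index.iloc[:, 1].name)
```

Selecting columns by label (for example with_index["2"]) rather than by position gives the same encoding in both frames, so it sidesteps the shift.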

I haven't pursued this issue any further than this, but is it possible that somewhere, maybe in the R code, something may be loading the sample labels in the first encoding column, thereby resulting in the encoding labels being off by one?


gwaybio commented on June 12, 2024

Glad this was figured out. I agree that this issue is a potential pitfall in analyses (see #86), and I will bump this up in priority. Thanks again!

