Comments (3)
Hi @zalky
Thanks for your interest in the paper and code! The code to reproduce Figure 3B is in scripts/viz/feature_activation_plots.R and is discussed in #65. Essentially, I believe that the size of the points given above obscures separation.
At a more fundamental level, the sample activation scores (for example, the ones given in encoded_rnaseq_onehidden_warmup_batchnorm.tsv) are inherently unstable. The results depend on random initialization conditions prior to training. We are working on this issue currently. But for now, it would be incorrect to train a different Tybalt model and assume that the numerical designation of the latent encodings is consistent.
Thanks!
Greg
from tybalt.
Hi Greg, thanks for the response!
I agree that given the non-deterministic training model you would see some variation in results, but I was a bit surprised by the degree of difference. However, I think I have a lead on what is going on:
I was loading the encoded data in encoded_rnaseq_onehidden_warmup_batchnorm.tsv with:
encoded_df = pd.read_table(encoded_file, index_col=0)
This is how it is generally done throughout the Python code. Unfortunately, I could not get this to reproduce the figures. However, if you load the encoded data without specifying the index column:
encoded_rnaseq_df = pd.read_table(encoded_file)
Then I can successfully reproduce Fig. 3B exactly (sans clinic colours) for encoding 53 and 66:
But not specifying index_col=0 inserts the sample labels as the first encoding. When plotting one encoding against another, this means the encoding numbers will be off by one.
Going back, specifying index_col=0 and re-plotting encoding 52 vs 65 reproduces Fig. 3B. Re-training the model from scratch also produces figures much more in line with the paper, as long as you take into account the off-by-one encoding numbers.
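To illustrate the off-by-one effect, here is a minimal sketch using a small made-up table rather than the actual encoded file. The column name sample_id, the sample names, and the values are all assumptions for illustration; pd.read_csv with sep="\t" is used in place of pd.read_table, which should behave the same way here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the encoded TSV: sample labels in the first
# column, followed by latent encodings named "1", "2", "3".
tsv = (
    "sample_id\t1\t2\t3\n"
    "sample_a\t0.5\t1.5\t2.5\n"
    "sample_b\t3.5\t4.5\t5.5\n"
)

# Sample labels become the row index; only encodings remain as columns.
with_index = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)

# Sample labels stay in the table as the first column.
without_index = pd.read_csv(io.StringIO(tsv), sep="\t")

# The same positional lookup now lands on different encodings.
print(list(with_index.columns))
print(list(without_index.columns))
print(with_index.columns[1], without_index.columns[1])
```

Selecting columns by position (for example with .iloc, or by column number in R) therefore picks neighbouring encodings depending on how the file was loaded, while selecting by column name does not shift.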
I haven't pursued this issue any further than this, but is it possible that somewhere, maybe in the R code, something may be loading the sample labels in the first encoding column, thereby resulting in the encoding labels being off by one?
from tybalt.
Glad this was figured out. I agree that this issue is a potential pitfall in analyses (see #86), and I will bump this up in priority. Thanks again!
from tybalt.