Comments (37)
Here is my presentation on dpt as a cost function for OT. Once I get results on the TedSim data I'll post them again here.
Using dpt-distance as transportation cost.pdf
from moscot.
@zoepiran, I added it to moscot-lineage_reproducibility/notebooks/analysis_notebooks/tedsim/ (feel free to delete it). Once I get access to https://github.com/theislab/moscot_framework_analysis I'll add it there.
from moscot.
Thank you. @michalk8 and I are looking into it
from moscot.
@michalk8. Do we have a flexible way yet to define costs?
Not completely there yet, working on it.
from moscot.
Once we have results from @ManuelGander, would be great to see your thoughts @michalk8 @MUCDK for how to allow flexible cost function definition, and how @ManuelGander can potentially implement the DPT approach in moscot.
from moscot.
Hi @ManuelGander, what's the current state of this? How did it go with the TedSim data?
from moscot.
@ManuelGander, what's the current status of this?
from moscot.
Hi, I've prepared a few slides on where I am right now with the TedSim data. I've tried to make it all as comprehensible as possible. (Ask if you have questions.)
Update on dpt-distance.pdf
from moscot.
Thanks @ManuelGander! Could you @michalk8 @zoepiran please take a look at Manuel's slide 3 - this simulation looks a bit weird to me. Would appreciate your input here.
from moscot.
Also @ManuelGander, how do you compute the mean error at the moment? Do you use code from @zoepiran / @michalk8?
from moscot.
Yes, I calculate the mean error just like Zoe. On slide 6 I also use Zoe's approach, with the only difference being that I use the dpt-distance instead of the L2-distance when calculating the EMD that compares the push of the inspected map to the push of the true map.
from moscot.
Can you share your code? It will be easier to verify.
from moscot.
Yes, I can do so
from moscot.
@ManuelGander, please push your code to our analysis repository (on a separate branch): https://github.com/theislab/moscot_framework_analysis
Please notify @zoepiran here in this issue once you have pushed your code. Let's keep the discussion here so we can all be involved.
from moscot.
Here is my approach to get realistic-looking data:
PDF: TedSim-Overview_only.pdf
PowerPoint: TedSim-Overview_only.pptx
from moscot.
As detailed on Mattermost: after examining the TedSim basics, I don't think realistic-looking data is actually expected.
Though I am still not certain what the proper way is to cut the tree / link cells <-> tree nodes, I am quite convinced that the `umap` will not necessarily depict the node depth, given the TedSim construction.
The basic assumption is that branches follow what they call `depth` in the metadata. As you can see in their Figure 2, and as we also observed locally, once we color the `umap` by `depth` we do get a gradient, but the actual depth in the tree is not correlated with it, so it would be seen as random noise.
from moscot.
I'm actually not using the node depth as time points, but the depth. To obtain a true coupling I use the tree. I have to remove all the cells that do not form an edge going from my start depth to my end depth (under this model, these are cells whose progenitors/descendants I don't know). By changing the parameters max_step and step_size in TedSim I am left with enough cells to get this working. The PDF / PowerPoint illustrates this.
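For readers following along, the filtering step described above can be sketched roughly as follows. This is only an illustration under assumptions: the `ancestor` array (index of the cell each cell descends from, `-1` if unknown) and the helper name `true_coupling` are hypothetical and are not TedSim's or moscot's actual data format or API.

```python
import numpy as np

def true_coupling(depth, ancestor, start, end):
    """Ground-truth coupling between cells at depth `start` and depth `end`.

    depth[i]    : depth label of cell i (stand-in for adata.obs['depth'])
    ancestor[i] : hypothetical index of the cell that cell i descends from,
                  or -1 if the progenitor is unknown
    """
    depth = np.asarray(depth)
    ancestor = np.asarray(ancestor)
    # Late cells with a known progenitor ...
    children = np.flatnonzero((depth == end) & (ancestor >= 0))
    # ... whose progenitor actually sits at the start depth.
    children = children[depth[ancestor[children]] == start]
    if children.size == 0:
        raise ValueError("no edges between the two depths")
    parents = ancestor[children]
    src = np.unique(parents)  # early cells with at least one known descendant
    coupling = np.zeros((src.size, children.size))
    coupling[np.searchsorted(src, parents), np.arange(children.size)] = 1.0
    # Normalise so that the coupling has total mass 1.
    return src, children, coupling / coupling.sum()

# Toy tree: cells 0-2 at depth 0, cells 3-6 at depth 1.
# Cell 2 has no descendant and cell 6 has no known progenitor, so both are dropped.
depth_demo = [0, 0, 0, 1, 1, 1, 1]
ancestor_demo = [-1, -1, -1, 0, 0, 1, -1]
src_idx, tgt_idx, pi_true = true_coupling(depth_demo, ancestor_demo, start=0, end=1)
```

Each kept late cell gets equal mass here; whether that weighting matches the setup used for the slides is not stated in this thread.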
from moscot.
I get your point, and that is what I was referring to in terms of visualization, but I need to think about whether it is valid to use this as a `time_point`. Not certain at the moment.
from moscot.
@michalk8 @Marius1311 one argument against is that the TedSim people also used the tree depth in their LOT benchmarking. If we experimentally take a snapshot at a time point, we will get a mixture of `adata.obs['depth']` values (we will have samples from each branch).
from moscot.
I think you are right, the TedSim-data was designed the way you use it. If I use it differently I cannot really trust any results I obtain.
Once we are sure about the tree structure I'll redo my analysis.
from moscot.
I redid the analysis with the new tree and the results are the same as before: the L2-map slightly outperforming the dpt-map (where the mean error is measured identically as in moscot-Lineage).
L2_vs_dpt_TedSim_Zoes_tree.pdf
from moscot.
Thanks @ManuelGander! Is this result robust across epsilon values? Also, how many neighbors did you use for KNN graph construction, and in which space was the graph constructed?
from moscot.
It was robust to epsilon values (I checked 0.01, 0.005, 0.001); smaller epsilon slightly increases the accuracy for both L2 and dpt. I chose 20 neighbors when calculating the kNN graph for the dpt-distance and didn't do any validation for a different number of neighbors. I'll check whether choosing a different number changes the accuracy. I did everything in 30-dimensional PCA space. I'll also check whether a representation in scVI latent space is beneficial.
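To make the kNN graph and dpt-distance step concrete, here is a minimal NumPy sketch of a DPT-style pairwise distance. It is a simplification under assumptions, not the code used for the slides: it uses a plain Gaussian kernel on a symmetrised kNN graph (no density normalisation), assumes the graph is connected, and the function name `dpt_distance` is made up for this example.

```python
import numpy as np

def _sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    sq = np.sum(A * A, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * (A @ A.T), 0.0)

def dpt_distance(X, n_neighbors=20):
    """DPT-style distances between all rows of X (simplified sketch).

    Assumes the symmetrised kNN graph is connected; otherwise several
    eigenvalues equal 1 and the result is not finite.
    """
    n = X.shape[0]
    d2 = _sq_dists(X)
    # k nearest neighbours of every cell (column 0 of the argsort is the cell itself).
    knn = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), knn.shape[1])
    cols = knn.ravel()
    # Gaussian kernel on kNN edges; bandwidth = median squared kNN distance.
    sigma2 = np.median(d2[rows, cols])
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    W = np.maximum(W, W.T)  # symmetrise the graph
    # Symmetrically normalised transition matrix.
    deg = W.sum(axis=1)
    T = W / np.sqrt(np.outer(deg, deg))
    lam, psi = np.linalg.eigh(T)  # eigenvalues in ascending order
    # Drop the stationary component (largest eigenvalue, equal to 1) and
    # sum over all walk lengths: sum_{t>=1} lam^t = lam / (1 - lam).
    lam, psi = lam[:-1], psi[:, :-1]
    emb = psi * (lam / (1.0 - lam))
    D = np.sqrt(_sq_dists(emb))
    np.fill_diagonal(D, 0.0)
    return D

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))  # toy stand-in for a PCA or scVI embedding
D_dpt = dpt_distance(X_demo, n_neighbors=10)
```

The resulting matrix can be used as a transport cost in place of the L2 cost; `n_neighbors` and the choice of embedding are the two knobs discussed above.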
from moscot.
Hi, I did the analysis for a varying number of nearest neighbors (NN) and for the scVI representation. The results are:
- the number of NN only has an effect if it is too small (NN>25 seems to be necessary)
- the scVI representation outperforms PCA
(Here is the slide with the graphs: PCA vs scVi and nn.pdf)
I next combined L2- and dpt-cost linearly, which improves the accuracy of the resulting transport map!
I prepared a slide with the graphs for this as well: dpt+L2-graphs.pdf
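A rough sketch of what combining the two costs linearly and solving the entropic OT problem can look like. The weight `alpha`, the rescaling of each cost matrix to mean 1, and the plain Sinkhorn loop are assumptions made for this illustration; the thread does not state how the costs were weighted or scaled, and this is not moscot's solver.

```python
import numpy as np

def combine_costs(C_l2, C_dpt, alpha=0.5):
    """Convex combination of two cost matrices, each rescaled to mean 1 first."""
    return (1.0 - alpha) * C_l2 / C_l2.mean() + alpha * C_dpt / C_dpt.mean()

def sinkhorn(C, a, b, epsilon=0.1, n_iter=500):
    """Entropic OT plan between histograms a and b for cost C (plain Sinkhorn)."""
    K = np.exp(-C / epsilon)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2))    # toy early cells
y = rng.normal(size=(10, 2))   # toy late cells
C_l2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
# Placeholder second cost (Manhattan distance); in the setting discussed here
# this would be the dpt-distance between early and late cells.
C_second = np.abs(x[:, None, :] - y[None, :, :]).sum(axis=-1)
C_combined = combine_costs(C_l2, C_second, alpha=0.5)
a = np.full(8, 1 / 8)
b = np.full(10, 1 / 10)
P = sinkhorn(C_combined, a, b, epsilon=0.1)
```

Rescaling both matrices before mixing keeps `alpha` interpretable as a relative weight and keeps a given `epsilon` comparable across cost choices.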
from moscot.
I forgot to mention: the mean error I use as the validation metric is exactly the same one Zoe uses.
from moscot.
Next I'll check whether this linear combination of cost matrices also improves geodesic interpolation on the WOT data set.
from moscot.
If it's okay with @Marius1311, I would be happy to have a joint discussion (maybe next week when I am around :) ) to optimize this analysis, also with respect to the complete story. For example: which results are task specific (e.g. time) or data specific (simulated / real / single cell / spatial ...), and what are the optimal benchmarks (I have my doubts about TedSim ...) to test claims on?
from moscot.
Yes, we can do that. I too find the TedSim data not ideal to test anything on but I don't have any other working validation strategy that I trust. I was thinking that maybe I can use the C. elegans data, and do you know any other simulated scRNA data wrt. cell development?
from moscot.
In the meantime I did the validation on the WOT data by geodesic interpolation. As in all the approaches before, nothing performs significantly better or worse than L2: Dpt+L2-Geodesics.pdf
from moscot.
@ManuelGander, one concrete conclusion from this seems to be that at least for TedSim data, PCA seems to work better compared to scVI? Is that a statement we can conclude?
from moscot.
BTW, I do think the idea with C. elegans is good; however, let's not go for this now but stick to what we agreed upon yesterday. If we do want to go back to this in the future, doing this on C. elegans data is a good idea because it's more realistic than TedSim but has better ground truth than the WOT data.
from moscot.
@ManuelGander, one concrete conclusion from this seems to be that at least for TedSim data, PCA seems to work better compared to scVI? Is that a statement we can conclude?
No, scVI performs better than PCA. (Sorry, I haven't explicitly shown that before.) I base the claim that scVI works better than PCA on this: scVI vs PCA (1).pdf
from moscot.
Okay, sorry, that's what I actually meant, I just wrote it the wrong way round. Thanks for clarifying this!
from moscot.
@ManuelGander @Marius1311 do we want to include this? If yes we should talk about implementation soon.
from moscot.
I think this needs more experimentation and validation. Alex Tong gave us really valuable input on how to validate by geodesics more efficiently using barycenters and I'm eager to compare L2 to his graph-based approach and to using dpt for distance. I'm about 60% done with the Qiu-data set, so I could work on this in a week or two.
from moscot.
alright!
from moscot.
It should be possible to pass/register custom cost functions after #492. Closing this due to inactivity.
from moscot.