We should check whether the definition of the cost function has a large impact on OT p

Support for various cost functions, about theislab/moscot

ManuelGander commented on August 21, 2024 1

Here is my presentation on dpt as a cost function for OT. Once I get results on the TedSim data I'll post them again here.
Using dpt-distance as transportation cost.pdf

from moscot.

ManuelGander commented on August 21, 2024 1

@zoepiran, I added it to moscot-lineage_reproducibility/notebooks/analysis_notebooks/tedsim/ (feel free to delete it). Once I get access to https://github.com/theislab/moscot_framework_analysis I'll add it there.

zoepiran commented on August 21, 2024 1

Thank you. @michalk8 and I are looking into it

michalk8 commented on August 21, 2024

> @michalk8. Do we have a flexible way yet to define costs?

Not completely there yet, working on it.

Marius1311 commented on August 21, 2024

Once we have results from @ManuelGander, would be great to see your thoughts @michalk8 @MUCDK for how to allow flexible cost function definition, and how @ManuelGander can potentially implement the DPT approach in moscot.

Marius1311 commented on August 21, 2024

Hi @ManuelGander, what's the current state of this? How did it go with the TedSim data?

Marius1311 commented on August 21, 2024

@ManuelGander, what's the current status of this?

ManuelGander commented on August 21, 2024

Hi, I've prepared a few slides on where I am right now with the TedSim data. I've tried to make it all as comprehensible as possible (ask if you have questions).
Update on dpt-distance.pdf

Marius1311 commented on August 21, 2024

Thanks @ManuelGander! Could you @michalk8 @zoepiran please take a look at Manuel's slide 3 - this simulation looks a bit weird to me. Would appreciate your input here.

Marius1311 commented on August 21, 2024

Also @ManuelGander, how do you compute the mean error at the moment? Do you use code from @zoepiran / @michalk8 ?

ManuelGander commented on August 21, 2024

Yes, I calculate the mean error just like Zoe. On slide 6 I also use Zoe's approach, with the only difference being that I use dpt-distance instead of L2-distance when calculating the EMD that compares the push of the inspected map to the push of the true map.

zoepiran commented on August 21, 2024

Can you share your code? It will be easier to verify.

ManuelGander commented on August 21, 2024

Yes, I can do so

Marius1311 commented on August 21, 2024

@ManuelGander, please push your code to our analysis repository (on a separate branch): https://github.com/theislab/moscot_framework_analysis

Please notify @zoepiran here in this issue once you have pushed your code. Let's keep the discussion here so we can all be involved.

ManuelGander commented on August 21, 2024

Here is my approach to get realistic-looking data:
PDF: TedSim-Overview_only.pdf
Powerpoint: TedSim-Overview_only.pptx

zoepiran commented on August 21, 2024

As detailed on Mattermost: having examined the TedSim basics, I don't think realistic-looking data is actually expected.
Though I am still not certain what the proper way is to cut the tree / link cells <-> tree nodes, I am quite convinced that the UMAP will not necessarily depict the node depth, given the TedSim construction.
The basic assumption is that branches follow what they call depth in the metadata. As you can see in their Figure 2, and as we also observed locally, once we color the UMAP by depth we indeed get a gradient, but the actual depth in the tree is not correlated with it, so it will be seen as random noise.

ManuelGander commented on August 21, 2024

I'm actually not using the node depth as time points, but the depth. To obtain a true coupling, I use the tree. I have to remove all the cells that do not form an edge going from my start/end depth (under this model, these are cells whose progenitors/descendants I don't know). By changing the parameters max_step and step_size in TedSim, I am left with enough cells to get this working. The PDF / Powerpoint illustrates this.

zoepiran commented on August 21, 2024

I get your point, and that is what I was referring to in terms of visualization, but I need to think about whether it is valid to use this as a time_point. Not certain at the moment.

zoepiran commented on August 21, 2024

@michalk8 @Marius1311
One argument against it is that the TedSim authors also used the tree depth in their LOT benchmarking. If we experimentally take a snapshot at a time point, we will get a mixture of adata.obs['depth'] values (we will have samples from each branch).

ManuelGander commented on August 21, 2024

I think you are right, the TedSim-data was designed the way you use it. If I use it differently I cannot really trust any results I obtain.
Once we are sure about the tree structure I'll redo my analysis.

ManuelGander commented on August 21, 2024

I redid the analysis with the new tree and the results are the same as before: the L2-map slightly outperforms the dpt-map (where the mean error is measured in the same way as in moscot-Lineage).
L2_vs_dpt_TedSim_Zoes_tree.pdf

Marius1311 commented on August 21, 2024

Thanks @ManuelGander! Is this result robust across epsilon values? Also, how many neighbors did you use for KNN graph construction, and in which space was the graph constructed?

ManuelGander commented on August 21, 2024

It was robust to epsilon values (I checked 0.01, 0.005, 0.001), where smaller epsilons slightly increase the accuracy for both L2 and dpt. I chose 20 neighbors when calculating the kNN graph for dpt-distance; I didn't do any validation for a different number of neighbors. I'll check whether choosing a different number changes the accuracy. I did everything in 30-dimensional PCA space. I'll also check whether a representation in scVI latent space is beneficial.
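
For readers following along, the setup described here (a kNN graph with 20 neighbors, built on a 30-dimensional embedding) can be illustrated with a rough sketch. Note that it uses plain shortest-path distances on the kNN graph as a stand-in and does not compute DPT itself. The function name `knn_graph_geodesic_cost` and the random input are hypothetical and are not taken from moscot or from the analysis in this thread.

```python
import numpy as np


def knn_graph_geodesic_cost(X, n_neighbors=20):
    """Shortest-path distances on a symmetrized kNN graph.

    Assumption: this is a simple graph-based stand-in, NOT the DPT distance.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances in the embedding (e.g. PCA coordinates).
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))

    # Keep only edges to the nearest neighbors; all other pairs start at inf.
    graph = np.full((n, n), np.inf)
    nn_idx = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), nn_idx.shape[1])
    graph[rows, nn_idx.ravel()] = d[rows, nn_idx.ravel()]
    graph = np.minimum(graph, graph.T)  # make the graph undirected
    np.fill_diagonal(graph, 0.0)

    # Floyd-Warshall: O(n^3), only suitable for a small illustration.
    for k in range(n):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    return graph


# Hypothetical input: 60 cells in a 30-dimensional embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))
cost = knn_graph_geodesic_cost(X, n_neighbors=20)
print(cost.shape)
```

Entries that remain `inf` indicate disconnected graph components, which is one way in which too few neighbors can make a graph-based cost unusable.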

ManuelGander commented on August 21, 2024

Hi, I did the analysis for a varying number of nearest neighbors (NN) and for the scVI representation. The results are:

- The number of NN only has an effect if it is too small (NN > 25 seems to be necessary).
- The scVI representation outperforms PCA.

(Here is the slide with the graphs: PCA vs scVi and nn.pdf)

I next combined L2- and dpt-cost linearly, which improves the accuracy of the resulting transport map!
I prepared a slide with the graphs for this as well: dpt+L2-graphs.pdf
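
A linear combination of two cost matrices, followed by entropic OT, could look roughly like the sketch below. Everything in it is an assumption for illustration: the names `combine_costs` and `sinkhorn`, the mixing weight `alpha`, the rescaling of each cost to mean 1, and the Manhattan distance that stands in for a precomputed dpt cost. It is not the code used for the slides and not moscot's API.

```python
import numpy as np


def combine_costs(c_l2, c_other, alpha=0.5):
    """Convex combination of two cost matrices, each rescaled to mean 1.

    Rescaling is an assumed choice so that alpha is comparable across costs.
    """
    c_l2 = c_l2 / c_l2.mean()
    c_other = c_other / c_other.mean()
    return (1.0 - alpha) * c_l2 + alpha * c_other


def sinkhorn(cost, a, b, epsilon=0.1, n_iters=500):
    """Entropic OT via plain Sinkhorn iterations; returns the coupling.

    Not numerically stabilized: very small epsilon values would need a
    log-domain implementation instead.
    """
    K = np.exp(-cost / epsilon)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


# Hypothetical data: 30 source cells and 40 target cells in 10 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
Y = rng.normal(size=(40, 10))
c_l2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)    # squared Euclidean cost
c_other = np.abs(X[:, None, :] - Y[None, :, :]).sum(-1)  # stand-in for a dpt cost

cost = combine_costs(c_l2, c_other, alpha=0.5)
a = np.full(30, 1 / 30)  # uniform source marginal
b = np.full(40, 1 / 40)  # uniform target marginal
coupling = sinkhorn(cost, a, b, epsilon=0.1)
print(coupling.shape)
```

Setting `alpha=0` recovers the pure L2 cost and `alpha=1` the pure alternative cost, so a sweep over `alpha` compares both extremes and their mixtures under the same solver settings.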

ManuelGander commented on August 21, 2024

I forgot to mention: the mean error I use as the validation metric is exactly the same one Zoe uses.

ManuelGander commented on August 21, 2024

Next, I'll check whether this linear combination of cost matrices also improves Geodesics for the WOT data set.

zoepiran commented on August 21, 2024

If it's okay with @Marius1311, I would be happy to have a joint discussion (maybe next week when I am around :) ) in order to optimize this analysis with respect to the complete story as well. For example: which results are task specific (e.g. time) or data specific (simulated / real / single cell / spatial ...), and what are the optimal benchmarks to test claims on (I have my doubts about TedSim ...)?

ManuelGander commented on August 21, 2024

Yes, we can do that. I too find the TedSim data not ideal to test anything on, but I don't have any other working validation strategy that I trust. I was thinking that maybe I can use the C. elegans data. Do you know of any other simulated scRNA data on cell development?

ManuelGander commented on August 21, 2024

In the meantime, I did the validation on the WOT data by Geodesic Interpolation. The result is that, as in all the approaches before, nothing performs significantly better or worse than L2: Dpt+L2-Geodesics.pdf

Marius1311 commented on August 21, 2024

@ManuelGander, one concrete conclusion from this seems to be that at least for TedSim data, PCA seems to work better compared to scVI? Is that a statement we can conclude?

Marius1311 commented on August 21, 2024

BTW, I do think the idea with C. elegans is good; however, let's not go for this now but stick to what we agreed upon yesterday. If we do want to go back to this in the future, doing this on C. elegans data is a good idea because it's more realistic than TedSim but has better ground truth than the WOT data.

ManuelGander commented on August 21, 2024

> @ManuelGander, one concrete conclusion from this seems to be that at least for TedSim data, PCA seems to work better compared to scVI? Is that a statement we can conclude?

No, scVI performs better than PCA. (Sorry, I haven't explicitly shown that before.) I base the claim that scVI works better than PCA on this: scVI vs PCA (1).pdf

Marius1311 commented on August 21, 2024

Okay, sorry, that's what I actually meant, I just wrote it the wrong way round. Thanks for clarifying this!

MUCDK commented on August 21, 2024

@ManuelGander @Marius1311 do we want to include this? If yes we should talk about implementation soon.

ManuelGander commented on August 21, 2024

I think this needs more experimentation and validation. Alex Tong gave us really valuable input on how to validate by geodesics more efficiently using barycenters and I'm eager to compare L2 to his graph-based approach and to using dpt for distance. I'm about 60% done with the Qiu-data set, so I could work on this in a week or two.

Marius1311 commented on August 21, 2024

alright!

michalk8 commented on August 21, 2024

It should be possible to pass/register custom cost functions after #492. Closing this due to inactivity.
