The parameter \alpha controls the weight given to GW (as opposed to W). Originally, in MK_2021-09-07_fgw_comparison_gt, it was set to `1e-3`. I'm playing with this parameter to see how the corresponding coupling matrix changes in moscot, POT and novosparc. Note that, due to convergence issues (see #13), I'm using `novosparc=True` for moscot, and I'm keeping the regularization parameter fixed at \epsilon=1e-1.
Also note that POT uses unregularized FGW while moscot uses entropically regularized FGW, so we don't expect results to be identical.
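As a reminder of what \alpha interpolates, here is a minimal numpy sketch of the FGW objective (square loss) evaluated for a fixed coupling; the toy matrices `M`, `C1`, `C2` and the product coupling `T` are hypothetical illustration data, not the data used in the comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 5

# Toy inputs: cross-domain feature-cost matrix M, intra-domain structure matrices C1, C2.
M = rng.random((n, m))
C1 = rng.random((n, n)); C1 = (C1 + C1.T) / 2
C2 = rng.random((m, m)); C2 = (C2 + C2.T) / 2
T = np.full((n, m), 1.0 / (n * m))  # product coupling, just as an example plan

def fgw_cost(T, M, C1, C2, alpha):
    """FGW objective with square loss:
    (1 - alpha) * <M, T> + alpha * sum_{ijkl} (C1[i,k] - C2[j,l])**2 * T[i,j] * T[k,l]
    """
    w_term = np.sum(M * T)
    # Expand (C1 - C2)^2 and contract each piece with T twice.
    gw_term = (
        np.einsum('ik,ij,kl->', C1**2, T, T)
        + np.einsum('jl,ij,kl->', C2**2, T, T)
        - 2 * np.einsum('ik,jl,ij,kl->', C1, C2, T, T)
    )
    return (1 - alpha) * w_term + alpha * gw_term

# alpha = 0 recovers the W cost, alpha = 1 the GW cost.
print(fgw_cost(T, M, C1, C2, 0.0), fgw_cost(T, M, C1, C2, 1.0))
```

Since the objective is linear in \alpha for a fixed plan, the interesting behaviour comes entirely from how the optimal coupling itself changes with \alpha.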
Summary
We're basically looking at Fig. 10 from Vayer et al., Algorithms 2020:
\alpha = 0 (regularized OT)
This is a reference to compare to: as \alpha approaches zero, we should converge to the pure, entropically regularized optimal transport solution. Computed using POT, with the same \epsilon.
This looks weird because \epsilon=0.1 is actually quite a large regularization.
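To see why \epsilon=0.1 already smooths the plan noticeably, here is a minimal Sinkhorn sketch on toy data (plain numpy, not the POT call used above; all data hypothetical). As \epsilon grows, the coupling approaches the rank-one product plan p q^T:

```python
import numpy as np

def sinkhorn(M, p, q, epsilon, n_iter=500):
    """Entropically regularized OT via plain Sinkhorn iterations (toy sketch)."""
    K = np.exp(-M / epsilon)  # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan

rng = np.random.default_rng(0)
n = 5
M = rng.random((n, n))
p = q = np.full(n, 1.0 / n)

T_doc = sinkhorn(M, p, q, epsilon=0.1)    # the epsilon used in this comparison
T_large = sinkhorn(M, p, q, epsilon=10.0)  # nearly the product coupling p q^T
print(np.abs(T_large - np.outer(p, q)).max())
```

So even at \epsilon=0.1 the entropy term is far from negligible, which matches the "weird"-looking reference plan.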
\alpha = 1e-3
moscot:
POT:
novosparc:
\alpha = 1e-2
moscot:
POT:
novosparc:
Gives a lot of numerical warnings (`Warning: numerical errors at iteration 0`).
\alpha = 1e-1 (POT flipped)
moscot:
POT:
novosparc:
Numerical warnings
\alpha = 0.2
moscot:
POT:
novosparc:
Numerical warnings
\alpha = 0.5 (moscot and novosparc flipped)
moscot:
POT:
novosparc:
Numerical warnings
\alpha = 0.9
moscot:
POT:
novosparc:
Numerical warnings
\alpha = 1.0 (regularized GW)
POT doesn't work here. I'm calling `entropic_gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss', verbose=True, log=True, epsilon=epsilon)` and I'm getting `Warning: numerical errors at iteration 0`, and in the end:
I checked the underlying transport matrix; it actually returns just zeros.
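This kind of silent all-zero return is easy to miss, so a quick sanity check on any returned plan helps; `check_coupling` below is a hypothetical helper, not part of POT:

```python
import numpy as np

def check_coupling(T, p, q, atol=1e-8):
    """Sanity-check a transport plan: nonnegative, not all zeros,
    and marginals matching p and q. Returns True if the plan looks valid."""
    return (
        T.min() >= 0
        and not np.allclose(T, 0)
        and np.allclose(T.sum(axis=1), p, atol=atol)
        and np.allclose(T.sum(axis=0), q, atol=atol)
    )

p = q = np.full(4, 0.25)
T_zero = np.zeros((4, 4))  # degenerate plan, like the failing entropic GW output
T_prod = np.outer(p, q)    # a valid (product) coupling
print(check_coupling(T_zero, p, q), check_coupling(T_prod, p, q))
```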
Using unregularized GW via POT instead, it works and I get:
However, regularized GW via OTT does work if I set \epsilon high enough, e.g. 2:
Note that we also had to increase numerical precision from float32 to float64 in JAX to get this.
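For reference, switching JAX to float64 is a one-line config flag; it must be set before any arrays are created:

```python
import jax

# JAX defaults to float32; enable 64-bit precision globally.
jax.config.update("jax_enable_x64", True)

import jax.numpy as jnp
print(jnp.ones(3).dtype)  # float64 once x64 is enabled
```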
Conclusion
\alpha controls our interpolation between the W (\alpha = 0) and GW (\alpha = 1) losses. For low \alpha, we expect a coupling that focuses on feature similarity, whereas for high \alpha, we expect a coupling that focuses on structure similarity. This seems to work differently in POT and moscot: POT flips from W to GW behaviour at around \alpha=0.1, whereas moscot flips at around \alpha=0.5. Also, the GW behaviour looks quite different in the two methods.
novosparc sometimes fails to converge and gives lots of warnings.