Subsampling loo diff_srs_est code differs from eq (9) about loo HOT 11 CLOSED

avehtari commented on June 2, 2024

Subsampling loo diff_srs_est code differs from eq (9)

from loo.

Comments (11)

MansMeg commented on June 2, 2024

Which paper are you referring to?


avehtari commented on June 2, 2024

Our paper with subsampling loo with difference estimator http://proceedings.mlr.press/v108/magnusson20a.html (I thought you would still remember that 😆 )


MansMeg commented on June 2, 2024

Ha ha! Yes. I just need to refresh my memory. It has been four years since I wrote that code. =)


avehtari commented on June 2, 2024

Additional notes

In the supplementary material, eq (6)

- has a minus sign, but eq (9) in the main text doesn't
- has a curly bracket denoting a, but that curly bracket doesn't include 1/n. Should it include that?
- below it, \sum_i^n \pi_i^2 - \tilde{\pi}_i^2 seems to be missing parens or another sum
- below it, \hat{a} includes 1/n, but the curly bracket did not

The supplementary material eq (7) has 1/n^2, but that doesn't show in eq (9) of the main text

Currently, a can be negative even though it is estimating something positive, which can make \hat{\sigma}_{diff,loo}^2 (eq (9) in the main text) negative as well. Also, I think that \hat{\sigma}_{diff,loo}^2 (eq (9)) should never be smaller than eq (8) (in the main text), or it should include eq (8) to take the uncertainty into account.
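For readers following along, here is a minimal sketch of a textbook difference estimator under simple random sampling without replacement, together with the usual estimate of its subsampling variance. It is not the loo package's `diff_srs_est`, and it does not reproduce eq (8) or eq (9) term for term; the function name `difference_estimate_srs` and the simulated values are assumptions made purely for illustration.

```python
import numpy as np


def difference_estimate_srs(approx_all, exact_sub, idx):
    """Textbook difference estimator of a population total (hypothetical helper).

    approx_all : cheap approximation for all n observations
    exact_sub  : exact values for the m subsampled observations
    idx        : positions of the subsample within approx_all
    """
    approx_all = np.asarray(approx_all, dtype=float)
    exact_sub = np.asarray(exact_sub, dtype=float)
    n, m = approx_all.size, exact_sub.size
    e = exact_sub - approx_all[idx]  # residuals on the subsample
    total_hat = approx_all.sum() + n * e.mean()
    # Estimated subsampling variance of total_hat, with the finite
    # population correction (1 - m/n) for sampling without replacement.
    var_hat = n**2 * (1 - m / n) * e.var(ddof=1) / m
    return total_hat, var_hat


# Simulated stand-ins for pointwise values (assumed, not real elpd output).
rng = np.random.default_rng(1)
n, m = 2000, 100
exact = rng.normal(-1.0, 0.5, size=n)
approx = exact + rng.normal(0.0, 0.05, size=n)
idx = rng.choice(n, size=m, replace=False)

total_hat, var_hat = difference_estimate_srs(approx, exact[idx], idx)
print("estimate:", total_hat, "true total:", exact.sum(), "variance:", var_hat)
```

When the subsample is the whole population, the variance estimate is exactly zero; for a fixed residual spread it grows as m shrinks. That is the uncertainty the comment above asks to carry into the reported variance.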


MansMeg commented on June 2, 2024

Yes. I will need to check this ASAP after New Year's. I think the supplementary material is the best reference, since that is where we prove that the estimator is unbiased.

I think, theoretically, that we can get negative estimates, but need to check the proof.


MansMeg commented on June 2, 2024

I have now gone through it, and I think the derivation of the unbiased estimator in the supplementary material makes sense. There is one minor error: the a-bracket should also cover 1/n (as you mention). Also, t_e is not formally defined, but between Eq. 10 and Eq. 11 in the supplementary material it is implicitly used as the total residual error. So, having gone through the supplementary material, it looks correct to me, except for the bracket.

The next step is to check whether there is an error between the supplementary material and Eq. 9. The first line of Eq. 9 seems to be \hat{a} in the supplementary material. Hence there should be a minus, not a plus, in Eq. 9. As I see it, this error also means that the second + in Eq. 9 should be a -.

Also, there is a slight difference between sigma_loo and sigma_{loo,diff}, in that sigma_{loo,diff} = n sigma_loo (the aggregated variance of the total vs per observation).

I have tried to write this down in this document (mainly p. 4-5). Please look at the derivation and see if you agree with the derivation of the estimator.

Below I try to answer your comments:

In the supplementary material eq (6) has minus sign but the eq (9) in main text doesn't

I think Eq. 6 is correct given that the definition of sigma^2_loo in Eq. 5 is correct, which I think it is.

has a curly bracket denoting a, but that curly bracket doesn't include 1/n. Should it include that?

Yes. I think that is an error. If we look at the estimate of a, that is \hat{a}, it is the mean (i.e., it includes 1/n). We also see it in the expectation of \hat{a}, which includes 1/n. Hence, the bracket in Eq. 6 is missing the 1/n.

below \sum_i^n \pi_i^2 - \tilde{\pi}_i^2 seems to be missing parens or another sum

Which equation is this?

below \hat{a} includes 1/n, but the curly bracket did not

Yes. I think the curly bracket in Eq. 6 is missing the 1/n.

The supplementary material eq (7) has 1/n^2, but that doesn't show in eq (9) of the main text

This is the difference between the aggregated variance (the total) and the variance per observation.


avehtari commented on June 2, 2024

Thanks!

below \sum_i^n \pi_i^2 - \tilde{\pi}_i^2 seems to be missing parens or another sum

Which equation is this?

This is the reason you should number all the equations. After eq (6) there is a three-line paragraph, and the last line of that paragraph has \sum_i^n \pi_i^2 - \tilde{\pi}_i^2. As there are no parens, the sum is only over \pi_i^2, but that doesn't make sense, as the latter term also depends on i. So it should be either \sum_i^n (\pi_i^2 - \tilde{\pi}_i^2) or \sum_i^n \pi_i^2 - \sum_i^n \tilde{\pi}_i^2 (which are equivalent).

What do you think of my comment on modifying the estimate to guarantee positivity?


MansMeg commented on June 2, 2024

Yes, in that paragraph, a parenthesis is missing. As you say, the meaning is given by the context, but it is unclear as written.

Re: ensure positivity.
So, we prove that the estimator is unbiased wrt sigma_loo. Hence, if we ensure positivity, it will no longer be unbiased. That said, it would probably have a smaller MSE. However, in this setting, the easiest solution is to recommend taking a larger sample since potential negative estimates would come from the sampling variance of the estimator.
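The trade-off described above can be illustrated with a toy simulation. The population below is made up (it is not pointwise elpd output), and the subsample mean stands in for any unbiased estimator of a positive quantity. Clipping at zero is just one possible way to enforce positivity, not something the paper or the package prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, reps = 1000, 10, 5000

# Hypothetical population with a small positive mean and mixed signs.
d = rng.normal(loc=0.0, scale=1.0, size=n)
d = d - d.mean() + 0.05  # shift so the population mean is about 0.05
target = d.mean()

# Unbiased estimator: mean of a simple random subsample without replacement.
raw = np.empty(reps)
for r in range(reps):
    idx = rng.choice(n, size=m, replace=False)
    raw[r] = d[idx].mean()

# Positivity-enforcing variant: clip negative estimates to zero.
clipped = np.maximum(raw, 0.0)

frac_negative = np.mean(raw < 0)
bias_raw = raw.mean() - target
bias_clipped = clipped.mean() - target
mse_raw = np.mean((raw - target) ** 2)
mse_clipped = np.mean((clipped - target) ** 2)

print("fraction of negative raw estimates:", frac_negative)
print("bias (raw, clipped):", bias_raw, bias_clipped)
print("MSE  (raw, clipped):", mse_raw, mse_clipped)
```

Because the target is positive, clipping never moves a single estimate further from it, so the clipped version cannot have a larger MSE, while its mean is pushed upward. Increasing m in this sketch reduces how often the raw estimate is negative.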

Have you been able to check my derivation to see if you also agree? When you are on board with the derivation there, I can take a pass on the code. I try to avoid looking at the code before we trust the derivation.


avehtari commented on June 2, 2024

So, we prove that the estimator is unbiased wrt sigma_loo. Hence, if we ensure positivity, it will no longer be unbiased. That said, it would probably have a smaller MSE. However, in this setting, the easiest solution is to recommend taking a larger sample since potential negative estimates would come from the sampling variance of the estimator.

In projpred rerunning search can be very costly, and thus taking a larger sample is not the easy option. It would be better to include the sampling variance of the estimator, as our uncertainty about the accuracy should be affected also by the sampling variance of the estimator.

The main article Section 2 says: "we propose to use the difference estimator and simple random sampling without replacement (SRS)", and the code agrees with that. The supplement and the new pdf have "and the probability of subsampling observation i is 1/n, i.e. the subsample is uniform with replacement."

In sigma_loo.pdf, eqs (20) and (21) both have \hat{b}, but eq (20) is the first two terms of \hat{b} and eq (21) is the last two terms of \hat{b}. The same goes for (23) and (24).

I did not find other errors


MansMeg commented on June 2, 2024

In projpred rerunning a search can be very costly, and thus, taking a larger sample is not the easy option. It would be better to include the sampling variance of the estimator, as our uncertainty about the accuracy should be affected also by the sampling variance of the estimator.

Ok. I guess you can do that in the projpred setting? From the function, I think you get both.

The main article Section 2 says: "we propose to use the difference estimator and simple random sampling without replacement (SRS)", and the code agrees with that. The supplement and the new pdf have "and the probability of subsampling observation i is 1/n, i.e. the subsample is uniform with replacement."

Hmmm. Yes. That is a difference and I think we would need to change this to inclusion probability instead. I need to look in my old sampling theory books.
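As a reminder from standard sampling theory (not taken from the paper or the supplement), the two designs differ as follows for a subsample S of size m from n observations. Here p_i is the per-draw selection probability, \phi_i is the first-order inclusion probability, e_i = \pi_i - \tilde{\pi}_i is the residual, and S_e^2 is its population variance. The symbols p_i and \phi_i are chosen here only to avoid clashing with \pi_i.

```latex
\begin{align*}
\text{with replacement:} \quad
  & p_i = \frac{1}{n} \ \text{per draw},
  & \operatorname{Var}\Bigl(\frac{n}{m}\sum_{j \in S} e_j\Bigr)
    &= \frac{n^2}{m}\,\frac{n-1}{n}\,S_e^2, \\
\text{without replacement:} \quad
  & \phi_i = \frac{m}{n},
  & \operatorname{Var}\Bigl(\frac{n}{m}\sum_{j \in S} e_j\Bigr)
    &= \frac{n^2}{m}\Bigl(1 - \frac{m}{n}\Bigr) S_e^2,
\end{align*}
\[
  S_e^2 = \frac{1}{n-1}\sum_{i=1}^{n} (e_i - \bar{e})^2 .
\]
```

The two variances differ essentially by the finite population correction 1 - m/n, so the distinction matters most when m is a sizeable fraction of n.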

In sigma_loo.pdf, eqs (20) and (21) both have \hat{b}, but eq (20) is the first two terms of \hat{b} and eq (21) is the last two terms of \hat{b}. The same goes for (23) and (24).

Yes. That is because I split \hat{b} into its two components to reflect the estimator in the main text. I could call them \hat{b}_1 and \hat{b}_2 to make it clearer.


avehtari commented on June 2, 2024

Closed by #238

