Hi Alevin-fry developers, I am currently working on a combinatory in

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

How to keep all the cellbarcodes? about alevin-fry HOT 4 CLOSED

combine-lab commented on June 15, 2024

How to keep all the cellbarcodes?

from alevin-fry.

Comments (4)

rob-p commented on June 15, 2024

Hi @BingjieZhang,

Thanks for your description. The issue here has nothing to do with the invocation of the alevin command, but the fact that you are using the knee filtering in the alevin-fry generate-permit-list command. The knee filtering will only let in high read abundance (proxy for high confidence) cells.

You have a few options to get a less filtered list. One option is to look at the knee plot and then run again using —force-cells to determine how many cells you want to let through. The other is, given your combinatorial setup, to create an unfiltered permit-list, and then use the —unfiltered-pl flag to match against that list of possible barcodes. You can find the documentation on generate-permit-list here.

Let us know if you have any follow up questions.

Best,
Rob

from alevin-fry.

BingjieZhang commented on June 15, 2024

Hi Rob,

Thanks a lot for your response. It worked well with -force-cells! I just have too many cells with too shallow sequencing depth. Now the cell number looks reasonable to me. Thanks again for your help!

from alevin-fry.

wmacnair commented on June 15, 2024

I had exactly the same question, so thanks very much for this explanation.

Two small follow-ups:

What's the best way to do a knee plot showing all (not just the knee-filtered) barcodes?
The knee filtering approach works just on the basis of the distribution of droplet library sizes, while the EmptyDrops approach uses a statistical test against an empty population. I wondered if you had done any analysis of the effects they might have. Obviously the EmptyDrops paper would claim that you need the test, but maybe it rarely makes much difference in practice? Any pointers or thoughts would be great :)

Thanks!
Will

[Edit:

I've now discovered the discussion of knee vs EmptyDrops in the SI of the alevin-fry paper, which is helpful 😄 .

The conclusion there (although only based on one dataset) is that EmptyDrops is more permissive than knee, but that the quality of the clustering is worse when you do this. (You also conclude that choice of permit list method is more important than choice of UMI deduplication method.)

I'm thinking about circumstances where this might not be the best approach. I've recently been working with single nuclei RNAseq, where contamination by ambient RNA soup is a common problem (especially in human disease samples). Also the barcode knee plots are often not very clear. Here, you could imagine the knee approach identifying the larger and cleaner nuclei and excluding ambient-contaminated nuclei, while a more permissive permit list combined with an ambient RNA removal approach (such as scAR) could "rescue" those nuclei.

But I don't know - would be interesting to see this analysis!]

from alevin-fry.

wmacnair commented on June 15, 2024

Thinking a bit more, it might be worth adding this as a feature request. What generate-permit-list (typically) does is to keep all barcodes above a certain abundance threshold, while what is needed here is an option to keep all barcodes below a certain abundance threshold.

This is a pretty common use case, especially in single nuclei RNAseq, and I hope would be pretty straightforward to add. Would be great if you'd consider adding it to a release at some point!

Cheers
Will

from alevin-fry.

How to keep all the cellbarcodes? about alevin-fry HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent