Comments (4)

rob-p commented on June 15, 2024

Hi @BingjieZhang,

Thanks for your description. The issue here has nothing to do with how you invoke the alevin command; it comes from using knee filtering in the alevin-fry generate-permit-list command. Knee filtering only lets through cells with high read abundance (a proxy for high confidence).

You have a few options for getting a less filtered list. One is to look at the knee plot and then run again using --force-cells to set how many cells you want to let through. The other, given your combinatorial setup, is to create an unfiltered permit list and then use the --unfiltered-pl flag to match against that list of possible barcodes. You can find the documentation on generate-permit-list here.
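As a rough sketch of the two options above, the following Python helper only assembles the command line; it does not run anything. The `--force-cells` and `--unfiltered-pl` flags come from this thread. The other flag names (`--input`, `--output-dir`, `--expected-ori`), the directory names, and the cell count are assumptions for illustration and should be checked against `alevin-fry generate-permit-list --help`.

```python
from typing import List, Optional


def permit_list_cmd(
    rad_dir: str,
    out_dir: str,
    force_cells: Optional[int] = None,
    unfiltered_pl: Optional[str] = None,
    expected_ori: str = "fw",
) -> List[str]:
    """Build (but do not run) a generate-permit-list command line.

    Exactly one of force_cells or unfiltered_pl must be given.
    Flag names other than --force-cells and --unfiltered-pl are assumed.
    """
    if (force_cells is None) == (unfiltered_pl is None):
        raise ValueError("choose exactly one of force_cells or unfiltered_pl")
    cmd = [
        "alevin-fry", "generate-permit-list",
        "--input", rad_dir,            # assumed flag name
        "--output-dir", out_dir,       # assumed flag name
        "--expected-ori", expected_ori,  # assumed flag name
    ]
    if force_cells is not None:
        # Option 1: let through a fixed number of cells chosen from the knee plot.
        cmd += ["--force-cells", str(force_cells)]
    else:
        # Option 2: match against an unfiltered list of possible barcodes.
        cmd += ["--unfiltered-pl", str(unfiltered_pl)]
    return cmd


# Hypothetical paths and cell count, for illustration only.
print(" ".join(permit_list_cmd("map_out", "permit_out", force_cells=20000)))
```

The resulting list can be passed to `subprocess.run` once the flags have been confirmed for the installed version.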

Let us know if you have any follow-up questions.

Best,
Rob

from alevin-fry.

BingjieZhang commented on June 15, 2024

Hi Rob,

Thanks a lot for your response. It worked well with --force-cells! I just have too many cells with too shallow a sequencing depth. Now the cell number looks reasonable to me. Thanks again for your help!

wmacnair commented on June 15, 2024

I had exactly the same question, so thanks very much for this explanation.

Two small follow-ups:

  • What's the best way to do a knee plot showing all (not just the knee-filtered) barcodes?
  • The knee filtering approach works just on the basis of the distribution of droplet library sizes, while the EmptyDrops approach uses a statistical test against an empty population. I wondered if you had done any analysis of the effects they might have. Obviously the EmptyDrops paper would claim that you need the test, but maybe it rarely makes much difference in practice? Any pointers or thoughts would be great :)
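On the first question, one possible starting point is to compute the knee curve directly from per-barcode counts, without any filtering. The sketch below assumes you already have a mapping from barcode to read (or UMI) count; how to extract that from alevin-fry output is not covered in this thread, and the toy counts are synthetic.

```python
import math
from typing import Dict, List, Tuple


def knee_curve(counts: Dict[str, int]) -> List[Tuple[float, float]]:
    """Return (log10 rank, log10 count) points for every barcode with count > 0.

    Barcodes are ranked by descending count, so rank 1 is the most abundant.
    """
    ordered = sorted((c for c in counts.values() if c > 0), reverse=True)
    return [
        (math.log10(rank), math.log10(count))
        for rank, count in enumerate(ordered, start=1)
    ]


# Synthetic counts, for illustration only.
toy_counts = {"AAAC": 1000, "AAAG": 100, "AAAT": 10, "AACA": 1, "AACC": 0}
for log_rank, log_count in knee_curve(toy_counts):
    print(f"{log_rank:.2f}\t{log_count:.2f}")
```

The points can be passed to any plotting library; because nothing is dropped except zero-count barcodes, the full tail below the knee stays visible.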

Thanks!
Will

[Edit:

I've now discovered the discussion of knee vs EmptyDrops in the SI of the alevin-fry paper, which is helpful 😄.

The conclusion there (although only based on one dataset) is that EmptyDrops is more permissive than knee, but that the quality of the clustering is worse when you do this. (You also conclude that choice of permit list method is more important than choice of UMI deduplication method.)

I'm thinking about circumstances where this might not be the best approach. I've recently been working with single nuclei RNAseq, where contamination by ambient RNA soup is a common problem (especially in human disease samples). Also the barcode knee plots are often not very clear. Here, you could imagine the knee approach identifying the larger and cleaner nuclei and excluding ambient-contaminated nuclei, while a more permissive permit list combined with an ambient RNA removal approach (such as scAR) could "rescue" those nuclei.

But I don't know - would be interesting to see this analysis!]

wmacnair commented on June 15, 2024

Thinking a bit more, it might be worth adding this as a feature request. What generate-permit-list (typically) does is to keep all barcodes above a certain abundance threshold, while what is needed here is an option to keep all barcodes below a certain abundance threshold.
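To make the request concrete, here is a minimal sketch of the two behaviours applied to a barcode-to-count mapping. This is not an existing alevin-fry option; the function name, the threshold value, and the toy counts are all hypothetical.

```python
from typing import Dict, Tuple


def split_by_threshold(
    counts: Dict[str, int], threshold: int
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split barcodes into (at or above threshold, below threshold)."""
    above = {bc: n for bc, n in counts.items() if n >= threshold}
    below = {bc: n for bc, n in counts.items() if n < threshold}
    return above, below


# Synthetic counts, for illustration only.
toy_counts = {"AAAC": 1000, "AAAG": 100, "AAAT": 10, "AACA": 1}
above, below = split_by_threshold(toy_counts, threshold=100)
print(sorted(above))  # barcodes kept by an "above threshold" filter
print(sorted(below))  # barcodes an "below threshold" option would keep
```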

This is a pretty common use case, especially in single nuclei RNAseq, and I hope would be pretty straightforward to add. Would be great if you'd consider adding it to a release at some point!

Cheers
Will
