Comments (4)
Hi @BingjieZhang,
Thanks for your description. The issue here has nothing to do with the invocation of the alevin command, but rather with the fact that you are using the knee filtering in the alevin-fry generate-permit-list command. The knee filtering will only let in high read abundance (proxy for high confidence) cells.
You have a few options to get a less filtered list. One option is to look at the knee plot and then run again using --force-cells to determine how many cells you want to let through. The other is, given your combinatorial setup, to create an unfiltered permit-list, and then use the --unfiltered-pl flag to match against that list of possible barcodes. You can find the documentation on generate-permit-list here.
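To make the difference between the two options concrete, here is a toy Python sketch. It is not alevin-fry's implementation: the function names, the barcodes and the counts are all made up for illustration, and the permit-list matching here is exact-match only.

```python
def force_cells(counts, n_cells):
    """Keep the n_cells barcodes with the highest read counts.

    counts: dict mapping barcode -> read count (hypothetical input).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [barcode for barcode, _ in ranked[:n_cells]]


def match_unfiltered(counts, permit_list):
    """Keep every observed barcode that appears in an external permit list.

    No abundance filtering is applied; only membership in the list matters.
    """
    allowed = set(permit_list)
    return sorted(barcode for barcode in counts if barcode in allowed)


# Made-up example data: two deep barcodes and three shallow ones.
counts = {"AAAC": 950, "AAAG": 900, "AACC": 40, "AACG": 35, "TTTT": 2}

top3 = force_cells(counts, 3)
matched = match_unfiltered(counts, ["AAAC", "AACC", "AACG", "GGGG"])
print(top3)     # ['AAAC', 'AAAG', 'AACC']
print(matched)  # ['AAAC', 'AACC', 'AACG']
```

The first function fixes the number of cells regardless of where the knee falls, while the second ignores abundance and keeps whatever matches the known set of barcodes.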
Let us know if you have any follow up questions.
Best,
Rob
from alevin-fry.
Hi Rob,
Thanks a lot for your response. It worked well with --force-cells! I just have too many cells with too shallow sequencing depth. Now the cell number looks reasonable to me. Thanks again for your help!
I had exactly the same question, so thanks very much for this explanation.
Two small follow-ups:
- What's the best way to do a knee plot showing all (not just the knee-filtered) barcodes?
- The knee filtering approach works just on the basis of the distribution of droplet library sizes, while the EmptyDrops approach uses a statistical test against an empty population. I wondered if you had done any analysis of the effects they might have. Obviously the EmptyDrops paper would claim that you need the test, but maybe it rarely makes much difference in practice? Any pointers or thoughts would be great :)
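On the first question, a minimal Python sketch of how a knee plot over all barcodes could be put together, assuming you already have a read count for every observed barcode (how those counts are loaded is left out, and the dict below is made up). The knee estimate is one common heuristic, the point farthest from the straight line joining the ends of the log-log curve; it is not claimed to be what alevin-fry does internally.

```python
import math


def barcode_rank_curve(counts):
    """Return (rank, count) pairs for all barcodes, highest count first.

    Zero-count barcodes are dropped so the curve can be drawn on log axes.
    """
    ordered = sorted((c for c in counts.values() if c > 0), reverse=True)
    return list(enumerate(ordered, start=1))


def knee_rank(curve):
    """Rank of the point farthest from the line joining the curve's
    first and last points in log10-log10 space (needs at least 2 points)."""
    xs = [math.log10(rank) for rank, _ in curve]
    ys = [math.log10(count) for _, count in curve]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    # Perpendicular distance of each point from the end-to-end line.
    dists = [abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm
             for x, y in zip(xs, ys)]
    return curve[dists.index(max(dists))][0]


# Made-up counts: three deep barcodes followed by three shallow ones.
counts = {"bc1": 1000, "bc2": 900, "bc3": 800, "bc4": 10, "bc5": 5, "bc6": 2}
curve = barcode_rank_curve(counts)
print(curve[:3])         # [(1, 1000), (2, 900), (3, 800)]
print(knee_rank(curve))  # 3
```

Plotting the (rank, count) pairs on log-log axes with any plotting library gives the usual knee plot, and the filtered barcodes can be highlighted by marking the ranks that were kept.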
Thanks!
Will
[Edit:
I've now discovered the discussion of knee vs EmptyDrops in the SI of the alevin-fry paper, which is helpful 😄.
The conclusion there (although only based on one dataset) is that EmptyDrops is more permissive than knee, but that the quality of the clustering is worse when you do this. (You also conclude that choice of permit list method is more important than choice of UMI deduplication method.)
I'm thinking about circumstances where this might not be the best approach. I've recently been working with single nuclei RNAseq, where contamination by ambient RNA soup is a common problem (especially in human disease samples). Also the barcode knee plots are often not very clear. Here, you could imagine the knee approach identifying the larger and cleaner nuclei and excluding ambient-contaminated nuclei, while a more permissive permit list combined with an ambient RNA removal approach (such as scAR) could "rescue" those nuclei.
But I don't know - would be interesting to see this analysis!]
Thinking a bit more, it might be worth adding this as a feature request. What generate-permit-list (typically) does is to keep all barcodes above a certain abundance threshold, while what is needed here is an option to keep all barcodes below a certain abundance threshold.
This is a pretty common use case, especially in single nuclei RNAseq, and I hope would be pretty straightforward to add. Would be great if you'd consider adding it to a release at some point!
Cheers
Will