Comments (9)
Hi, with the help of Rob I managed to get a Rust BD Rhapsody analysis tool to work (https://github.com/stela2502/Rustody).
Awesome!
Regarding the actual feature request, could I ask for a bit more clarification? I'm not sure exactly what the requested feature is, or what alevin-fry would be required to do. Specifically, I think it would be most useful if you could give an idea of what the provided input would be, and what the requested output would be, in either of these cases; that would help us understand the request better. Also, I'm looping in @DongzeHE so he's in the know ;P.
--Rob
from alevin-fry.
I just found this issue while investigating whether we could easily adapt our pipeline to accommodate BD Rhapsody data. It looks like the R1 sequence structure (as described in their documentation (PDF)) could be accommodated by salmon --bc-geometry for the "Original" version, but the "Enhanced 3'" files have variable positions (the "diversity insert") that I am not sure how we would specify using that flag, if it is possible at all.
Hi Rob, what I would dream of is a way to use (most likely) salmon here: https://github.com/stela2502/Rustody/blob/8befa2e774caba0f5037b57ceb23bac7d18bac8d/src/bin/quantify_rhapsody.rs#L348.
I have seen that salmon is actually a C++ library. At that point in my program I have tried to match the R2 read to any gene the tool knows of. When there is no hit, it should then search genome-wide. And I would like to NOT implement a genome-wide search :-D
As I know of no Rust library that could help me here, I then tried storing my gene data as some kind of index file, but failed horribly. I am not even able to read back the data I wrote :-(. The test here https://github.com/stela2502/Rustody/blob/4f36750ceaf8068c90813a94182f5f6a0f381d0e/this/src/geneids.rs#L580 fails. It seems that (1) the km (u64) IDs read from the file are not the same ones I wrote, and (2) I am also unable to read any gene name back (not UTF-8 formatted), even though my Linux system shows the gene names just fine with both zcat and vim.
I tried yesterday and even asked ChatGPT, but could not fix it. Perhaps you can spot my error immediately? If so, please help me. Otherwise I will pause this for some time. It would be cool to get a genome-wide mapper in Rust, but I do not have the time for something like that at the moment. I have never done that either, so it would be quite a mission for me. If you (or anybody else) are interested in that, I would be very happy for any help I can get.
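The two symptoms of the failing test (u64 IDs that do not match, and gene names that are not valid UTF-8) are typical of a reader and writer that disagree on the byte layout. Below is a minimal, std-only sketch of a symmetric binary format, assuming each record is one u64 ID followed by a length-prefixed UTF-8 name. The layout and the names `write_record`/`read_record` are hypothetical and are not Rustody's actual geneids.rs format. The point is that the reader uses exactly the same widths and byte order (`to_le_bytes`/`from_le_bytes`) as the writer. Also, since zcat can display the names, the file on disk appears to be gzip-compressed; if so, the bytes must pass through a gzip decoder (for example `flate2::read::GzDecoder` from the flate2 crate) before `read_record` sees them, otherwise the compressed bytes will not decode as UTF-8.

```rust
use std::io::{self, Cursor, Read, Write};

/// Write one (ID, name) record: the u64 as 8 little-endian bytes, then the
/// name length as a little-endian u32, then the raw UTF-8 bytes of the name.
fn write_record<W: Write>(w: &mut W, id: u64, name: &str) -> io::Result<()> {
    w.write_all(&id.to_le_bytes())?;
    let bytes = name.as_bytes();
    w.write_all(&(bytes.len() as u32).to_le_bytes())?;
    w.write_all(bytes)
}

/// Read one record back using exactly the same widths and byte order.
fn read_record<R: Read>(r: &mut R) -> io::Result<(u64, String)> {
    let mut id_buf = [0u8; 8];
    r.read_exact(&mut id_buf)?;
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let mut name_buf = vec![0u8; u32::from_le_bytes(len_buf) as usize];
    r.read_exact(&mut name_buf)?;
    // Fails with InvalidData if the bytes are not UTF-8 (e.g. still gzip-compressed).
    let name = String::from_utf8(name_buf)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok((u64::from_le_bytes(id_buf), name))
}

fn main() -> io::Result<()> {
    // An in-memory buffer stands in for the index file.
    let mut buf: Vec<u8> = Vec::new();
    write_record(&mut buf, 0xDEAD_BEEF, "Actb")?;
    write_record(&mut buf, 42, "Gapdh")?;

    let mut reader = Cursor::new(buf);
    let first = read_record(&mut reader)?;
    let second = read_record(&mut reader)?;
    println!("{:?} {:?}", first, second);
    Ok(())
}
```

Writing the length before the name lets the reader know how many bytes belong to each string, so no separator character has to be reserved.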
So, to sum up the long post once more: I need to somehow map the R2 read to a genome-wide index, and I do not have the time to implement that, as I fail at the most basic stuff. I am no trained informatics guy after all :-(
And I would like to use whatever alevin-fry uses. I fear this will be complicated, as the mapper is written in C++ (if I am not mistaken). Hence I am also thinking about implementing genome-wide mapping functionality myself, but I fear that is too complicated for me.
Hi @jashapiro,
Is your use-case for single-cell transcriptomics? We are currently working on a "general" solution to such problems, with increasingly complicated barcoding mechanisms. Currently, the simpleaf -> alevin-fry pipeline has somewhat more generic support due to its ability to specify the geometry with the fragment geometry description language. However, even more involved solutions are necessary in some cases. Our general-purpose approach isn't ready yet, but in the meantime, might it be possible to use a tool like Interstellar to transform the data into an appropriately "normalized" format prior to processing with existing single-cell tools?
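One way to handle the variable "diversity insert" mentioned earlier in the thread, and to produce the kind of "normalized" fixed-layout read suggested above, is to anchor on the fixed linker sequences: try each possible insert length and keep the offset at which the linkers sit at their expected positions. This is a hypothetical sketch of that idea, not the behavior of Interstellar, simpleaf, or BD's pipeline. The `Layout` struct, the 0-3 bp insert range, the linker sequences, and all segment lengths are placeholder assumptions that must be checked against the BD Rhapsody library structure documentation.

```rust
/// Hypothetical R1 layout: [0..=max_insert bp insert] CLS1 - linker1 - CLS2 - linker2 - CLS3 - UMI.
/// All values used below are placeholders to verify against the BD documentation.
struct Layout {
    max_insert: usize,
    cls_len: usize,
    linker1: &'static [u8],
    linker2: &'static [u8],
    umi_len: usize,
}

/// Try every insert length from 0 to max_insert and accept the first offset at
/// which both linkers appear exactly where the layout expects them.
/// Returns the concatenated CLS1+CLS2+CLS3 barcode and the UMI (a fixed-length record).
fn normalize_r1(read: &[u8], l: &Layout) -> Option<(Vec<u8>, Vec<u8>)> {
    for off in 0..=l.max_insert {
        let cls1 = off;
        let link1 = cls1 + l.cls_len;
        let cls2 = link1 + l.linker1.len();
        let link2 = cls2 + l.cls_len;
        let cls3 = link2 + l.linker2.len();
        let umi = cls3 + l.cls_len;
        let end = umi + l.umi_len;
        if read.len() < end {
            return None; // larger offsets would need even more bases
        }
        if &read[link1..cls2] == l.linker1 && &read[link2..cls3] == l.linker2 {
            let mut barcode = Vec::with_capacity(3 * l.cls_len);
            barcode.extend_from_slice(&read[cls1..link1]);
            barcode.extend_from_slice(&read[cls2..link2]);
            barcode.extend_from_slice(&read[cls3..umi]);
            return Some((barcode, read[umi..end].to_vec()));
        }
    }
    None // no offset placed both linkers correctly
}

fn main() {
    // Placeholder layout: 0-3 bp insert, 9 bp CLS blocks, 4 bp linkers, 8 bp UMI.
    let layout = Layout { max_insert: 3, cls_len: 9, linker1: b"GTGA", linker2: b"GACA", umi_len: 8 };
    // Synthetic read: 2 bp insert, CLS1, linker1, CLS2, linker2, CLS3, UMI, poly-T tail.
    let read: Vec<u8> = [
        &b"GT"[..], &b"AAAAAAAAA"[..], &b"GTGA"[..], &b"CCCCCCCCC"[..],
        &b"GACA"[..], &b"GGGGGGGGG"[..], &b"ACGTACGT"[..], &b"TTTT"[..],
    ]
    .concat();
    match normalize_r1(&read, &layout) {
        Some((bc, umi)) => println!("{} {}", String::from_utf8_lossy(&bc), String::from_utf8_lossy(&umi)),
        None => println!("no match"),
    }
}
```

Exact linker matching keeps the sketch short; a real tool would probably tolerate a mismatch or two in the linkers, since sequencing errors there would otherwise discard the read.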
Hi, in the end I just (re-)implemented a whole-genome-capable mapper. It uses a u16 representation of an 8 bp fragment of the read to identify the most likely region in a u16::MAX-long vector of 8 bp to 32 bp downstream mappers. This works for the targeted approach and should be able to scale up to the whole genome. You can look at it here: https://github.com/stela2502/Rustody/blob/new_mapper/this/src/fast_mapper.rs.
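The seeding idea described above can be sketched as follows, assuming a 2-bit encoding per base (A=0, C=1, G=2, T=3). With 2 bits per base, an 8 bp fragment fills exactly 16 bits, so there are 4^8 = 65,536 possible keys and a table indexed by the key needs u16::MAX + 1 slots. The `SeedIndex` type, its bucket contents, and the exact-match 32 bp verification step are hypothetical simplifications for illustration, not the actual code in fast_mapper.rs.

```rust
/// Encode exactly 8 nucleotides into a u16 using 2 bits per base
/// (A=0, C=1, G=2, T=3). Returns None for N or any other character.
fn encode_8mer(seq: &[u8]) -> Option<u16> {
    if seq.len() != 8 {
        return None;
    }
    let mut key: u16 = 0;
    for &base in seq {
        let bits: u16 = match base {
            b'A' | b'a' => 0,
            b'C' | b'c' => 1,
            b'G' | b'g' => 2,
            b'T' | b't' => 3,
            _ => return None,
        };
        key = (key << 2) | bits;
    }
    Some(key)
}

/// Hypothetical seed index: one bucket per possible u16 key (65,536 buckets),
/// each listing (gene index, start offset) for every occurrence of that 8-mer.
struct SeedIndex {
    buckets: Vec<Vec<(usize, usize)>>,
    genes: Vec<Vec<u8>>,
}

impl SeedIndex {
    fn new(genes: Vec<Vec<u8>>) -> Self {
        let mut buckets: Vec<Vec<(usize, usize)>> = vec![Vec::new(); 1 << 16];
        for (g, seq) in genes.iter().enumerate() {
            // A sequence of length n contains n - 7 overlapping 8-mers.
            for start in 0..seq.len().saturating_sub(7) {
                if let Some(key) = encode_8mer(&seq[start..start + 8]) {
                    buckets[key as usize].push((g, start));
                }
            }
        }
        SeedIndex { buckets, genes }
    }

    /// Seed with the read's first 8 bp, then check the read's first 32 bp
    /// exactly against each candidate position; return the first matching gene.
    fn map(&self, read: &[u8]) -> Option<usize> {
        if read.len() < 32 {
            return None;
        }
        let key = encode_8mer(&read[..8])?;
        self.buckets[key as usize].iter().find_map(|&(g, start)| {
            let gene = &self.genes[g];
            (gene.len() >= start + 32 && gene[start..start + 32] == read[..32]).then_some(g)
        })
    }
}

fn main() {
    let genes = vec![b"ACGT".repeat(10), b"GATTACA".repeat(6)];
    let index = SeedIndex::new(genes);
    let read = b"GATTACA".repeat(6)[3..35].to_vec();
    println!("{:?}", index.map(&read));
}
```

The table lookup is a single array access per seed, which is what makes this kind of index fast; the cost moves to memory, since every 8-mer occurrence in the reference is stored once.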
For the cell barcodes I simply use a partial match and take the highest probability for a sequence to be linked to one cell, not a full-length match, as that would also cause issues with sequencing errors. This also allows for some fuzziness in the matching regions. From a first glance at Interstellar: are you sure it implements a way to convert from a variable to a fixed format?
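A minimal sketch of error-tolerant barcode assignment, assuming a known whitelist of valid barcodes: pick the unique closest whitelist entry by Hamming distance, within a mismatch limit. Hamming distance is used here as a simple stand-in for the probability-based score described above; `match_barcode`, `max_mm`, and the tie rule are hypothetical choices, not Rustody's implementation.

```rust
/// Count positions at which two equal-length sequences differ.
fn hamming(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// Return the index of the unique closest whitelist barcode within `max_mm`
/// mismatches. Ties for the best distance, or no close candidate, give None.
fn match_barcode(obs: &[u8], whitelist: &[&[u8]], max_mm: usize) -> Option<usize> {
    let mut best: Option<(usize, usize)> = None; // (whitelist index, distance)
    let mut tie = false;
    for (i, bc) in whitelist.iter().enumerate() {
        if bc.len() != obs.len() {
            continue; // only compare barcodes of the same length
        }
        let d = hamming(obs, bc);
        match best {
            None => best = Some((i, d)),
            Some((_, bd)) if d < bd => {
                best = Some((i, d));
                tie = false;
            }
            Some((_, bd)) if d == bd => tie = true,
            _ => {}
        }
    }
    match best {
        Some((i, d)) if d <= max_mm && !tie => Some(i),
        _ => None,
    }
}

fn main() {
    let whitelist = [&b"AAAACCCC"[..], &b"GGGGTTTT"[..]];
    println!("{:?}", match_barcode(b"AAAACCCA", &whitelist, 1));
}
```

Rejecting ties avoids assigning an ambiguous read to an arbitrary cell. The linear scan is O(whitelist size) per read; a real tool would more likely precompute a hash map from barcodes (and their one-mismatch neighbors) to cells.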
I normally see up to 80% PCR duplicates in the data, so I am not sure whether trying to catch each and every read is even worth it. I would not expect the final counts to change in a meaningful way.
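A rough calculation supports this intuition under two simplifying assumptions that are not from the original post: the duplicate rate d is the fraction of reads that are duplicates (d = 1 - M/R for M unique molecules and R reads), and each read is missed independently with probability q. The value q = 0.1 is purely illustrative.

```latex
d = 1 - \frac{M}{R}
\quad\Longrightarrow\quad
\bar{k} = \frac{R}{M} = \frac{1}{1-d} = \frac{1}{1-0.8} = 5

P(\text{molecule with } k \text{ reads is lost}) = q^{k},
\qquad q = 0.1,\ k = 5:\quad q^{5} = 10^{-5}
```

The estimate is optimistic for molecules seen only once (those are lost with probability q), and reads per molecule vary widely, so it only shows that heavily duplicated molecules are robust to a few missed reads.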
@jashapiro I would be interested in any approach for dealing with variable bases or "diversity inserts" in modern (2023) BD Rhapsody data. The library structure is detailed here as well: https://teichlab.github.io/scg_lib_structs/methods_html/BD_Rhapsody.html
The official CWL pipeline has not been too satisfactory for us.
cc @noahcape & @Daniel-Liu-c0deb0t: Could this modern BD Rhapsody data be a use case for seqproc + ANTISEQUENCE? Can we see what would be required to perform this transformation?