Feature request - Rhapsody whole genome sequences about alevin-fry HOT 9 OPEN

combine-lab commented on June 14, 2024

Feature request - Rhapsody whole genome sequences

from alevin-fry.

Comments (9)

rob-p commented on June 14, 2024 1

Hi, with the help of Rob I managed to get a Rust BD Rhapsody analysis tool to work (https://github.com/stela2502/Rustody).

Awesome!

Regarding the actual feature request, could I ask for a bit more clarification? I'm not sure exactly what the requested feature is, or what alevin-fry would be required to do. Specifically, it would be most useful if you could describe what the provided input would be and what the requested output would be in each of these cases, to help us understand the request better. Also, I'm looping in @DongzeHE so he's in the know ;P.

--Rob


jashapiro commented on June 14, 2024 1

I just found this issue when investigating whether we could easily adapt our pipeline to accommodate BD Rhapsody data. It looks like the R1 sequence structure (as described in their doc (pdf)) could be accommodated by salmon --bc-geometry for the "Original" version, but the "Enhanced 3'" files have variable positions (the "diversity insert"), and I am not sure how we would specify those using that flag, if it is possible at all.
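
To make the "variable positions" problem concrete, here is a minimal Python sketch of one way a read with a variable-length prefix could be normalized before a fixed-geometry tool sees it. It is not how salmon, alevin-fry, or the BD pipeline work. The layout it assumes (0 to 3 variable bases, three 9-base cell-label sections separated by two short fixed linkers, then an 8-base UMI) and the linker values used in the usage lines are assumptions that should be checked against BD's documentation. `parse_r1` and `ParsedR1` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ParsedR1(NamedTuple):
    offset: int   # number of variable prefix bases that were skipped
    barcode: str  # the three cell-label sections, concatenated
    umi: str


def parse_r1(read: str, linker1: str, linker2: str,
             cls_len: int = 9, umi_len: int = 8,
             max_offset: int = 3) -> Optional[ParsedR1]:
    """Try each allowed prefix length and accept the first one at which
    both linkers sit exactly where the assumed layout expects them."""
    for offset in range(max_offset + 1):
        p = offset
        cls1 = read[p:p + cls_len]
        p += cls_len
        if read[p:p + len(linker1)] != linker1:
            continue
        p += len(linker1)
        cls2 = read[p:p + cls_len]
        p += cls_len
        if read[p:p + len(linker2)] != linker2:
            continue
        p += len(linker2)
        cls3 = read[p:p + cls_len]
        p += cls_len
        umi = read[p:p + umi_len]
        if len(umi) < umi_len:
            continue  # read too short for this offset
        return ParsedR1(offset, cls1 + cls2 + cls3, umi)
    return None  # no offset produced a consistent layout


# Usage with placeholder linker values (assumed, not taken from this thread):
read = "AC" + "A" * 9 + "GTGA" + "C" * 9 + "GACA" + "T" * 9 + "ACGTACGT" + "TTTT"
parsed = parse_r1(read, linker1="GTGA", linker2="GACA")
print(parsed)
```

The sketch uses exact linker matches and takes the first consistent offset, so it tolerates no sequencing errors in the linkers. Once the offset is known, the barcode and UMI could be rewritten at fixed positions for downstream tools.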


stela2502 commented on June 14, 2024

Hi Rob, what I would dream up is a way to use (most likely) salmon here https://github.com/stela2502/Rustody/blob/8befa2e774caba0f5037b57ceb23bac7d18bac8d/src/bin/quantify_rhapsody.rs#L348.
I have seen that salmon is actually a C++ library. That point in my program is where I have tried to match the R2 read to any gene the tool knows of. As there was no hit, it should now look genome-wide. And I would like to NOT implement a genome-wide search :-D
As I know of no Rust library that could help me here, I tried to think further and store my gene data as some kind of index file, but failed horribly. I am not even able to read the data I wrote before :-(. The test here https://github.com/stela2502/Rustody/blob/4f36750ceaf8068c90813a94182f5f6a0f381d0e/this/src/geneids.rs#L580 fails. It seems that (1) the kmer (u64) ids read from the file are not the same as the ones I wrote and (2) I am also unable to read any gene name back (not UTF-8 formatted), although my Linux system shows the gene names just fine with both zcat and vim.
I tried yesterday and even asked ChatGPT, but could not fix that. Possibly you can spot my error immediately? If you can, please help me. Otherwise I will pause for some time now. It would be cool to get a genome-wide mapper in Rust, but I do not have the time for something like that at the moment. I have never done that either, so it would be quite a mission for me. If you (or anybody else) are interested in that, I would be very happy for any help I can get.
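
For the index round-trip problem described above, the usual things to check are a fixed byte order for the integers, an explicit length prefix for each name, and reading through the same decompressor that wrote the file. Since zcat displays the names, the file is presumably gzip-compressed; reading its raw bytes without decompressing would produce both symptoms, but that is only a guess and is not confirmed against the Rustody code. The following is a minimal Python sketch of a record format that round-trips. The format and the names `write_index` and `read_index` are hypothetical and are not Rustody's.

```python
import gzip
import struct


def write_index(path, entries):
    """Write (kmer_id, gene_name) pairs as gzip-compressed binary records.

    Layout (all little-endian): u64 record count, then per record a
    u64 kmer id, a u32 name length in bytes, and the UTF-8 name bytes.
    """
    with gzip.open(path, "wb") as fh:
        fh.write(struct.pack("<Q", len(entries)))
        for kmer_id, name in entries:
            raw = name.encode("utf-8")
            fh.write(struct.pack("<QI", kmer_id, len(raw)))
            fh.write(raw)


def read_index(path):
    """Read the records back, decompressing with gzip first."""
    with gzip.open(path, "rb") as fh:
        (count,) = struct.unpack("<Q", fh.read(8))
        entries = []
        for _ in range(count):
            # "<QI" is 8 + 4 = 12 bytes with no padding
            kmer_id, name_len = struct.unpack("<QI", fh.read(12))
            name = fh.read(name_len).decode("utf-8")
            entries.append((kmer_id, name))
        return entries
```

The same three points carry over to Rust: pick `to_le_bytes`/`from_le_bytes` (or the `_be_` variants) consistently on both sides, store each name's byte length before the name, and wrap the reader in a gzip decoder if the writer used a gzip encoder.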


stela2502 commented on June 14, 2024

So, to sum up the long comment once more: I need to somehow map the R2 read to a genome-wide index, and I do not have the time to implement that, as I fail at the most basic stuff. I am not a trained informatics guy after all :-(
I would like to use whatever alevin-fry uses. I fear this will be complicated, as the mapper is coded in C++ (if I am not mistaken). Hence I am also thinking about implementing genome-wide mapping functionality myself, but I fear that is too complicated for me.


rob-p commented on June 14, 2024

Hi @jashapiro,

Is your use case single-cell transcriptomics? We are currently working on a "general" solution to such problems, which arise with increasingly complicated barcoding mechanisms. Currently, the simpleaf -> alevin-fry pipeline has somewhat more generic support due to its ability to specify the geometry with the fragment geometry description language. However, even more involved solutions are necessary in some cases. Our general-purpose approach isn't ready yet but, in the meantime, might it be possible to use a tool like Interstellar to transform the data into an appropriately "normalized" format prior to processing with existing single-cell tools?


stela2502 commented on June 14, 2024

Hi, in the end I just (re-)implemented a whole-genome-enabled mapper. It now uses a u16 representation of an 8 bp fragment of the read to identify the most likely region in a u16::MAX-long vector of 8 bp-32 bp downstream mappers. This works for the targeted approach and should scale up to the whole genome. You can look at it here: https://github.com/stela2502/Rustody/blob/new_mapper/this/src/fast_mapper.rs.
For the cell barcodes I simply use a partial match and take the highest probability that a sequence links to one cell. I do not use a full-length match, as that would also cause issues with sequencing errors. This then allows for some fuzziness in the matching regions, too. From a first glance at Interstellar: are you sure it implements a way to convert from a variable to a fixed format?
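
The u16 lookup idea can be sketched in a few lines. This is an illustration of the general technique (2 bits per base, so an 8-mer fits in 16 bits and can index a table directly), written in Python for brevity. It is not the code in fast_mapper.rs, and the function names and toy sequences are hypothetical. One detail worth noting: there are 4^8 = 65536 distinct 8-mers, so a directly indexed table needs u16::MAX + 1 slots.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base


def encode_8mer(seq: str) -> Optional[int]:
    """Pack an 8-base string into a 16-bit integer; None if it has non-ACGT bases."""
    if len(seq) != 8:
        return None
    value = 0
    for base in seq:
        code = ENC.get(base)
        if code is None:
            return None
        value = (value << 2) | code
    return value  # always in 0..65535


def build_index(genes: Dict[str, str]) -> List[List[Tuple[str, int]]]:
    """One bucket per possible 8-mer, holding (gene, position) occurrences."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(1 << 16)]
    for name, seq in genes.items():
        for i in range(len(seq) - 7):
            key = encode_8mer(seq[i:i + 8])
            if key is not None:
                buckets[key].append((name, i))
    return buckets


def candidates(buckets, read: str):
    """Rank genes by how many 8-mers of the read they share."""
    hits = Counter()
    for i in range(len(read) - 7):
        key = encode_8mer(read[i:i + 8])
        if key is not None:
            for name, _pos in buckets[key]:
                hits[name] += 1
    return hits.most_common()


# Toy usage with made-up sequences:
index = build_index({"g1": "ACGTACGTTTGACCA", "g2": "GGGGGGGGGGGGCCCC"})
print(candidates(index, "CGTACGTTTG"))
```

On a whole genome most 8-mers occur very many times, so the buckets mainly narrow down candidate regions; a longer downstream comparison (the 8 bp-32 bp step mentioned above) would still be needed to pick the final hit.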


stela2502 commented on June 14, 2024

I normally see up to 80% PCR duplicates in the data. So I am not sure if thinking about catching each and every read is even worth it. I would not assume that the final counts change in a meaningful way.
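
The reasoning here can be illustrated with a small sketch: if counts are taken per unique (cell, gene, UMI) combination, extra reads carrying an already-seen UMI add nothing, so losing some of them leaves the count unchanged. This is a simplified exact-match deduplication with no UMI error correction. It is an assumption for illustration and not the method used by alevin-fry or Rustody; `umi_counts` is a hypothetical name.

```python
from collections import defaultdict


def umi_counts(records):
    """Count unique UMIs per (cell, gene) from (cell, gene, umi) read records."""
    seen = set()
    counts = defaultdict(int)
    for cell, gene, umi in records:
        key = (cell, gene, umi)
        if key not in seen:  # PCR duplicates share the same key
            seen.add(key)
            counts[(cell, gene)] += 1
    return dict(counts)


# Five reads, but only two distinct molecules for (cell1, Actb):
reads = [
    ("cell1", "Actb", "AAAACCCC"),
    ("cell1", "Actb", "AAAACCCC"),
    ("cell1", "Actb", "AAAACCCC"),
    ("cell1", "Actb", "GGGGTTTT"),
    ("cell1", "Actb", "GGGGTTTT"),
]
print(umi_counts(reads))      # all reads mapped
print(umi_counts(reads[1:]))  # one duplicate read lost: same result
```

The count only changes when every read of a given molecule is lost, which becomes less likely the higher the duplication rate is.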


colindaven commented on June 14, 2024

@jashapiro I would be interested in any approach for dealing with variable bases or "diversity inserts" in modern (2023) BD Rhapsody data. The library structure is detailed here as well: https://teichlab.github.io/scg_lib_structs/methods_html/BD_Rhapsody.html

The official CWL pipeline has not been too satisfactory for us.


rob-p commented on June 14, 2024

cc @noahcape & @Daniel-Liu-c0deb0t: Could this modern BD Rhapsody data be a use case for seqproc+ANTISEQUENCE? Can we see what would be required to perform this transformation?
