Comments (13)

alexomics commented on August 22, 2024

This is something that we want to incorporate. It will definitely make downstream analysis easier!

from readfish.

wdecoster commented on August 22, 2024

So right now that's not yet possible?
Then we will put our targets in the TOML file as an array. Do we then use coordinates in chr1:123456-234567 format?

mattloose commented on August 22, 2024

It's almost possible... You can give a file as the parameter for a target:

e.g.

targets = "/path/to/your/targets/cancer_panel_and_MHC_targets.txt"

And that file will be of the following form:

chrX,71234624,71259140,+
chr19,53516024,53585269,+
chr12,6661648,6694510,+
chr19,21500564,21543078,+
chr7,57114614,57144864,+
chr18,25056926,25357152,+
chr22,28878592,29062487,+
chrX,15785472,15828260,+
chr6,28452797,33473354,+
chr6,28452797,33473354,-

So it isn't (quite) a BED file yet, but it's close. Note the strand information: each line is chr,start,stop,strand

alexomics commented on August 22, 2024

So currently, the targets are only in the formats:

chr1

or

chr1,0,1000,+

More generally, a target should be either an entire record (just the contig name) or coordinates on a record: contig,start,stop,strand

These should fit the patterns:

"^[^,]+$"
"^.+,[0-9]+,[0-9]+,[+-]$"
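For anyone who wants to check a targets file before a run, here is a minimal Python sketch that tests each line against the two patterns quoted above. The function names `is_valid_target` and `check_targets` are made up for illustration and are not part of readfish; how readfish itself validates targets may differ.

```python
import re

# The two patterns quoted in the comment above.
CONTIG_ONLY = re.compile(r"^[^,]+$")
CONTIG_COORDS = re.compile(r"^.+,[0-9]+,[0-9]+,[+-]$")


def is_valid_target(line: str) -> bool:
    """Return True if the line matches either target pattern."""
    line = line.strip()
    return bool(CONTIG_ONLY.match(line) or CONTIG_COORDS.match(line))


def check_targets(lines):
    """Return (line_number, text) for every non-empty line that fails both patterns."""
    return [
        (number, text.strip())
        for number, text in enumerate(lines, start=1)
        if text.strip() and not is_valid_target(text)
    ]
```

One thing to watch: a string like chr1:10000-20000 contains no commas, so it matches the first pattern and these patterns alone would accept it as a contig name rather than flag it as a malformed region.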

mattloose commented on August 22, 2024

Like what Alex said ;-) - But noting that you can pass this in as a file.

wdecoster commented on August 22, 2024

Alright, we can work with that! Thanks!

You had me confused for a bit about why we need to specify the strand. Won't the alignment consider the reverse complement by default?

mattloose commented on August 22, 2024

The aligner does, but this is to capture the case where you are targeting, say, a 10 kb region with reads of mean length 10 kb. You might want to offset your target regions so that you capture any read that starts within 5 kb upstream of your target. So your target might be:

chr1:10000-20000

But you might target:

chr1,5000,20000,+
chr1,10000,25000,-

This would capture reads that have some probability of extending into your region of interest.
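As a rough sketch of that offsetting in Python (the helper name `stranded_targets` and the `pad` parameter are hypothetical, not part of readfish): the + strand target is extended towards lower coordinates and the - strand target towards higher coordinates, since "upstream" is in opposite directions on the two strands.

```python
def stranded_targets(contig, start, end, pad):
    """Build one target line per strand, each padded on its upstream side.

    Assumption: `pad` is roughly how far upstream a read may start and still
    reach the region. The padded end is not clamped to the contig length.
    """
    plus_start = max(0, start - pad)  # + strand reads run towards higher coordinates
    minus_end = end + pad             # - strand reads run towards lower coordinates
    return [
        f"{contig},{plus_start},{end},+",
        f"{contig},{start},{minus_end},-",
    ]


for line in stranded_targets("chr1", 10000, 20000, 5000):
    print(line)
# chr1,5000,20000,+
# chr1,10000,25000,-
```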

Does that make sense?

It may be over-engineered right now, which is why we are going to make it BED format compatible (possibly with automatic self-correcting strand offsetting - PASCSO?! - down the line!).

wdecoster commented on August 22, 2024

Ah gotcha. I already padded the intervals to take those flanking-potentially-on-target reads into account, but indeed not strand specific.

alexomics commented on August 22, 2024

@wdecoster I've added some description to the TOML.md file. Does this make things clearer?

wdecoster commented on August 22, 2024

Yep, all clear now!

ythuang0522 commented on August 22, 2024

@mattloose We are still not quite sure about this BED-like formatting. Our interpretation is that for any given target gene, e.g. chrX,71234624,71259140, we must list the coordinates twice in the target file, once per strand, even without any extension, i.e.,

chrX,71234624,71259140,+
chrX,71234624,71259140,-

Without this duplication we will miss the alignments on one strand. Is that true?

alexomics commented on August 22, 2024

@ythuang0522 Yes, to cover both strands you must specify both + and -.
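If your regions already live in a BED file, a small Python sketch along these lines could write out both strands for each region. The function `bed_to_targets` is a made-up helper, not a readfish feature, and it assumes tab-separated BED lines whose first three columns are chrom, start and end. It copies the coordinates unchanged, because the thread does not say how the coordinate conventions of the two formats relate.

```python
def bed_to_targets(bed_lines):
    """Turn BED-style lines into contig,start,stop,strand lines for both strands."""
    targets = []
    for raw in bed_lines:
        line = raw.strip()
        # Skip blank lines and common BED header or comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # Raises ValueError if a line has fewer than three columns.
        chrom, start, end = line.split("\t")[:3]
        for strand in "+-":
            targets.append(f"{chrom},{int(start)},{int(end)},{strand}")
    return targets


for line in bed_to_targets(["chrX\t71234624\t71259140\tGENE"]):
    print(line)
# chrX,71234624,71259140,+
# chrX,71234624,71259140,-
```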

ythuang0522 commented on August 22, 2024

Thank you.
