Giter Site home page Giter Site logo

Comments (5)

mktrost avatar mktrost commented on August 13, 2024

It was not something I was planning, but if it would be useful, it
shouldn't be too difficult to add it.

Just a few questions (probably the hardest part of implementing it):

Would both ends as well as any unpaired reads (if any) all get written to 1
file in this case?
Based on my quick read of the bwa page, I think this is a no.
Would a user want the unpaired reads at all?
Should there be an option to disable outputing unpaired reads?

Filename extension question:
Typically the naming convention is:
first in pair is outBase_1.fastq
2nd in pair is outBase_2.fastq
unpaired in pair is outBase.fastq
What would you want the naming convention to be for the interleaved file
(if not written to stdout)?

I think that it makes sense to always write the first in pair first. The
logic already supports this.

Would you like a flag, like --interleave? I'm open to flag name
suggestions.
Or would you rather just specify --firstOut & --secondOut to be the same
filename?
Either way should be easy enough to implement (and maybe both should be
allowed).

I think for implementation, it would be checking the flag/filenames and if
any are the same, then only call ifopen once and set the pointers, then
only close once.

Thanks for the recommendation. Below I have some roughed out code that I
am currently thinking might work. That way if I go to implement it later I
remember what I am thinking now or if you want to implement it, you can see
what I'm thinking.

Mary Kate

If you just do --firstOut - --secondOut -
and add the following code, it might work

Instead of the current ifopen logic, use:

myUnpairedFile = ifopen(unpairedOut, "w");
if(myFirstFile == myUnpairedFile)
{
    myFirstFile = myUnpairedFile;
}
else
{
    myFirstFile = ifopen(firstOut, "w");
}
if(mySecondFile == myUnpairedFile)
{
    mySecondFile = myUnpairedFile;
}
else if(mySecondFile == myFirstFile)
{
    mySecondFile = myFirstFile;
}
else
{
    mySecondFile = ifopen(secondOut, "w");
}

Or if it never makes sense to interleave unpaired, then just check the
first & second files against each other.

Then I'd make a closeFiles() method:

if((mySecondFile != myFirstFile) && (mySecondFile != myUnpairedFile) &&
(mySecondFile != NULL))
{
ifclose(mySecondFile);
mySecondFile = NULL;
}
if((myFirstFile != myUnpairedFile) && (mySecondFile != NULL))
{
ifclose(myFirstFile);
myFirstFile = NULL;
}
if(myUnpairedFile != NULL)
{
ifclose(myUnpairedFile);
myUnpairedFile = NULL;
}

Then call closeFiles() in the destructor and at the end of execute rather
than manually closing each one.

On Fri, Sep 20, 2013 at 3:15 PM, Alex Lancaster [email protected]:

Any chance of adding the option to bam2FastQ to generate an interleaved
FASTQ file for paired-end reads? Currently it defaults to 2 FASTQ files.
Some tools (e.g. bwa) will handle this form of data, e.g. the "-p" option
for the "bwa mem" command (http://bio-bwa.sourceforge.net/bwa.shtml). It
also makes it a lot easier to pipe the output from bam2FastQ directly into
bwa without writing intermediate files. If it's not a feature that's
currently planned, I'd be willing to look into what would be required to
create a patch. Thanks!


Reply to this email directly or view it on GitHubhttps://github.com//issues/10
.

from bamutil.

alexlancaster avatar alexlancaster commented on August 13, 2024

I think it would have to put unpaired reads in perhaps a separate file by default, maybe add an option to append them? For naming convention maybe outBase_interleaved.fastq by default? Yes, definitely the first pair first.

I am also testing "htscmd bam2fq" which is part of https://github.com/samtools/htslib which also generates interleaved FASTQ, but in testing it seems it does not necessarily interleave them using the (2i), (2i+1) convention as outlined in the "bwa mem" docs. In other words, if I use "bam2FastQ", I get the following:

$ grep '^\@F' 111582.test_1.fastq |head -2
@FCC0837ACXX:5:2307:12026:198869#/1
@FCC0837ACXX:5:1202:10619:55648#/1
$ grep '^\@F' 111582.test_2.fastq |head -2
@FCC0837ACXX:5:2307:12026:198869#/2
@FCC0837ACXX:5:1202:10619:55648#/2

whereas using the output from "htscmd bam2fq -a" on the same original bam I get this:

$ grep '^\@F' 111582.test.sorted.interleaved.fastq |head -4
@FCC0837ACXX:5:1307:4114:17474#/2
@FCD0JP0ACXX:6:1101:11361:95577#/1
@FCC0837ACXX:5:1101:8706:79404#/2
@FCD0JP0ACXX:6:1104:14809:174913#/2

so ideally Bam2FastQ would interleave them correctly.

from bamutil.

mktrost avatar mktrost commented on August 13, 2024

Ok,

I have updated bam2Fastq. Can you do a "git pull" to pull in the latest
code?

I added a --merge option. (--interleave didn't work due to a naming
conflict with the --in option).
It produces a file with _interleaved.fastq extension.

You can change the name of the interleaved file by using --firstOut.

bin/bam bam2FastQ --in input.sam --merge
Should create: input.fastq (unpaired) & input_interleaved.fastq (1st & 2nd
in pair)

So if you want interleaved written to stdout:

bin/bam bam2FastQ --in input.sam --firstOut - --merge

It will write input.fastq containing the unpaired files.

Let me know if this works for you, if you have any questions, or
suggestions to improve this new feature.

Mary Kate

On Thu, Sep 26, 2013 at 4:46 PM, Alex Lancaster [email protected]:

I think it would have to put unpaired reads in perhaps a separate file by
default, maybe add an option to append them? For naming convention maybe
outBase_interleaved.fastq by default? Yes, definitely the first pair first.

I am also testing "htscmd bam2fq" which is part of
https://github.com/samtools/htslib which also generates interleaved
FASTQ, but in testing it seems it does not necessarily interleave them
using the (2i), (2i+1) convention as outlined in the "bwa mem" docs. In
other words, if I use "bam2FastQ", I get the following:

$ grep '^@f' 111582.test_1.fastq |head -2
@FCC0837ACXX:5:2307:12026:198869#/1
@FCC0837ACXX:5:1202:10619:55648#/1
$ grep '^@f' 111582.test_2.fastq |head -2
@FCC0837ACXX:5:2307:12026:198869#/2
@FCC0837ACXX:5:1202:10619:55648#/2

whereas using the output from "htscmd bam2fq -a" on the same original bam
I get this:

$ grep '^@f' 111582.test.sorted.interleaved.fastq |head -4
@FCC0837ACXX:5:1307:4114:17474#/2
@FCD0JP0ACXX:6:1101:11361:95577#/1
@FCC0837ACXX:5:1101:8706:79404#/2
@FCD0JP0ACXX:6:1104:14809:174913#/2

so ideally Bam2FastQ would interleave them correctly.


Reply to this email directly or view it on GitHubhttps://github.com//issues/10#issuecomment-25202307
.

from bamutil.

alexlancaster avatar alexlancaster commented on August 13, 2024

Great!!

Sorry for only getting back to this now, so the new option is --merge? I think it seems to be working for us now. Let me run some more tests then I'll close.

from bamutil.

mktrost avatar mktrost commented on August 13, 2024

Correct, it is called "--merge" (I couldn't use "--interleave" because it
caused confusion with the "--in" parameter.)
Yes, please let me know how your testing goes.
Also, let me know if you need/notice anything else.

On Fri, Mar 14, 2014 at 4:21 PM, Alex Lancaster [email protected]:

Great!!

Sorry for only getting back to this now, so the new option is --merge? I
think it seems to be working for us now. Let me run some more tests then
I'll close.

Reply to this email directly or view it on GitHubhttps://github.com//issues/10#issuecomment-37691165
.

from bamutil.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.