Comments (5)
It was not something I was planning, but if it would be useful, it
shouldn't be too difficult to add it.
Just a few questions (probably the hardest part of implementing it):
Would both ends as well as any unpaired reads (if any) all get written to 1
file in this case?
Based on my quick read of the bwa page, I think this is a no.
Would a user want the unpaired reads at all?
Should there be an option to disable outputing unpaired reads?
Filename extension question:
Typically the naming convention is:
first in pair is outBase_1.fastq
2nd in pair is outBase_2.fastq
unpaired in pair is outBase.fastq
What would you want the naming convention to be for the interleaved file
(if not written to stdout)?
I think that it makes sense to always write the first in pair first. The
logic already supports this.
Would you like a flag, like --interleave? I'm open to flag name
suggestions.
Or would you rather just specify --firstOut & --secondOut to be the same
filename?
Either way should be easy enough to implement (and maybe both should be
allowed).
I think for implementation, it would be checking the flag/filenames and if
any are the same, then only call ifopen once and set the pointers, then
only close once.
Thanks for the recommendation. Below I have some roughed out code that I
am currently thinking might work. That way if I go to implement it later I
remember what I am thinking now or if you want to implement it, you can see
what I'm thinking.
Mary Kate
If you just do --firstOut - --secondOut -
and add the following code, it might work
Instead of the current ifopen logic, use:
myUnpairedFile = ifopen(unpairedOut, "w");
if(myFirstFile == myUnpairedFile)
{
myFirstFile = myUnpairedFile;
}
else
{
myFirstFile = ifopen(firstOut, "w");
}
if(mySecondFile == myUnpairedFile)
{
mySecondFile = myUnpairedFile;
}
else if(mySecondFile == myFirstFile)
{
mySecondFile = myFirstFile;
}
else
{
mySecondFile = ifopen(secondOut, "w");
}
Or if it never makes sense to interleave unpaired, then just check the
first & second files against each other.
Then I'd make a closeFiles() method:
if((mySecondFile != myFirstFile) && (mySecondFile != myUnpairedFile) &&
(mySecondFile != NULL))
{
ifclose(mySecondFile);
mySecondFile = NULL;
}
if((myFirstFile != myUnpairedFile) && (mySecondFile != NULL))
{
ifclose(myFirstFile);
myFirstFile = NULL;
}
if(myUnpairedFile != NULL)
{
ifclose(myUnpairedFile);
myUnpairedFile = NULL;
}
Then call closeFiles() in the destructor and at the end of execute rather
than manually closing each one.
On Fri, Sep 20, 2013 at 3:15 PM, Alex Lancaster [email protected]:
Any chance of adding the option to bam2FastQ to generate an interleaved
FASTQ file for paired-end reads? Currently it defaults to 2 FASTQ files.
Some tools (e.g. bwa) will handle this form of data, e.g. the "-p" option
for the "bwa mem" command (http://bio-bwa.sourceforge.net/bwa.shtml). It
also makes it a lot easier to pipe the output from bam2FastQ directly into
bwa without writing intermediate files. If it's not a feature that's
currently planned, I'd be willing to look into what would be required to
create a patch. Thanks!—
Reply to this email directly or view it on GitHubhttps://github.com//issues/10
.
from bamutil.
I think it would have to put unpaired reads in perhaps a separate file by default, maybe add an option to append them? For naming convention maybe outBase_interleaved.fastq by default? Yes, definitely the first pair first.
I am also testing "htscmd bam2fq" which is part of https://github.com/samtools/htslib which also generates interleaved FASTQ, but in testing it seems it does not necessarily interleave them using the (2i), (2i+1) convention as outlined in the "bwa mem" docs. In other words, if I use "bam2FastQ", I get the following:
$ grep '^\@F' 111582.test_1.fastq |head -2
@FCC0837ACXX:5:2307:12026:198869#/1
@FCC0837ACXX:5:1202:10619:55648#/1
$ grep '^\@F' 111582.test_2.fastq |head -2
@FCC0837ACXX:5:2307:12026:198869#/2
@FCC0837ACXX:5:1202:10619:55648#/2
whereas using the output from "htscmd bam2fq -a" on the same original bam I get this:
$ grep '^\@F' 111582.test.sorted.interleaved.fastq |head -4
@FCC0837ACXX:5:1307:4114:17474#/2
@FCD0JP0ACXX:6:1101:11361:95577#/1
@FCC0837ACXX:5:1101:8706:79404#/2
@FCD0JP0ACXX:6:1104:14809:174913#/2
so ideally Bam2FastQ would interleave them correctly.
from bamutil.
Ok,
I have updated bam2Fastq. Can you do a "git pull" to pull in the latest
code?
I added a --merge option. (--interleave didn't work due to a naming
conflict with the --in option).
It produces a file with _interleaved.fastq extension.
You can change the name of the interleaved file by using --firstOut.
bin/bam bam2FastQ --in input.sam --merge
Should create: input.fastq (unpaired) & input_interleaved.fastq (1st & 2nd
in pair)
So if you want interleaved written to stdout:
bin/bam bam2FastQ --in input.sam --firstOut - --merge
It will write input.fastq containing the unpaired files.
Let me know if this works for you, if you have any questions, or
suggestions to improve this new feature.
Mary Kate
On Thu, Sep 26, 2013 at 4:46 PM, Alex Lancaster [email protected]:
I think it would have to put unpaired reads in perhaps a separate file by
default, maybe add an option to append them? For naming convention maybe
outBase_interleaved.fastq by default? Yes, definitely the first pair first.I am also testing "htscmd bam2fq" which is part of
https://github.com/samtools/htslib which also generates interleaved
FASTQ, but in testing it seems it does not necessarily interleave them
using the (2i), (2i+1) convention as outlined in the "bwa mem" docs. In
other words, if I use "bam2FastQ", I get the following:$ grep '^@f' 111582.test_1.fastq |head -2
@FCC0837ACXX:5:2307:12026:198869#/1
@FCC0837ACXX:5:1202:10619:55648#/1
$ grep '^@f' 111582.test_2.fastq |head -2
@FCC0837ACXX:5:2307:12026:198869#/2
@FCC0837ACXX:5:1202:10619:55648#/2whereas using the output from "htscmd bam2fq -a" on the same original bam
I get this:$ grep '^@f' 111582.test.sorted.interleaved.fastq |head -4
@FCC0837ACXX:5:1307:4114:17474#/2
@FCD0JP0ACXX:6:1101:11361:95577#/1
@FCC0837ACXX:5:1101:8706:79404#/2
@FCD0JP0ACXX:6:1104:14809:174913#/2so ideally Bam2FastQ would interleave them correctly.
—
Reply to this email directly or view it on GitHubhttps://github.com//issues/10#issuecomment-25202307
.
from bamutil.
Great!!
Sorry for only getting back to this now, so the new option is --merge
? I think it seems to be working for us now. Let me run some more tests then I'll close.
from bamutil.
Correct, it is called "--merge" (I couldn't use "--interleave" because it
caused confusion with the "--in" parameter.)
Yes, please let me know how your testing goes.
Also, let me know if you need/notice anything else.
On Fri, Mar 14, 2014 at 4:21 PM, Alex Lancaster [email protected]:
Great!!
Sorry for only getting back to this now, so the new option is --merge? I
think it seems to be working for us now. Let me run some more tests then
I'll close.Reply to this email directly or view it on GitHubhttps://github.com//issues/10#issuecomment-37691165
.
from bamutil.
Related Issues (20)
- Building bamUtil returns error: undefined reference HOT 4
- Cut middle of reads HOT 2
- `bam diff` should exit with a return code
- Dedup does not work from std/in HOT 2
- Cannot Compile NonPrimaryDedup Branch HOT 5
- cannot consume -.ubam on a pipe HOT 2
- Makefile should not ignore system LDFLAGS
- `bam diff` seems to return the reads that look identical between A and B HOT 1
- the clang compiler does not support -pg option on versions of OS X 10.9 and later HOT 4
- Error building from source: libStatGen HOT 3
- BamUtil: recab error: failed to open reference genome HOT 6
- bam2FastQ generates empty fastq files
- bash -x testBam2FastQ.sh and bash -x testMergeBam.sh failed on centos8_aarch64
- can not install with conda HOT 2
- bamutil diff - Get record for every read being compared HOT 2
- build is broken due to deprecated git protocol HOT 1
- Odd stats reported for clipOverlap
- License? HOT 1
- splitChromosome without header?
- Erroneous CIGAR string after clipOverlap HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bamutil.