
Merging paired-end reads and removing adapters

License: MIT License

Makefile 0.11% C 65.16% Python 34.73%


NGmerge: merging paired-end reads and removing sequencing adapters

Gaspar JM. BMC Bioinformatics. 2018 Dec 20;19(1):536.

Introduction
Alignment method
Stitch mode
Adapter-removal mode
- I/O files and options
- Alignment parameters
Miscellaneous
Contact

Introduction

NGmerge operates on paired-end high-throughput sequence reads in two distinct modes (Fig. 1).

In the default stitch mode, NGmerge combines paired-end reads that overlap into a single read that spans the full length of the original DNA fragment (Fig. 1A). The ends of the merged read are defined by the 5' ends of the original reads. Reads that fail the stitching process (due to a lack of sufficient overlap, or excessive sequencing errors) are placed into secondary output files, if the user requires them.

The alternative adapter-removal mode returns the original reads as pairs, removing the 3' overhangs of those reads whose valid stitched alignment has this characteristic (Fig. 1B). Reads whose alignments do not have such overhangs (or do not align at all) will also be printed to the output files, unmodified.

Figure 1. Analysis modes of NGmerge. The diagrams show the paired-end reads (R1, R2) derived from sequencing DNA fragments (white boxes) with sequencing adapters (gray boxes) on either end.

Quick start

Given:

sample_R1.fastq.gz, sample_R2.fastq.gz (paired-end sequence files for a sample)
NGmerge (downloaded and compiled as described below)

To produce stitched reads (Fig. 1A): sample_merged.fastq.gz

$ ./NGmerge  -1 sample_R1.fastq.gz  -2 sample_R2.fastq.gz  -o sample_merged.fastq.gz

To produce reads with adapters removed (Fig. 1B): sample_noadapters_1.fastq.gz and sample_noadapters_2.fastq.gz

$ ./NGmerge  -a  -1 sample_R1.fastq.gz  -2 sample_R2.fastq.gz  -o sample_noadapters

Software compilation

The software can be downloaded from GitHub (and you're already here, congratulations!).

A Makefile is provided for compilation with GCC, and both zlib and OpenMP are also required. The program has been tested after compilation with GCC 6.3.0, zlib 1.2.8, and OpenMP 4.0.

To compile, run make in the folder in which the software was downloaded. The executable NGmerge should be produced.

Usage message

Usage: ./NGmerge {-1 <file> -2 <file> -o <file>}  [optional arguments]
Required arguments:
  -1  <file>       Input FASTQ file with reads from forward direction
  -2  <file>       Input FASTQ file with reads from reverse direction
  -o  <file>       Output FASTQ file(s):
                   - in 'stitch' mode (def.), the file of merged reads
                   - in 'adapter-removal' mode (-a), the output files
                     will be <file>_1.fastq and <file>_2.fastq
Alignment parameters:
  -m  <int>        Minimum overlap of the paired-end reads (def. 20)
  -p  <float>      Mismatches to allow in the overlapped region
                     (a fraction of the overlap length; def. 0.10)
  -a               Use 'adapter-removal' mode (also sets -d option)
  -d               Option to check for dovetailing (with 3' overhangs)
  -e  <int>        Minimum overlap of dovetailed alignments (def. 50)
  -s               Option to produce shortest stitched read
I/O options:
  -l  <file>       Log file for stitching results of each read pair
  -f  <file>       FASTQ files for reads that failed stitching
                     (output as <file>_1.fastq and <file>_2.fastq)
  -c  <file>       Log file for dovetailed reads (adapter sequences)
  -j  <file>       Log file for formatted alignments of merged reads
  -z/-y            Option to gzip (-z) or not (-y) FASTQ output(s)
  -i               Option to produce interleaved FASTQ output(s)
  -w  <file>       Use given error profile for merged qual scores
  -g               Use 'fastq-join' method for merged qual scores
  -q  <int>        FASTQ quality offset (def. 33)
  -u  <int>        Maximum input quality score (0-based; def. 40)
  -t  <char>       Delimiter for headers of paired reads (def. ' ')
  -n  <int>        Number of threads to use (def. 1)
  -v               Option to print status updates/counts to stderr

Alignment method

In either analysis mode (Fig. 1), NGmerge evaluates all possible gapless alignments of a pair of reads in attempting to find an optimal one. The determinations of which alignments are considered, and then which alignment (if any) is both valid and optimal, are made according to several parameters: -m, -p, -d, -e, and -s.

NGmerge begins by aligning a pair of reads (R1, R2) such that the minimum overlap parameter (-m, default 20bp) is met. It then checks each possible alignment of the reads until they overlap with no 3' overhangs (Fig. 2A). If the -d option is selected (or in adapter-removal mode [-a, which automatically sets -d]), NGmerge additionally evaluates dovetailed alignments (with 3' overhangs), down to the minimum length set by the -e parameter (Fig. 2B).

Figure 2. Alignments considered by NGmerge. A: Default alignments range from those with the minimal overlap length (set by -m), to complete overlaps with no overhangs. B: When the -d option is selected, NGmerge also evaluates dovetailed alignments.

For each alignment, NGmerge computes the fraction mismatch (the number of mismatches between the R1 and R2 reads, divided by the overlap length). Alignments with calculated values no more than the threshold set by the -p parameter (default 0.10) are considered valid. If multiple valid alignments are found, the one with the lowest fraction mismatch is selected as the optimal alignment. In rare cases where multiple alignments have identical fraction mismatches, the one that yields the longest stitched read is preferred by default (unless -s is set, in which case the shortest is preferred). In all of these calculations, ambiguous bases (Ns) are considered neither matches nor mismatches.
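The selection rule described above can be sketched in Python. This is an illustrative sketch only, not NGmerge's C implementation: the names revcomp and best_alignment are hypothetical, reads are assumed to be upper-case, only alignments without 3' overhangs are tried (no -d behavior), and quality scores are ignored.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence (upper-case A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_alignment(r1: str, r2: str, min_overlap: int = 20,
                   max_mismatch: float = 0.10,
                   prefer_shortest: bool = False
                   ) -> Optional[Tuple[float, int, int]]:
    """Return (fraction_mismatch, stitched_len, offset), or None.

    offset is where the reverse-complemented R2 starts within R1.
    """
    r2rc = revcomp(r2)
    best = None
    for offset in range(len(r1)):
        if len(r1) - offset > len(r2rc):
            continue  # R1 would extend past R2's 5' end (dovetailed)
        # Ns are neither matches nor mismatches, so drop those columns.
        pairs = [(a, b) for a, b in zip(r1[offset:], r2rc)
                 if a != "N" and b != "N"]
        if len(pairs) < max(min_overlap, 1):
            continue  # too few non-N bases overlap (cf. -m)
        frac = sum(a != b for a, b in pairs) / len(pairs)
        if frac > max_mismatch:
            continue  # not a valid alignment (cf. -p)
        stitched_len = offset + len(r2rc)
        if best is None or frac < best[0]:
            best = (frac, stitched_len, offset)
        elif frac == best[0]:
            # Tie: keep the longest stitched read unless prefer_shortest
            # (cf. -s) is set.
            longer = stitched_len > best[1]
            if longer != prefer_shortest:
                best = (frac, stitched_len, offset)
    return best
```

For two 40 bp reads taken from the ends of a 60 bp fragment, the only candidate that reaches the 20 bp minimum without mismatches is the true one, so the function reports a stitched length of 60.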

Further descriptions of these parameters are provided below.

Stitch mode

I/O files and options

Input files

  -1  <file>       Input FASTQ file with reads from forward direction
  -2  <file>       Input FASTQ file with reads from reverse direction

NGmerge analyzes unaligned paired-end reads in FASTQ format. The input files can be gzip-compressed. Multiple sets of input files can be specified, comma-separated (or space-separated, in quotes).

The input files must list the reads in the same order. The program requires that the paired reads' headers match, at least up to the first space character (or whatever alternative character is specified by -t).

An input file of interleaved reads can be analyzed by not specifying a -2 file. Also, it is possible to read from stdin using -, e.g. -1 -.

Since the merged reads are defined by the 5' ends of the paired reads' alignments (Fig. 1A), one should be wary of quality trimming the reads at those ends. For example, when using a program such as qualTrim, one should specify -3 to ensure that quality trimming occurs only at the 3' ends, prior to using NGmerge.

Output files and options

  -o  <file>       Output FASTQ file:
                   - in 'stitch' mode (def.), the file of merged reads

The primary output file in stitch mode is the file of merged reads, in FASTQ format. It is possible to write to stdout with -o - (see also -y, below).

  -f  <file>       FASTQ files for reads that failed stitching
                     (output as <file>_1.fastq and <file>_2.fastq)

When specified, all the reads that failed the merging procedure will be written to the output files, as they appeared in the original inputs.

  -z/-y            Option to gzip (-z) or not (-y) FASTQ output(s)

By default, all FASTQ output files will be gzip-compressed if and only if the input files are (with multiple sets of input files, the outputs will be compressed if either of the first set of inputs is). Specifying -z will guarantee that the outputs are gzip-compressed, whereas -y will guarantee that they are not, regardless of the inputs' formats. Note that all gzip-compressed outputs will automatically have '.gz' appended to their filenames, if necessary.
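The compression rule above amounts to a small decision function. The sketch below is a hypothetical helper (output_name is not part of NGmerge); it assumes -z and -y are mutually exclusive and does not model writing to stdout.

```python
from typing import Tuple


def output_name(path: str, first_inputs_gzipped: Tuple[bool, bool],
                force_gzip: bool = False,
                force_plain: bool = False) -> Tuple[str, bool]:
    """Return (final output path, whether it is gzip-compressed).

    first_inputs_gzipped describes the first set of input files only,
    mirroring the rule for multiple sets of inputs.
    """
    if force_gzip and force_plain:
        # Assumption: the two options are not meant to be combined.
        raise ValueError("choose at most one of -z and -y")
    gzipped = force_gzip or (any(first_inputs_gzipped) and not force_plain)
    if gzipped and not path.endswith(".gz"):
        path += ".gz"  # '.gz' is appended only if necessary
    return path, gzipped
```

With one gzip-compressed input and no -z/-y, output_name("out.fastq", (True, False)) gives a compressed "out.fastq.gz"; passing force_plain=True keeps "out.fastq" uncompressed.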

  -i               Option to produce interleaved FASTQ output(s)

In stitch mode, this applies only to the optional output from -f (above). Instead of two outputs, a single interleaved output will be produced (and no '.fastq' suffix will be appended to the filename).

  -l  <file>       Log file for stitching results of each read pair

This log file lists the following for each read pair in the input file(s):

Read	read header, not including `@`
OverlapLen	total length of the read overlap, including Ns; `NA` if reads were not merged (and remaining columns are left blank)
StitchedLen	total length of the merged read
Mismatch	fraction of mismatched bases (count of mismatches divided by overlap length [not including Ns]); must be less than or equal to `-p` value (see below)
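A log like this is convenient for checking how many read pairs were actually merged. The sketch below is a hypothetical helper that assumes the log is tab-delimited with a header row whose first field is "Read"; neither detail is confirmed here, so check a real log before relying on it.

```python
from typing import Iterable, Tuple


def summarize_stitch_log(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total read pairs, merged read pairs) from -l log lines."""
    total = merged = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0] or fields[0] == "Read":
            continue  # skip blank lines and the assumed header row
        total += 1
        # OverlapLen is 'NA' when the pair was not merged.
        if len(fields) > 1 and fields[1] != "NA":
            merged += 1
    return total, merged
```

It can be applied to an open file handle, e.g. with open("sample.log") as fh: print(summarize_stitch_log(fh)), where sample.log is a placeholder name for the file given to -l.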

  -c  <file>       Log file for dovetailed reads (adapter sequences)

This log file lists the following for each read pair whose optimal valid alignment has 3' overhangs:

Read	read header, not including `@`
Adapter_R1	3' overhang of R1 read; `-` if no overhang
Adapter_R2	3' overhang of R2 read; `-` if no overhang

The columns are labeled 'Adapter' because, if the reads were not trimmed on their 5' ends, these extra sequences should be adapters. If the sequences that appear in the 'Adapter' columns are not consistent, they may be false positives, and one should consider decreasing -p or increasing -e.

  -j  <file>       Log file for formatted alignments of merged reads

For each pair of reads that was successfully merged, this log file lists alignments of the reads' sequences and quality scores, along with the resulting merged sequence and quality scores. For example:

sample_read1.1
seq_R1:  CTCACACTCAATCTTTTATCACGAAGTCATGATTGAATCGCGAGTGGTCG
                       |||| ||||||||||||||| || ||||||||||||
seq_R2:                TTTACCACGAAGTCATGATTAAAGCGCGAGTGGTCGGCAGATTGCGATAA

qual_R1: 1101?B10>F111122BE1B22<EAFC12FB22BFG12>G/<<B>F/11>
qual_R2:               F/F/19B99BFFE;//;//;-----@E;/EA;AA900000:....00:/;

merged
seq:     CTCACACTCAATCTTTTATCACGAAGTCATGATTGAATCGCGAGTGGTCGGCAGATTGCGATAA
qual:    1101?B10>F1111G>HG"GEBFHHHHB>GG>>G?H="DHFFCHGHDDBD0000:....00:/;

Alignment parameters

  -m  <int>        Minimum overlap of the paired-end reads (def. 20)

This is the minimum overlap length (in bp) for valid alignments of a pair of reads (see Fig. 2A). Note that ambiguous bases (Ns) do not count toward this minimum length.

  -p  <float>      Mismatches to allow in the overlapped region
                     (a fraction of the overlap length; def. 0.10)

This parameter determines how stringent the evaluation of an alignment is. The value must be in the interval [0, 1), with lower values equating to increased stringency. Specifying -p 0 means that only perfect alignments (with no mismatches) are valid; the default value of 0.10 means that a valid alignment can have at most 10% mismatches (calculated as the number of mismatches divided by the overlap length [not counting Ns]).

  -d               Option to check for dovetailing (with 3' overhangs)

When this option is selected, alignments in which a read's 3' end extends past its pair's 5' end will be evaluated, down to a minimum length (see Fig. 2B). By default, such alignments are not even considered. Since the merged read is defined by the original reads' 5' ends, the 3' overhangs are automatically removed. These overhangs, which are typically adapters, can be printed to a separate log file (see -c, above).

  -e  <int>        Minimum overlap of dovetailed alignments (def. 50)

This is the minimum overlap length (in bp) for alignments with 3' overhangs (see Fig. 2B). This value should be set to the length of the absolute shortest DNA fragment that may have been sequenced. Using a value that is too low may result in false positives, especially if the reads contain repetitive sequences.

  -s               Option to produce shortest stitched read

Given multiple valid alignments with identical fraction mismatch scores, NGmerge will select the longest stitched read by default. With -s, the shortest stitched read will be preferred instead.

Quality score profile options

By default, NGmerge uses hard-coded profiles when determining the quality scores of overlapping bases. There are separate profiles for cases where the R1 base and the R2 base match, and for when they do not match. Those who do not wish to use these profiles have two alternative options:

  -w  <file>       Use given error profile for merged qual scores

With this option, NGmerge will use the quality score profiles in the provided file. The file must list two matrices of comma- or tab-separated values that follow header lines #match and #mismatch. One should follow the template of the given qual_profile.txt file, which mimics the hard-coded profiles of NGmerge with the quality score range of [0, 40].
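Before passing a custom profile to -w, it may help to check that both matrices are present and large enough. The sketch below is a hypothetical validator, not NGmerge's own parser: it assumes integer values and that each matrix needs one row and one column per quality score in [0, max_qual], which is an inference from the [0, 40] template rather than a documented rule.

```python
import re
from typing import Dict, List


def parse_profile(text: str) -> Dict[str, List[List[int]]]:
    """Parse '#match' / '#mismatch' sections into lists of integer rows."""
    matrices: Dict[str, List[List[int]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line[1:].strip()  # e.g. 'match' or 'mismatch'
            matrices[current] = []
        elif current is not None:
            # Values may be comma- or tab-separated.
            row = [int(v) for v in re.split(r"[,\t]", line) if v.strip()]
            matrices[current].append(row)
    return matrices


def covers_range(matrices: Dict[str, List[List[int]]],
                 max_qual: int = 40) -> bool:
    """Check both matrices span scores 0..max_qual (assumed requirement)."""
    size = max_qual + 1
    return all(
        name in matrices
        and len(matrices[name]) >= size
        and all(len(row) >= size for row in matrices[name])
        for name in ("match", "mismatch")
    )
```

Under that assumption, a profile meant to be used with -u 41 would need 42 rows and 42 columns in each matrix.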

  -g               Use 'fastq-join' method for merged qual scores

With this option, NGmerge will use a method similar to that of the program fastq-join. In cases where the R1 base and R2 base match, the higher quality score is used for the merged base. When they do not match, the merged base's quality score is calculated as the difference in the two quality scores.
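As a minimal sketch of the rule just described (the function name is hypothetical, and taking the absolute value of the difference is an assumption), the merged quality score under -g could be computed as follows. Which base is kept at a mismatch is not stated above, so it is not modeled.

```python
def fastq_join_qual(base1: str, qual1: int, base2: str, qual2: int) -> int:
    """Quality score of a merged base, in the style described for -g."""
    if base1 == base2:
        return max(qual1, qual2)   # match: keep the higher score
    return abs(qual1 - qual2)      # mismatch: difference of the scores
```

For example, matching bases with scores 30 and 20 yield 30, whereas mismatching bases with scores 30 and 12 yield 18.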

Adapter-removal mode

  -a               Use 'adapter-removal' mode (also sets -d option)

This option must be specified for NGmerge to run in adapter-removal mode. As indicated, it automatically sets the -d option to check for dovetailed alignments.

I/O files and options

Input files

The formatting of the input files is described above.

Output files and options

  -o  <file>       Output FASTQ files:
                   - in 'adapter-removal' mode (-a), the output files
                     will be <file>_1.fastq and <file>_2.fastq

In adapter-removal mode, all reads are printed to the output files. The only modifications are the clipping of the 3' overhangs of reads whose alignments have such overhangs.
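The clipping step can be illustrated with a simplified Python sketch. It is not NGmerge's implementation: clip_adapters and revcomp are hypothetical names, only dovetailed and complete-overlap alignments are tried (NGmerge also weighs ordinary overlaps before deciding), reads are assumed to be upper-case, and the -e minimum is applied to the raw overlap length.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence (upper-case A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def clip_adapters(r1: str, r2: str, min_overlap: int = 50,
                  max_mismatch: float = 0.10) -> Tuple[str, str]:
    """Clip 3' overhangs when the fragment is shorter than the reads."""
    r2rc = revcomp(r2)
    best = None  # (fraction_mismatch, fragment_len)
    # Try candidate fragment lengths from longest to shortest (cf. -e).
    for frag_len in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        pairs = [(a, b)
                 for a, b in zip(r1[:frag_len], r2rc[len(r2rc) - frag_len:])
                 if a != "N" and b != "N"]
        if not pairs:
            continue
        frac = sum(a != b for a, b in pairs) / len(pairs)
        # Strict '<' keeps the longer fragment when scores are tied.
        if frac <= max_mismatch and (best is None or frac < best[0]):
            best = (frac, frag_len)
    if best is None:
        return r1, r2  # no valid alignment: reads are left unmodified
    frag_len = best[1]
    # Everything past the fragment length is a 3' overhang.
    return r1[:frag_len], r2[:frag_len]
```

If both reads are 70 bp long but the fragment is only 60 bp, the last 10 bp of each read are overhangs, and the sketch returns both reads trimmed to 60 bp.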

  -i               Option to produce interleaved FASTQ output(s)

With this option, instead of two outputs, a single interleaved output will be produced (and no '.fastq' suffix will be appended to the filename).

  -z/-y            Option to gzip (-z) or not (-y) FASTQ output(s)

These options are described above.

  -c  <file>       Log file for dovetailed reads (adapter sequences)

This log file is described above.

In adapter-removal mode, the following files cannot be produced:

  -f  <file>       FASTQ files for reads that failed stitching
                     (output as <file>_1.fastq and <file>_2.fastq)
  -l  <file>       Log file for stitching results of each read pair
  -j  <file>       Log file for formatted alignments of merged reads

Alignment parameters

These parameters are described above.

As noted previously, the -d option is automatically set in adapter-removal mode.

Miscellaneous

  -n  <int>        Number of threads to use (def. 1)

To reduce computational time, one can run NGmerge across multiple cores via this option. Note that gzip compression and decompression are not parallelized, so the computational savings are not linear.

  -q  <int>        FASTQ quality offset (def. 33)
  -u  <int>        Maximum input quality score (0-based; def. 40)

These two parameters set the range of quality scores for the input FASTQ files. The default values match the Sanger format, with quality scores in the range [0, 40] spanning ASCII values [33, 73].
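The relationship between -q, -u, and the characters in a FASTQ quality line can be sketched as below. The helper decode_quals is hypothetical and only mirrors the range check described here; it is not NGmerge's code.

```python
from typing import List


def decode_quals(qual_line: str, offset: int = 33,
                 max_qual: int = 40) -> List[int]:
    """Convert a FASTQ quality string to scores, enforcing [0, max_qual]."""
    scores = [ord(ch) - offset for ch in qual_line]
    for ch, score in zip(qual_line, scores):
        if score < 0 or score > max_qual:
            raise ValueError(
                f"quality character {ch!r} gives score {score}, "
                f"outside [0, {max_qual}]"
            )
    return scores
```

With the defaults, the character 'I' (ASCII 73) decodes to 40 and is accepted, whereas 'J' (ASCII 74) decodes to 41 and is rejected. Input files that contain scores of 41 therefore need a larger maximum, e.g. -u 41.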

  -t  <char>       Delimiter for headers of paired reads (def. ' ')

The headers of a pair of reads must match, at least up to the first space character, by default. An alternative delimiter (such as / or $'\t') can be specified with this option. If multiple characters are provided, only the first will be used as the delimiter.
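The header comparison can be pictured with the following sketch. The function names are hypothetical, and the comparison is a simplified reading of the rule above (compare everything before the first delimiter).

```python
def read_key(header: str, delimiter: str = " ") -> str:
    """Return the part of a header before the first delimiter character."""
    # Only the first character of the delimiter is used, as with -t.
    return header.split(delimiter[0], 1)[0]


def headers_match(header1: str, header2: str, delimiter: str = " ") -> bool:
    """Check that two read headers agree up to the delimiter."""
    return read_key(header1, delimiter) == read_key(header2, delimiter)
```

Headers that differ only in a trailing /1 and /2 do not match with the default space delimiter, but they do match when the delimiter is "/", which corresponds to running NGmerge with -t /.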

  -b               Option to print mismatches only to -j log file

Instead of printing full alignments, the log file specified by -j will list the details of the mismatches: the read header, position, and the base and quality score for both the R1 and R2 reads. This is useful for calculating separate error rates for matches and mismatches.

Other options:

  -v/--verbose     Option to print status updates/counts to stderr
  -h/--help        Print the usage message and exit
  -V/--version     Print the version and exit

Other notes:

NGmerge cannot gzip-compress multiple output files that are stdout. For example, the following will produce an error:
- -o - -a without -i
- -f - without -a and without -i

Contact

NGmerge


ngmerge's Issues

Error! Quality scores outside of set range

Hi everyone
I ran the following command:
./NGmerge -1 /home/planktonecology/Manuscript_Thatha/Metagenome_Thatha/CG_DN_935/AST5_R1.fastq.gz -2 /home/planktonecology/Manuscript_Thatha/Metagenome_Thatha/CG_DN_935/AST5_R2.fastq.gz -o AST5_merged.fastq.gz

I got the following error:
Error! Quality scores outside of set range

Error! sample: unknown command-line argument

Hi, I am trying to use this tool, but after running the following command:

$NGmerge -1 $FILE2 -2 $FIL3 -o sample -a -n 20 -v

I got this: Error! sample: unknown command-line argument

I cannot figure out where the error comes from. I would appreciate your help.

Thanks in advance.

Error! Cannot close file

Running adapter-removal mode with two fastq.gz files, code below:

./NGmerge -1 DoxEE17_D10_S1_R1_001.fastq.gz -2 DoxEE17_D10_S1_R2_001.fastq.gz -a -o DoxEE17_D10_noA -y -n 12

How do I fix this error?

Documentation Requested for Custom Quality Profile

Can we have a more detailed explanation in the documentation of how to structure a custom quality profile for the -w option?

Merging problem

Hello

I'm trying to merge my paired-end reads into single reads with NGmerge. The problem is that when I run a command like

NGmerge-master/NGmerge -1 AH1-R1.fastq -2 AH1-R2.fastq -o AH1-merged.fastq

the resultant merged file has a huge reduction in the file size and number of reads, for example from 600M to 70M, and from 15,000,000 reads to only 1,000,000 reads!

Could you please tell me what the reason might be?

Thank you

doesn't easily install on Mac OS

I had trouble installing this on Mac OS (10.14.6) due to Apple clang not supporting OpenMP by default, so I got the error message when I ran 'make':

clang: error: unsupported option '-fopenmp'

So a more Mac-friendly installer or a pre-compiled binary would be appreciated.

Regards, Eric

is there any option for batch processing?

Is there an option to process, in a loop, all the R1 and R2 FASTQ files inside a folder? I have more than 1000 FASTQ files to process, and it would be tedious to process them one by one.
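One possible workaround, sketched here rather than offered as a built-in feature, is a small wrapper script that pairs the files and calls NGmerge once per sample. The file-naming pattern (*_R1.fastq.gz and *_R2.fastq.gz), the output suffix, and the path ./NGmerge are assumptions; only the -1, -2, and -o options come from the usage message.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(folder: str, ngmerge: str = "./NGmerge") -> List[List[str]]:
    """Build one NGmerge stitch-mode command per R1/R2 pair in a folder."""
    commands = []
    for r1 in sorted(Path(folder).glob("*_R1.fastq.gz")):
        sample = r1.name[: -len("_R1.fastq.gz")]
        r2 = r1.with_name(sample + "_R2.fastq.gz")
        if not r2.exists():
            continue  # skip samples without a matching R2 file
        out = r1.with_name(sample + "_merged.fastq.gz")
        commands.append([ngmerge, "-1", str(r1), "-2", str(r2),
                         "-o", str(out)])
    return commands


def run_all(folder: str) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for command in build_commands(folder):
        subprocess.run(command, check=True)
```

Calling run_all("path/to/fastqs") would process every complete pair in that folder. Note that the README also allows comma-separated lists of input files, but it describes a single -o output, so a loop is the safer choice when one output per sample is wanted.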

support for Bash process substitution

It would be nice if this program allowed Bash process substitution to be used for the input files. For example, one might want to run a command like the following:

NGmerge -a -1 <(zcat R1.fastq.gz | head -n4000) -2 <(zcat R2.fastq.gz | head -n4000) -o temp.fastq -i -v

Currently, the above command causes the program to fail with the following error:

Processing files: /dev/fd/63,/dev/fd/62
Error! Input file does not follow fastq format

Below is an example of modifications to the code that work on my system (Ubuntu 16.04). The modified code starts after the comment "push back chars". The solution is to use gzdopen instead of gzopen. See also the attached diff file diff.txt.

bool openRead(char* inFile, File* in) {

  // open file or stdin
  bool stdinBool = (strcmp(inFile, "-") ? false : true);
  FILE* dummy = (stdinBool ? stdin : fopen(inFile, "r"));
  if (dummy == NULL)
    exit(error(inFile, ERROPEN));

  // check for gzip compression: magic number 0x1F, 0x8B
  bool gzip = true;
  int save = 0;  // first char to pushback (for stdin)
  int i, j;
  for (i = 0; i < 2; i++) {
    j = fgetc(dummy);
    if (j == EOF)
      exit(error(inFile, ERROPEN));
    if ( (i && (unsigned char) j != 0x8B)
        || (! i && (unsigned char) j != 0x1F) ) {
      gzip = false;
      break;
    }
    if (! i)
      save = j;
  }

  // push back chars
  if (ungetc(j, dummy) == EOF)
    exit(error("", ERRUNGET));
  if (i && ungetc(save, dummy) == EOF)
    exit(error("", ERRUNGET));

  // open file
  if (! stdinBool)
    rewind(dummy);
  in->f = dummy;
  if (gzip) {
    in->gzf = gzdopen(fileno(in->f), "r");
    if (in->gzf == NULL)
      exit(error(inFile, ERROPEN));
  }

  return gzip;
}

Error! Input file does not follow fastq format

Dear John,

When I run the following, I get "Error! Input file does not follow fastq format", although I am convinced that my input files are in fastq format (reads.zip):

NGmerge -1 AMBV1527_forward.fastq -2 AMBV1527_reverse.fastq -o merged.fastq

Any idea what the problem might be?

Best regards,
Stijn

adapters remains after using NGmerge

Hello,
I’ve just tried to use NGmerge to cut the adapters from paired-end data. The FastQC report shows that the Nextera Transposase Sequence is the adapter (Fig1).
I used NGmerge to cut the adapter with the following command:
NGmerge/NGmerge -z -a -1 R1.fastq.gz -2 R2.fastq.gz -o cut_R
But the cut file still contains some adapters (Fig2).
Do you have any idea why? Did I use it properly?
Thank you very much
Hien

Reads are good but throws error: "Sequence/quality scores do not match"

I tried NGmerge on Linux, but it does not run with my simulated datasets.

The header patterns are the following:

@gi|110798562|ref|NC_008261.1|-100.101.325660/1
AAGTTCATCATAGTTATTTTGAATAAAATTTAATCTATCAAGTATCATCTATTATCACTCCGTATACAGATTTTCATATTTTACAATTATAGCACACTAC
+
>G9GFGCFGGGG#G#8#G)E##G6GGGBGGGGCGGGGGEFGG8GG:CGGF9,G9EGGGFGGGGGG6GGGGFGGGGCGGGGFGGGGGGGGGGGGGGFGGCG

@gi|110798562|ref|NC_008261.1|-100.101.325660/2
TAGTAGTGGGCTCTCTTTGTAAAATATAAACATCCGTATACGGAGTGATAATAGATTATACTTGATAGATTAAATTTTATTGAAAATAAATATGATGAAC
+
C2C*G*5)(*@4(G##:GGGF4G3,*D*#G#G(G#G*E05GGGGGG+.E+*5DGFG*4G8G1G+G+*GG87CGGCFGEG0FGCGFGGG+GGGGGGGGGGF

My version seems to expect a " " as the delimiter when creating a single key, so I was getting the error: ..... ": not matched in input files"
I added a " " before the "/" and that solved the issue. I noticed afterwards that a new parameter (-t) was added to handle these situations.

After that, another error appeared: "Sequence/quality scores do not match". This is thrown because of "ERRQUAL". The reads do not have any issues, and I have been able to run the datasets with many other tools (BBMerge, USEARCH, FLASH, PEAR, etc.).

I am sharing a small dataset, in case you want to investigate what the problem could be:
reads_NC_008261.1.100.101.10_R1.fq.gz
reads_NC_008261.1.100.101.10_R2.fq.gz

Thanks

Error! -2 cannot open file for reading

I am getting a very bizarre "Error! -2 cannot open file for reading" when trying to run NGmerge in stitch mode, but only when NGmerge is run via a SLURM batch script.

If I run the exact same NGmerge command (e.g. ./NGmerge -1 r1.fq -2 r2.fq -o output.fq) through the interactive command line, it works with no problem.

If I take that same command and run it as part of a bash submission script for a SLURM job on an HPCC, it fails with Error! -2 cannot open file for reading

It looks like there's some issue when it attempts to stat both files into memory?

Error! not matched in input files

Hi,

I am working on reprocessing some samples and I want to use NGmerge to properly merge the PE reads. For this I convert an existing .bam file to fastQ files and use them as input for NGmerge. I execute the program like this:

NGmerge -w resources/qual_profile.txt -u 41 -n 8 -z -1 FILE_R1.fastq.gz -2 FILE_R2.fastq.gz -o FILE_merged.fastq.gz -f FILE_nonmerged -l FILE.log

For most samples, everything works like a charm, but for some I get errors like this:
Error! @HISEQ_172:2:2211:1315:83788 BC:Z:NAGCGTTANGAGTCAA: not matched in input files

Any idea what the problem might be?

qual_profile

My fastq file is Illumina-1.8 Phred+33 format, so I need to edit qual_profile.txt to expand the score range. What numbers in the rows and columns should I add to each "match" and "mismatch" matrix in the file?

NGmerge failing if read IDs are indicated by a forward slash

Hello, I'm working on merging HMP data, where read IDs in forward vs. reverse reads are delineated by a forward slash, "/".

For example, the first read is @HWI-EAS319_616WC:3:100:10067:14224/1 in the forward reads and @HWI-EAS319_616WC:3:100:10067:14224/2 in the reverse reads. Other mergers have been able to accommodate this, but NGmerge reports these as different reads and fails.

Is there a method to adjust for these? Is there a different forward/reverse read delineator that NGmerge expects?

feature request: use false positive rate instead of error rate?

Hi, I'm a big fan of this software but was wondering if it might make sense to provide the option to threshold based on a false positive rate instead of error rate (similar to what SeqPurge does using the binomial distribution calculation), since longer overlaps should be more tolerant of higher error rates. We've found that we obtain the best performance when piping multiple instances of NGmerge to grossly simulate this effect; e.g. to simulate a 1E-6 FP threshold, we allow 8% errors for overlaps of 10-14 bp, 17% errors for overlaps of 15-19 bp, and 23% errors for overlaps of 20+ bp. But obviously this is still overly stringent for longer overlaps, not to mention time consuming.

Bioconda package

Hi,

a Bioconda package would be super useful for this program.

Thanks,
Bjoern

False events

(I received this question by email and am including it and the response below - jmg)

Maybe I missed it but do you have any data on number of false merging events and false non merging events (when insitu data predicted that the reads could have been joined or such?)

Understanding how reads are merged when bases do not match

Hello,

I am not able to understand this

CCCTCGTACTAGTTAAAGTGGCCTAAGAAGACGACACATGAGAGCGAGAA GGG ATGCCCAAAGCTGTCGTTTGTTACAAAGCCGCATAGCTTGATGATGTTGG R1
CCAACATCATCAAGCTATGCGGCTTTGTAACAAACGACAGCTTTGGGCAT AAA TTCTCGCTCTCATGTGTCGTCTTCTTAGGCCACTTTAACTAGTACGAGGG R2
CCCTCGTACTAGTTAAAGTGGCCTAAGAAGACGACACATGAGAGCGAGAA GGT ATGCCCAAAGCTGTCGTTTGTTACAAAGCCGCATAGCTTGATGATGTTGG merged with NGmerge


CCCTCGTACTAGTTAAAGTGGCCTAAGAAGACGACACATGAGAGCGAGAA GGG ATGCCCAAAGCTGTCGTTTGTTACAAAGCCGCATAGCTTGATGATGTTGG R1
CCAACATCATCAAGCTATGCGGCTTTGTAACAAACGACAGCTTTGGGCAT AAA TTCTCGCTCTCATGTGTCGTCTTCTTAGGCCACTTTAACTAGTACGAGGG R2
CCCTCGTACTAGTTAAAGTGGCCTAAGAAGACGACACATGAGAGCGAGAA TTT ATGCCCAAAGCTGTCGTTTGTTACAAAGCCGCATAGCTTGATGATGTTGG merged with PEAR

These are simulated reads that I created to understand and compare these programs. I cannot figure out why, in your program, a T is added when R1 has a G and R2 has an A. Something similar happens with PEAR.

The commands used for both programs are given below.

pear \
-f simulated_R1_dif.fastq \
-r simulated_R2_dif.fastq \
-o pear_output.fastq


apptainer run ngmerge_latest.sif \
/bin/bash -c "cd /NGmerge && NGmerge \
-1 simulated_R1_dif.fastq \
-2 simulated_R2_dif.fastq \
-o /merged.fastq"
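One likely explanation, offered as a sketch and not as the maintainer's reply: paired-end mergers compare R1 against the reverse complement of R2, so an A in R2 corresponds to a T on the R1 strand. The snippet below uses the two reads quoted in this post (with the spaces removed) and lists the positions where R1 and the reverse-complemented R2 disagree; the helper revcomp is a hypothetical name.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence (upper-case A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Reads copied from the post, with the visual spaces removed.
r1 = ("CCCTCGTACTAGTTAAAGTGGCCTAAGAAGACGACACATGAGAGCGAGAA"
      "GGG"
      "ATGCCCAAAGCTGTCGTTTGTTACAAAGCCGCATAGCTTGATGATGTTGG")
r2 = ("CCAACATCATCAAGCTATGCGGCTTTGTAACAAACGACAGCTTTGGGCAT"
      "AAA"
      "TTCTCGCTCTCATGTGTCGTCTTCTTAGGCCACTTTAACTAGTACGAGGG")

r2rc = revcomp(r2)

# 0-based positions where R1 and the reverse-complemented R2 differ.
mismatches = [(i, a, b) for i, (a, b) in enumerate(zip(r1, r2rc)) if a != b]
print(mismatches)
```

The reads disagree only at the three central positions, where the choice is between G (from R1) and T (from the reverse complement of R2). Both merged outputs quoted above contain only Gs and Ts there, which is consistent with each program picking one of the two candidate bases per position, presumably guided by the quality scores.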

bioconda install

Hi, I went to install NGmerge through bioconda on an ubuntu terminal (operated on a Windows computer). I received the following error about solving the environment. This error is unique to NGmerge as I have been able to install several other packages through bioconda. I'm using the latest version of anaconda3 for linux x64 (https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh)

Thank you for your help!

(ngmerge) passeguelab@BB11CSCI-M003:~$ conda install -c bioconda ngmerge

Collecting package metadata (current_repodata.json):
done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError:'

feature request: ubam input/output?

Pretty self-explanatory. We are trying to eliminate the need to ever process data in fastq format in our pipeline. We probably wouldn't need the ability to convert fastq to ubam or vice-versa (although I wouldn't object), but having the ability to run ubam < ngmerge > ubam would be very appreciated.

no adapter removal with dovetailed alignments

For dovetailed alignments, is it possible to retain the adapter sequences at both ends?

(bio)conda recipe needs to be updated

Hi,

https://github.com/bioconda/bioconda-recipes/blob/master/recipes/ngmerge/meta.yaml uses the outdated https://github.com/harvardinformatics/NGmerge repository.

Regards,
Stephan

Error! Quality score file missing values for score range

My fastq file is in Illumina-1.8 Phred+33 format. How can I solve this problem?
