agordon / fastx_toolkit

FASTA/FASTQ pre-processing programs

License: Other

fastx_toolkit's Introduction

FASTX-Toolkit
=============

*******************************************************************
*                                                                 *
* FASTX TOOLKIT is unmaintained software.                         *
* No new features have been added since 2010.                     *
*                                                                 *
* There are many better alternatives for low-level FASTQ/FASTA    *
* manipulation. Use at your own risk.                             *
*                                                                 *
*******************************************************************


Short Summary
===============

The FASTX-Toolkit is a collection of command line tools for Short-Reads 
FASTA/FASTQ files preprocessing.



More Details
==============

Next-Generation sequencing machines usually produce FASTA or FASTQ files, 
containing multiple short-reads sequences (possibly with quality information).

The main processing of such FASTA/FASTQ files is mapping (aka aligning)
the sequences to reference genomes or other databases using specialized
programs. 

Examples of such mapping programs are:
Blat (http://www.kentinformatics.com/index.asp), 
SHRiMP (http://compbio.cs.toronto.edu/shrimp),
LastZ (http://www.bx.psu.edu/miller_lab),
MAQ (http://maq.sourceforge.net/),
and many others.

However, it is sometimes more productive to preprocess the FASTA/FASTQ files
before mapping the sequences to the genome, manipulating the sequences to 
produce better mapping results.

The FASTX-Toolkit tools perform some of these preprocessing tasks.



Available Tools
===============

FASTQ-to-FASTA - Converts a FASTQ file to a FASTA file.

FASTQ-Statistics - scans a FASTQ file, and produces some statistics about the 
	quality and the sequences in the file.
	
FASTQ-Quality-BoxPlot, and
FASTQ-Nucleotides-Distribution - Generates charts based on the statistics 
	generated by FASTQ-Statistics. These charts can be used to quickly
	see the quality of the sequenced library.
	
FASTQ-Quality-Converter - Converts from ASCII to numeric quality scores.

FASTQ-Quality-Filter - removes low-quality sequences from FASTQ files.

FASTX-Artifacts-Filter - removes some sequencing artifacts from FASTA/Q files.

FASTX-Barcode-Splitter - A common practice is to sequence multiple biological
	samples in the same library (marking each sample using a dedicated 
	barcode). The resulting FASTA/Q file contains intermixed sequences 
	from those samples. This tool separates FASTA/Q files into several 
	individual files, based on the barcodes.
	
FASTX-Clipper - Adapters (aka Linkers) are added to the library (before 
	sequencing), and should be removed from the resulting FASTA/Q file.
	This tool removes (clips) adapters.
	
FASTA-Clipping-Histogram - After clipping a FASTA file, this tool generates a
	chart showing the length of the clipped sequences.
	
FASTX-Reverse-Complement - Produces the reverse-complement of each sequence in a FASTA/Q file.
	If a FASTQ file is given, the quality scores are also reversed.
	
FASTX-Trimmer - Extracts sub-sequences from a FASTA/Q file. Two examples are:
	Removing barcodes from the 5'-end of all sequences in a FASTQ file;
	Cutting 7 nucleotides from the 3'-end of all sequences in a FASTA file.



Galaxy
======

Galaxy (https://usegalaxy.org) is a web-based framework for computational biology.

While the programs in the FASTX-Toolkit are command-line based, the package 
includes the necessary files to integrate the tools into a Galaxy server,
allowing users to execute these tools from their web browser.

If you run your own local mirror of a Galaxy server, you can integrate the
FASTX-Toolkit into your Galaxy server.



Software Requirements
=====================

1. GCC is required to compile most tools.

2. The FASTA-Clipping-Histogram tool requires Perl and the "PerlIO::gzip"
   and "GD::Graph::bars" modules.
   
   Installing the Perl modules can be accomplished by running:

   $ sudo cpan 'PerlIO::gzip'
   $ sudo cpan 'GD::Graph::bars'
   
3. FASTX-Barcode-Splitter requires the GNU Sed program.
   
4. FASTQ-Quality-Boxplot and FASTQ-Nucleotides-Distribution require the
   'gnuplot' program.


Installation
============

When downloading the git repository from github, use the following:

   $ git clone https://github.com/agordon/fastx_toolkit
   $ cd fastx_toolkit
   $ ./reconf
   $ ./configure
   $ make

When downloading a released version archive:

   $ wget https://github.com/agordon/fastx_toolkit/releases/download/0.0.14/fastx_toolkit-0.0.14.tar.bz2
   $ tar -xjvf fastx_toolkit-0.0.14.tar.bz2
   $ cd fastx_toolkit-0.0.14
   $ ./configure
   $ make

The available releases are here:
   https://github.com/agordon/fastx_toolkit/releases

To install the tools, run (as root):

  $ sudo make install

This will install the tools into /usr/local/bin.
To install the tools to a different location, change the 'configure' step to:

  $ ./configure --prefix=/DESTINATION/DIRECTORY


The libgtextutils package is required to build fastx-toolkit,
see https://github.com/agordon/libgtextutils/ .
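
Before running ./configure, it can help to confirm that the library is
visible to pkg-config. The following is a minimal sketch, not part of the
toolkit: check_gtextutils is a hypothetical helper, "gtextutils" is the
pkg-config module name the configure check looks for, and
/usr/local/lib/pkgconfig is only an assumed install location (the default
prefix).

```shell
# Hypothetical helper: report whether pkg-config can locate libgtextutils.
check_gtextutils() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config not found"
        return 2
    fi
    if pkg-config --exists gtextutils; then
        echo "gtextutils found, version $(pkg-config --modversion gtextutils)"
    else
        echo "gtextutils not found; check PKG_CONFIG_PATH"
        return 1
    fi
}

# Assumed location: adjust if libgtextutils was installed under another prefix.
PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
export PKG_CONFIG_PATH
check_gtextutils || true
```

If the helper reports that the module is missing, pointing PKG_CONFIG_PATH at
the directory that holds gtextutils.pc is usually the first thing to try.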


Command Line Usage
==================

Most tools support the "-h" argument, which shows a short help screen.
Better documentation is not available at this moment.
Some more details and examples are available in the <help> section
of the XML tool files (in the 'galaxy' subdirectory).
  
 
Galaxy Installation
===================

Galaxy installation should be done manually, and requires a technical
understanding of the Galaxy framework.

1. Build and install the command line tools (as described above).

2. Make a backup of your Galaxy installation (better safe than sorry).

3. Run the 'install_galaxy_files.sh' script, 
   and specify the galaxy root directory.
   This script copies the files from the 'galaxy' sub-directory into
   your galaxy mirror directory.
   
4. Manually add the content of the ./galaxy/fastx_toolkit_conf.xml file
   to your Galaxy's tool_conf.xml.
   
5. Edit the [YOUR-GALAXY]/tool-data/fastx_clipper_sequences.txt file
   and add your custom adapters/linkers.
   
6. Modify "fastx_barcode_splitter_galaxy_wrapper.sh" as explained
   below (see section "Special configuration for Barcode-Splitter").

7. Restart Galaxy.

Always make a backup of your Galaxy server files before trying to install 
the FASTX-Toolkit. 



Galaxy Testing
==============

The following tools support Galaxy's functional testing:
(Run from Galaxy's main directory)
  $ sh run_functional_tests.sh -id cshl_fastq_qual_conv
  $ sh run_functional_tests.sh -id cshl_fastq_to_fasta
  $ sh run_functional_tests.sh -id cshl_fastq_qual_stat
  $ sh run_functional_tests.sh -id cshl_fastx_trimmer
  $ sh run_functional_tests.sh -id cshl_fastx_reverse_complement
  $ sh run_functional_tests.sh -id cshl_fastx_artifacts_filter
  $ sh run_functional_tests.sh -id cshl_fasta_collapser
  $ sh run_functional_tests.sh -id cshl_fastx_clipper
 

Special configuration for Barcode-Splitter
==========================================

When running the barcode-splitter tool from the command line you specify a 
prefix directory - the output files will be written to that directory (similar
to GNU's split program usage).

Running the barcode-splitter inside Galaxy requires a special hack because
(I don't know how to|Galaxy can't) create a variable number of output datasets.
The number of required output files is determined by the tool only AFTER reading 
the barcodes description file.

The Galaxy-version of Barcode-Splitter works like this:
1. A FASTA/FASTQ file, and a Barcode description file are fed to the tool.
2. The tool produces a single output dataset (inside galaxy). This output
   is an HTML file, containing links to the split FASTA files.
3. Users can use the links to get the split FASTA files.
   (Since Galaxy's 'upload data' tool accepts URLs, this is not a real problem).
   
4. As the Galaxy administrator, you'll have to edit the
   'fastx_barcode_splitter_galaxy_wrapper.sh' script and change BASEPATH and 
   PUBLICURL to point to a publicly accessible path on your server.
   
Example:

fastx_barcode_splitter_galaxy_wrapper.sh contains:

   BASEPATH="/media/sdb1/galaxy/barcode_splits/"
   PUBLICURL="http://tango.cshl.edu/barcode_splits/"

When a user runs the barcode splitter tool, the FASTA files will be generated in 
"/media/sdb1/galaxy/barcode_splits/".  
The URL "http://tango.cshl.edu/barcode_splits" is set (in an apache server) to
serve files from "/media/sdb1/galaxy/barcode_splits/", with the following 
configuration:

    Alias /barcode_splits "/media/sdb1/galaxy/barcode_splits/"
    <Directory "/media/sdb1/galaxy/barcode_splits/">
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>




Licenses
========

FASTX-Toolkit is distributed under the Affero GPL version 3 or later (AGPLv3),

EXCEPT

All files under the 'galaxy' sub-directory are distributed under the
same license as Galaxy itself (which is an MIT-style license).


While IANAL, these licenses basically mean that:
1. You're free to use FASTX-toolkit,

2. You're free to integrate FASTX-toolkit in your Galaxy mirror server 
   (or any other server).
   
3. You're free to modify the files under 'galaxy',
   without making your modifications public.
   
4. If you modify the FASTX-toolkit tools, and make those modifications 
   publicly available (either as downloadable tools, as part of another
   product, or as a web-based server), you must make the modified source code
   freely available (free as in speech).
   
See the COPYING file for the full Affero GPL.
See the GALAXY-LICENSE file for galaxy's license.

Please remember: 
  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.


=============
Please send all comments, suggestions, bug reports (or better yet - bug fixes)
to [email protected] .

fastx_toolkit's Issues

Using fastx_trimmer with paired reads

Hi to all!

I am trying to replicate some results from an article in which the authors say they used fastx_quality_filter + fastx_trimmer with paired-end reads. I am not sure how to do this. There does not seem to be an option to run these programs on both files of a read pair at the same time. If I run them on each file independently, I suppose I then have to synchronize the reads again. Is this correct?

Thank you,
Vera
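
One way to re-synchronize two files that were filtered independently is to
keep only the reads whose ID survives in both. The sketch below is not part of
FASTX-Toolkit: keep_shared_reads is a hypothetical helper, and it assumes
strict four-line FASTQ records whose mate IDs match once a trailing "/1" or
"/2" and any description after the first space are removed.

```shell
# Hypothetical helper: print the records of FASTQ file $2 whose read ID
# also occurs in FASTQ file $1. Record order of $2 is preserved.
keep_shared_reads() {
    awk '
        function key(h) {
            sub(/[ \t].*$/, "", h)   # drop the description after the ID
            sub(/\/[12]$/, "", h)    # drop a trailing /1 or /2 mate suffix
            return h
        }
        # First file: remember every read ID (header lines are 1, 5, 9, ...).
        FNR == NR { if (FNR % 4 == 1) seen[key($0)] = 1; next }
        # Second file: decide per record whether its mate exists.
        FNR % 4 == 1 { keep = (key($0) in seen) }
        keep
    ' "$1" "$2"
}

# Small demonstration with made-up reads: only r2 is present in both files.
tmp=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n' > "$tmp/R1.fq"
printf '@r2/2\nTTTT\n+\nIIII\n@r3/2\nTTTT\n+\nIIII\n' > "$tmp/R2.fq"
keep_shared_reads "$tmp/R2.fq" "$tmp/R1.fq" > "$tmp/R1.synced.fq"
keep_shared_reads "$tmp/R1.fq" "$tmp/R2.fq" > "$tmp/R2.synced.fq"
```

Because both outputs keep the original record order, the surviving mates end
up at the same position in each file. The ID list is held in memory, so very
large files may need a different approach.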

using reconf

It took me a while to figure out that the installation sequence is

cd fastx_toolkit/
./reconf
./configure
make
make install

I think the second step needs documentation.
Thanks! Volker

Recognize gzip compressed files as input

Would be nice to have the toolkit automatically recognize gzip compressed files as input instead of having to uncompress and pipe to fastx toolkit.
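
Until such support exists, a small wrapper can hide the decompression step.
This is a minimal sketch: with_fastq is a hypothetical helper, not a toolkit
command, and it relies on the tool reading its input from STDIN when no input
file is given.

```shell
# Hypothetical wrapper: stream a FASTQ file (plain or gzip-compressed)
# into the standard input of the command given after the file name.
with_fastq() {
    input=$1
    shift
    case $input in
        *.gz) gzip -dc -- "$input" | "$@" ;;
        *)    "$@" < "$input" ;;
    esac
}

# Example (assumes fastx_quality_stats is installed; file names are made up):
#   with_fastq reads.fastq.gz fastx_quality_stats -o reads.stats
```

The wrapper decides by file extension only, so a compressed file without a
".gz" suffix would be passed through unchanged.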

It seems that fastq_quality_filter version 0.0.14 uses Q33 as default

This is different from version 0.0.13, which uses Q64 as default. Am I right?
If so, it would be better if this change were made clearer to users.
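
The two conventions differ only in the ASCII offset that is subtracted from
each quality character (33 versus 64), so the same file decodes to very
different scores under the wrong default. A minimal sketch, where phred_score
is a hypothetical helper and not a toolkit command:

```shell
# Hypothetical helper: decode one FASTQ quality character with a given ASCII offset.
phred_score() {
    char=$1
    offset=$2
    # printf with a leading quote yields the character's numeric code (POSIX).
    code=$(printf '%d' "'$char")
    echo $((code - offset))
}

phred_score 'I' 33   # 'I' is ASCII 73, so this prints 40
phred_score 'h' 64   # 'h' is ASCII 104, so this prints 40
phred_score 'I' 64   # the same 'I' read with the wrong offset prints 9
```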

Release 0.0.14 has a broken link

The link to download release 0.0.14 is broken:

https://github.com/agordon/fastx_toolkit/releases/download/0.0.14/fastx_toolkit-0.0.14.tar.bz2

Could you please provide binaries for Linux x86_64?

I cloned your repo, and successfully created a configure file, but it won't complete successfully quite yet. Could you provide any tips? Thanks!

git clone https://github.com/agordon/fastx_toolkit.git
cd fastx_toolkit/
libtoolize --force
aclocal
autoheader 
automake --force-missing --add-missing
vim configure.ac 
automake --force-missing --add-missing
autoconf
autoreconf
automake --force-missing --add-missing
./configure --prefix=/home/unix/slowikow/.local/

./configure: line 14512: syntax error near unexpected token `GTEXTUTILS,gtextutils'
./configure: line 14512: `PKG_CHECK_MODULES(GTEXTUTILS,gtextutils)'

Error with fastx make command

In file included from seqalign_test.cpp:5:0:
../libfastx/sequence_alignment.h:146:32: error: ‘ssize_t’ does not name a type
score_type safe_score ( const ssize_t query_index, const ssize_t target_index)
^
../libfastx/sequence_alignment.h:146:59: error: ‘ssize_t’ does not name a type
score_type safe_score ( const ssize_t query_index, const ssize_t target_index)
^
../libfastx/sequence_alignment.h:244:37: error: ‘ssize_t’ has not been declared
void find_alignment_starting_point(ssize_t &new_query_index, ssize_t &new_targ
^
../libfastx/sequence_alignment.h:244:63: error: ‘ssize_t’ has not been declared
void find_alignment_starting_point(ssize_t &new_query_index, ssize_t &new_targ
^
Makefile:256: recipe for target 'seqalign_test.o' failed
make[3]: *** [seqalign_test.o] Error 1
make[3]: Leaving directory '/media/wkstn/Data/Course/Project/fastx_toolkit-0.0.12/src/seqalign_test'
Makefile:252: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory '/media/wkstn/Data/Course/Project/fastx_toolkit-0.0.12/src'
Makefile:279: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/media/wkstn/Data/Course/Project/fastx_toolkit-0.0.12'
Makefile:209: recipe for target 'all' failed
make: *** [all] Error 2

Build for Arm Architecture

Excuse me, I have a question that I would like answered. I built this software under the arm architecture, but when I tested it, I found that its output file was inconsistent with the given standard output, but it was consistent with the result under the x86 architecture. Is this a successful build?

Suggested patch for clang 6

Build fails under clang 6 due to #pragma pack change during compilation.

We could either build everything with -fpack-struct=1 or restore default packing size after the struct def as shown in the patch below.

--- src/libfastx/fastx.h.orig   2018-05-16 14:50:08 UTC
+++ src/libfastx/fastx.h
@@ -58,7 +58,7 @@ typedef enum {
        OUTPUT_SAME_AS_INPUT=3
 } OUTPUT_FILE_TYPE;
 
-#pragma pack(1) 
+#pragma pack(push,1) 
 typedef struct 
 {
        /* Record data - common for FASTA/FASTQ */
@@ -115,6 +115,7 @@ typedef struct 
        FILE*   input;
        FILE*   output;
 } FASTX ;
+#pragma pack(pop)
 
 
 void fastx_init_reader(FASTX *pFASTX, const char* filename,

No "configure" file

There is no "configure" in https://github.com/agordon/fastx_toolkit/archive/0.0.14.tar.gz, just "configure.ac".

Installation
============

To compile to tools, run:

  $ ./configure
  $ make

I ran "autoreconf -fvi" to make it and the Makefile, but the README is incorrect.

fastx_quality_stats bug in median calculation

Hi, when I run the command

fastx_quality_stats -i input.fastq -o output.stats

, where input.fastq consists of

@0 <unknown description>
A
+
]
@1 <unknown description>
A
+
]

, then output.stats consists of

column  count   min     max     sum     mean    Q1      med     Q3      IQR     lW      rW      A_Count C_Count G_Count T_Count N_Count Max_count
1       2       60      60      120     60.00   60      50      50      -10     75      35      2       0       0       0       0       2

. Note the med column has a value of 50, whereas the mean column has a value of 60, and the two quality scores in input.fastq are identical (]).

I believe this is a bug?
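
For reference, the expected value can be worked out by hand. The sketch below
is independent of the toolkit: median is a hypothetical helper, and the offset
of 33 is the one implied by the min/max value of 60 in the output above.

```shell
# Hypothetical helper: median of space-separated integers read from stdin.
median() {
    tr -s ' ' '\n' | grep -v '^$' | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# ']' is ASCII 93; with an offset of 33 the score is 60.
score=$(( $(printf '%d' "']") - 33 ))
echo "$score $score" | median     # prints 60
```

With two identical scores of 60, the quartiles and the median should all be
60, so the reported med and Q3 of 50 and the IQR of -10 do look inconsistent.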

PS fastx_quality_stats -h prints:

usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.14 by A. Gordon ([email protected])

   [-h] = This helpful help screen.
   [-i INFILE]  = FASTQ input file. default is STDIN.
   [-o OUTFILE] = TEXT output file. default is STDOUT.
   [-N]         = New output format (with more information per nucleotide/cycle).

The *OLD* output TEXT file will have the following fields (one row per column):
	column	= column number (1 to 36 for a 36-cycles read solexa file)
	count   = number of bases found in this column.
	min     = Lowest quality score value found in this column.
	max     = Highest quality score value found in this column.
	sum     = Sum of quality score values for this column.
	mean    = Mean quality score value for this column.
	Q1	= 1st quartile quality score.
	med	= Median quality score.
	Q3	= 3rd quartile quality score.
	IQR	= Inter-Quartile range (Q3-Q1).
	lW	= 'Left-Whisker' value (for boxplotting).
	rW	= 'Right-Whisker' value (for boxplotting).
	A_Count	= Count of 'A' nucleotides found in this column.
	C_Count	= Count of 'C' nucleotides found in this column.
	G_Count	= Count of 'G' nucleotides found in this column.
	T_Count	= Count of 'T' nucleotides found in this column.
	N_Count = Count of 'N' nucleotides found in this column.
	max-count = max. number of bases (in all cycles)


The *NEW* output format:
	cycle (previously called 'column') = cycle number
	max-count
	For each nucleotide in the cycle (ALL/A/C/G/T/N):
		count   = number of bases found in this column.
		min     = Lowest quality score value found in this column.
		max     = Highest quality score value found in this column.
		sum     = Sum of quality score values for this column.
		mean    = Mean quality score value for this column.
		Q1	= 1st quartile quality score.
		med	= Median quality score.
		Q3	= 3rd quartile quality score.
		IQR	= Inter-Quartile range (Q3-Q1).
		lW	= 'Left-Whisker' value (for boxplotting).
		rW	= 'Right-Whisker' value (for boxplotting).

Buffer overflow caused by MAX_SEQ_LINE_LENGTH being longer than MAX_SEQUENCE_LENGTH.

Hi Gordon and all,

In Computer Security, Privacy, and DNA Sequencing: Compromising Computers with Synthesized DNA, Privacy Leaks, and More (2017), Ney, Koscher, Organick, Ceze & Kohno, University of Washington, report a buffer overflow in FASTX-Toolkit, caused by the difference between MAX_SEQ_LINE_LENGTH (25000) and MAX_SEQUENCE_LENGTH (2000). Would it suffice to set MAX_SEQ_LINE_LENGTH to 2000 to solve the problem?

Does fastx_toolkit provide official test cases?

Excuse me, does fastx_toolkit provide official test cases?

how to use fastx_barcode_splitter.pl to deal with paired ends fastq

Hi, I am very confused about how to use fastx_barcode_splitter.pl with paired-end FASTQ files.
I have two FASTQ files (R1 and R2), one per read end, but fastx_barcode_splitter.pl seems to handle only single-end FASTQ files.
What should I do?
Thanks!

compile-error on gcc 7

Hi, fastx_toolkit and libgtextutils no longer compile when using a recent compiler like GCC 7:

make[3]: Entering directory '/tmp/SBo/fastx_toolkit-0.0.14/src/fasta_formatter'
g++ -DHAVE_CONFIG_H -I. -I../.. -I/usr/local/include/gtextutils -I../../src/libfastx -O2 -fPIC -Werror=implicit-fallthrough -Wall -Wextra -Wformat-nonliteral -Wformat-security -Wswitch-default -Wswitch-enum -Wunused-parameter -Wfloat-equal -Werror -DDEBUG -g -O1 -MT fasta_formatter.o -MD -MP -MF .deps/fasta_formatter.Tpo -c -o fasta_formatter.o fasta_formatter.cpp
fasta_formatter.cpp: In function ‘void parse_command_line(int, char**)’:
fasta_formatter.cpp:105:9: error: this statement may fall through [-Werror=implicit-fallthrough=]
usage();
~~~~~^~
fasta_formatter.cpp:107:3: note: here
case 'i':
^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:425: fasta_formatter.o] Error 1
make[3]: Leaving directory '/tmp/SBo/fastx_toolkit-0.0.14/src/fasta_formatter'

Installation includes $prefix/share/aclocal m4 macros

The installation of fastx_toolkit 0.0.14 installs m4 macros into $prefix/share/aclocal. It seems wrong to install these files when they are not actually needed at runtime.

Document `-Q` option

All tools parse (and many use) the -Q option to specify the quality offset, with default 33. However, this is not documented on the website or in the -h output of the relevant tools.

configure file is in .gitignore

Shouldn't the actual configure file be included in the repo? It is currently in your .gitignore, so when I clone the repo there is no ./configure to do the build. If I download the .tar.bz2 directly from your site it includes the configure file.

Your repo does include the configure.ac file, but I don't remember that being used for the initial build.

Thanks!

src/fasta_formatter/fasta_formatter.cpp

Line 105: usage();

should be followed by "exit();"

Otherwise, compilation fails on Ubuntu 18.04 with message:

fasta_formatter.cpp:105:9: error: this statement may fall through [-Werror=implicit-fallthrough=]
usage();
~~~~~^~
fasta_formatter.cpp:107:3: note: here
case 'i':
^~~~
cc1plus: all warnings being treated as errors

Fails to build when hardening flags are enabled.

Hi Gordon,

similarly to libgtextutils, FASTX-Toolkit fails to build when hardening flags are enabled.

make[5]: Entering directory `/home/charles/debian/debian-med/fastx-toolkit/src/libfastx'
gcc -DHAVE_CONFIG_H -I. -I../..   -D_FORTIFY_SOURCE=2  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wall -Wextra -Wformat-nonliteral -Wformat-security -Wswitch-default -Wswitch-enum -Wunused-parameter -Wfloat-equal -Werror -DDEBUG -g -O1 -MT chomp.o -MD -MP -MF .deps/chomp.Tpo -c -o chomp.o chomp.c
mv -f .deps/chomp.Tpo .deps/chomp.Po
gcc -DHAVE_CONFIG_H -I. -I../..   -D_FORTIFY_SOURCE=2  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wall -Wextra -Wformat-nonliteral -Wformat-security -Wswitch-default -Wswitch-enum -Wunused-parameter -Wfloat-equal -Werror -DDEBUG -g -O1 -MT fastx.o -MD -MP -MF .deps/fastx.Tpo -c -o fastx.o fastx.c
In file included from /usr/include/stdio.h:937:0,
                 from fastx.c:18:
In function 'fgets',
    inlined from 'fastx_read_next_record' at fastx.c:324:11:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:261:2: error: call to '__fgets_chk_warn' declared with attribute warning: fgets called with bigger size than length of destination buffer [-Werror]
  return __fgets_chk_warn (__s, __bos (__s), __n, __stream);
  ^
In function 'fgets',
    inlined from 'fastx_read_next_record' at fastx.c:370:12:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:261:2: error: call to '__fgets_chk_warn' declared with attribute warning: fgets called with bigger size than length of destination buffer [-Werror]
  return __fgets_chk_warn (__s, __bos (__s), __n, __stream);
  ^
cc1: all warnings being treated as errors
make[5]: *** [fastx.o] Error 1
make[5]: Leaving directory `/home/charles/debian/debian-med/fastx-toolkit/src/libfastx'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/home/charles/debian/debian-med/fastx-toolkit/src'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/charles/debian/debian-med/fastx-toolkit'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/charles/debian/debian-med/fastx-toolkit'
dh_auto_build: make -j1 returned exit code 2
make[1]: *** [override_dh_auto_build] Error 2
make[1]: Leaving directory `/home/charles/debian/debian-med/fastx-toolkit'
make: *** [build] Error 2
dpkg-buildpackage: error: debian/rules build gave error exit status 2

Cheers,

Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

suggested clang patch: unused variable when compiling fastx_toolkit

I'm fairly new to suggesting fixes for code, but I think I've got a useful fix for an error that comes up when compiling the toolkit using clang 17. I haven't had this issue when compiling with gcc 11.4.0 on ubuntu, but my colleague had the issue with clang 17 on macOS.

There is an unused variable in the fastx_artifacts_filter code that throws an error, specifically "n_count". I used grep to make sure that variable is not used in any other code (it's not) and just deleted it using a sed script. Here is the script that fixes the issue, run this from the fastx_toolkit-0.0.14 directory:

$ sed -i '88,90d;58d' src/fastx_artifacts_filter/fastx_artifacts_filter.c

no instruction to cite fastx

I couldn't find any instruction on how to cite fastx toolkit in a paper. What reference should we use to give credit to the authors?
-Gael

error for fastq_to_fasta

I am trying to translate ONT Minion reads from fastq to fasta but get the following error

fastq_to_fasta: Error: invalid quality score data on line 19124 (quality_tok = "+"

Any suggestions?
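
One way to narrow this down is to check whether the file follows a strict
four-line record layout around the reported line. The sketch below is a
generic diagnostic, not a toolkit command and not a confirmed explanation of
the error: check_fastq is a hypothetical helper that stops at the first record
that breaks the layout.

```shell
# Hypothetical helper: report the first record that breaks the
# four-line FASTQ layout (header, sequence, separator, quality).
check_fastq() {
    awk '
        FNR % 4 == 1 && !/^@/  { print "line " FNR ": header does not start with @"; bad = 1; exit }
        FNR % 4 == 2           { seqlen = length($0) }
        FNR % 4 == 3 && !/^\+/ { print "line " FNR ": separator does not start with +"; bad = 1; exit }
        FNR % 4 == 0 && length($0) != seqlen {
            print "line " FNR ": quality length differs from sequence length"; bad = 1; exit
        }
        END {
            if (!bad && NR % 4 != 0) { print "file ends in the middle of a record"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}

# Example (file name is made up):
#   check_fastq reads.fastq && echo "layout looks consistent"
```

The helper prints nothing and returns 0 when every record has four lines and
matching sequence and quality lengths. Files with sequences wrapped over
several lines will be reported as broken.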