stevenwingett / fastq-screen

Detecting contamination in NGS data and multi-species analysis

Home Page: https://stevenwingett.github.io/FastQ-Screen/

License: GNU General Public License v3.0

Perl 14.90% HTML 85.10%

ngs metagenomics bowtie bwa bowtie2 contamination detection fastq

fastq-screen's Issues

Using pigz for parallel gzip operations

Hello, is it possible to use pigz for parallel gzip operations? I've allowed 10 threads for fastq_screen, but only a single thread is used for gzip. It would be super helpful for large workflows!

Thanks for any help!

pair-end screen

Hi,

I'm using fastq-screen to check NGS data for contamination.
However, my data are paired-end, and fastq-screen does not support this mode.

How should I handle this?
Should I merge R1 and R2 to check the sample as a whole?

Best Regards

Jeongmin

[Help/Improvement] - Filter options plus no-hits

Hi,

I would like to know whether it is possible to keep, in a single run, both the reads that do not map and the reads that match another filter option.

I mean, some way to avoid multiple filtering rounds in the hypothetical situation described below:

genomes: Human:Mouse:Ecoli:PhiX:Adapters
desired filters: --filter 53555 and --filter 00000

Best regards

Changeable Bowtie2 options (--very-fast)

I used fastQScreen and had many unmapped reads (Hit_No_Genomes).

I wondered whether the Bowtie2 option "--very-sensitive-local" would result in better mapping. Indeed, around 20% of the previously unmapped reads are now assigned.

I changed the line (1259) inside the fastq_screen file for this. I think it is not possible to change this via the command line options for the aligner.

Is there any reason why this is not changeable via the options? Am I missing something important here? The manual says not to change this:

bowtie2 : Specify extra parameters to be passed to Bowtie 2. These parameters should be quoted to clearly delimit Bowtie 2 parameters from FastQ Screen parameters. You should not try to use this option to override the normal search or reporting options for bowtie which are set automatically but it might be useful to allow reads to be trimmed before alignment etc.

[Enhancement] Add option to download bisulfite indexes

If possible, could you please add another option --bisulfite which, when specified in conjunction with --get_genomes, downloads pre-made indexes for Bismark in addition to the Bowtie2 indexes you are already offering.

It might be useful to add to the --help text that the download will be in the range of 10-15GB each for the normal and the bisulfite indexes.

Many thanks! Felix

Pairing filtered FASTQ files

FastQ Screen processes paired FASTQ files independently. This means that the order of the reads in the filtered results (derived from paired input files) will most likely not correspond to one another.

Come up with a solution that pairs reads in FASTQ files (a read may be present in one file, but not in its pair). Also, check whether the output is in the same order as the input for this processing.
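
One possible approach, sketched in Python under stated assumptions rather than as the project's solution: index the filtered R2 records by read ID, then walk the filtered R1 records in order and keep only reads present in both files. The sketch assumes four-line FASTQ records and that mates share the first header token (optionally ending in /1 or /2). The function names are hypothetical, and R2 is held in memory, so very large files would need a different strategy.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, record) pairs; each record is a list of 4 lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        # Read ID: first header token without '@' and without a /1 or /2 suffix.
        read_id = record[0][1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, record


def pair_reads(r1_lines, r2_lines):
    """Return (paired_r1, paired_r2) line lists, in R1 order."""
    r2_by_id = dict(read_fastq(r2_lines))
    paired_r1, paired_r2 = [], []
    for read_id, record in read_fastq(r1_lines):
        mate = r2_by_id.get(read_id)
        if mate is not None:  # drop reads whose mate was filtered out
            paired_r1.extend(record)
            paired_r2.extend(mate)
    return paired_r1, paired_r2
```

Because the output follows R1 order, both result files stay in step with each other even when the two filtered inputs were ordered differently.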

Question- how to include no hit reads into final fastq data while screening?

Hi, I ran FastQ Screen on bacterial FASTQ data. As the reference data contain only E. coli, most of my reads did not map to any reference genome and are reported as no hits. I want to screen my data and keep the no-hit reads in my final FASTQ files. How can I do this? Please advise.

How to run Fastq-screen on paired end RNA-Seq data?

Hello There,

I noticed that the flag for paired-end data was removed in one of the versions. Could you let me know how I can run FastQ Screen on paired data? Should I run each read file separately?

Thank you,
Lida

[Feature Request] Simplify adding genomes to config file

It would be nice to add a new option to FastQ Screen to streamline adding new genomes. One might, for example, specify:

--new_genome_name Potato and --new_genome_path /bi/scratch/Genomes/Solanum_tuberosum
or
--new_genome_name Mycoplasma and --new_genome_path /bi/scratch/Genomes/Mycoplasma

FastQ Screen would then scan the specified folder for valid Bowtie2/BWA/Bismark indexes, and modify the Config file accordingly. Many thanks!
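
As an illustration of the scanning step for Bowtie 2 only, here is a small Python sketch. It is not part of FastQ Screen. It relies on the standard Bowtie 2 index naming (`<basename>.1.bt2` or `<basename>.1.bt2l`) and assumes that the config line takes the form `DATABASE`, name, index basename, separated by whitespace. The function names are hypothetical.

```python
from pathlib import Path


def bowtie2_basenames(folder):
    """Return sorted index basenames found in folder (path minus '.1.bt2[l]')."""
    names = set()
    for f in Path(folder).iterdir():
        for suffix in (".1.bt2", ".1.bt2l"):
            # Skip the reverse index files, which also end in .1.bt2 / .1.bt2l.
            if f.name.endswith(suffix) and not f.name.endswith(".rev" + suffix):
                names.add(str(f.parent / f.name[: -len(suffix)]))
    return sorted(names)


def database_line(name, folder):
    """Build a config line for the single Bowtie 2 index in folder."""
    found = bowtie2_basenames(folder)
    if len(found) != 1:
        raise ValueError(
            f"expected exactly one Bowtie 2 index in {folder}, found {len(found)}"
        )
    return f"DATABASE\t{name}\t{found[0]}"
```

A call such as `database_line("Potato", "/bi/scratch/Genomes/Solanum_tuberosum")` would return the line to append to the config file, or raise if the folder holds no index or more than one.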

can we download a subset of index database?

Hi,
I would like to download only Mouse, Human, rRNA and Mitochondria index databases. Is this possible using --get_genomes?

Thanks,
M

FastQ Screen fails to clean up after itself

In --bisulfite mode, FastQ Screen seems to leave some files behind, e.g.:

Cat.lane1_another_sample_R2_trimmed.fq.gz_temp_subset.fastq_C_to_T.fastq
Cat.lane1_another_sample_R2_trimmed.fq.gz_temp_subset.fastq_G_to_A.fastq
lane1_another_sample_R2_trimmed.fq.gz_temp_subset.fastq
lane1_other_sample_R1_trimmed.fq.gz_temp_subset.fastq
lane1_other_sample_R2_trimmed.fq.gz_temp_subset.fastq
lane1_some_sample_R2_trimmed.fq.gz_temp_subset.fastq
Opossum.lane1_other_sample_R1_trimmed.fq.gz_temp_subset.fastq_C_to_T.fastq
Opossum.lane1_other_sample_R1_trimmed.fq.gz_temp_subset.fastq_G_to_A.fastq
Opossum.lane1_other_sample_R2_trimmed.fq.gz_temp_subset_bismark_bt2.bam
Opossum.lane1_other_sample_R2_trimmed.fq.gz_temp_subset_bismark_bt2_SE_report.txt
Opossum.lane1_other_sample_R2_trimmed.fq.gz_temp_subset.fastq_ambiguous_reads.fq.gz
Opossum.lane1_other_sample_R2_trimmed.fq.gz_temp_subset.fastq_C_to_T.fastq
Opossum.lane1_other_sample_R2_trimmed.fq.gz_temp_subset.fastq_G_to_A.fastq
Opossum.lane1_some_sample_R2_trimmed.fq.gz_temp_subset.fastq_C_to_T.fastq
Opossum.lane1_some_sample_R2_trimmed.fq.gz_temp_subset.fastq_G_to_A.fastq

Not sure why this happens.

Fail to install GD module

I followed the installation guide to install the prerequisite GD module, but it gives errors:

$ perl -MCPAN -e "install GD"
Reading '/home/ericjuo/.cpan/Metadata'
  Database was generated on Sat, 19 Mar 2022 01:55:59 GMT
Running install for module 'GD'
Checksum for /home/ericjuo/.cpan/sources/authors/id/R/RU/RURBAN/GD-2.76.tar.gz ok
'YAML' not installed, will not store persistent state
Configuring R/RU/RURBAN/GD-2.76.tar.gz with Makefile.PL
Package gdlib was not found in the pkg-config search path.
Perhaps you should add the directory containing `gdlib.pc'
to the PKG_CONFIG_PATH environment variable
No package 'gdlib' found
 at Makefile.PL line 530.
*** can not find package gdlib
*** check that it is properly installed and available in PKG_CONFIG_PATH
 at Makefile.PL line 530.
Warning: No success on command[/usr/bin/perl Makefile.PL INSTALLDIRS=site]
  RURBAN/GD-2.76.tar.gz
  /usr/bin/perl Makefile.PL INSTALLDIRS=site -- NOT OK

I am using Ubuntu 20.04.4 LTS

Zipping of the subset file

It was pointed out to me that both bowtie and bowtie2 can use .gz and .bz files as input. Maybe modify the code so that the subset file is itself zipped.

more information about "vectors"?

Dear Steven,

Can I have more information about the "vectors" category in fastq_screen?

Thanks.

Shicheng

Error when running on cluster "undefined symbol: perl_xs_handshake"

Hi,
Whenever I try to run fastq_screen on a Linux-based cluster, I get the error "undefined symbol: perl_xs_handshake". I am using Perl 5.24.

thanks

Create a DockerHub container

Make FastQ Screen available on DockerHub

FASTA files

Make FastQ Screen Compatible with FASTA Files

Filtering was killed

Hello, I used the following command to filter a compressed FASTQ file (*.fq.gz), keeping only reads with unique or multiple hits on the human genome:

/mnt/d/dados_geneticos/nebula_genomics/FastQ-Screen-0.15.3/./fastq_screen --conf /mnt/d/dados_geneticos/nebula_genomics/FastQ-Screen-0.15.3/FastQ_Screen_Genomes/fastq_screen.conf --aligner bowtie2 --tag /mnt/d/dados_geneticos/nebula_genomics/fastq_screen/NG1U1B2ET6_1.fq.gz --filter 30000000000000 --outdir /mnt/d/dados_geneticos/nebula_genomics/fastq_screen/fastq_screen_filtered
However, after a long time (at least 10h), it was killed and only a temporary file was left: "NG1U1B2ET6_1.fq.gz_temp_subset.fastq". I will post the log below:

Using fastq_screen v0.15.3
Reading configuration from '/mnt/d/dados_geneticos/nebula_genomics/FastQ-Screen-0.15.3/FastQ_Screen_Genomes/fastq_screen.conf'
Adding database Human
Adding database Mouse
Adding database Rat
Adding database Drosophila
Adding database Worm
Adding database Yeast
Adding database Arabidopsis
Adding database Ecoli
Adding database rRNA
Adding database MT
Adding database PhiX
Adding database Lambda
Adding database Vectors
Adding database Adapters
Using 7 threads for searches
Option --subset set to 0: processing all reads in FASTQ files
Processing NG1U1B2ET6_1.fq.gz
Not making data subset
Searching NG1U1B2ET6_1.fq.gz_temp_subset.fastq against Human
Killed

My computer has 32 GB of memory, and I ran the command in a Mamba environment with bowtie2 installed. Could this issue be caused by a lack of memory?

Change single MT genome to multi-species mitochondria category

We just encountered a case where RNA-seq libraries contained 50-70% mitochondrial reads, but FastQ Screen reported only 0.5-2%.

After some tests we found that the mitochondrial sequence currently used in FastQ Screen is the human sequence (MT dna:chromosome chromosome:GRCh38:MT:1:16569:1 REF), and that the pig sequence (MT dna:chromosome chromosome:Sscrofa11.1:MT:1:16613:1 REF) is sufficiently different to explain this discrepancy.

A suggested change would be to concatenate the MT sequences of several known organisms together, so an 'MT contamination' can be spotted more easily.

v0.10.0 much slower when using higher number of threads

Hi,

I'm using FastQ-Screen version 0.10.0 and I found that using 48 threads took seven times longer than using 8 threads for the same data (41 million pairs of 125 bp reads). I wonder whether this is expected or whether the issue has already been tackled in later versions.

By the way, I used bowtie2 v2.3.4.1 when using 48 threads and v2.3.0 when using 8 threads.

Thank you for any clue.

With BWA, R1 & R2 files can have very different mapping signatures

When the aligner is set to bwa, in two separate data sets we are seeing some libs with very different "hit profiles" or "mapping signatures" between R1 and R2 files. For example, R1 may have 40% of reads map to our fish genome fasta, and R2 from the same library can have 0%. We have not observed this behavior when the aligner is set to bowtie2.

I've attached two report files, one from bwa and one from bowtie2. Realize that we are still optimizing the slurm array that runs fastq_screen, which explains why some results are missing. I will get the exact options and arguments that were used from the person running the analyses and post them.

The fastq files were previously processed with fastp to remove questionable reads (<140 bp after trimming, low complexity were filtered out) and clumpify.sh to remove PCR and optical duplicates, so the data fed to fastq_screen should be fairly pristine.

bwa_vs_bowtie2.zip

fastq_screen does not yield non-zero exit code when aligner returns non-zero exit code

fastq_screen does not yield non-zero exit code when aligner returns non-zero exit code.

This seems like a problem since the expected output files are all made, but the reported values will be different when the aligner fails. Also it would be nice to have a non-zero exit code when the aligner fails since I am using fastq screen with nextflow, and it does not crash the workflow when it fails.

This is the output of the log.
See the lines:
Aligner warning: Killed
Aligner warning: (ERR): bowtie2-align exited with value 137

+ fastq_screen --aligner bowtie2 seq006983_S23_L004_R2_001.fastq.gz
Using fastq_screen v0.14.0
Reading configuration from '/usr/local/share/fastq-screen-0.14.0-0/fastq_screen.conf'
Adding database Mitochondria
Adding database Arabidopsis
Adding database PhiX
Adding database Rat
Adding database E_coli
Adding database Mycoplasma
Adding database Vectors
Adding database Mouse
Adding database Adapters
Adding database Drosophila
Adding database Worm
Adding database Lambda
Adding database rRNA
Adding database Yeast
Using 1 threads for searches
Option --subset set to 100000 reads
Processing seq006983_S23_L004_R2_001.fastq.gz
Counting sequences in seq006983_S23_L004_R2_001.fastq.gz
Not making subset of 100000 since 24708 actual sequences is too low or close enough
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Mitochondria
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Arabidopsis
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against PhiX
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Rat
Aligner warning: Killed
Aligner warning: (ERR): bowtie2-align exited with value 137
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against E_coli
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Mycoplasma
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Vectors
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Mouse
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Adapters
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Drosophila
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Worm
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Lambda
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against rRNA
Searching seq006983_S23_L004_R2_001.fastq.gz_temp_subset.fastq against Yeast
Processing complete
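
Until the exit status reflects aligner failures, a workflow step could scan the captured log itself. The Python sketch below is a workaround under stated assumptions, not FastQ Screen behaviour: it relies only on the two message formats visible in the log above ("Searching ... against ..." and "Aligner warning: (ERR): ... exited with value N"), and the function name is hypothetical. An exit value of 137 conventionally means 128 + 9, i.e. the process received SIGKILL, which on shared systems is often the out-of-memory killer.

```python
import re

SEARCHING = re.compile(r"Searching \S+ against (\S+)")
ALIGNER_EXIT = re.compile(r"Aligner warning: \(ERR\): (\S+) exited with value (\d+)")


def aligner_failures(log_text):
    """Return (aligner, exit_value, database) for each failure in the log."""
    failures = []
    current_db = None
    for line in log_text.splitlines():
        match = SEARCHING.match(line)
        if match:
            current_db = match.group(1)  # remember which database is being searched
            continue
        match = ALIGNER_EXIT.search(line)
        if match:
            failures.append((match.group(1), int(match.group(2)), current_db))
    return failures
```

A wrapper script could then call `sys.exit(1)` whenever `aligner_failures(log_text)` is non-empty, so that Nextflow treats the task as failed.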

Support for STAR

Amazing tool, thanks!

I was wondering whether there is any plan to add fast splice-aware mappers like STAR to the list of supported tools.

Thanks!

Error while performing --bisulfite read mapping

Hello !

I am running fastq-screen installed through conda. But I am getting this error

> cat .command.err
Using fastq_screen v0.14.0
Defaulting to Bowtie 2 for --bisulfite mode
Reading configuration from 'FastQ_Screen_Genomes/fastq_screen.conf'
Adding database Rat
Adding database Drosophila
Adding database Worm
Adding database Yeast
Adding database Arabidopsis
Adding database Ecoli
Adding database PhiX
Adding database Lambda
Adding database Vectors
Adding database Adapters
Using 8 threads for searches
Option --subset set to 500000 reads
Processing test_reads_1.fq
Counting sequences in test_reads_1.fq
Not making subset of 500000 since 6146 actual sequences is too low or close enough
Searching test_reads_1.fq_temp_subset.fastq against Rat
Could not run Bismark with command: '/users/rahul.pisupati/.conda/envs/fastq_screen/bin/bismark  --path_to_bowtie /users/rahul.pisupati/.conda/envs/fastq_screen/bin/ --ambiguous --bowtie2   --non_directional --prefix Rat --output_dir work_nf/bf/158e263020dddd370b9b9d6fbc42c9/ FastQ_Screen_Genomes/Rat/ work_nf/bf/158e263020dddd370b9b9d6fbc42c9/test_reads_1.fq_temp_subset.fastq 1>/dev/null 2>work_nf/bf/158e263020dddd370b9b9d6fbc42c9/aligner_standard_error.qd9tUFOW.txt'.

But I could run the exact command directly on the terminal. Can you please check if there is something I am missing?

fastq-screen works perfectly well and generates png output without --bisulfite option.

Cheers,
R

rRNA genome

Create an rRNA genome for download (--get_genomes) that comprises multiple species and document this in the config file.

R1 and R2 map differently to the human genome

I ran FastQ Screen on my WGS samples and got different mapping values for read 1 and read 2. I can't work out what causes this ~2% bias for the reverse read. An example is given below.

Name	Human counts	Human %	total_reads	HOMD counts	HOMD %	No hits counts	No hits %
A_R1	99308	99.32	99996	3248	3.26	679	0.68
A_R2	97035	97.03	99996	4240	4.24	2869	2.87
B_R1	99538	99.55	99994	4313	4.33	439	0.44
B_R2	97512	97.52	99994	5330	5.34	2369	2.37
C_R1	99432	99.42	100004	3161	3.15	570	0.57
C_R2	97515	97.51	100004	4366	4.37	2410	2.41
D_R1	99557	99.55	100004	4329	4.32	420	0.42
D_R2	97522	97.52	100004	5596	5.59	2310	2.31

FastQC on output

I'm trying to run FastQC on the output of fastq_screen --filter and am receiving errors about a corrupted FASTQ file.

The code was run on the test dataset.

First tagged the fastq file

FastQ-Screen-0.15.2/fastq_screen --tagged fqs_test_dataset.fastq.gz

Filtered out reads mapped to yeast

FastQ-Screen-0.15.2/fastq_screen --filter ---0 fqs_test_dataset.tagged.fastq.gz

Run FastQC

FastQC/fastqc -t 8 fqs_test_dataset.tagged_filter.fastq

I receive the following error.

Failed to process fqs_test_dataset.tagged_filter.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:89)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)

I checked, and FastQC runs on both the original file and the tagged file. Only the filtered file is corrupted. Is there any way I can either recover the integrity of this filtered FASTQ file or ensure that the filtered file is written without corruption, so that I can run FastQC and use it for other downstream analysis?
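
To locate where such a file goes wrong, a small independent check can help. The Python sketch below is a generic diagnostic, not part of FastQ Screen or FastQC. It assumes strict four-line FASTQ records and reports the first line that breaks the format. The function name is hypothetical.

```python
from itertools import islice


def first_bad_record(lines):
    """Return (line_number, reason) for the first malformed record, or None."""
    it = iter(lines)
    line_no = 1  # 1-based number of the current record's header line
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return None  # reached the end without finding a problem
        if len(record) < 4:
            return line_no, "truncated record"
        header, seq, plus, qual = record
        if not header.startswith("@"):
            return line_no, "ID line does not start with '@'"
        if not plus.startswith("+"):
            return line_no + 2, "separator line does not start with '+'"
        if len(seq) != len(qual):
            return line_no + 1, "sequence and quality lengths differ"
        line_no += 4
```

Calling it on an open file handle, for example `first_bad_record(open("fqs_test_dataset.tagged_filter.fastq"))`, gives a line number that can then be inspected by hand.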

SAM or BAM files extraction after alignment

Is there any way to extract SAM or BAM files from the alignment process?

More info about rRNA and mitochondrial indexes

Thanks for creating this useful tool. Would it be possible to provide more information about the rRNA and mitochondrial default indexes? For example, which species are included?

Secondly, in order to evaluate the 'rRNA' alignment rate properly, it would be really helpful to know which sequences are included. In particular, does this database include sequence outside of the genome assemblies? For example, the 'Rn45s' annotated on chr17:39842997-39848829 in mm10 seems to be a partial 45S compared to the full pre-rRNA 45S (NR_046233.2) but that is the best match when I BLAST NR_046233.2 against mm10. Previously, I have also observed a decreased unmapped rate for RNA-seq (ribo-depletion, so we were expecting higher rRNA) when I add NR_046233.2 to the mm10 reference.

Would appreciate your thoughts on accurately quantifying rRNA and mitochondrial reads in a library. Thanks again!

How many nucleotides (bp) need to be matched between one read of our genome and reference genome?

Hello, I searched your paper and manual but couldn't find how many nucleotides (bp) need to match between one of our reads and the reference genome.

Thank you for your help.
Lida

Error with certificate for --get_genomes

Hi,
I have trouble getting genomes using the fastq_screen --get_genomes command line.

Here is my command

fastq_screen --get_genomes --outdir genomes

Here is the error

Output directory 'genomes' does not exist, creating directory
Downloading FastQ Screen Genomes
--2021-10-08 14:12:24--  http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/genome_locations.txt
Resolving www.bioinformatics.babraham.ac.uk... 149.155.133.4
Connecting to www.bioinformatics.babraham.ac.uk|149.155.133.4|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/genome_locations.txt [following]
--2021-10-08 14:12:29--  https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/genome_locations.txt
Connecting to www.bioinformatics.babraham.ac.uk|149.155.133.4|:443... connected.
ERROR: cannot verify www.bioinformatics.babraham.ac.uk’s certificate, issued by “/C=US/O=Let's Encrypt/CN=R3”:
  Issued certificate has expired.
To connect to www.bioinformatics.babraham.ac.uk insecurely, use ‘--no-check-certificate’.
Could not run command 'wget www.bioinformatics.babraham.ac.uk/projects/fastq_screen/genome_locations.txt'

I tried it in multiple folders to ensure it is not a permission issue.

Can you help me please?

Have a nice day,
Sebastien

Configuration file

Check/make sure the example configuration file is consistent with regard to how Bismark/BWA/Bowtie2 are mentioned.

FastQ Screen Anaconda

Make FastQ Screen available via Anaconda.

Filtering bisulfite fastq files

Hello,

I'm trying to run FastQ-Screen on bisulfite data from sheep. I've managed to run FastQ-Screen successfully before as a QC check on the data (with 80GB memory), but I am now having trouble with memory (currently trying 150GB) when creating a tag file to filter out the controls (puc19 and lambda) that were spiked into my samples to test bisulfite conversion efficiency. I am running this on an HPC and have noticed that a number of 'core.#####' files, each ~23GB, are created in the directory, and I wonder whether this is related to the problem. I never noticed these files before when running fastq-screen without --tag (or in any other jobs on the HPC). Are these files created by FastQ-Screen, and are they needed? Any advice on how to reduce the memory needed for the job would be really appreciated: I am currently working on a pilot dataset only and will be working with a much larger dataset in the future.

The code I'm using is:

fastq_screen --tag --conf fastq_screen.conf --bisulfite --outdir tagged_data $datapath/*.gz

Thanks so much in advance,
Alex

In bisulfite mode, run Bismark using local alignments

To avoid having to run FastQ Screen on trimmed FastQ files in --bisulfite mode, could you please allow Bismark to be run using local alignments (flag: --local), similar to what you are doing already for Bowtie2 and BWA?

How to extract rRNA reads from metagenomic RNA-seq data

Hi Steven,

I notice that fastq_screen can count the proportion of rRNA in metagenome FASTQ files. I would like to know how to extract all the rRNA reads, so that I can do some further analysis on the rRNA reads from RNA-seq data of metagenome samples.

Thanks.

get_genome bisulfite combination

Make sure the command:
fastq_screen --bisulfite --get_genome

is explained in all the documentation.

Wrong path in config file when using --get_genomes

The config file obtained with --get_genomes has a path which I assume is from a Babraham Institute server. I guess this is due to the replacement code failing in the following line, probably due to the path being present in the original config file [1] instead of the placeholder text.

FastQ-Screen/fastq_screen

Line 2407 in 16c4cb4

$line =~ s/\[FastQ_Screen_Genomes_Path\]/\/$outdir\/FastQ_Screen_Genomes/;

[1] http://ftp1.babraham.ac.uk/ftpusr24/FastQ_Screen_Genomes/fastq_screen.conf
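
The failure mode described here is that a substitution which finds no placeholder silently leaves the file unchanged. The Python sketch below re-expresses the quoted Perl substitution with an explicit check. It is an illustration only, not the fix adopted by the project, and the function name and the idea of raising an error are assumptions.

```python
PLACEHOLDER = "[FastQ_Screen_Genomes_Path]"


def rewrite_conf(conf_text, outdir):
    """Replace the placeholder on each line, as the Perl s/// does, but fail loudly."""
    if PLACEHOLDER not in conf_text:
        # Without this check the downloaded config keeps whatever path it shipped with.
        raise ValueError("placeholder not found; config paths would be left unchanged")
    target = f"/{outdir}/FastQ_Screen_Genomes"
    # The Perl substitution has no /g flag, so only the first match per line is replaced.
    return "\n".join(
        line.replace(PLACEHOLDER, target, 1) for line in conf_text.split("\n")
    )
```

With a check like this, a config file that already contains a hard-coded server path would trigger an error at download time instead of surfacing later as missing indexes.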

Message when passed non-valid path to bowtie2

The message below could be more elegant:

(base) ultraviolet@ultraviolet-X555QA:~$ fastq_screen_v0.13.0/fastq_screen fsq_test_dataset.fastq.gz
Using fastq_screen v0.13.0
Reading configuration from '/home/ultraviolet/fastq_screen_v0.13.0/fastq_screen.conf'
Aligner (--aligner) not specified, but Bowtie2 path and index files found: mapping with Bowtie2
Using '/usr/local/bin/bowtie2-2.3.3.1-linux-x86_64/bowtie2' as Bowtie 2 path
Adding database Human
Adding database Mouse
Adding database Ecoli
Adding database rRNA
Adding database MT
Adding database Lambda
Adding database Vectors
Adding database Adapters
Use of uninitialized value $path_to_bowtie in string eq at fastq_screen_v0.13.0/fastq_screen line 1515, line 252.
Aligner bowtie2 not exectable at '/usr/local/bin/bowtie2-2.3.3.1-linux-x86_64/bowtie2', please adjust configuration

Improvement: Account for multiple mapping in bisulfite mapping mode

Dear Steven,

I was recently checking a bisulfite sequencing sample for potential contamination with FastQ Screen (using --bisulfite), and got the following result:

This suggested that some 60% was of human origin, and most of the other reads had no hits to any genome I tested against (quite a few).

When I then aligned the data to the human genome using Bismark I got these alignment statistics:

Final Alignment report
======================
Sequences analysed in total:    430404
Number of alignments with a unique best hit from the different alignments:      274999
Mapping efficiency:     63.9%
Sequences with no alignments under any condition:       33043
Sequences did not map uniquely: 122362

This confirmed that 64% of reads aligned to the human genome, and further showed that another 122362 reads, or 28.4%, aligned to the human genome in an ambiguous manner. Taken together, the sample seemed to be of at least 92% human origin, which wasn't obvious from the initial FastQ Screen plot.

May I suggest you simply extract the number of ambiguously aligned sequences from the Bismark report and add it as Multiple hits/one genome (and/or Multiple hits/multiple genomes)?

Many thanks, Felix
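
The arithmetic behind this suggestion can be reproduced from the report lines quoted above. The following Python sketch is illustrative only: it parses the "label: count" lines of the alignment report text and derives the unique, ambiguous and combined percentages. The function names are hypothetical, and the keys are the labels exactly as they appear in the quoted report.

```python
def parse_bismark_counts(report_text):
    """Collect 'label: integer' lines from a Bismark alignment report."""
    counts = {}
    for line in report_text.splitlines():
        label, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips headings and percentage lines
            counts[label.strip()] = int(value)
    return counts


def human_fractions(counts):
    """Return unique, ambiguous and combined alignment percentages."""
    total = counts["Sequences analysed in total"]
    unique = counts[
        "Number of alignments with a unique best hit from the different alignments"
    ]
    ambiguous = counts["Sequences did not map uniquely"]
    return {
        "unique": 100 * unique / total,
        "ambiguous": 100 * ambiguous / total,
        "combined": 100 * (unique + ambiguous) / total,
    }
```

For the report above (430404 reads, 274999 unique, 122362 ambiguous) this gives roughly 63.9%, 28.4% and 92.3%.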

Reference genomes on Babraham Bioinformatics website are no longer available for download

Running

fastq_screen --get_genomes

as instructed here will return a 404 NOT FOUND against https://ftp1.babraham.ac.uk/ftpusr46/FastQ_Screen_Genomes/

--get_genomes download failed

Hi,

I am trying to download the pre-built Bowtie2 indices using fastq_screen --get_genomes

However, I have been receiving the following error:
"Connecting to ftp1.babraham.ac.uk (ftp1.babraham.ac.uk)|149.155.133.2|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-01-11 15:13:35 ERROR 403: Forbidden.

Could not run command 'wget --no-check-certificate -r --no-parent -R 'index.html*' ftp1.babraham.ac.uk/ftpusr46/FastQ_Screen_Genomes/'"

Is this a known problem and is there a solution to it or an alternative way of downloading the dataset?

Thanks for the help

Kind regards,

Tatiana

Unable to download v0.15.3 FastQ-Screen or --get_genomes

Hi,

I am very new to this and trying to follow the instructions in the introductory videos.

After putting in this command,
"wget https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/fastq_screen_v0.15.3.tar.gz"

I got this error,
"Resolving www.bioinformatics.babraham.ac.uk (www.bioinformatics.babraham.ac.uk)...149.155.133.4
Connecting to www.bioinformatics.babraham.ac.uk (www.bioinformatics.babraham.ac.uk) | 149.155.133.4 | :443...connected.
HTTP request sent, awaiting response...404 Not Found
2023-08-11 11:10:49 ERROR 404: Not Found."

I can only manage to download v0.14.0 for now. Then, when I used the command "fastq_screen --get_genomes"

The following error appeared,
"Unknown option: get_genomes
Could not parse options, please adjust configuration."

Could you kindly help me?
Thanks!

Invalid bowtie path

Email to me:
I am writing to you because I recently discovered that if anyone specifies the path to bowtie in the configuration file for fastqscreen, and for whatever reason that path is no longer valid (in our case IT upgraded to a new OS and that changed the path), then fastqscreen does not exit from the job but creates all the relevant plots and data files showing no alignment.

Doesn't match if U nucleotide used in sequence

If you have an RNA sequence with U's, the U's will have to be converted to T's to get an alignment. Just something users might want to be aware of.
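
A minimal Python sketch of that conversion, assuming standard four-line FASTQ records: only the sequence line of each record is changed, so header and quality lines that happen to contain the letter U are left alone. The function name is hypothetical.

```python
_U_TO_T = str.maketrans("Uu", "Tt")


def rna_to_dna_fastq(lines):
    """Yield FASTQ lines with U/u replaced by T/t on sequence lines only."""
    for index, line in enumerate(lines):
        # In a four-line record the sequence is the second line (index 1, 5, 9, ...).
        yield line.translate(_U_TO_T) if index % 4 == 1 else line
```

The generator can be fed an open file handle and its output written straight to a new file before screening.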

Skipping DATABASE

Hello,

I have been adding a reference genome to fastq_screen.conf file but there are some problems.

I add this path to the .conf file:
## Western honey bee
DATABASE Apis_Mellifera /MGSvm_data/mpolat/apis_mellifera_ref/fasta

In the fasta folder I have my reference genome in FASTA format and five additional files produced by the bwa index command.

"/MGSvm_data/mpolat/apis_mellifera_ref/fasta" this path contains the following files:

apis.fasta
apis.fasta.pac
apis.fasta.amb
apis.fasta.ann
apis.fasta.bwt
apis.fasta.sa

Whatever I do (I changed the file name to "apis", "apis.fasta" and "fasta"), it gives the same error:

Using fastq_screen v0.15.1
Reading configuration from '/MGSvm_data/mpolat/fastq_screen/FastQ-Screen-0.15.2/fastq_screen.conf'
Using '/usr/bin/bwa' as BWA path
Skipping DATABASE 'Apis_Mellifera' since no BWA index was found at '/MGSvm_data/mpolat/apis_mellifera_ref/fasta'
Using 8 threads for searches
No reference genomes were configured, please adjust configuration.

I need help 👯
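
One thing worth checking, offered as an assumption rather than a confirmed diagnosis: aligner indexes are normally referred to by their basename (here that would be the folder path followed by apis.fasta), not by the containing folder. The Python sketch below tests whether a candidate path has all five files that bwa index produces, using the suffixes listed above. The function name is hypothetical.

```python
from pathlib import Path

# The five index files written by `bwa index`, as listed in the post above.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_files(basename):
    """Return the index suffixes that do not exist for this basename."""
    return [s for s in BWA_SUFFIXES if not Path(str(basename) + s).is_file()]
```

For the layout above, `missing_bwa_files("/MGSvm_data/mpolat/apis_mellifera_ref/fasta")` would report all five suffixes as missing, whereas the same call with `/MGSvm_data/mpolat/apis_mellifera_ref/fasta/apis.fasta` should return an empty list.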

Processing very long reads

Devise a way to handle very long reads (e.g. Oxford Nanopore).

Maybe make compatible with Minimap2 - as discussed by email with users. Alternatively, break up reads and process with Bowtie/Bowtie2
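
The "break up reads" alternative could look roughly like the Python sketch below. It is only a sketch of the idea, not an implemented feature: the fragment size of 100 bp and the `_fragN` naming scheme are arbitrary illustrative choices, and the function name is hypothetical.

```python
def split_read(read_id, seq, qual, size=100):
    """Yield (fragment_id, sequence, quality) chunks of at most `size` bases."""
    if size <= 0:
        raise ValueError("size must be positive")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality must have the same length")
    for n, start in enumerate(range(0, len(seq), size)):
        end = start + size
        # Tag each fragment so hits can be traced back to the original long read.
        yield f"{read_id}_frag{n}", seq[start:end], qual[start:end]
```

Each fragment keeps the parent read ID as a prefix, so per-fragment hits could later be aggregated back to one result per long read.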

Discrepancy in ambiguous alignments between default and bisulfite mode

Currently, there seems to be a discrepancy in the counting of ambiguously mappable sequences between the default mode, and the --bisulfite mode. Here is an example of a human RRBS sample which was aligned with FastQ Screen in default mode:

It doesn't really produce uniquely aligned reads, which is fine as this is a bisulfite library. Of note, the sample contains ~35% of microsatellite sequences, a multimer of (TGGAA)n (see also here FelixKrueger/Bismark#265). This satellite repeat contamination, which is present in all animal species tested, is responsible for a generally low unique mapping efficiency.

When I ran FastQ Screen in --bisulfite mode, it does identify the sample as mainly human, but interestingly it does not show the ambiguously aligned micro-satellite sequences in all species:

I suspect that the counting of ambiguous alignments in --bisulfite mode might be missing this contaminant. Maybe this has to do with the formatting of the read ID that is written out into the ambiguous.fastq file?

issue installing..

used this

perl -MCPAN -e "install GD"

and got this
test Summary Report

t/windows_bmp.t (Wstat: 65280 Tests: 1 Failed: 0)
Non-zero exit status: 255
Parse errors: Bad plan. You planned 4 tests but ran 1.
Files=11, Tests=50, 1 wallclock secs ( 0.06 usr 0.02 sys + 0.53 cusr 0.26 csys = 0.87 CPU)
Result: FAIL
Failed 1/11 test programs. 0/50 subtests failed.
make: *** [test_dynamic] Error 255
RURBAN/GD-2.78.tar.gz
/usr/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
reports RURBAN/GD-2.78.tar.gz
Running make install
make test had returned bad status, won't install without force

... I don't know how to fix the issue or what the issue is....

any suggestion??

Thanks a lot

Only "no hits" in the output!!

Hi!

I used FastQ-Screen to test small RNA sequencing data. As you can see in the attached report, my human samples do not map to any reference genome. Can you explain what the problem with my samples could be?

Thank you so much,
Hela
FastQScreen_report.pdf