Just download the latest version.
from seqkit.
Will this always produce read_1.part_###.fasta files with the matching set of reads in the corresponding read_2.part_###.fasta files?
Yes.
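# Read 1: convert FASTQ to FASTA and split into files of 500,000 reads each.
time seqkit fq2fa read_1.fastq.gz \
| seqkit split2 --by-size 500000 --out-dir split_seqs --by-size-prefix read_1.part_ --extension .gz
# Read 2: same split size, so each read_2.part_### file lines up with read_1.part_###.
time seqkit fq2fa read_2.fastq.gz \
| seqkit split2 --by-size 500000 --out-dir split_seqs --by-size-prefix read_2.part_ --extension .gz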
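Why the answer is yes: splitting by size cuts each input into consecutive blocks of N records. If read_1 and read_2 hold the same number of reads with mates in the same order (usually true for paired FASTQ files), block N of one covers exactly the mates in block N of the other. The Python sketch below illustrates that idea on made-up read IDs; split_by_size is a hypothetical helper, not seqkit's code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_by_size(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` records (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(records)
    while True:
        chunk = list(islice(it, size))  # take the next `size` records
        if not chunk:
            return  # input exhausted
        yield chunk


# Made-up mates: read i in R1 and read i in R2 share the ID "r{i}".
r1 = [f"r{i}" for i in range(1, 8)]
r2 = [f"r{i}" for i in range(1, 8)]

parts1 = list(split_by_size(r1, 3))
parts2 = list(split_by_size(r2, 3))

# Same order + same chunk size => part N of R1 holds the mates of part N of R2.
for n, (p1, p2) in enumerate(zip(parts1, parts2), start=1):
    print(f"part_{n:03d}", p1 == p2, p1)
```

With 7 reads and a size of 3 this prints three parts (3, 3 and 1 reads), each reporting True. If one file had a missing or extra read, every part after that point would be shifted, which is why a validation step is still worthwhile.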
Is there a way to check/validate the split files to make sure the reads are in the correct order?
Use seqkit pair ("match up paired-end reads from two fastq files"); with -u it also saves unpaired reads, if there are any.
$ seqkit pair -1 split_seqs/read_1.part_010.fasta.gz -2 split_seqs/read_2.part_010.fasta.gz -u
[INFO] 500000 paired-end reads saved to split_seqs/read_1.part_010.paired.fasta.gz and split_seqs/read_2.part_010.paired.fasta.gz
[INFO] no unpaired reads in split_seqs/read_1.part_010.fasta.gz
[INFO] no unpaired reads in split_seqs/read_2.part_010.fasta.gz
$ seqkit sum split_seqs/read_[12].part_010.fasta.gz split_seqs/read_[12].part_010.paired.fasta.gz | more
processed files: 4 / 4 [======================================] ETA: 0s. done
seqkit.v0.1_DLS_k0_e734aaf2f526e889d5da00a7df2ccdde split_seqs/read_1.part_010.fasta.gz
seqkit.v0.1_DLS_k0_f97ee32096bade173d093c37f4f592c8 split_seqs/read_2.part_010.fasta.gz
seqkit.v0.1_DLS_k0_e734aaf2f526e889d5da00a7df2ccdde split_seqs/read_1.part_010.paired.fasta.gz
seqkit.v0.1_DLS_k0_f97ee32096bade173d093c37f4f592c8 split_seqs/read_2.part_010.paired.fasta.gz
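seqkit pair matches mates by read ID. The following is a minimal in-memory Python sketch of that idea, assuming read IDs are unique within each file; pair_by_id and the toy reads are hypothetical, and this is not seqkit's implementation. Paired output follows the order of the first file, and whatever is left over on either side counts as unpaired.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (read ID, sequence)


def pair_by_id(
    reads1: List[Record], reads2: List[Record]
) -> Tuple[List[Record], List[Record], List[Record], List[Record]]:
    """Match mates by ID; return (paired1, paired2, unpaired1, unpaired2)."""
    # Index R2 by ID (assumes unique IDs; dicts keep insertion order).
    by_id2: Dict[str, Record] = {rid: (rid, seq) for rid, seq in reads2}
    paired1: List[Record] = []
    paired2: List[Record] = []
    unpaired1: List[Record] = []
    for rid, seq in reads1:
        mate = by_id2.pop(rid, None)  # remove the mate once it is used
        if mate is None:
            unpaired1.append((rid, seq))
        else:
            paired1.append((rid, seq))
            paired2.append(mate)
    unpaired2 = list(by_id2.values())  # R2 reads that never found a mate
    return paired1, paired2, unpaired1, unpaired2


# Toy input: R2 is out of order, "b" has no mate in R2, "d" has no mate in R1.
r1 = [("a", "ACGT"), ("b", "GGCC"), ("c", "TTAA")]
r2 = [("c", "AAAA"), ("a", "CCCC"), ("d", "GGGG")]

p1, p2, u1, u2 = pair_by_id(r1, r2)
print([rid for rid, _ in p1], [rid for rid, _ in p2])
print(u1, u2)
```

Here both paired lists come out as ['a', 'c'], with b reported as unpaired in R1 and d as unpaired in R2. In the run above, "no unpaired reads" on both sides is the signal that the two part_010 files hold the same reads.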
I am using seqkit version 2.3.0 and I get the following error when I use seqkit pair with FASTA files:
$ seqkit stats split_seqs/SRR25005537_1.part_001.fasta.gz split_seqs/SRR25005537_2.part_001.fasta.gz
file                                        format  type  num_seqs      sum_len  min_len  avg_len  max_len
split_seqs/SRR25005537_1.part_001.fasta.gz  FASTA   DNA    500,000  125,500,000      251      251      251
split_seqs/SRR25005537_2.part_001.fasta.gz  FASTA   DNA    500,000  125,500,000      251      251      251
$ seqkit pair -1 split_seqs/SRR25005537_1.part_001.fasta.gz -2 split_seqs/SRR25005537_2.part_001.fasta.gz -u
[ERRO] fastq files needed
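In that version, seqkit pair only accepts FASTQ input, as the error shows. If the goal is just to confirm that two split FASTA files list the same reads in the same order, a short script can compare read IDs record by record. This is a sketch under stated assumptions: the ID is taken as the first whitespace-delimited word of each header, a trailing /1 or /2 mate suffix is stripped (a hypothetical normalisation for older Illumina-style names), and fasta_ids and first_mismatch are made-up helper names. The demo writes tiny made-up files to a temporary directory.

```python
import gzip
import itertools
import os
import tempfile
from typing import Iterator, Optional, Tuple


def fasta_ids(path: str) -> Iterator[str]:
    """Yield read IDs from a FASTA file (plain or .gz), in file order."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split(maxsplit=1)
                rid = parts[0] if parts else ""
                # Assumption: mates may differ only by a /1 or /2 suffix.
                if rid.endswith(("/1", "/2")):
                    rid = rid[:-2]
                yield rid


def first_mismatch(path1: str, path2: str) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record number, id1, id2) at the first disagreement, or None."""
    pairs = itertools.zip_longest(fasta_ids(path1), fasta_ids(path2))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:  # None on one side means that file ended early
            return n, a, b
    return None


# Demo on tiny, made-up gzipped FASTA files.
with tempfile.TemporaryDirectory() as tmp:

    def write(name: str, text: str) -> str:
        path = os.path.join(tmp, name)
        with gzip.open(path, "wt") as fh:
            fh.write(text)
        return path

    r1 = write("r1.fasta.gz", ">r1/1\nACGT\n>r2/1\nGGCC\n")
    r2_ok = write("r2_ok.fasta.gz", ">r1/2\nTTAA\n>r2/2\nCCGG\n")
    r2_swapped = write("r2_swapped.fasta.gz", ">r2/2\nCCGG\n>r1/2\nTTAA\n")
    r2_short = write("r2_short.fasta.gz", ">r1/2\nTTAA\n")

    ok_result = first_mismatch(r1, r2_ok)
    swapped_result = first_mismatch(r1, r2_swapped)
    short_result = first_mismatch(r1, r2_short)

print(ok_result, swapped_result, short_result)
```

On the real data, first_mismatch("split_seqs/SRR25005537_1.part_001.fasta.gz", "split_seqs/SRR25005537_2.part_001.fasta.gz") returning None would mean the two parts carry the same IDs in the same order; otherwise it reports the first record number where they diverge.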