Comments (10)
Should it be csvtk uniq -Ht -f 2,5 rather than csvtk uniq -Ht -f 2,4, since I also consider the sequence?
from seqkit.
https://bioinf.shenwei.me/seqkit/usage/#sequence-id
$ seqkit rmdup --id-regexp '\|([^\|]+)$' seq.fa
[INFO] 1 duplicated records removed
>ab1234|china|human|2020-10-01
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1235|china|human|2020-10-03
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1236|china|human|2020-10-04
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1238|USA|animal|2020-10-05
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
Why is the record below removed? japan appears only once.
>ab1237|japan|human|2020-10-03
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
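One way to see why ab1237 disappears is to look at what the regular expression captures. '\|([^\|]+)$' matches only the text after the last |, which is the date, so ab1237 (2020-10-03) gets the same key as ab1235 and is treated as its duplicate. The sketch below emulates that capture with sed rather than with seqkit itself; extract_key is a hypothetical helper name.

```shell
# Emulate the capture group of '\|([^\|]+)$' with sed (not seqkit itself):
# keep only the text after the last "|" of each header.
extract_key() {
  sed -E 's/^.*\|([^|]+)$/\1/'
}

# Both headers yield the same key, so the second one counts as a duplicate.
printf '%s\n' \
  'ab1235|china|human|2020-10-03' \
  'ab1237|japan|human|2020-10-03' | extract_key
```

Both lines print 2020-10-03, which is why only the first of the two records survives.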
$ seqkit fx2tab seq.fa \
| sed 's/|/\t/g' \
| csvtk uniq -Ht -f 2,4 \
| awk '{print $1"|"$2"|"$3"|"$4"\t"$5}' \
| seqkit tab2fx
>ab1234|china|human|2020-10-01
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1235|china|human|2020-10-03
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1236|china|human|2020-10-04
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1237|japan|human|2020-10-03
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1238|USA|animal|2020-10-05
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
Dear @shenwei356,
please help me test this one:
>ab1233|china|human|2020-10-01
AAACCCTTTTCCCCCAAACCCTTTTCCCCT
>ab1234|china|human|2020-10-01
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1235|china|human|2020-10-03
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1236|china|human|2020-10-04
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1237|japan|human|2020-10-03
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1238|USA|animal|2020-10-05
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
>ab1239|china|human|2020-10-03
AAACCCTTTTCCCCCAAACCCTTTTCCCCC
Thanks again.
> please help me test this one:
Why don't you test it yourself?
> As I also consider the sequence?
If yes, just change the option values as needed.
Please explore on your own. I'm not responsible for replying to every question, especially those that could be resolved by reading the documentation/usage.
Hi @shenwei356,
I just tested it, and it does not work for my data.
csvtk uniq -Ht -f 2,5
ab1233 china human 2020-10-01 AAACCCTTTTCCCCCAAACCCTTTTCCCCT
ab1234 china human 2020-10-01 AAACCCTTTTCCCCCAAACCCTTTTCCCCC
ab1237 japan human 2020-10-03 AAACCCTTTTCCCCCAAACCCTTTTCCCCC
ab1238 USA animal 2020-10-05 AAACCCTTTTCCCCCAAACCCTTTTCCCCC
Dear @shenwei356,
hope you can help me out, thanks.
Does this work? The previous -f 2,4 is changed to -f 2-4.
$ seqkit fx2tab seq.fa \
| sed 's/|/\t/g' \
| csvtk uniq -Ht -f 2-4 \
| awk '{print $1"|"$2"|"$3"|"$4"\t"$5}' \
| seqkit tab2fx
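To check which records a given set of key columns keeps, the deduplication step can be emulated with plain awk. This is only a sketch, not csvtk itself: it assumes that the first record of each key is the one kept, and uniq_by_cols and rows are hypothetical helpers. The rows are the test records posted earlier in this thread, as five tab-separated columns (ID, country, host, date, sequence).

```shell
# uniq_by_cols COLS: keep the first row for each distinct combination of the
# listed tab-separated columns. COLS is a comma-separated list, e.g. "2,3,4".
uniq_by_cols() {
  awk -F'\t' -v cols="$1" '
    BEGIN { n = split(cols, c, ",") }
    {
      key = ""
      for (i = 1; i <= n; i++) key = key SUBSEP $(c[i])
      if (!(key in seen)) { seen[key] = 1; print }
    }'
}

# The seven test records from the thread, one per line.
rows() {
  c=AAACCCTTTTCCCCCAAACCCTTTTCCCCC
  t=AAACCCTTTTCCCCCAAACCCTTTTCCCCT
  printf '%s\t%s\t%s\t%s\t%s\n' \
    ab1233 china human  2020-10-01 "$t" \
    ab1234 china human  2020-10-01 "$c" \
    ab1235 china human  2020-10-03 "$c" \
    ab1236 china human  2020-10-04 "$c" \
    ab1237 japan human  2020-10-03 "$c" \
    ab1238 USA   animal 2020-10-05 "$c" \
    ab1239 china human  2020-10-03 "$c"
}

# Key on country + sequence (columns 2 and 5); print the surviving IDs.
rows | uniq_by_cols 2,5 | cut -f1 | paste -sd, -
# Key on country + host + date (columns 2, 3 and 4).
rows | uniq_by_cols 2,3,4 | cut -f1 | paste -sd, -
```

With these records, keying on columns 2 and 5 collapses every china record that shares a sequence, while keying on columns 2, 3 and 4 keeps one record per country, host and date.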
Dear @shenwei356,
It works for me, thanks.