stephenturner / oneliners
Useful bash one-liners for bioinformatics.
Nice to see this "blog"! Thanks for sharing!
fastq_deinterleave.sh contents:
echo "deinterleave_fastq.sh < infile f.fastq r.fastq"
paste - - - - - - - - | tee >(cut -f 1-4 | tr "\t" "\n" > "$1") | cut -f 5-8 | tr "\t" "\n" > "$2"
This is probably the fastest way to do this. (This is not my own code, I got it from somebody else, but I think it's nice to share.)
Performance difference on one 1 GB fastq file:
seqtk: 30 seconds
alternative: 17 seconds
For FASTA deinterleave (two lines per record), change the cut fields from 1-4 to 1-2 and from 5-8 to 3-4, and give paste four "-" arguments instead of eight.
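A minimal sketch of the FASTA variant, assuming every record is exactly two lines (a header plus one unwrapped sequence line). The function name deinterleave_fasta is hypothetical, and awk stands in for the tee and process substitution of the FASTQ version so that the sketch also runs in plain sh:

```bash
# Hypothetical helper: split interleaved two-line FASTA records read from
# stdin into a forward file ($1) and a reverse file ($2).
deinterleave_fasta() {
  # paste joins every 4 input lines into one tab-separated row:
  # fields 1-2 are the forward record, fields 3-4 the reverse record.
  paste - - - - \
    | awk -F'\t' -v fwd="$1" -v rev="$2" '{
        print $1 "\n" $2 > fwd
        print $3 "\n" $4 > rev
      }'
}
```

Usage would look like: deinterleave_fasta f.fasta r.fasta < interleaved.fasta. Wrapped (multi-line) sequences would need to be unwrapped first.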
The one-liner for listing or removing all files that do not match a pattern, from the etc section, requires that extended globbing be enabled (shopt -s extglob).
It may be a good idea to include this as a note, since not all distros have this enabled by default (mine didn't).
For more info, see here
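As a rough illustration of the point above: in an interactive bash session one would run shopt -s extglob once and then use patterns such as !(*.fastq). The sketch below wraps the same idea in a hypothetical helper, list_non_fastq, which starts bash with the option already on (bash -O extglob), because extglob has to be enabled before bash parses the line containing the pattern:

```bash
# Interactive equivalent (bash only):
#   shopt -s extglob      # enable extended globbing
#   shopt extglob         # report whether it is currently on or off
#   ls !(*.fastq)         # list everything that is not a .fastq file

# Hypothetical helper: list everything in directory $1 except *.fastq files.
list_non_fastq() {
  ( cd "$1" && bash -O extglob -c 'ls -d -- !(*.fastq)' )
}
```

If nothing in the directory fails to match the pattern, the glob is left unexpanded and ls reports an error, so this is only a sketch of the mechanism.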
Print each line where the 5th field is equal to ‘abc123’:
awk '$5 == "abc123"' file.txt
It doesn't work,
while the following command works, confirming that the data is present in the file:
Print each line where the 5th field is equal to ‘abc123’:
awk '/abc123/' file.txt
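The file itself is not shown, so the cause can only be guessed. Two common reasons an exact field match fails while a regex match succeeds are a delimiter other than whitespace (so $5 is not the field you expect) and Windows line endings (which leave a trailing carriage return on the last field). A small sketch under those assumptions, with field5_equals as a hypothetical helper name:

```bash
# Diagnostic: show how many fields awk sees and what it thinks $5 is.
#   awk '{ print NF, "[" $5 "]" }' file.txt | head

# Hypothetical helper: exact match on field 5 using an explicit field
# separator ($1), after stripping a trailing carriage return if present.
field5_equals() {
  awk -F"$1" -v want="$2" '{ sub(/\r$/, "") } $5 == want'
}

# Comma-separated input: the default whitespace splitting finds no 5th field.
printf 'a,b,c,d,abc123\n' | awk '$5 == "abc123"'       # prints nothing
printf 'a,b,c,d,abc123\n' | field5_equals ',' abc123   # prints the line
```

For tab-separated files the separator argument would be '\t'.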
sed 's/[ \t]*$//' file.txt
For a file like this:
cat -e test.txt
bla bla t $
it will remove the "t" character, whereas
sed "s/[[:space:]]*$//" test.txt | cat -e
works correctly.
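A likely explanation, stated as an assumption since the sed version is not given: some sed implementations (BSD sed, for example) do not treat \t inside a bracket expression as a tab, so [ \t] matches a space, a backslash, or a literal t. The POSIX character class avoids that. A minimal sketch, with strip_trailing_ws as a hypothetical helper name:

```bash
# Hypothetical helper: strip trailing whitespace from stdin using a POSIX
# character class, which does not rely on sed understanding "\t".
strip_trailing_ws() {
  sed 's/[[:space:]]*$//'
}

# The trailing space is removed, the final "t" is kept.
printf 'bla bla t \n' | strip_trailing_ws | cat -e   # prints: bla bla t$
```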
Would you consider adding a script with the PDF creation command (probably using PANDOC)?
It may be useful to other people who want to create small informational repos like yours.
Hey,
Thanks very much for putting these explanations and tools up. I think the one-liner you have put for converting bam to fastq is inappropriate (or should be described differently). The problem is that your awk prints fields 1, 10, and 11 of the bam.
Field 10 is called SEQ and holds the read (query) sequence. However, aligned sequences are always represented on the plus strand of the reference (http://chagall.med.cornell.edu/NGScourse/SAM.pdf, http://genome.sph.umich.edu/wiki/SAM), so reads that mapped to the reverse strand are stored reverse-complemented, meaning that for stranded bams this tool is inappropriate.
Thanks,
Luke
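One way to address the strand issue, sketched here under assumptions rather than offered as the repository's method: check bit 0x10 of the FLAG field (column 2) and, when it is set, reverse-complement SEQ and reverse QUAL before printing. The function name sam_to_fastq is hypothetical, it reads SAM text on stdin, and it does not separate paired-end mates.

```bash
# Hypothetical helper: convert SAM text on stdin to FASTQ, restoring the
# original orientation of reads aligned to the reverse strand (FLAG 0x10).
sam_to_fastq() {
  awk -F'\t' '
    function rev(s,    i, out) {
      out = ""
      for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
      return out
    }
    function revcomp(s,    i, out, c, idx) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        idx = index("ACGTNacgtn", c)
        # Complement known bases; pass anything else through unchanged.
        out = out (idx ? substr("TGCANtgcan", idx, 1) : c)
      }
      return out
    }
    /^@/ { next }                                  # skip header lines
    int($2 / 256) % 2 || int($2 / 2048) % 2 { next }  # skip secondary/supplementary
    {
      seq = $10; qual = $11
      if (int($2 / 16) % 2 == 1) { seq = revcomp(seq); qual = rev(qual) }
      print "@" $1; print seq; print "+"; print qual
    }'
}
```

Usage would look like: samtools view in.bam | sam_to_fastq > out.fastq. In practice, samtools fastq performs this conversion and handles read orientation itself, so it is probably the safer choice.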
Lots of good ones here http://alias.sh/
Hi,
maybe it would be interesting to add how to count the occurrences of each symbol in a fasta file:
sed '/^>/d;s/\(.\)/\1\n/g' file.fa | sort -f | uniq -ic
Best!
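A possible variant of the same idea, written as a sketch: the \n in a sed replacement is not understood by every sed, and the per-line substitution also produces empty lines that get counted. The version below avoids both by using fold, and upper-cases the sequence so that soft-masked bases are counted together (that case folding is an assumption about what is wanted). The function name count_fasta_symbols is hypothetical.

```bash
# Hypothetical helper: count occurrences of each symbol in FASTA read from
# stdin, ignoring header lines and treating lower and upper case alike.
count_fasta_symbols() {
  grep -v '^>' \
    | tr -d '\n\r' \
    | tr '[:lower:]' '[:upper:]' \
    | fold -w 1 \
    | sort \
    | uniq -c
}
```

Usage would look like: count_fasta_symbols < file.fa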