inab / trimal
A tool for automated alignment trimming in large-scale phylogenetic analyses. Development version: 2.0
Home Page: http://trimal.cgenomics.org
License: GNU General Public License v3.0
Different genetic codes: Universal, mammalian mt, yeast mt, mold mt, invertebrate mt, ciliate nuclear, echinoderm mt, euplotid mt, alternative yeast nuclear, ascidian mt.
When it comes to labeling stop codons, you need just a nuclear and mitochondrial code parameter.
FYI
https://github.com/tseemann/homebrew-bioinformatics-linux/blob/master/trimal.rb
This will eventually be migrated to full homebrew-science if it can compile on OS X cleanly.
I have detected unexpected behaviour when trimming an alignment with the consistency methods, which compute their scores from the level of agreement across a set of input alignments.
trimAl does not return the columns that are removed when the -ct parameter is applied. I have seen this when using -ct alone or together with other methods, e.g. -gt
@Vicfero could you please verify to what extent this affects the new version?
This is due to this line: compareFiles.cpp:56
Currently: numResiduesAlig = new int[numSeqs];
Should be: numResiduesAlig = new int[numAlignments];
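Why the buffer size matters can be sketched as follows. The buffer holds one residue counter per alignment, so sizing it by the number of sequences overflows whenever the set has more alignments than sequences. The helper below is an illustrative assumption, not the code in compareFiles.cpp; it uses std::vector and .at() so that a wrong index would be caught instead of silently corrupting memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: count non-gap residues in each alignment of a set.
// Each alignment is a list of aligned sequences (rows).
std::vector<int> countResiduesPerAlignment(
    const std::vector<std::vector<std::string>>& alignments) {
  // One counter per alignment, not per sequence.
  std::vector<int> numResiduesAlig(alignments.size(), 0);
  for (std::size_t a = 0; a < alignments.size(); ++a) {
    for (const std::string& seq : alignments[a]) {
      for (char c : seq) {
        if (c != '-') ++numResiduesAlig.at(a);  // .at() checks the index
      }
    }
  }
  assert(numResiduesAlig.size() == alignments.size());
  return numResiduesAlig;
}
```

With three alignments of two sequences each, a buffer of size numSeqs (2) would be one slot short; sized by the number of alignments it has exactly three slots.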
Hi,
I have tried to find an answer by searching google but couldn't find anything.
I aligned my data with MAFFT and now want to trim it with trimAl. The first two sequences worked, but then I received the following error message:
ERROR: The sequences in the input alignment should be aligned in order to use trimming method.
I will attach the file in question. CoaE.mafft.zip
The command used was:
trimal -in CoaE.mafft -out CoaE.triaml.fasta -fasta - automated1
Thank you for helping.
PS I really do hope this isn't a stupid question.
I'm not sure whether it is worthwhile to implement this very specific format in trimAl (or any associated program).
Provide an updated Windows-compatible version incorporating the latest trimAl features and fixed bugs.
Ideally we should have an (automated) mechanism for providing this specific compilation.
When using -compareset, -ct, and -gt together (perhaps this is not allowed?) I get an output alignment with some extra garbage to the right of the last legitimate column:
(sorry I don't know how to get fixed-width font here) .
For example:
tomfy@t410:~/trimAl_1.4/dataset$ trimal -compareset fileset1 -ct 0.1 -gt 0.1
6 46
Sp8 ---GKVIV-YGIVLGTKSDQFSVVWLFPWNGLQIHMMGII
Sp17 FAYTDLLL-IGFLLKTV-ATFGDTWFQLWQGLDLNKMPVF
Sp10 ----AVL--FVIMLGTI-TKFSSEWFFAWLGLEINMMVII
Sp26 AAAAALLTYLGLFLGTDYENFAAAAANAWLGLEINMMAQI
Sp33 ----TILNIAGLHMETD-INFSLAWFQAWGGLEINKQAIL
Sp6 ---AAILT-LGIYLFTLCAVISVSWYLAWLGLEINMMAIINKMPVF
tomfy@t410:~/trimAl_1.4/dataset$ trimal -compareset fileset1 -ct 0.2 -gt 0.5
6 38
Sp8 GIVLGTKSFSVVWLFPWNGLQIHMMGIIQAIL
Sp17 GFLLKTV-FGDTWFQLWQGLDLNKMPVFMAQI
Sp10 VIMLGTI-FSSEWFFAWLGLEINMMVIIMVII
Sp26 GLFLGTDYFAAAAANAWLGLEINMMAQIMPVF
Sp33 GLHMETD-FSLAWFQAWGGLEINKQAILMGII
Sp6 GIYLFTLCISVSWYLAWLGLEINMMAII
The documentation needs to be extended considerably to cover the latest improvements.
It also lacks clarity about how to use specific functions.
Ideally, to put trimal in Brew / Linuxbrew we need a tagged release.
Would it be possible to make one using the "Releases" tab?
Even if it is 1.4b that's fine - we just need a .tar.gz to download.
Dear,
When I trim the CLUSTAL file produced by MUSCLE (3.8) with trimAl (v1.4.rev22), all the symbols become the first one. Could you give me some suggestions?
Command:
trimal -in region_muscle.clw -out region_muscle_trimal_auto -automated1
region_muscle.txt
says this:
ERROR: Parameter "-phylip-m10" not valid.
Improve the "-colnumbering" parameter by providing more information about which columns of the old alignment correspond to those of the new one.
Check whether it is already fixed to consider lower and upper case letter the same symbol.
Lots of commits since 1.4.1 ?
Running statal -in file.aln runs for a while and prints nothing.
I assume that one of the -sc* options is needed to get any output.
Can you flag an error if no output option is provided?
trimAl ends with a Segmentation fault when using -compareset and any of the alignments specified can't be loaded.
WARNING: Cutting sequence "Phy006C668_LYNLY" at first appearance of stop codon "TAA" (residue "") at position 1045 (length: 1047) <<<
This warning is not necessary, since the end of the protein sequence has been reached.
If the input alignment (DNA/RNA) type is classified as DNADeg or RNADeg, both programs fail to output correctly:
NEXUS OUTPUT:
#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=12 NCHAR=637;
;
...
NEXUS DESIRED OUTPUT
#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=12 NCHAR=637;
FORMAT DATATYPE=DNA INTERLEAVE=yes GAP=-;
MEGA OUTPUT:
#MEGA
!Title dataset/huge/example4.sequential.phy.nexus;
NSeqs=12 Nsites=637 indel=- CodeTable=Standard;
MEGA DESIRED OUTPUT
#MEGA
!Title dataset/huge/example4.sequential.phy.nexus;
!Format DataType=DNA NSeqs=12 Nsites=637 indel=- CodeTable=Standard;
NBRF OUTPUT
>;VRA17
VRA17 637 bases
NBRF DESIRED OUTPUT
>DL;VRA17
VRA17 637 bases
The issue seems to be located in these lines:
/* Compute output file datatype */
getTypeAlignment();
if (dataType == DNAType)
  alg_datatype = "DL";
else if (dataType == RNAType)
  alg_datatype = "RL";
else if (dataType == AAType)
  alg_datatype = "P1";
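A possible shape for the fix, as a sketch only: let the degenerate types fall back to the plain nucleotide codes so that the type string is never left empty. The enum below is a hypothetical stand-in (the names DNADeg and RNADeg come from the report above; the real code may well use integer constants), and `nbrfTypeCode` is not a trimAl function.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum; the real code may use integer constants instead.
enum SeqType { DNAType, RNAType, AAType, DNADeg, RNADeg };

// Map an alignment type to its NBRF/PIR sequence-type code.
// Degenerate DNA/RNA fall back to the plain nucleotide codes.
std::string nbrfTypeCode(SeqType dataType) {
  switch (dataType) {
    case DNAType:
    case DNADeg:
      return "DL";
    case RNAType:
    case RNADeg:
      return "RL";
    case AAType:
      return "P1";
  }
  assert(false && "unknown sequence type");
  return "";
}
```

The same grouping would apply to the DATATYPE field of the NEXUS and MEGA headers shown above.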
To simplify the trimAl command line, the symbols '{' and '}' should be removed.
Create an option that allows degenerate codes in the alignments.
should be
matrix[i][j] = matrix[j][i];
Dear developer,
I ran into an error while using trimAl. When I ran the command "trimal -compareset Api0000040.cmp", I got the error: "Alignment not loaded: "" Check the file's content.". I also tried using the absolute path of the alignment, but I still got that error. Could you please tell me how to resolve this? I am using trimAl 1.2 on Windows (downloaded from http://trimal.cgenomics.org/downloads).
By the way, could you please compile a new version of trimAl for Windows? I can currently only use version 1.2 on Windows.
Best wishes,
Dong Zhang
Both programs output an empty file when given a non-aligned file / alignment (e.g. dataset/example.007.AA.only_seqs) and asked to save it in an alignment format (clustal, phylip, etc.).
The program warns about this problem, but the check is only done in the 'alignmentXToFile' functions, after the file has already been created.
The problem seems to be in the function "alignment::saveAlignment(char *destFile)" in "alignment.cpp", lines 607-673:
bool alignment::saveAlignment(char *destFile) {
  ofstream file;
  if(sequences == NULL)
    return false;
  if((residNumber == 0) || (sequenNumber == 0)) {
    cerr << endl << "WARNING: Output alignment has not been generated. "
         << "It is empty." << endl << endl;
    return true;
  }
  /* File open and correct open check */
  file.open(destFile);
  if(!file) return false;
  /* Depending on the output format, we call to the appropriate function */
  switch(oformat) {
    case 1:
      alignmentClustalToFile(file);
      break;
    case 3:
      alignmentNBRF_PirToFile(file);
      break;
    case 8:
      alignmentFastaToFile(file);
      break;
    case 11:
      alignmentPhylip3_2ToFile(file);
      break;
    case 12:
      alignmentPhylipToFile(file);
      break;
    case 13:
      alignmentPhylip_PamlToFile(file);
      break;
    case 17:
      alignmentNexusToFile(file);
      break;
    case 21: case 22:
      alignmentMegaToFile(file);
      break;
    case 99:
      getSequences(file);
      break;
    case 100:
      alignmentColourHTML(file);
      break;
    default:
      return false;
  }
  /* Close the output file */
  file.close();
  /* All is OK, return true */
  return true;
}
The check that reports this problem:
ERROR: Sequences are not aligned. Format (X) not compatible with unaligned sequences.
is performed after the ofstream has been opened, which leaves an empty file behind.
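One way to avoid the empty file, sketched under assumptions: run the alignment check before the output stream is opened. `isAligned` and `saveAligned` are hypothetical stand-ins rather than trimAl functions, and sequences are plain strings here instead of the members of the alignment class.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// True when there is at least one sequence and all have the same length.
bool isAligned(const std::vector<std::string>& seqs) {
  for (const std::string& s : seqs)
    if (s.size() != seqs.front().size()) return false;
  return !seqs.empty();
}

// Hypothetical stand-in for saveAlignment(): validate first, open second.
// Returns false without creating a file when the data cannot be written.
bool saveAligned(const std::vector<std::string>& seqs,
                 const std::string& destFile) {
  assert(!destFile.empty());
  if (!isAligned(seqs)) return false;  // no file has been created yet
  std::ofstream file(destFile.c_str());
  if (!file) return false;
  for (const std::string& s : seqs) file << s << '\n';
  return true;
}
```

For formats that do accept unaligned sequences (such as FASTA), the check would have to depend on the selected output format; that branch is left out of the sketch.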
The option to remove sequences using -seqoverlap and -resoverlap produces results I find unexpected. From what I can tell, when -resoverlap checks whether a residue is the same in the other sequences, it does not consider the base identity (for a DNA alignment), only whether there is a gap character or any DNA base. Therefore, for a gap-free alignment, if I change all bases in a sequence to e.g. "T", the sequence will not be removed from the alignment, even with strict settings (e.g. -resoverlap 0.9 -seqoverlap 95).
Is this how it is supposed to work and if so I'm curious why it works like this and not as I might have expected it to? Thanks
Dear developers,
Would it be better if the BLOSUM matrix in this package (matrix.BLOSUM62) contained one more state for gaps (i.e. '-')? When I tried to remove fully conserved columns, I expected the result to exclude only monotonic columns, such as:
A
A
A
A
A
However, a column like
A
-
A
A
A
also had a conservation score of 1.0, as computed by the -scc function, although it was not fully conserved. It does not seem straightforward to detect only the first case with -gt and -st. Is there any suggestion for people trying to remove monotonic columns (i.e. columns containing a single residue type and no gaps)?
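As a workaround outside trimAl, monotonic columns can be detected directly from the alignment. The sketch below is an illustration under stated assumptions: `monotonicColumns` is a hypothetical helper, the alignment is held as equal-length strings, '-' is the only gap symbol, and comparison is case-sensitive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the indexes of columns that contain exactly one residue type
// and no gap characters ('-').
std::vector<std::size_t> monotonicColumns(const std::vector<std::string>& aln) {
  std::vector<std::size_t> cols;
  if (aln.empty()) return cols;
  const std::size_t width = aln.front().size();
  for (std::size_t j = 0; j < width; ++j) {
    const char first = aln.front()[j];
    bool monotonic = (first != '-');
    for (std::size_t i = 1; monotonic && i < aln.size(); ++i) {
      assert(aln[i].size() == width);  // input must be aligned
      monotonic = (aln[i][j] == first);
    }
    if (monotonic) cols.push_back(j);
  }
  return cols;
}
```

For the two columns shown above (all A versus A with one gap), only the first is reported.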
Dear Salvador Capella-Gutiérrez and Toni Gabaldón,
I was testing trimAl with the -compareset option (using a few nucleotide alignments with the same sequences, in the same order, across alignments), but I got a few errors (please see the attachment). Am I missing something?
trimAl was compiled from the latest version (v1.4.rev22 build[2015-05-21]) available at https://github.com/scapella/trimal.
Are these errors related to alignment differences? If not, is it possible that a future version can fix these?
Thanks for your attention.
Best regards,
Emanuel Maldonado.
CIIMAR, University of Porto.
Hello!
On some alignment files I get multiple errors along the lines of:
'Error: the symbol 'R' accesing the matrix is not defined in this object' (the symbol varies).
The thing is that my sequences are DNA but, as far as I can see in the code, the default similarity matrix is for proteins. I have tested about 10 MSAs (all from very similar sources); some work and some produce the error. On some files statal -ssc gives those errors, but trimal works as expected.
Do you have any hints on how this could be resolved?
Thanks!
Is there an output option where the alignment length is not added to taxon names? Thanks.
Hi there,
I have been using trimal v.1.2 on a local computer from the command prompt. When I tried to run a command to trim my alignment, the message "trimal.exe has stopped working" popped up and the program stopped. Can you please tell me what I have done wrong?
The command line was trimal -in -out -automated1
My guess is that the alignment file was too big. I tried to find out whether trimal has any size limitation, but couldn't find any information on the main website. Could you please clarify this?
Thank you in advance
Are there any plans to support multi-threading?
I understand it is not trivial to implement, but OpenMP pragmas could make it easy to parallelize parts of the code that loop over columns because they are independent operations?
I work on DNA alignments with 5,000,000 columns and 100s of rows, and most operations are surprisingly slow.
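To illustrate the idea, here is a minimal sketch of a column-wise statistic parallelized with an OpenMP pragma. It is not trimAl code: `gapFractionPerColumn` is a hypothetical function operating on plain strings. Each iteration writes only its own output slot, so no locking is needed. Without -fopenmp the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Fraction of gap characters in each column. Columns are independent,
// so the outer loop can be split across threads.
std::vector<double> gapFractionPerColumn(const std::vector<std::string>& aln) {
  if (aln.empty()) return {};
  const long width = static_cast<long>(aln.front().size());
  for (const std::string& seq : aln)
    assert(static_cast<long>(seq.size()) == width);  // input must be aligned
  std::vector<double> frac(width, 0.0);
  #pragma omp parallel for
  for (long j = 0; j < width; ++j) {
    long gaps = 0;
    for (const std::string& seq : aln)
      if (seq[j] == '-') ++gaps;
    frac[j] = static_cast<double>(gaps) / aln.size();  // one slot per column
  }
  return frac;
}
```

With row-major storage every thread strides across all sequences, so for alignments with millions of columns the memory layout may matter as much as the thread count.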
When the alignment contains "n" and not "N", trimal gives an error:
Error: the symbol 'n' accesing the matrix is not defined in this object
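A common remedy for this kind of error is to upper-case the symbol before looking it up. The sketch below assumes a simple alphabet string in place of trimAl's similarity matrix; `symbolIndex` and the default alphabet "ACGTUN" are illustrative assumptions, not the project's API.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical lookup: index of a nucleotide symbol in a matrix alphabet.
// The symbol is upper-cased first so that 'n' and 'N' are the same entry.
// Returns -1 when the symbol is not part of the alphabet.
int symbolIndex(char symbol, const std::string& alphabet = "ACGTUN") {
  assert(!alphabet.empty());
  const char upper =
      static_cast<char>(std::toupper(static_cast<unsigned char>(symbol)));
  const std::string::size_type pos = alphabet.find(upper);
  return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

The cast to unsigned char before std::toupper avoids undefined behaviour for negative char values.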
Is there a way for statal to give me the basic information of
AGTC
AGTCN-
Perhaps this could be the default when no -sg* option is provided?
A feature request: it would be very convenient to have an option to pipe into trimal:
my_pipe | trimal -in /dev/stdin
or
my_pipe | trimal -in -
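A sketch of how such an option could be handled, assuming the loader can work from an in-memory copy of the input: treat "-" (or "/dev/stdin") as standard input and read everything into a string, so that nothing downstream needs to seek or reopen a file. The function names are hypothetical and do not exist in trimAl.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <istream>
#include <sstream>
#include <string>

// Treat "-" or "/dev/stdin" as a request to read from standard input.
bool wantsStdin(const std::string& inArg) {
  return inArg == "-" || inArg == "/dev/stdin";
}

// Read the whole input into memory from any stream.
std::string readAll(std::istream& in) {
  std::ostringstream buffer;
  buffer << in.rdbuf();
  return buffer.str();
}

// Hypothetical loader: pick std::cin or a file based on the -in argument.
bool loadInput(const std::string& inArg, std::string& content) {
  assert(!inArg.empty());
  if (wantsStdin(inArg)) {
    content = readAll(std::cin);
    return true;
  }
  std::ifstream file(inArg.c_str());
  if (!file) return false;
  content = readAll(file);
  return true;
}
```

Buffering the whole input trades memory for simplicity; a format parser that needs several passes can then run over the string instead of the pipe.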
ERROR: Parameter "-selectcols" not valid.
I found a PDF manual somewhere in which "select" is used instead, but I still get the same error.
I deal a lot with core genome SNP alignments (DNA) across 100s of bacterial samples. A useful report would be like this:
ID #A #G #T #C #N #-
aln1 12 31 11 31 0 8
aln2 11 44 12 32 2 5
aln3 10 33 12 32 10 2
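Such a report is straightforward to compute per sequence. The following is a minimal sketch, not an existing statal option: `baseCounts` and `reportLine` are hypothetical names, counting is case-insensitive, and symbols outside AGTCN- are skipped.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>

// Counts in the order A, G, T, C, N, '-'.
std::array<long, 6> baseCounts(const std::string& seq) {
  static const std::string symbols = "AGTCN-";
  std::array<long, 6> counts{};
  assert(symbols.size() == counts.size());
  for (char c : seq) {
    const char upper =
        static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    const std::string::size_type pos = symbols.find(upper);
    if (pos != std::string::npos) ++counts[pos];  // other symbols are skipped
  }
  return counts;
}

// One tab-separated report line: ID followed by the six counts.
std::string reportLine(const std::string& id, const std::string& seq) {
  std::string line = id;
  for (long n : baseCounts(seq)) line += '\t' + std::to_string(n);
  return line;
}
```

Calling reportLine once per sequence, after a header line with the column names, would give a table in the layout shown above.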
Hi
I use trimAl v1.4.rev9 build[2012-08-09] and have problems with some sequence names.
Some names look like 'TCOGS2:TC012457-PA'.
(They are "official" names from the Tribolium castaneum genome)
If I run trimAl on the dataset containing this sequence
trimal -in input.fas -noallgaps -keepseqs
'TCOGS2:TC012457-PA' is truncated to 'TCOGS2'
Could you prevent trimAl from truncating sequence names that contain ':'?
Thanks
Regards
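For comparison, a header parser that only splits on whitespace would keep such names intact. This is a sketch of that behaviour, not a description of how trimAl parses headers; `fastaName` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Extract the sequence name from a FASTA header line: everything after '>'
// up to the first space or tab. Characters such as ':' are kept.
std::string fastaName(const std::string& header) {
  assert(!header.empty() && header[0] == '>');
  const std::string::size_type end = header.find_first_of(" \t", 1);
  return header.substr(
      1, end == std::string::npos ? std::string::npos : end - 1);
}
```

Some output formats restrict the characters allowed in names, which may be one reason a tool cuts at ':'; in that case replacing the character on output would be an alternative to truncating.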
My alignment contains * characters representing stop codons. trimAl seems to delete them, so it raises the error that the file contains unaligned sequences.
Hi,
I tried to compile both v1.2 & v1.4. Below are the error messages.
########
g++ -Wall -c compareFiles.cpp
g++: warning: couldn’t understand kern.osversion ‘17.0.0
In file included from /usr/include/Availability.h:194:0,
from /usr/include/stdlib.h:61,
from compareFiles.h:30,
from compareFiles.cpp:27:
/usr/include/AvailabilityInternal.h:25584:74: error: missing binary operator before token "("
#if defined(__has_feature) && defined(__has_attribute) && __has_attribute(availability)
^
In file included from /usr/include/stdlib.h:61:0,
from compareFiles.h:30,
from compareFiles.cpp:27:
/usr/include/Availability.h:387:74: error: missing binary operator before token "("
#if defined(__has_feature) && defined(__has_attribute) && __has_attribute(availability)
^
make: *** [compareFiles.o] Error 1
Hi,
During compilation, I had to stop because the make command was using all my system memory (12 GB). Is that normal? Below is where I had to stop the compilation process.
I'm running Mac OS X Lion.
Thank you,
Bernardo
dhcp-172-17-27-227:source bernardo$ make
g++ -Wall -O2 -c alignment.cpp rwAlignment.cpp autAlignment.cpp
g++ -Wall -O2 -c statisticsGaps.cpp
g++ -Wall -O2 -c utils.cpp
g++ -Wall -O2 -c similarityMatrix.cpp
g++ -Wall -O2 -c statisticsConservation.cpp
g++ -Wall -O2 -c sequencesMatrix.cpp
g++ -Wall -O2 -c compareFiles.cpp
g++ -Wall -O2 -o readal readAl.cpp -lm alignment.o statisticsGaps.o utils.o similarityMatrix.o statisticsConservation.o sequencesMatrix.o compareFiles.o
g++ -Wall -O2 -o trimal main.cpp -lm alignment.o statisticsGaps.o utils.o similarityMatrix.o statisticsConservation.o sequencesMatrix.o compareFiles.o
^Cmake: *** [trimal] Interrupt: 2
Hi,
It would be nice to have a flag to remove ambiguous columns (for DNA, for example, those containing symbols other than ACTG), something similar to -gt,
with which we can fine-tune how much ambiguity is allowed!
Thanks,
Mohammad
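The requested filter could look roughly like this. It is a sketch of a hypothetical option, not an existing trimAl flag: a symbol counts as ambiguous when it is not A, C, G, T, U or a gap, and a column is kept when its fraction of ambiguous symbols does not exceed a threshold.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True for any symbol other than A, C, G, T, U or a gap.
bool isAmbiguous(char c) {
  const char u = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return std::string("ACGTU-").find(u) == std::string::npos;
}

// Indexes of columns whose fraction of ambiguous symbols is <= maxAmbiguous.
std::vector<std::size_t> columnsBelowAmbiguity(
    const std::vector<std::string>& aln, double maxAmbiguous) {
  assert(maxAmbiguous >= 0.0 && maxAmbiguous <= 1.0);
  std::vector<std::size_t> kept;
  if (aln.empty()) return kept;
  for (std::size_t j = 0; j < aln.front().size(); ++j) {
    std::size_t ambiguous = 0;
    for (const std::string& seq : aln)
      if (isAmbiguous(seq[j])) ++ambiguous;
    if (static_cast<double>(ambiguous) / aln.size() <= maxAmbiguous)
      kept.push_back(j);
  }
  return kept;
}
```

A threshold of 0.0 keeps only columns without any ambiguity code; gaps are deliberately not counted here, since -gt already covers them.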
Hello, I have a codon alignment file and I wondered whether trimAl could have an option to select only blocks made up of complete codons.
Thank you for your answer.
Best regards
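One way to get this effect as a post-processing step, sketched under assumptions: given the column indexes that survived trimming, keep a column only if the other two columns of its codon survived as well. `completeCodons` is a hypothetical helper, and the reading frame is assumed to start at column 0.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Keep only columns whose whole codon (columns 3k, 3k+1, 3k+2) survived.
// `kept` holds distinct 0-based column indexes in increasing order.
std::vector<std::size_t> completeCodons(const std::vector<std::size_t>& kept) {
  const std::set<std::size_t> present(kept.begin(), kept.end());
  std::vector<std::size_t> out;
  for (std::size_t col : kept) {
    const std::size_t start = col - col % 3;  // first column of this codon
    if (present.count(start) && present.count(start + 1) &&
        present.count(start + 2))
      out.push_back(col);
  }
  assert(out.size() % 3 == 0);  // only whole codons remain
  return out;
}
```

For kept columns 0 1 2 3 5 6 7 8 10 this returns 0 1 2 6 7 8: the codons starting at columns 3 and 9 are dropped because they are incomplete.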
Hi there,
I just downloaded trimal v.1.2 for Mac and tried to install it manually by following the instructions in the readme file, but it doesn't work. Also, I couldn't find a bin directory in the downloaded folder. Could you please help me with this issue?
Thanks
S
Dear Salvador,
I get a segmentation fault in trimal when running the following command:
trimal -in COG0185.0.faa -out COG0185.0.fna -backtrans inMSA0.fna -ignorestopcodon -gt 0.1 -cons 60
Without the -backtrans option, the program runs fine. I was hoping you could help me.
All the best,
Falk Hildebrand
Answer from Salvador:
Dear Falk,
Thanks for using trimAl and contacting me regarding this unexpected behaviour.
I played a bit with your input file and realized that you have some repeated IDs for the nucleotide files ...
2 >394503_COG0185
2 >411474_COG0185
2 >445973_COG0185
2 >699246_COG0185
2 >718252_COG0185
... and for the protein files:
2 >394503_COG0185
I just kept the first appearance of such sequences in the attached files, and everything worked as expected.
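Repeated identifiers like these can be detected before back-translation is attempted. The sketch below is an illustration only; `repeatedIds` is a hypothetical helper that takes the list of sequence IDs from one file and reports those that occur more than once.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Return every ID that occurs more than once, with its count.
std::map<std::string, int> repeatedIds(const std::vector<std::string>& ids) {
  std::map<std::string, int> counts;
  for (const std::string& id : ids) ++counts[id];
  std::map<std::string, int> repeated;
  for (const auto& entry : counts)
    if (entry.second > 1) repeated.insert(entry);
  assert(repeated.size() <= counts.size());
  return repeated;
}
```

Reporting a non-empty result as an error would turn the crash into a readable message for the user.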
When only input and output are given to trimal (new users may expect a default trimming mode), trimal simply outputs the input.
It seems to me that it should instead complain that no option has been given.
This may save time for new users of trimal.
Hi there,
I just ran into an error that I believe is caused by the BLOSUM45 matrix (it seems that the wildcard character is needed but not accepted by trimal). Both commands below produce a segmentation fault:
trimal -in PTHR26451_2456.ali -out PTHR26451_2456.phy -gt 0.9 -cons 60 -st 0.3 -matrix BLOSUM45b
trimal -in PTHR26451_2456.ali -out PTHR26451_2456.phy -gt 0.9 -cons 60
Please find attached the files that cause the error for your consideration.
Many thanks,
DE
When providing stats, for instance similarity values or gap values, it would be good to have the average value for the whole alignment.