
iqtree / iqtree2


NEW location of IQ-TREE software for efficient phylogenomic software by maximum likelihood http://www.iqtree.org

License: GNU General Public License v2.0

Python 0.11% CMake 0.59% C++ 68.58% C 27.36% Shell 0.07% Makefile 0.19% SAS 0.01% CLIPS 0.04% Pascal 0.50% Ada 0.64% Assembly 1.01% C# 0.40% Batchfile 0.01% M4 0.01% DIGITAL Command Language 0.20% HTML 0.21% Module Management System 0.01% Roff 0.03% Perl 0.03% Nix 0.01%

iqtree2's Issues

iq-tree crashes with signal aborted

Hello.

I am running iqtree2 with -fconst on a set of ncov19 genomes processed through the ivar pipeline. I have iqtree2 set to run through a shell script with a bootstrapping size of 1000. When executing through the shell script, I receive the following error message:

iqtree2_error_signal_aborted

The error does not occur when taking the exact commands out of the shell script. Is there a way to specify these commands better in the sh file to prevent a signal abort?

Constant site positions are determined through snp-sites with the following commands in the script:

coor=$(snp-sites -C 618_aln.fa)
iqtree2 -mset HKY,TIM2,GTR -mfreq F -mrate G,R -alrt 1000 -bb 1000 -fconst $coor -pre 618 -o MN908947 -s 618_aln_VarsOnly.fasta

build failure with -DIQTREE_FLAGS=single

When trying to compile iqtree2 (from either master or the v1.2.3 tag) without OpenMP, I get the following failure:

tree/phylotree.cpp:3366:25: error: use of undeclared identifier 'omp_get_thread_num'
        int threadNum = omp_get_thread_num();

which makes sense as it should likely be in an #ifdef _OPENMP statement; perhaps like this?

#ifdef _OPENMP
    int threads = omp_get_max_threads();
#else
    int threads = 1;  // no OpenMP: run single-threaded
#endif
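
The same guard can be wrapped in small helpers so that call sites never touch the OpenMP API directly. This is only a minimal sketch: `currentThreadNum` and `maxThreads` are hypothetical names, not functions from the IQ-TREE sources.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helpers (not part of IQ-TREE): they compile with or without OpenMP.
inline int currentThreadNum() {
#ifdef _OPENMP
    return omp_get_thread_num();   // index of the calling thread within its team
#else
    return 0;                      // single-threaded build: only thread 0 exists
#endif
}

inline int maxThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();  // upper bound on threads for a parallel region
#else
    return 1;
#endif
}
```

With helpers like these, a line such as the failing one in tree/phylotree.cpp could call `currentThreadNum()` instead of `omp_get_thread_num()`.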

Core dump memory error!

Hello,

I previously had a memory error, so I tried the -t PARS option as recommended in other posts, but this has also resulted in core dump.

NOTE: 42238 MB RAM (41 GB) is required!
WARNING: Number of threads seems too high for short alignments. Use -T AUTO to determine best number of threads.
CHECKPOINT: Model parameters restored, LogL: -7582723.2167717889
Best tree printed to Angiosperm_cat.fftns.gt0.1.gt50.phy.treefile
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Computing ML distances based on estimated model parameters...ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: ***    Log file: Angiosperm_cat.fftns.gt0.1.gt50.phy.log
ERROR: ***    Alignment files (if possible)
Aborted (core dumped)

I also tried to limit the memory usage by using the -mem 7G option, which resulted in core dump again.

Best tree printed to Angiosperm_cat.fftns.gt0.1.gt50.phy.treefile
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Computing ML distances based on estimated model parameters...ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: ***    Log file: Angiosperm_cat.fftns.gt0.1.gt50.phy.log
ERROR: ***    Alignment files (if possible)
Aborted (core dumped)

My alignment contains 64,096 sequences that are 444 aa long and the computer has 160 cores with 1.11 TB of memory. Are there any other options I could use to overcome this memory error?

Thanks in advance,

Jenny

Limit fast initial parsimony tree by random order stepwise addition

Is it possible to limit the number of threads that the computer/cluster uses to build the initial parsimony tree at all?

When I send a job with e.g. -T AUTO, IQTREE2 will utilise all available threads, which takes an infinitely long amount of time for the first stage. What I need it to do is analyse the best number of threads for downstream analysis, and then use the same number of threads for all stages.

Does that make sense?

Best wishes,
Steve

compile error with “cmake -DIQTREE_FLAGS=KNL ../”

I followed the instructions to create and open a "build" folder in the root directory of "iqtree2".

Because my server is equipped with an Intel Xeon 8260 CPU, I want to improve performance by using the Intel C++ compiler and adding the "KNL" flag when running "cmake":
cmake -DIQTREE_FLAGS=KNL ../
So far, no problems.

Then I ran the "make" command to build, and it produced an error:
/root/iqtree2/tree/phylokernelavx512.cpp(100): error: no instance of function template "PhyloTree::computeNonrevLikelihoodBranchGenericSIMD" matches the required type
computeLikelihoodBranchPointer = &PhyloTree::computeNonrevLikelihoodBranchGenericSIMD ;
^

/root/iqtree2/tree/phylokernelavx512.cpp(101): error: no instance of function template "PhyloTree::computeNonrevLikelihoodDervGenericSIMD" matches the required type
computeLikelihoodDervPointer = &PhyloTree::computeNonrevLikelihoodDervGenericSIMD ;
^

/root/iqtree2/tree/phylokernelavx512.cpp(102): error: no instance of function template "PhyloTree::computeNonrevPartialLikelihoodGenericSIMD" matches the required type
computePartialLikelihoodPointer = &PhyloTree::computeNonrevPartialLikelihoodGenericSIMD;

Following the error message, I found the 3 lines (100-102) of code in "phylokernelavx512.cpp":
100: computeLikelihoodBranchPointer = &PhyloTree::computeNonrevLikelihoodBranchGenericSIMD ;
101: computeLikelihoodDervPointer = &PhyloTree::computeNonrevLikelihoodDervGenericSIMD ;
102: computePartialLikelihoodPointer = &PhyloTree::computeNonrevPartialLikelihoodGenericSIMD;
then, my c++ editor (VS Code) shows: error: no instance of function template "..." matches the required type "..."

I moved to lines 112-114:
112: computeLikelihoodBranchPointer = &PhyloTree::computeLikelihoodBranchSIMD <Vec8d, SAFE_LH, 4, true>;
113: computeLikelihoodDervPointer = &PhyloTree::computeLikelihoodDervSIMD <Vec8d, SAFE_LH, 4, true>;
114: computeLikelihoodDervMixlenPointer = &PhyloTree::computeLikelihoodDervMixlenSIMD<Vec8d, SAFE_LH, 4, true>;

I found that lines 100-102 and lines 112-114 are almost the same, except at the end:
100: ... ;
101: ... ;
102: ... ;

112: ... <Vec8d, SAFE_LH, 4, true>;
113: ... <Vec8d, SAFE_LH, 4, true>;
114: ... <Vec8d, SAFE_LH, 4, true>;

So I thought maybe I could fix it by adding "<Vec8d, SAFE_LH, 4, true>" at the end of lines 100-102:
100: computeLikelihoodBranchPointer = &PhyloTree::computeNonrevLikelihoodBranchGenericSIMD <Vec8d, SAFE_LH, 4, true>;
101: computeLikelihoodDervPointer = &PhyloTree::computeNonrevLikelihoodDervGenericSIMD <Vec8d, SAFE_LH, 4, true>;
102: computePartialLikelihoodPointer = &PhyloTree::computeNonrevPartialLikelihoodGenericSIMD<Vec8d, SAFE_LH, 4, true>;

After that, I ran “cmake -DIQTREE_FLAGS=KNL ../” and “make” again. No errors were shown, and I successfully got the “iqtree2” program.

Then I ran “iqtree2” to analyze my data, and no errors were shown.

I think it might help!
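
For readers unfamiliar with this diagnostic, the general C++ rule can be shown with a small sketch. The names below (`Tree`, `kernel`, `KernelPtr`) are hypothetical and do not reproduce IQ-TREE's kernels. When the address of a function template is assigned to a pointer, the compiler can only deduce template arguments that appear in the pointer's type, so any others have to be written explicitly.

```cpp
struct Tree {
    // Pointer-to-member type that the kernel must match.
    typedef double (Tree::*KernelPtr)(double);

    // Member function template: none of its template parameters appear in
    // the signature double(double), so they cannot be deduced from KernelPtr.
    template <class VectorT, bool Safe, int NStates>
    double kernel(double x) { return Safe ? x * NStates : x; }

    KernelPtr kernelPointer = nullptr;

    void selectKernel() {
        // kernelPointer = &Tree::kernel;               // expected to fail: no instance matches
        kernelPointer = &Tree::kernel<double, true, 4>; // explicit arguments select one instance
    }
};
```

Whether `<Vec8d, SAFE_LH, 4, true>` is the right set of arguments for the non-reversible kernels is a separate question that this sketch does not answer.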

Taxon name error

Dear Bui Quang Minh:
I am getting an error report from IQTREE2 (IQ-TREE multicore version 2.1.2 COVID-edition for Linux 64-bit built Oct 22 2020). Here is part of the detailed report.
--------->
Rate parameters: A-C: 2.09932 A-G: 3.94693 A-T: 1.47380 C-G: 1.38623 C-T: 5.05624 G-T: 1
Base frequencies: A: 0.280 C: 0.189 G: 0.208 T: 0.323
Site proportion and rates: (0.085,0.021) (0.048,0.194) (0.028,0.194) (0.042,0.199) (0.076,0.4
Parameters optimization took 99 rounds (1005.335 sec)
Computing ML distances based on estimated model parameters...
Computing ML distances took 0.072589 sec (of wall-clock time) 1.404860 sec(of CPU time)
Computing RapidNJ tree took 0.012983 sec (of wall-clock time) 0.263654 sec (of CPU time)
ERROR: Alignment sequence AY278489|SARS-CoV_GD01|Betacoronavirus does not appear in the tree
ERROR: Alignment sequence AY390556|SARS-CoV_GZ02|Guangzhou|Betacoronavirus does not appear in
ERROR: Alignment sequence AY485277|SARS-CoV_Sino1-11|Betacoronavirus does not appear in the tr
ERROR: Alignment sequence AY508724|SARS-CoV_NS-1|Betacoronavirus does not appear in the tree
ERROR: Alignment sequence KT444582|WIV16|Yunnan|Betacoronavirus does not appear in the tree
ERROR: Alignment sequence KY417146|Rs4231|Yunnan|Betacoronavirus does not appear in the tree
ERROR: Alignment sequence KY417151|Rs7327|Yunnan|Betacoronavirus does not appear in the tree
ERROR: Alignment sequence KY417152|Rs9401|Yunnan|Betacoronavirus does not appear in the tree
ERROR: Alignment sequence MK211376|BtRs-BetaCoV/YN2018B|Yunnan|Betacoronavirus does not appear
(KJ473815|BtRs-BetaCoV/GX2013|Guangxi|Betacoronavirus,JX993988|Yunnan2011|Betacoronavirus,((((
ERROR: Tree taxa and alignment sequence do not match (see above)
----------->

It is about a taxon name error. However, the same analysis runs correctly in IQTREE (Version 1.6.12). I am waiting for a reply from you. Thanks.
Sincerely
Zhenzhi Han

-fast flag

the -fast flag should also turn on the following options:

--suppress-zero-distance
--suppress-list-of-sequences
--suppress-duplicate-sequence
--no-opt-gamma-inv

Any others that generally speed things up?

Perhaps:

-optalg 1-BFGS

Error while compiling on MacOSX with -DUSE_LSD2=ON

Hi there,
First, thanks heaps for providing and maintaining IQTREE, it's one of the most useful and best working molecular evolution tools. It's been working flawlessly for me, until today that is, when I'm trying a more experimental feature.
I'm trying to use LSD2 on my Mac. Apparently that feature is not included in the compiled version. I tried to install from source, but got an error.
I have a MacBook Pro with MacOSX 11.4 (20F71). I had to install Boost 1.76.0 and eigen 3.3.9 through brew to get to cmake to work. I cloned iqtree2 (3267aec) with submodules to get lsd2:

git clone --recurse-submodules --remote-submodules https://github.com/iqtree/iqtree2.git
cd iqtree2
mkdir build
cd build
cmake -DIQTREE_FLAGS=omp -DUSE_LSD2=ON ..

That works (see log below). Now trying to make:
% make -j

I'm getting the following errors:

[ 23%] Building CXX object main/CMakeFiles/main.dir/timetree.cpp.o
/Users/liogu139/bin/iqtree2/main/timetree.cpp:275:28: error: no matching constructor
      for initialization of 'lsd::InputOutputStream'
  ...io(tree_stream.str(), outgroup_stream.str(), date_stream.str(), "", "");
     ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/liogu139/bin/iqtree2/lsd2/src/lsd.h:56:9: note: candidate constructor not
      viable: requires 6 arguments, but 5 were provided
        InputOutputStream(string tree, string outgroup, string date,string rate...
        ^
/Users/liogu139/bin/iqtree2/lsd2/src/lsd.h:15:11: note: candidate constructor (the
      implicit copy constructor) not viable: requires 1 argument, but 5 were provided
    class InputOutputStream {
          ^
/Users/liogu139/bin/iqtree2/lsd2/src/lsd.h:48:9: note: candidate constructor not
      viable: requires 0 arguments, but 5 were provided
        InputOutputStream();
        ^
1 error generated.

I've tried to make just lsd2 and that works and I can execute lsd2. My command of C is not good enough to go further, can you help me?

On a side note, I didn't have to install a newer version of Clang separately, the omp version compiled without problem.

Cheers,

Lionel

Log of CMake:

 CMake Deprecation Warning at CMakeLists.txt:58 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is AppleClang 11.0.3.11030032
-- The CXX compiler identification is AppleClang 11.0.3.11030032
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/local/lib/cmake/Boost-1.76.0/BoostConfig.cmake (found version "1.76.0")
IQ-TREE flags : omp
Build mode   : Release
Target OS     : Mac OS X
Compiler      : Clang
Compiler version: 11.0.3.11030032
Target binary : 64-bit
OpenMP        : Yes
MPI           : NONE
Vectorization : SSE3/AVX/AVX2
C flags       :  -pthread  -O3 -ffunction-sections -fdata-sections
CXX flags     :  -std=c++11 -stdlib=libc++ -Xpreprocessor -fopenmp -pthread  -O3 -ffunction-sections -fdata-sections
LINKER flags  :  --target=x86_64-apple-macos10.7 -lomp  -Wl,-dead_strip
-- Looking for gettimeofday
-- Looking for gettimeofday - found
-- Looking for getrusage
-- Looking for getrusage - found
-- Looking for GlobalMemoryStatusEx
-- Looking for GlobalMemoryStatusEx - not found
-- Looking for strndup
-- Looking for strndup - found
-- Looking for strtok_r
-- Looking for strtok_r - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include
-- Found ZLIB: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/lib/libz.tbd (found version "1.2.11")
Using system zlib
-- Performing Test FLAG_WEXTRA
-- Performing Test FLAG_WEXTRA - Success
-- clang-tidy not found.
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/liogu139/bin/iqtree2/build

clean up modelfinder output

Right now the modelfinder output is too verbose. See the example of what it looks like below. In a terminal window this spreads over multiple lines, and makes the output useless.

I suggest we switch to something more like the partitionfinder output, which mostly looks like this:

INFO     | 2020-12-02 13:39:23,160 |       Finished subset 6077/14028, 43.32 percent done
INFO     | 2020-12-02 13:39:23,273 |       Finished subset 6078/14028, 43.33 percent done
INFO     | 2020-12-02 13:39:23,468 |       Finished subset 6079/14028, 43.33 percent done
INFO     | 2020-12-02 13:39:23,898 |       Finished subset 6080/14028, 43.34 percent done
INFO     | 2020-12-02 13:39:23,997 |       Finished subset 6081/14028, 43.35 percent done
INFO     | 2020-12-02 13:39:24,124 |       Finished subset 6082/14028, 43.36 percent done
INFO     | 2020-12-02 13:39:24,849 |       Finished subset 6083/14028, 43.36 percent done
INFO     | 2020-12-02 13:39:25,058 |       Finished subset 6084/14028, 43.37 percent done
INFO     | 2020-12-02 13:39:25,384 |       Finished subset 6085/14028, 43.38 percent done
INFO     | 2020-12-02 13:39:25,940 |       Finished subset 6086/14028, 43.38 percent done
INFO     | 2020-12-02 13:39:26,051 |       Finished subset 6087/14028, 43.39 percent done
INFO     | 2020-12-02 13:39:26,385 |       Finished subset 6088/14028, 43.40 percent done
INFO     | 2020-12-02 13:39:26,718 |       Finished subset 6089/14028, 43.41 percent done
INFO     | 2020-12-02 13:39:26,829 |       Finished subset 6090/14028, 43.41 percent done
INFO     | 2020-12-02 13:39:27,156 |       Finished subset 6091/14028, 43.42 percent done
INFO     | 2020-12-02 13:39:27,666 |       Finished subset 6092/14028, 43.43 percent done
INFO     | 2020-12-02 13:39:27,887 |       Finished subset 6093/14028, 43.43 percent done
INFO     | 2020-12-02 13:39:28,274 |       Finished subset 6094/14028, 43.44 percent done
INFO     | 2020-12-02 13:39:28,475 |       Finished subset 6095/14028, 43.45 percent done
INFO     | 2020-12-02 13:39:28,898 |       Finished subset 6096/14028, 43.46 percent done

and then only reports the merged subset that was chosen as the best at each step.

Basically - we don't need to bombard the user with all the names of every merge we attempt. Instead, we need an output that will have a mostly fixed width (or at least, a width that changes rarely), so as it scrolls past at a million miles an hour users can see the progress.

19064 GTR+F+I+G4   560192.840  1.204       ES316890nucl_1stpos+ES319935nucl_1stpos+ES321806nucl_1stpos+ES321879nucl_1stpos+ES321882nucl_1stpos+O540nucl_1stpos+O6843nucl_1stpos+O8775nucl_1
stpos+O10295nucl_1stpos+O10419nucl_1stpos+O10567nucl_1stpos+O10821nucl_1stpos+O11092nucl_1stpos+O11094nucl_1stpos+O11722nucl_1stpos+O13507nucl_1stpos+O13660nucl_1stpos+O15665nucl_1stpos+O
16400nucl_1stpos+O17424nucl_1stpos+O19102nucl_1stpos+O19189nucl_1stpos+O19758nucl_1stpos+O21944nucl_1stpos+O22156nucl_1stpos+O23857nucl_1stpos  0h:2m:52s (0h:1m:21s left)
19065 GTR+F+I+G4   560901.345  1.032       ES316508nucl_2ndpos+ES321882nucl_2ndpos+O988nucl_2ndpos+O988nucl_3rdpos+O6843nucl_2ndpos+O7569nucl_2ndpos+O8775nucl_2ndpos+O10419nucl_2ndpos+O10
567nucl_3rdpos+O10600nucl_1stpos+O10600nucl_2ndpos+O11635nucl_2ndpos+O11722nucl_2ndpos+O13507nucl_2ndpos+O17252nucl_2ndpos+O18599nucl_2ndpos+O19058nucl_2ndpos+O19102nucl_2ndpos+O21222nucl
_2ndpos+O21286nucl_2ndpos+O21944nucl_2ndpos+O23857nucl_2ndpos   0h:2m:53s (0h:1m:21s left)
19066 GTR+F+I+G4   560712.417  0.827       ES321879nucl_2ndpos+ES321882nucl_1stpos+O1187nucl_2ndpos+O5959nucl_2ndpos+O8818nucl_2ndpos+O10295nucl_1stpos+O10567nucl_1stpos+O10567nucl_2ndpos
+O10821nucl_1stpos+O10821nucl_2ndpos+O11092nucl_1stpos+O11094nucl_1stpos+O11094nucl_2ndpos+O11240nucl_2ndpos+O11569nucl_2ndpos+O11722nucl_1stpos+O13507nucl_1stpos+O13706nucl_2ndpos+O14581
nucl_2ndpos+O15665nucl_1stpos+O16052nucl_2ndpos+O16400nucl_1stpos+O16400nucl_2ndpos+O17424nucl_1stpos+O19102nucl_1stpos+O19189nucl_1stpos+O21944nucl_1stpos+O22156nucl_1stpos+O22156nucl_2n
dpos+O22172nucl_2ndpos+O22441nucl_2ndpos+O23857nucl_1stpos      0h:2m:54s (0h:1m:22s left)
19067 GTR+F+I+G4   560435.179  0.698       ES316508nucl_2ndpos+ES321882nucl_2ndpos+O1187nucl_1stpos+O6843nucl_2ndpos+O7569nucl_2ndpos+O8775nucl_2ndpos+O8818nucl_1stpos+O10353nucl_1stpos+O
10353nucl_2ndpos+O10353nucl_3rdpos+O10419nucl_2ndpos+O11635nucl_1stpos+O11635nucl_2ndpos+O11722nucl_2ndpos+O12061nucl_2ndpos+O12201nucl_2ndpos+O13507nucl_2ndpos+O16052nucl_1stpos+O17252nu
cl_2ndpos+O18599nucl_2ndpos+O19102nucl_2ndpos+O21222nucl_1stpos+O21222nucl_2ndpos+O21286nucl_2ndpos+O21944nucl_2ndpos+O22172nucl_1stpos+O22441nucl_1stpos+O23857nucl_2ndpos     0h:2m:54s (
0h:1m:22s left)
19068 GTR+F+I+G4   560200.537  1.488       ES316508nucl_1stpos+O1187nucl_1stpos+O5959nucl_1stpos+O8818nucl_1stpos+O10353nucl_1stpos+O10353nucl_2ndpos+O10353nucl_3rdpos+O11240nucl_1stpos+O
11569nucl_1stpos+O11635nucl_1stpos+O12061nucl_1stpos+O12061nucl_2ndpos+O12201nucl_1stpos+O12201nucl_2ndpos+O13706nucl_1stpos+O14581nucl_1stpos+O16052nucl_1stpos+O17252nucl_1stpos+O18599nu
cl_1stpos+O19058nucl_1stpos+O21222nucl_1stpos+O21286nucl_1stpos+O22172nucl_1stpos+O22441nucl_1stpos     0h:2m:54s (0h:1m:22s left)
19069 GTR+F+I+G4   561382.102  2.125       ES321882nucl_1stpos+O9874nucl_3rdpos+O10295nucl_1stpos+O10567nucl_1stpos+O10821nucl_1stpos+O11092nucl_1stpos+O11094nucl_1stpos+O11722nucl_1stpos
+O13507nucl_1stpos+O15665nucl_1stpos+O16400nucl_1stpos+O17424nucl_1stpos+O17424nucl_3rdpos+O19102nucl_1stpos+O19189nucl_1stpos+O21944nucl_1stpos+O22156nucl_1stpos+O22172nucl_3rdpos+O23857
nucl_1stpos+O30441nucl_3rdpos   0h:2m:55s (0h:1m:22s left)
19070 GTR+F+I+G4   560268.431  0.966       ES321882nucl_1stpos+O7569nucl_1stpos+O9874nucl_1stpos+O10295nucl_1stpos+O10567nucl_1stpos+O10821nucl_1stpos+O10867nucl_1stpos+O11092nucl_1stpos+
O11094nucl_1stpos+O11722nucl_1stpos+O13507nucl_1stpos+O14147nucl_1stpos+O15665nucl_1stpos+O16400nucl_1stpos+O17157nucl_1stpos+O17424nucl_1stpos+O17764nucl_1stpos+O19102nucl_1stpos+O19189n
ucl_1stpos+O21786nucl_1stpos+O21944nucl_1stpos+O22156nucl_1stpos+O23857nucl_1stpos+O30441nucl_1stpos    0h:2m:55s (0h:1m:22s left)
19071 GTR+F+I+G4   560272.161  1.415       ES316508nucl_1stpos+ES321882nucl_1stpos+O5959nucl_1stpos+O10295nucl_1stpos+O10567nucl_1stpos+O10821nucl_1stpos+O11092nucl_1stpos+O11094nucl_1stp
os+O11240nucl_1stpos+O11569nucl_1stpos+O11722nucl_1stpos+O12061nucl_1stpos+O12201nucl_1stpos+O13507nucl_1stpos+O13706nucl_1stpos+O14581nucl_1stpos+O15665nucl_1stpos+O16400nucl_1stpos+O172
52nucl_1stpos+O17424nucl_1stpos+O18599nucl_1stpos+O19058nucl_1stpos+O19102nucl_1stpos+O19189nucl_1stpos+O21286nucl_1stpos+O21944nucl_1stpos+O22156nucl_1stpos+O23857nucl_1stpos 0h:2m:55s (
0h:1m:22s left)
19072 GTR+F+I+G4   561142.489  1.197       ES321806nucl_2ndpos+O540nucl_2ndpos+O988nucl_2ndpos+O988nucl_3rdpos+O9874nucl_2ndpos+O10567nucl_3rdpos+O10600nucl_1stpos+O10600nucl_2ndpos+O1086
7nucl_2ndpos+O15665nucl_2ndpos+O17157nucl_2ndpos+O17764nucl_2ndpos+O19058nucl_2ndpos+O19758nucl_2ndpos+O21786nucl_2ndpos+O30441nucl_2ndpos      0h:2m:55s (0h:1m:22s left)
19073 GTR+F+I+G4   560396.428  9.169       ES316890nucl_3rdpos+O10295nucl_3rdpos+O11092nucl_3rdpos+O11569nucl_3rdpos+O11635nucl_3rdpos+O13507nucl_3rdpos+O13706nucl_3rdpos+O14581nucl_3rdpo
s+O16052nucl_3rdpos+O16400nucl_3rdpos+O17157nucl_3rdpos+O19189nucl_3rdpos+O19758nucl_3rdpos+O21786nucl_3rdpos+O21944nucl_3rdpos+O22156nucl_3rdpos+O22441nucl_3rdpos     0h:2m:55s (0h:1m:22
s left)
19074 GTR+F+I+G4   560986.855  3.010       ES316508nucl_1stpos+O5959nucl_1stpos+O9874nucl_3rdpos+O11240nucl_1stpos+O11569nucl_1stpos+O12061nucl_1stpos+O12201nucl_1stpos+O13706nucl_1stpos+
O14581nucl_1stpos+O17252nucl_1stpos+O17424nucl_3rdpos+O18599nucl_1stpos+O19058nucl_1stpos+O21286nucl_1stpos+O22172nucl_3rdpos+O30441nucl_3rdpos 0h:2m:56s (0h:1m:22s left)
19075 GTR+F+I+G4   560191.363  9.109       ES316890nucl_3rdpos+O9874nucl_3rdpos+O10295nucl_3rdpos+O11092nucl_3rdpos+O11569nucl_3rdpos+O11635nucl_3rdpos+O13507nucl_3rdpos+O13706nucl_3rdpos
+O14581nucl_3rdpos+O16400nucl_3rdpos+O17157nucl_3rdpos+O17424nucl_3rdpos+O21786nucl_3rdpos+O21944nucl_3rdpos+O22156nucl_3rdpos+O22172nucl_3rdpos+O22441nucl_3rdpos+O30441nucl_3rdpos    0h:
2m:56s (0h:1m:22s left)
19076 GTR+F+I+G4   561584.405  2.980       ES316890nucl_1stpos+ES319935nucl_1stpos+ES321806nucl_1stpos+ES321879nucl_1stpos+O540nucl_1stpos+O6843nucl_1stpos+O8775nucl_1stpos+O9874nucl_3rdp
os+O10419nucl_1stpos+O13660nucl_1stpos+O17424nucl_3rdpos+O19758nucl_1stpos+O22172nucl_3rdpos+O30441nucl_3rdpos  0h:2m:56s (0h:1m:22s left)
19077 GTR+F+I+G4   560158.457  0.385       ES316508nucl_2ndpos+ES316890nucl_2ndpos+ES319935nucl_2ndpos+ES321882nucl_2ndpos+O6843nucl_2ndpos+O7569nucl_2ndpos+O8775nucl_2ndpos+O10295nucl_2n
dpos+O10419nucl_2ndpos+O11092nucl_2ndpos+O11635nucl_2ndpos+O11722nucl_2ndpos+O13507nucl_2ndpos+O13660nucl_2ndpos+O14147nucl_2ndpos+O17252nucl_2ndpos+O17424nucl_2ndpos+O18599nucl_2ndpos+O1
9102nucl_2ndpos+O19189nucl_2ndpos+O21222nucl_2ndpos+O21286nucl_2ndpos+O21944nucl_2ndpos+O23857nucl_2ndpos       0h:2m:56s (0h:1m:23s left)
19078 GTR+F+I+G4   560403.485  1.676       ES316508nucl_1stpos+ES316890nucl_1stpos+ES319935nucl_1stpos+ES321806nucl_1stpos+ES321879nucl_1stpos+O540nucl_1stpos+O5959nucl_1stpos+O6843nucl_1
stpos+O8775nucl_1stpos+O10419nucl_1stpos+O11240nucl_1stpos+O11569nucl_1stpos+O12061nucl_1stpos+O12201nucl_1stpos+O13660nucl_1stpos+O13706nucl_1stpos+O14581nucl_1stpos+O17252nucl_1stpos+O1
8599nucl_1stpos+O19058nucl_1stpos+O19758nucl_1stpos+O21286nucl_1stpos   0h:2m:56s (0h:1m:23s left)
19079 GTR+F+I+G4   560153.284  1.061       ES321882nucl_1stpos+O1187nucl_1stpos+O8818nucl_1stpos+O10295nucl_1stpos+O10353nucl_1stpos+O10353nucl_2ndpos+O10353nucl_3rdpos+O10567nucl_1stpos+
O10821nucl_1stpos+O11092nucl_1stpos+O11094nucl_1stpos+O11635nucl_1stpos+O11722nucl_1stpos+O12061nucl_2ndpos+O12201nucl_2ndpos+O13507nucl_1stpos+O15665nucl_1stpos+O16052nucl_1stpos+O16400n
ucl_1stpos+O17424nucl_1stpos+O19102nucl_1stpos+O19189nucl_1stpos+O21222nucl_1stpos+O21944nucl_1stpos+O22156nucl_1stpos+O22172nucl_1stpos+O22441nucl_1stpos+O23857nucl_1stpos    0h:2m:56s (
0h:1m:23s left)
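
One way to get the mostly fixed width asked for above is to abbreviate each merged subset name before printing it. The helper below is a minimal sketch under that assumption; `abbreviate` is a hypothetical name and not an existing IQ-TREE function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: shorten a long merged-partition name to at most
// `width` characters, marking the cut with "..." when there is room for it.
inline std::string abbreviate(const std::string& name, std::size_t width) {
    if (name.size() <= width) return name;          // already short enough
    if (width <= 3) return name.substr(0, width);   // no room for an ellipsis
    return name.substr(0, width - 3) + "...";
}
```

Each progress line would then stay on a single terminal row, while the full merged name could still be written when a merge is actually accepted.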

No space left on device

lto-wrapper: fatal error: write: No space left on device
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/iqtree2.dir/build.make:136: iqtree2] Error 1
make[1]: *** [CMakeFiles/Makefile2:535: CMakeFiles/iqtree2.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

But this is a machine with 2G RAM and 7G swap

Filtering duplicate topologies in tree topology tests

Hello IQ-Tree team 👋

I am using the tree topology tests in my analyses and the documentation says:

Finally, note that IQ-TREE will automatically detect duplicated tree topologies and omit them during the evaluation.

However, while reading through the code, I noticed that the duplicate filtering is off by default and has to be explicitly set using the -treediff command line switch. Is there a reason why this option is off by default for the topology tests? If so, maybe the documentation should rather state that the user has to request filtering explicitly.

Failing compilation

Hello,
I'm trying to package v. 2 using the openSUSE Build Service, which is a tool for compiling and packaging software for multiple Linux distributions. For security reasons it applies relatively strict checks during compilation, e.g. -Werror=return-type as part of optflags, so some compilations fail. IQ-TREE is affected by these checks, so it fails:

...
[ 25%] Linking CXX shared library libterraphast.so
cd /home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build/terraphast && /usr/bin/cmake -E cmake_link_script CMakeFiles/terraphast.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -DNDEBUG -std=c++11 -fopenmp -pthread  -O2 -g -DNDEBUG -flto=auto -Wl,--as-needed -Wl,--no-undefined -Wl,-z,now -shared -Wl,-soname,libterraphast.so -o libterraphast.so CMakeFiles/terraphast.dir/lib/advanced.cpp.o CMakeFiles/terraphast.dir/lib/bigint.cpp.o CMakeFiles/terraphast.dir/lib/bipartitions.cpp.o CMakeFiles/terraphast.dir/lib/bitmatrix.cpp.o CMakeFiles/terraphast.dir/lib/clamped_uint.cpp.o CMakeFiles/terraphast.dir/lib/constraints.cpp.o CMakeFiles/terraphast.dir/lib/errors.cpp.o CMakeFiles/terraphast.dir/lib/multitree.cpp.o CMakeFiles/terraphast.dir/lib/multitree_iterator.cpp.o CMakeFiles/terraphast.dir/lib/nodes.cpp.o CMakeFiles/terraphast.dir/lib/parser.cpp.o CMakeFiles/terraphast.dir/lib/rooting.cpp.o CMakeFiles/terraphast.dir/lib/simple.cpp.o CMakeFiles/terraphast.dir/lib/subtree_extraction.cpp.o CMakeFiles/terraphast.dir/lib/supertree_helpers.cpp.o CMakeFiles/terraphast.dir/lib/trees.cpp.o CMakeFiles/terraphast.dir/lib/union_find.cpp.o CMakeFiles/terraphast.dir/lib/validation.cpp.o   -L/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/lib
make[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build'
make[1]: Entering directory '/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build'
[ 27%] Built target terraphast
make[1]: Leaving directory '/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build'
make[2]: Entering directory '/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build'
[ 27%] Linking CXX shared library libncl.so
cd /home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build/ncl && /usr/bin/cmake -E cmake_link_script CMakeFiles/ncl.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -DNDEBUG -std=c++11 -fopenmp -pthread  -O2 -g -DNDEBUG -flto=auto -Wl,--as-needed -Wl,--no-undefined -Wl,-z,now -shared -Wl,-soname,libncl.so -o libncl.so CMakeFiles/ncl.dir/nxsassumptionsblock.cpp.o CMakeFiles/ncl.dir/nxsblock.cpp.o CMakeFiles/ncl.dir/nxscharactersblock.cpp.o CMakeFiles/ncl.dir/nxsdatablock.cpp.o CMakeFiles/ncl.dir/nxsdiscretedatum.cpp.o CMakeFiles/ncl.dir/nxsdiscretematrix.cpp.o CMakeFiles/ncl.dir/nxsdistancedatum.cpp.o CMakeFiles/ncl.dir/nxsdistancesblock.cpp.o CMakeFiles/ncl.dir/nxsemptyblock.cpp.o CMakeFiles/ncl.dir/nxsexception.cpp.o CMakeFiles/ncl.dir/nxsreader.cpp.o CMakeFiles/ncl.dir/nxssetreader.cpp.o CMakeFiles/ncl.dir/nxsstring.cpp.o CMakeFiles/ncl.dir/nxstaxablock.cpp.o CMakeFiles/ncl.dir/nxstoken.cpp.o CMakeFiles/ncl.dir/nxstreesblock.cpp.o   -L/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/lib
make[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build'
make[1]: Entering directory '/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build'
[ 27%] Built target ncl
make[1]: Leaving directory '/home/abuild/rpmbuild/BUILD/iqtree2-2.1.3/build'
make: *** [Makefile:159: all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.sX7bMn (%build)

The rationale is that if these warnings are ignored, the software can end up in an undefined state, which can cause various crashes. It'd be nice if you could fix it. :-) I thought (most of?) such errors should already be fixed, but the compilation still fails. I see basically the same failure for all the newest releases (and also for version 1). Manual compilation without the above-mentioned checks works, but it breaks openSUSE packaging guidelines and requirements...
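
The log excerpt above does not show which function tripped the check, so the following is only a generic illustration of what -Werror=return-type rejects and how such code is usually repaired. `SeqType` and `numStates` are hypothetical names, not taken from IQ-TREE.

```cpp
enum class SeqType { DNA, Protein };

// Typically flagged by -Wreturn-type: if `t` holds a value outside the
// listed cases, control reaches the end of a non-void function.
//
// int numStates(SeqType t) {
//     switch (t) {
//         case SeqType::DNA:     return 4;
//         case SeqType::Protein: return 20;
//     }
// }

// Repaired version: every path returns a value.
inline int numStates(SeqType t) {
    switch (t) {
        case SeqType::DNA:     return 4;
        case SeqType::Protein: return 20;
    }
    return 0;   // explicit fallback for unexpected values
}
```

Falling off the end of a non-void function is undefined behaviour in C++, which is why packaging policies promote this warning to an error.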

Integer overflow during memory allocation

Hi!

Large memory requirements cause a segmentation fault when using mixture models because of integer overflow. This is a long standing bug.

See, e.g.,

I have fixed this issue for quite some time on the linked_models branch in IQ-TREE 1, see
https://github.com/Cibiv/IQ-TREE/blob/linked_models/model/modelmixture.cpp#L1339 and the following lines.

It would be great if this could be fixed for the distributed versions since I get repeated bug reports for this issue.

Thanks!
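
As a general illustration of this class of bug (not the fix from the linked branch), buffer sizes can be multiplied in 64-bit arithmetic with an explicit overflow check instead of in `int`. The helper name and its three dimension parameters are assumptions chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: multiply buffer dimensions in 64-bit arithmetic and
// refuse results that do not fit in size_t, instead of silently wrapping.
inline std::size_t checkedBufferSize(std::uint64_t nPatterns,
                                     std::uint64_t nStates,
                                     std::uint64_t nMixtures) {
    const std::uint64_t limit = std::numeric_limits<std::size_t>::max();
    const std::uint64_t dims[3] = {nPatterns, nStates, nMixtures};
    std::uint64_t total = 1;
    for (std::uint64_t d : dims) {
        // total * d would exceed the limit exactly when total > limit / d.
        if (d != 0 && total > limit / d)
            throw std::overflow_error("buffer size overflows size_t");
        total *= d;
    }
    return static_cast<std::size_t>(total);
}
```

For example, 100000 patterns, 20 states and 2000 mixture components give 4,000,000,000 entries, which is larger than a 32-bit signed `int` can hold but is handled correctly here.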

flags to suppress starting trees

Right now if I only want an NJ starting tree I can do something like:

-t BIONJ

This is great. In this case, IQ-TREE estimates two NJ trees: one with JC distances, and one with ML distances. This behaviour is not always desired though. Here are two specific cases where it doesn't make sense.

  1. If I specify -m JC, IQ-TREE will estimate the same NJ tree twice, including re-calculating all the JC distances. This usually wouldn't be a problem, but as we move to focus more on huge datasets it becomes silly, and just doubles runtimes unnecessarily.

  2. If I want an NJ tree from ML distances, and nothing else, and I already have the model parameters. E.g. if I specify -m "JC+I{0.3}". In this case I don't need the first NJ tree from JC distances - this would only have been useful if I didn't know and fix the model parameters ahead of time.

I think we can address this with a simple flag: --mlnj-only

This would specify that I am only interested in the NJ tree I will get from the model I have specified via -m. The behaviour is simple: if I didn't specify all of the model parameters, then nothing changes (we still need that first JC distance tree on which to estimate the unknown model parameters); but if I have specified all model parameters already, then we can skip that first set of JC distances.

Thoughts @bqminh and @JamesBarbetti?
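
The proposed behaviour boils down to one condition, sketched below. The struct, its fields and the function are hypothetical and only restate the rule described above; they are not existing IQ-TREE code.

```cpp
// Hypothetical sketch of the proposed rule; none of these names exist in IQ-TREE.
struct StartTreeOptions {
    bool mlnjOnly;        // the proposed --mlnj-only flag was given
    bool allParamsFixed;  // -m fixes every model parameter, e.g. "JC+I{0.3}" or plain JC
};

// Returns true if the first NJ tree from JC distances is still needed.
inline bool needJcDistanceTree(const StartTreeOptions& opt) {
    // The JC tree can be skipped only when the user asked for it AND there
    // are no unknown model parameters left to estimate on that tree.
    return !(opt.mlnjOnly && opt.allParamsFixed);
}
```

Without the flag, or with any free model parameter, the function returns true and nothing changes from the current behaviour.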

ModelFinder issue - Numerical Underflow

I'm working on a new approach to model selection (let's call it ModelFinder2). It incorporates a different algorithm to traverse models of substitution and models of rate heterogeneity across sites, and it incorporates the new GHOST models. I am using IQ-TREE 2.1.3 for this, but I am encountering various problems, including this error message:

WARNING: Numerical underflow for lh-derivative-mixlen

Currently, I have log files (+165Mb), which predominantly include this line. I don't think it is necessary to report the warning every time the issue arises (i.e., once should be enough for the model in question).

To handle this issue I sometimes have to stop the program, remove all copies of the warning from the *.log file, and then start the program again. Surely, this should not be necessary.

Can this be arranged?
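
Reporting the warning once per model could look roughly like the sketch below. `WarnOnce` is a hypothetical class, the output format is an assumption, and the sketch is not thread-safe, so a real implementation inside parallel likelihood code would need a lock or a per-thread flag.

```cpp
#include <iostream>
#include <set>
#include <string>

// Hypothetical helper: print a given warning only the first time it is
// raised for a given model, and stay silent on repeats.
class WarnOnce {
public:
    // Returns true if the warning was printed, false if it was suppressed.
    bool warn(const std::string& model, const std::string& message) {
        // insert().second is false when this (model, message) pair was seen before.
        if (!seen_.insert(model + '\n' + message).second) return false;
        std::cerr << "WARNING: " << message << " (" << model << ")" << std::endl;
        return true;
    }
private:
    std::set<std::string> seen_;
};
```

With such a filter, a log dominated by one repeated line would shrink to one line per affected model.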

IQTREE Error

Team, running into an iqtree error using iqtree in the nextstrain workflow.
The error message looks like this:
Waiting at most 30 seconds for missing files.
MissingOutputException in line 750 of /mnt/c/Users/bryan/Documents/ncov/workflow/snakemake_rules/main_workflow.smk:
Job Missing files after 30 seconds:
results/Nebraska/nt_muts.json
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 38 completed successfully, but some output files are missing. 38
Not sure why this is happening.

support for M1 chip (ARM)

Dear iqtree2 team,

Is there any possibility of supporting the M1 chip? The older Intel version is in Homebrew, but I cannot install it: gcc+-11 fails with an -msse error...

Thanks,

Jianshu

Improve optimisers for +R models

On many datasets I notice warnings like:

WARNING: Log-likelihood -7226.94 of K2P+R4 worse than K2P+R3 -7059.91

Obviously this shouldn't happen. The lnL should always be better with R4 than R3. I'm guessing this is just a limitation of the current optimiser. In many cases it seems like a fairly big limitation too. E.g. in the example above the difference is >150 likelihood units.

So, I have a suggestion. When we optimise RN+1 (e.g. R4) we should do an initialisation step where we start with the ML rate parameters from RN (e.g. R3), and just add an extra one while holding the initial N parameters constant. We can then try to optimise this constrained model, e.g. by sliding the new parameter from the minimum up to double the maximum rate from RN. My bet is that this will often get us a model with RN+1 that has a better likelihood. But even if it doesn't, we can then pass these RN+1 rates to the BFGS or EM optimiser to further optimise them all together.

Thoughts @bqminh and @thomaskf? This is really just a constrained EM step to start with. And maybe we already do something like this.

Either way, it seems like there's room for improvement here.
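The initialisation step suggested above could look roughly like the sketch below. It is an illustration under stated assumptions, not IQ-TREE code: `seed_extra_rate_category`, `loglik`, `new_weight` and `steps` are all hypothetical names, the likelihood is a toy stand-in, and giving the new category a fixed small proportion (rescaling the old proportions so they still sum to 1) is my assumption, since the proposal does not say how the new category gets its weight. The sketch also ignores any normalisation of the mean rate.

```python
def seed_extra_rate_category(props, rates, loglik, new_weight=0.05, steps=40):
    """Scan one extra rate category over [min(rates), 2 * max(rates)].

    props, rates: ML proportions and rates of the R(N) model, held fixed
                  apart from rescaling the proportions (assumption).
    loglik:       caller-supplied function loglik(props, rates) (hypothetical).
    Returns (score, props, rates) for the best R(N+1) starting point found.
    """
    lo, hi = min(rates), 2.0 * max(rates)
    scaled = [p * (1.0 - new_weight) for p in props]
    best = None
    for i in range(steps + 1):
        new_rate = lo + (hi - lo) * i / steps
        cand_props = scaled + [new_weight]
        cand_rates = list(rates) + [new_rate]
        score = loglik(cand_props, cand_rates)
        if best is None or score > best[0]:
            best = (score, cand_props, cand_rates)
    return best


# Toy stand-in for a likelihood: it rewards having some category near rate 4.0.
def toy_loglik(props, rates):
    return -min((r - 4.0) ** 2 for r in rates)


score, start_props, start_rates = seed_extra_rate_category(
    [0.7, 0.2, 0.1], [0.1, 1.0, 2.0], toy_loglik
)
```

The returned proportions and rates would then be handed to the joint optimiser (BFGS or EM) as its starting point, which is the second half of the suggestion.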

Dramatically different model parameter estimates and tree length between 2.2.0 and 2.1.4

We have been running IQ-TREE on simulated data along an HIV phylogeny and obtain dramatically different results when using version 2.2.0 vs 2.1.4. Total tree length differs by a factor of 3 and the estimated rate categories and distribution are very different.

The alignment used was:

aln.fasta.gz

IQ-tree was run as

iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s IQtree_2.2.0/aln.fasta -m GTR+F+R10 -czb

For version 2.2.0, we obtain a tree-length of 96 and a wide rate distribution peaked at 0.

Rate parameters:  A-C: 1.71553  A-G: 56.49825  A-T: 0.40073  C-G: 0.74865  C-T: 100.00000  G-T: 1.00000
Base frequencies:  A: 0.385  C: 0.169  G: 0.234  T: 0.212
Site proportion and rates:  (0.263,0.050) (0.139,0.145) (0.130,0.239) (0.109,0.396) (0.119,0.629) (0.081,0.933) (0.050,1.492) (0.045,2.557) (0.020,5.333) (0.044,10.219)

[...]

BEST SCORE FOUND : -404620.664
Collapsing near-zero internal branches... 41 collapsed
Total tree length: 95.702

For version 2.1.4, the total tree length is about 34 (33.723 in the output below) and the rate distribution is peaked around 1.

Optimal log-likelihood: -403882.383
Rate parameters:  A-C: 2.14594  A-G: 58.12361  A-T: 0.56247  C-G: 0.93941  C-T: 100.00000  G-T: 1.00000
Base frequencies:  A: 0.385  C: 0.169  G: 0.234  T: 0.212
Site proportion and rates:  (0.098,0.121) (0.185,0.177) (0.010,0.308) (0.156,0.480) (0.117,0.698) (0.115,1.068) (0.149,1.657) (0.117,2.134) (0.043,3.103) (0.009,4.676)

[...]

BEST SCORE FOUND : -403378.895
Collapsing near-zero internal branches... 43 collapsed
Total tree length: 33.723

Capital letter in source file name causes compilation hiccup

Hi everyone,
I stumbled across a small issue trying to compile IQ-TREE2 on a Linux computer. Configuring the source code with cmake .. causes an error because decentTree.cpp cannot be found:
CMake Error at utils/CMakeLists.txt:25 (add_executable): Cannot find source file: decentTree.cpp (Full cmake log)

I noticed that the source file is named utils/DecentTree.cpp with a capital D. After mv utils/DecentTree.cpp utils/decentTree.cpp it compiles successfully.
Cheers,
Lukas

Change sequence name

Greetings! I am loving the features of iqtree2 but I wanted to point out an issue that caused me a bit of a headache. I understand that some special characters and very long names are problematic, but the automatic changing of sequence name feature kind of messed up a dataset I spent a lot of time curating. It would be helpful if it made a backup of the input file before changing the sequence names.

I'm referring to this sort of thing:
NOTE: Change sequence name 'Acanthomeniidae sp 1 FB-2020 voucher ZSM Mol 20190564|MN531195.1' -> Acanthomeniidae_sp_1_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 2 FB-2020 voucher ZSM Mol 20190565|MN531196.1' -> Acanthomeniidae_sp_2_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 2 FB-2020 voucher ZSM Mol 20190566|MN531197.1' -> Acanthomeniidae_sp_2_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 2 FB-2020 voucher ZSM Mol 20190567|MN531198.1' -> Acanthomeniidae_sp_2_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 3 FB-2020 voucher ZSM Mol 20190568|MN531199.1' -> Acanthomeniidae_sp_3_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 5 FB-2020 voucher ZSM Mol 20190570|MN531200.1' -> Acanthomeniidae_sp_5_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 6 FB-2020 voucher ZSM Mol 20190572|MN531201.1' -> Acanthomeniidae_sp_6_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 6 FB-2020 voucher ZSM Mol 20190573|MN531202.1' -> Acanthomeniidae_sp_6_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 6 FB-2020 voucher ZSM Mol 20190574|MN531203.1' -> Acanthomeniidae_sp_6_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 6 FB-2020 voucher ZSM Mol 20190575|MN531204.1' -> Acanthomeniidae_sp_6_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 6 FB-2020 voucher ZSM Mol 20190577|MN531205.1' -> Acanthomeniidae_sp_6_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 7 FB-2020 voucher ZSM Mol 20190578|MN531206.1' -> Acanthomeniidae_sp_7_FB-2020
NOTE: Change sequence name 'Acanthomeniidae sp 8 FB-2020 voucher ZSM Mol 20190579|MN531207.1' -> Acanthomeniidae_sp_8_FB-2020

Which led to errors like this later:
ERROR: Duplicated sequence name Acanthomeniidae_sp_2_FB-2020
ERROR: Duplicated sequence name Acanthomeniidae_sp_2_FB-2020
ERROR: Duplicated sequence name Acanthomeniidae_sp_6_FB-2020
ERROR: Duplicated sequence name Acanthomeniidae_sp_6_FB-2020
ERROR: Duplicated sequence name Acanthomeniidae_sp_6_FB-2020
ERROR: Duplicated sequence name Acanthomeniidae_sp_6_FB-2020
ERROR: Duplicated sequence name Acanthomeniidae_sp_SB-1_FB-2020
ERROR: Duplicated sequence name Acanthomeniidae_sp_SB-1_FB-2020

Thanks!
Kevin
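One way to spot this before the duplicate-name errors appear is to scan the log for renames that map several original names onto the same new name. The sketch below is only an illustration: it assumes the "NOTE: Change sequence name '...' -> ..." line format quoted above, the function name is hypothetical, and it does not reproduce IQ-TREE's own renaming rule.

```python
import re
from collections import defaultdict

# Matches log lines of the form quoted in this issue.
RENAME = re.compile(r"NOTE: Change sequence name '(.+)' -> (\S+)")


def find_rename_collisions(log_lines):
    """Return {new_name: [original names]} for new names used more than once."""
    by_new_name = defaultdict(list)
    for line in log_lines:
        match = RENAME.search(line)
        if match:
            original, renamed = match.groups()
            by_new_name[renamed].append(original)
    return {new: olds for new, olds in by_new_name.items() if len(olds) > 1}


log_lines = [
    "NOTE: Change sequence name 'Acanthomeniidae sp 1 FB-2020 voucher ZSM Mol 20190564|MN531195.1' -> Acanthomeniidae_sp_1_FB-2020",
    "NOTE: Change sequence name 'Acanthomeniidae sp 2 FB-2020 voucher ZSM Mol 20190565|MN531196.1' -> Acanthomeniidae_sp_2_FB-2020",
    "NOTE: Change sequence name 'Acanthomeniidae sp 2 FB-2020 voucher ZSM Mol 20190566|MN531197.1' -> Acanthomeniidae_sp_2_FB-2020",
]
collisions = find_rename_collisions(log_lines)
```

In this dataset the accession after the `|` differs between the colliding sequences, so moving it to the front of each name in a copy of the alignment might keep the names unique. That is an untested assumption about how names are shortened.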

removing processor number issues during modelfinder

When I run modelfinder, I pick a number of threads appropriate for that (usually, A LOT).

Towards the end of the run, the output is dominated by this error message:

WARNING: Number of threads seems too high for short alignments. Use -T AUTO to determine best number of threads.

We should suppress that message during the modelfinder run.

Related: we should be very careful about how we assign numbers of threads during modelfinder when people use -T AUTO. I think a sensible default would be for -T AUTO's behaviour to use ALL available threads for modelfinder, and then the auto-detected number of threads after that.

Which releases should be considered stable?

Thanks for making IQtree, we use it extensively for all nextstrain builds and as part of Nextclade. Very useful.

I'm a little confused about your release management. A lot of beta releases, then there's a release but it's marked pre-release. Then there's another one marked latest but it's also beta.

Could you maybe clarify which version you recommend me to use?

LSD2: Misformatted date outputs

Hi there,

I'm trying to run IQ-TREE multicore version 2.2.0-beta to obtain dates with confidence intervals for ancestral nodes. The command, iqtree -s gubbins.filtered_polymorphic_sites.fasta --date input_dates.txt --date-ci 100 -m GTR+F+ASC+G4
appears to run fine, with the only warning:

*WARNINGS:
- 82.9044% internal branches were collapsed.

However, in all output files and std out, the year of the lower bound confidence interval date is misformatted and uninterpretable (e.g. tMRCA 1969-11-13 [-278019809-01-21; 1999-05-11]). All other dates appear to be formatted properly as do months and years. I'm assuming this isn't an issue with the formatting of my date file since all other dates are fine. If someone could offer some guidance in fixing this issue, it would be greatly appreciated.

Many thanks,

Emma

Another issue with ModelFinder: "ERROR: Alignment sequence XXX does not appear in the tree"

While analysing an amino acid alignment using ModelFinder, IQ-TREE 2.1.3 suddenly aborts, reporting that a number of sequences do "not appear in the tree". This is the command used:

iqtree2 -s 488_AA_supermatrix.fst -st AA -m MF -mset LG,Dayhoff,JTT,WAG,VT,DCMut,PMB,JTTDCMut,Blosum62 -mrate E,I,G,I+G,R,I+R -mfreq FU,FO,F -mtree -T 12 -merit BIC -safe

The .log file ends as follows:

Total number of iterations: 102
CPU time used for tree search: 5833.667 sec (1h:37m:13s)
Wall-clock time used for tree search: 303.996 sec (0h:5m:3s)
Total CPU time used: 5999.865 sec (1h:39m:59s)
Total wall-clock time used: 312.507 sec (0h:5m:12s)
3 finished checkpoint entries erased
71 Dayhoff+R2 1297255.173 95 2594700.346 2594700.497 2595621.828

===> Testing model Dayhoff+R3

NOTE: 891 MB RAM (0 GB) is required!
Estimate model parameters (epsilon = 0.100)

  1. Initial log-likelihood: -1286354.130
  2. Current log-likelihood: -1286318.064
  3. Current log-likelihood: -1286315.377
  4. Current log-likelihood: -1286314.752
  5. Current log-likelihood: -1286314.648
    Optimal log-likelihood: -1286314.545
    Site proportion and rates: (0.734,0.089) (0.201,2.060) (0.065,8.018)
    Parameters optimization took 5 rounds (7.179 sec)
    Computing ML distances based on estimated model parameters...
    Computing ML distances took 0.060216 sec (of wall-clock time) 1.248999 sec(of CPU time)
    Computing RapidNJ tree took 0.005007 sec (of wall-clock time) 0.109779 sec (of CPU time)
    ERROR: Alignment sequence Ailuropoda_melanoleuca does not appear in the tree
    ERROR: Alignment sequence Balaenoptera_acutorostrata does not appear in the tree
    ERROR: Alignment sequence Bos_taurus does not appear in the tree
    ERROR: Alignment sequence Camelus_ferus does not appear in the tree
    ERROR: Alignment sequence Canis_lupus does not appear in the tree
    ERROR: Alignment sequence Ceratotherium_simum does not appear in the tree
    ERROR: Alignment sequence Condylura_crisata does not appear in the tree
    ERROR: Alignment sequence Equus_caballus does not appear in the tree
    ERROR: Alignment sequence Erinaceus_europaeus does not appear in the tree
    ERROR: Alignment sequence Felis_catus does not appear in the tree
    ERROR: Alignment sequence Leptonychotes_weddellii does not appear in the tree
    ERROR: Alignment sequence Manis_javanica does not appear in the tree
    ERROR: Alignment sequence Manis_pentadactyla does not appear in the tree
    ERROR: Alignment sequence Molossus_molossus does not appear in the tree
    ERROR: Alignment sequence Mustela_putorius does not appear in the tree
    ERROR: Alignment sequence Myotis_myotis does not appear in the tree
    ERROR: Alignment sequence Phyllostomus_discolor does not appear in the tree
    ERROR: Alignment sequence Pipistrellus_kuhlii does not appear in the tree
    ERROR: Alignment sequence Rhinolophus_ferrumequinum does not appear in the tree
    ERROR: Alignment sequence Rousettus_aegyptiacus does not appear in the tree
    ERROR: Alignment sequence Sorex_araneus does not appear in the tree
    ERROR: Alignment sequence Sus_scrofa does not appear in the tree
    ERROR: Alignment sequence Tupaia_belangeri does not appear in the tree
    ERROR: Alignment sequence Tursiops_truncatus does not appear in the tree
    (Dasypus_novemcinctus,((Echinops_telfairi,Orycteropus_afer),(Loxodonta_africana,Trichechus_manatus_latirostris)),((((Ochotona_princeps,Oryctolagus_cuniculus),((Ictidomys_tridecemlineatus,((Cricetulus_griseus,Microtus_ochrogaster),(Mus_musculus,Rattus_norvegicus))),(Cavia_porcellus,Heterocephalus_glaber))),(((Callithrix_jacchus,Saimiri_boliviensis),(((Gorilla_gorilla,(Homo_sapiens,Pan_troglodytes)),Pongo_abelii),Macaca_fascicularis)),Carlito_syrichta)),(Otolemur_garnettii,Microcebus_murinus)));
    ERROR: Tree taxa and alignment sequence do not match (see above)

In other words, the program successfully processes the first 71 models, and the error occurs during the analysis of the 72nd model.

The same error was found in a recent analysis of DNA.

Somehow, recent revisions of IQ-TREE have caused ModelFinder to become unstable (perhaps even unreliable).

Issue with IQTREE2 after Performing Nearest Neighbor interchange

Hello,

I am using IQTREE multicore 2.0.3 for Linux 64-bit for ancestral reconstructions with this command:

iqtree -s gene_20.fasta -st NT2AA -t ../Gorilla.tree -m MF --ancestral --threads-max 4 --keep-ident --safe -fast

I keep getting an error after the "Performing Nearest Neighbor interchange..." stage.

ERROR: phylotree.cpp:3575: virtual NNIMove PhyloTree::getBestNNIForBran(PhyloNode*, PhyloNode*, NNIMove*): Assertion `node1->degree() == 3 && node2->degree() == 3' failed.

gene_20.fasta.log
Gorilla.tree.txt

Is there something wrong with my command? Or is there a way I can resolve this?

Thank you so much for the help,
Jacob Bowman

batch parsimony error

@JamesBarbetti, from the dev branch:

wget https://hgwdev.gi.ucsc.edu/~angie/publicMsa/publicMsa.2021-03-18.masked.fa.xz  
unxz publicMsa.2021-03-18.masked.fa.xz
./iqtree2 -s publicMsa.2021-03-18.masked.fa -t PARS -parsimony-batch -experimental -pre batch_parsimony -nt 128

error:

6494. inserted MW593537.1 on its desired branch (branch index 18779). It had parsimony score 3 (and path lengths 3.34415e-05, 6.68829e-05, 0.000100324)
ERROR : Could not find node 785259 as a neighbor of node 160004
ERROR: node.cpp:199: Neighbor* Node::findNeighbor(Node*): Assertion `0' failed.
ERROR: 
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: ***    Log file: batch_parsimony.log
ERROR: ***    Alignment files (if possible)
Aborted (core dumped)

install iqtree from anaconda powershell prompt

(base) PS C:\Users\saman\anaconda3> conda install -c bioconda iqtree
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • iqtree

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

I don't understand the number of schemes analysed for the greedy algorithm

I'm analysing a dataset with 168 data blocks, using the greedy algorithm for merging subsets.

When I do this in PF2, the first step of the greedy algorithm computes the BIC score of 14028 subset pairs. This is expected - 168 choose 2 = 14028.
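The expected count can be checked directly. This short sketch uses only the numbers quoted in this issue (168 data blocks, and the 2328 pairs logged by MF2). It also tests whether 2328 could be a full pairwise comparison of some smaller number of blocks.

```python
from math import comb

n_blocks = 168
first_step_pairs = comb(n_blocks, 2)  # every unordered pair of data blocks

# Is the logged 2328 equal to "n choose 2" for any n up to 168?
matches_2328 = [n for n in range(2, n_blocks + 1) if comb(n, 2) == 2328]
```

Here `first_step_pairs` is 14028, matching PF2, and `matches_2328` is empty (68 blocks give 2278 pairs and 69 give 2346). So the logged count does not look like an exhaustive pairwise step on a smaller set of blocks, although that alone does not explain what MF2 is doing.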

But when I run what I think is the same analysis in MF2, the first step of the algorithm analyses just 2328 subset pairs. At least, this is what's reported in the logged output. I really can't figure out quite what's going on.

It gets stranger when you look at the next step of the algorithm, which should only be analysing subsets that add to the new subset, but in this case my output is as below.

I'm guessing my commandline is just wrong, and in particular that --merge greedy is not actually doing what I think it's doing, i.e. implementing an algorithm the same as the PF2 greedy algorithm.

Commandline

Command: /data/rob/mf2/iqtree-2.1.2-Linux/bin/iqtree2 -s alignment.phy -spp partitions.nex -m TESTMERGEONLY -mset GTR -mrate E,G,I+G --merge-model GTR --merge-rate E,G,I+G --merge greedy -nt 128 --seed 123456742
Merging ES321806nucl_3rdpos+O10821nucl_3rdpos with BIC score: 572164.159 (LnL: -276386.833  df: 1908)
2497 GTR+F        572129.139  0.173       O19758nucl_2ndpos+O21222nucl_2ndpos   0h:0m:4s (0h:0m:26s left)
2498 GTR+F        572105.823  0.370       O16400nucl_2ndpos+O21222nucl_2ndpos   0h:0m:4s (0h:0m:26s left)
2499 GTR+F+I+G4   572118.259  0.481       O10295nucl_1stpos+O11722nucl_2ndpos   0h:0m:4s (0h:0m:26s left)
2500 GTR+F+G4     572085.885  1.141       O5959nucl_1stpos+O8818nucl_1stpos     0h:0m:4s (0h:0m:26s left)
2501 GTR+F+I+G4   572208.592  1.394       O10600nucl_2ndpos+O13660nucl_1stpos   0h:0m:4s (0h:0m:26s left)
2502 GTR+F+G4     572085.680  1.204       O5959nucl_1stpos+O23857nucl_1stpos    0h:0m:4s (0h:0m:26s left)
2503 GTR+F+G4     572067.794  0.866       O13507nucl_1stpos+O21944nucl_1stpos   0h:0m:5s (0h:0m:26s left)
2504 GTR+F+G4     572117.617  1.104       ES321879nucl_1stpos+O8818nucl_1stpos  0h:0m:5s (0h:0m:26s left)
2505 GTR+F+G4     572122.542  15.451      ES319935nucl_3rdpos+O988nucl_1stpos   0h:0m:5s (0h:0m:26s left)
2506 GTR+F+G4     572086.353  0.905       O19189nucl_1stpos+O22156nucl_1stpos   0h:0m:5s (0h:0m:26s left)
2507 GTR+F+G4     572078.146  1.229       ES321882nucl_1stpos+O16400nucl_1stpos 0h:0m:5s (0h:0m:26s left)
2508 GTR+F+G4     572084.614  1.052       O8775nucl_1stpos+O10821nucl_1stpos    0h:0m:5s (0h:0m:26s left)
2509 GTR+F+G4     572076.994  0.839       O13660nucl_1stpos+O19189nucl_1stpos   0h:0m:5s (0h:0m:26s left)
2510 GTR+F+G4     572104.762  9.295       O10600nucl_3rdpos+O17424nucl_3rdpos   0h:0m:5s (0h:0m:26s left)
2511 GTR+F+I+G4   572099.081  11.129      O1187nucl_3rdpos+O19102nucl_3rdpos    0h:0m:5s (0h:0m:27s left)
2512 GTR+F+I+G4   572144.710  14.689      ES319935nucl_3rdpos+ES321806nucl_3rdpos+O10821nucl_3rdpos     0h:0m:5s (0h:0m:27s left)
2513 GTR+F+I+G4   572065.166  10.102      O14147nucl_3rdpos+O15665nucl_3rdpos   0h:0m:5s (0h:0m:27s left)
2514 GTR+F+G4     572122.068  6.868       O9874nucl_3rdpos+O19058nucl_3rdpos    0h:0m:5s (0h:0m:27s left)
2515 GTR+F+I+G4   572087.934  1.532       ES321806nucl_1stpos+O10419nucl_1stpos 0h:0m:5s (0h:0m:27s left)
2516 GTR+F+I+G4   572133.599  9.612       O10867nucl_3rdpos+O17424nucl_3rdpos   0h:0m:5s (0h:0m:27s left)
2517 GTR+F+I+G4   572116.124  15.400      ES321806nucl_3rdpos+ES321879nucl_3rdpos+O10821nucl_3rdpos     0h:0m:5s (0h:0m:27s left)
2518 GTR+F+I+G4   572150.231  17.265      ES321806nucl_3rdpos+O6843nucl_3rdpos+O10821nucl_3rdpos        0h:0m:5s (0h:0m:27s left)
2519 GTR+F+I+G4   572289.800  16.016      ES321806nucl_3rdpos+O10821nucl_3rdpos+O18599nucl_3rdpos       0h:0m:5s (0h:0m:27s left)
2520 GTR+F+I+G4   572078.331  15.736      ES321806nucl_3rdpos+O988nucl_1stpos+O10821nucl_3rdpos 0h:0m:5s (0h:0m:28s left)
2521 GTR+F+I+G4   572060.003  15.333      ES321806nucl_3rdpos+O8818nucl_3rdpos+O10821nucl_3rdpos        0h:0m:5s (0h:0m:28s left)
2522 GTR+F+G4     572075.551  0.932       ES316890nucl_1stpos+O17157nucl_1stpos 0h:0m:5s (0h:0m:28s left)
2523 GTR+F+I+G4   572184.827  16.689      ES321806nucl_3rdpos+O10821nucl_3rdpos+O11240nucl_3rdpos       0h:0m:5s (0h:0m:28s left)
2524 GTR+F+I+G4   572059.969  15.301      ES316508nucl_3rdpos+ES321806nucl_3rdpos+O10821nucl_3rdpos     0h:0m:5s (0h:0m:28s left)
2525 GTR+F+I+G4   572132.526  16.415      ES321806nucl_3rdpos+O10821nucl_3rdpos+O13660nucl_3rdpos       0h:0m:5s (0h:0m:28s left)
2526 GTR+F+I+G4   572143.651  15.058      ES321806nucl_3rdpos+ES321882nucl_3rdpos+O10821nucl_3rdpos     0h:0m:5s (0h:0m:28s left)
2527 GTR+F+I+G4   572133.231  15.255      ES321806nucl_3rdpos+O10821nucl_3rdpos+O21286nucl_3rdpos       0h:0m:5s (0h:0m:28s left)
2528 GTR+F+I+G4   572107.812  15.332      ES321806nucl_3rdpos+O540nucl_3rdpos+O10821nucl_3rdpos 0h:0m:5s (0h:0m:29s left)
Merging O17252nucl_3rdpos+O17764nucl_3rdpos with BIC score: 572031.302 (LnL: -276371.219  df: 1898)

Using structural variants (SVs) to build a tree with IQ-TREE

Hi.
I called structural variants (SVs) (>50 bp) from WGS data, and I want to build a phylogenetic tree with IQ-TREE.

Since the SV data are in .vcf files and IQ-TREE doesn't support this file type, I converted the .vcf files into a .fasta file according to the SV genotypes. I used "A" to represent "0" and "T" to represent "1", so two bases represent one genotype. I then used this converted FASTA file as input to build a tree with IQ-TREE, and got the result I expected.
Command line: iqtree -s xxx.fasta -nt AUTO -b 1000 -m MFP -pre xxx_MFP
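For readers who want to reproduce the conversion, the encoding described above can be sketched as follows. This is an illustration only: the function names and sample data are made up, genotypes are assumed to already be extracted from the VCF as strings such as "0/1", and writing "N" for a missing allele is my assumption, since the post does not say how missing calls were handled.

```python
# "0" (reference allele) -> "A", "1" (variant allele) -> "T", as in the post.
ALLELE_TO_BASE = {"0": "A", "1": "T"}


def encode_genotype(gt):
    """Turn one diploid genotype such as '0/1' or '1|1' into two bases."""
    alleles = gt.replace("|", "/").split("/")
    # Assumption: anything other than 0 or 1 (e.g. '.') becomes 'N'.
    return "".join(ALLELE_TO_BASE.get(allele, "N") for allele in alleles)


def genotypes_to_fasta(calls):
    """calls maps sample name -> list of genotypes, one per SV site."""
    records = []
    for sample, genotypes in calls.items():
        sequence = "".join(encode_genotype(gt) for gt in genotypes)
        records.append(f">{sample}\n{sequence}")
    return "\n".join(records) + "\n"


# Hypothetical example data.
calls = {
    "sample1": ["0/0", "0/1", "1/1"],
    "sample2": ["1/1", "./.", "0/0"],
}
fasta = genotypes_to_fasta(calls)
```

With this encoding every site carries only two states, so DNA substitution models are being fitted to what is really presence/absence data. I believe the IQ-TREE documentation also describes a binary data type, which may be worth checking before settling on the A/T recoding.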

But someone told me that IQ-TREE selects the best model based on bases (nucleotides).
So, I want to know whether it's appropriate to do the above. Which model is appropriate for my data? Or what should I do if I want to build a phylogenetic tree from SV data?

Thank you.

compile error with “cmake -DIQTREE_FLAGS=KNL ../”

I followed the instructions to create and enter a "build" folder in the root directory of "iqtree2".

Because my server is equipped with an Intel Xeon 8260 CPU, I want to improve performance by using the Intel C++ compiler and adding the "KNL" flag when running "cmake":
cmake -DIQTREE_FLAGS=KNL ../
So far, no problems.

Then I ran the "make" command to build, and it produced an error:
/root/iqtree2/tree/phylokernelavx512.cpp(100): error: no instance of function template "PhyloTree::computeNonrevLikelihoodBranchGenericSIMD" matches the required type
computeLikelihoodBranchPointer = &PhyloTree::computeNonrevLikelihoodBranchGenericSIMD ;
^

/root/iqtree2/tree/phylokernelavx512.cpp(101): error: no instance of function template "PhyloTree::computeNonrevLikelihoodDervGenericSIMD" matches the required type
computeLikelihoodDervPointer = &PhyloTree::computeNonrevLikelihoodDervGenericSIMD ;
^

/root/iqtree2/tree/phylokernelavx512.cpp(102): error: no instance of function template "PhyloTree::computeNonrevPartialLikelihoodGenericSIMD" matches the required type
computePartialLikelihoodPointer = &PhyloTree::computeNonrevPartialLikelihoodGenericSIMD;

So I followed the error message and found these 3 lines (100-102) in "phylokernelavx512.cpp":
100: computeLikelihoodBranchPointer = &PhyloTree::computeNonrevLikelihoodBranchGenericSIMD ;
101: computeLikelihoodDervPointer = &PhyloTree::computeNonrevLikelihoodDervGenericSIMD ;
102: computePartialLikelihoodPointer = &PhyloTree::computeNonrevPartialLikelihoodGenericSIMD;
Then my C++ editor (VS Code) shows: error: no instance of function template "..." matches the required type "..."

I moved to line 126-128:
126: computeLikelihoodBranchPointer = &PhyloTree::computeLikelihoodBranchSIMD <Vec8d, SAFE_LH, true>;
127: computeLikelihoodDervPointer = &PhyloTree::computeLikelihoodDervSIMD <Vec8d, SAFE_LH, true>;
128: computeLikelihoodDervMixlenPointer = &PhyloTree::computeLikelihoodDervMixlenSIMD<Vec8d, SAFE_LH, true>;

I found that lines 100-102 and lines 126-128 are almost the same except at the end:
100: ... ;
101: ... ;
102: ... ;

126: ... <Vec8d, SAFE_LH, true>;
127: ... <Vec8d, SAFE_LH, true>;
128: ... <Vec8d, SAFE_LH, true>;

So I thought maybe I could fix it by adding " , SAFE_LH, 4, true" at the end of lines 100-102:
100: computeLikelihoodBranchPointer = &PhyloTree::computeNonrevLikelihoodBranchGenericSIMD <Vec8d, SAFE_LH, true>;
101: computeLikelihoodDervPointer = &PhyloTree::computeNonrevLikelihoodDervGenericSIMD <Vec8d, SAFE_LH, true>;
102: computePartialLikelihoodPointer = &PhyloTree::computeNonrevPartialLikelihoodGenericSIMD<Vec8d, SAFE_LH, true>;

After that, I ran “cmake -DIQTREE_FLAGS=KNL ../” and “make” again. No errors were shown, and I successfully got the “iqtree2” program.

Then I ran “iqtree2” to analyze my data, and no errors were shown.

I think it might help!

Include +I+R models

Currently, we don't examine +I+R models by default, but looking at a bunch of empirical datasets suggests that these models are often the best.

To fix this, we just need to change what happens when:

  1. -mrate all and
  2. --merge-rate all
  3. the default settings for MF2 when -mrate and --merge-rate are not set

The only change is that the set of rate distributions considered should now be:

E,G,I,R,I+G,I+R

update nexus format error message

Related to this discussion on the user group:

https://groups.google.com/g/iqtree/c/SVTLc079Hvo/m/it1HLuR1DgAJ

We should update this error message:

ERROR: Partition file is not in NEXUS format, assuming RAxML-style partition file...

To read:

ERROR: Partition file is not in NEXUS format, assuming RAxML-style partition file...

If you thought your partition file was in NEXUS format, it's possible you have not saved it in plain text format. In this case, please use a plain text editor (like TextWrangler or Sublime Text) or make sure to save the file as 'Plain Text' in whatever editor you prefer to use. 

ModelFinder playing up

When using the command:

iqtree2 -s data.fst -st AA -m MF -msub nuclear -mtree -T 24 -merit BIC -safe

I expected the code to search tree space for every model of sequence evolution, which combines:

  1. all the nuclear substitution models for amino acids (i.e., LG, WAG, JTT, ...)
  2. the E,I,G,I+G,R[n] rate-heterogeneity across sites (RHAS) models

However, as the following check shows, the program skipped a large number of RHAS models for some of the substitution models:

grep "===>" data.fst.log
===> Testing model LG
===> Testing model LG+I
===> Testing model LG+G4
===> Testing model LG+I+G4
===> Testing model LG+R2
===> Testing model LG+R3
===> Testing model LG+R4
===> Testing model LG+R5
===> Testing model LG+R6
===> Testing model LG+F+R5
===> Testing model WAG+R5
===> Testing model WAG+F+R5
===> Testing model JTT+R5
===> Testing model JTT+F+R5
===> Testing model Q.pfam+R5
===> Testing model Q.pfam+F+R5
===> Testing model JTTDCMut+R5
===> Testing model JTTDCMut+F+R5
===> Testing model DCMut+R5
===> Testing model DCMut+F+R5
===> Testing model VT+R5
===> Testing model VT+F+R5
===> Testing model PMB+R5
===> Testing model PMB+F+R5
===> Testing model Blosum62+R5
===> Testing model Blosum62+F+R5
===> Testing model Dayhoff+R5
===> Testing model Dayhoff+F+R5

Things start going wrong after testing the LG+R6. For example, the LG+F, LG+F+I, LG+F+G, LG+F+I+G, LG+F+R2, LG+F+R3 and LG+F+R4 models are all missing.

Question about the nsite for branch length inference

Hi, thanks for developing the IQ-TREE2 software. I have a question about species tree inference. I am using the software r8s, which takes the tree as input. Following the forum post 'https://www.biostars.org/p/367080/', I need the number of sites, but I do not know how many sites were used for species tree inference. Here is part of the log from building the tree:


SEQUENCE ALIGNMENT
Input data: 7 sequences with 60206 amino-acid sites
Number of constant sites: 32413 (= 53.8368% of all sites)
Number of invariant (constant or ambiguous constant) sites: 32413 (= 53.8368% of all sites)
Number of parsimony informative sites: 9205
Number of distinct site patterns: 24168

Are these sites used for tree inference? Can you give me a clue about it? Thank you.

Can IQTREE support codon degeneracy data?

I have a codon degeneracy matrix and used IQ-TREE for tree inference. I found that ModelFinder simply treated it as amino acid data. Can someone tell me what is actually happening? Many thanks!

BFGS+EM for optimising model parameters

The other day we discussed which algorithm was better for model parameter estimation. We have two implementations:

  1. EM, which will fix all parameters and optimise each one in turn while holding the others fixed, then iterate until it's done.

  2. BFGS, which will optimise all parameters at once

@bqminh mentioned that in the original modelfinder, they had compared these and found that BFGS was quicker but gave worse likelihoods, so they decided to stick with EM.

@JamesBarbetti mentioned that often it's better to chain them together

I said I'd take a look at the huge SARS-CoV-2 alignments and see what happened. Here's what happened, confirming that this is hard and also interesting. This is an alignment of ~30K bp and ~40K sequences. Free rate models fit much better than the other models.

JC+I+R5 optimised with EM:
lnL: -388587.635
Proportion of invariable sites: 0.389
Site proportion and rates: (0.529,0.808) (0.074,5.739) (0.009,16.933)

JC+I+R5 optimised with BFGS:
lnL: <-375974.923841
Proportion of invariable sites: 0.616
Site proportion and rates: (0.426,0.269) (0.415,0.625) (0.159,3.936)

It's less than that because I had to kill the analysis before it had a chance to do the final round of fine-grained model parameter optimisations. The first analysis took about 2 days. I killed the second one after 1 day, and I'd guess it was going to take a fairly similar total execution time compared to the first one. BFGS is certainly no more than twice as fast.

The surprise: BFGS got WAY better likelihoods.

So, my suggestion is that we implement @JamesBarbetti's suggestion of the two algorithms chained together and compare the performance on a range of datasets. I guess there are lots of options for how to chain them, including:

  1. BFGS until convergence followed by EM until convergence
  2. As in 1 but EM followed by BFGS
  3. Switch between BFGS and EM on each iteration

@JamesBarbetti suggested that option 3 might be best, and that's my intuition too.
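To make option 3 concrete, here is a toy sketch of switching between two optimisers on every iteration. Everything in it is a stand-in: the objective is a made-up concave surface rather than a likelihood, `coordinate_step` mimics the one-parameter-at-a-time scheme described above, and `joint_step` is a plain gradient step used in place of BFGS. It shows the control flow only, not the behaviour of the real optimisers.

```python
def objective(x, y):
    # Toy concave surface with its maximum at (1, -2); not a real likelihood.
    return -(x - 1) ** 2 - (y + 2) ** 2 - 0.5 * (x - 1) * (y + 2)


def coordinate_step(x, y):
    # Stand-in for optimising each parameter in turn, holding the other fixed.
    x = 1 - 0.25 * (y + 2)   # exact maximiser in x for the current y
    y = -2 - 0.25 * (x - 1)  # exact maximiser in y for the updated x
    return x, y


def joint_step(x, y, step=0.3):
    # Stand-in for an all-parameters-at-once update: one gradient ascent step.
    grad_x = -2 * (x - 1) - 0.5 * (y + 2)
    grad_y = -2 * (y + 2) - 0.5 * (x - 1)
    return x + step * grad_x, y + step * grad_y


def alternate(optimisers, start, rounds=50):
    # Option 3: switch to the next optimiser on every iteration.
    point = start
    for i in range(rounds):
        point = optimisers[i % len(optimisers)](*point)
    return point


x_opt, y_opt = alternate([joint_step, coordinate_step], (0.0, 0.0))
final_value = objective(x_opt, y_opt)
```

Options 1 and 2 would correspond to running one step function until the objective stops improving and then running the other, so the same two step functions could be reused for all three comparisons.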

I think this could be a nice addition to ModelFinder2, and simple to compare on a set of representative datasets.

Better? parallelisation for MF2

Currently we parallelise MF2 by sending each subset to its own thread. This is OK, but we will often miss out on a lot of potential efficiency.

E.g. imagine we have 100 available processors, and we're analysing a dataset with 10 data blocks and we want to fit 100 models to each data block.

Currently we can only use 10 threads for this, so we can only get maximum 10% efficiency.

We can refactor the parallelisation to speed this up though, as follows.

  1. Start by using 10 threads to estimate the most complex model for each data block. This is obviously limiting, but since it's only one model per data block it will also be quick.
  2. Use the result from step 1 to set the initial parameters of the models for all of the less complex models, as we currently do.
  3. Create a job queue that includes all of the remaining 99 models from each of the 10 data blocks (990 jobs), with parameters initialised from step 2 (maybe this is where there are limitations, e.g. if you use previous estimates from free-rate models to initialise other free rate models, but in this case it might just require that the jobs are sometimes packaged into related subsets, e.g. all the free-rate jobs go to one processor because they need to build on each other)
  4. Order the job queue roughly according to how long we think each job will take
  5. Run the jobs

In this example we can go from maximum 10% efficiency to a maximum of 99% efficiency.

In terms of ordering jobs, maybe there's already something in IQ-TREE to estimate execution time for a job. If not, then in PF2 we use really crude estimates based on gut feelings for how long different types of model tend to take to optimise. E.g. JC is faster than GTR. GTR is faster than GTR+I and GTR+G, and those are all a lot faster than GTR+I+G, with GTR+Rx models the slowest.
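A rough sketch of steps 3 to 5 with this kind of crude ordering is shown below. It is not PF2 or IQ-TREE code: the cost ranks are invented numbers that only respect the ordering described above, the function names are hypothetical, and dependencies between jobs (such as free-rate models that build on each other) are not modelled. Jobs are sorted slowest-first and each one is handed to the currently least-loaded worker.

```python
import heapq


def crude_cost(model):
    """Made-up relative cost ranks: JC < GTR < +I or +G < +I+G < +R."""
    if "+R" in model:
        return 5
    if "+I+G" in model:
        return 4
    if "+I" in model or "+G" in model:
        return 3
    if model.startswith("GTR"):
        return 2
    return 1


def schedule(jobs, n_workers):
    """Assign (block, model) jobs to workers, longest expected jobs first."""
    ordered = sorted(jobs, key=lambda job: crude_cost(job[1]), reverse=True)
    heap = [(0, worker) for worker in range(n_workers)]  # (load, worker id)
    assignment = {worker: [] for worker in range(n_workers)}
    for job in ordered:
        load, worker = heapq.heappop(heap)  # least-loaded worker so far
        assignment[worker].append(job)
        heapq.heappush(heap, (load + crude_cost(job[1]), worker))
    return assignment


models = ["JC", "GTR", "GTR+I", "GTR+G4", "GTR+I+G4", "GTR+R4"]
jobs = [(block, model) for block in ("block1", "block2") for model in models]
assignment = schedule(jobs, n_workers=3)
loads = [sum(crude_cost(model) for _, model in assignment[w]) for w in range(3)]
```

Putting the slow jobs first matters because a long free-rate job started late would otherwise leave most workers idle at the end of the queue.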

Segmentation fault with mixed protein+DNA partition

Hello,

sorry to post here instead of Google groups but I cannot seem to join that. I am experiencing an apparently unreported segmentation fault on a rather small dataset of 8 proteins and 1 rDNA matrix (altogether ~4300 characters). After loading the input, IQ-TREE consumes 90GB RAM and crashes. This does not happen with version 1.6, or with protein and DNA partitions analyzed separately. Would you kindly advise me on this?
I am attaching the log file. partition.nex.log

Many thanks, kind regards

Zoltan

iqtree v 2.1.3 - assertion error in alignment.cpp:77

Hi all,

After successfully creating varsites file we ran across the following error while using iqtree2 v 2.1.3 :

ERROR: alignment.cpp:77: std::__cxx11::string &Alignment::getSeqName(int): Assertion `i >= 0 && i < (int)seq_names.size()' failed.
ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: *** Log file: AS.pruning-0.5.log
ERROR: *** Alignment files (if possible)

I am attaching the log file
AS.pruning-0.5.log

Could you please help us in handling this error?

Thanks in advance.

dynamic allocation of threads during and after modelfinder

At the moment we have a small efficiency issue when doing an analysis that involves ModelFinder then a tree search: different numbers of threads are the best for the two analyses (esp. when the alignment is short).

Typically, we want all available threads for ModelFinder2, then often far fewer for the tree search. There are a couple of cases to consider. Here's what I can think of, with my suggestions:

With option -nt AUTO

Here I think we should just set it to all available threads for MF2, then run the standard thread selection algorithm for the tree search

With option -nt N

My suggestion: we set the number of threads to N for MF2, then run thread selection during the tree search if and only if the conditions are triggered that usually give the warning: WARNING: Number of threads seems too high for short alignments. Use -T AUTO to determine best number of threads. (By the way, this is a frustrating warning for a user who knows they have to pick a large N for MF2 and would like a smaller N for the tree search, but has no option to do so.)
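The two cases above could be sketched as follows. This is only an illustration in Python, not IQ-TREE code: `plan_threads`, its parameters, and the `AUTO` marker are hypothetical names, and `too_many_for_alignment` stands in for whatever condition currently triggers the warning.

```python
AUTO = "AUTO"


def plan_threads(nt, available, too_many_for_alignment):
    """Return (modelfinder_threads, tree_search_threads).

    A tree_search_threads value of AUTO means "run the standard
    thread selection algorithm before the tree search".
    """
    if nt == AUTO:
        # -nt AUTO: all available threads for ModelFinder, then auto-select.
        return available, AUTO
    n = int(nt)
    if too_many_for_alignment:
        # -nt N, and N looks too high for this alignment: re-select for the tree search.
        return n, AUTO
    # -nt N, no warning condition: keep N throughout.
    return n, n
```

For example, `plan_threads(64, available=64, too_many_for_alignment=True)` would keep 64 threads for ModelFinder and hand the tree search over to automatic selection.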

An alternative (probably not as good)

We could consider adding more than one -nt option, to distinguish between MF2 and tree search, something like: -nt_mf and -nt. In this case, -nt_mf would apply only to modelfinder, and would override -nt for modelfinder if it's used. Everything else would stay the same. If we go this route, we should also update the warning WARNING: Number of threads seems too high for short alignments. Use -T AUTO to determine best number of threads. to read something like WARNING: Number of threads seems too high for short alignments. Use -nt AUTO to determine best number of threads, or -nt_mf combined with -nt to set different numbers of threads for modelfinder and tree search

`--tree-fix` flag throws error "Invalid option"

Hi,

I'm trying to use iqtree2 to optimize branch lengths on a fixed topology, but I encounter the error Invalid "--tree-fix" option. when using the --tree-fix flag described in the help menu. Can you clarify what flag should be used?

Version:

IQ-TREE multicore version 2.1.2 COVID-edition for Mac OS X 64-bit built Oct 22 2020
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Help menu says:
--tree-fix Fix -t tree (no tree search performed)

Reproducible example:
iqtree2 -m LG -s data.fasta -t data.tre --tree-fix

I have attached the data files from example.

Thanks!
-Stephanie
forissue.zip

Iqtree2 covid19 version

Dear Bui Quang Minh,

I was trying to use the Windows version of IQ-TREE 2 for COVID-19 (iqtree-2.1.2-Windows), but the run finished after the evaluation of identical sequences.
I am attaching the log file created.
all_30-11-20_2.fas_UnAmbigu.fasta.log

Do you have an idea of what is wrong?

Thank you very much in advance for your prompt answer.

New bug in parsimony code?

@JamesBarbetti I think something untoward is happening in the parsimony code.

Using exactly the same data and commandline, I get very different outputs from the code compiled on 27 Apr versus 15 May.

Logs are pasted below, but the tl;dr is that I have two worries:

  1. New version doesn't tell me it's doing any optimisation (maybe you just removed print statements though)
  2. New version tells me a patently incorrect parsimony score (8207) when I know from previous versions of IQ-TREE and independent analyses (e.g. UShER) that the score is >500,000

A bug?

Rob

Log Files

27 Apr is as expected:

IQ-TREE multicore version 2.1.2 COVID-edition for Linux 64-bit built Apr 27 2021                                                            
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Host:    c109762 (AVX2, FMA3, 1007 GB RAM)
Command: ../iqtree -n 0 -no-ml-dist -m JC -t usher/placed/binary-tree.nh -s aln_sampled.fa -parsimony-spr 100 -parsimony-nni 100 -spr-radius 40 --suppress-list-of-sequences -blfix -nt 100 -fast -pre iqtree_parsimony2
Seed:    328716 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Sat May 15 20:06:41 2021
Kernel:  AVX+FMA - 100 threads (256 CPU cores detected)

Reading alignment file aln_sampled.fa ... Fasta format detected
Alignment most likely contains DNA/RNA sequences
Alignment has 410000 sequences with 29628 columns, 29518 distinct patterns
23198 parsimony-informative, 2213 singleton sites, 4217 constant sites
Reading input tree file usher/placed/binary-tree.nh ... rooted tree
Before doing (up to) 100 rounds of parsimony SPR, parsimony score was 587794
Applied 1418 moves (out of 4030) (2009 still possible) in iteration 1 (parsimony now 586169) after 28 min 22 sec
Applied 132 moves (out of 297) (148 still possible) in iteration 2 (parsimony now 586019) after 46 min 48 sec
Applied 29 moves (out of 61) (31 still possible) in iteration 3 (parsimony now 585983) after 1 hrs 2 min 30 sec
Applied 3 moves (out of 7) (3 still possible) in iteration 4 (parsimony now 585980) after 1 hrs 17 min 45 sec
Applied 0 moves (out of 0) (0 still possible) in last iteration  (parsimony now 585980) (total SPR moves examined 17550096376)
Before doing (up to) 100 rounds of parsimony NNI, parsimony score was 585980
Applied 0 moves (out of 0) (0 still possible) in last iteration  (parsimony now 585980) (total NNI moves examined 819996)

NOTE: 392499 MB RAM (383 GB) is required!
Estimate model parameters (epsilon = 0.05)
1. Initial log-likelihood: -8.43037e+09
Optimal log-likelihood: -8.43037e+09
Rate parameters:  A-C: 1.00000  A-G: 1.00000  A-T: 1.00000  C-G: 1.00000  C-T: 1.00000  G-T: 1.00000
Base frequencies:  A: 0.250  C: 0.250  G: 0.250  T: 0.250
Parameters optimization took 1 rounds (19.3878 sec)
NOTE: 3.264 seconds to dump checkpoint file, increase to 66.000
BEST SCORE FOUND : -8430370020.728
Total tree length: 574019.970

Total number of iterations: 0
CPU time used for tree search: 0.282 sec (0h:0m:0s)
Wall-clock time used for tree search: 0.283 sec (0h:0m:0s)
Total CPU time used: 254665.764 sec (70h:44m:25s)
Total wall-clock time used: 6026.242 sec (1h:40m:26s)

Analysis results written to: 
  IQ-TREE report:                iqtree_parsimony2.iqtree
  Maximum-likelihood tree:       iqtree_parsimony2.treefile
  Screen log file:               iqtree_parsimony2.log

NOTE: 4.895 seconds to dump checkpoint file, increase to 98.000
Date and Time: Sat May 15 22:00:48 2021

But the very same command on the latest version (May 15th) gives me a parsimony score that is insane, and doesn't seem to do any optimisation at all (though maybe the last bit is just you removing print statements?).

IQ-TREE multicore version 2.1.2 COVID-edition for Linux 64-bit built May 15 2021
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Host:    c109762 (AVX2, FMA3, 1007 GB RAM)
Command: ../iqtree2 -n 0 -no-ml-dist -m JC -t usher/placed/binary-tree.nh -s aln_sampled.fa -parsimony-spr 100 -parsimony-nni 100 -spr-radius 40 --suppress-list-of-sequences -blfix -nt 100 -fast -pre iqtree_parsimony
Seed:    442874 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Sat May 15 17:48:43 2021
Kernel:  AVX+FMA - 100 threads (256 CPU cores detected)

Reading alignment file aln_sampled.fa ... Fasta format detected
Alignment most likely contains DNA/RNA sequences
Alignment has 410000 sequences with 29628 columns, 29518 distinct patterns
23198 parsimony-informative, 2213 singleton sites, 4217 constant sites
Reading input tree file usher/placed/binary-tree.nh ... rooted tree

NOTE: 392499 MB RAM (383 GB) is required!
Estimate model parameters (epsilon = 0.05)
1. Initial log-likelihood: -8.42783e+09
Optimal log-likelihood: -8.42783e+09
Rate parameters:  A-C: 1.00000  A-G: 1.00000  A-T: 1.00000  C-G: 1.00000  C-T: 1.00000  G-T: 1.00000
Base frequencies:  A: 0.250  C: 0.250  G: 0.250  T: 0.250
Parameters optimization took 1 rounds (23.3431 sec)
NOTE: 3.241 seconds to dump checkpoint file, increase to 65.000
Parsimony score of initial tree: 8671
BEST SCORE FOUND : -8427833300.853
Total tree length: 20.290

Total number of iterations: 0
CPU time used for tree search: 0.269 sec (0h:0m:0s)
Wall-clock time used for tree search: 0.269 sec (0h:0m:0s)
Total CPU time used: 384370.068 sec (106h:46m:10s)
Total wall-clock time used: 5422.123 sec (1h:30m:22s)

Analysis results written to: 
  IQ-TREE report:                iqtree_parsimony.iqtree
  Maximum-likelihood tree:       iqtree_parsimony.treefile
  Screen log file:               iqtree_parsimony.log

NOTE: 5.381 seconds to dump checkpoint file, increase to 108.000
Date and Time: Sat May 15 19:33:39 2021

Can you double check on some simple test cases?
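One way to get an independent number for such a test case is a tiny reference implementation of Fitch parsimony that can be scored by hand. The sketch below is in Python and is not taken from IQ-TREE; the tree, alignment, and function names are invented for illustration, and it assumes a strictly binary rooted tree with no gaps or ambiguity codes.

```python
def fitch_site(tree, states):
    """Return (state_set, cost) for one alignment column.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    `states` maps each leaf name to its character in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no change needed here.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all columns of the alignment."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    total = 0
    for i in range(n_sites):
        column = {name: alignment[name][i] for name in names}
        total += fitch_site(tree, column)[1]
    return total


toy_tree = (("A", "B"), ("C", "D"))
toy_alignment = {"A": "ACGT", "B": "ACGA", "C": "ATGA", "D": "ATGC"}
print(parsimony_score(toy_tree, toy_alignment))  # prints 3
```

The toy case has two constant columns (cost 0), one column that splits cleanly between the two cherries (cost 1), and one column needing two changes, so the expected score of 3 is easy to confirm by hand before comparing against any other implementation.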

Rob

Change the column order of partitioned models in the .iqtree output

Currently we put the really long column (Name) second, which makes it almost unreadable. The fix is easy though - shift the Name column to be the last column.

Here's an example of why this would help. With large (now common) datasets, we currently give users this to look at:

  ID  Name                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Type	Seq	Site	Unique	Infor	Invar	Const
   1  ES316508nucl_1stpos+ES321882nucl_1stpos+O1187nucl_1stpos+O5959nucl_1stpos+O8818nucl_1stpos+O10353nucl_1stpos+O10353nucl_2ndpos+O10567nucl_1stpos+O10821nucl_1stpos+O11092nucl_1stpos+O11094nucl_1stpos+O11635nucl_1stpos+O11722nucl_1stpos+O13507nucl_1stpos+O13706nucl_1stpos+O14581nucl_1stpos+O15665nucl_1stpos+O16052nucl_1stpos+O16400nucl_1stpos+O17424nucl_1stpos+O19102nucl_1stpos+O21222nucl_1stpos+O21944nucl_1stpos+O22156nucl_1stpos+O22172nucl_1stpos+O22441nucl_1stpos+O23857nucl_1stpos  DNA	92	4233	2366	748	2924	2924
   2  ES316508nucl_2ndpos+O6843nucl_2ndpos+O7569nucl_2ndpos+O8775nucl_2ndpos+O11722nucl_2ndpos+O13507nucl_2ndpos+O17252nucl_2ndpos+O18599nucl_2ndpos+O19102nucl_2ndpos+O21286nucl_2ndpos+O21944nucl_2ndpos                                                                                                                                                                                                                                                                                                    DNA	89	1801	758	130	1556	1556
   3  ES316508nucl_3rdpos+ES321806nucl_3rdpos+O540nucl_3rdpos+O988nucl_1stpos+O1187nucl_3rdpos+O8818nucl_3rdpos+O10295nucl_3rdpos+O10600nucl_3rdpos+O10821nucl_3rdpos+O10867nucl_3rdpos+O12201nucl_3rdpos+O13507nucl_3rdpos+O17157nucl_3rdpos+O17252nucl_3rdpos+O17764nucl_3rdpos+O23857nucl_3rdpos                                                                                                                                                                                                           DNA	108	2423	2395	2232	120	119
   4  ES316890nucl_1stpos+ES319935nucl_1stpos+ES321806nucl_1stpos+ES321879nucl_1stpos+O540nucl_1stpos+O6843nucl_1stpos+O8775nucl_1stpos+O10295nucl_1stpos+O10419nucl_1stpos+O13660nucl_1stpos+O19758nucl_1stpos                                                                                                                                                                                                                                                                                               DNA	110	1495	799	208	1135	1135
   5  ES316890nucl_2ndpos+ES319935nucl_2ndpos+ES321879nucl_2ndpos+ES321882nucl_2ndpos+O1187nucl_2ndpos+O5959nucl_2ndpos+O8818nucl_2ndpos+O10419nucl_2ndpos+O10567nucl_2ndpos+O10821nucl_2ndpos+O11092nucl_2ndpos+O11094nucl_2ndpos+O11240nucl_2ndpos+O11569nucl_2ndpos+O11635nucl_2ndpos+O13706nucl_2ndpos+O14581nucl_2ndpos+O16052nucl_2ndpos+O16400nucl_2ndpos+O17424nucl_2ndpos+O19189nucl_2ndpos+O21222nucl_2ndpos+O22156nucl_2ndpos+O22172nucl_2ndpos+O22441nucl_2ndpos+O23857nucl_2ndpos                DNA	105	3866	1992	293	2999	2999
   6  ES316890nucl_3rdpos+O5959nucl_3rdpos+O11635nucl_3rdpos+O11722nucl_3rdpos+O13706nucl_3rdpos+O14581nucl_3rdpos+O21944nucl_3rdpos+O22156nucl_3rdpos+O22441nucl_3rdpos                                                                                                                                                                                                                                                                                                                                      DNA	83	1353	1347	1262	58	58
   7  ES319935nucl_3rdpos+ES321879nucl_3rdpos+ES321882nucl_3rdpos+O6843nucl_3rdpos+O8775nucl_3rdpos+O10419nucl_3rdpos+O11240nucl_3rdpos+O12061nucl_3rdpos+O13660nucl_3rdpos+O18599nucl_3rdpos+O19102nucl_3rdpos+O21286nucl_3rdpos                                                                                                                                                                                                                                                                             DNA	104	1894	1862	1744	96	96
   8  ES321806nucl_2ndpos+O540nucl_2ndpos+O9874nucl_2ndpos+O10295nucl_2ndpos+O10867nucl_2ndpos+O13660nucl_2ndpos+O14147nucl_2ndpos+O15665nucl_2ndpos+O17157nucl_2ndpos+O17764nucl_2ndpos+O19758nucl_2ndpos+O21786nucl_2ndpos+O30441nucl_2ndpos                                                                                                                                                                                                                                                                DNA	107	2108	731	40	1955	1955
   9  O988nucl_2ndpos+O11240nucl_1stpos+O11569nucl_1stpos+O12061nucl_1stpos+O12201nucl_1stpos+O17252nucl_1stpos+O18599nucl_1stpos+O19058nucl_1stpos+O21286nucl_1stpos                                                                                                                                                                                                                                                                                                                                         DNA	89	1315	871	328	765	765
  10  O988nucl_3rdpos+O10353nucl_3rdpos+O12061nucl_2ndpos+O12201nucl_2ndpos+O19058nucl_2ndpos                                                                                                                                                                                                                                                                                                                                                                                                                 DNA	71	750	450	123	439	439
  11  O7569nucl_1stpos+O9874nucl_1stpos+O10867nucl_1stpos+O14147nucl_1stpos+O17157nucl_1stpos+O17764nucl_1stpos+O19189nucl_1stpos+O21786nucl_1stpos+O30441nucl_1stpos                                                                                                                                                                                                                                                                                                                                         DNA	65	1695	763	249	1351	1351
  12  O7569nucl_3rdpos+O11094nucl_3rdpos+O14147nucl_3rdpos+O15665nucl_3rdpos+O19058nucl_3rdpos+O19189nucl_3rdpos+O21222nucl_3rdpos                                                                                                                                                                                                                                                                                                                                                                            DNA	71	1232	1193	1032	103	102
  13  O9874nucl_3rdpos+O11092nucl_3rdpos+O11569nucl_3rdpos+O16400nucl_3rdpos+O17424nucl_3rdpos+O21786nucl_3rdpos+O22172nucl_3rdpos+O30441nucl_3rdpos                                                                                                                                                                                                                                                                                                                                                          DNA	81	1049	1025	876	96	96
  14  O10567nucl_3rdpos+O10600nucl_1stpos+O10600nucl_2ndpos                                                                                                                                                                                                                                                                                                                                                                                                                                                   DNA	69	319	243	146	117	117
  15  O16052nucl_3rdpos+O19758nucl_3rdpos                                                                                                                                                                                                                                                                                                                                                                                                                                                                     DNA	48	386	360	301	43	43
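A sketch of the proposed layout, with Name moved to the last column. This is Python for illustration only (the real report writer is C++), `format_partition_table` is a hypothetical helper, and the two rows are copied from partitions 14 and 15 above.

```python
COLUMNS = ["ID", "Type", "Seq", "Site", "Unique", "Infor", "Invar", "Const", "Name"]


def format_partition_table(rows):
    """Render partition rows as tab-separated text with Name as the final column."""
    lines = ["\t".join(COLUMNS)]
    for row in rows:
        lines.append("\t".join(str(row[column]) for column in COLUMNS))
    return "\n".join(lines)


rows = [
    {"ID": 14, "Name": "O10567nucl_3rdpos+O10600nucl_1stpos+O10600nucl_2ndpos",
     "Type": "DNA", "Seq": 69, "Site": 319, "Unique": 243,
     "Infor": 146, "Invar": 117, "Const": 117},
    {"ID": 15, "Name": "O16052nucl_3rdpos+O19758nucl_3rdpos",
     "Type": "DNA", "Seq": 48, "Site": 386, "Unique": 360,
     "Infor": 301, "Invar": 43, "Const": 43},
]
table = format_partition_table(rows)
print(table)
```

With the variable-width column at the end, the numeric columns line up regardless of how long the merged partition names become, and no padding is needed.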
