Hi, I got the following error with my dataset when I tried to pre-assign all 40 chromosomes to 2 subgenomes. Apparently, SubPhaser re-assigned almost all chromosomes to SG1 (in the log below, only scaffold30 and scaffold33 ended up in SG2). With fewer homologous chromosome assignments, SubPhaser completed successfully on the same genome, as you suggested in #7.
22-12-23 03:08:56 [INFO] Version: 1.2.5
22-12-23 03:08:56 [INFO] Arguments: {'genomes': ['/gfe_data/species_genome/Nepenthes_gracilis_male_HiC.fa.gz'], 'sg_cfgs': ['/gfe_data/species_subphaser_cfg/Nepenthes_gracilis_subphaser_cfg.txt'], 'labels': None, 'no_label': True, 'target': None, 'sg_assigned': None, 'sep': '|', 'custom_features': None, 'prefix': 'Nepenthes_gracilis.', 'outdir': 'Nepenthes_gracilis.subphaser', 'tmpdir': 'Nepenthes_gracilis.tmp', 'k': 15, 'min_fold': 2, 'min_freq': 200, 'baseline': 1, 'lower_count': 3, 'min_prop': None, 'max_freq': 1000000000.0, 'max_prop': None, 'low_mem': None, 'by_count': False, 're_filter': False, 'nsg': None, 'replicates': 1000, 'jackknife': 50, 'max_pval': 0.05, 'test_method': 'ttest_ind', 'figfmt': 'pdf', 'heatmap_colors': ('green', 'black', 'red'), 'heatmap_options': "Rowv=T,Colv=T,scale='col',dendrogram='row',labCol=F,trace='none',key=T,key.title=NA,density.info='density',main=NA,xlab='Differential kmers',margins=c(2.5,12)", 'just_core': False, 'disable_ltr': False, 'ltr_detectors': ['ltr_harvest'], 'ltr_finder_options': '-w 2 -D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.8', 'ltr_harvest_options': '-seqids yes -similar 80 -vic 10 -seed 20 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6', 'tesorter_options': '-db rexdb -dp2', 'all_ltr': False, 'intact_ltr': False, 'exclude_exchanges': False, 'non_specific': False, 'mu': 1.3e-08, 'disable_ltrtree': False, 'subsample': 1000, 'ltr_domains': ['INT', 'RT', 'RH'], 'trimal_options': '-automated1', 'tree_method': 'FastTree', 'tree_options': '', 'ggtree_options': "branch.length='none', layout='circular'", 'disable_circos': False, 'window_size': 1000000, 'disable_blocks': False, 'aligner': 'minimap2', 'aligner_options': '-x asm20 -n 10', 'min_block': 100000, 'alt_cfgs': None, 'chr_ordered': None, 'ncpu': 4, 'max_memory': '32', 'cleanup': False, 'overwrite': False}
22-12-23 03:08:56 [INFO] Target chromosomes: ['scaffold2', 'scaffold1', 'scaffold8', 'scaffold11', 'scaffold12', 'scaffold3', 'scaffold17', 'scaffold23', 'scaffold24', 'scaffold40', 'scaffold4', 'scaffold22', 'scaffold30', 'scaffold33', 'scaffold39', 'scaffold5', 'scaffold13', 'scaffold16', 'scaffold18', 'scaffold26', 'scaffold6', 'scaffold15', 'scaffold20', 'scaffold32', 'scaffold38', 'scaffold7', 'scaffold14', 'scaffold27', 'scaffold28', 'scaffold29', 'scaffold9', 'scaffold19', 'scaffold21', 'scaffold34', 'scaffold36', 'scaffold10', 'scaffold25', 'scaffold31', 'scaffold35', 'scaffold37']
22-12-23 03:08:56 [INFO] Splitting genomes by chromosome into `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.`
22-12-23 03:09:08 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.split.ok`
22-12-23 03:09:08 [INFO] Chromosomes: ['scaffold2', 'scaffold1', 'scaffold8', 'scaffold11', 'scaffold12', 'scaffold3', 'scaffold17', 'scaffold23', 'scaffold24', 'scaffold40', 'scaffold4', 'scaffold22', 'scaffold30', 'scaffold33', 'scaffold39', 'scaffold5', 'scaffold13', 'scaffold16', 'scaffold18', 'scaffold26', 'scaffold6', 'scaffold15', 'scaffold20', 'scaffold32', 'scaffold38', 'scaffold7', 'scaffold14', 'scaffold27', 'scaffold28', 'scaffold29', 'scaffold9', 'scaffold19', 'scaffold21', 'scaffold34', 'scaffold36', 'scaffold10', 'scaffold25', 'scaffold31', 'scaffold35', 'scaffold37']
22-12-23 03:09:08 [INFO] Chromosome Number: 40
22-12-23 03:09:08 [INFO] CONFIG: [[['scaffold2'], ['scaffold1', 'scaffold8', 'scaffold11', 'scaffold12']], [['scaffold3'], ['scaffold17', 'scaffold23', 'scaffold24', 'scaffold40']], [['scaffold4'], ['scaffold22', 'scaffold30', 'scaffold33', 'scaffold39']], [['scaffold5'], ['scaffold13', 'scaffold16', 'scaffold18', 'scaffold26']], [['scaffold6'], ['scaffold15', 'scaffold20', 'scaffold32', 'scaffold38']], [['scaffold7'], ['scaffold14', 'scaffold27', 'scaffold28', 'scaffold29']], [['scaffold9'], ['scaffold19', 'scaffold21', 'scaffold34', 'scaffold36']], [['scaffold10'], ['scaffold25', 'scaffold31', 'scaffold35', 'scaffold37']]]
22-12-23 03:09:08 [INFO] Genome size: 746,713,351 bp
22-12-23 03:09:08 [INFO] ###Step: Kmer Count
22-12-23 03:09:08 [INFO] Counting kmer by jellyfish
22-12-23 03:09:08 [INFO] Start Pool with 4 process(es)
.
.
.
22-12-23 03:13:26 [INFO] Bootstrap: mean Adjusted Rand-Index: 0.9428; mean V-measure score: 0.9295
22-12-23 03:13:26 [INFO] Subgenome assignments: OrderedDict([('scaffold2', 'SG1'), ('scaffold1', 'SG1'), ('scaffold8', 'SG1'), ('scaffold11', 'SG1'), ('scaffold12', 'SG1'), ('scaffold3', 'SG1'), ('scaffold17', 'SG1'), ('scaffold23', 'SG1'), ('scaffold24', 'SG1'), ('scaffold40', 'SG1'), ('scaffold4', 'SG1'), ('scaffold22', 'SG1'), ('scaffold30', 'SG2'), ('scaffold33', 'SG2'), ('scaffold39', 'SG1'), ('scaffold5', 'SG1'), ('scaffold13', 'SG1'), ('scaffold16', 'SG1'), ('scaffold18', 'SG1'), ('scaffold26', 'SG1'), ('scaffold6', 'SG1'), ('scaffold15', 'SG1'), ('scaffold20', 'SG1'), ('scaffold32', 'SG1'), ('scaffold38', 'SG1'), ('scaffold7', 'SG1'), ('scaffold14', 'SG1'), ('scaffold27', 'SG1'), ('scaffold28', 'SG1'), ('scaffold29', 'SG1'), ('scaffold9', 'SG1'), ('scaffold19', 'SG1'), ('scaffold21', 'SG1'), ('scaffold34', 'SG1'), ('scaffold36', 'SG1'), ('scaffold10', 'SG1'), ('scaffold25', 'SG1'), ('scaffold31', 'SG1'), ('scaffold35', 'SG1'), ('scaffold37', 'SG1')])
22-12-23 03:13:26 [INFO] Outputing `chromosome` - `subgenome` assignments to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.chrom-subgenome.tsv`
22-12-23 03:13:26 [INFO] Outputing significant differiential `kmer` - `subgenome` maps to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.sig.kmer-subgenome.tsv`
22-12-23 03:13:26 [INFO] Start Pool with 4 process(es)
22-12-23 03:13:26 [INFO] 9 significant subgenome-specific kmers
22-12-23 03:13:26 [INFO] 9 SG2-specific kmers
22-12-23 03:13:27 [INFO] run CMD: `Rscript /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.kmer.mat.R`
22-12-23 03:13:27 [INFO] Outputing PCA plot to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.kmer_pca.pdf`
22-12-23 03:13:28 [INFO] Outputing `coordinate` - `subgenome` maps to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.subgenome.bin.count`
22-12-23 03:13:28 [INFO] Start Pool with 4 process(es)
.
.
.
22-12-23 03:14:47 [INFO] Processed 94 sequences
22-12-23 03:14:47 [INFO] 92 (97.87%) sequences contain subgenome-specific kmers
22-12-23 03:14:47 [INFO] 100.00% of 9 subgenome-specific kmers are mapped
22-12-23 03:14:47 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.Nepenthes_gracilis.k15_q200_f2.subgenome.bin.count.ok`
22-12-23 03:14:47 [INFO] Enriching subgenome by chromosome window (size: 1000000)
22-12-23 03:14:47 [INFO] Start Pool with 4 process(es)
.
.
.
22-12-23 03:21:31 [INFO] finished with 0 commands uncompleted
22-12-23 03:21:32 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.scn.ok`
22-12-23 03:21:32 [INFO] 23051 LTRs identified
22-12-23 03:21:32 [INFO] Extracting inner sequences of LTRs to classify by `TEsorter`
22-12-23 03:21:32 [INFO] run CMD: `TEsorter /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.inner.fa -db rexdb -dp2 -p 4 -pre /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.inner.fa -tmp /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR &> /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.inner.fa.tesort.log`
22-12-23 03:39:13 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.tesort.ok`
22-12-23 03:39:13 [INFO] By TEsorter, 13396 (58.1%) are classified as LTRs, of which 5538 (41.3%) are intact with complete protein domains
22-12-23 03:39:13 [INFO] After filtering, 13202 / 23051 (57.3%) LTRs retained
22-12-23 03:39:13 [INFO] Outputing `coordinate` - `LTR` maps to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.ltr.bin.count`
22-12-23 03:39:13 [INFO] Start Pool with 4 process(es)
22-12-23 03:39:23 [INFO] Processed 13202 sequences
22-12-23 03:39:23 [INFO] 204 (1.55%) sequences contain subgenome-specific kmers
22-12-23 03:39:23 [INFO] 44.44% of 9 subgenome-specific kmers are mapped
22-12-23 03:39:25 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.Nepenthes_gracilis.k15_q200_f2.ltr.bin.count.ok`
22-12-23 03:39:25 [INFO] Enriching subgenome-specific LTR-RTs
22-12-23 03:39:25 [INFO] Start Pool with 4 process(es)
/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Stats.py:157: RuntimeWarning: invalid value encountered in divide
ratios = np.array(row) / np.array(total)
/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Stats.py:157: RuntimeWarning: invalid value encountered in divide
ratios = np.array(row) / np.array(total)
/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Stats.py:157: RuntimeWarning: invalid value encountered in divide
ratios = np.array(row) / np.array(total)
/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Stats.py:157: RuntimeWarning: invalid value encountered in divide
ratios = np.array(row) / np.array(total)
22-12-23 03:39:25 [INFO] Output: /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.ltr.enrich
22-12-23 03:39:25 [INFO] 0 significant subgenome-specific LTR-RTs
22-12-23 03:39:28 [INFO] Summary of overall LTR insertion age (million years):
/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/opt/conda/envs/biotools/bin/subphaser", line 33, in <module>
sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 784, in main
pipeline.run()
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 518, in run
ltr_bedlines, enrich_ltr_bedlines = self.step_ltr(d_kmers) if not self.disable_ltr else ([],[])
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 602, in step_ltr
enrich_ltrs = LTR.plot_insert_age(ltrs, d_enriched, prefix,
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/LTR.py", line 515, in plot_insert_age
d_info = summary_ltr_time(d_data, fout)
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/LTR.py", line 601, in summary_ltr_time
np.median(xages), abs(np.percentile(xages, 2.5)), np.percentile(xages, 97.5)))
File "<__array_function__ internals>", line 180, in percentile
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4166, in percentile
return _quantile_unchecked(
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked
r, k = _ureduce(a,
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce
r = func(a, **kwargs)
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func
result = _quantile(arr,
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4699, in _quantile
take(arr, indices=-1, axis=DATA_AXIS)
File "<__array_function__ internals>", line 180, in take
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 190, in take
return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
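
My reading of the traceback is that `xages` was empty when `summary_ltr_time` computed the percentiles, presumably because 0 significant subgenome-specific LTR-RTs were found, so there was nothing to summarize. Below is a minimal sketch of the kind of guard that would avoid the crash. It is not SubPhaser's code: `summarize_ages` and `percentile` are hypothetical helpers written with the standard library only, and the linear-interpolation percentile is an assumed stand-in for `np.percentile`.

```python
import math
import statistics


def percentile(sorted_vals, q):
    """Percentile (q in 0..100) of a non-empty sorted list, by linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize_ages(ages):
    """Return (mean, median, 2.5th pct, 97.5th pct), or None if there are no ages.

    Hypothetical helper: the early return is the guard that skips the
    summary instead of taking a percentile of an empty list.
    """
    vals = sorted(a for a in ages if not math.isnan(a))
    if not vals:
        return None
    return (
        statistics.fmean(vals),
        statistics.median(vals),
        percentile(vals, 2.5),
        percentile(vals, 97.5),
    )


if __name__ == "__main__":
    for label, ages in [("all", [0.5, 1.2, 3.4]), ("enriched", [])]:
        summary = summarize_ages(ages)
        if summary is None:
            print(f"{label}: no LTR-RTs to summarize, skipped")
        else:
            print(f"{label}: mean={summary[0]:.3f} median={summary[1]:.3f}")
```

With a check like this, an empty group would be reported and skipped rather than raising `IndexError`.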