
zhangrengang / subphaser


Phase, partition and visualize subgenomes of a neoallopolyploid or hybrid based on the subgenome-specific repetitive kmers.

Home Page: https://doi.org/10.1111/nph.18173

License: GNU General Public License v3.0

Python 99.03% Perl 0.17% Shell 0.80%
allopolyploid subgenome kmer partition phasing exchange

subphaser's Issues

The output subgenomes are not paired

Hi~,
I used this software to analyze subgenomes, providing pairs of homologous chromosomes as input, but the output assigned 11 chromosomes to one subgenome and 13 to the other. Is this result correct? How should I interpret it?

Looking forward to your reply!
Hang
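To see exactly how unbalanced an assignment is, you can tally the `*.chrom-subgenome.tsv` file that SubPhaser writes, counting chromosomes per subgenome. Below is a minimal sketch. It assumes the first two tab-separated columns are the chromosome ID and the subgenome label; that column layout and the helper name `count_per_subgenome` are assumptions, not taken from SubPhaser's documentation.

```python
import csv
import io
from collections import Counter


def count_per_subgenome(tsv_text):
    """Tally chromosomes per subgenome label.

    Assumes (hypothetically) column 1 = chromosome ID and column 2 =
    subgenome label, tab-separated. Rows with fewer than two columns and
    comment lines starting with '#' are skipped.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        counts[row[1]] += 1
    return counts
```

With one member of each homologous pair expected per subgenome, a balanced result would give the same count for every label. A split such as 11 versus 13 shows up immediately.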

Division by zero when trying to build trees?

Hi, when running SubPhaser I get the following error:

Traceback (most recent call last):
  File "/home/531734/.conda/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 779, in main
    pipeline.run()
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 516, in run
    ltr_bedlines, enrich_ltr_bedlines = self.step_ltr(d_kmers) if not self.disable_ltr else ([],[])
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 615, in step_ltr
    d_files = tree.build(job_args=job_args)
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/LTR.py", line 210, in build
    ncpus = [max(1, int(self.ncpu*v/tprop)) for v in prop]
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/LTR.py", line 210, in <listcomp>
    ncpus = [max(1, int(self.ncpu*v/tprop)) for v in prop]
ZeroDivisionError: division by zero

I believe this could be because only one scaffold was identified as a subgenome. Does this sound possible?

Many thanks

Mike
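The failing line divides by `tprop`, so the error means that value was zero when `build` ran. A defensive variant of the same allocation might look like the sketch below. It assumes `tprop` is the sum of `prop`, since the surrounding code is not shown in the traceback, and `allocate_cpus` is a hypothetical name.

```python
def allocate_cpus(ncpu, prop):
    """Split `ncpu` CPUs across jobs in proportion to the weights in `prop`.

    Same expression as in the traceback, max(1, int(ncpu * v / tprop)),
    but with tprop assumed to be sum(prop) and a fallback of one CPU per
    job when every weight is zero, instead of raising ZeroDivisionError.
    """
    tprop = sum(prop)
    if tprop <= 0:
        # Nothing to weight by (e.g. every group is empty).
        return [1 for _ in prop]
    return [max(1, int(ncpu * v / tprop)) for v in prop]
```

Stopping the tree step with a clearer message when all weights are zero would be an equally valid design. The fallback here simply keeps each job at one CPU.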

IndexError: cannot do a non-empty take from an empty axes.

Hi, I got the following error with my dataset when I tried to pre-assign all 40 chromosomes to 2 subgenomes. Apparently, SubPhaser re-assigned almost all chromosomes to SG1. When I pre-assigned fewer homologous chromosomes, as you suggested in #7, SubPhaser completed successfully on the same genome.

22-12-23 03:08:56 [INFO] Version: 1.2.5
22-12-23 03:08:56 [INFO] Arguments: {'genomes': ['/gfe_data/species_genome/Nepenthes_gracilis_male_HiC.fa.gz'], 'sg_cfgs': ['/gfe_data/species_subphaser_cfg/Nepenthes_gracilis_subphaser_cfg.txt'], 'labels': None, 'no_label': True, 'target': None, 'sg_assigned': None, 'sep': '|', 'custom_features': None, 'prefix': 'Nepenthes_gracilis.', 'outdir': 'Nepenthes_gracilis.subphaser', 'tmpdir': 'Nepenthes_gracilis.tmp', 'k': 15, 'min_fold': 2, 'min_freq': 200, 'baseline': 1, 'lower_count': 3, 'min_prop': None, 'max_freq': 1000000000.0, 'max_prop': None, 'low_mem': None, 'by_count': False, 're_filter': False, 'nsg': None, 'replicates': 1000, 'jackknife': 50, 'max_pval': 0.05, 'test_method': 'ttest_ind', 'figfmt': 'pdf', 'heatmap_colors': ('green', 'black', 'red'), 'heatmap_options': "Rowv=T,Colv=T,scale='col',dendrogram='row',labCol=F,trace='none',key=T,key.title=NA,density.info='density',main=NA,xlab='Differential kmers',margins=c(2.5,12)", 'just_core': False, 'disable_ltr': False, 'ltr_detectors': ['ltr_harvest'], 'ltr_finder_options': '-w 2 -D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.8', 'ltr_harvest_options': '-seqids yes -similar 80 -vic 10 -seed 20 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6', 'tesorter_options': '-db rexdb -dp2', 'all_ltr': False, 'intact_ltr': False, 'exclude_exchanges': False, 'non_specific': False, 'mu': 1.3e-08, 'disable_ltrtree': False, 'subsample': 1000, 'ltr_domains': ['INT', 'RT', 'RH'], 'trimal_options': '-automated1', 'tree_method': 'FastTree', 'tree_options': '', 'ggtree_options': "branch.length='none', layout='circular'", 'disable_circos': False, 'window_size': 1000000, 'disable_blocks': False, 'aligner': 'minimap2', 'aligner_options': '-x asm20 -n 10', 'min_block': 100000, 'alt_cfgs': None, 'chr_ordered': None, 'ncpu': 4, 'max_memory': '32', 'cleanup': False, 'overwrite': False}
22-12-23 03:08:56 [INFO] Target chromosomes: ['scaffold2', 'scaffold1', 'scaffold8', 'scaffold11', 'scaffold12', 'scaffold3', 'scaffold17', 'scaffold23', 'scaffold24', 'scaffold40', 'scaffold4', 'scaffold22', 'scaffold30', 'scaffold33', 'scaffold39', 'scaffold5', 'scaffold13', 'scaffold16', 'scaffold18', 'scaffold26', 'scaffold6', 'scaffold15', 'scaffold20', 'scaffold32', 'scaffold38', 'scaffold7', 'scaffold14', 'scaffold27', 'scaffold28', 'scaffold29', 'scaffold9', 'scaffold19', 'scaffold21', 'scaffold34', 'scaffold36', 'scaffold10', 'scaffold25', 'scaffold31', 'scaffold35', 'scaffold37']
22-12-23 03:08:56 [INFO] Splitting genomes by chromosome into `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.`
22-12-23 03:09:08 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.split.ok`
22-12-23 03:09:08 [INFO] Chromosomes: ['scaffold2', 'scaffold1', 'scaffold8', 'scaffold11', 'scaffold12', 'scaffold3', 'scaffold17', 'scaffold23', 'scaffold24', 'scaffold40', 'scaffold4', 'scaffold22', 'scaffold30', 'scaffold33', 'scaffold39', 'scaffold5', 'scaffold13', 'scaffold16', 'scaffold18', 'scaffold26', 'scaffold6', 'scaffold15', 'scaffold20', 'scaffold32', 'scaffold38', 'scaffold7', 'scaffold14', 'scaffold27', 'scaffold28', 'scaffold29', 'scaffold9', 'scaffold19', 'scaffold21', 'scaffold34', 'scaffold36', 'scaffold10', 'scaffold25', 'scaffold31', 'scaffold35', 'scaffold37']
22-12-23 03:09:08 [INFO] Chromosome Number: 40
22-12-23 03:09:08 [INFO] CONFIG: [[['scaffold2'], ['scaffold1', 'scaffold8', 'scaffold11', 'scaffold12']], [['scaffold3'], ['scaffold17', 'scaffold23', 'scaffold24', 'scaffold40']], [['scaffold4'], ['scaffold22', 'scaffold30', 'scaffold33', 'scaffold39']], [['scaffold5'], ['scaffold13', 'scaffold16', 'scaffold18', 'scaffold26']], [['scaffold6'], ['scaffold15', 'scaffold20', 'scaffold32', 'scaffold38']], [['scaffold7'], ['scaffold14', 'scaffold27', 'scaffold28', 'scaffold29']], [['scaffold9'], ['scaffold19', 'scaffold21', 'scaffold34', 'scaffold36']], [['scaffold10'], ['scaffold25', 'scaffold31', 'scaffold35', 'scaffold37']]]
22-12-23 03:09:08 [INFO] Genome size: 746,713,351 bp
22-12-23 03:09:08 [INFO] ###Step: Kmer Count
22-12-23 03:09:08 [INFO] Counting kmer by jellyfish
22-12-23 03:09:08 [INFO] Start Pool with 4 process(es)
.
.
.
22-12-23 03:13:26 [INFO] Bootstrap: mean Adjusted Rand-Index: 0.9428; mean V-measure score: 0.9295
22-12-23 03:13:26 [INFO] Subgenome assignments: OrderedDict([('scaffold2', 'SG1'), ('scaffold1', 'SG1'), ('scaffold8', 'SG1'), ('scaffold11', 'SG1'), ('scaffold12', 'SG1'), ('scaffold3', 'SG1'), ('scaffold17', 'SG1'), ('scaffold23', 'SG1'), ('scaffold24', 'SG1'), ('scaffold40', 'SG1'), ('scaffold4', 'SG1'), ('scaffold22', 'SG1'), ('scaffold30', 'SG2'), ('scaffold33', 'SG2'), ('scaffold39', 'SG1'), ('scaffold5', 'SG1'), ('scaffold13', 'SG1'), ('scaffold16', 'SG1'), ('scaffold18', 'SG1'), ('scaffold26', 'SG1'), ('scaffold6', 'SG1'), ('scaffold15', 'SG1'), ('scaffold20', 'SG1'), ('scaffold32', 'SG1'), ('scaffold38', 'SG1'), ('scaffold7', 'SG1'), ('scaffold14', 'SG1'), ('scaffold27', 'SG1'), ('scaffold28', 'SG1'), ('scaffold29', 'SG1'), ('scaffold9', 'SG1'), ('scaffold19', 'SG1'), ('scaffold21', 'SG1'), ('scaffold34', 'SG1'), ('scaffold36', 'SG1'), ('scaffold10', 'SG1'), ('scaffold25', 'SG1'), ('scaffold31', 'SG1'), ('scaffold35', 'SG1'), ('scaffold37', 'SG1')])
22-12-23 03:13:26 [INFO] Outputing `chromosome` - `subgenome` assignments to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.chrom-subgenome.tsv`
22-12-23 03:13:26 [INFO] Outputing significant differiential `kmer` - `subgenome` maps to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.sig.kmer-subgenome.tsv`
22-12-23 03:13:26 [INFO] Start Pool with 4 process(es)
22-12-23 03:13:26 [INFO] 9 significant subgenome-specific kmers
22-12-23 03:13:26 [INFO] 	9 SG2-specific kmers
22-12-23 03:13:27 [INFO] run CMD: `Rscript /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.kmer.mat.R`
22-12-23 03:13:27 [INFO] Outputing PCA plot to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.kmer_pca.pdf`
22-12-23 03:13:28 [INFO] Outputing `coordinate` - `subgenome` maps to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.subgenome.bin.count`
22-12-23 03:13:28 [INFO] Start Pool with 4 process(es)
.
.
.
22-12-23 03:14:47 [INFO] Processed 94 sequences
22-12-23 03:14:47 [INFO] 92 (97.87%) sequences contain subgenome-specific kmers
22-12-23 03:14:47 [INFO] 100.00% of 9 subgenome-specific kmers are mapped
22-12-23 03:14:47 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.Nepenthes_gracilis.k15_q200_f2.subgenome.bin.count.ok`
22-12-23 03:14:47 [INFO] Enriching subgenome by chromosome window (size: 1000000)
22-12-23 03:14:47 [INFO] Start Pool with 4 process(es)
.
.
.
22-12-23 03:21:31 [INFO] finished with 0 commands uncompleted
22-12-23 03:21:32 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.scn.ok`
22-12-23 03:21:32 [INFO] 23051 LTRs identified
22-12-23 03:21:32 [INFO] Extracting inner sequences of LTRs to classify by `TEsorter`
22-12-23 03:21:32 [INFO] run CMD: `TEsorter /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.inner.fa -db rexdb -dp2 -p 4 -pre /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.inner.fa -tmp /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR &> /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.inner.fa.tesort.log`
22-12-23 03:39:13 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.LTR.tesort.ok`
22-12-23 03:39:13 [INFO] By TEsorter, 13396 (58.1%) are classified as LTRs, of which 5538 (41.3%) are intact with complete protein domains
22-12-23 03:39:13 [INFO] After filtering, 13202 / 23051 (57.3%) LTRs retained
22-12-23 03:39:13 [INFO] Outputing `coordinate` - `LTR` maps to `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.ltr.bin.count`
22-12-23 03:39:13 [INFO] Start Pool with 4 process(es)
22-12-23 03:39:23 [INFO] Processed 13202 sequences
22-12-23 03:39:23 [INFO] 204 (1.55%) sequences contain subgenome-specific kmers
22-12-23 03:39:23 [INFO] 44.44% of 9 subgenome-specific kmers are mapped
22-12-23 03:39:25 [INFO] New check point file: `/gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.tmp/Nepenthes_gracilis.Nepenthes_gracilis.k15_q200_f2.ltr.bin.count.ok`
22-12-23 03:39:25 [INFO] Enriching subgenome-specific LTR-RTs
22-12-23 03:39:25 [INFO] Start Pool with 4 process(es)
/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Stats.py:157: RuntimeWarning: invalid value encountered in divide
  ratios = np.array(row) / np.array(total)
/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Stats.py:157: RuntimeWarning: invalid value encountered in divide
  ratios = np.array(row) / np.array(total)
/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Stats.py:157: RuntimeWarning: invalid value encountered in divide
  ratios = np.array(row) / np.array(total)
/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Stats.py:157: RuntimeWarning: invalid value encountered in divide
  ratios = np.array(row) / np.array(total)
22-12-23 03:39:25 [INFO] Output: /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.ltr.enrich
22-12-23 03:39:25 [INFO] 0 significant subgenome-specific LTR-RTs
22-12-23 03:39:28 [INFO] Summary of overall LTR insertion age (million years):
/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/opt/conda/envs/biotools/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 784, in main
    pipeline.run()
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 518, in run
    ltr_bedlines, enrich_ltr_bedlines = self.step_ltr(d_kmers) if not self.disable_ltr else ([],[])
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 602, in step_ltr
    enrich_ltrs = LTR.plot_insert_age(ltrs, d_enriched, prefix, 
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/LTR.py", line 515, in plot_insert_age
    d_info = summary_ltr_time(d_data, fout)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/LTR.py", line 601, in summary_ltr_time
    np.median(xages), abs(np.percentile(xages, 2.5)), np.percentile(xages, 97.5)))
  File "<__array_function__ internals>", line 180, in percentile
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4166, in percentile
    return _quantile_unchecked(
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked
    r, k = _ureduce(a,
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce
    r = func(a, **kwargs)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func
    result = _quantile(arr,
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4699, in _quantile
    take(arr, indices=-1, axis=DATA_AXIS)
  File "<__array_function__ internals>", line 180, in take
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 190, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
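Read together, the log and the traceback appear to point to empty inputs:

- the enrichment step reports 0 significant subgenome-specific LTR-RTs;
- `Stats.py` warns about 0/0 in `np.array(row) / np.array(total)`;
- `np.percentile` finally fails on an empty array of insertion ages.

Below is a minimal sketch of guards for both operations, using NumPy as the traceback does. `safe_ratios` and `summarize_ages` are hypothetical helpers, not SubPhaser functions.

```python
import numpy as np


def safe_ratios(row, total):
    """Element-wise row / total that yields 0.0 (not NaN) where total == 0."""
    row = np.asarray(row, dtype=float)
    total = np.asarray(total, dtype=float)
    out = np.zeros(np.broadcast(row, total).shape)
    # Only divide where the denominator is non-zero; other cells keep 0.0.
    np.divide(row, total, out=out, where=total != 0)
    return out


def summarize_ages(ages):
    """Return (mean, median, 2.5th, 97.5th percentile), or None for no data."""
    xages = np.asarray(ages, dtype=float)
    if xages.size == 0:
        # On the NumPy versions in these tracebacks, np.percentile on an
        # empty array raises IndexError, so report "no data" instead.
        return None
    return (
        float(np.mean(xages)),
        float(np.median(xages)),
        float(np.percentile(xages, 2.5)),
        float(np.percentile(xages, 97.5)),
    )
```

Returning `None` lets the caller skip the age summary and plot for a group that has no LTRs, rather than aborting the whole run.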

Error in os.link(figfile, dstfig)

I came across a seemingly rare error when I tried to run SubPhaser installed in a Singularity container on macOS using Vagrant. This is not a big issue for me, because it did not happen when I used the same container in my main environment on a Linux server, but I would like to report it here.

I found a similar problem on hard links reported elsewhere:
https://neurostars.org/t/qsiprep-raw-src-qc-os-link-self-inputs-src-file-linked-src-file-permissionerror-errno-1-operation-not-permitted/19076

(Please note that I used Arabidopsis thaliana as input only for a testing purpose)

22-12-22 12:11:09 [INFO] ###Step: Circos
22-12-22 12:11:09 [INFO] Limit memory 4.6G per process with total memory 1.2
22-12-22 12:11:09 [INFO] Using 1 processes to align chromosome sequences
22-12-22 12:11:09 [INFO] Check point file: `/gfe_data/tmp/1_Arabidopsis_thaliana/Arabidopsis_thaliana.tmp/Arabidopsis_thaliana.Blocks/Chr1-Chr2.paf.ok` exists; skip this step
22-12-22 12:11:09 [INFO] Start Pool with 1 process(es)
22-12-22 12:11:10 [INFO] Copy `/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/circos` to `/gfe_data/tmp/1_Arabidopsis_thaliana/Arabidopsis_thaliana.subphaser/`
using cutoff: upper 43576.5 for SG1
using cutoff: upper 694.5 for SG2
22-12-22 12:11:12 [INFO] run CMD: `cd /gfe_data/tmp/1_Arabidopsis_thaliana/Arabidopsis_thaliana.subphaser/Arabidopsis_thaliana.k15_q200_f2.circos && circos -conf ./circos.conf`
Traceback (most recent call last):
  File "/opt/conda/envs/biotools/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 784, in main
    pipeline.run()
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 524, in run
    self.step_circos(
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 684, in step_circos
    Circos.circos_plot(self.chromfiles, wkdir, *args, **kargs)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/Circos.py", line 515, in circos_plot
    os.link(figfile, dstfig)
PermissionError: [Errno 1] Operation not permitted: '/gfe_data/tmp/1_Arabidopsis_thaliana/Arabidopsis_thaliana.subphaser/Arabidopsis_thaliana.k15_q200_f2.circos/circos.png' -> '/gfe_data/tmp/1_Arabidopsis_thaliana/Arabidopsis_thaliana.subphaser/Arabidopsis_thaliana.k15_q200_f2.circos.png'
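The failing call is `os.link(figfile, dstfig)` in `Circos.py`. Some filesystems refuse hard links, for example certain folders shared into a virtual machine, which would be consistent with the Vagrant setup and the linked report. One option is to fall back to copying the figure. Below is a sketch with a hypothetical `link_or_copy` helper; it is not SubPhaser's code.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst, falling back to a copy if linking is refused."""
    if os.path.lexists(dst):
        # os.link fails if dst already exists, so clear it first.
        os.remove(dst)
    try:
        os.link(src, dst)
    except OSError:
        # PermissionError (errno 1, EPERM) is a subclass of OSError; copy2
        # also preserves timestamps.
        shutil.copy2(src, dst)
```

A copy costs extra disk space for one image file, which is negligible next to the rest of the pipeline's output.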

Only one pair of homologous chromosomes were not phased

Hi~
SubPhaser is a great piece of software, but I have run into some problems when using it to phase my diploid genome.

After Hi-C scaffolding, I obtained 22 superscaffolds, and I want to divide them into 2 sets (2n=2x=22).
[image]
So I ran SubPhaser (-k 17 -q 50 -f 1.5): 20 superscaffolds were phased, and only 1 pair of homologous scaffolds (scaffold_9 and scaffold_10) was not phased.
[image]
k17_q50_f1.5.kmer_freq.pdf
k17_q50_f1.5.kmer_pca.pdf
k17_q50_f1.5.ltr.insert.density.pdf

How can I solve this problem?
Looking forward to your reply!
Yang

ValueError: n_components=3 must be between 0 and min

Dear author,
Thanks for your useful pipeline, but I encountered an error when using SubPhaser.
My command:
subphaser -i groups_genome.fasta -c groups_sg.config
My config file:
[image]
But I got this error:
[image]
What am I doing wrong?
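The screenshots are not available, but the title appears to be the check that scikit-learn's PCA performs: `n_components` may not exceed `min(n_samples, n_features)`. Requesting 3 components would therefore fail if the matrix being reduced has fewer than 3 rows or columns, for example very few chromosomes or very few differential kmers. Whether SubPhaser uses scikit-learn at this step is an assumption.

The NumPy sketch below clamps the component count instead of failing. `pca_scores` is a hypothetical name.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Project the rows of X onto their leading principal components via SVD.

    n_components is clamped to min(n_samples, n_features), the bound that
    scikit-learn's PCA enforces, so small inputs do not raise.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    k = max(0, min(n_components, n_samples, n_features))
    Xc = X - X.mean(axis=0)  # center each feature (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]  # sample scores on the first k components
```

Clamping keeps a plot step running, but a matrix with only 2 rows or columns usually means too few inputs survived filtering. That is worth checking in the log first.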

Singularity container fails if environmental variable `R_LIBS_USER` is set

Hi!

I was able to finish the pipeline with the Singularity version, but I had to reset the path to the R libraries manually (I have a custom R library path set in my .bashrc). The following was sufficient:

export R_LIBS_USER=/share/home/app/bin/miniconda3/envs/SubPhaser/lib/R/library/

It may be worth adding that variable to the container recipe.

Also, the mafft stage fails if $TMPDIR points to a path other than /tmp (it was /scratch in my case), so I had to specify the bind path manually when running the container. (Could this be fixed by adding $TMPDIR to the default bind paths or to the SINGULARITY_BIND variable?)

Thanks again!
Nikita
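Both workarounds can be expressed as host-side environment settings applied before launching the container. Below is a minimal sketch. It assumes Singularity's conventions that `SINGULARITYENV_<NAME>` sets `<NAME>` inside the container, overriding the inherited host value, and that `SINGULARITY_BIND` takes a comma-separated list of paths. Apptainer uses the `APPTAINERENV_`/`APPTAINER_BIND` equivalents. `container_env` is a hypothetical helper.

```python
def container_env(base_env, r_libs, extra_binds=()):
    """Return a copy of base_env adjusted for running SubPhaser in Singularity.

    - SINGULARITYENV_R_LIBS_USER is set so that, inside the container,
      R_LIBS_USER points at r_libs instead of the host's custom path.
    - $TMPDIR (plus any extra paths) is appended to SINGULARITY_BIND so
      tools such as mafft can reach their temporary directory.
    """
    env = dict(base_env)  # never mutate the caller's environment
    env["SINGULARITYENV_R_LIBS_USER"] = r_libs
    binds = [b for b in env.get("SINGULARITY_BIND", "").split(",") if b]
    tmpdir = env.get("TMPDIR")
    for path in ([tmpdir] if tmpdir else []) + list(extra_binds):
        if path not in binds:
            binds.append(path)
    if binds:
        env["SINGULARITY_BIND"] = ",".join(binds)
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` when invoking `singularity exec`. The host's own `R_LIBS_USER` is left untouched.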

Can't install SubPhaser: Found conflicts! Looking for incompatible packages.

Dear Dr Zhang
Thanks for developing this useful tool.
Unfortunately, I got stuck at the installation step of SubPhaser.
When I run
conda env create -f SubPhaser.yaml
the conda environment cannot be set up, and I get errors like:

Collecting package metadata (repodata.json): done
Solving environment: Found conflicts! Looking for incompatible packages.
...
The following specifications were found to be incompatible with your system:

  • feature:/linux-64::__glibc==2.17=0
  • feature:|@/linux-64::__glibc==2.17=0
  • biopython==1.79=py38h497a2fe_0 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
  • blast==2.11.0=pl526he19e7b1_0 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
    ...
Your installed version is: 2.17

Looking forward to your responses.

Is there a limitation on chromosome counts?

Hi Rengang,

Thanks for your pipeline! SubPhaser is very useful for our project.

I wonder whether SubPhaser has a limit on the number of chromosomes, because my species has a very large chromosome number.

best,

Cheng

Invalid specifier: '>=3.6:'

With Python 3.9, I got the following error from python setup.py install when installing the latest SubPhaser.

> python setup.py install
error in subphaser setup command: 'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6:'

The installation worked well after replacing python_requires='>=3.6:' with python_requires='>=3.6' in setup.py.

Installation problem

When I use conda to install this software, it prints the following message:
Grid computing is not available because DRMAA not configured properly: Could not find drmaa library.
How can I solve this problem?
Many thanks.

ValueError: 0 kmer with fold > 2. Please reset the filter options.

When I analyze using the default parameters, the following error occurs. What should I set the kmer parameters to?

23-12-30 15:24:20 [INFO] After filtering, remained 0 (0.00%) differential (freq >= 200) and 0 (0.00%) candidate (freq > 0) kmers
Traceback (most recent call last):
  File "/home/zuozd/miniconda3/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
  File "/home/zuozd/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 797, in main
    pipeline.run()
  File "/home/zuozd/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 422, in run
    d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets,
  File "/home/zuozd/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Jellyfish.py", line 502, in filter
    raise ValueError('0 kmer with fold > {}. Please reset the filter options.'.format(min_fold))
ValueError: 0 kmer with fold > 2. Please reset the filter options.

THANK YOU!
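The error is raised by the fold filter. Its thresholds come from `-f` (min_fold) and `-q` (min_freq), as the `Arguments` log in an earlier issue shows ('min_fold': 2, 'min_freq': 200).

The simplified sketch below shows how such a filter can discard every kmer. A kmer is kept only if:

- its total count reaches `min_freq`, and
- its highest per-group count is at least `min_fold` times the second highest.

This illustrates the idea only; it is not SubPhaser's implementation. SubPhaser's version also appears to account for chromosome lengths, since `lengths` is passed to `dumps.filter`. `differential_kmers` is a hypothetical name.

```python
def differential_kmers(counts, min_freq=200, min_fold=2.0):
    """Return {kmer: fold} for kmers enriched in one group.

    counts: dict mapping kmer -> list of counts, one per homologous group.
    A kmer passes if sum(counts) >= min_freq and the top count is at
    least min_fold times the runner-up (infinite fold if the runner-up is 0).
    """
    kept = {}
    for kmer, vals in counts.items():
        if len(vals) < 2 or sum(vals) < min_freq:
            continue
        top, second = sorted(vals, reverse=True)[:2]
        fold = top / second if second > 0 else float("inf")
        if fold >= min_fold:
            kept[kmer] = fold
    return kept
```

In this sketch, lowering `min_fold` or `min_freq` is the only way to let more kmers through. That matches the error message's request to reset the filter options.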

No differential kmers

Hi,

I am trying to use SubPhaser to phase the subgenomes of my species. However, no differential kmers remained after filtering. I used the parameters -k 15 -q 50 -f 2, and I get the same result even when I keep reducing -k and -q.

23-12-27 17:32:21 [INFO] 125035 kmers in total
23-12-27 17:32:21 [INFO] Filtering differential kmers
23-12-27 17:32:22 [INFO] Start Pool with 112 process(es)
23-12-27 17:32:25 [INFO] After filtering, remained 0 (0.00%) differential (freq >= 25) and 0 (0.00%) candidate (freq > 0) kmers

Do you have any suggestions?

thanks,
Chen

Triploid genome

[image]
Hello, Professor,
My triploid genome was partitioned like this, with 17 chromosomes per set. Can this still be improved?

Too few markers

Hi,

I am trying to use SubPhaser to phase the subgenomes of my species. The parental species are unknown, and I built a fairly good chromosome-level assembly with 99% of BUSCOs complete. I named the subgenomes after a synteny analysis with a closely related species, Sorghum bicolor. When I ran SubPhaser, I managed to phase the subgenomes, but there seem to be very few kmer markers, and no LTRs were found, far fewer than in your example files. I used the parameters -k 13 -q 100 -f 2 -disable_ltr.
k13_q100_f2.0.circos.pdf
Is this result trustworthy? What could be the reason?

thanks,
Cui

Suggestion for specific settings to improve subphasing

Hi
Thanks a lot for this awesome tool!

I am trying to phase an allopentaploid genome that we expect to have 4 subgenomes. Although the clustering works very well, I am having trouble adjusting the settings so that the four subgenomes are identified correctly. SubPhaser normally identifies 3 subgenomes, but if I set -nsg 4, it does not correctly identify the 4th subgenome suggested by the clustering; instead, it wrongly splits one subgenome. Please see below.

Using -nsg 3:
[image]

Using -nsg 4:
[image]

Using only the chromosomes from S1/2 and s3, i.e. the two subgenomes that should be split:
[image]

Ideally, I would like to have all 4 subgenomes correctly identified and split in one run. Any suggestions are welcome!
Best
André

Unbalanced chromosome numbers and differential kmer numbers among subgenomes

Hi~,
I have run into a whole new set of problems:
[image]

There are abnormally few subgenome-specific kmers in one of the subgenomes, and the numbers of chromosomes assigned to each subgenome are abnormally unbalanced. I have also tried -k (8, 13, 15, 17, 22, 27, 33, 37, 45, 50), -q (10, 200, 600, 1000) and -f (1.5, 2), but none of these solved the problem. Do you have any suggestions?
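A back-of-the-envelope check can help when choosing `-k`. In a uniformly random sequence of length G, any given kmer is expected to occur about (G - k + 1) / 4^k times by chance.

For a genome of roughly 746.7 Mb (the size reported in an earlier log), that works out to:

- about 11,400 chance occurrences per kmer at k = 8;
- about 11 at k = 13;
- about 0.7 at k = 15.

Very short kmers are therefore likely to be shared widely by chance. The sketch ignores strand, canonical kmers and real sequence composition, so treat it only as an order-of-magnitude guide. `expected_random_occurrences` is a hypothetical helper.

```python
def expected_random_occurrences(genome_size, k):
    """Expected count of one specific kmer in a uniformly random sequence.

    There are genome_size - k + 1 kmer positions, and each matches a given
    kmer with probability 1 / 4**k (four equiprobable bases, one strand).
    """
    return (genome_size - k + 1) / 4 ** k


if __name__ == "__main__":
    for k in (8, 13, 15, 17):
        print(k, round(expected_random_occurrences(746_713_351, k), 3))
```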

ModuleNotFoundError: No module named 'TEsorter'

Hi, I got a module error (No. 1) when I ran subphaser.
To fix it, I installed TEsorter using both the new and the old-school methods, but subphaser still can't find that module.
When I run from TEsorter.app import CommonClassifications in python3, I get an ImportError (No. 2).
Could you suggest a solution?

Thank you !
Jung

(No. 1)
(SubPhaser)$ subphaser

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/subphaser-1.2.6-py3.8.egg/subphaser/LTR.py", line 9, in <module>
    from TEsorter.app import CommonClassifications
ModuleNotFoundError: No module named 'TEsorter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/subphaser", line 11, in <module>
    load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 490, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2854, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2445, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2451, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.8/dist-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 15, in <module>
    from . import LTR
  File "/usr/local/lib/python3.8/dist-packages/subphaser-1.2.6-py3.8.egg/subphaser/LTR.py", line 11, in <module>
    from .api.TEsorter.app import CommonClassifications
  File "/usr/local/lib/python3.8/dist-packages/subphaser-1.2.6-py3.8.egg/subphaser/api/TEsorter/app.py", line 37, in <module>
    from .modules.get_record import get_records
  File "/usr/local/lib/python3.8/dist-packages/subphaser-1.2.6-py3.8.egg/subphaser/api/TEsorter/modules/get_record.py", line 6, in <module>
    from TEsorter.modules.small_tools import open_file as open
ModuleNotFoundError: No module named 'TEsorter'

(No. 2)

>>> from TEsorter.app import CommonClassifications
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'CommonClassifications' from 'TEsorter.app' (/home/super/miniconda3/envs/SubPhaser/lib/python3.10/site-packages/TEsorter/app.py)
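The two tracebacks appear to come from different interpreters:

- (No. 1) runs `subphaser` from `/usr/local/lib/python3.8/dist-packages`;
- (No. 2) imports TEsorter from the conda environment's `python3.10`.

Below is a quick diagnostic that can be run with each interpreter. `check_module` is a hypothetical helper built on the standard `importlib` module.

```python
import importlib
import importlib.util


def check_module(package="TEsorter", submodule="TEsorter.app",
                 attr="CommonClassifications"):
    """Explain whether `from <submodule> import <attr>` works in this Python."""
    if importlib.util.find_spec(package) is None:
        return "not installed: {} is not importable by this Python".format(package)
    try:
        mod = importlib.import_module(submodule)
    except ImportError as exc:
        return "import failed: {} ({})".format(submodule, exc)
    if not hasattr(mod, attr):
        # Installed, but this version does not define the expected name.
        return "missing attribute: {} has no {}".format(submodule, attr)
    return "ok: {}.{} found in {}".format(submodule, attr, mod.__file__)
```

Running `print(check_module())` with the interpreter behind `/usr/local/bin/subphaser` and again with the conda environment's `python` should show whether SubPhaser and TEsorter are installed side by side.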

Getting location of subgenome specific TEs

Hi! Thanks for the great tool! I was wondering whether one could get the genomic locations of subgenome-specific TEs or TE k-mers. My idea is to look at coding regions upstream and downstream of subgenome-specific TEs.

My current approach is to look at the 'k15_q200_f2.ltr.enrich' file in the phase-results folder and find specific k-mers that are found in one subgenome (column 2) and that have no potential exchange among subgenomes (column 5). Once I have identified k-mers that meet those requirements, I was going to look up their genomic positions in the 'LTR.inner.fa.dom.gff3' file in the tmp directory. Is that approach correct, or should I be looking at other output files?

Thank you in advance!!
Bests,
Emiliano
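Once genomic coordinates are available for a subgenome-specific LTR, finding the neighbouring genes is an interval lookup against a gene annotation. Below is a minimal sketch over standard GFF3 columns (seqid, type, start, end, attributes). A few caveats:

- the helper names are hypothetical;
- "upstream" and "downstream" here are positional and ignore strand;
- coordinates in `LTR.inner.fa.dom.gff3` may be relative to the extracted inner sequences rather than the chromosomes, so check that first.

```python
import bisect


def parse_gff3_genes(lines, feature="gene"):
    """Collect {seqid: sorted [(start, end, ID)]} for one GFF3 feature type."""
    genes = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        # Column 9 holds key=value pairs separated by ';'.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes.setdefault(cols[0], []).append(
            (int(cols[3]), int(cols[4]), attrs.get("ID", ".")))
    for recs in genes.values():
        recs.sort()
    return genes


def flanking_genes(genes, seqid, start, end):
    """Return (nearest gene ending before start, first gene starting after end).

    Genes overlapping the region are skipped; None means no such gene.
    """
    recs = genes.get(seqid, [])
    starts = [r[0] for r in recs]
    upstream = None
    # Only genes that start before `start` can end before it.
    for rec in recs[:bisect.bisect_left(starts, start)]:
        if rec[1] < start and (upstream is None or rec[1] > upstream[1]):
            upstream = rec
    i = bisect.bisect_right(starts, end)  # first gene with start > end
    downstream = recs[i] if i < len(recs) else None
    return upstream, downstream
```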

kmer 13 or less gives a lot of broken pipe errors

Thank you for developing this pipeline.

I've noticed that for my allotetraploid species, -k 15 and -k 14 work fine, but once I try -k 13 or -k 12, I get a lot of broken pipe errors, and many of the underlying Python scripts start failing.

I was wondering if I could talk to someone about this. Thanks!

ValueError: All singletons are not allowed

When I run the command subphaser -i brg.fa -c brg.txt, it shows the error ValueError: All singletons are not allowed

How can I solve this? Thanks!

Traceback (most recent call last):
  File "/home/zuozd/miniconda3/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
  File "/home/zuozd/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 797, in main
    pipeline.run()
  File "/home/zuozd/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 422, in run
    d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets,
  File "/home/zuozd/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Jellyfish.py", line 479, in filter
    raise ValueError('All singletons are not allowed')
ValueError: All singletons are not allowed
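Our reading of the message, since the exact condition is not shown, is that every homologous group in the config contains only a single chromosome. The parser below mirrors the nested structure printed in the `CONFIG:` log line of an earlier issue:

- one group per line;
- whitespace-separated columns;
- comma-separated chromosomes within a column.

That format and the helpers `parse_sg_config` and `check_sg_config` are assumptions for illustration.

```python
def parse_sg_config(text):
    """Parse one homologous group per line into [[col1_chroms], [col2_chroms], ...].

    Assumed format: columns separated by whitespace; several chromosomes
    in one column joined by commas. Blank lines are ignored.
    """
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        groups.append([col.split(",") for col in line.split()])
    return groups


def check_sg_config(groups):
    """Return a list of readable problems found in a parsed config."""
    problems = []
    if groups and all(len(group) == 1 for group in groups):
        problems.append("every group has a single column (all singletons)")
    first_seen = {}
    for i, group in enumerate(groups, 1):
        for col in group:
            for chrom in col:
                if chrom in first_seen:
                    problems.append("{} appears in groups {} and {}".format(
                        chrom, first_seen[chrom], i))
                else:
                    first_seen[chrom] = i
    return problems
```

Running `check_sg_config(parse_sg_config(open("brg.txt").read()))` on a file whose lines each hold one chromosome name would report the singleton problem before the pipeline starts.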

Failed to install SubPhaser

Hi~
When I type 'conda env create -f SubPhaser.yaml', the following error is shown:
[image]
Do you know how to solve this?
Best wishes.

IndexError: index -1 is out of bounds for axis 0 with size 0

Hi, thanks for developing the tool. I ran the ginger example successfully, but when I used my own triploid genome (3n=63), I encountered an error. My config file is as follows:
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 21 22
23 24 25
27 28 29
30 31 32
33 34 35
36 37 38
39 40 41
42 43 45
47 48 49
50 51 52
53 54 59
60 64 65
67 68 71
72 73 74
75 76 77

The command was 'subphaser -i ref.fa -c config.txt -pre out', and I got the following error:

22-06-02 16:24:49 [INFO] Summary of overall LTR insertion age (million years):
/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 779, in main
    pipeline.run()
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 516, in run
    ltr_bedlines, enrich_ltr_bedlines = self.step_ltr(d_kmers) if not self.disable_ltr else ([],[])
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 600, in step_ltr
    enrich_ltrs = LTR.plot_insert_age(ltrs, d_enriched, prefix, shared=d_shared,
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/LTR.py", line 513, in plot_insert_age
    d_info = summary_ltr_time(d_data, fout)
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/LTR.py", line 578, in summary_ltr_time
    np.median(xages), abs(np.percentile(xages, 2.5)), np.percentile(xages, 97.5)))
  File "<__array_function__ internals>", line 5, in percentile
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3867, in percentile
    return _quantile_unchecked(
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3986, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3564, in _ureduce
    r = func(a, **kwargs)
  File "/home/wangyue/software/miniconda2/envs/SubPhaser/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4098, in _quantile_ureduce_func
    n = np.isnan(ap[-1])
IndexError: index -1 is out of bounds for axis 0 with size 0

The results in the file "outk15_q200_f2.chrom-subgenome.tsv" also showed a different number of chromosomes for each genotype.

I don't know where the problem is. Could you give me any advice? Thanks a lot.
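The `IndexError` above is raised because `np.percentile` receives an empty array, i.e. no LTR insertion ages were collected for the summary (the preceding `Mean of empty slice` warning points the same way). A minimal sketch of a guarded summary, assuming the ages arrive as a plain list of floats; `summarize_ages` and `percentile` are hypothetical helpers, not SubPhaser functions, and `percentile` reimplements NumPy's default linear interpolation so the sketch runs without NumPy:

```python
import math
from statistics import mean, median
from typing import Optional, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (same rule as NumPy's default method)."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def summarize_ages(ages: Sequence[float]) -> Optional[dict]:
    """Summarize insertion ages; return None instead of crashing on empty input."""
    if len(ages) == 0:
        # Empty input is exactly the case that makes np.percentile fail.
        return None
    vals = sorted(ages)
    return {
        "n": len(vals),
        "mean": mean(vals),
        "median": median(vals),
        "p2.5": percentile(vals, 2.5),
        "p97.5": percentile(vals, 97.5),
    }


print(summarize_ages([]))  # None
print(summarize_ages([1.0, 2.0, 3.0, 4.0]))
```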

`TEsorter` cannot find `rexdb` in Singularity container

Hi and thanks for the tool! It looks very promising.

I had to use Singularity because of some cluster vs. conda Qt conflicts that I could not resolve. However, with Singularity I found myself unable to proceed beyond the TEsorter stage because of the following error:

Apptainer> cat /netscratch/dep_mercier/grp_novikova/software/SubPhaser/example_data/tmp/LTR.inner.fa.tesort.log
2023-12-08 17:33:23,593 -INFO- VARS: {'sequence': '/netscratch/dep_mercier/grp_novikova/software/SubPhaser/example_data/tmp/LTR.inner.fa', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': '/netscratch/dep_mercier/grp_novikova/software/SubPhaser/example_data/tmp/LTR.inner.fa', 'force_write_hmmscan': False, 'processors': 48, 'tmp_dir': '/netscratch/dep_mercier/grp_novikova/software/SubPhaser/example_data/tmp/LTR', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': True, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False}
2023-12-08 17:33:23,594 -INFO- checking dependencies:
Traceback (most recent call last):
  File "/share/home/app/bin/miniconda3/envs/SubPhaser/bin/TEsorter", line 10, in <module>
    sys.exit(main())
  File "/share/home/app/bin/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/TEsorter/app.py", line 1014, in main
    pipeline(Args())
  File "/share/home/app/bin/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/TEsorter/app.py", line 145, in pipeline
    Dependency().check_hmmer(db=DB[args.hmm_database])
  File "/share/home/app/bin/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/TEsorter/app.py", line 952, in check_hmmer
    dp_version = self.get_hmm_version(db)[:3]
  File "/share/home/app/bin/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/TEsorter/app.py", line 967, in get_hmm_version
    line = open(db).readline()
FileNotFoundError: [Errno 2] No such file or directory: '/share/home/app/bin/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm'

It turns out the databases are not present at all:

Apptainer> ls /share/home/app/bin/miniconda3/envs/SubPhaser/lib/python3.8/site-packages/TEsorter/
__init__.py  __main__.py  app.py       modules/     version.py

How would I fix that?
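The traceback shows TEsorter looking for `REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm` under `TEsorter/database/`, while the `ls` output has no `database/` directory at all. A small sketch for confirming this from inside the container, assuming the package layout implied by the error message; `find_package_dir` and `check_tesorter_db` are hypothetical helpers, and the expected file name is copied from the traceback:

```python
import importlib.util
import os
from typing import List, Optional

# File name taken from the FileNotFoundError above.
REXDB_HMM = "REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm"


def find_package_dir(name: str = "TEsorter") -> Optional[str]:
    """Return the directory of an importable module/package, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def check_tesorter_db(pkg_dir: str) -> List[str]:
    """List problems with TEsorter's database directory (empty list = looks OK)."""
    problems = []
    db_dir = os.path.join(pkg_dir, "database")
    if not os.path.isdir(db_dir):
        problems.append(f"missing directory: {db_dir}")
    elif not os.path.isfile(os.path.join(db_dir, REXDB_HMM)):
        problems.append(f"missing file: {os.path.join(db_dir, REXDB_HMM)}")
    return problems


if __name__ == "__main__":
    pkg = find_package_dir()
    if pkg is None:
        print("TEsorter is not importable in this environment")
    else:
        print(check_tesorter_db(pkg) or "database file found")
```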

Cheers,
Nikita

matplotlib raises RuntimeError('Invalid DISPLAY variable')

Hi, when plotting the kmer_freq figure, I got the following error:

"23-09-13 23:23:33 [INFO] Plot k15_q200_f2.kmer_freq.pdf
Traceback (most recent call last):
  File "~/.conda/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 790, in main
    pipeline.run()
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 415, in run
    d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets, 
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Jellyfish.py", line 504, in filter
    plot_histogram(tot_freqs, outfig, vline=None)
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Jellyfish.py", line 647, in plot_histogram
    plt.figure(figsize=(7,5), dpi=300, tight_layout=True)
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/matplotlib/pyplot.py", line 797, in figure
    manager = new_figure_manager(
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/matplotlib/pyplot.py", line 316, in new_figure_manager
    return _backend_mod.new_figure_manager(*args, **kwargs)
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 3545, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 3550, in new_figure_manager_given_figure
    canvas = cls.FigureCanvas(figure)
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/matplotlib/backends/backend_qt5agg.py", line 21, in __init__
    super().__init__(figure=figure)
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/matplotlib/backends/backend_qt5.py", line 213, in __init__
    _create_qApp()
  File "~/.conda/envs/SubPhaser/lib/python3.8/site-packages/matplotlib/backends/backend_qt5.py", line 108, in _create_qApp
    raise RuntimeError('Invalid DISPLAY variable')
RuntimeError: Invalid DISPLAY variable"

How can I solve it?
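The traceback ends in `backend_qt5.py`, i.e. matplotlib picked the interactive Qt5 backend on a machine without an X display. matplotlib reads the `MPLBACKEND` environment variable when choosing a backend, and the non-interactive `Agg` backend only renders to files, which should be enough for writing PDF plots. A minimal sketch that launches a command with that variable set; `headless_env` and `run_headless` are hypothetical helpers, and the `subphaser` arguments in the comment are placeholders:

```python
import os
import subprocess
from typing import Dict, List, Optional


def headless_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and force matplotlib's non-interactive Agg backend."""
    env = dict(os.environ if base is None else base)
    env["MPLBACKEND"] = "Agg"  # read by matplotlib when it selects a backend
    return env


def run_headless(cmd: List[str]) -> int:
    """Run a command (e.g. a subphaser invocation) without relying on $DISPLAY."""
    return subprocess.call(cmd, env=headless_env())


# Example with placeholder arguments:
# run_headless(["subphaser", "-i", "genome.fa", "-c", "sg.config"])
```

The shell equivalent is running `export MPLBACKEND=Agg` before calling `subphaser`.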

Subgenome analysis

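Dear Dr. Zhang, I am using subphaser to partition the A and B subgenomes of an allotetraploid and have obtained preliminary results; could you please take a look? The assembled genome has 22 chromosomes in total, but after the subphaser analysis one group contains 12 chromosomes and the other 10. This confuses me, and I would appreciate your guidance. The results are shown in the figures below: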
[Figures: k15_q50_f2 circos plot and additional images]

cannot allocate memory

Hi,

Thanks a lot for the very nice tool!

I am trying to phase the subgenomes of this hexaploid, haplotype-phased genome (9 Gb), but I always get stuck with the error message `Cannot allocate memory`, despite changing the memory option several times. Any help with this would be appreciated.

Cheers
André
...
24-01-25 07:23:35 [INFO] Loading kmer matrix from jellyfish
24-01-25 07:23:35 [INFO] Start Pool with 40 process(es)
24-01-25 07:23:57 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_53.fasta_15.fa
24-01-25 07:28:54 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_60.fasta_15.fa
24-01-25 07:29:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_5.fasta_15.fa
24-01-25 07:30:13 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_57.fasta_15.fa
24-01-25 07:30:47 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_61.fasta_15.fa
24-01-25 07:30:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_54.fasta_15.fa
24-01-25 07:31:00 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_22.fasta_15.fa
24-01-25 07:31:36 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_50.fasta_15.fa
24-01-25 07:31:46 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_52.fasta_15.fa
24-01-25 07:32:25 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_48.fasta_15.fa
24-01-25 07:32:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_42.fasta_15.fa
24-01-25 07:32:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_47.fasta_15.fa
24-01-25 07:32:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_55.fasta_15.fa
24-01-25 07:32:49 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_4.fasta_15.fa
24-01-25 07:33:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_35.fasta_15.fa
24-01-25 07:33:47 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_40.fasta_15.fa
24-01-25 07:33:53 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_25.fasta_15.fa
24-01-25 07:34:02 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_27.fasta_15.fa
24-01-25 07:34:12 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_38.fasta_15.fa
24-01-25 07:34:22 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_37.fasta_15.fa
24-01-25 07:35:11 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_41.fasta_15.fa
24-01-25 07:35:17 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_26.fasta_15.fa
24-01-25 07:35:28 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_33.fasta_15.fa
24-01-25 07:35:40 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_65.fasta_15.fa
24-01-25 07:35:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_28.fasta_15.fa
24-01-25 07:36:01 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_7.fasta_15.fa
24-01-25 07:36:12 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_17.fasta_15.fa
24-01-25 07:36:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_36.fasta_15.fa
24-01-25 07:36:32 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_30.fasta_15.fa
24-01-25 07:36:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_14.fasta_15.fa
24-01-25 07:37:41 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_18.fasta_15.fa
24-01-25 07:37:57 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_63.fasta_15.fa
24-01-25 07:38:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_1.fasta_15.fa
24-01-25 07:38:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_16.fasta_15.fa
24-01-25 07:38:27 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_31.fasta_15.fa
24-01-25 07:38:36 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_12.fasta_15.fa
24-01-25 07:38:49 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_11.fasta_15.fa
24-01-25 07:39:01 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_62.fasta_15.fa
24-01-25 07:39:07 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_23.fasta_15.fa
24-01-25 07:39:18 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_64.fasta_15.fa
24-01-25 07:39:23 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_66.fasta_15.fa
24-01-25 07:39:37 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_39.fasta_15.fa
24-01-25 07:39:55 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_15.fasta_15.fa
24-01-25 07:40:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_3.fasta_15.fa
24-01-25 07:40:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_21.fasta_15.fa
24-01-25 07:40:29 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_24.fasta_15.fa
24-01-25 07:41:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_29.fasta_15.fa
24-01-25 07:41:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_34.fasta_15.fa
24-01-25 07:41:40 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_32.fasta_15.fa
24-01-25 07:41:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_56.fasta_15.fa
24-01-25 07:42:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_8.fasta_15.fa
24-01-25 07:42:20 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_9.fasta_15.fa
24-01-25 07:42:32 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_10.fasta_15.fa
24-01-25 07:42:43 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_13.fasta_15.fa
24-01-25 07:42:55 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_2.fasta_15.fa
24-01-25 07:43:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_19.fasta_15.fa
24-01-25 07:43:20 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_20.fasta_15.fa
24-01-25 07:43:30 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_45.fasta_15.fa
24-01-25 07:43:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_46.fasta_15.fa
24-01-25 07:43:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_6.fasta_15.fa
24-01-25 07:43:51 [INFO] 62557073 kmers in total
24-01-25 07:43:51 [INFO] Filtering differential kmers
Traceback (most recent call last):
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 797, in main
    pipeline.run()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/__main__.py", line 422, in run
    d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets,
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Jellyfish.py", line 487, in filter
    for kmer, freqs, tot_freq in pool_func(_filter_kmer, args, self.ncpu,
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/RunCmdsMP.py", line 336, in pool_func
    pool = multiprocessing.Pool(processors)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
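The failure is raised by `os.fork()` while `multiprocessing.Pool` starts its worker processes for the k-mer filtering step, after 62557073 k-mers have already been loaded into the parent process. Starting fewer workers is one common way to reduce the pressure. A sketch of sizing a pool from available memory, assuming Linux's `/proc/meminfo` and a per-worker memory estimate that you supply yourself; `mem_available_bytes` and `pool_size` are hypothetical helpers, not SubPhaser options:

```python
import re
from typing import Optional


def mem_available_bytes(meminfo_text: str) -> Optional[int]:
    """Parse the MemAvailable line of /proc/meminfo (reported in kB)."""
    m = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(m.group(1)) * 1024 if m else None


def pool_size(requested: int, available: int, per_worker: int) -> int:
    """Cap the worker count so the assumed per-worker footprint fits in memory.

    `per_worker` is an estimate in bytes; at least one worker is always
    returned so the job can still make progress.
    """
    if per_worker <= 0:
        return max(1, requested)
    return max(1, min(requested, available // per_worker))


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:  # Linux only
            avail = mem_available_bytes(fh.read())
    except OSError:
        avail = None
    if avail is not None:
        # Assumed ~8 GiB per worker, purely illustrative.
        print(pool_size(40, avail, 8 * 1024**3))
```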

Arabidopsis_suecica_LTR.inner.fa.cls.tsv not found

Thank you for developing this excellent tool! I installed SubPhaser in a Singularity container with the minimal dependencies listed in #10 (comment) and tried to run the Arabidopsis test dataset, but an error occurred. Another run with my own dataset stopped with the same error message. I would be grateful if you could suggest potential solutions.

22-12-18 06:54:20 [INFO] finished with 0 commands uncompleted
22-12-18 06:54:20 [INFO] New check point file: `/home/kfuku/docker_img/gfe/usr/local/bin/SubPhaser/example_data/Arabidopsis_suecica_tmp/Arabidopsis_suecica_LTR.scn.ok`
22-12-18 06:54:20 [INFO] 5566 LTRs identified
22-12-18 06:54:20 [INFO] Extracting inner sequences of LTRs to classify by `TEsorter`
22-12-18 06:54:20 [INFO] run CMD: `TEsorter /home/kfuku/docker_img/gfe/usr/local/bin/SubPhaser/example_data/Arabidopsis_suecica_tmp/Arabidopsis_suecica_LTR.inner.fa -db rexdb -dp2 -p 128 -pre /home/kfuku/docker_img/gfe/usr/local/bin/SubPhaser/example_data/Arabidopsis_suecica_tmp/Arabidopsis_suecica_LTR.inner.fa -tmp /home/kfuku/docker_img/gfe/usr/local/bin/SubPhaser/example_data/Arabidopsis_suecica_tmp/Arabidopsis_suecica_LTR &> /home/kfuku/docker_img/gfe/usr/local/bin/SubPhaser/example_data/Arabidopsis_suecica_tmp/Arabidopsis_suecica_LTR.inner.fa.tesort.log`
22-12-18 06:54:56 [INFO] New check point file: `/home/kfuku/docker_img/gfe/usr/local/bin/SubPhaser/example_data/Arabidopsis_suecica_tmp/Arabidopsis_suecica_LTR.tesort.ok`
Traceback (most recent call last):
  File "/opt/conda/envs/biotools/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 784, in main
    pipeline.run()
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 518, in run
    ltr_bedlines, enrich_ltr_bedlines = self.step_ltr(d_kmers) if not self.disable_ltr else ([],[])
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 556, in step_ltr
    ltrs, ltrfile = pipeline.run()
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/LTR.py", line 335, in run
    d_class = self.classfify(ltrs)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/LTR.py", line 397, in classfify
    for classification in CommonClassifications(clsfile):
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/api/TEsorter/app.py", line 339, in _parse
    for i, line in enumerate(open(self.clsfile)):
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/xopen/__init__.py", line 1291, in xopen
    opened_file = open(filename, mode, **text_mode_kwargs)  # type: ignore
FileNotFoundError: [Errno 2] No such file or directory: '/home/kfuku/docker_img/gfe/usr/local/bin/SubPhaser/example_data/Arabidopsis_suecica_tmp/Arabidopsis_suecica_LTR.inner.fa.cls.tsv'
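In this log the `TEsorter` command returns and a `.tesort.ok` checkpoint is written, yet the `Arabidopsis_suecica_LTR.inner.fa.cls.tsv` file it should have produced is missing, so the underlying TEsorter error is probably recorded in the redirected `Arabidopsis_suecica_LTR.inner.fa.tesort.log`. A sketch of checking for the output before parsing it and surfacing the log tail instead, assuming the `<prefix>.cls.tsv` and `<prefix>.tesort.log` naming seen in the command above; `tesorter_outputs_ok` is a hypothetical helper:

```python
import os
from typing import List, Tuple


def tesorter_outputs_ok(prefix: str, tail_lines: int = 20) -> Tuple[bool, List[str]]:
    """Check that `<prefix>.cls.tsv` exists; otherwise return the last log lines.

    `prefix` is the value passed to TEsorter's -pre option; the log path
    `<prefix>.tesort.log` follows the redirect in the command shown above.
    """
    cls_file = prefix + ".cls.tsv"
    if os.path.isfile(cls_file):
        return True, []
    log_file = prefix + ".tesort.log"
    if not os.path.isfile(log_file):
        return False, [f"neither {cls_file} nor {log_file} exists"]
    with open(log_file) as fh:
        lines = fh.read().splitlines()
    return False, lines[-tail_lines:]


if __name__ == "__main__":
    ok, info = tesorter_outputs_ok(
        "Arabidopsis_suecica_tmp/Arabidopsis_suecica_LTR.inner.fa")
    print("classification file found" if ok else "\n".join(info))
```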

Used for contig-level assembly

Hi, thanks for developing such a useful tool. I wonder whether it can be used for a contig-level assembly.
Thank you in advance for your reply.

Changing mutation rate

Hello,

I'm curious why the densities in the ***.ltr.insert.density.pdf plot change by an order of magnitude when I change the mutation rate, while the ***.ltr.insert.histo.pdf plot remains the same. I expected only the x-axis to change with the new mutation rate, but the density (y-axis) of LTRs changes as well. I attached the small-genome example run with the default settings and with a different mutation rate of -mu 6.7e-09.

I used the default example script and then ran this:

prefix=Arabidopsis_suecica
DT=`date +"%y%m%d%H%M"`
options="-pre ${prefix}_" # to avoid conflicts
subphaser -i ${prefix}_genome.fasta.gz -c ${prefix}_sg.config -max_memory 128G -disable_circos -intact_ltr -mu 1.75e-09 $options 2>&1 | tee ${prefix}.log.$DT

I checked without the -intact_ltr flag and the results are the same.

Any insights would be greatly appreciated.

Arabidopsis_suecica_k15_q200_f2.ltr.insert.histo.default.pdf
Arabidopsis_suecica_k15_q200_f2.ltr.insert.density.1.75.pdf
Arabidopsis_suecica_k15_q200_f2.ltr.insert.histo.1.75.pdf
Arabidopsis_suecica_k15_q200_f2.ltr.insert.density.default.pdf

Thank you.
Crystal
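A possible explanation, assuming the ages are derived from LTR divergence with the common formula T = K / (2μ) (K = divergence between the two LTRs of an element, μ = substitution rate per site per year) and assuming the density plot is normalized to unit area: changing μ multiplies every age by the same factor c, so a histogram of counts keeps its bar heights, whereas a unit-area density stretched along the x-axis by c must shrink along the y-axis by c. The sketch below checks this with made-up divergence values and the two -mu values mentioned above; `ages`, `histogram` and the numbers are illustrative only, not SubPhaser's plotting code:

```python
from typing import List, Tuple


def ages(divergences: List[float], mu: float) -> List[float]:
    """Insertion ages in million years from LTR divergence K: T = K / (2 * mu)."""
    return [k / (2.0 * mu) / 1e6 for k in divergences]


def histogram(values: List[float], nbins: int, density: bool) -> Tuple[List[float], float]:
    """Equal-width histogram over [min, max]; returns (heights, bin_width).

    With density=True, heights are scaled so that sum(height * width) == 1.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        i = min(int((v - lo) / width), nbins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    if not density:
        return [float(c) for c in counts], width
    return [c / (len(values) * width) for c in counts], width


K = [0.001, 0.004, 0.006, 0.010, 0.012, 0.020]  # made-up divergences
for mu in (1.75e-9, 6.7e-9):
    t = ages(K, mu)
    counts, _ = histogram(t, 4, density=False)
    dens, _ = histogram(t, 4, density=True)
    print(f"mu={mu:g}  counts={counts}  max density={max(dens):.4f}")
```

The counts come out identical for both rates, while the peak density differs by the ratio of the two rates (6.7 / 1.75 ≈ 3.8).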
