Comments (13)
How much RAM does your computer have?
from subphaser.
That's the thing: I am running on a cluster node with 500 GB of RAM and 64 threads.
What about the peak memory? A large genome certainly requires a lot of memory, but I can run the wheat genome (14 Gb, 140M kmers, 21 chromosomes) with 1 TB of RAM.
If it actually exceeds the 500 GB of RAM, you may try increasing -lower_count to reduce the number of kmers, or reducing the number of chromosomes in the config file. If necessary, you may also try decreasing the chunksize in /netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py
from:
# matrix
logger.info('Loading kmer matrix from jellyfish') # multiprocessing by kmer
chunksize = None if self.pool_method == 'map' else 20000
to:
# matrix
logger.info('Loading kmer matrix from jellyfish') # multiprocessing by kmer
chunksize = None if self.pool_method == 'map' else 200
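To illustrate why a smaller chunksize can lower peak memory, here is a minimal sketch. It is not SubPhaser's actual code: `iter_chunks` and `peak_items_in_flight` are hypothetical helpers, and the sketch assumes each worker holds one full chunk of kmer records at a time.

```python
from itertools import islice


def iter_chunks(items, chunksize):
    """Yield successive lists of at most `chunksize` items (hypothetical helper)."""
    it = iter(items)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


def peak_items_in_flight(n_items, chunksize, n_workers):
    """Upper bound on items held at once if every worker holds one full chunk."""
    return min(n_items, chunksize * n_workers)


kmers = range(1_000_000)  # stand-in for kmer records, not real data
first_chunk = next(iter_chunks(kmers, 200))

# Compare the two chunk sizes from the snippet above for 64 workers.
before = peak_items_in_flight(len(kmers), 20000, 64)
after = peak_items_in_flight(len(kmers), 200, 64)
print(before, after)
```

Under that assumption, smaller chunks trade some dispatch overhead for fewer records in memory at any moment.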
By the way, if your hexaploid is an autohexaploid, there is no reason to spend time trying subphaser.
Thanks! I will try on our HPC cluster with 1 TB, or adjust the parameters as you suggested. How long should the whole run take?
I am not sure it is an autopolyploid; there is some evidence for a hybrid between an autotetraploid and a diploid.
In general, a large genome needs 1-2 days.
Somehow this is strange... Running on our HPC node with 1 TB, the job exits with:
Resource usage summary:
CPU time : 2301.09 sec.
Max Memory : 69212 MB
Max Swap : 754180 MB
Max Processes : 44
Max Threads : 48
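As a rough reading aid, the sketch below converts the memory fields of the summary above into GiB. The helper `parse_mb_fields` is hypothetical, and dividing by 1024 assumes the scheduler's "MB" are binary megabytes.

```python
import re

SUMMARY = """\
CPU time : 2301.09 sec.
Max Memory : 69212 MB
Max Swap : 754180 MB
Max Processes : 44
Max Threads : 48
"""


def parse_mb_fields(text):
    """Return {field: MB} for lines of the form 'Name : <number> MB'."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s*:\s*([\d.]+)\s*MB\s*$", line)
        if m:
            fields[m.group(1)] = float(m.group(2))
    return fields


usage = parse_mb_fields(SUMMARY)
mem_gib = usage["Max Memory"] / 1024    # assumption: 1 GiB = 1024 MB
swap_gib = usage["Max Swap"] / 1024
print(f"Max Memory: {mem_gib:.1f} GiB, Max Swap: {swap_gib:.1f} GiB")
```

Read this way, the reported peak memory is far below 1 TB, while the swap figure is roughly ten times larger.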
Are you using SLURM, which limits memory according to the number of processes?
We use LSF. I set the memory limit to 980G, and the job still exits. It seems the memory limit was not even reached before it exited.
That is strange. You may try reducing -cpu to 1 to see the memory cost.
It did advance a bit, but it still failed.
24-02-04 08:21:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_19.fasta_15.fa
24-02-04 08:22:07 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_20.fasta_15.fa
24-02-04 08:22:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_45.fasta_15.fa
24-02-04 08:22:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_46.fasta_15.fa
24-02-04 08:22:41 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_6.fasta_15.fa
24-02-04 08:22:48 [INFO] 62557073 kmers in total
24-02-04 08:22:48 [INFO] Filtering differential kmers
24-02-04 08:22:48 [INFO] Start Pool with 1 process(es)
24-02-04 08:28:46 [INFO] Processed 10000000 kmers
24-02-04 08:34:51 [INFO] Processed 20000000 kmers
24-02-04 08:40:59 [INFO] Processed 30000000 kmers
24-02-04 08:47:03 [INFO] Processed 40000000 kmers
24-02-04 08:52:35 [INFO] Processed 50000000 kmers
24-02-04 08:58:40 [INFO] Processed 60000000 kmers
24-02-04 09:00:08 [INFO] After filtering, remained 4 (0.00%) differential (freq >= 200) and 56 (0.00%) candidate (freq > 0) kmers
24-02-04 09:00:08 [INFO] Plot /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer_freq.pdf
24-02-04 09:00:44 [INFO] New check point file: /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_CBC_k15_q200_f2.kmer.mat.ok
24-02-04 09:00:44 [INFO] ###Step: Cluster
24-02-04 09:00:44 [INFO] Performing bootstrap of 1000 replicates, with each replicate resampling 50% data with replacement
24-02-04 09:01:29 [INFO] Bootstrap: mean Adjusted Rand-Index: 0.9635; mean V-measure score: 0.9538
24-02-04 09:01:29 [INFO] Subgenome assignments: OrderedDict([('scaffold_1', 'SG1'), ('scaffold_4', 'SG2'), ('scaffold_16', 'SG3'), ('scaffold_18', 'SG2'), ('scaffold_25', 'SG3'), ('scaffold_28', 'SG3'), ('scaffold_7', 'SG2'), ('scaffold_11', 'SG2'), ('scaffold_12', 'SG2'), ('scaffold_14', 'SG3'), ('scaffold_63', 'SG2'), ('scaffold_66', 'SG5'), ('scaffold_41', 'SG2'), ('scaffold_47', 'SG2'), ('scaffold_52', 'SG3'), ('scaffold_54', 'SG2'), ('scaffold_55', 'SG2'), ('scaffold_57', 'SG2'), ('scaffold_5', 'SG2'), ('scaffold_37', 'SG2'), ('scaffold_38', 'SG2'), ('scaffold_40', 'SG3'), ('scaffold_42', 'SG2'), ('scaffold_48', 'SG2'), ('scaffold_22', 'SG3'), ('scaffold_23', 'SG2'), ('scaffold_17', 'SG2'), ('scaffold_35', 'SG2'), ('scaffold_36', 'SG2'), ('scaffold_65', 'SG2'), ('scaffold_26', 'SG2'), ('scaffold_27', 'SG2'), ('scaffold_30', 'SG2'), ('scaffold_31', 'SG2'), ('scaffold_33', 'SG2'), ('scaffold_39', 'SG4'), ('scaffold_50', 'SG2'), ('scaffold_53', 'SG3'), ('scaffold_60', 'SG2'), ('scaffold_61', 'SG2'), ('scaffold_62', 'SG2'), ('scaffold_64', 'SG2'), ('scaffold_15', 'SG4'), ('scaffold_3', 'SG5'), ('scaffold_21', 'SG3'), ('scaffold_24', 'SG3'), ('scaffold_29', 'SG3'), ('scaffold_34', 'SG4'), ('scaffold_32', 'SG3'), ('scaffold_56', 'SG2'), ('scaffold_8', 'SG2'), ('scaffold_9', 'SG2'), ('scaffold_10', 'SG2'), ('scaffold_13', 'SG2'), ('scaffold_2', 'SG3'), ('scaffold_19', 'SG3'), ('scaffold_20', 'SG3'), ('scaffold_45', 'SG6'), ('scaffold_46', 'SG6'), ('scaffold_6', 'SG1')])
24-02-04 09:01:29 [INFO] Outputing chromosome - subgenome assignments to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.chrom-subgenome.tsv
24-02-04 09:01:29 [INFO] Outputing significant differiential kmer - subgenome maps to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.sig.kmer-subgenome.tsv
24-02-04 09:01:29 [INFO] Start Pool with 1 process(es)
24-02-04 09:01:29 [INFO] 3 significant subgenome-specific kmers
24-02-04 09:01:29 [INFO] 3 SG1-specific kmers
24-02-04 09:01:29 [INFO] run CMD: Rscript /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer.mat.R
24-02-04 09:01:31 [INFO] Outputing PCA plot to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer_pca.pdf
Traceback (most recent call last):
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 797, in main
    pipeline.run()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 469, in run
    cluster.pca(outfig, n_components=self.nsg, sg_color=self.colors,)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Cluster.py", line 50, in pca
    X_pca = pca.fit_transform(self.data)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 383, in fit_transform
    U, S, Vt = self._fit(X)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 430, in _fit
    return self._fit_full(X, n_components)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 446, in _fit_full
    raise ValueError("n_components=%r must be between 0 and "
ValueError: n_components=6 must be between 0 and min(n_samples, n_features)=4 with svd_solver='full'
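The constraint behind this ValueError can be sketched without scikit-learn. The helper `check_n_components` below is hypothetical and only mirrors the rule quoted in the message; the (60, 4) shape is an illustrative assumption based on the 60 scaffolds and 4 differential kmers reported in the log.

```python
def check_n_components(n_components, n_samples, n_features):
    """Mirror the rule in the traceback: a full-SVD PCA can return at most
    min(n_samples, n_features) components (hypothetical helper)."""
    limit = min(n_samples, n_features)
    if not 0 <= n_components <= limit:
        raise ValueError(
            f"n_components={n_components} must be between 0 and "
            f"min(n_samples, n_features)={limit}"
        )
    return n_components


# Illustrative shape: 60 chromosomes x 4 differential kmers.
shape = (60, 4)
try:
    check_n_components(6, *shape)
    outcome = "ok"
except ValueError as err:
    outcome = str(err)

print(outcome)
```

With only four differential kmers, the smaller dimension of the matrix is 4, so asking for six components cannot succeed whichever way the matrix is oriented.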
However, it produced the two attached PDFs, which I assume indicate a largely autohexaploid origin, right?
CBC_k15_q200_f2.kmer_freq.pdf
CBC_k15_q200_f2.kmer.mat.pdf
The error occurs because there are too few differential kmers (only four). But it is too early to conclude that it is an autohexaploid. You may set -nsg 3 and -baseline 2, or prune the three allelic chromosome sets to leave three homoeologous chromosome sets, like wheat's ABD assembly. Even if it were an allohexaploid (for example AABBDD), the current settings identify differential kmers by comparing homologous chromosome pairs (e.g. the two As).
OK, thanks a lot for the suggestion. I finally managed to run SubPhaser using the unphased genome version and, as initially suspected, it looks pretty much like an autohexaploid except for a few chromosomes... Due to introgression, maybe?
CBC_hap1k15_q200_f2.circos.pdf
CBC_hap1k15_q200_f2.LTR_Gypsy.tree.pdf
CBC_hap1k15_q200_f2.ltr.insert.density.pdf
CBC_hap1k15_q200_f2.kmer_pca.pdf
CBC_hap1k15_q200_f2.kmer.mat.pdf
Yes, it looks like an autohexaploid. You may generate a kmer histogram and a Smudgeplot (https://github.com/KamilSJaron/smudgeplot) for cross-validation. Both plots can be generated from whole-genome HiFi reads.
Whether there is introgression is hard to say from these results alone.
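For readers unfamiliar with a kmer histogram, the toy sketch below shows what is being counted: how many distinct kmers occur once, twice, and so on across the reads. The function names and the three toy reads are illustrative assumptions; real HiFi data would need a dedicated kmer counter rather than pure Python.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct canonical kmers with that count."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip kmers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


reads = ["ACGTAC", "ACGTAC", "TTTTTT"]  # toy reads, not real data
hist = kmer_histogram(reads, k=3)
for multiplicity, n_kmers in sorted(hist.items()):
    print(multiplicity, n_kmers)
```

The positions and relative heights of the peaks in such a histogram are what ploidy-oriented tools interpret.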