cannot allocate memory (subphaser issue, open)

dabitz commented on June 20, 2024
cannot allocate memory

from subphaser.

Comments (13)

zhangrengang commented on June 20, 2024

How much RAM does your computer have?

dabitz commented on June 20, 2024

That's the thing. I am running on a cluster node with 500 GB of RAM and 64 threads.

zhangrengang commented on June 20, 2024

What is the peak memory? A large genome certainly requires a lot of memory, but I can run the wheat genome (14 Gb, 140M kmers, 21 chromosomes) with 1 TB of RAM.
If it actually exceeds the 500 GB of RAM, you may try increasing -lower_count to reduce the number of kmers, or reducing the number of chromosomes in the config file. If necessary, you may also try decreasing the chunksize in /netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py from:

        # matrix
        logger.info('Loading kmer matrix from jellyfish')   # multiprocessing by kmer
        chunksize = None if self.pool_method == 'map' else 20000

to

        # matrix
        logger.info('Loading kmer matrix from jellyfish')   # multiprocessing by kmer
        chunksize = None if self.pool_method == 'map' else 200

By the way, if your hexaploid is an autohexaploid, there is no reason to spend time running SubPhaser.
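The chunksize change suggested above can be illustrated with a minimal sketch. It assumes the kmer matrix is built through a pool `imap`-style call, which the thread does not confirm; `count_bases`, `n_batches` and `run` are hypothetical names. A `ThreadPool` is used only to keep the example self-contained, and `multiprocessing.Pool.imap` accepts the same `chunksize` argument.

```python
from multiprocessing.pool import ThreadPool


def count_bases(kmer):
    """Toy per-kmer task standing in for the real matrix lookup (hypothetical)."""
    return len(kmer)


def n_batches(n_items, chunksize):
    """Number of task batches created for n_items at a given chunksize."""
    return -(-n_items // chunksize)  # ceiling division


def run(kmers, chunksize, processes=4):
    # imap yields results in input order; chunksize sets how many items
    # are handed to a worker in a single task.
    with ThreadPool(processes) as pool:
        return list(pool.imap(count_bases, kmers, chunksize=chunksize))


kmers = ["ACGTACGTACGTACG"] * 1000  # 1000 toy 15-mers
results_large = run(kmers, chunksize=200)
results_small = run(kmers, chunksize=20)

# The results are identical; only the batching differs.
print(results_large == results_small)
print(n_batches(1_000_000, 20000), n_batches(1_000_000, 200))
```

A smaller chunksize means more, smaller batches in flight, at the cost of more scheduling overhead. Whether that lowers peak memory in SubPhaser itself depends on its internals.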

dabitz commented on June 20, 2024

Thanks! I will try on our HPC cluster with 1 TB of RAM, or adjust the parameters as you suggested. How long should the whole run take?
I am not sure it is an autopolyploid; there is some evidence for a hybrid between an autotetraploid and a diploid.

zhangrengang commented on June 20, 2024

In general, a large genome needs 1-2 days.

dabitz commented on June 20, 2024

Somehow this is strange... Running on our HPC node with 1 TB of RAM, the job exits with:
Resource usage summary:

CPU time   :   2301.09 sec.
Max Memory :     69212 MB
Max Swap   :    754180 MB

Max Processes  :        44
Max Threads    :        48

zhangrengang commented on June 20, 2024

Are you using SLURM, which limits memory according to the number of processes?

dabitz commented on June 20, 2024

We use LSF. I set the memory limit to 980G and the job still exits, although it seems the memory limit was not even reached before it exited.
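To put the figures from the resource summary above next to the 980G limit, here is a small sketch that parses the summary text and reports each value as a share of the limit. It assumes the limit is expressed in binary units (980 x 1024 MB); `parse_mb` is a hypothetical helper, not part of LSF.

```python
import re

# Lines copied from the resource usage summary quoted in the thread.
SUMMARY = """\
CPU time   :   2301.09 sec.
Max Memory :     69212 MB
Max Swap   :    754180 MB
"""


def parse_mb(summary, field):
    """Return the integer MB value reported for `field`, or None if absent."""
    pattern = rf"^{re.escape(field)}\s*:\s*(\d+)\s*MB"
    match = re.search(pattern, summary, re.MULTILINE)
    return int(match.group(1)) if match else None


LIMIT_MB = 980 * 1024  # assumption: the 980G limit uses binary units

max_mem = parse_mb(SUMMARY, "Max Memory")
max_swap = parse_mb(SUMMARY, "Max Swap")

for name, value in (("Max Memory", max_mem), ("Max Swap", max_swap)):
    print(f"{name}: {value} MB = {value / LIMIT_MB:.1%} of the limit")
```

Under that assumption, the reported Max Memory is roughly 7% of the limit, while Max Swap is roughly 75% of it.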

zhangrengang commented on June 20, 2024

That is strange. You may try setting -cpu to 1 to see the memory cost.

dabitz commented on June 20, 2024

It did advance a bit, but still failed.

24-02-04 08:21:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_19.fasta_15.fa
24-02-04 08:22:07 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_20.fasta_15.fa
24-02-04 08:22:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_45.fasta_15.fa
24-02-04 08:22:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_46.fasta_15.fa
24-02-04 08:22:41 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_6.fasta_15.fa
24-02-04 08:22:48 [INFO] 62557073 kmers in total
24-02-04 08:22:48 [INFO] Filtering differential kmers
24-02-04 08:22:48 [INFO] Start Pool with 1 process(es)
24-02-04 08:28:46 [INFO] Processed 10000000 kmers
24-02-04 08:34:51 [INFO] Processed 20000000 kmers
24-02-04 08:40:59 [INFO] Processed 30000000 kmers
24-02-04 08:47:03 [INFO] Processed 40000000 kmers
24-02-04 08:52:35 [INFO] Processed 50000000 kmers
24-02-04 08:58:40 [INFO] Processed 60000000 kmers
24-02-04 09:00:08 [INFO] After filtering, remained 4 (0.00%) differential (freq >= 200) and 56 (0.00%) candidate (freq > 0) kmers
24-02-04 09:00:08 [INFO] Plot /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer_freq.pdf
24-02-04 09:00:44 [INFO] New check point file: /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_CBC_k15_q200_f2.kmer.mat.ok
24-02-04 09:00:44 [INFO] ###Step: Cluster
24-02-04 09:00:44 [INFO] Performing bootstrap of 1000 replicates, with each replicate resampling 50% data with replacement
24-02-04 09:01:29 [INFO] Bootstrap: mean Adjusted Rand-Index: 0.9635; mean V-measure score: 0.9538
24-02-04 09:01:29 [INFO] Subgenome assignments: OrderedDict([('scaffold_1', 'SG1'), ('scaffold_4', 'SG2'), ('scaffold_16', 'SG3'), ('scaffold_18', 'SG2'), ('scaffold_25', 'SG3'), ('scaffold_28', 'SG3'), ('scaffold_7', 'SG2'), ('scaffold_11', 'SG2'), ('scaffold_12', 'SG2'), ('scaffold_14', 'SG3'), ('scaffold_63', 'SG2'), ('scaffold_66', 'SG5'), ('scaffold_41', 'SG2'), ('scaffold_47', 'SG2'), ('scaffold_52', 'SG3'), ('scaffold_54', 'SG2'), ('scaffold_55', 'SG2'), ('scaffold_57', 'SG2'), ('scaffold_5', 'SG2'), ('scaffold_37', 'SG2'), ('scaffold_38', 'SG2'), ('scaffold_40', 'SG3'), ('scaffold_42', 'SG2'), ('scaffold_48', 'SG2'), ('scaffold_22', 'SG3'), ('scaffold_23', 'SG2'), ('scaffold_17', 'SG2'), ('scaffold_35', 'SG2'), ('scaffold_36', 'SG2'), ('scaffold_65', 'SG2'), ('scaffold_26', 'SG2'), ('scaffold_27', 'SG2'), ('scaffold_30', 'SG2'), ('scaffold_31', 'SG2'), ('scaffold_33', 'SG2'), ('scaffold_39', 'SG4'), ('scaffold_50', 'SG2'), ('scaffold_53', 'SG3'), ('scaffold_60', 'SG2'), ('scaffold_61', 'SG2'), ('scaffold_62', 'SG2'), ('scaffold_64', 'SG2'), ('scaffold_15', 'SG4'), ('scaffold_3', 'SG5'), ('scaffold_21', 'SG3'), ('scaffold_24', 'SG3'), ('scaffold_29', 'SG3'), ('scaffold_34', 'SG4'), ('scaffold_32', 'SG3'), ('scaffold_56', 'SG2'), ('scaffold_8', 'SG2'), ('scaffold_9', 'SG2'), ('scaffold_10', 'SG2'), ('scaffold_13', 'SG2'), ('scaffold_2', 'SG3'), ('scaffold_19', 'SG3'), ('scaffold_20', 'SG3'), ('scaffold_45', 'SG6'), ('scaffold_46', 'SG6'), ('scaffold_6', 'SG1')])
24-02-04 09:01:29 [INFO] Outputing chromosome - subgenome assignments to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.chrom-subgenome.tsv
24-02-04 09:01:29 [INFO] Outputing significant differiential kmer - subgenome maps to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.sig.kmer-subgenome.tsv
24-02-04 09:01:29 [INFO] Start Pool with 1 process(es)
24-02-04 09:01:29 [INFO] 3 significant subgenome-specific kmers
24-02-04 09:01:29 [INFO] 3 SG1-specific kmers
24-02-04 09:01:29 [INFO] run CMD: Rscript /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer.mat.R
24-02-04 09:01:31 [INFO] Outputing PCA plot to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer_pca.pdf
Traceback (most recent call last):
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 797, in main
    pipeline.run()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 469, in run
    cluster.pca(outfig, n_components=self.nsg, sg_color=self.colors,)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Cluster.py", line 50, in pca
    X_pca = pca.fit_transform(self.data)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 383, in fit_transform
    U, S, Vt = self._fit(X)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 430, in _fit
    return self._fit_full(X, n_components)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 446, in _fit_full
    raise ValueError("n_components=%r must be between 0 and "
ValueError: n_components=6 must be between 0 and min(n_samples, n_features)=4 with svd_solver='full'

However, it produced the two attached PDFs, which I assume indicate a largely autohexaploid origin, right?

CBC_k15_q200_f2.kmer_freq.pdf
CBC_k15_q200_f2.kmer.mat.pdf

zhangrengang commented on June 20, 2024

The error occurs because there are too few differential kmers (only four). But it is too early to conclude that it is an autohexaploid. You may set -nsg 3 and -baseline 2, or prune the three allelic chromosome sets so that three homoeologous chromosome sets remain, like the wheat ABD assembly. Even if it were an allohexaploid (for example AABBDD), the current settings identify differential kmers by comparing homologous chromosome pairs (e.g. the two As), so few differential kmers would be found either way.
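The constraint behind the ValueError can be sketched without scikit-learn: a full-SVD PCA accepts at most min(n_samples, n_features) components. The matrix shape used here (60 chromosomes by 4 differential kmers) is an assumption inferred from the log, and both function names are hypothetical.

```python
def max_pca_components(n_samples, n_features):
    """Upper bound on n_components for a full-SVD PCA of an (n_samples, n_features) matrix."""
    return min(n_samples, n_features)


def check_n_components(n_components, n_samples, n_features):
    """Raise ValueError if n_components exceeds what the matrix can support."""
    limit = max_pca_components(n_samples, n_features)
    if not 0 <= n_components <= limit:
        raise ValueError(
            f"n_components={n_components} must be between 0 and "
            f"min(n_samples, n_features)={limit}"
        )
    return n_components


# Assumed shape: 60 chromosomes in the subgenome assignments,
# 4 differential kmers left after filtering.
n_chromosomes, n_diff_kmers = 60, 4

try:
    check_n_components(6, n_chromosomes, n_diff_kmers)  # six subgenomes requested
except ValueError as err:
    print(err)
```

With only four differential kmers, six components cannot be computed. The number of differential kmers would itself change under different settings, so this only illustrates the bound.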

dabitz commented on June 20, 2024

OK, thanks a lot for the suggestion. I finally managed to run SubPhaser on the unphased genome version and, as initially suspected, it looks pretty much like an autohexaploid except for a few chromosomes... Maybe due to introgression?
CBC_hap1k15_q200_f2.circos.pdf
CBC_hap1k15_q200_f2.LTR_Gypsy.tree.pdf
CBC_hap1k15_q200_f2.ltr.insert.density.pdf
CBC_hap1k15_q200_f2.kmer_pca.pdf
CBC_hap1k15_q200_f2.kmer.mat.pdf

zhangrengang commented on June 20, 2024

Yes, it looks like an autohexaploid. You may generate a kmer histogram and a Smudgeplot (https://github.com/KamilSJaron/smudgeplot) for cross-validation. Both plots can be generated from whole-genome HiFi reads.
Introgression is hard to assess from these results alone.
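As a rough sketch of the kmer-histogram step, the snippet below only builds the Jellyfish command lines and prints them; it does not run anything. The read file name, k=21, thread count and hash size are illustrative assumptions, and `jellyfish_commands` is a hypothetical helper. The Smudgeplot steps are left out because its command line differs between versions; see its repository for the current usage.

```python
import shlex


def jellyfish_commands(reads, k=21, threads=16, hash_size="1G", prefix="reads"):
    """Build (count, histo) Jellyfish command lines for a kmer histogram."""
    db = f"{prefix}.k{k}.jf"
    histo = f"{prefix}.k{k}.histo"
    count_cmd = [
        "jellyfish", "count",
        "-C",                # count canonical kmers (both strands together)
        "-m", str(k),        # kmer length
        "-s", hash_size,     # initial hash size
        "-t", str(threads),
        "-o", db,
        reads,               # uncompressed FASTA/FASTQ input
    ]
    histo_cmd = ["jellyfish", "histo", "-t", str(threads), "-o", histo, db]
    return count_cmd, histo_cmd


# Hypothetical input file name.
for cmd in jellyfish_commands("hifi_reads.fastq"):
    print(shlex.join(cmd))
```

The printed commands can be reviewed and adapted before being submitted to the cluster.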
