philarevalo / popcogent
Microbial Populations as Clusters Of Gene Transfer
License: GNU General Public License v3.0
Hello,
Is there a visualization tool or script you would suggest for plotting the .graphml file generated by PopCOGenT? I would like to make a figure like Figure 3 in the PopCOGenT paper.
Thank you!
Yiyuan
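One possible starting point, which is not taken from the PopCOGenT repository: GraphML is plain XML, so the network can be inspected with the Python standard library before settling on a plotting tool. The sketch below assumes a standard GraphML file; the function name load_graphml_edges and the file name in the usage note are hypothetical. networkx, which PopCOGenT already imports, also offers read_graphml and draw_networkx for drawing.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard GraphML documents.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def load_graphml_edges(source):
    """Return (nodes, edges) from a GraphML file path or open file object."""
    root = ET.parse(source).getroot()
    # Node ids in document order.
    nodes = [node.get("id") for node in root.iter(GRAPHML_NS + "node")]
    # Each edge as a (source, target) pair of node ids.
    edges = [
        (edge.get("source"), edge.get("target"))
        for edge in root.iter(GRAPHML_NS + "edge")
    ]
    return nodes, edges
```

For example, nodes, edges = load_graphml_edges("my_run.graphml") gives lists that can be passed to any plotting library or exported to a network viewer.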
Hello! I'm having an issue trying to run the core_gene_sweeps module. The error I get is:
Traceback (most recent call last):
File "phybreak2.maf_to_fasta.py", line 343, in <module>
corefile.write(">"+iso +"\n"+ full_seqdict[iso] +"\n")
KeyError: 'IRLA172'
However, that genome is present in both the 'strain_names.txt' file and the .maf alignment file. Any ideas on what might be causing this?
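A diagnostic sketch, not part of the pipeline: a KeyError like this can occur when a name looks identical in two files but differs in invisible characters or carries a suffix. The helper below is hypothetical and assumes that strain_names.txt holds one name per line and that the dictionary keys come from the alignment; it lists each strain name that is not an exact key together with keys that look related.

```python
def find_name_mismatches(strain_names, seq_keys):
    """Map each strain name absent from seq_keys to keys that look related."""
    keys = list(seq_keys)
    report = {}
    for raw in strain_names:
        if raw in keys:
            continue  # exact match, nothing to report
        name = raw.strip()
        # Keys that match once whitespace is stripped, or that embed the name
        # (for example a key carrying a contig or file suffix).
        report[raw] = [key for key in keys if name and name in key.strip()]
    return report
```

An entry with an empty list means no similar key was found at all; an entry with candidates points to a formatting difference, such as a trailing carriage return in the names file.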
Hi,
I am still having trouble running flexible_genome_sweeps (bash snakemake.sh).
After setting the configuration file and running the script I get this error:
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 align_core_genes
1 all
1 cluster_tsv_to_tidy
1 get_flex_genes
1 make_master_table
5
rule cluster_tsv_to_tidy:
input: proc/acidovorax/clusters/clusters.0.tsv
output: output/acidovorax/acidovorax.0.master_presence_absence.csv
jobid: 4
wildcards: organism=acidovorax
Error in job cluster_tsv_to_tidy while creating output file output/acidovorax/acidovorax.0.master_presence_absence.csv.
RuleException:
ValueError in line 129 of /home/rsiani/PopCOGenT-master/src/flexible_genome_sweeps/Snakefile:
Python 3 package rpy2 needs to be installed to use the R function.
File "/home/rsiani/PopCOGenT-master/src/flexible_genome_sweeps/Snakefile", line 129, in __rule_cluster_tsv_to_tidy
File "/home/rsiani/.conda/envs/PopCOGenT/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
However, I have now installed rpy2 in every way I can think of and still cannot get past this.
Any ideas?
Thanks in advance, Rob
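A hedged way to narrow this down: the Snakemake message appears to be raised whenever importing rpy2.robjects fails in the interpreter that runs the workflow, so it can show up even when rpy2 is listed as installed. That reading is an assumption and is not confirmed in this thread. The hypothetical helper below attempts the import and returns the underlying traceback instead of the generic message.

```python
import importlib
import sys
import traceback


def try_import(module_name):
    """Try to import module_name; return (ok, details) without raising."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Keep the full traceback so the real cause is visible.
        return False, traceback.format_exc()
    return True, "imported with " + sys.executable
```

Running ok, details = try_import("rpy2.robjects") and printing details inside the activated PopCOGenT environment shows either the interpreter that succeeded or the actual import failure.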
I'm running into the error pasted below. I am running PopCOGenT on 3 assemblies.
Ouput directory does not exist. Creating new directory.
Traceback (most recent call last):
File "/GWSPH/groups/liu_price_lab/tools/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Larger genome'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "cluster.py", line 315, in <module>
main()
File "cluster.py", line 70, in main
linear_model=negative_selection_linear_fit())
File "cluster.py", line 227, in make_edgefile
predict_df['Genome_size'] = trn_table['Larger genome'] / 1e6
File "/GWSPH/groups/liu_price_lab/tools/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/GWSPH/groups/liu_price_lab/tools/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Larger genome'
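One thing that may be worth checking, offered as a sketch and not as a diagnosis: the traceback shows cluster.py looking up a column named 'Larger genome', so the error would follow if the table it reads is empty or lacks that header. The helper below is hypothetical and assumes a tab-separated file with a header row; only the column name comes from the traceback.

```python
import csv


def missing_columns(lines, required, delimiter="\t"):
    """Return the required column names absent from the header row of lines."""
    # An empty input yields an empty header, so every required name is missing.
    header = next(csv.reader(lines, delimiter=delimiter), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

It accepts any iterable of lines, so an open file handle for the length bias file can be passed directly along with ["Larger genome"].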
Hello. First of all, thanks for this great tool.
I'm trying to run this script, but it raises a KeyError:
Traceback (most recent call last):
File "phybreak4.retrieveLikelihood.py", line 142, in <module>
ML_dict[subseq][str(tree_no)] += tree
KeyError: '78'
I have read that this is a problem with dictionaries, but honestly I have no idea how to solve it. Can you help me, please?
Regards.
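For background on the Python side of this error: `+=` on a dictionary entry first reads the existing value, so it raises KeyError when the key has never been set. The sketch below shows a tolerant version of that update. The function name is hypothetical, and it assumes the stored values are strings of tree text, which the traceback does not confirm. It only avoids the exception; a missing key may also mean an earlier step produced no output for tree 78, which this would mask.

```python
def append_tree(ml_dict, subseq, tree_no, tree):
    """Append tree text under ml_dict[subseq][str(tree_no)], creating keys as needed."""
    # Create the inner dictionary for this subsequence if it is absent.
    per_subseq = ml_dict.setdefault(subseq, {})
    key = str(tree_no)
    # Start from an empty string when the tree number has not been seen yet.
    per_subseq[key] = per_subseq.get(key, "") + tree
    return ml_dict
```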
I am trying to run flexible_genome_sweeps using the Ruminococcus example dataset and the default snakemake.sh and config.yaml files (with the relevant paths and file names entered). I cannot get past the following error. When I list the packages in the PopCOGenT environment (conda list -n PopCOGenT), rpy2 2.7.8 from bioconda is present. I am not sure what to try next.
Error in job cluster_tsv_to_tidy while creating output file output/Ruminococcus/Ruminococcus.0.master_presence_absence.csv.
RuleException:
ValueError in line 131 of /home/tate/PopCOGenT-master/src/flexible_genome_sweeps/Snakefile:
Python 3 package rpy2 needs to be installed to use the R function.
File "/home/tate/PopCOGenT-master/src/flexible_genome_sweeps/Snakefile", line 131, in __rule_cluster_tsv_to_tidy
File "/home/tate/miniconda3/envs/PopCOGenT/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Dear sir,
This is great work on gene flow, and I want to explore gene flow in different hosts. I ran the pipeline as described, using your example in the test directory. phybreak3.MSAsubset_runPhyML.py is still running three days later and has not reported any error. I hope you can help.
Best regards,
Yun
Any recommendations for making a figure like Figure 3 in the paper?
Can you please create a release (and thus a stable url to a tarball) so that I can try to add PopCOGenT
to bioconda. See the bioconda docs if you're wondering why I'd need a tagged release. Adding PopCOGenT
to bioconda will make it easier to install with all of its dependencies into compute environments with existing bioinfo tools and all of their complex dependency structures.
Hi Phil @philarevalo,
I ran into a problem when running "bash snakefile.sh" in flexible_genome_sweeps. The error is as follows.
Waiting at most 5 seconds for missing files.
MissingOutputException in line 88 of /disk1/cau/cvmljy/pop/PopCOGenT-master/src/flexible_genome_sweeps/Snakefile:
Missing files after 5 seconds:
proc/sulfolobus/clusters/clu.0
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Removing output files of failed job run_mmseqs since they might be corrupted:
proc/sulfolobus/clusters/DB.0
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Hello,
I am unclear on the parameter inputs.
What is the difference between input_contig_dir and contig_dir and how do these relate to the input genomes dir for PopCOGenT?
What are ref_iso and ref_contig supposed to be?
Thanks,
Roth
Hi,
I want to extract the sequences based on the output of *.core_sweeps.csv, which provides the start and end positions.
In the README, you mention:
*.core_sweeps.csv: The positions (in the coordinates of the whole genome alignment) of core genome sweeps.
I guess align/*.core.fasta is not what you mean, as it is concatenated and only contains the core genome. Besides, it has gaps. The align/*maf file seems to be the whole genome alignment, but it is not concatenated, which makes it hard to map the positions. So I was wondering whether the positions are in the coordinates of the reference genome, which is given as ref_iso in phybreak_parameters.txt.
Thanks in advance!
Best,
Xiaojun
The flexible gene sweep pipeline should have a test output to check against.
Hello, I have been trying to get PopCOGenT running with the test data. I have not been able to get past this error. Is this the right place to ask for help?
(PopCOGenT) tate@Zareason:~/PopCOGenT-master/src/PopCOGenT$ bash PopCOGenT.sh
Traceback (most recent call last):
File "get_alignment_and_length_bias.py", line 5, in <module>
from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'
Traceback (most recent call last):
File "cluster.py", line 2, in <module>
import networkx as nx
ModuleNotFoundError: No module named 'networkx'
Running the following, I can see that biopython is in the PopCOGenT env:
conda list -n PopCOGenT
I installed the PopCOGenT environment in the following manner:
conda config --set restore_free_channel true
conda env create -f PopCOGenT.yml
conda install --name PopCOGenT mugsy=1.2.3 muscle=3.8.31
(I was not able to install phyml, mmseqs2, or infomap this way and have yet to install them).
I have also entered the miniconda3 path as the location of the mugsy installation and mugsyenv.sh.
Thank you,
Suzanne
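A quick check that may help here, sketched under the assumption that the scripts are being run by a Python interpreter other than the one in the PopCOGenT environment (nothing in the post confirms this): print which interpreter is running and which of the modules named in the traceback it can find. The function name missing_modules is hypothetical.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The interpreter path shows whether the conda environment is really in use.
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(["Bio", "networkx"]))
```

If the interpreter path printed from the same shell that runs PopCOGenT.sh does not point into the PopCOGenT environment, the packages listed by conda are not the ones being imported.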
PopCOGenT should have an expected test output to check against.
Hi,
Thanks for this pipeline! I ran into the issue below during installation, and I wonder whether it could affect the installation.
conda env create -f PopCOGenT.yml
Collecting package metadata (repodata.json): done
Solving environment: done
Downloading and Extracting Packages
_openmp_mutex-4.5 | 22 KB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: \
SafetyError: The package for r-base located at /home/nico/miniconda3/pkgs/r-base-3.3.2-0
appears to be corrupted. The path 'lib/R/doc/html/packages.html'
has an incorrect size.
reported size: 2946 bytes
actual size: 11567 bytes
ClobberError: This transaction has incompatible packages due to a shared path.
packages: defaults/linux-64::libgfortran-3.0.0-1, defaults/linux-64::libgcc-7.2.0-h69d50b8_2
path: 'lib/libgfortran.so.3'
ClobberError: This transaction has incompatible packages due to a shared path.
packages: defaults/linux-64::libgfortran-3.0.0-1, defaults/linux-64::libgcc-7.2.0-h69d50b8_2
path: 'lib/libgfortran.so.3.0.0'
ClobberError: This transaction has incompatible packages due to a shared path.
packages: defaults/linux-64::openblas-0.2.19-0, defaults/linux-64::libopenblas-0.3.6-h5a2b251_2
path: 'lib/libopenblas.so'
ClobberError: This transaction has incompatible packages due to a shared path.
packages: defaults/linux-64::openblas-0.2.19-0, defaults/linux-64::libopenblas-0.3.6-h5a2b251_2
path: 'lib/libopenblas.so.0'
ClobberError: This transaction has incompatible packages due to a shared path.
packages: defaults/linux-64::libopenblas-0.3.6-h5a2b251_2, conda-forge/linux-64::libblas-3.8.0-11_openblas
path: 'lib/libblas.so'
ClobberError: This transaction has incompatible packages due to a shared path.
packages: defaults/linux-64::libopenblas-0.3.6-h5a2b251_2, conda-forge/linux-64::libcblas-3.8.0-11_openblas
path: 'lib/libcblas.so'
ClobberError: This transaction has incompatible packages due to a shared path.
packages: defaults/linux-64::libopenblas-0.3.6-h5a2b251_2, conda-forge/linux-64::liblapack-3.8.0-11_openblas
path: 'lib/liblapack.so'
done
Executing transaction: done
When I run the program, I get this:
sh PopCOGenT.sh
PopCOGenT.sh: 4: source: not found
PopCOGenT.sh: 5: source: not found
PopCOGenT.sh: 6: source: not found
usage: get_alignment_and_length_bias.py [-h] [--genome_dir GENOME_DIR]
[--genome_ext GENOME_EXT]
[--alignment_dir ALIGNMENT_DIR]
[--mugsy_path MUGSY_PATH]
[--mugsy_env MUGSY_ENV]
[--base_name BASE_NAME]
[--final_output_dir FINAL_OUTPUT_DIR]
[--num_threads NUM_THREADS]
[--keep_alignments] [--slurm]
[--script_dir SCRIPT_DIR]
[--source_path SOURCE_PATH]
get_alignment_and_length_bias.py: error: argument --genome_dir: expected one argument
usage: cluster.py [-h] [--base_name BASE_NAME]
[--length_bias_file LENGTH_BIAS_FILE]
[--clonal_cutoff CLONAL_CUTOFF]
[--output_directory OUTPUT_DIRECTORY]
[--infomap_args INFOMAP_ARGS] [--infomap_path INFOMAP_PATH]
[--single_cell]
cluster.py: error: argument --base_name: expected one argument
Here is the config file:
Base name for final output files; just a prefix to identify your outputs.
base_name='MAGspop'
final_output_dir=/home/nico/programmes/PopCOGenT-master/output/
mkdir -p ${final_output_dir}
mugsy_path=/home/nico/miniconda3/envs/PopCOGenT/bin/mugsy
mugsy_env=/home/nico/miniconda3/envs/PopCOGenT/bin/mugsyenv.sh
infomap_path=/home/nico/programmes/PopCOGenT-master/Infomap
genome_dir=/media/nico/MyBook/test/
genome_ext=.fasta
num_threads=10
keep_alignments=--keep_alignments
alignment_dir=/home/nico/programmes/PopCOGenT-master/output/proc/
mkdir -p ${alignment_dir}
single_cell=''
slurm_str=''
script_dir=''
source_path=''
I probably have done something wrong, not sure where...
Thanks for your help!
Hello!
Would love to try PopCOGenT but ran into the error below getting started. I tried adding channels with the indicated joblib version to the yml but that didn't do the trick. Any suggestions? Thank you very much!
[k6logc@eofe7 PopCOGenT]$ conda env create -f PopCOGenT.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
Hi Phil, thanks a lot for providing this tool.
But I'm running into the error pasted below.
Traceback (most recent call last):
File "get_alignment_and_length_bias.py", line 166, in <module>
main()
File "get_alignment_and_length_bias.py", line 89, in main
args.keep_alignments)
File "get_alignment_and_length_bias.py", line 142, in run_on_single_machine
renamed_genomes = [rename_for_mugsy(g) for g in glob.glob(genome_directory + '' + genome_extension)]
File "get_alignment_and_length_bias.py", line 142, in <listcomp>
renamed_genomes = [rename_for_mugsy(g) for g in glob.glob(genome_directory + '' + genome_extension)]
File "/home/zhang/Documents/PopCOGenT-master/src/PopCOGenT/length_bias_functions.py", line 45, in rename_for_mugsy
s.id = '{id}_{_num}'.format(id=mugsy_name, contig_num=str(i))
KeyError: '_num'
Traceback (most recent call last):
File "/home/zhang/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'Larger genome'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "cluster.py", line 315, in <module>
main()
File "cluster.py", line 70, in main
linear_model=negative_selection_linear_fit())
File "cluster.py", line 227, in make_edgefile
predict_df['Genome_size'] = trn_table['Larger genome'] / 1e6
File "/home/zhang/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/pandas/core/frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "/home/zhang/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "/home/zhang/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "/home/zhang/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/pandas/core/internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "/home/zhang/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/pandas/indexes/base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'Larger genome'
I read an earlier issue report about this, but the suggested fix only caused another runtime error: the first one above, KeyError: '_num'.
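On the first traceback: str.format raises KeyError when a named placeholder in the template has no matching keyword argument. In the line shown, the template contains {_num} while the keyword passed is contig_num. The sketch below reproduces the failure and shows a matching template. Both function names are hypothetical, and what the unmodified upstream file contains is an assumption.

```python
def contig_id(mugsy_name, index):
    """Build '<name>_<index>' with a placeholder that matches the keyword."""
    return "{id}_{contig_num}".format(id=mugsy_name, contig_num=str(index))


def broken_contig_id(mugsy_name, index):
    """Reproduce the failure: '{_num}' has no matching keyword argument."""
    return "{id}_{_num}".format(id=mugsy_name, contig_num=str(index))
```

Calling broken_contig_id raises KeyError: '_num', the same error as in the traceback, while contig_id returns the joined name.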
Hi,
I tried to find core gene sweeps in the test data using PopCOGenT, but I got an error when running phybreak1.generate_maf.py:
sh: 1: source: not found
12 genomes
Starting Nucmer: Tue Feb 2 21:05:42 CST 2021
sh: 1: cannot create ./align/sulpho.M1627_contigs.queries.fsa: Directory nonexistent
.sh: 1: cannot create ./align/sulpho.M1627_contigs.filt.delta: Directory nonexistent
Nucmer search failed. Can't find delta file ./align/sulpho.M1627_contigs.filt.delta at /home/lenovo/software/mugsy_x86-64-v1r2.2/mugsy line 812.
I have uploaded the result files of bash PopCOGenT.sh, the log file from running phybreak1.generate_maf.py, and phybreak_parameters.txt. Could you please give me any suggestions? Thanks.
strain_names.txt
phybreak_parameters.txt
sulfolobus_0.000355362.txt.cluster.tab.txt
sulfolobus.length_bias.txt
Best
Hao
Hi,
thanks a lot for providing this excellent software, very useful!
I’d have a few questions about the core_gene_sweeps module. Hope it’s ok that I put them together in one issue here.
1) Empty directory project_dir/align/trees/
After running scripts 1-7 I noticed this directory was empty, and I was wondering if maybe some script did not finish?
The trees are present in a file project_dir/align/phy_split/output_prefix.phy_phyml_tree.txt though, so I am not sure if anything is missing.
2) Re-running core_gene_sweeps with changed focus population
Under usage it says:
For each population for which you wish to find sweeps, change the focus_population parameter and re-run scripts 3-7.
As script 3 (calculating the trees) was rather time-consuming, I was wondering whether it is necessary to run script 3 again, or whether the trees from the previous run could be reused.
3) Generating output as given in Figure 5 (B and C) and Figure 6 in the paper
Are you planning to include scripts that generate this output (i.e. between-population Pi in the sweep regions, Fst values, SNPs in a sliding window, and trees for sweep and flanking regions) in the future? Or can some of this already be extracted directly from the output and I missed it?
4) Very minor: If I'm not mistaken, script 4 writes phybreak.leafdist_compare.R into the project_dir but then calls it from the working directory (PopCOGenT/src/core_gene_sweeps/).
Thank you!
Matthias
Hello,
I recently read your article about this tool and find it very interesting. However, I have a question about the expected inputs, because the publication mostly describes applying it to SAGs or genomes from single cells. Would it also be acceptable to include MAGs in the analysis, given that a MAG itself represents a set of population genomes? If so, which ones would be acceptable? Some groups work with medium-quality MAGs (completeness estimates of ≥50% and less than 10% contamination), while others work only with high-quality MAGs (>90% complete with less than 5% contamination).
Hi, recently I came across an article about using PopCOGenT. When I tried to use it, I followed the installation command in the README. Unfortunately, conda informed me that the corresponding version mentioned in the yml file is no longer available.
Looking for: ['biopython=1.68', 'joblib=0.9.4', 'networkx=1.11', 'numpy=1.11.3', 'pandas=0.19.2', 'python=3.6', 'scipy=0.18.1', 'statsmodels=0.8.0', 'snakemake=3.11.2', 'rpy2=2.8.5', 'r-tidyverse==1.0.0=r3.3.2_0']
Encountered problems while solving:
- package pandas-0.19.2-np112py36_1 is excluded by strict repo priority
- package scipy-0.18.1-np112py36_blas_openblas_201 is excluded by strict repo priority
- package biopython-1.68-py35_0 is excluded by strict repo priority
- package joblib-0.9.4-py36_0 is excluded by strict repo priority
- package r-tidyverse-1.0.0-r3.3.2_0 requires r-base 3.3.2*, but none of the providers can be installed
- package rpy2-2.8.5-py27r3.3.1_2 requires r-base 3.3.1*, but none of the providers can be installed
When I run phybreak4.retrieveLikelihood.py, I get the following:
(PopCOGenT) connolly_j_husky_neu_edu@popcogent:/extra-space/home/PopCOGenT/src/core_gene_sweeps$ python3 phybreak4.retrieveLikelihood.py
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
/home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/R: /home/connolly_j_husky_neu_edu/.conda/envs/PopCOGenT/lib/R/bin/exec/../../lib/../../libtinfo.so.6: no version information available (required by /lib/x86_64-linux-gnu/libncursesw.so.5)
Fatal error: cannot open file 'phybreak.leafdist_compare.R': No such file or directory
It looks like some kind of versioning problem, but I haven't been able to resolve it. Any advice?
I'm looking at the mugsy output for M1612_contigs and M1613_contigs from the test files. However, I found that some aligned blocks do not match perfectly:
a score=933 label=85 mult=2
s M1612_contigs.M1612_contigs_0 889370 931 + 1691529 GAATTATACAAAAATTTATAAATAATTATATCAAATATTACCCATGGGAAAGAGTAAGTACAAGAGGGATTGGAGCAAATACGATGAGAACGTTATAATGAGATATACCCTAATGTTCCCCTTCTACGTCTTTGAACACTGGTTTACTAGCAGAGGAGAATAGGAACGCTAGGGCAAAGTATAAAGCTCCAAAGGAATTTAACGAATTCCTCCACACCTACCCTATAGGGCCATAGAAGGAGAGCACTAGAAAGACTAAAGATCATCACAACAAGCCTAGACTACTCAACAATATGGGAAAGAATAAGAAACATGAACATAACATTCCCAGAGGCAAGTGATGAACTTGAAGCAGACGCAACGGGAATAAACAAGAGAGGACAATAGCAAAATGGGGTAAAACTAGAGACTCAAAATTCCTCAAGATGGACAAGGACGAATTCAACGTAATAAACGCTGAAGTAATTAGCAACGAAGTTAAGACGGTTAAGGATTCACAAGATAAGGGAAAGAAGGTTTTATGGGGATAAGGCTTATGATACCAACGAGGCTGGAGTTGAGGTTGTTGTCCCACCTAGGAAGAACGCTTCTACTAAACGCAGTCATCCTGCTAGGCTGTGAGGGAGTTCAAGAAACTTGGCTATAATCGTTGGAGGGAGGAGAAGGGTTATGGTGTTAGGTGGAGGGTTGAGTCCTTGTTTTCTGCTGTTAACTTTTGGGGAGTCTGTTAGGGCTACAAGTTTTTTAAGGCAAGTGGTTGAGGCCAAGTTCTGGGCTTATGCATGGATGGTCCACTTGGCTGTAGTCGATAGGGCTCACGGTATTAGGATGTGAGCTTGAGAATAACGTTGAAATAAATATTAATTACTGAAAAATTCTCC-TTATGTCG-TATCATGCTTATGAAATAAATTGAAGATATCAACAAAGCAAC
s M1613_contigs.M1613_contigs_0 48715 96 + 1741614 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------GGGTATTAGGGTGTGAGCTTGCGAATAACGTTGAAATAAATATTAATTACTGAAAGATT-TCCGTTAT-ACGATATCGTTTTAATGAAATAAATTGAA-----------------
However, PopCOGenT does not recognize the mismatches and just treats the block as one whole sequence.
PopCOGenT/src/PopCOGenT/length_bias_functions.py
Lines 247 to 264 in 7296af9
Is this expected behavior?
Thanks!
Hi,
I have been using your script on my data but an error occurred using flexible_genome_sweeps:
Waiting at most 5 seconds for missing files.
Error in job run_mmseqs while creating output files proc/acidovorax/clusters/DB.0, proc/acidovorax/clusters/clu.0.
MissingOutputException in line 88 of /home/rsiani/PopCOGenT-master/src/flexible_genome_sweeps/Snakefile:
Missing files after 5 seconds:
proc/acidovorax/clusters/clu.0
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Removing output files of failed job run_mmseqs since they might be corrupted:
proc/acidovorax/clusters/DB.0
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Have you got any idea of what could have caused it? Thanks in advance and congratulations for your work!
Need to make sure that an easy fix doesn't also break the clonal clustering portion of the pipeline.
Hi Team,
I am facing the following issue [file not found error] while executing the PopCOGenT tool on around 361 gene sequences. I was able to run the tool on the test data.
Context on the input data:
We are using sequences of a single gene from hundreds of isolates of the same genus. The sequence length of each gene is around 400 bp. Can this program be applied to single genes, or does it have to be used with draft genomes?
Thanks in advance,
Balu
Hi,
We are currently running the core_gene_sweeps module and are stuck at the third step, which runs phyml. Do you have any suggestions for making this step run faster, for example by increasing the number of threads or running the datasets in parallel?
Our dataset includes 53 genomes, and according to phyml stats file, the number of datasets to run through is 108,133. So at the current pace, the script would take more than 20 days to complete running. Below is the command being run,
~/.local/bin/phyml -i PopCOGenT/src/core_gene_sweeps/output2align/phy_split/bovienii_core_genes.phy -n 108133 -q -m JC69 -f e -c 2 -a 0.022
Thank you,
Bhavya
I have run PopCOGenT.sh successfully before, but it failed when I used another dataset. The error message is:
Traceback (most recent call last):
File "cluster.py", line 315, in <module>
main()
File "cluster.py", line 70, in main
linear_model=negative_selection_linear_fit())
File "cluster.py", line 298, in make_edgefile
n2 = ','.join(clonal_components[n2])
TypeError: sequence item 0: expected str instance, numpy.int64 found
Are there any special requirements for the input FASTA file names?
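On the Python side, str.join accepts only strings, so the message means the list being joined held integers. A plausible cause, stated here as an assumption, is that genome names consisting only of digits were parsed as numbers when the table was read. The hypothetical helper below shows a join that converts each item first; it illustrates the error and is not a confirmed fix for cluster.py.

```python
def join_names(names):
    """Join genome names with commas, converting non-string items first."""
    return ",".join(str(name) for name in names)
```

If numeric names are indeed the cause, renaming the input files so they contain at least one letter should also avoid the problem without touching the code.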
Hi,
thank you for providing this package. I am excited to use it.
Do you intend to provide a license, and perhaps a contributing guide?
Kind regards,
V
Hi there,
Thank you for making this great tool. I'm running PopCOGenT on 190 bacterial genomes and got the error below. The complete log file is attached: PopCOGenT.Gilliamella.log.
I was able to run PopCOGenT on the test dataset and another dataset with 80 genomes without any problem.
Thank you for any suggestions!
Yiyuan
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/yli/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py", line 130, in __call__
return self.func(*args, **kwargs)
File "/home/yli/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/yli/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py", line 72, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/User/yli/PopCoGenT/PopCOGenT/src/PopCOGenT/length_bias_functions.py", line 20, in align_and_calculate_length_bias
random_seed)
File "/User/yli/PopCoGenT/PopCOGenT/src/PopCOGenT/length_bias_functions.py", line 90, in align_genomes
remove('{align_directory}/{prefix}'.format(prefix=prefix, align_directory=alignment_dir))
FileNotFoundError: [Errno 2] No such file or directory: '/User/yli/Startover/step10_PopCOGenT/Gilliamella/proc//bl26xmXEZE6LhLqtpVeqoPmYmSCzNPVO'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/yli/miniconda3/envs/PopCOGenT/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/yli/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py", line 140, in __call__
raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
FileNotFoundError Sat May 30 00:30:33 2020
PID: 41436 Python 3.6.10: /home/yli/miniconda3/envs/PopCOGenT/bin/python
...........................................................................
/home/yli/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
67 def __init__(self, iterator_slice):
68 self.items = list(iterator_slice)
69 self._size = len(self.items)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
self.items = [(, ('/User/yli/S...ella/Gilliamella_zhB3022_Acerana.fa.renamed.mugsy', '/User/yli/S...liamella_zhP0221M0141_Amellifera.fa.renamed.mugsy', '/User/yli/Startover/step10_PopCOGenT/Gilliamella/
73
74 def __len__(self):
75 return self._size
76
Hi,
I ran into trouble when running python get_alignment_and_length_bias.py for about 180 genomes.
The following is the error:
.Parsing sequences for R2MyF9PMoHcjJAH9 multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home-user/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py", line 130, in __call__
return self.func(*args, **kwargs)
File "/home-user/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home-user/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py", line 72, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/mnt/home-user/software/PopCOGenT/src/PopCOGenT/length_bias_functions.py", line 26, in align_and_calculate_length_bias
length_bias_file)
File "/mnt/home-user/software/PopCOGenT/src/PopCOGenT/length_bias_functions.py", line 110, in calculate_length_bias
g2size)
File "/mnt/home-user/software/PopCOGenT/src/PopCOGenT/length_bias_functions.py", line 131, in get_transfer_measurement
s1temp, s2temp = zip(*filtered_blocks)
ValueError: not enough values to unpack (expected 2, got 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home-user/miniconda3/envs/PopCOGenT/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home-user/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py", line 140, in __call__
raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
ValueError Wed Jul 7 14:32:25 2021
PID: 302719 Python 3.6.13: /home-user/miniconda3/envs/PopCOGenT/bin/python
...........................................................................
/home-user/miniconda3/envs/PopCOGenT/lib/python3.6/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
I requested 16 threads for this job. It works on other datasets with fewer than 100 genomes, so I am not sure whether the number of genomes matters.
Could you please give me some suggestions?
Thanks in advance!
Xiaojun
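A note on the ValueError in the traceback above: it is raised at `s1temp, s2temp = zip(*filtered_blocks)`, which is what Python does when `filtered_blocks` is an empty list. The sketch below only reproduces that behaviour and shows one possible guard. The function names are hypothetical and not part of PopCOGenT, and the thread does not say whether returning empty results for such a genome pair is the right handling.

```python
def unpack_fails_on_empty():
    """Reproduce the error message seen in the traceback."""
    try:
        # zip() with no arguments yields nothing, so there is nothing
        # to assign to the two names on the left-hand side.
        s1temp, s2temp = zip(*[])
    except ValueError as err:
        return str(err)
    return None


def split_block_pairs(filtered_blocks):
    """Split (s1, s2) pairs into two tuples, tolerating an empty list.

    Hypothetical helper: an assumption for illustration only.
    """
    if not filtered_blocks:
        # No blocks survived filtering for this genome pair.
        return (), ()
    s1temp, s2temp = zip(*filtered_blocks)
    return s1temp, s2temp
```

If this is the cause, the failing run would contain at least one genome pair whose alignment leaves no blocks after filtering, which may be more likely to occur in a larger dataset.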
Hi, PopCOGenT is excellent software and we have used it to resolve many questions.
However, one situation we have run into is that our genome set has grown gradually.
When the genome set was small (~500 genomes), we resolved the genomes into different populations. For example, genomes A, B, and C all belonged to pop1.
When the genome set was larger (~1000 genomes), we resolved them again. This time, genome A was in pop2, while genomes B and C were still in pop1.
This is the background to my following questions.
1. alpha=0.1 is used in the method summary_frame, but the intervals in length_bias.txt should be 95% CIs. Accordingly, should we set alpha=0.05?
2. The negative selection cutoff changes as the number of genomes changes. Is this normal, or how could we deal with it?
Hi,
The following is the error information with my data.
Error in job parse_orfs while creating output file output/Mabs/Mabs.0.orfs.csv.
RuleException:
CalledProcessError in line 86 of /home/dragon/Database/python3/PopCOGenT/src/flexible_genome_sweeps/Snakefile:
Command '/home/dragon/Database/python3/miniconda3/envs/PopCOGenT/bin/python /home/dragon/Database/python3/PopCOGenT/src/flexible_genome_sweeps/.snakemake.1730bs5o.parse_orfs.py' returned non-zero exit status 1.
File "/home/dragon/Database/python3/PopCOGenT/src/flexible_genome_sweeps/Snakefile", line 86, in __rule_parse_orfs
File "/home/dragon/Database/python3/miniconda3/envs/PopCOGenT/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
I noticed the bug was from this line:
strain, contig, orf = re.match(r"(.*)([^_]+)(\d+)$", strain_contig_orf).groups()
Is there a special rule for naming contigs?
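For anyone debugging the same line: when `re.match` finds no match it returns `None`, and calling `.groups()` on that raises an AttributeError. The sketch below wraps the call so the offending ORF name is reported. `QUOTED` is the pattern exactly as it appears in the post. `ASSUMED` is a guess that the fields are separated by underscores (which would make the name format `strain_contig_orfnumber`) and is not confirmed by the thread. `parse_orf_name` is a hypothetical helper, not a PopCOGenT function.

```python
import re

# Pattern exactly as quoted in the post above.
QUOTED = r"(.*)([^_]+)(\d+)$"
# Assumption: a variant with explicit underscore separators.
ASSUMED = r"(.*)_([^_]+)_(\d+)$"


def parse_orf_name(name, pattern=ASSUMED):
    """Return (strain, contig, orf) or raise a readable error."""
    match = re.match(pattern, name)
    if match is None:
        # Without this check, match.groups() raises AttributeError.
        raise ValueError(f"ORF name {name!r} does not match {pattern!r}")
    return match.groups()
```

Under the assumed pattern, `parse_orf_name("strainA_contig1_12")` returns `('strainA', 'contig1', '12')`, while a name that does not end in an underscore followed by digits raises the ValueError. The quoted pattern also matches that name but splits it as `('strainA_contig1_', '1', '2')`.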