Comments (2)
Hi Nicolas,
Thanks for using it!
PIRATE has been successfully used on very large datasets (>25,000 genomes). I would suggest that you:
1/ Check the genomes for quality. One poor quality genome can have a detrimental effect on the clustering and especially the paralog identification and classification.
2/ Start with a much smaller subset of your most diverse samples so that you can pick a range of thresholds (--steps) that accurately captures the diversity in your collection. You could also experiment with inflation values here to ensure sensible clusters are produced. I am afraid I don't have any tips for selecting an MCL inflation value for you :(
3/ Don't run it with gene alignment; it will take ages to finish, and alignment can be run separately or on genes of interest afterwards.
4/ You can also run it with paralog detection off (--para-off) on the initial run as this can take a long time to complete. It can then be rerun with paralog detection only, using the --pan-off option, once it has finished clustering at least once. You WILL need to keep intermediate files on each run for this to work (-z 2). I would test the workflow on a smaller subset so that you don't put the wrong options in on your full set and remove intermediate files or have to reprocess everything :)
5/ Throw as many cores as you can at it.
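To make points 2/ to 5/ concrete, here is a rough sketch of the two-pass workflow as a small Python helper that only builds the command lines and does not run anything. The options --steps, --para-off, --pan-off and -z 2 are the ones mentioned above; the -i, -o and -t flags, the comma-separated format for --steps, the threshold values and the directory names are assumptions for illustration, so check them against PIRATE's help output before using them. No gene alignment option is passed, in line with point 3/.

```python
import shlex
from typing import List, Sequence


def build_pirate_command(gff_dir: str, out_dir: str, threads: int,
                         steps: Sequence[int], stage: str) -> List[str]:
    """Build (but do not run) a PIRATE argument list for one pass.

    stage="cluster"  -> first pass, paralog detection switched off
    stage="paralogs" -> second pass, reusing the existing clustering
    """
    if stage not in ("cluster", "paralogs"):
        raise ValueError("stage must be 'cluster' or 'paralogs'")

    cmd = [
        "PIRATE",
        "-i", gff_dir,        # assumed flag: input directory of annotations
        "-o", out_dir,        # assumed flag: output directory
        "-t", str(threads),   # assumed flag: number of cores
        # assumed format: comma-separated list of thresholds
        "--steps", ",".join(str(s) for s in steps),
        "-z", "2",            # keep intermediate files on every run
    ]
    # First pass skips paralog detection; second pass skips re-clustering.
    cmd.append("--para-off" if stage == "cluster" else "--pan-off")
    return cmd


if __name__ == "__main__":
    # Placeholder subset directory and thresholds; tune these on a small,
    # diverse subset before moving to the full collection.
    for stage in ("cluster", "paralogs"):
        argv = build_pirate_command("subset_gffs", "pirate_subset", 8,
                                    [50, 70, 90, 95], stage)
        print(shlex.join(argv))
```

Printing the commands first and running them on the small subset makes it easier to spot a wrong option before committing to the full set.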
I hope that helps,
S
Hi Sion
Thanks for the quick input! I will let you know how it works for me.