
panaroo's Introduction

panaroo


An updated pipeline for pangenome investigation


Documentation

Documentation for Panaroo can be found here

panaroo's People

Contributors

danderson123, gtonkinhill, johnlees, milnus, nzmacalasdair, pansapiens, samlipworth


panaroo's Issues

Make output identical to roary where possible

To make it as easy as possible for people to replace roary in their pipelines, having the same output (in addition to/instead of/with a --roary flag) would be nice.

I think it's just the .Rtab file that's different in the current versions?

Gene coding table

Panaroo is a great approach and I am looking forward to using it. At the moment it lacks a parameter to set the genetic code (translation table) to 11 or 4, depending on the organism studied. Otherwise I run into the following error:

 File "/lib/python3.6/site-packages/panaroo/prokka.py", line 161, in translate_sequences
    raise ValueError("Premature stop codon in a gene!")
ValueError: Premature stop codon in a gene!
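The error is consistent with a genome that uses an alternative genetic code: in NCBI translation table 4 (used by Mycoplasma and Spiroplasma), TGA encodes tryptophan instead of stop, so translating such genes with table 11 produces apparent internal stop codons. The sketch below is not Panaroo's code; it is a minimal, hypothetical illustration (the names `translate`, `has_premature_stop` and `CODON_TABLES` are made up) of how the choice of table decides whether a stop codon counts as premature.

```python
# Codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG (NCBI "TCAG" order)
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Amino acids in the same codon order; "*" marks a stop codon.
# Tables 11 and 4 differ in amino-acid assignment only at TGA (stop vs Trp).
AA_TABLE_11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
AA_TABLE_4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLES = {
    11: dict(zip(CODONS, AA_TABLE_11)),
    4: dict(zip(CODONS, AA_TABLE_4)),
}


def translate(dna, table=11):
    """Translate a CDS codon by codon; codons with ambiguous bases become 'X'."""
    code = CODON_TABLES[table]
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def has_premature_stop(dna, table=11):
    """True if a stop codon appears anywhere before the final codon."""
    return "*" in translate(dna, table)[:-1]
```

For example, `translate("ATGTGAGCTTAA", table=11)` gives `"M*A*"` (an internal stop), while `table=4` gives `"MWA*"`.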

Error in 'generate network'

When generating the network from the CD-HIT clustering, the function raises a KeyError:

Total CPU time 95.36
Traceback (most recent call last):
  File "../squeaky/squeaky.py", line 180, in <module>
    main()
  File "../squeaky/squeaky.py", line 136, in main
    split_paralogs=args.split_paralogs)
  File "/storage/3632-6230/Android/data/com.termux/files/squeaky/squeaky/generate_network.py", line 60, in generate_network
    current_cluster = seq_to_cluster[id]
KeyError: '0_0_4'
I am up to date with origin/master. Is there a fix that has not been pushed to master yet?
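The traceback shows that `seq_to_cluster` has no entry for the sequence ID `'0_0_4'`. One way to narrow this down, sketched here under the assumption that the mapping is built from CD-HIT's `.clstr` output (cluster headers like `>Cluster 0`, member lines like `0	330aa, >0_0_4... *`), is to parse that file yourself and list which expected IDs are absent. Both helper names are hypothetical.

```python
def parse_clstr(lines):
    """Map each sequence ID in a CD-HIT .clstr file to its cluster number."""
    seq_to_cluster = {}
    cluster = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            cluster = int(line.split()[1])
        else:
            # Member line: "<index>\t<length>, ><seq_id>... <* or identity>"
            seq_id = line.split(">", 1)[1].split("...", 1)[0]
            seq_to_cluster[seq_id] = cluster
    return seq_to_cluster


def find_missing(seq_ids, seq_to_cluster):
    """Return the IDs that have no cluster assignment, in input order."""
    return [seq_id for seq_id in seq_ids if seq_id not in seq_to_cluster]
```

For example, `parse_clstr(open("combined_protein_cdhit_out.txt.clstr"))` returns a mapping that can be compared with the IDs in the input FASTA via `find_missing`.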

Errno 24 when running panaroo

Hi, I get the error message below when trying to run panaroo:

panaroo -i *.gff -o results
error: argument -i/--input: can't open 'E1042_BEL_2011.gff': [Errno 24] Too many open files: 'E1042_BEL_2011.gff'
I have deleted this file and get the same error with the next file in the list. The files are all of similar size. I also get the same error when running panaroo-qc:

plot_panaroo_qc: error: argument -i/--input: can't open 'E1040_BEL_2011.gff': [Errno 24] Too many open files: 'E1040_BEL_2011.gff'

Any suggestions?
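The message format matches what Python's `argparse.FileType` reports when it cannot open a file, which suggests the input paths are opened while the command line is parsed and stay open, so a large `*.gff` glob can exhaust the per-process open-file limit. Two hedged workarounds: raise the soft limit in the shell before running (for example `ulimit -n 4096`, if the hard limit allows it), or, on the code side, accept plain path strings and open each file only when it is read. A minimal sketch of the latter, with a hypothetical parser:

```python
import argparse
import resource  # Unix-only module


def build_parser():
    """Hypothetical parser that stores input paths as strings, not open files."""
    parser = argparse.ArgumentParser(prog="example")
    # type=str: nothing is opened while the command line is parsed, unlike
    # argparse.FileType, which returns an already-open file for every path.
    parser.add_argument("-i", "--input", nargs="+", required=True, type=str)
    parser.add_argument("-o", "--output", required=True)
    return parser


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Each path in `args.input` can then be opened with a `with open(path) as fh:` block, so only one input file is open at a time.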

Temp file in current working directory

On the devel branch, invoking panaroo creates a temporary file (temp_clusters_dna.txt) in the current working directory. This may cause a problem if you don't have write permission for that directory.
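A minimal sketch, assuming the scratch file only needs to exist for the duration of one step: create it inside a private temporary directory under the output directory (which has to be writable anyway) instead of the current working directory. `temp_file_in` is a hypothetical helper built on `tempfile.TemporaryDirectory`, which deletes the directory and its contents on exit.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_file_in(out_dir, name="temp_clusters_dna.txt"):
    """Yield a path for a scratch file under out_dir; it is removed afterwards."""
    # A fresh, uniquely named subdirectory avoids clashes between parallel runs.
    with tempfile.TemporaryDirectory(dir=out_dir) as tmp_dir:
        yield os.path.join(tmp_dir, name)
```

Usage would look like `with temp_file_in(output_dir) as path: ...`, writing and reading the cluster file inside the block.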

algorithm to bring sanity to node names

Panaroo node names are unwieldy and not very useful. I propose a method similar to the code below to fix them. It changes names like so:

oqxB7_yofA_1__mdtB_2_hdfR_5_ttgI_4 -> oqxB7
hcaR_1_benM_2_hdfR_4_hdfR_9_hdfR_8_hdfR_5_hdfR_1_benM_1_hdfR_3_benM_5 -> hcaR_9
baeR_1__baeR_2_barA_2_rcsC_13_baeR_barA -> baeR_2
hdfR_4_hcaR_2_hdfR_1_hdfR_8_hcaR_3_hcaR_1_hdfR_7_benM_6_benM_1_hcaR_4_hdfR_10 -> hdfR_13

def simplify_node_names(G, m):
    """Shorten compound node names in graph G, writing 'old<TAB>new' lines to file handle m."""
    # Every name already in use, so a new name never collides with an existing one
    names = [data['name'] for _, data in G.nodes(data=True)]
    for _, data in G.nodes(data=True):
        name = data['name']
        parts = name.split('_')
        # Only rename compound names: more than one underscore, or a single
        # underscore that is not just a copy-number suffix such as "_2"
        if name.count('_') > 1 or (name.count('_') == 1 and not parts[-1].isdigit()):
            t1 = parts[0]               # first gene name, e.g. "oqxB7"
            t2 = '_'.join(parts[:2])    # first gene name plus its suffix
            if parts[-1].isdigit():
                t3 = '_'.join(parts[-2:])  # last gene name with its copy number
            else:
                t3 = parts[-1]
            # Fallback: number the first gene name, e.g. "hcaR_9"
            fallbacks = [t1 + '_' + str(x) for x in range(1, 200)]
            for candidate in [t1, t2, t3] + fallbacks:
                if candidate not in names:
                    m.write('\t'.join([name, candidate]) + '\n')
                    data['name'] = candidate
                    names.append(candidate)
                    break  # stop at the first free name

mash absent from bioconda recipe

Hello,

I was trying to run panaroo-qc, and it fails because it can't find mash. It works fine after installing mash, so mash is probably missing from the bioconda recipe.

Thanks!
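Until the recipe lists mash as a dependency, a pre-flight check gives a clearer error than a failure deep inside the QC step. A minimal sketch using `shutil.which`; the function names are hypothetical and this is not panaroo-qc's actual code.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the external programs in `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def require_tools(tools):
    """Exit with a readable message instead of failing later in a subprocess call."""
    missing = missing_tools(tools)
    if missing:
        sys.exit("Required tool(s) not found on PATH: " + ", ".join(missing))
```

Calling `require_tools(["mash"])` at the start of the QC command would stop immediately with a message naming the missing program.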

Error in collapse mistranslations

I am getting the following error:
collapse mistranslations...
Traceback (most recent call last):
  File "/home/giga/anaconda3/bin/panaroo", line 11, in <module>
    load_entry_point('panaroo==0.1.0', 'console_scripts', 'panaroo')()

Headers in gene_presence_absence_roary

The headers in gene_presence_absence_roary.csv are each named contig rather than being named after the input GFF file, which breaks some functionality when the file is used in place of Roary's output.
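A minimal sketch of deriving one column header per input file from its file name, assuming the desired header is the GFF file name without its directory and extension (for example `E1042_BEL_2011.gff` becomes `E1042_BEL_2011`). `sample_name` is a hypothetical helper.

```python
import os


def sample_name(gff_path):
    """Column header for one input: the file name without directory or extension."""
    return os.path.splitext(os.path.basename(gff_path))[0]
```

Applied to every input path in order, this gives one header per genome column.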

--merge_paralogs option as a post-processing step?

Is it possible to have panaroo merge paralogs in a separate post-processing step if the graph is already available? I see you have made such a setup for alignment, so I'm trying to find a way to implement this option in a similar way.

Also, I'm getting a ValueError on line 14 when I try the --merge_paralogs option:

if check_merge_mems:
    if len(G.nodes[nodeA]['members']
           & G.nodes[nodeB]['members']) > 0:
        raise ValueError("merging nodes with the same genome IDs!")

I think it detects nodes with the same IDs in the graph output, but I'm not sure how to fix this issue, or why this should raise an error in the first place. I'd appreciate it if you could help. Thanks!
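The error message suggests that `members` holds the set of genome IDs contributing genes to each node. Under that assumption, a non-empty intersection means the merged node would contain two genes from the same genome, i.e. a paralogous pair, which is presumably what the check is meant to catch during ordinary merging. The sketch reproduces only the test itself, with a plain dict standing in for `G.nodes`; `shares_genomes` is a hypothetical name.

```python
def shares_genomes(nodes, node_a, node_b):
    """True if the two nodes contain genes from at least one common genome."""
    # Set intersection: genome IDs present in both nodes' member sets
    return len(nodes[node_a]["members"] & nodes[node_b]["members"]) > 0


# Toy stand-in for G.nodes: node name -> attribute dict with a 'members' set
nodes = {
    "A": {"members": {0, 3}},
    "B": {"members": {3, 5}},
    "C": {"members": {1}},
}
```

Here `shares_genomes(nodes, "A", "B")` is true because genome 3 occurs in both nodes, so merging them would trigger the ValueError; merging "A" and "C" would not.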

Typo: triming

  --trailing_recursive TRAILING_RECURSIVE
                        number of times to perform recursive triming of low
                        support nodes near the end of contigs

panaroo-spydrpick

I have run the main part of panaroo without issue but when trying to run panaroo-spydrpick I get an error:

panaroo-spydrpick -i gene_presence_absence.csv -o coevo/ --tree ../fastas/mashtree
Traceback (most recent call last):
  File "/well/bag/users/lipworth/miniconda3/envs/roary/bin/panaroo-spydrpick", line 10, in <module>
    sys.exit(main())
  File "/well/bag/users/lipworth/miniconda3/envs/roary/lib/python3.7/site-packages/panaroo/spydrpick.py", line 246, in main
    pa_matrix, gene_names, sample_names = read_presence_absence(args.pa_file)
  File "/well/bag/users/lipworth/miniconda3/envs/roary/lib/python3.7/site-packages/panaroo/spydrpick.py", line 12, in read_presence_absence
    matrix_txt = np.loadtxt(filename, dtype=str, delimiter=",", comments=None)
  File "/well/bag/users/lipworth/miniconda3/envs/roary/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1159, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/well/bag/users/lipworth/miniconda3/envs/roary/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1084, in read_data
    % line_num)
ValueError: Wrong number of columns at line 2
I thought maybe this was due to the format of the CSV file, so I also tried the gene_presence_absence_roary.csv file:

panaroo-spydrpick -i gene_presence_absence_roary.csv -o coevo/ --tree ../fastas/mashtree
[False False False ... True True True]
881
[881 881 881 ... 1 1 1]
Traceback (most recent call last):
  File "/well/bag/users/lipworth/miniconda3/envs/roary/bin/panaroo-spydrpick", line 10, in <module>
    sys.exit(main())
  File "/well/bag/users/lipworth/miniconda3/envs/roary/lib/python3.7/site-packages/panaroo/spydrpick.py", line 259, in main
    chunk_size=100)
  File "/well/bag/users/lipworth/miniconda3/envs/roary/lib/python3.7/site-packages/panaroo/spydrpick.py", line 117, in spydrpick
    mi = mi_00 * (np.log(mi_00) - np.log(mi_1) - np.log(mi_0))
ValueError: operands could not be broadcast together with shapes (100,37990) (37990,1)

Any idea how I can get this to work? Happy to share the data.
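One possible cause of the first error (not confirmed by the traceback): `np.loadtxt(..., delimiter=",")` splits on every comma, and older NumPy versions such as the one shown here do not understand quoted fields, so an annotation containing a comma yields a row with extra columns. A minimal sketch, using only the standard `csv` module, that reports which rows have a different number of fields from the header; `ragged_rows` is a hypothetical helper.

```python
import csv


def ragged_rows(lines, delimiter=","):
    """Return (line_number, n_fields) for rows whose field count differs from the header's."""
    reader = csv.reader(lines, delimiter=delimiter)  # honours "quoted, fields"
    n_header = len(next(reader))
    bad = []
    for row in reader:
        if len(row) != n_header:
            # reader.line_num counts physical lines consumed so far
            bad.append((reader.line_num, len(row)))
    return bad
```

Running it over the open presence/absence file (`ragged_rows(open("gene_presence_absence.csv"))`) would list the offending lines, if any.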

Output some basic data and post-run diagnostic plots

Plots of input data:
Number of genes per sample
Number of unique kmers per sample
MDS plot of samples based on kmer distances
Plot(?) or file of summary statistics as per roary (sliding thresholds?)

Plots to help diagnose successful/unsuccessful runs:
Histogram with number of samples on x, number of genes on y

Also, distribution plot for choosing first/second clustering thresholds
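The requested histogram (number of samples on x, number of genes on y) needs only one count per gene. A minimal sketch, assuming presence/absence has already been loaded into a dict mapping each gene to the set of samples that carry it; `genes_per_frequency` is a hypothetical helper, and its `{n_samples: n_genes}` result can be passed to any plotting library.

```python
from collections import Counter


def genes_per_frequency(presence):
    """presence: gene -> set of samples. Returns {n_samples: n_genes}, sorted by n_samples."""
    counts = Counter(len(samples) for samples in presence.values())
    return dict(sorted(counts.items()))
```

A spike at x = 1 (many singleton genes) next to a peak at the total number of samples is the typical U shape of a pangenome frequency plot.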

Allow input files to be provided in a file

When you get up to 10^3/10^4 genomes, especially if using the long paths to the pathogen informatics directories, the length of the command will probably exceed the maximum argument-list length the system allows (ARG_MAX), which is annoying to work around.

Having the option (or replacing the current one) to list these inputs in a file would get around this.
It can also let you give them a different name from the filename, if that is useful.
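A minimal sketch of reading inputs from a list file, assuming a hypothetical format of one path per line with an optional tab-separated sample name, with blank lines and `#` comments ignored. When no name is given, the file name without its extension is used; `read_input_list` is a hypothetical helper.

```python
import os


def read_input_list(lines):
    """Parse 'path' or 'path<TAB>name' lines into (path, name) pairs."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        path = fields[0]
        if len(fields) > 1 and fields[1]:
            name = fields[1]
        else:
            name = os.path.splitext(os.path.basename(path))[0]
        entries.append((path, name))
    return entries
```

With `open("inputs.txt")` passed in, the command line only needs the one list file, however many genomes it names.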

Error in collapse gene families

Hi, I'm running a data set of ~900 complete prokaryote genomes from NCBI and it has errored out at the 'collapse gene families' stage:

collapse gene families..
Traceback (most recent call last):
  File "/home/rhall/.local/bin/panaroo", line 11, in <module>
    load_entry_point('panaroo==1.1.2', 'console_scripts', 'panaroo')()
  File "/home/rhall/.local/lib/python3.6/site-packages/panaroo/main.py", line 280, in main
    quiet=(not args.verbose))
  File "/home/rhall/.local/lib/python3.6/site-packages/panaroo/clean_network.py", line 137, in collapse_families
    seqid_to_index[sid] = centroid_to_index[seqid_to_centroid[sid]]
KeyError: '948_1_102'

Do you have a suggestion for fixing this?
Many thanks!

Presence/absence in Roary format issue

I have noticed that in the gene_presence_absence_roary.csv file, the columns giving the number of isolates per cluster and the number of sequences per cluster seem to have been switched.
I ran Panaroo with 10 genomes, and the column giving the number of isolates ranges between 1 and 50, whereas the column for the number of sequences reaches a maximum of 10.

Query regarding statistics and alignment

Hi,
Panaroo's default threshold for a gene to be in the core genome is presence in 95% of the isolates, but the summary_statistics.txt file reports core genes as those in 99% to 100% of isolates and soft-core genes as those in 95% to 99%.
When constructing the overall core-gene alignment, does Panaroo consider only core genes (99% to 100%), or does it also include soft-core genes (95% to 99%)?

Thanking you,
Sumeet
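The Roary-style summary bins mentioned above can be written as integer comparisons (core: at least 99% of genomes; soft core: 95% to under 99%; shell: 15% to under 95%; cloud: under 15%). This sketch only shows how those bins relate to a 95% presence threshold; it does not show which genes Panaroo's core-genome alignment actually uses. Both function names are hypothetical.

```python
def classify(n_present, n_genomes):
    """Roary-style frequency bin, using integer arithmetic to avoid float rounding."""
    if n_present * 100 >= 99 * n_genomes:
        return "core"
    if n_present * 100 >= 95 * n_genomes:
        return "soft core"
    if n_present * 100 >= 15 * n_genomes:
        return "shell"
    return "cloud"


def passes_threshold(n_present, n_genomes, threshold_pct=95):
    """True if a gene is present in at least threshold_pct percent of genomes."""
    return n_present * 100 >= threshold_pct * n_genomes
```

With 100 genomes, a gene in 96 of them is "soft core" in the summary yet passes a 95% threshold, which is exactly the overlap the question is about.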

Detailed Tutorial Required

Hello, I am using Panaroo for pangenome estimation. I couldn't find a detailed tutorial showing how to use this pipeline effectively and how to make use of all its functionality; I only saw a single command in the docs. After running that command I only got the following files:
combined_DNA_CDS.fasta
combined_protein_cdhit_out.txt.clstr
gene_data.csv
pre_filt_graph.gml
combined_protein_cdhit_out.txt
combined_protein_CDS.fasta
tmphdrk_kmh
I didn't find any gene presence/absence file, nor any other pangenome, core genome or accessory genome output.
How can I generate these files?
Please provide a detailed tutorial, as other pangenome estimation tools do.
Thank you.

Groups without sequence in pan_genome_reference.fa

Dear PANAROO team,

I started using your tool (v1.1.2, installed through conda) a few days ago and I think I found a bug (or something I do not understand :) ).

For some groups, I cannot find a reference sequence in the pan_genome_reference.fa.
These groups contain only one gene each (but other "singleton" groups do have a sequence in pan_genome_reference.fa, so I guess this is not the reason). These genes are present in the GFF files and in the gene_data.csv file.

## sample of gene_presence_absence_roary.csv lines describing groups without a reference sequence in pan_genome_reference.fa

Gene,Non-unique Gene name,Annotation,No. isolates,No. sequences,Avg sequences per isolate,Genome Fragment,Order within Fragment,Accessory Fragment,Accessory Order with Fragment,QC,Min group size nuc,Max group size nuc,Avg group size nuc,MYSP-0248,MYSP-0250,MYSP-0251,MYSP-0255,MYSP-0258,MYSP-0261,MYSP-0270,MYSP-0278,MYSP-0279,MYSP-0291,MYSP-0297,MYSP-0301,MYSP-0305,MYSP-0325,MYSP-0339,MYSP-0342,MYSP-0346,MYSP-0347,MYSP-0348,MYSP-0357,MYSP-0367
group_2992,,IS256 family transposase ISSod4,1,1,1,1,5674,,,,1203,1203,1203,,,,,,,MYSP-0270_00277,,,,,,,,,,,,,,
group_2997,,IS630 family transposase ISEc33,1,1,1,1,5679,,,,1032,1032,1032,,,,,,,MYSP-0270_00679,,,,,,,,,,,,,,
group_2998,,IS630 family transposase ISEc33,1,1,1,1,5680,,,,1032,1032,1032,,,,,,,MYSP-0270_00970,,,,,,,,,,,,,,
group_2999,,IS256 family transposase ISSod4,1,1,1,1,5681,,,,1203,1203,1203,,,,,,,MYSP-0270_01056,,,,,,,,,,,,,,
group_3000,,IS630 family transposase ISEc33,1,1,1,1,5682,,,,1032,1032,1032,,,,,,,MYSP-0270_01753,,,,,,,,,,,,,,
group_3001,,IS630 family transposase ISEc33,1,1,1,1,5683,,,,1032,1032,1032,,,,,,,MYSP-0270_01918,,,,,,,,,,,,,,
group_3002,,IS256 family transposase ISSod4,1,1,1,1,5684,,,,1203,1203,1203,,,,,,,MYSP-0270_01983,,,,,,,,,,,,,,
group_3003,,IS3 family transposase ISVisp4,1,1,1,1,5685,,,,396,396,396,,,,,,,MYSP-0270_02121,,,,,,,,,,,,,,
group_3004,,IS256 family transposase ISSod4,1,1,1,1,5686,,,,1203,1203,1203,,,,,,,MYSP-0270_02325,,,,,,,,,,,,,,
group_3009,xerC_2,Tyrosine recombinase XerC,1,1,1,1,5692,,,,1074,1074,1074,,,,,,,MYSP-0270_02400,,,,,,,,,,,,,,
group_3010,dnaG_2,DNA primase,1,1,1,1,5693,,,,3075,3075,3075,,,,,,,MYSP-0270_02401,,,,,,,,,,,,,,
group_3011,,hypothetical protein,1,1,1,1,5694,,,,399,399,399,,,,,,,MYSP-0270_02402,,,,,,,,,,,,,,
group_3012,symE_3,Endoribonuclease SymE,1,1,1,1,5695,,,,234,234,234,,,,,,,MYSP-0270_02403,,,,,,,,,,,,,,
group_3013,,hypothetical protein,1,1,1,1,5696,,,,378,378,378,,,,,,,MYSP-0270_02404,,,,,,,,,,,,,,
group_3015,,IS630 family transposase ISEc33,1,1,1,1,5698,,,,1032,1032,1032,,,,,,,MYSP-0270_02698,,,,,,,,,,,,,,
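A minimal sketch for listing every affected group at once, assuming the FASTA headers in pan_genome_reference.fa are the group names from the Gene column (an assumption worth checking against the file). `groups_without_sequence` is a hypothetical helper; it reads the CSV with the standard `csv` module, so quoted annotations containing commas are handled.

```python
import csv


def groups_without_sequence(csv_lines, fasta_lines):
    """Return Gene-column names that have no matching FASTA header."""
    fasta_ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                fasta_ids.add(words[0])  # header ID = first word after '>'
    reader = csv.DictReader(csv_lines)
    return [row["Gene"] for row in reader if row["Gene"] not in fasta_ids]
```

Passing the open gene_presence_absence_roary.csv and pan_genome_reference.fa files returns the full list of groups to report.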

## group members are present in the source GFF files (PROKKA annotation):

MYSP-0270-1_arrow	prokka	gene	288601	289803	.	-	.	ID=MYSP-0270_00277_gene;locus_tag=MYSP-0270_00277
MYSP-0270-1_arrow	Prodigal:002006	CDS	288601	289803	.	-	0	ID=MYSP-0270_00277;Parent=MYSP-0270_00277_gene,MYSP-0270_00277_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISSod4;locus_tag=MYSP-0270_00277;product=IS256 family transposase ISSod4
MYSP-0270-1_arrow	prokka	gene	679096	680127	.	+	.	ID=MYSP-0270_00679_gene;locus_tag=MYSP-0270_00679
MYSP-0270-1_arrow	Prodigal:002006	CDS	679096	680127	.	+	0	ID=MYSP-0270_00679;Parent=MYSP-0270_00679_gene,MYSP-0270_00679_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISEc33;locus_tag=MYSP-0270_00679;product=IS630 family transposase ISEc33
MYSP-0270-1_arrow	prokka	gene	992591	993622	.	-	.	ID=MYSP-0270_00970_gene;locus_tag=MYSP-0270_00970
MYSP-0270-1_arrow	Prodigal:002006	CDS	992591	993622	.	-	0	ID=MYSP-0270_00970;Parent=MYSP-0270_00970_gene,MYSP-0270_00970_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISEc33;locus_tag=MYSP-0270_00970;product=IS630 family transposase ISEc33
MYSP-0270-1_arrow	prokka	gene	1080239	1081441	.	+	.	ID=MYSP-0270_01056_gene;locus_tag=MYSP-0270_01056
MYSP-0270-1_arrow	Prodigal:002006	CDS	1080239	1081441	.	+	0	ID=MYSP-0270_01056;Parent=MYSP-0270_01056_gene,MYSP-0270_01056_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISSod4;locus_tag=MYSP-0270_01056;product=IS256 family transposase ISSod4
MYSP-0270-1_arrow	prokka	gene	1811953	1812984	.	-	.	ID=MYSP-0270_01753_gene;locus_tag=MYSP-0270_01753
MYSP-0270-1_arrow	Prodigal:002006	CDS	1811953	1812984	.	-	0	ID=MYSP-0270_01753;Parent=MYSP-0270_01753_gene,MYSP-0270_01753_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISEc33;locus_tag=MYSP-0270_01753;product=IS630 family transposase ISEc33
MYSP-0270-1_arrow	prokka	gene	1980851	1981882	.	+	.	ID=MYSP-0270_01918_gene;locus_tag=MYSP-0270_01918
MYSP-0270-1_arrow	Prodigal:002006	CDS	1980851	1981882	.	+	0	ID=MYSP-0270_01918;Parent=MYSP-0270_01918_gene,MYSP-0270_01918_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISEc33;locus_tag=MYSP-0270_01918;product=IS630 family transposase ISEc33
MYSP-0270-1_arrow	prokka	gene	2049751	2050953	.	-	.	ID=MYSP-0270_01983_gene;locus_tag=MYSP-0270_01983
MYSP-0270-1_arrow	Prodigal:002006	CDS	2049751	2050953	.	-	0	ID=MYSP-0270_01983;Parent=MYSP-0270_01983_gene,MYSP-0270_01983_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISSod4;locus_tag=MYSP-0270_01983;product=IS256 family transposase ISSod4
MYSP-0270-1_arrow	prokka	gene	2195257	2195652	.	-	.	ID=MYSP-0270_02121_gene;locus_tag=MYSP-0270_02121
MYSP-0270-1_arrow	Prodigal:002006	CDS	2195257	2195652	.	-	0	ID=MYSP-0270_02121;Parent=MYSP-0270_02121_gene,MYSP-0270_02121_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISVisp4;locus_tag=MYSP-0270_02121;product=IS3 family transposase ISVisp4
MYSP-0270-1_arrow	prokka	gene	2406691	2407893	.	-	.	ID=MYSP-0270_02325_gene;locus_tag=MYSP-0270_02325
MYSP-0270-1_arrow	Prodigal:002006	CDS	2406691	2407893	.	-	0	ID=MYSP-0270_02325;Parent=MYSP-0270_02325_gene,MYSP-0270_02325_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISSod4;locus_tag=MYSP-0270_02325;product=IS256 family transposase ISSod4
MYSP-0270-1_arrow	prokka	gene	2479054	2480127	.	-	.	ID=MYSP-0270_02400_gene;Name=xerC_3;gene=xerC_3;locus_tag=MYSP-0270_02400
MYSP-0270-1_arrow	Prodigal:002006	CDS	2479054	2480127	.	-	0	ID=MYSP-0270_02400;Parent=MYSP-0270_02400_gene,MYSP-0270_02400_mRNA;Name=xerC_3;gene=xerC_3;inference=ab initio prediction:Prodigal:002006,protein motif:HAMAP:MF_01808;locus_tag=MYSP-0270_02400;product=Tyrosine recombinase XerC
MYSP-0270-1_arrow	prokka	gene	2480156	2483230	.	-	.	ID=MYSP-0270_02401_gene;Name=dnaG_3;gene=dnaG_3;locus_tag=MYSP-0270_02401
MYSP-0270-1_arrow	Prodigal:002006	CDS	2480156	2483230	.	-	0	ID=MYSP-0270_02401;Parent=MYSP-0270_02401_gene,MYSP-0270_02401_mRNA;eC_number=2.7.7.-;Name=dnaG_3;gene=dnaG_3;inference=ab initio prediction:Prodigal:002006,protein motif:HAMAP:MF_00974;locus_tag=MYSP-0270_02401;product=DNA primase
MYSP-0270-1_arrow	prokka	gene	2483331	2483729	.	+	.	ID=MYSP-0270_02402_gene;locus_tag=MYSP-0270_02402
MYSP-0270-1_arrow	Prodigal:002006	CDS	2483331	2483729	.	+	0	ID=MYSP-0270_02402;Parent=MYSP-0270_02402_gene,MYSP-0270_02402_mRNA;inference=ab initio prediction:Prodigal:002006;locus_tag=MYSP-0270_02402;product=hypothetical protein
MYSP-0270-1_arrow	prokka	gene	2483801	2484034	.	+	.	ID=MYSP-0270_02403_gene;Name=symE_4;gene=symE_4;locus_tag=MYSP-0270_02403
MYSP-0270-1_arrow	Prodigal:002006	CDS	2483801	2484034	.	+	0	ID=MYSP-0270_02403;Parent=MYSP-0270_02403_gene,MYSP-0270_02403_mRNA;eC_number=3.1.-.-;Name=symE_4;gene=symE_4;inference=ab initio prediction:Prodigal:002006,protein motif:HAMAP:MF_01193;locus_tag=MYSP-0270_02403;product=Endoribonuclease SymE
MYSP-0270-1_arrow	prokka	gene	2484095	2484472	.	-	.	ID=MYSP-0270_02404_gene;locus_tag=MYSP-0270_02404
MYSP-0270-1_arrow	Prodigal:002006	CDS	2484095	2484472	.	-	0	ID=MYSP-0270_02404;Parent=MYSP-0270_02404_gene,MYSP-0270_02404_mRNA;inference=ab initio prediction:Prodigal:002006;locus_tag=MYSP-0270_02404;product=hypothetical protein
MYSP-0270-1_arrow	prokka	gene	2794245	2795276	.	-	.	ID=MYSP-0270_02698_gene;locus_tag=MYSP-0270_02698
MYSP-0270-1_arrow	Prodigal:002006	CDS	2794245	2795276	.	-	0	ID=MYSP-0270_02698;Parent=MYSP-0270_02698_gene,MYSP-0270_02698_mRNA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISEc33;locus_tag=MYSP-0270_02698;product=IS630 family transposase ISEc33

## and also present in gene_data.csv

MYSP-0270,MYSP-0270-1_arrow,6_0_246,MYSP-0270_00277,MSQPFDFDKALKALQDGQALTGKDGILTPLIKQLTEAALAAELDSHLAQDIAANRKNGSSKKTLKTPTGAFELATPRDRNGSFEPQLVKKHQTTLSDEIERKIIRMFALGMSYKDISQEIEDLYAFSVSSATISAVTDKVIPELKLWQQRPLEAVYPFVWLDAIHYKIREDGRYQSKAVYTVLALNLEGKKEILGLYLSESEGANFWLSVLTDLQNRGVNDILIACVDGLTGFPEAINSIYPDTEVQLCVIHQIRNSIKYVASKHHKAFMTDLKPVYRAVSKEAAEVALDELEEKWGQQYPVVIQSWRRKWENLSHYFRYPATIRKVIYTTNAIESVHRQFRKLTKTKGAFPNENSLLKLLYLGLMNAQEKWTMPIQSWNLTLSQLAIYFEGRLDKVITL,ATGTCCCAACCCTTCGATTTCGATAAAGCCCTGAAAGCACTTCAGGATGGTCAGGCGCTGACTGGCAAAGATGGCATCTTAACGCCGTTAATCAAACAGTTAACTGAGGCTGCGCTGGCTGCTGAGCTGGACTCTCATCTGGCTCAGGATATTGCTGCTAACCGGAAAAATGGTTCGTCCAAAAAAACCCTCAAAACGCCAACCGGTGCCTTTGAACTCGCCACGCCCCGCGATCGTAACGGCTCTTTTGAGCCTCAATTGGTCAAAAAGCATCAGACCACGCTTTCTGATGAGATTGAGCGCAAGATCATCCGCATGTTCGCGCTGGGCATGAGTTACAAGGATATCAGCCAGGAAATTGAAGACCTGTATGCTTTCAGTGTTTCCAGCGCCACGATCAGTGCCGTCACCGATAAAGTTATTCCTGAACTGAAACTGTGGCAGCAACGCCCTCTTGAAGCGGTTTATCCCTTTGTCTGGCTGGATGCCATTCATTATAAAATCCGCGAAGACGGGCGTTATCAGAGCAAAGCCGTGTACACCGTTCTAGCACTGAATCTCGAAGGCAAGAAAGAGATCCTGGGCCTGTATCTGTCTGAAAGCGAAGGGGCTAACTTCTGGCTGTCGGTGCTGACCGATCTACAAAACCGCGGCGTGAACGACATTCTGATTGCCTGTGTGGATGGTCTGACAGGGTTCCCGGAAGCGATAAACAGCATTTACCCGGATACCGAAGTCCAGCTCTGTGTTATCCATCAGATCCGAAACTCGATTAAATATGTCGCCTCAAAGCACCATAAGGCGTTCATGACCGACCTGAAGCCAGTTTATCGTGCAGTCTCGAAAGAGGCGGCAGAGGTGGCGCTGGATGAACTGGAGGAGAAATGGGGCCAGCAGTACCCGGTGGTTATTCAGTCATGGCGGAGAAAATGGGAAAATCTGTCCCATTACTTCCGGTATCCGGCGACGATCCGTAAGGTAATTTACACCACAAACGCCATTGAATCAGTGCACCGTCAGTTCAGAAAGCTGACGAAAACGAAAGGTGCATTCCCGAATGAAAACAGTCTGTTGAAGCTACTTTATCTGGGGTTAATGAATGCCCAGGAAAAATGGACAATGCCAATACAAAGCTGGAATTTGACATTGTCACAGCTGGCGATTTATTTTGAAGGCCGCCTTGATAAAGTGATTACGTTGTAA,,IS256 family transposase ISSod4
MYSP-0270,MYSP-0270-1_arrow,6_0_633,MYSP-0270_00679,MPIIAPIPRGERRLMQKAIHKTRDKNHARRLTAMLMLHRGERVSNVARTLCCARSSVGRWINWFTLSGVEGLKSLPAGRARRWPFEHICTLLRELVKHSPGDFGYQRSRWSTELLAIKINEITGCQLHAGTVRRWLPSAGLVWRRAAPTLRIRDPHKDEKMAVIRKALDECSAEHPVFYEDEVDIHLNPKIGADWQMRGQQKRVVTPGQNEKYYLAGALHSGTGKVSYAGGNSKSSALFISLLKRLKATYRRAKTITLIVDNYIIHKSRETQRWLKENPKFRVIYQPVYSPWVNHVERLWQALHDTITRNHQCRSMWQLLKKVRHFMETVSPFPGGKHGQAKV,ATGCCGATCATAGCACCTATTCCCCGTGGCGAAAGACGCCTGATGCAGAAAGCTATCCATAAAACGCGTGATAAAAATCATGCCCGCAGGCTCACCGCTATGCTGATGCTTCATCGGGGTGAGCGGGTCAGCAATGTCGCCAGAACGCTCTGCTGTGCCCGTTCATCCGTCGGACGCTGGATTAACTGGTTTACGCTGTCGGGTGTTGAAGGGCTGAAGTCATTACCCGCCGGGCGTGCCCGCCGTTGGCCGTTTGAGCATATCTGCACGCTGCTACGCGAGCTGGTAAAACATTCTCCCGGCGATTTTGGCTATCAGCGTTCACGCTGGAGTACAGAACTTCTGGCGATAAAAATCAATGAGATAACGGGATGCCAGTTGCATGCCGGAACCGTGCGCCGCTGGTTGCCGTCTGCGGGGCTTGTCTGGCGAAGGGCCGCGCCAACCCTGCGTATCCGCGACCCGCATAAAGATGAAAAAATGGCGGTAATCCGCAAAGCGCTGGACGAATGCAGCGCAGAGCATCCGGTATTTTATGAAGATGAAGTGGATATCCACCTCAATCCAAAAATCGGTGCGGACTGGCAGATGCGCGGGCAGCAAAAACGCGTGGTGACACCGGGGCAGAATGAAAAATACTATCTGGCCGGGGCGTTACACAGCGGAACGGGTAAAGTCAGCTATGCGGGCGGTAACAGCAAAAGTTCGGCGCTGTTCATCAGCCTGCTGAAGCGGCTTAAAGCGACGTACCGGCGGGCGAAAACCATCACGCTGATCGTGGACAACTACATTATCCACAAAAGCCGTGAAACACAGCGCTGGCTGAAGGAGAACCCGAAGTTCAGGGTCATTTACCAGCCGGTTTACTCGCCATGGGTGAATCACGTTGAACGCCTGTGGCAGGCGCTTCACGACACAATCACGCGCAATCATCAGTGCCGCTCAATGTGGCAATTGTTGAAAAAAGTTCGCCATTTTATGGAAACCGTCAGCCCGTTTCCCGGAGGAAAGCATGGTCAGGCAAAAGTGTAG,,IS630 family transposase ISEc33
MYSP-0270,MYSP-0270-1_arrow,6_0_909,MYSP-0270_00970,MPIIAPIPRGERRLMQKAIHKTRDKNHARRLTAMLMLHRGERVSNVARTLCCARSSVGRWINWFTLSGVEGLKSLPAGRARRWPFEHICTLLRELVKHSPGDFGYQRSRWSTELLAIKINEITGCQLHAGTVRRWLPSAGLVWRRAAPTLRIRDPHKDEKMAVIRKALDECSAEHPVFYEDEVDIHLNPKIGADWQMRGQQKRVVTPGQNEKYYLAGALHSGTGKVSYAGGNSKSSALFISLLKRLKATYRRAKTITLIVDNYIIHKSRETQRWLKENPKFRVIYQPVYSPWVNHVERLWQALHDTITRNHQCRSMWQLLKKVRHFMETVSPFPGGKHGQAKV,ATGCCGATCATAGCACCTATTCCCCGTGGCGAAAGACGCCTGATGCAGAAAGCTATCCATAAAACGCGTGATAAAAATCATGCCCGCAGGCTCACCGCTATGCTGATGCTTCATCGGGGTGAGCGGGTCAGCAATGTCGCCAGAACGCTCTGCTGTGCCCGTTCATCCGTCGGACGCTGGATTAACTGGTTTACGCTGTCGGGTGTTGAAGGGCTGAAGTCATTACCCGCCGGGCGTGCCCGCCGTTGGCCGTTTGAGCATATCTGCACGCTGCTACGCGAGCTGGTAAAACATTCTCCCGGCGATTTTGGCTATCAGCGTTCACGCTGGAGTACAGAACTTCTGGCGATAAAAATCAATGAGATAACGGGATGCCAGTTGCATGCCGGAACCGTGCGCCGCTGGTTGCCGTCTGCGGGGCTTGTCTGGCGAAGGGCCGCGCCAACCCTGCGTATCCGCGACCCGCATAAAGATGAAAAAATGGCGGTAATCCGCAAAGCGCTGGACGAATGCAGCGCAGAGCATCCGGTATTTTATGAAGATGAAGTGGATATCCACCTCAATCCAAAAATCGGTGCGGACTGGCAGATGCGCGGGCAGCAAAAACGCGTGGTGACACCGGGGCAGAATGAAAAATACTATCTGGCCGGGGCGTTACACAGCGGAACGGGTAAAGTCAGCTATGCGGGCGGTAACAGCAAAAGTTCGGCGCTGTTCATCAGCCTGCTGAAGCGGCTTAAAGCGACGTACCGGCGGGCGAAAACCATCACGCTGATCGTGGACAACTACATTATCCACAAAAGCCGTGAAACACAGCGCTGGCTGAAGGAGAACCCGAAGTTCAGGGTCATTTACCAGCCGGTTTACTCGCCATGGGTGAATCACGTTGAACGCCTGTGGCAGGCGCTTCACGACACAATCACGCGCAATCATCAGTGCCGCTCAATGTGGCAATTGTTGAAAAAAGTTCGCCATTTTATGGAAACCGTCAGCCCGTTTCCCGGAGGAAAGCATGGTCAGGCAAAAGTGTAG,,IS630 family transposase ISEc33
MYSP-0270,MYSP-0270-1_arrow,6_0_981,MYSP-0270_01056,MSQPFDFDKALKALQDGQALTGKDGILTPLIKQLTEAALAAELDSHLAQDIAANRKNGSSKKTLKTPTGAFELATPRDRNGSFEPQLVKKHQTTLSDEIERKIIRMFALGMSYKDISQEIEDLYAFSVSSATISAVTDKVIPELKLWQQRPLEAVYPFVWLDAIHYKIREDGRYQSKAVYTVLALNLEGKKEILGLYLSESEGANFWLSVLTDLQNRGVNDILIACVDGLTGFPEAINSIYPDTEVQLCVIHQIRNSIKYVASKHHKAFMTDLKPVYRAVSKEAAEVALDELEEKWGQQYPVVIQSWRRKWENLSHYFRYPATIRKVIYTTNAIESVHRQFRKLTKTKGAFPNENSLLKLLYLGLMNAQEKWTMPIQSWNLTLSQLAIYFEDRLDKVITL,ATGTCCCAACCCTTCGATTTCGATAAAGCCCTGAAAGCACTTCAGGATGGTCAGGCGCTGACTGGCAAAGATGGCATCTTAACGCCGTTAATCAAACAGTTAACTGAGGCTGCGCTGGCTGCTGAGCTGGACTCTCATCTGGCTCAGGATATTGCTGCTAACCGGAAAAATGGTTCGTCCAAAAAAACGCTCAAAACGCCAACCGGTGCCTTTGAACTCGCCACGCCCCGCGATCGTAACGGCTCTTTTGAGCCTCAATTGGTCAAAAAGCATCAGACCACGCTTTCTGATGAGATTGAGCGCAAGATCATCCGCATGTTCGCGCTGGGCATGAGTTACAAGGATATCAGCCAGGAAATTGAAGACCTGTATGCTTTCAGTGTTTCCAGCGCCACGATCAGTGCCGTCACCGATAAAGTTATTCCTGAACTGAAACTGTGGCAGCAACGCCCTCTTGAAGCGGTTTATCCCTTTGTCTGGCTGGATGCCATTCATTATAAAATCCGCGAAGACGGGCGTTATCAGAGCAAAGCCGTGTACACCGTTCTAGCACTGAATCTCGAAGGCAAGAAAGAGATCCTGGGCCTGTATCTGTCTGAAAGCGAAGGGGCTAACTTCTGGCTGTCGGTGCTGACCGATCTACAAAACCGCGGCGTGAACGACATTCTGATTGCCTGTGTGGATGGTCTGACAGGGTTCCCGGAAGCGATAAACAGCATTTACCCGGATACCGAAGTCCAGCTCTGTGTTATCCATCAGATCCGAAACTCGATTAAATATGTCGCCTCAAAGCACCATAAGGCGTTCATGACCGACCTGAAGCCAGTTTATCGTGCAGTCTCGAAAGAGGCGGCAGAGGTGGCGCTGGATGAACTGGAGGAGAAATGGGGCCAGCAGTACCCGGTGGTTATTCAGTCATGGCGGAGAAAATGGGAAAATCTGTCCCATTACTTCCGGTATCCGGCGACGATCCGTAAGGTAATTTACACCACAAACGCCATTGAATCAGTGCACCGTCAGTTCAGAAAGCTGACGAAAACGAAAGGTGCATTCCCGAATGAAAACAGTCTGTTGAAGCTACTTTATCTGGGGTTAATGAATGCCCAGGAAAAATGGACAATGCCAATACAAAGCTGGAATTTGACATTGTCACAGCTGGCGATTTATTTTGAAGACCGCCTTGATAAAGTGATTACGTTGTAA,,IS256 family transposase ISSod4
MYSP-0270,MYSP-0270-1_arrow,6_0_1648,MYSP-0270_01753,MPIIAPIPRGERRLMQKAIHKTRDKNHARRLTAMLMLHRGERVSNVARTLCCARSSVGRWINWFTLSGVEGLKSLPAGRARRWPFEHICTLLRELVKHSPGDFGYQRSRWSTELLAIKINEITGCQLHAGTVRRWLPSAGLVWRRAAPTLRIRDPHKDEKMAVIRKALDECSAEHPVFYEDEVDIHLNPKIGADWQMRGQQKRVVTPGQNEKYYLAGALHSGTGKVSYAGGNSKSSALFISLLKRLKATYRRAKTITLIVDNYIIHKSRETQRWLKENPKFRVIYQPVYSPWVNHVERLWQALHDTITRNHQCRSMWQLLKKVRHFMETVSPFPGGKHGQAKV,ATGCCGATCATAGCACCTATTCCCCGTGGCGAAAGACGCCTGATGCAGAAAGCTATCCATAAAACGCGTGATAAAAATCATGCCCGCAGGCTCACCGCTATGCTGATGCTTCATCGGGGTGAGCGGGTCAGCAATGTCGCCAGAACGCTCTGCTGTGCCCGTTCATCCGTCGGACGCTGGATTAACTGGTTTACGCTGTCGGGTGTTGAAGGGCTGAAGTCATTACCCGCCGGGCGTGCCCGCCGTTGGCCGTTTGAGCATATCTGCACGCTGCTACGCGAGCTGGTAAAACATTCTCCCGGCGATTTTGGCTATCAGCGTTCACGCTGGAGTACAGAACTTCTGGCGATAAAAATCAATGAGATAACGGGATGCCAGTTGCATGCCGGAACCGTGCGCCGCTGGTTGCCGTCTGCGGGGCTTGTCTGGCGAAGGGCCGCGCCAACCCTGCGTATCCGCGACCCGCATAAAGATGAAAAAATGGCGGTAATCCGCAAAGCGCTGGACGAATGCAGCGCAGAGCATCCGGTATTTTATGAAGATGAAGTGGATATCCACCTCAATCCAAAAATCGGTGCGGACTGGCAGATGCGCGGGCAGCAAAAACGCGTGGTGACACCGGGGCAGAATGAAAAATACTATCTGGCCGGGGCGTTACACAGCGGAACGGGTAAAGTCAGCTATGCGGGCGGTAACAGCAAAAGTTCGGCGCTGTTCATCAGCCTGCTGAAGCGGCTTAAAGCGACGTACCGGCGGGCGAAAACCATCACGCTGATCGTGGACAACTACATTATCCACAAAAGCCGTGAAACACAGCGCTGGCTGAAGGAGAACCCGAAGTTCAGGGTCATTTACCAGCCGGTTTACTCGCCATGGGTGAATCACGTTGAACGCCTGTGGCAGGCGCTTCACGACACAATCACGCGCAATCATCAGTGCCGCTCAATGTGGCAATTGTTGAAAAAAGTTCGCCATTTTATGGAAACCGTCAGCCCGTTTCCCGGAGGAAAGCATGGTCAGGCAAAAGTGTAG,,IS630 family transposase ISEc33
MYSP-0270,MYSP-0270-1_arrow,6_0_1806,MYSP-0270_01918,MPIIAPIPRGERRLMQKAIHKTRDKNHARRLTAMLMLHRGERVSNVARTLCCARSSVGRWINWFTLSGVEGLKSLPAGRARRWPFEHICTLLRELVKHSPGDFGYQRSRWSTELLAIKINEITGCQLHAGTVRRWLPSAGLVWRRAAPTLRIRDPHKDEKMAVIRKALDECSAEHPVFYEDEVDIHLNPKIGADWQMRGQQKRVVTPGQNEKYYLAGALHSGTGKVSYAGGNSKSSALFISLLKRLKATYRRAKTITLIVDNYIIHKSRETQRWLKENPKFRVIYQPVYSPWVNHVERLWQALHDTITRNHQCRSMWQLLKKVRHFMETVSPFPGGKHGQAKV,ATGCCGATCATAGCACCTATTCCCCGTGGCGAAAGACGCCTGATGCAGAAAGCTATCCATAAAACGCGTGATAAAAATCATGCCCGCAGGCTCACCGCTATGCTGATGCTTCATCGGGGTGAGCGGGTCAGCAATGTCGCCAGAACGCTCTGCTGTGCCCGTTCATCCGTCGGACGCTGGATTAACTGGTTTACGCTGTCGGGTGTTGAAGGGCTGAAGTCATTACCCGCCGGGCGTGCCCGCCGTTGGCCGTTTGAGCATATCTGCACGCTGCTACGCGAGCTGGTAAAACATTCTCCCGGCGATTTTGGCTATCAGCGTTCACGCTGGAGTACAGAACTTCTGGCGATAAAAATCAATGAGATAACGGGATGCCAGTTGCATGCCGGAACCGTGCGCCGCTGGTTGCCGTCTGCGGGGCTTGTCTGGCGAAGGGCCGCGCCAACCCTGCGTATCCGCGACCCGCATAAAGATGAAAAAATGGCGGTAATCCGCAAAGCGCTGGACGAATGCAGCGCAGAGCATCCGGTATTTTATGAAGATGAAGTGGATATCCACCTCAATCCAAAAATCGGTGCGGACTGGCAGATGCGCGGGCAGCAAAAACGCGTGGTGACACCGGGGCAGAATGAAAAATACTATCTGGCCGGGGCGTTACACAGCGGAACGGGTAAAGTCAGCTATGCGGGCGGTAACAGCAAAAGTTCGGCGCTGTTCATCAGCCTGCTGAAGCGGCTTAAAGCGACGTACCGGCGGGCGAAAACCATCACGCTGATCGTGGACAACTACATTATCCACAAAAGCCGTGAAACACAGCGCTGGCTGAAGGAGAACCCGAAGTTCAGGGTCATTTACCAGCCGGTTTACTCGCCATGGGTGAATCACGTTGAACGCCTGTGGCAGGCGCTTCACGACACAATCACGCGCAATCATCAGTGCCGCTCAATGTGGCAATTGTTGAAAAAAGTTCGCCATTTTATGGAAACCGTCAGCCCGTTTCCCGGAGGAAAGCATGGTCAGGCAAAAGTGTAG,,IS630 family transposase ISEc33
MYSP-0270,MYSP-0270-1_arrow,6_0_1869,MYSP-0270_01983,MSQPFDFDKALKALQDGQALTGKDGILTPLIKQLTEAALAAELDSHLAQDIAANRKNGSSKKTLKTPTGAFELATPRDRNGSFEPQLVKKHQTTLSDEIERKIIRMFALGMSYKDISQEIEDLYAFSVSSATISAVTDKVIPELKLWQQRPLEAVYPFVWLDAIHYKIREDGRYQSKAVYTVLALNLEGKKEILGLYLSESEGANFWLSVLTDLQNRGVNDILIACVDGLTGFPEAINSIYPDTEVQLCVIHQIRNSIKYVASKHHKAFMTDLKPVYRAVSKEAAEVALDELEEKWGQQYPVVIQSWRRKWENLSHYFRYPATIRKVIYTTNAIESVHRQFRKLTKTKGAFPNENSLLKLLYLGLMNAQEKWTMPIQSWNLTLSQLAIYFEGRLDKVITL,ATGTCCCAACCCTTCGATTTCGATAAAGCCCTGAAAGCACTTCAGGATGGTCAGGCGCTGACTGGCAAAGATGGCATCTTAACGCCGTTAATCAAACAGTTAACTGAGGCTGCGCTGGCTGCTGAGCTGGACTCTCATCTGGCTCAGGATATTGCTGCTAACCGGAAAAATGGTTCGTCCAAAAAAACCCTCAAAACGCCAACCGGTGCCTTTGAACTCGCCACGCCCCGCGATCGTAACGGCTCTTTTGAGCCTCAATTGGTCAAAAAGCATCAGACCACGCTTTCTGATGAGATTGAGCGCAAGATCATCCGCATGTTCGCGCTGGGCATGAGTTACAAGGATATCAGCCAGGAAATTGAAGACCTGTATGCTTTCAGTGTTTCCAGCGCCACGATCAGTGCCGTCACCGATAAAGTTATTCCTGAACTGAAACTGTGGCAGCAACGCCCTCTTGAAGCGGTTTATCCCTTTGTCTGGCTGGATGCCATTCATTATAAAATCCGCGAAGACGGGCGTTATCAGAGCAAAGCCGTGTACACCGTTCTAGCACTGAATCTCGAAGGCAAGAAAGAGATCCTGGGCCTGTATCTGTCTGAAAGCGAAGGGGCTAACTTCTGGCTGTCGGTGCTGACCGATCTACAAAACCGCGGCGTGAACGACATTCTGATTGCCTGTGTGGATGGTCTGACAGGGTTCCCGGAAGCGATAAACAGCATTTACCCGGATACCGAAGTCCAGCTCTGTGTTATCCATCAGATCCGAAACTCGATTAAATATGTCGCCTCAAAGCACCATAAGGCGTTCATGACCGACCTGAAGCCAGTTTATCGTGCAGTCTCGAAAGAGGCGGCAGAGGTGGCGCTGGATGAACTGGAGGAGAAATGGGGCCAGCAGTACCCGGTGGTTATTCAGTCATGGCGGAGAAAATGGGAAAATCTGTCCCATTACTTCCGGTATCCGGCGACGATCCGTAAGGTAATTTACACCACAAACGCCATTGAATCAGTGCACCGTCAGTTCAGAAAGCTGACGAAAACGAAAGGTGCATTCCCGAATGAAAACAGTCTGTTGAAGCTACTTTATCTGGGGTTAATGAATGCCCAGGAAAAATGGACAATGCCAATACAAAGCTGGAATTTGACATTGTCACAGCTGGCGATTTATTTTGAAGGCCGCCTTGATAAAGTGATTACGTTGTAA,,IS256 family transposase ISSod4
MYSP-0270,MYSP-0270-1_arrow,6_0_2007,MYSP-0270_02121,MKQYVKRTQRDYSLSFKLAVVEQVEKGEMTYRQAQERYGIQGCSTVLNWLRKYGQLDWHSSAQRSTRGGLMTKSLPLTPEQRIKELEQQLAESEVKAQFFEAVVKVMNTEFGATLTKKQLASLSRRQKHPD,ATGAAACAGTATGTTAAACGTACACAACGGGACTACTCTCTGTCCTTTAAACTGGCCGTCGTTGAGCAGGTTGAAAAAGGTGAGATGACATATCGTCAGGCTCAGGAGCGCTACGGTATTCAGGGGTGCTCCACCGTTCTGAACTGGCTGCGTAAGTATGGCCAGCTGGACTGGCACTCTTCAGCGCAGCGCAGCACCCGTGGAGGACTCATGACAAAATCCCTTCCCCTTACCCCCGAACAGCGAATCAAAGAGCTTGAGCAGCAGCTGGCTGAGTCCGAAGTTAAGGCACAGTTCTTCGAGGCCGTCGTGAAGGTCATGAACACTGAGTTCGGCGCCACGCTGACAAAAAAGCAGTTAGCTTCCTTATCGCGCAGACAGAAACACCCGGACTGA,,IS3 family transposase ISVisp4
MYSP-0270,MYSP-0270-1_arrow,6_0_2205,MYSP-0270_02325,MSQPFDFDKALKALQDGQALTGKDGILTPLIKQLTEAALAAELDSHLAQDIAANRKNGSSKKTLKTPTGAFELATPRDRNGSFEPQLVKKHQTTLSDEIERKIIRMFALGMSYKDISQEIEDLYAFSVSSATISAVTDKVIPELKLWQQRPLEAVYPFVWLDAIHYKIREDGRYQSKAVYTVLALNLEGKKEILGLYLSESEGANFWLSVLTDLQNRGVNDILIACVDGLTGFPEAINSIYPDTEVQLCVIHQIRNSIKYVASKHHKAFMTDLKPVYRAVSKEAAEVALDELEEKWGQQYPMVIQSWRRKWENLSHYFRYPATIRKVIYTTNAIESVHRQFRKLTKTKGAFPNENSLLKLLYLGLMNAQEKWTMPIQSWNLTLSQLAIYFEGRLDKVITL,ATGTCCCAACCCTTCGATTTCGATAAAGCCCTGAAAGCACTTCAGGATGGTCAGGCGCTGACTGGCAAAGATGGCATCTTAACGCCGTTAATCAAACAGTTAACTGAGGCTGCGCTGGCTGCTGAGCTGGACTCTCATCTGGCTCAGGATATTGCTGCTAACCGGAAAAATGGTTCGTCCAAAAAAACGCTCAAAACGCCAACCGGTGCCTTTGAACTCGCCACGCCCCGCGATCGTAACGGCTCTTTTGAGCCTCAATTGGTCAAAAAGCATCAGACCACGCTTTCTGATGAGATTGAGCGCAAGATCATCCGCATGTTCGCGCTGGGCATGAGTTACAAGGATATCAGCCAGGAAATTGAAGACCTGTATGCTTTCAGTGTTTCCAGCGCCACGATCAGTGCCGTCACCGATAAAGTTATTCCTGAACTGAAACTGTGGCAGCAACGCCCTCTTGAAGCGGTTTATCCCTTTGTCTGGCTGGATGCCATTCATTATAAAATCCGCGAAGACGGGCGTTATCAGAGCAAAGCCGTGTACACCGTTCTAGCACTGAATCTCGAAGGCAAGAAAGAGATCCTGGGCCTGTATCTGTCTGAAAGCGAAGGGGCTAACTTCTGGCTGTCGGTGCTGACCGATCTACAAAACCGCGGCGTGAACGACATTCTGATTGCCTGTGTGGATGGTCTGACAGGGTTCCCGGAAGCGATAAACAGCATTTACCCGGATACCGAAGTCCAGCTCTGTGTTATCCATCAGATCCGAAACTCGATTAAATATGTCGCCTCAAAGCACCATAAGGCGTTCATGACCGACCTGAAGCCAGTTTATCGTGCAGTCTCGAAAGAGGCGGCAGAGGTGGCGCTGGATGAACTGGAGGAGAAATGGGGCCAGCAGTACCCGATGGTTATTCAGTCATGGCGGAGAAAATGGGAAAATCTGTCCCATTACTTCCGGTATCCGGCGACGATCCGTAAGGTAATTTACACCACAAACGCCATTGAATCAGTGCACCGTCAGTTCAGAAAGCTGACGAAAACGAAAGGTGCATTCCCGAATGAAAACAGTCTGTTGAAGCTACTTTATCTGGGGTTAATGAATGCCCAGGAAAAATGGACAATGCCAATACAAAGCTGGAATTTGACATTGTCACAGCTGGCGATTTATTTTGAAGGCCGCCTTGATAAAGTGATTACGTTGTAA,,IS256 family transposase ISSod4
MYSP-0270,MYSP-0270-1_arrow,6_0_2280,MYSP-0270_02400,MPYRRPPSNRKPSPNRLLTVDDIYRQPVGPATHPKSLYALLLRFVKWRRERNWSETTLKTQTHHSYRFICWAAERGIHYAAEVTRPVLESWQRHLYQYRKANGEALTSRTQRTALQPLQVWFSWMAKQGLILANPAADLELPRLEKRLPRTILSVEQVEDIVNLCDLTTLQGIRDRALLELLWSTGIRRGEVAGLEIYSVDFSRQILTIVQGKGKEDRVIPAGERALWWLKRYIVHVRPEILAVPDCKALFLAMDGVAGLTASGITNAVVPYLRESGIDKGSCHLFRHAMATQMLENGADLRWIQAMLGHRSVESTQIYTQVSIRALQAVHASTHPAEQREKPEPDAAAEPPDGPLS,ATGCCTTACCGCAGACCGCCGTCAAACCGCAAACCTTCCCCGAACCGTCTGCTTACCGTTGACGACATCTACCGCCAGCCGGTCGGCCCGGCCACCCATCCGAAAAGCCTGTACGCGCTGCTGCTGCGGTTCGTGAAGTGGCGGCGGGAGCGCAACTGGTCGGAGACCACACTGAAGACCCAGACACATCACAGCTACCGCTTTATCTGCTGGGCGGCTGAACGGGGAATACACTATGCGGCGGAGGTGACAAGGCCGGTGCTGGAGAGCTGGCAGCGGCATCTGTACCAGTACCGCAAGGCAAACGGTGAAGCACTGACCAGCCGGACGCAGCGCACGGCGTTACAGCCGCTTCAGGTGTGGTTCTCGTGGATGGCGAAACAGGGGCTGATACTGGCGAATCCGGCGGCAGACCTGGAGCTGCCGAGGCTGGAGAAGCGTCTGCCGCGCACGATACTGAGCGTGGAGCAGGTGGAGGACATCGTGAACCTGTGCGACCTCACCACGCTTCAGGGTATCCGTGACCGGGCGCTGCTGGAACTGCTGTGGTCAACGGGCATCCGTCGCGGCGAGGTGGCCGGGCTTGAGATATACAGCGTGGACTTCTCCCGGCAGATACTGACCATCGTGCAGGGCAAGGGAAAGGAGGACCGGGTGATACCGGCAGGTGAGCGGGCGCTGTGGTGGCTGAAGCGCTACATCGTTCACGTCAGACCAGAAATCCTCGCGGTGCCTGACTGTAAGGCGCTGTTCCTGGCGATGGACGGCGTGGCAGGGCTGACGGCCAGCGGCATTACGAACGCGGTGGTGCCGTACCTGAGAGAGTCGGGCATCGACAAAGGGAGCTGCCACCTGTTCCGGCATGCCATGGCGACGCAGATGCTGGAGAACGGCGCAGACCTGCGGTGGATACAGGCGATGCTCGGTCACCGGAGCGTGGAGAGCACGCAGATATATACGCAGGTGAGTATCCGGGCGTTGCAGGCGGTCCATGCCAGTACGCATCCGGCAGAGCAGCGGGAGAAGCCGGAGCCGGACGCCGCAGCGGAGCCGCCGGACGGGCCGCTGAGTTAG,xerC_3,Tyrosine recombinase XerC
MYSP-0270,MYSP-0270-1_arrow,6_0_2281,MYSP-0270_02401,MNNGRVTPAELEQLKRDVSLAAVAKSQNRVLTKQGKDFAVLCPFHAEKTPSCVISPAKNLYHCFGCNAGGSVLDWLQHTENLTYAQTLVRLRELAGCSTLRVVSQNQPSSSAAAVPASSSPPARQTLTDLDDDGQALLHQVADWYHQNLLNSPETLTWLEKRGLTHPELVSHFRLGFAGPHGVAGALPSPSSKEGKALRSRLTALGVIRESNRQDHFRGCLTVPVTGWTESYDPASRGRVLQLYGRRTMADHQVKKGSAKHLYLPSPLCGIWNEAALAAASEVILCEALIDAMTFWCAGFRNVIAAFGVHGFTPGHLAALQYHGVKRVLIAFDRDEAGDRGADAVAGQLAGAGIDAWRVRFPAGLDANAYALKSGNAESALTLALEQAVRLSGPVQAVSGSDAGAASQTGAVRSESSSSSAAFPASQSAHQPAETLACEVTLSGELLLRSGPRIWRVRGWQKNQLPEVMKVNVRVLDESSGAFHTDQLDMYHAKQRQAYVSTAANELACDSAVIKREAGRVLLALEGKQDERQRAAEQESAASAVALSTDEEAAALALLKSPDLAERIVADLAACGVVGESSNLLTGYLAATSRKLDKPLAVLIQSSSAAGKSSLMDAVLGLIPEEERVQYSAMTGQSLYYLGETSLQHKILAIAEEEGVRQAAYALKLLQSDGELKIASTGKDEQSGELVTREYKVQGPVMLMLTTTASDVDEELLNRCLVLTVNESREQTQAIHAMQRRAQTLEGLLAQSEKGYLTRLHQNAQRLLRPLKVVNPYAERLTFLSDKTRTRRDHMKYLTLIQAVALLHQHQREVKRAEHRGQVLEYIEVQPSDIALANKLAHEVLGRTLDEMPPQTRKLLLLLKEMVGGLAESQNCQPSEVRFSRRDIRERLHWSDSQLKHHCLRLAEMEYLLVHGGSRGHLLQYQLLWDGGDGEEAHLCGLLNVDENASGDEGGNRKFGSEDSRSALSSGQVRGKFGQEKVASGQAAEGLQAGVVRVDENAVIREKKKTVLPPSPSLSQPSSS,ATGAACAACGGAAGAGTAACCCCGGCAGAGCTGGAACAGCTGAAGCGTGATGTGTCTCTGGCTGCGGTGGCGAAGTCGCAGAATCGCGTGCTGACGAAGCAGGGAAAAGACTTCGCTGTTCTCTGCCCGTTCCACGCTGAAAAGACGCCTTCCTGCGTTATCTCTCCGGCTAAAAACCTCTATCACTGCTTCGGCTGCAACGCGGGCGGGTCGGTGCTGGACTGGCTGCAACACACTGAAAACCTGACTTACGCGCAGACGCTGGTTCGCCTTCGTGAGCTGGCCGGATGTTCAACTTTGCGAGTTGTCTCGCAGAATCAGCCTTCCTCTTCAGCCGCTGCCGTTCCCGCCTCTTCTTCGCCGCCTGCCCGTCAGACGCTCACCGATCTGGACGATGACGGTCAGGCGCTGCTGCATCAGGTCGCGGACTGGTATCACCAGAACCTGCTGAACTCACCGGAAACCCTGACCTGGCTGGAAAAACGCGGCCTGACGCATCCTGAACTGGTGAGTCACTTCCGGCTGGGGTTCGCCGGGCCGCACGGTGTGGCGGGTGCGCTGCCGTCGCCGTCCAGCAAAGAGGGTAAAGCGCTGCGTTCGCGGCTGACTGCCCTCGGCGTGATACGCGAAAGCAACCGGCAGGATCACTTCCGGGGCTGCCTGACGGTGCCGGTTACGGGCTGGACTGAGAGTTACGATCCGGCGTCGCGTGGTCGGGTGCTCCAGCTGTACGGGCGACGGACGATGGCGGATCATCAGGTTAAAAAAGGCTCGGCAAAACACCTCTATCTGCCGTCGCCGCTGTGCGGGATCTGGAATGAAGCGGCGCTGGCCGCCGCCTCTGAAGTCATCCTGTGCGAAGCGCTGATCGATGCCATGACCTTCTGGTGCGCCGGGTTCCGTAACGTGATCGCGGCGTTCG
GGGTACACGGCTTTACGCCGGGCCATCTGGCGGCGCTGCAGTATCATGGCGTAAAGCGGGTGCTGATCGCCTTCGATCGGGACGAGGCCGGGGATCGGGGTGCGGACGCGGTGGCCGGTCAGCTTGCCGGAGCCGGGATCGATGCCTGGCGGGTGCGGTTCCCGGCGGGCCTGGATGCAAACGCCTATGCGCTGAAAAGCGGCAACGCTGAATCGGCGCTGACGCTGGCCCTTGAGCAGGCGGTGCGGCTGTCCGGGCCGGTTCAGGCCGTGTCCGGCAGCGACGCCGGAGCCGCGTCTCAAACCGGTGCTGTCCGCAGTGAATCATCTTCCTCTTCAGCCGCCTTCCCGGCCTCACAGTCTGCGCATCAGCCCGCTGAAACTCTCGCCTGTGAGGTGACGCTCTCCGGTGAGCTGCTGCTGCGTTCCGGGCCGCGCATCTGGCGGGTGCGGGGCTGGCAGAAGAATCAGCTGCCTGAAGTGATGAAGGTCAACGTGCGGGTGCTGGATGAGTCAAGCGGTGCGTTCCACACCGACCAGCTGGACATGTACCACGCGAAGCAGCGGCAGGCTTACGTGAGCACGGCGGCAAACGAGCTGGCGTGTGACAGCGCAGTGATAAAGCGCGAGGCGGGCCGGGTGCTGCTGGCTCTCGAAGGTAAGCAGGACGAGCGGCAGCGGGCTGCAGAGCAGGAAAGCGCCGCCTCAGCGGTGGCGCTGAGCACGGACGAGGAAGCGGCGGCGCTGGCGCTGCTGAAATCCCCGGACCTGGCAGAGCGCATCGTGGCAGACCTTGCGGCGTGCGGTGTAGTCGGCGAGTCGTCAAACCTGCTGACCGGGTATCTGGCGGCGACGTCGCGCAAGCTGGATAAGCCGTTAGCGGTACTGATACAGAGCAGCAGCGCGGCGGGGAAGTCATCGCTGATGGACGCGGTGCTGGGTCTGATACCTGAGGAGGAGCGGGTGCAGTACAGCGCGATGACCGGGCAGAGCCTGTACTACCTGGGGGAGACCTCGCTGCAACATAAAATCCTCGCCATCGCGGAGGAGGAAGGGGTACGTCAGGCGGCGTATGCGCTGAAGCTGTTGCAGAGTGACGGGGAGCTGAAAATCGCCTCAACGGGCAAGGACGAGCAGTCGGGTGAGCTGGTGACGCGGGAGTACAAAGTCCAGGGGCCGGTGATGCTGATGTTAACCACCACGGCGTCGGACGTTGACGAGGAGCTGCTGAACCGCTGCCTGGTGCTGACGGTAAATGAGTCGCGGGAGCAGACGCAGGCGATACATGCGATGCAGCGCCGGGCGCAGACGCTGGAAGGGCTGCTGGCGCAGTCGGAAAAGGGTTATCTGACGCGCCTGCACCAGAACGCGCAGCGGCTGCTGCGGCCGCTGAAGGTGGTGAATCCTTACGCTGAGCGGCTGACGTTCCTGAGCGACAAGACCCGGACGCGGCGCGACCATATGAAGTACCTGACGCTTATCCAGGCAGTGGCGCTGCTGCATCAGCATCAGCGGGAGGTAAAACGGGCTGAGCATCGCGGGCAGGTGCTGGAGTATATCGAGGTGCAGCCGTCCGATATCGCGCTGGCGAATAAGCTGGCGCATGAAGTGCTGGGCCGGACGCTGGATGAGATGCCGCCGCAGACCAGGAAGCTGCTGTTACTGCTGAAGGAGATGGTCGGCGGGCTGGCGGAGTCGCAGAACTGCCAGCCCTCAGAGGTGCGGTTCTCACGGCGGGACATCCGCGAACGACTGCACTGGAGCGACAGCCAGCTGAAGCACCACTGCCTGCGGCTGGCGGAGATGGAATATCTGCTGGTCCACGGCGGAAGCCGTGGGCATCTGTTGCAGTATCAGCTGTTATGGGACGGCGGCGACGGAGAAGAAGCGCACCTGTGCGGGCTGCTGAACGTGGATGAAAACGCGAGTGGTGACGAGGGCGGCAACCGTAAGTTCGGGTCTGAGGATAGCAGGTCTGCCTTAAGTTCGGGTCAGGTTCGG
GGTAAGTTCGGGCAGGAAAAAGTGGCGTCAGGTCAGGCGGCAGAAGGCTTACAGGCCGGAGTGGTTCGGGTTGATGAAAACGCAGTAATAAGAGAAAAAAAGAAAACGGTGCTGCCACCTTCGCCTTCGTTATCACAACCCTCTTCATCTTAG,dnaG_3,DNA primase
MYSP-0270,MYSP-0270-1_arrow,6_0_2282,MYSP-0270_02402,MNAINSLCAGMDMSAFAERLRLLREARSLSQVRLSELLGVDPRAYNRWEKGATAPHLETVIKIADVLQVTLDELTGRKAVSEEVKIRNHTLHALWQKADLLPDSDQQALIAVLDSFVKKSMVEQAIGFNSRR,ATGAACGCCATTAATTCATTGTGTGCAGGTATGGATATGTCAGCATTTGCAGAACGTCTTCGCTTGTTGCGTGAGGCCAGAAGTTTGAGTCAGGTTCGACTGTCCGAACTGCTGGGCGTTGATCCACGCGCCTATAACCGCTGGGAAAAAGGTGCAACAGCGCCTCACCTGGAGACGGTGATTAAGATTGCCGATGTGTTACAGGTGACGCTGGATGAACTGACGGGCAGGAAGGCCGTTTCGGAAGAGGTGAAGATCCGTAACCATACGCTTCATGCTCTGTGGCAGAAGGCAGATCTTTTACCGGACTCAGATCAACAGGCGCTGATCGCCGTGCTGGACAGTTTTGTTAAGAAGTCGATGGTTGAACAGGCAATAGGATTTAACAGCAGGCGTTAA,,hypothetical protein
MYSP-0270,MYSP-0270-1_arrow,6_0_2283,MYSP-0270_02403,MADTHHKSETRTPTTAASESRAHYYKVGYRPNKGQPNPLPQLTIKGRWLEALGFTTGQKIEVITGPGQLIIRLATEG,ATGGCTGACACGCATCATAAGTCAGAGACCCGCACACCCACAACCGCCGCCAGTGAATCGCGGGCGCATTATTACAAAGTGGGATACCGGCCTAATAAGGGCCAGCCGAACCCGCTGCCACAGCTCACTATCAAAGGCCGCTGGCTGGAAGCTCTGGGTTTTACCACCGGGCAGAAGATCGAGGTGATCACCGGGCCGGGACAGCTGATTATCCGGCTGGCGACTGAAGGGTAA,symE_4,Endoribonuclease SymE
MYSP-0270,MYSP-0270-1_arrow,6_0_2284,MYSP-0270_02404,MTLFEECQEALSADFEILENQEKKEAVDILNKYPFASGAISWPEIEYSDYENINDLLNVSLLKNADVFVLVDDASIPVFRTNLSLIAENIYDVTALSPKLFIFNNEIILQPLFPTEMFRLGIRSK,ATGACTTTATTTGAAGAGTGTCAAGAGGCGCTTAGTGCTGATTTCGAAATTCTTGAAAACCAAGAAAAGAAAGAAGCTGTTGATATTCTTAATAAATACCCTTTTGCAAGTGGGGCCATTTCTTGGCCCGAAATTGAATATTCGGATTATGAAAATATTAATGATTTATTGAATGTAAGTCTCCTGAAAAATGCTGATGTATTCGTTCTTGTAGATGATGCCAGCATTCCTGTTTTCAGGACAAACTTGAGTTTGATTGCTGAAAATATTTATGATGTTACAGCTTTATCACCGAAGTTATTTATTTTTAATAATGAAATTATACTACAACCTTTATTCCCGACAGAAATGTTCCGTTTGGGAATAAGATCTAAATAA,,hypothetical protein
MYSP-0270,MYSP-0270-1_arrow,6_0_2570,MYSP-0270_02698,MPIIAPIPRGERRLMQKAIHKTRDKNHARRLTAMLMLHRGERVSNVARTLCCARSSVGRWINWFTLSGVEGLKSLPAGRARRWPFEHICTLLRELVKHSPGDFGYQRSRWSTELLAIKINEITGCQLHAGTVRRWLPSAGLVWRRAAPTLRIRDPHKDEKMAVIRKALDECSAEHPVFYEDEVDIHLNPKIGADWQMRGQQKRVVTPGQNEKYYLAGALHSGTGKVSYAGGNSKSSALFISLLKRLKATYRRAKTITLIVDNYIIHKSRETQRWLKENPKFRVIYQPVYSPWVNHVERLWQALHDTITRNHQCRSMWQLLKKVRHFMETVSPFPGGKHGQAKV,ATGCCGATCATAGCACCTATTCCCCGTGGCGAAAGACGCCTGATGCAGAAAGCTATCCATAAAACGCGTGATAAAAATCATGCCCGCAGGCTCACCGCTATGCTGATGCTTCATCGGGGTGAGCGGGTCAGCAATGTCGCCAGAACGCTCTGCTGTGCCCGTTCATCCGTCGGACGCTGGATTAACTGGTTTACGCTGTCGGGTGTTGAAGGGCTGAAGTCATTACCCGCCGGGCGTGCCCGCCGTTGGCCGTTTGAGCATATCTGCACGCTGCTACGCGAGCTGGTAAAACATTCTCCCGGCGATTTTGGCTATCAGCGTTCACGCTGGAGTACAGAACTTCTGGCGATAAAAATCAATGAGATAACGGGATGCCAGTTGCATGCCGGAACCGTGCGCCGCTGGTTGCCGTCTGCGGGGCTTGTCTGGCGAAGGGCCGCGCCAACCCTGCGTATCCGCGACCCGCATAAAGATGAAAAAATGGCGGTAATCCGCAAAGCGCTGGACGAATGCAGCGCAGAGCATCCGGTATTTTATGAAGATGAAGTGGATATCCACCTCAATCCAAAAATCGGTGCGGACTGGCAGATGCGCGGGCAGCAAAAACGCGTGGTGACACCGGGGCAGAATGAAAAATACTATCTGGCCGGGGCGTTACACAGCGGAACGGGTAAAGTCAGCTATGCGGGCGGTAACAGCAAAAGTTCGGCGCTGTTCATCAGCCTGCTGAAGCGGCTTAAAGCGACGTACCGGCGGGCGAAAACCATCACGCTGATCGTGGACAACTACATTATCCACAAAAGCCGTGAAACACAGCGCTGGCTGAAGGAGAACCCGAAGTTCAGGGTCATTTACCAGCCGGTTTACTCGCCATGGGTGAATCACGTTGAACGCCTGTGGCAGGCGCTTCACGACACAATCACGCGCAATCATCAGTGCCGCTCAATGTGGCAATTGTTGAAAAAAGTTCGCCATTTTATGGAAACCGTCAGCCCGTTTCCCGGAGGAAAGCATGGTCAGGCAAAAGTGTAG,,IS630 family transposase ISEc33
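These rows are gene_data.csv records. A minimal sketch for loading such a file with Python's csv module and indexing it by clustering_id follows. It assumes the eight-column layout written by the header line in panaroo's prokka.py (gff_file, scaffold_name, clustering_id, annotation_id, prot_sequence, dna_sequence, gene_name, description). The helper name index_gene_data is hypothetical, and the sample row is shortened.

```python
import csv
import io

# Column order taken from the gene_data.csv header written in panaroo's prokka.py
GENE_DATA_COLUMNS = [
    "gff_file", "scaffold_name", "clustering_id", "annotation_id",
    "prot_sequence", "dna_sequence", "gene_name", "description",
]


def index_gene_data(handle):
    """Hypothetical helper: map clustering_id -> row dict for gene_data.csv."""
    index = {}
    for row in csv.reader(handle):
        if not row or row == GENE_DATA_COLUMNS:
            continue  # skip blank lines and the header row
        if len(row) != len(GENE_DATA_COLUMNS):
            # e.g. an unquoted comma inside a description field
            raise ValueError(f"expected 8 fields, got {len(row)} in row starting {row[:3]}")
        record = dict(zip(GENE_DATA_COLUMNS, row))
        index[record["clustering_id"]] = record
    return index


# Usage with a shortened in-memory row; for a real file use open(path, newline="")
sample = (
    "MYSP-0270,MYSP-0270-1_arrow,6_0_2282,MYSP-0270_02402,MNA,ATGAAC,,"
    "hypothetical protein\n"
)
genes = index_gene_data(io.StringIO(sample))
print(genes["6_0_2282"]["annotation_id"])  # MYSP-0270_02402
```

The function raises an error on a wrong field count rather than guessing. A stray comma in a description would otherwise shift every later column silently.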

spydrpick error message

Following the instructions, I get this error message:
  File "/anaconda3/bin/panaroo-spydrpick", line 10, in
    sys.exit(main())
  File "/anaconda3/lib/python3.7/site-packages/panaroo/spydrpick.py", line 246, in main
    pa_matrix, gene_names, sample_names = read_presence_absence(args.pa_file)
  File "/anaconda3/lib/python3.7/site-packages/panaroo/spydrpick.py", line 12, in read_presence_absence
    matrix_txt = np.loadtxt(filename, dtype=str, delimiter=",", comments=None)
  File "/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1146, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "~anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1071, in read_data
    % line_num)
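The last frame is cut off, so the actual exception message is missing. One plausible cause, offered as an assumption rather than a diagnosis, is a quoted field containing a comma. With delimiter=",", np.loadtxt splits each line on every comma and by default does not honour quoted fields, so such a row gains an extra column. The hypothetical sample below compares a naive split with Python's csv module.

```python
import csv
import io


def column_counts(text):
    """Field counts per non-empty line: naive comma split vs. the csv module."""
    naive = [len(line.split(",")) for line in text.splitlines() if line]
    aware = [len(row) for row in csv.reader(io.StringIO(text)) if row]
    return naive, aware


# Hypothetical presence/absence-style sample with a comma inside a quoted annotation
sample = 'Gene,Annotation,isolate_1\ngroup_1,"DNA primase, putative",1\n'
naive, aware = column_counts(sample)
print(naive)  # [3, 4]  -> the second row appears to have an extra column
print(aware)  # [3, 3]
```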

I have also tried using the Roary output file instead:
  File "/anaconda3/bin/panaroo-spydrpick", line 10, in
    sys.exit(main())
  File "anaconda3/lib/python3.7/site-packages/panaroo/spydrpick.py", line 259, in main
    chunk_size=100)
  File "/anaconda3/lib/python3.7/site-packages/panaroo/spydrpick.py", line 117, in spydrpick
    mi = mi_00 * (np.log(mi_00) - np.log(mi_1) - np.log(mi_0))
ValueError: operands could not be broadcast together with shapes (100,15701) (15701,1)
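This second failure comes from NumPy's broadcasting check. Shapes are aligned from the trailing axis. In that alignment, 15701 against 1 is compatible but 100 against 15701 is not, so the expression cannot be evaluated. The 100 matches chunk_size=100, which suggests that one operand is a per-chunk block while the other spans all genes. That is an inference from the traceback and is not confirmed. The pure-Python sketch below reimplements the broadcasting rule to make the mismatch explicit. The name broadcast_shape is hypothetical; NumPy's own np.broadcast_shapes (NumPy 1.20+) performs the same check.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result of two shapes, following NumPy's rule."""
    n = max(len(shape_a), len(shape_b))
    # Pad the shorter shape with leading 1s so axes line up from the right
    a = (1,) * (n - len(shape_a)) + tuple(shape_a)
    b = (1,) * (n - len(shape_b)) + tuple(shape_b)
    result = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(
                f"operands could not be broadcast together with shapes {shape_a} {shape_b}"
            )
    return tuple(result)


print(broadcast_shape((100, 15701), (1, 15701)))  # (100, 15701)
try:
    broadcast_shape((100, 15701), (15701, 1))
except ValueError as err:
    print(err)
```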

Any suggestions?
Thanks

get_neighborhood.py TypeError: 'int' object is not iterable

Hi, I'm getting this error when running get_neighborhood.py:

Traceback (most recent call last):
  File "getneighbours.py", line 101, in <module>
    main()
  File "getneighbours.py", line 76, in main
    G.nodes[n]['members'] = set(G.nodes[n]['members'])
TypeError: 'int' object is not iterable

I've cloned the latest repository to make sure that wasn't the problem. Do you know what is causing this?
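The TypeError means that, for at least one node, G.nodes[n]['members'] holds a single integer instead of a collection, and set() cannot iterate over an int. The traceback does not show how that scalar got there. As a defensive sketch, the conversion on line 76 could accept both forms. The helper name as_member_set is hypothetical.

```python
def as_member_set(value):
    """Return a node's 'members' attribute as a set, accepting a bare scalar."""
    if isinstance(value, (list, tuple, set, frozenset)):
        return set(value)
    # A lone member (e.g. an int) becomes a one-element set; strings are not split
    return {value}


# Hypothetical use in place of: G.nodes[n]['members'] = set(G.nodes[n]['members'])
# G.nodes[n]['members'] = as_member_set(G.nodes[n]['members'])
print(as_member_set(7))          # {7}
print(as_member_set([3, 1, 3]))  # {1, 3}
```

Checking for known collection types, rather than for "anything iterable", keeps a string member from being split into its characters.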

error when using multiple cores

site-packages/joblib/externals/loky/process_executor.py:706:
UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
"timeout or by a memory leak.", UserWarning

When I use the -t option with 40 cores, I get this message before the pan- or core-genome is generated (after the initial results and after paralogs are processed). As a result, the aligned sequences are not generated.

Thanks

Example .gff inputs

Hi,

I was trying to run panaroo with a couple of GFF files from the RefSeq database, but I guess they had some problems. One error was about a sequence length not being divisible by 3, which I understand. The other error gave no explanation that could help. It is raised at the end of the parsing function, so I think the problem must occur well before that:

panaroo/panaroo/prokka.py, lines 200 to 224 in 11442bc:

try:
    protienHandle = open(output_dir + "combined_protein_CDS.fasta", 'w+')
    DNAhandle = open(output_dir + "combined_DNA_CDS.fasta", 'w+')
    csvHandle = open(output_dir + "gene_data.csv", 'w+')
    csvHandle.write(
        "gff_file,scaffold_name,clustering_id,annotation_id,prot_sequence,dna_sequence,gene_name,description\n"
    )
    job_list = list(enumerate(gff_list))
    job_list = [
        job_list[i:i + n_cpu] for i in range(0, len(job_list), n_cpu)
    ]
    for job in tqdm(job_list):
        gene_sequence_list = Parallel(n_jobs=n_cpu)(
            delayed(get_gene_sequences)(gff, gff_no)
            for gff_no, gff in job)
        for i, gene_seq in enumerate(gene_sequence_list):
            output_files(gene_seq[0], gene_seq[1], protienHandle,
                         DNAhandle, csvHandle, job[i][1])
    protienHandle.close()
    DNAhandle.close()
    csvHandle.close()
    return True
except:
    print("Error reading prokka input!")
    raise RuntimeError("Error reading prokka input!")
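The bare except: in this block catches every exception, including KeyboardInterrupt, prints a generic message and raises a new RuntimeError. Python still attaches the original error as implicit context, so it appears earlier in the full traceback. However, the final message names neither the GFF file nor the cause. The minimal sketch below shows an alternative that catches Exception only, names the failing input and chains the original explicitly. All function names in it are hypothetical.

```python
def parse_all(paths, parse_one):
    """Parse each input in turn; on failure, report which input broke and why."""
    results = []
    for path in paths:
        try:
            results.append(parse_one(path))
        except Exception as err:  # not a bare except: lets KeyboardInterrupt through
            raise RuntimeError(f"Error reading input {path!r}: {err}") from err
    return results


def fake_parser(path):
    # Hypothetical stand-in for a GFF parser that rejects one file
    if path == "bad.gff":
        raise ValueError("CDS length not divisible by 3")
    return path.upper()


print(parse_all(["a.gff"], fake_parser))  # ['A.GFF']
try:
    parse_all(["a.gff", "bad.gff"], fake_parser)
except RuntimeError as err:
    print(err)  # Error reading input 'bad.gff': CDS length not divisible by 3
```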

I re-annotated the FASTA files using Prodigal, and panaroo seems to work fine with the Prodigal output. I looked through the repo and couldn't find any example GFF inputs; could you provide one? Or perhaps you have come across similar issues with RefSeq annotations before? Thanks!
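This issue mentions a CDS length that is not divisible by 3, and panaroo's prokka.py can also raise "Premature stop codon in a gene!". Both problems can be screened for before running panaroo. The sketch below is an assumption-laden pre-check, not panaroo's own validation. It flags CDSs whose length is not a multiple of 3 and any stop codon before the last complete codon. It uses the stop codons of NCBI translation table 11 or table 4, where TGA encodes tryptophan. The name cds_problems is hypothetical.

```python
# Stop codons for NCBI translation table 11 (bacterial) and table 4
# (Mycoplasma/Spiroplasma), where TGA is read as tryptophan instead of stop.
STOP_CODONS = {
    11: {"TAA", "TAG", "TGA"},
    4: {"TAA", "TAG"},
}


def cds_problems(dna, table=11):
    """Return a list of basic problems with a CDS nucleotide sequence."""
    seq = dna.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not divisible by 3")
    # Split into complete codons only, ignoring any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    stops = STOP_CODONS[table]
    # Any stop codon before the final complete codon is treated as premature
    if any(codon in stops for codon in codons[:-1]):
        problems.append("premature stop codon")
    return problems


print(cds_problems("ATGTGAAAATAA"))           # ['premature stop codon']
print(cds_problems("ATGTGAAAATAA", table=4))  # []
print(cds_problems("ATGAA"))                  # ['length not divisible by 3']
```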

come up with a new name for it

  • pancita (little tummy)
  • something to do with Pan's Labyrinth
  • Maybe Panemelon (rhymes with Pademelon) to follow on from Prokka (rhymes with Quokka)
  • Pan de gnomes (Bread of gnomes)

Refound gene tags

I'm analyzing the results of panaroo right now and have a question about the refound genes. I'm trying to map the genes in each node to their original locus tags in the GFF files. Here's an example of what I have:

I have a node labeled 501 with the geneIDs 42_1_57;60_0_13;44_1_35;19_3_38;45_1_66;40_refound_2166... and so on. So 42_1_57 means isolate 42, contig/scaffold 1 and CDS 57 (all 0-based). I can find this exact entry in gene_data.csv as KAB03,NZ_CP017647.1,42_1_57,NZ_CP017647.1_cds3782,MTGN... and easily extract all the information I need.
Refound genes are a little different, though. Say I want to map 40_refound_2166 to its original locus tag: how do I interpret the tag assigned by panaroo? I understand that the first number is the isolate index, but I couldn't figure out what the second one means.

I hope I explained the question clearly; I'd really appreciate it if you could explain this part.
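The issue itself describes regular gene IDs as isolate_contig_cds, for example 42_1_57, with all parts 0-based. Under that reading, the minimal sketch below splits a node's semicolon-separated geneIDs and keeps refound IDs separate. The function name parse_gene_id is hypothetical. The numeric suffix of a refound ID is returned uninterpreted, because its meaning is exactly what the issue asks.

```python
def parse_gene_id(gene_id):
    """Parse a gene ID using the isolate_contig_cds reading given in the issue."""
    parts = gene_id.split("_")
    if len(parts) == 3 and parts[1] == "refound":
        # Suffix kept as a raw string: its meaning is not established here
        return {"isolate": int(parts[0]), "refound": True, "suffix": parts[2]}
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        isolate, contig, cds = (int(p) for p in parts)
        return {"isolate": isolate, "contig": contig, "cds": cds, "refound": False}
    raise ValueError(f"unrecognised gene ID: {gene_id!r}")


node_gene_ids = "42_1_57;60_0_13;40_refound_2166"
for gid in node_gene_ids.split(";"):
    print(gid, parse_gene_id(gid))
```

Regular IDs can then be looked up in the clustering_id column of gene_data.csv, as the issue describes. Refound IDs are left for the maintainers' answer.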
