An improved algorithm to measure the semantic similarity of gene ontology terms
A web server is available at http://dna.cs.miami.edu/GOGO/.
Zhao, C. and Wang, Z. (2018) GOGO: An improved algorithm to measure the semantic similarity between gene ontology terms. Scientific Reports, 8, 15107; doi:10.1038/s41598-018-33219-y.
GOGO does not need to be installed. Get familiar with the commands below (Linux commands),
and make sure each input file follows the correct format.
Before using the cluster function, a clustering tool must be compiled.
Compile the affinity propagation clustering tool (Frey 2007):
gcc -o apcluster apcluster.c
The format:
Each line of the input file should contain two GO terms
separated by a space. In the result file, GOGO appends the ontology and
the similarity of the GO term pair to the end of each input line.
Input file:
GO:1903097 GO:0010847
GO:0045252 GO:0045240
GO:0031216 GO:0004553
Output file:
GO:1903097 GO:0010847 BPO 0.694
GO:0045252 GO:0045240 CCO 0.803
GO:0031216 GO:0004553 MFO 0.753
The command:
perl go_comb.pl <input.file> <output.file>
Example:
perl go_comb.pl ./data/eg_go_in.txt ./data/result_eg_go_in.txt
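Before running go_comb.pl, it can be useful to check that every input line holds exactly two well-formed GO identifiers, and to read the scores back from the result file. The following is a minimal Python sketch and is not part of GOGO: the function names `check_go_pair_line` and `parse_go_result_line` are hypothetical, and the pattern assumes the usual identifier form of `GO:` followed by seven digits.

```python
import re

# Assumed identifier form: "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


def check_go_pair_line(line):
    """Return True if the line holds exactly two whitespace-separated GO IDs."""
    fields = line.split()
    return len(fields) == 2 and all(GO_ID.match(f) for f in fields)


def parse_go_result_line(line):
    """Split a result line into (term1, term2, ontology, similarity)."""
    term1, term2, ontology, score = line.split()
    return term1, term2, ontology, float(score)


if __name__ == "__main__":
    # Lines taken from the input and output examples above.
    print(check_go_pair_line("GO:1903097 GO:0010847"))
    print(parse_go_result_line("GO:1903097 GO:0010847 BPO 0.694"))
```

The parser assumes that every result line ends with one ontology label and one numeric score, as in the output example above.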
The format:
Suppose the user has some gene pairs to compare. Each line of the
input file should contain two genes separated by “;”. The first field
of each gene is the gene name, followed by its GO terms. GOGO
identifies the ontology of each GO term and calculates the similarity
of the genes based on BPO, CCO and MFO separately. In the result file, GOGO
appends the ontologies and similarities of the gene pair to the end of
each input line.
Input:
ADH2 GO:0006067 GO:0004022 GO:0000947 GO:0006116 GO:0005737; PDC6 GO:0006067 GO:0004737 GO:0000949 GO:0006569 GO:0005737 GO:0006559
……
Output:
ADH2 GO:0006067 GO:0004022 GO:0000947 GO:0006116 GO:0005737; PDC6 GO:0006067 GO:0004737 GO:0000949 GO:0006569 GO:0005737 GO:0006559 BPO 0.612 CCO 1.000 MFO 0.030
……
The command:
perl gene_pair_comb.pl <input.file> <output.file>
Example:
perl gene_pair_comb.pl ./data/eg_gene_pair_in.txt ./data/result_eg_gene_pair_in.txt
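The gene pair format can be read back with a few lines of Python. This is a sketch only, not GOGO code: the function names are hypothetical, and it assumes that every token starting with `GO:` is an annotation and that the remaining trailing tokens are ontology/score pairs with numeric scores, as in the output example above.

```python
def parse_gene(field):
    """Split 'NAME GO:... GO:...' into (name, [GO terms])."""
    tokens = field.split()
    return tokens[0], [t for t in tokens[1:] if t.startswith("GO:")]


def parse_gene_pair_line(line):
    """Parse one input line holding two genes separated by ';'."""
    left, right = line.split(";")
    return parse_gene(left), parse_gene(right)


def parse_gene_pair_result(line):
    """Parse a result line: the input line plus trailing ontology/score pairs."""
    left, right = line.split(";")
    tokens = right.split()
    # Tokens after the gene name that are not GO terms form the appended tail.
    tail = [t for t in tokens[1:] if not t.startswith("GO:")]
    scores = {tail[i]: float(tail[i + 1]) for i in range(0, len(tail) - 1, 2)}
    return parse_gene(left), parse_gene(right), scores


if __name__ == "__main__":
    # Result line taken from the output example above.
    result = ("ADH2 GO:0006067 GO:0004022 GO:0000947 GO:0006116 GO:0005737; "
              "PDC6 GO:0006067 GO:0004737 GO:0000949 GO:0006569 GO:0005737 "
              "GO:0006559 BPO 0.612 CCO 1.000 MFO 0.030")
    gene_a, gene_b, scores = parse_gene_pair_result(result)
    print(gene_a[0], gene_b[0], scores)
```

Keeping the scores in a dictionary keyed by ontology makes it easy to filter pairs on a single ontology afterwards.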
Calculate pairwise semantic similarities between a list of genes and then classify the genes based on their GO term similarities
The format:
Each line of the input file contains only one gene: the first
field is the gene name, followed by its GO terms. In the
similarity result file, GOGO lists all possible gene pairs
from the gene set supplied by the user. On each line, the first two fields
are the gene pair, followed by the ontologies and similarities. The
cluster result file gives the clusters of genes for each
ontology. Note that GOGO cannot produce a cluster result for an
ontology if a gene is not annotated with any GO term from that ontology.
Each line is one cluster of genes, and the first field of the line is
the exemplar of that cluster.
Input:
ADH4 GO:0004022 GO:0000947 GO:0005739 GO:0006113
……
ARO9 GO:0009072 GO:0005634 GO:0008793 GO:0005737
……
ADH5 GO:0005634 GO:0043458 GO:0000947 GO:0006116 GO:0005737
……
Output:
ADH4 PDC1 BPO 0.431 CCO 0.482 MFO 0.030
ADH4 ARO9 BPO 0.265 CCO 0.509 MFO 0.031
ADH4 PDC5 BPO 0.431 CCO 0.509 MFO 0.030
……
Cluster output file:
BPO:
ARO8 ARO9
PDC6 PDC1 PDC5 ARO10
ADH1 ADH4 SFA1 ADH3 ADH2 ADH5
CCO:
ADH3 ADH4
PDC6 SFA1 ARO10 ARO8 ADH2
ADH5 PDC1 ARO9 PDC5
ADH1
The command:
perl gene_list_comb.pl <input.file> <output.file> <cluster.file>
Example:
perl gene_list_comb.pl ./data/eg_gene_list_in.txt ./data/result_eg_gene_list_in.txt ./data/result_cluster_eg_gene_list_in.txt
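The two result files can be loaded for further analysis with a short Python sketch. It is not part of GOGO and the function names are hypothetical. It assumes that an ontology header in the cluster file is a single token ending in “:”, that the first gene on each cluster line is the exemplar, and that similarity lines carry numeric scores, as in the examples above.

```python
def parse_similarity_line(line):
    """Parse 'GENE1 GENE2 BPO x CCO y MFO z' into ((gene1, gene2), scores)."""
    tokens = line.split()
    tail = tokens[2:]
    scores = {tail[i]: float(tail[i + 1]) for i in range(0, len(tail) - 1, 2)}
    return (tokens[0], tokens[1]), scores


def parse_cluster_text(text):
    """Group cluster lines by ontology; the first gene of a line is the exemplar."""
    clusters = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and len(line.split()) == 1:
            # Header line such as "BPO:" starts a new ontology block.
            current = line[:-1]
            clusters[current] = []
        elif current is not None:
            genes = line.split()
            clusters[current].append({"exemplar": genes[0], "members": genes})
    return clusters


if __name__ == "__main__":
    # Text taken from the cluster output example above.
    example = "BPO:\nARO8 ARO9\nPDC6 PDC1 PDC5 ARO10\nCCO:\nADH1\n"
    for ontology, groups in parse_cluster_text(example).items():
        print(ontology, [g["exemplar"] for g in groups])
```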
The DAG files are generated from the file go.obo by the script dag.pl:
perl dag.pl data/go.obo data/dag_BPO_ancestor.txt data/dag_BPO_children.txt BPO
perl dag.pl data/go.obo data/dag_CCO_ancestor.txt data/dag_CCO_children.txt CCO
perl dag.pl data/go.obo data/dag_MFO_ancestor.txt data/dag_MFO_children.txt MFO
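To illustrate what kind of information go.obo holds, the Python sketch below collects the direct `is_a` parents of each term, grouped by ontology. It is not a reimplementation of dag.pl: the output format of the DAG files is not described here, dag.pl may follow other relationship types as well, and the function name `read_is_a_parents` is hypothetical. The sketch assumes the standard OBO layout, in which `[Term]` stanzas are separated by blank lines and carry `id`, `namespace` and `is_a` tags.

```python
# Mapping from OBO namespace names to the ontology codes used by GOGO.
NAMESPACE_TO_ONTOLOGY = {
    "biological_process": "BPO",
    "cellular_component": "CCO",
    "molecular_function": "MFO",
}


def read_is_a_parents(obo_text):
    """Return {ontology: {term_id: [direct is_a parents]}} for non-obsolete terms."""
    parents = {code: {} for code in NAMESPACE_TO_ONTOLOGY.values()}
    for stanza in obo_text.split("\n\n"):
        lines = stanza.strip().splitlines()
        if not lines or lines[0] != "[Term]":
            continue  # skip the file header and non-term stanzas
        term_id, ontology, is_a, obsolete = None, None, [], False
        for line in lines[1:]:
            key, _, value = line.partition(": ")
            if key == "id":
                term_id = value
            elif key == "namespace":
                ontology = NAMESPACE_TO_ONTOLOGY.get(value)
            elif key == "is_a":
                # Drop the trailing "! name" comment after the parent ID.
                is_a.append(value.split(" ! ")[0])
            elif key == "is_obsolete" and value == "true":
                obsolete = True
        if term_id and ontology and not obsolete:
            parents[ontology][term_id] = is_a
    return parents


if __name__ == "__main__":
    # Small hand-written excerpt in OBO style, for illustration only.
    sample = (
        "format-version: 1.2\n\n"
        "[Term]\nid: GO:0008150\nname: biological_process\n"
        "namespace: biological_process\n\n"
        "[Term]\nid: GO:0000003\nname: reproduction\n"
        "namespace: biological_process\n"
        "is_a: GO:0008150 ! biological_process\n"
    )
    print(read_is_a_parents(sample)["BPO"])
```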
Users can update the annotation file by downloading a newer go.obo file
into the directory /GOGO/data/.
Then run the command below from the directory /GOGO/:
./dag.sh