tanghaibao / goatools


Python library to handle Gene Ontology (GO) terms

License: BSD 2-Clause "Simplified" License

Python 99.76% Shell 0.01% Makefile 0.02% Jupyter Notebook 0.22%

gene-ontology genomics gene-set-enrichment fdr bioinfomatics goslim-terms fisher-tests fdr-benjamini sidak holm-sidak


goatools's Issues

Loci with no GO annotation

Hi there,

I am using goatools to test for over- and/or under-represented GOSlim categories in a plant lineage that is lacking a closely-related annotated genome. Thus, for the 2091 loci in my "population" only 819 have known GO annotations. Similarly, my "study" data sets have many loci that are also lacking GO annotations.

My question is: Should I include loci in my "population" and "study" files that do not have GO annotations? I am concerned that, if I only use loci for which there are annotations, I may bias the results.

Any suggestions/input would be much appreciated.

Thanks!
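To see what is at stake in this choice, the sketch below computes a one-sided hypergeometric enrichment p-value for the same GO term under both population definitions. It is only an illustration, not goatools code: the population sizes 2091 and 819 come from the post, while the study sizes and hit counts are made-up numbers, and `enrichment_pvalue` is a hypothetical helper.

```python
from math import comb

def enrichment_pvalue(study_hits, study_size, pop_hits, pop_size):
    """One-sided hypergeometric tail P(X >= study_hits)."""
    upper = min(study_size, pop_hits)
    total = comb(pop_size, study_size)
    tail = sum(
        comb(pop_hits, i) * comb(pop_size - pop_hits, study_size - i)
        for i in range(study_hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 60 loci in the population carry the term, 15 of them are in the study.
# Population = all 2091 loci, study = 200 loci (annotated or not).
p_all_loci = enrichment_pvalue(15, 200, 60, 2091)
# Population = only the 819 annotated loci, study = the 80 annotated study loci.
p_annotated_only = enrichment_pvalue(15, 80, 60, 819)

print(p_all_loci, p_annotated_only)
```

The two calls test the same term but return different p-values, because the denominators of both ratios change with the population definition.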

0.7.10 setuptools issue

$ pip install goatools
Collecting goatools
  Using cached goatools-0.7.10.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/jw/23sjzsn97lz8qj3kp4bg4pw00000gn/T/pip-build-OpOQPw/goatools/setup.py", line 8, in <module>
        from setup_helper import SetupHelper
    ImportError: No module named setup_helper

Difference between gene2go.gz and GO BP 2015 annotations

Hi there,

Thanks for this code, it's exactly what I needed. I'm switching over from Enrichr (which doesn't allow background gene sets), and they use GO BP 2015 annotations. I notice that the annotations in the file they use are quite different from the annotations taken from NCBI, as was done in the tutorial.

Specifically, I plotted the number of terms for each GO pathway in the Enrichr annotations file vs. the NCBI file recommended in the tutorial (log2 scale), and found that there were many terms missing from the latter:

It appears that the NCBI annotations file is up to date. Could you please explain the discrepancy here?

Thanks!

Consider using recommended URLs for GO ontology downloads

See: http://geneontology.org/page/download-ontology

We appreciate it's a hassle to change, and we will continue to support the URLs that you currently use, but we'd encourage people to move to the new, more informative naming schema that's harmonized across ontologies.

I'll send a PR

association file

Hi,
I am also working on a completely new organism, so I ran gene prediction and BLASTed the predicted genes against UniRef90 to get gene IDs.

I found only this page ( http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot-goa/features?segment=Q9LZR0 ), which maps a gene ID ( Q9LZR0 ) to GO terms.

Could I use that page to create the association file?

Thank you in advance

count_terms() called with wrong number of args

http://github.com/tanghaibao/goatools/blob/master/goatools/multiple_testing.py#L108
calls count_terms() with 2 arguments, but it takes 3, so FDR testing fails.

using "ND" entries in the GAF file.

It seems that goatools automatically ignores entries with evidence code "ND" in the GAF file, but in my case I actually need to use them. In associations.py, line 132, I found `if 'NOT' not in ntgaf.Qualifier and evidence_code != 'ND':`, so this appears to be hardcoded. An option to include these entries in a future version would be very helpful.
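A minimal sketch of what such an option could look like, assuming annotation records are available as plain tuples. The function `build_associations` and its `keep_nd` / `keep_not` parameters are hypothetical names used for illustration and are not part of the goatools API.

```python
from collections import defaultdict

def build_associations(records, keep_nd=False, keep_not=False):
    """Map gene IDs to GO IDs.

    records: iterable of (gene_id, go_id, qualifier, evidence_code) tuples.
    By default this mirrors the hardcoded filter quoted above.
    """
    assoc = defaultdict(set)
    for gene_id, go_id, qualifier, evidence_code in records:
        if not keep_not and 'NOT' in qualifier:
            continue  # skip negated annotations unless requested
        if not keep_nd and evidence_code == 'ND':
            continue  # skip "No biological Data" entries unless requested
        assoc[gene_id].add(go_id)
    return dict(assoc)

# Toy records, invented for this example.
records = [
    ("geneA", "GO:0003824", "", "IDA"),
    ("geneA", "GO:0008150", "", "ND"),
    ("geneB", "GO:0016740", "NOT", "IMP"),
]
default_assoc = build_associations(records)
assoc_with_nd = build_associations(records, keep_nd=True)
```

Keeping the defaults equal to the current behavior would leave existing analyses unchanged while letting callers opt in to the ND entries.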

some tests/doctest examples

at least a readme.rst with some tests/examples.

plot parents/children in networkx

Add a helper method to plot the parents or children of a given GO term.
This means merging paths back together where lineages diverge and then converge again.

conda does not install properly when using python 3.4

Something seems wrong with the bioconda installation. Trying to install goatools through bioconda into a new Python 3 environment leads to an UnsatisfiableError. However, it installs just fine into the same environment using pip.

dots before the GO terms

Hello,
What do the dots before the GO terms mean?

    python scripts/find_enrichment.py --alpha=0.05 --fdr --indent data/study data/population data/association
    id  enrichment  description ratio_in_study  ratio_in_pop    p_uncorrected   p_bonferroni    p_holm  p_sidak p_fdr
    .GO:0003824 e   catalytic activity  106/276 7773/33239  2.6e-08 2.01e-05    2.01e-05    1.96e-05    0
    ..GO:0016740    e   transferase activity    45/276  2713/33239  7.46e-06    0.00575 0.00574 0.0056  0.008
    .....GO:0006464 e   cellular protein modification process   33/276  1723/33239  7.9e-06 0.00608 0.00606 0.00593 0.008
    ....GO:0036211  e   protein modification process    33/276  1723/33239  7.9e-06 0.00608 0.00606 0.00593 0.008
    ...GO:0019748   e   secondary metabolic process 11/276  252/33239   9.5e-06 0.00731 0.00728 0.00713 0.009
    ......GO:0006468    e   protein phosphorylation 22/276  918/33239   1.01e-05    0.00776 0.00771 0.00757 0.009
    .....GO:0016310 e   phosphorylation 22/276  941/33239   1.47e-05    0.0114  0.0113  0.0111  0.009
    .....GO:0008474 e   palmitoyl-(protein) hydrolase activity  3/276   7/33239 1.93e-05    0.0149  0.0148  0.0145  0.01
    ...GO:0005839   e   proteasome core complex 4/276   23/33239    3.64e-05    0.028   0.0277  0.0273  0.016
    ....GO:0043412  e   macromolecule modification  33/276  1870/33239  5.53e-05    0.0426  0.0421  0.0415  0.026
    ....GO:0044550  e   secondary metabolite biosynthetic process   8/276   158/33239   5.61e-05    0.0432  0.0427  0.0421  0.027

Thank you in advance.
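The dots appear only when --indent is passed, and they look like an indentation marker: one dot per level of the term in the GO hierarchy, so more dots would mean a more specific term. The sketch below shows that reading under this assumption. `indent_go_id` is a hypothetical helper, and the levels passed to it are simply read off the dot counts in the output above.

```python
def indent_go_id(go_id, level, indent=True):
    """Prefix a GO ID with one dot per hierarchy level (assumed meaning of --indent)."""
    return "." * level + go_id if indent else go_id

# Levels taken from the number of dots shown in the output above.
rows = [("GO:0003824", 1), ("GO:0016740", 2), ("GO:0006468", 6)]
for go_id, level in rows:
    print(indent_go_id(go_id, level))
```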

make goanalysis stuff more easily callable as a module function

? discuss how to do this.
should be able to take lists of accn names.

Infinite loop when parsing

from goatools.obo_parser import GODag
obo_dag = GODag(obo_file="../GO/go-basicNS.obo")

leads to an infinite loop on:
'2.7.9 |Anaconda 2.2.0 (64-bit)| (default, Dec 18 2014, 16:57:52) [MSC v.1500 64 bit (AMD64)]'

the .obo file being:
format-version: 1.2
data-version: releases/2015-06-20
date: 19:06:2015 14:25
saved-by: dph
auto-generated-by: OBO-Edit 2.3.1

I temporarily fixed it in CFretter@a7aee05

import error (python2.7)

Traceback (most recent call last):
  File "scripts/find_enrichment.py", line 25, in <module>
    from goatools.go_enrichment import GOEnrichmentStudy
  File "scripts/../goatools/__init__.py", line 5, in <module>
    from goatools.go_enrichment import *
  File "scripts/../goatools/go_enrichment.py", line 18, in <module>
    import fisher
ImportError: No module named fisher

go_enrichment.py summary to file

Hi,
Could you please add a method that writes the summary to a file?

Thank you in advance.

Mic

plot_go_term.py to Cytoscape or VisANT

Hi,
Are there any plans to let plot_go_term.py export to Cytoscape or VisANT?

'GOEnrichmentStudy' object has no attribute 'results'

Hi,
I tried the --compare option with the following command:

    $ python scripts/find_enrichment.py --compare --indent data/study data/population data/association

and got the following error message:

    removed 276 overlapping items
    load obo file gene_ontology.1_2.obo
    41085 nodes imported
    id      enrichment      description     ratio_in_study  ratio_in_pop    p_uncorrected   p_bonferroni    p_holm  p_sidak p_fdr
    Traceback (most recent call last):
    File "scripts/find_enrichment.py", line 116, in <module>
        g.print_summary(min_ratio=min_ratio, indent=opts.indent, pval=opts.pval)
    File "scripts/../goatools/go_enrichment.py", line 154, in print_summary
        for rec in self.results:
    AttributeError: 'GOEnrichmentStudy' object has no attribute 'results'

The bug about more than one relationship in obo file

here is the code in obo_parser.py

    if hasattr(rec, name):
        if name not in self.attrs_scalar:
            if name not in self.attrs_nested:
                getattr(rec, name).add(value)
            else:
                self._add_nested(rec, name, value)
        else:
            raise Exception("ATTR({NAME}) ALREADY SET({VAL})".format(
                NAME=name, VAL=getattr(rec, name)))
    else: # Initialize new GOTerm attr
        if name in self.attrs_scalar:
            setattr(rec, name, value)
        elif name not in self.attrs_nested:
            setattr(rec, name, set([value]))
        else:
            name = '_{:s}'.format(name)
            setattr(rec, name, defaultdict(list))
            self._add_nested(rec, name, value)

Here the check `if hasattr(rec, name)` is never true for nested attributes: `name` is `relationship`, but the attribute is stored as `_relationship` because of `name = '_{:s}'.format(name)`. As a result, only the last relationship of each GO term is saved.
I changed the code like this:

    _name = '_{:s}'.format(name) if name in self.attrs_nested else name
    if hasattr(rec, _name):
        if name not in self.attrs_scalar:
            if name not in self.attrs_nested:
                getattr(rec, name).add(value)
            else:
                self._add_nested(rec, _name, value)
        else:
            raise Exception("ATTR({NAME}) ALREADY SET({VAL})".format(
                NAME=name, VAL=getattr(rec, name)))
    else: # Initialize new GOTerm attr
        if name in self.attrs_scalar:
            setattr(rec, name, value)
        elif name not in self.attrs_nested:
            setattr(rec, name, set([value]))
        else:
            # name = '_{:s}'.format(name)
            setattr(rec, _name, defaultdict(list))
            self._add_nested(rec, _name, value)

Thanks a lot.
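The effect described above can be reproduced outside goatools with a toy class. This is a stripped-down sketch, not the parser itself: `Term`, `add_buggy`, and `add_fixed` are hypothetical stand-ins for the record and the two code paths shown in the issue, and the GO IDs are placeholders.

```python
from collections import defaultdict

class Term:
    """Stand-in for a GO term record."""

NESTED = {"relationship"}

def add_buggy(rec, name, value):
    rel_type, target = value
    if hasattr(rec, name):                  # checks 'relationship' ...
        getattr(rec, name)[rel_type].append(target)
    else:
        name = "_" + name                   # ... but stores '_relationship'
        setattr(rec, name, defaultdict(list))   # so this overwrites on every call
        getattr(rec, name)[rel_type].append(target)

def add_fixed(rec, name, value):
    rel_type, target = value
    attr = "_" + name if name in NESTED else name   # compute the stored name first
    if not hasattr(rec, attr):
        setattr(rec, attr, defaultdict(list))
    getattr(rec, attr)[rel_type].append(target)

buggy, fixed = Term(), Term()
for value in [("part_of", "GO:0000001"), ("regulates", "GO:0000002")]:
    add_buggy(buggy, "relationship", value)
    add_fixed(fixed, "relationship", value)

print(dict(buggy._relationship))   # only the last relationship survives
print(dict(fixed._relationship))   # both relationships are kept
```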

Error when running find_enrichment.py with --fdr option

Hi there,

I am trying to run the "find_enrichment.py" script with the --fdr option. My population of genes is only 2151 and my study sample file(s) is(/are) less than 1000 genes. The Bonferroni numbers seemed very high, which I thought might be due to my sample size. However, when I run "find_enrichment.py" I receive the following errors:

  File "/Users/Grusz/bin/goatools-0.5.2/scripts/find_enrichment.py", line 124, in <module>
    study=study, methods=methods)
  File "/Users/Grusz/bin/goatools-0.5.2/scripts/../goatools/go_enrichment.py", line 95, in __init__
    self.run_study(study)
  File "/Users/Grusz/bin/goatools-0.5.2/scripts/../goatools/go_enrichment.py", line 134, in run_study
    self.term_pop, self.obo_dag)
  File "/Users/Grusz/bin/goatools-0.5.2/scripts/../goatools/multiple_testing.py", line 109, in calc_qval
    new_term_study = go_enrichment.count_terms(new_study, assoc, obo_dag)
NameError: global name 'go_enrichment' is not defined

I've scanned through the "go_enrichment.py" and "multiple_testing.py" files to see where things are going wrong, but can't seem to find the problem. Any suggestions would be much appreciated.

Thanks,
Amanda

fisher import error

Python 3.4


>>> import fisher
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/fisher-0.1.4-py3.4-linux-x86_64.egg/fisher/__init__.py", line 1, in <module>
    from cfisher import *
ImportError: No module named 'cfisher'

Fix:
Looks like a Python3 compatibility issue

 __init__.py

from __future__ import absolute_import
from .cfisher import *

Print error of GOTerm class in obo_parser

Printing records in go-basic.obo like:

reader = OBOReader()
for rec in reader: print(rec)

gives:

TypeError: unsupported format string passed to NoneType.__format__

Python isn't happy formatting None into strings. Changing the level and depth attributes in GOTerm.__init__() from None to "" allows the records to be printed.
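An alternative to changing the defaults to "" is to guard the formatting step so that unset values are rendered with a placeholder. The sketch below is an assumption about how that could look, not the actual GOTerm code. `format_level` is a hypothetical helper, and the format spec is only an example.

```python
def format_level(value, spec="02d"):
    """Format an integer attribute, or return a placeholder while it is still None."""
    return "--" if value is None else format(value, spec)

def formatting_none_fails():
    """Show that Python 3 rejects a non-empty format spec applied to None."""
    try:
        "{:02d}".format(None)
    except TypeError:
        return True
    return False

print(format_level(3))        # a term whose level has been computed
print(format_level(None))     # a term read by OBOReader before levels are set
print(formatting_none_fails())
```

This keeps None as the "not yet computed" marker, which other code may rely on, while still letting records be printed.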

Unable to compute similarity between most of entities

I was trying to compute the similarity between two given entities. semantic_similarity and resnik_sim work for a few entities, but for most they fail at `return max(common_parent_go_ids(terms, go), key=lambda t: go[t].depth)` with `ValueError: max() arg is an empty sequence`.
The error is raised when the two provided entities/genes have no common parent. Here is one example that produces it:
semantic_similarity("GO:0003676", "GO:0007516", go)

Is there an alternative way to compute a similarity measure between entities that don't share common parents? I'm sorry if I have missed something.
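Terms from different GO namespaces have separate roots, so their sets of ancestors can be disjoint and `max()` receives an empty sequence. One way to handle that case is the `default` argument of `max()`, as in the sketch below. The ancestor table, depths, and term names are toy data, and `deepest_common_ancestor` is a hypothetical helper rather than the goatools function.

```python
def deepest_common_ancestor(term_a, term_b, ancestors, depth):
    """Return the deepest shared ancestor, or None when the terms share none."""
    common = (ancestors[term_a] | {term_a}) & (ancestors[term_b] | {term_b})
    return max(common, key=lambda t: depth[t], default=None)

# Toy hierarchy with two unconnected namespaces.
ancestors = {
    "BP:child1": {"BP:root"},
    "BP:child2": {"BP:root"},
    "MF:child": {"MF:root"},
}
depth = {"BP:root": 0, "BP:child1": 1, "BP:child2": 1, "MF:root": 0, "MF:child": 1}

print(deepest_common_ancestor("BP:child1", "BP:child2", ancestors, depth))
print(deepest_common_ancestor("BP:child1", "MF:child", ancestors, depth))
```

A caller can then decide what a missing common ancestor means for its similarity measure, for example by returning None or 0 instead of raising.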

error using Python 3.5

Hi,

I'm just trying to use your tool (thanks for developing it!) but it seems there's an error in my environment. Is it a Python 3 compatibility issue?

Thanks so much!

python ~/anaconda/bin/find_enrichment.py --pval=0.01 --indent /Users/alomana/scratch/temporalFile.txt /Users/alomana/gDrive2/projects/TREES-C/PfuEGRIN/data/go/populationFile.txt /Users/alomana/gDrive2/projects/TREES-C/PfuEGRIN/data/go/associationFile.txt
Study: 4 vs. Population 1462
load obo file go-basic.obo
go-basic.obo: format-version(1.2) data-version(releases/2016-09-10)
47199 nodes imported
Traceback (most recent call last):
  File "/Users/alomana/anaconda/bin/find_enrichment.py", line 4, in <module>
    __import__('pkg_resources').run_script('goatools==0.6.5', 'find_enrichment.py')
  File "/Users/alomana/anaconda/lib/python3.5/site-packages/setuptools-23.0.0-py3.5.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/Users/alomana/anaconda/lib/python3.5/site-packages/setuptools-23.0.0-py3.5.egg/pkg_resources/__init__.py", line 1511, in run_script
  File "/Users/alomana/anaconda/lib/python3.5/site-packages/goatools-0.6.5-py3.5.egg/EGG-INFO/scripts/find_enrichment.py", line 137, in <module>
  File "/Users/alomana/anaconda/lib/python3.5/site-packages/goatools-0.6.5-py3.5.egg/goatools/go_enrichment.py", line 207, in __init__
TypeError: unsupported operand type(s) for >>: 'builtin_function_or_method' and '_io.TextIOWrapper'

Python 3.5.2 |Anaconda 4.1.1 (x86_64)| (default, Jul 2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
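The TypeError above is what Python 3 produces when it meets the Python 2 statement form `print >>stream, ...`: `print` is now a function, so `>>` is evaluated as a shift operator between that function and the file object. The sketch below reproduces the error and shows the Python 3 form. `report` and `py2_style_fails` are hypothetical helpers for illustration and are not taken from go_enrichment.py.

```python
import io
import sys

def report(message, stream=sys.stderr):
    """Python 3 way of writing to a chosen stream."""
    print(message, file=stream)

def py2_style_fails(stream):
    """Evaluate the Python 2 idiom under Python 3 and report whether it raises."""
    try:
        print >> stream   # parsed as: (built-in function print) >> stream
    except TypeError:
        return True
    return False

buf = io.StringIO()
report("47199 nodes imported", stream=buf)
print(buf.getvalue(), end="")
print(py2_style_fails(buf))
```

Code that must also run on Python 2 can put `from __future__ import print_function` at the top of the module and use the function form in both versions.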

Method of FDR

Could you please specify which method is used to calculate the False Discovery Rate? Is it Benjamini-Hochberg or Benjamini-Yekutieli?

Thank You
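For reference, the two procedures named in the question differ only by a constant: Benjamini-Yekutieli multiplies the Benjamini-Hochberg adjustment by c(m) = 1 + 1/2 + ... + 1/m, where m is the number of tests. The sketch below implements both as adjusted p-values. It is a standalone illustration that says nothing about which method goatools uses, and `adjust_pvalues` is a hypothetical function name.

```python
def adjust_pvalues(pvalues, method="bh"):
    """Return BH ('bh') or BY ('by') adjusted p-values in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    # BY correction factor: harmonic number of m; BH uses 1.
    c_m = sum(1.0 / k for k in range(1, m + 1)) if method == "by" else 1.0
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]
bh = adjust_pvalues(pvals, "bh")
by = adjust_pvalues(pvals, "by")
print(bh)
print(by)
```

Because c(m) is at least 1, the BY values are never smaller than the BH values, which makes BY the more conservative of the two.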

Print out all terms, regardless of alpha values

Hi,
First of all, thank you very much for the scripts. They are very useful and easy to use.
I was wondering if there is any way to get all GO terms analysed, regardless of alpha values.
That is, I would like to know:
"id enrichment description ratio_in_study ratio_in_pop p_uncorrected"
for all terms in my study, even if the values are not statistically significant.
I tried --alpha=1.0, but this option does not give me all terms.
I have some experience with Python, so I can try to modify the script if necessary, but I can't find where to do it.
Thank you very much.
Gabriel

Unable to install on Windows

Not really an issue with goatools, but I thought I'd put this here to draw attention to this issue:

brentp/fishers_exact_test#16

... which was preventing me from installing fisher (and as a result, goatools) in Windows.

Although I believe I have identified the problem, I'm not versed enough in C/Cython to submit a fix.

For anyone else having this issue, here is a temporary fix:

1. Clone the fisher repo locally and navigate to its root folder.
2. Check out commit e550127.
3. Run "pip install ." (that is, pip install followed by a period) to install the package.
4. Install goatools using pip install goatools.

I haven't verified that the fisher package at commit e550127 is stable, so you might run into issues doing enrichment tests. However, I just wanted a parser for the obo files, so this works for me.

Child Term Relationships

It seems that goatools is only considering children that have an "Is a" relationship with their parent, but not a "Part of". It seems to me, after reading the Ontology relations page, that "Part of" should be included.

B is necessarily part of A: wherever B exists, it is as part of A, and the presence of the B implies the presence of A

Is there any reason it is not?

What about the other relationships (regulate, has_part)? I don't think they should be included by default, but maybe an optional argument to get_all_children to include them?
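The optional argument suggested here could look roughly like the following sketch, which follows "is_a" edges by default and any extra relationship types on request. The graph layout, the function name `all_children`, and its `relationships` parameter are assumptions made for illustration; they are not the goatools data model or API.

```python
def all_children(term, children, relationships=()):
    """Collect descendants via 'is_a' plus any extra relationship types.

    children: {term: {relationship_type: set of direct child terms}}
    """
    rel_types = ("is_a",) + tuple(relationships)
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for rel in rel_types:
            for child in children.get(current, {}).get(rel, ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
    return found

# Toy graph: B is_a A, C part_of A, D is_a B, E is_a C.
children = {
    "A": {"is_a": {"B"}, "part_of": {"C"}},
    "B": {"is_a": {"D"}},
    "C": {"is_a": {"E"}},
}
print(sorted(all_children("A", children)))
print(sorted(all_children("A", children, relationships=("part_of",))))
```

Keeping "is_a" as the only default preserves the current behavior, while callers who want "part_of" (or other types) opt in explicitly.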

go_enrichment ValueError: Attempted relative import in non-package

python /Users/Genesis/Downloads/goatools-0.6.4/goatools/go_enrichment.py /Users/Genesis/Desktop/data/study.txt /Users/Genesis/Desktop/data/population.txt /Users/Genesis/Desktop/data/association.txt
Traceback (most recent call last):
  File "/Users/Genesis/Downloads/goatools-0.6.4/goatools/go_enrichment.py", line 22, in <module>
    from multiple_testing import Methods, Bonferroni, Sidak, HolmBonferroni, FDR, calc_qval
  File "/Users/Genesis/Downloads/goatools-0.6.4/goatools/multiple_testing.py", line 12, in <module>
    from .ratio import count_terms
ValueError: Attempted relative import in non-package

Purification with zero results

Hi all!

If I understand correctly, when a GO term is absent from the first sample of a comparison but abundant in the other, goatools does not count it as purified and does not print it in the results at all. I think it should instead be counted as strongly purified. Was this done intentionally, or is it a bug?

Is there an option to customize the output columns of wr_xlsx

Is it possible to customize the columns of the Excel tables generated by goeaobj.wr_xlsx?

for example I want to export columns:
"ratio_in_study", "ratio_in_pop"

Consider support for obographs json format

Note that we now have a proposed JSON representation of OBO that would obviate the need for special purpose parsers. Your comments as a developer would be most welcome:

https://github.com/geneontology/obographs/

See also this post describing motivation

https://douroucouli.wordpress.com/2016/10/04/a-developer-friendly-json-exchange-format-for-ontologies/

Python3 Support

Are there any plans on upgrading to Python3 ?
I ran everything through 2to3 and then manually fixed a couple things and got the obo_parser and mapslim to work correctly. I haven't tested everything else though. Is there any interest in migrating everything to py3?

When ungzipping, open file in binary mode

This fixes the download_ncbi_associations function under Python 3.

Currently the following error is produced:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-2afea7ecb634> in <module>()
      1 from goatools.base import download_ncbi_associations
----> 2 gene2go = download_ncbi_associations()

/home/lucas/Stack/programming/python/goatools/goatools/base.py in download_ncbi_associations(gene2go, prt)
    129     if not os.path.isfile(gene2go):
    130         wget.download("ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/{GZ}".format(GZ=gz))
--> 131         assert gunzip(gz) == gene2go
    132         if prt is not None:
    133             prt.write("\n  DOWNLOADED: {FILE}\n".format(FILE=gene2go))

/home/lucas/Stack/programming/python/goatools/goatools/base.py in gunzip(gz, file_gunzip)
    143     with gzip.open(gz, 'rb') as zstrm:
    144         with  open(file_gunzip, 'w') as ostrm:
--> 145             ostrm.write(zstrm.read())
    146     os.remove(gz)
    147     return file_gunzip

TypeError: write() argument must be str, not bytes
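The traceback shows the compressed stream being read as bytes ('rb') and then written to a file opened in text mode ('w'). A minimal sketch of the binary-mode fix named in the title follows. It mirrors the shape of the `gunzip` helper in the traceback but is a standalone rewrite rather than the patched goatools source, and the file name in the usage lines is just an example.

```python
import gzip
import os
import shutil
import tempfile

def gunzip(gz, file_gunzip=None):
    """Decompress gz next to itself, copying bytes to a file opened in binary mode."""
    if file_gunzip is None:
        file_gunzip = os.path.splitext(gz)[0]
    with gzip.open(gz, 'rb') as zstrm, open(file_gunzip, 'wb') as ostrm:
        shutil.copyfileobj(zstrm, ostrm)   # bytes in, bytes out
    os.remove(gz)
    return file_gunzip

# Usage with a throwaway file.
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "gene2go.gz")
with gzip.open(gz_path, 'wb') as fh:
    fh.write(b"#tax_id\tGeneID\tGO_ID\n")
out_path = gunzip(gz_path)
print(out_path)
```

Copying in chunks with shutil.copyfileobj also avoids holding the whole decompressed file in memory, which `zstrm.read()` would do.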

Does goatools work with every obo ontology and gaf annotations?

Hi,

I am looking for a tool that can be used to look for enrichment based on GAF and obo of any ontology. E.g. plant ontology PO, TO, EO...

Is this possible with goatools?

Best,
Daniel

TermsCount(Go,Association) not working for human genes - goa_human.gaf

It works perfectly when I use Arabidopsis thaliana, but when I try to use any Homo sapiens genes I get an error while computing TermCounts.

Here is the traceback:

Traceback (most recent call last):
  File "/Users/zeeshannawaz/wp-tomoe-angiogenesis/src/angen/embed/preglove/go/gene_ontology_wrapper.py", line 88, in <module>
    termcounts = TermCounts(go, associations)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goatools/semantic.py", line 31, in __init__
    self._count_terms(godag, annots)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/goatools/semantic.py", line 42, in _count_terms
    allterms |= godag[go_id].get_all_parents()
KeyError: 'GO:0102756'
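The KeyError means the annotation file references a GO ID that the loaded ontology does not contain. One possible cause, stated here only as an assumption, is that the annotation file and the .obo file come from different releases. The sketch below counts terms while collecting such IDs instead of failing. It uses plain dictionaries as toy data, and `count_terms_safely` is a hypothetical function, not goatools's TermCounts.

```python
from collections import Counter

def count_terms_safely(all_parents, annotations):
    """Count genes per GO term, skipping GO IDs missing from the ontology.

    all_parents: {go_id: set of all ancestor GO IDs}
    annotations: {gene_id: set of annotated GO IDs}
    Returns (counts, missing_ids).
    """
    counts = Counter()
    missing = set()
    for go_ids in annotations.values():
        allterms = set()
        for go_id in go_ids:
            if go_id not in all_parents:
                missing.add(go_id)      # report later instead of raising KeyError
                continue
            allterms.add(go_id)
            allterms |= all_parents[go_id]
        for term in allterms:
            counts[term] += 1
    return counts, missing

# Toy ontology and annotations; GO:0102756 is the ID from the traceback.
all_parents = {"GO:0003824": set(), "GO:0016740": {"GO:0003824"}}
annotations = {
    "gene1": {"GO:0016740"},
    "gene2": {"GO:0003824", "GO:0102756"},
}
counts, missing = count_terms_safely(all_parents, annotations)
print(dict(counts), missing)
```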

Can't `pip install goatools` on version 0.6.9

Can't pip install goatools on version 0.6.9 in new virtualenv. pip install goatools==0.6.5 works fine.

(venv) jeff$ pip install goatools
Collecting goatools
  Using cached goatools-0.6.9.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/jw/23sjzsn97lz8qj3kp4bg4pw00000gn/T/pip-build-tHhzLf/goatools/setup.py", line 19, in <module>
        open('requirements.txt').readlines()]
    IOError: [Errno 2] No such file or directory: 'requirements.txt'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/jw/23sjzsn97lz8qj3kp4bg4pw00000gn/T/pip-build-tHhzLf/goatools/

(venv) jeff$ pip install goatools==0.6.5
Collecting goatools==0.6.5
Collecting fisher (from goatools==0.6.5)
  Using cached fisher-0.1.4.tar.gz
Collecting xlsxwriter (from goatools==0.6.5)
  Using cached XlsxWriter-0.9.3-py2.py3-none-any.whl
Collecting wget (from goatools==0.6.5)
Collecting statsmodels (from goatools==0.6.5)
  Using cached statsmodels-0.6.1-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Building wheels for collected packages: fisher
  Running setup.py bdist_wheel for fisher ... done
  Stored in directory: /Users/jeff/Library/Caches/pip/wheels/94/70/27/f3e07047ba539a9c8a24c5738dad745c6fe3f1d76aa714ed83
Successfully built fisher
Installing collected packages: fisher, xlsxwriter, wget, statsmodels, goatools
Successfully installed fisher-0.1.4 goatools-0.6.5 statsmodels-0.6.1 wget-3.2 xlsxwriter-0.9.3

How to generate files goatools / data /

Hello,
Could you please provide some examples of how the files in goatools / data / were generated?

Thank you in advance.

"LINK : error LNK2001: unresolved external symbol PyInit_fisher/cfisher"

Hi All,

I ran into a problem when I tried to install goatools on a Windows 7 system. I am looking forward to your suggestions!

The python version is 3.4 (32bit). I have installed VS2010.

The error shows:

Searching for goatools
Best match: goatools 0.5.5
Processing goatools-0.5.5-py3.4.egg
goatools 0.5.5 is already the active version in easy-install.pth
Installing map_to_slim.py script to C:\Python34\Scripts
Installing plot_go_term.py script to C:\Python34\Scripts
Installing write_hierarchy.py script to C:\Python34\Scripts
Installing find_enrichment.py script to C:\Python34\Scripts

Using c:\python34\lib\site-packages\goatools-0.5.5-py3.4.egg
Processing dependencies for goatools
Searching for fisher
Reading https://pypi.python.org/simple/fisher/
Best match: fisher 0.1.4
Downloading https://pypi.python.org/packages/source/f/fisher/fisher-0.1.4.tar.gz#md5=bfc763b7333a1f428e4c447dd8a85968
Processing fisher-0.1.4.tar.gz
Writing C:\Users\Duan\AppData\Local\Temp\easy_install-khjvz9br\fisher-0.1.4\setup.cfg
Running fisher-0.1.4\setup.py -q bdist_egg --dist-dir C:\Users\Duan\AppData\Local\Temp\easy_install-khjvz9br\fisher-0.1.4\egg-dist-tmp-xgn4zhdy
cfisher.c
c:\python34\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(12) : Warning Msg: Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
LINK : error LNK2001: unresolved external symbol PyInit_fisher/cfisher
build\temp.win32-3.4\Release\src\cfisher.lib : fatal error LNK1120: 1 unresolved externals

Thanks!

pygraphviz.Agraph().name returns AttributeError

Thanks for writing this handy module!
I ran into a bit of trouble when I tried to generate a GML file. Line 306 in obo_parser.py throws an AttributeError when the "gml" option in the draw_lineage() function is set to True. It appears that the name must be set when the graph is initialized. Deleting line 306 and changing line 269 to:

G = pgv.AGraph(name="GO tree")

seemed to fix the problem

Error printing GOTerm

If I try to read an obo file with OBOReader, printing GO terms is generating an error.

import goatools
reader = goatools.obo_parser.OBOReader('go-basic.obo')
go_terms = list(reader)
go_terms[0]
TypeError                                 Traceback (most recent call last)
C:\Anaconda\lib\site-packages\IPython\core\formatters.pyc in __call__(self, obj)
    697                 type_pprinters=self.type_printers,
    698                 deferred_pprinters=self.deferred_printers)
--> 699             printer.pretty(obj)
    700             printer.flush()
    701             return stream.getvalue()

C:\Anaconda\lib\site-packages\IPython\lib\pretty.pyc in pretty(self, obj)
    381                             if callable(meth):
    382                                 return meth(obj, self, cycle)
--> 383             return _default_pprint(obj, self, cycle)
    384         finally:
    385             self.end_group()

C:\Anaconda\lib\site-packages\IPython\lib\pretty.pyc in _default_pprint(obj, p, cycle)
    501     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    502         # A user-provided repr. Find newlines and replace them with p.break_()
--> 503         _repr_pprint(obj, p, cycle)
    504         return
    505     p.begin_group(1, '<')

C:\Anaconda\lib\site-packages\IPython\lib\pretty.pyc in _repr_pprint(obj, p, cycle)
    692     """A pprint that just redirects to the normal repr function."""
    693     # Find newlines and replace them with p.break_()
--> 694     output = repr(obj)
    695     for idx,output_line in enumerate(output.splitlines()):
    696         if idx:

C:\Anaconda\lib\site-packages\goatools\obo_parser.pyc in __repr__(self)
    207                 ret.append("{K}:{V}".format(K=key, V=val))
    208             else:
--> 209                 ret.append("{K}: {V} items".format(K=key, V=len(val)))
    210                 if len(val) < 10:
    211                     for elem in val:

TypeError: object of type 'NoneType' has no len()

Generating association file

Hi,

I am trying to run GO term enrichment analysis with my own background set for humans, mice, and yeast. I am assuming that I could use "find_enrichment.py" and include a text file for my query genes (or proteins) and a background set, correct? However, I am curious as to how I could generate an association file. Have you used any UniProt database to generate the association file?

Any help is appreciated.

Thanks

Rav

part_of relationship

Hi,
Can I discover "part_of" relationships between different terms, or does your program build the children field only from "is_a" relationships?

Comparison studies and fisher.pvalue_population

fisher.pvalue_population is defined like this:

# k, n = study_true, study_tot,
# C, G = population_true, population_tot
def pvalue_population(int k, int n, int C, int G):
    #print "k=%i, n=%i, C=%i, G=%i" % (k, n, C, G)
    return pvalue(k, n - k, C - k, G - C - n + k)

This suggests that population_true and population_tot must include the respective counts from study_true and study_tot, since those counts will be subtracted.

However, in find_enrichment.read_geneset(), you explicitly make sure the population does not include the common terms when "comparing":

    if compare:
        ...
        pop -= common

Should the study set actually be added to the comparison set instead? Alternatively, should another function besides fisher.pvalue_population be used?
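The concern can be made concrete by writing out the four cells that `pvalue_population` passes to `pvalue`. The sketch below only reproduces that arithmetic in plain Python. `contingency_cells` is a hypothetical helper, the first set of counts is taken from a results row quoted elsewhere on this page (106/276 and 7773/33239), and the comparison-mode counts are invented.

```python
def contingency_cells(k, n, C, G):
    """Cells passed to fisher's pvalue(): same arithmetic as pvalue_population.

    k, n = study_true, study_tot; C, G = population_true, population_tot
    """
    return (k, n - k, C - k, G - C - n + k)

# Population includes the study: the cells are non-negative and sum to G.
nested = contingency_cells(106, 276, 7773, 33239)

# Hypothetical comparison of two disjoint sets of 10 genes each:
# 8 of 10 study genes have the term, 1 of 10 comparison genes has it.
disjoint = contingency_cells(8, 10, 1, 10)

# Same data with the study added to the comparison set, as the issue proposes.
merged = contingency_cells(8, 10, 1 + 8, 10 + 10)

print(nested, disjoint, merged)
```

Because the four cells always add up to G, passing a population that excludes the study genes yields a table that does not describe the data and can contain a negative cell, as the disjoint example shows.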

Get infos about not enriched GO terms

Dear all,

Is there a way to get the infos about all the GO terms associated with the study items of the enrichment analysis, and not only the significantly enriched/purified GO terms?

Thank you very much,
and best wishes,
Isabel

restructure goatools

Rename genemerge.py to something intuitive.
Include (and use by default) files for the obo (gene_ontology.obo), population (accns), and description (description), so the user only has to specify a study file; the default will be to output to stdout.
Put everything in a module structure with __init__.py and use setup.py to install the script interface.

extract fischer.py into a separate module

Since there's no Fisher's exact test in scipy or on PyPI, we should put one on PyPI.
Before doing that we should have:

A function/method that works with numpy (so it can accept numpy arrays and return an array of p-values). This same function can also work with simple Python integers, with an import at the top like:

    try:
        # Prefer numpy so the function can operate on whole arrays
        from numpy import log
        ffloat = lambda a: a.astype('f')
    except ImportError:
        # Fall back to plain Python numbers
        from math import log
        ffloat = lambda a: float(a)

A Cython function (or just keep the SWIG version) for speed.
These will be available as fisher.NumpyFisher and fisher.CFisher,
and fischer.Fischer will default to one of them ???

find_enrichment.py results

Hi,
After running find_enrichment.py I just wonder whether it is possible to know which Gene ID has been used for each line of the below results?

id enrichment description ratio_in_study ratio_in_pop p_uncorrected p_bonferroni p_holm p_sidak p_fdr
.GO:0003824 e catalytic activity 106/276 7781/33239 2.64e-08 2.05e-05 2.05e-05 2e-05 0
..GO:0016740 e transferase activity 45/276 2713/33239 7.46e-06 0.00578 0.00578 0.00564 0.004

Thank you in advance
...

missing GO terms in GAF file

Hi,

@dvklopfenstein and I had a conversation in issue #79 about discrepancies in GAF files between goa_human.gaf.gz found on the Gene Ontology site, and the one used for Enrichr.

I looked into using GOrilla instead, and they seem to have a much more complete set of GO enrichments than the one taken from Gene Ontology site. For instance, none of my top 3 GO terms found on GOrilla exist: "GO:0010604", "GO:0048522", "GO:0009893".

I'm using your suggested code to load the GAF file:

from goatools.associations import read_gaf

# Import GO annotations from http://geneontology.org/gene-associations/goa_human.gaf.gz
goatools_annotations = read_gaf("../data/goa_human.gaf")

Any ideas where I can find the most complete annotation file that is human specific?

Thanks,
Johnny

Conda package

Hello,

I recently developed a conda package for goatools.
More info.

If you want, you can add the following badge:

Bérénice

Replace print statements with flexible logging

Excessive output from print statements in GODag crashes my Jupyter/IPython notebooks. I've attached a screenshot of my screen filled with bright red output to stderr. My workaround is the Jupyter magic %%capture, but this also suppresses bona fide warnings.

This issue extends to command-line usage and calls from Python scripts.
A logger would solve these problems and allow flexible logging to a file or stream, which may be useful in some applications.
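A minimal sketch of the idea using the standard library's logging module. The logger name and the `report_load` function are hypothetical; the two messages are modeled on the loader output quoted elsewhere on this page.

```python
import io
import logging

logger = logging.getLogger("godag_sketch")   # hypothetical logger name
logger.propagate = False                      # keep the demo independent of root handlers

def report_load(obo_file, n_nodes):
    """Emit progress messages through a logger instead of print statements."""
    logger.info("load obo file %s", obo_file)
    logger.info("%d nodes imported", n_nodes)

def capture(level):
    """Run report_load with the logger set to `level` and return what was written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        report_load("gene_ontology.1_2.obo", 41085)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()

verbose = capture(logging.INFO)      # progress messages are shown
quiet = capture(logging.WARNING)     # progress is hidden, warnings would still pass
print(verbose, end="")
```

With this arrangement a notebook user can raise the level to WARNING to silence progress output without losing genuine warnings, or attach a FileHandler to send everything to a file.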
