
pachterlab / gget

860 stars, 11 watchers, 64 forks, 258.9 MB

🧬 gget enables efficient querying of genomic reference databases

Home Page: https://gget.bio

License: BSD 2-Clause "Simplified" License

Python 100.00%
alphafold2 databases enrichment-analysis ensembl gget ncbi reference uniprot alphafold archs4

gget's Introduction

gget


gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.


If you use gget in a publication, please cite:

Luebbert, L., & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836

Read the article here: https://doi.org/10.1093/bioinformatics/btac836

Installation

pip install --upgrade gget

Alternative:

conda install -c bioconda gget

For use in Jupyter Lab / Google Colab:

import gget

🪄 Quick start guide

Command line:

# Fetch all Homo sapiens reference and annotation FTPs from the latest Ensembl release
$ gget ref homo_sapiens

# Get Ensembl IDs of human genes with "ace2" or "angiotensin converting enzyme 2" in their name/description
$ gget search -s homo_sapiens 'ace2' 'angiotensin converting enzyme 2'

# Look up gene ENSG00000130234 (ACE2) and its transcript ENST00000252519
$ gget info ENSG00000130234 ENST00000252519

# Fetch the amino acid sequence of the canonical transcript of gene ENSG00000130234
$ gget seq --translate ENSG00000130234

# Quickly find the genomic location of (the start of) that amino acid sequence
$ gget blat MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# BLAST (the start of) that amino acid sequence
$ gget blast MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# Align multiple nucleotide or amino acid sequences against each other (also accepts path to FASTA file)  
$ gget muscle MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS

# Align one or more amino acid sequences against a reference containing one or more sequences (local BLAST; also accepts paths to FASTA files)
$ gget diamond MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS -ref MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS  

# Use Enrichr for an ontology analysis of a list of genes
$ gget enrichr -db ontology ACE2 AGT AGTR1 ACE AGTRAP AGTR2 ACE3P

# Get the human tissue expression of gene ACE2
$ gget archs4 -w tissue ACE2

# Get the protein structure (in PDB format) of ACE2 as stored in the Protein Data Bank (PDB ID returned by gget info)
$ gget pdb 1R42 -o 1R42.pdb

# Find Eukaryotic Linear Motifs (ELMs) in a protein sequence
$ gget setup elm # setup only needs to be run once
$ gget elm -o results MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# Fetch a scRNAseq count matrix (AnnData format) based on specified gene(s), tissue(s), and cell type(s) (default species: human)
$ gget setup cellxgene # setup only needs to be run once
$ gget cellxgene --gene ACE2 SLC5A1 --tissue lung --cell_type 'mucus secreting cell' -o example_adata.h5ad

# Predict the protein structure of GFP from its amino acid sequence
$ gget setup alphafold # setup only needs to be run once
$ gget alphafold MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

Python (Jupyter Lab / Google Colab):

import gget
gget.ref("homo_sapiens")
gget.search(["ace2", "angiotensin converting enzyme 2"], "homo_sapiens")
gget.info(["ENSG00000130234", "ENST00000252519"])
gget.seq("ENSG00000130234", translate=True)
gget.blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget.blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget.muscle(["MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS", "MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS"])
gget.diamond("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS", reference="MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS")
gget.enrichr(["ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"], database="ontology", plot=True)
gget.archs4("ACE2", which="tissue")
gget.pdb("1R42", save=True)

gget.setup("elm") # setup only needs to be run once
ortho_df, regex_df = gget.elm("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")

gget.setup("cellxgene") # setup only needs to be run once
gget.cellxgene(gene=["ACE2", "SLC5A1"], tissue="lung", cell_type="mucus secreting cell")

gget.setup("alphafold") # setup only needs to be run once
gget.alphafold("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK")

Call gget from R using reticulate:

system("pip install gget")
install.packages("reticulate")
library(reticulate)
gget <- import("gget")

gget$ref("homo_sapiens")
gget$search(list("ace2", "angiotensin converting enzyme 2"), "homo_sapiens")
gget$info(list("ENSG00000130234", "ENST00000252519"))
gget$seq("ENSG00000130234", translate=TRUE)
gget$blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget$blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget$muscle(list("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS", "MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS"), out="out.afa")
gget$diamond("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS", reference="MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS")
gget$enrichr(list("ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"), database="ontology")
gget$archs4("ACE2", which="tissue")
gget$pdb("1R42", save=TRUE)

gget's People

Contributors

anhchi172, aubakirovarman, cbrueffer, dylanlawless, lakigigar, lauraluebbert, mboffelli, nh13, noriakis, tdido, vecerkovakaterina, victorg775


gget's Issues

pdb module

I love the new alphafold feature! Could there also be a gget pdb command for fetching structures from PDB? Combined with gget blast -db pdbaa this could be very powerful for comparing predictions and templates.

Ability to match parameters of BLAST as per the web app

Hello,
This request is to make the BLAST parameters match those of the web app exactly (or to make it possible to pass them to the code as extra arguments).

This is an example of the parameters sent by the code today:

Program: blastn
Word size: 28
Expect value: 10
Hitlist size: 100
Match/Mismatch scores: 1,-2
Gap costs: 0,2.5
Low complexity filter: Yes
Filter string: L;m;
Genetic code: 1

and these are the parameters when BLAST is run directly on the web:

Program: blastn
Word size: 11
Expect value: 0.05
Hitlist size: 100
Match/Mismatch scores: 2,-3
Gap costs: 5,2
Low complexity filter: Yes
Filter string: L;m;
Genetic code: 1

Thank you very much
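For reference, the values listed above as sent by gget (word size 28, match/mismatch scores 1,-2) are BLAST's megablast defaults, while the web-app values (word size 11, scores 2,-3, gap costs 5,2) are the defaults of the standard blastn task. The sketch below builds the body of an NCBI BLAST URL API `CMD=Put` request carrying the web-app settings. It does not use gget and sends nothing over the network; the helper name `build_put_query` and the `nt` default database are illustrative assumptions, and the parameter names follow the NCBI BLAST Common URL API.

```python
from urllib.parse import urlencode

# Web-app blastn settings listed in this issue. Word size, expect value,
# match/mismatch scores and gap costs are the ones that differ from gget's list.
WEB_BLASTN_SETTINGS = {
    "PROGRAM": "blastn",
    "WORD_SIZE": 11,
    "EXPECT": 0.05,
    "HITLIST_SIZE": 100,
    "NUCL_REWARD": 2,
    "NUCL_PENALTY": -3,
    "GAPCOSTS": "5 2",  # gap existence and extension costs, space-separated
}


def build_put_query(sequence, database="nt", **overrides):
    """Return the URL-encoded body of a BLAST URL API CMD=Put request.

    No MEGABLAST parameter is sent, so megablast is not requested.
    Keyword overrides (e.g. EXPECT=10) replace the web-app values.
    """
    params = {"CMD": "Put", "QUERY": sequence, "DATABASE": database}
    params.update(WEB_BLASTN_SETTINGS)
    params.update(overrides)
    return urlencode(params)


if __name__ == "__main__":
    print(build_put_query("ACGTACGTACGT"))
```

Keeping the settings in one dictionary makes it easy to expose them as keyword arguments, which is essentially what the request asks for.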

Jupyter Notebook Kernel Dies When Using gget alphafold

I am able to use every gget module except the AlphaFold module. Whenever I run AlphaFold, the Jupyter Notebook kernel dies almost immediately. Does this happen to others? Any recommendations are appreciated.

# Generate a new prediction from an amino acid sequence
import gget
gget.setup("alphafold")
gget.alphafold("MAAHKGAEH")

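If the kernel dies because the process running the prediction is killed (for example by the operating system when memory runs out, which is an assumption here, not a diagnosis), running the command-line version in a child process keeps the notebook alive and preserves the exit status and error output for inspection. A minimal sketch using only the standard library; `run_isolated` is a hypothetical helper name.

```python
import subprocess
import sys


def run_isolated(cmd, timeout=None):
    """Run cmd in a child process and return the CompletedProcess.

    If the child crashes, only the child dies; the notebook kernel keeps running.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # On POSIX a negative code means the child was killed by a signal,
        # e.g. -9 (SIGKILL), which is what the Linux out-of-memory killer sends.
        print(f"command exited with status {result.returncode}", file=sys.stderr)
    return result


# Usage (requires gget and a completed `gget setup alphafold`):
# result = run_isolated(["gget", "alphafold", "MAAHKGAEH"])
# print(result.stderr[-2000:])
```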

Download only exons sequence

Request type

Extension of existing module

Request description

Dear Sir/Madame,
It would be very helpful to be able to download only the exons when requesting the sequences of all mRNA isoforms or of a particular isoform. Currently, gget downloads them together with the introns, unfortunately.
Best regards,
Iriną Tuszynska.

Example command

gget seq -iso -exons

Example return value

isoforms in fasta format without introns
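For context, the Ensembl REST API's sequence endpoint already distinguishes these cases: `type=genomic` returns the unspliced region, while `type=cdna` returns the spliced transcript (exons only). The sketch below shows the underlying operation locally: given a genomic sequence and exon coordinates, it concatenates the exon slices and, for minus-strand transcripts, takes the reverse complement. The function name and the coordinate convention (1-based, inclusive, relative to the start of the supplied sequence) are assumptions for illustration.

```python
# Complement table for nucleotide sequences (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def splice_exons(genomic, exons, strand=1):
    """Concatenate exon slices of a forward-strand genomic sequence.

    genomic: forward-strand sequence; position 1 is its first base.
    exons:   (start, end) pairs, 1-based and inclusive, in forward-strand coordinates.
    strand:  1 for plus-strand transcripts, -1 for minus-strand transcripts.
    """
    # Sort by position so exons are joined in genomic order.
    spliced = "".join(genomic[start - 1:end] for start, end in sorted(exons))
    if strand == -1:
        # Minus-strand transcripts are read on the reverse complement.
        spliced = spliced.translate(COMPLEMENT)[::-1]
    return spliced


if __name__ == "__main__":
    # Two exons of a toy 12-base region; the middle 3 bases act as an intron.
    print(splice_exons("AAACCCGGGTTT", [(1, 3), (7, 9)]))
```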

A couple CLI tweaks

Hi @lauraluebbert ,

I noticed that with gget 0.2.0, gget --help exits with an error code. Also, gget -v and gget --version don't seem to be working as expected.

Best,
Mike
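For comparison, Python's standard `argparse` module gives the expected behaviour with little code: `--help` prints usage and exits with status 0, and an argument declared with `action="version"` prints the version and also exits with status 0. The sketch below is a stand-alone illustration, not gget's actual parser; the program name `gget-sketch` and the single `ref` subcommand are placeholders.

```python
import argparse


def build_parser(version="0.0.0"):
    """Build a small CLI parser with working -v/--version and --help flags."""
    parser = argparse.ArgumentParser(prog="gget-sketch")
    # action="version" prints "<prog> <version>" and exits with status 0.
    parser.add_argument("-v", "--version", action="version", version=f"%(prog)s {version}")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("ref", help="placeholder subcommand")
    return parser


def exit_status(argv):
    """Return the status argparse exits with for argv, or None if parsing succeeds."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return None
```

Checking `exit_status(["--help"]) == 0` in a test suite is a cheap way to catch regressions like the one reported here.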

AttributeError: module 'psutil' has no attribute 'Process'

hello,
I installed gget using the command:
pip install --upgrade gget
Then, when I tried to set up AlphaFold using the command:
gget setup alphafold
I got the following error:

gget setup alphafold
Traceback (most recent call last):
  File "/home/najib/miniconda3/bin/gget", line 5, in <module>
    from gget.main import main
  File "/home/najib/miniconda3/lib/python3.9/site-packages/gget/__init__.py", line 10, in <module>
    from .gget_alphafold import alphafold
  File "/home/najib/miniconda3/lib/python3.9/site-packages/gget/gget_alphafold.py", line 31, in <module>
    from ipywidgets import GridspecLayout
  File "/home/najib/miniconda3/lib/python3.9/site-packages/ipywidgets/__init__.py", line 25, in <module>
    from .widgets import *
  File "/home/najib/miniconda3/lib/python3.9/site-packages/ipywidgets/widgets/__init__.py", line 4, in <module>
    from .widget import Widget, CallbackDispatcher, register, widget_serialization
  File "/home/najib/miniconda3/lib/python3.9/site-packages/ipywidgets/widgets/widget.py", line 13, in <module>
    from ipykernel.comm import Comm
  File "/home/najib/miniconda3/lib/python3.9/site-packages/ipykernel/comm/__init__.py", line 3, in <module>
    from .comm import Comm
  File "/home/najib/miniconda3/lib/python3.9/site-packages/ipykernel/comm/comm.py", line 14, in <module>
    from ipykernel.kernelbase import Kernel
  File "/home/najib/miniconda3/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 70, in <module>
    class Kernel(SingletonConfigurable):
  File "/home/najib/miniconda3/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 79, in Kernel
    processes: t.Dict[str, psutil.Process] = {}
AttributeError: module 'psutil' has no attribute 'Process'

I uninstalled psutil and installed it again, but I got the same error.
What is a possible solution?
Thank you.
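One frequent cause of an `AttributeError` like this, for a module that normally defines the attribute, is that Python imports a different file under the same name, such as a stray `psutil.py` in the working directory or a leftover from a partial install. That is an assumption, not a diagnosis of this report, but it is quick to check with the standard library: the sketch below reports which file a top-level module would be loaded from, without importing it. `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the path a top-level module would be loaded from, or None if not found.

    find_spec locates the module without executing it, so this works even
    when importing the module itself raises an error.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("psutil would be loaded from:", module_origin("psutil"))
```

If the printed path is not inside the environment's `site-packages` directory, the file at that path is the likely culprit.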

expand gget search to include synonym hits in addition to name and description hits

What happened?

Hi,
I am searching for the gene "WISP2" using gget with the following command:
gget search -s homo_sapiens "WISP2"
which returns the following result:
Thu Jun 29 14:53:00 2023 INFO Fetching results from database: homo_sapiens_core_109_38
Thu Jun 29 14:53:02 2023 INFO Total matches found: 1.
Thu Jun 29 14:53:02 2023 INFO Query time: 5.15 seconds.
[
  {
    "ensembl_id": "ENSG00000244558",
    "gene_name": "KCNK15-AS1",
    "ensembl_description": "KCNK15 and WISP2 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:49901]",
    "ext_ref_description": "KCNK15 and WISP2 antisense RNA 1",
    "biotype": "lncRNA",
    "url": "https://useast.ensembl.org/homo_sapiens/Gene/Summary?g=ENSG00000244558"
  }
]

But I would expect the gene with Ensembl ID "ENSG00000064205", which has the symbol "CCN5" but lists "WISP2" as a synonym.

Apparently gget matches the search term in the "description" field, but one could argue that a match in the "Gene Synonyms" field should be weighted higher.

gget version

0.27.7

Operating System (OS)

Linux

User interface

Command-line

Are you using a computer with an Apple M1 chip?

Not M1

What is the exact command that was run?

gget search -s homo_sapiens "WISP2"

Which output/error did you get?

Output for the gene "ENSG00000244558" with symbol "KCNK15-AS1" but I would expect the gene "ENSG00000064205" with symbol "CCN5".
The reason is that the gene symbol "WISP2" (search term) is a gene synonym for "CCN5".
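The requested behaviour amounts to ranking matches by where the term was found. A minimal sketch, assuming each gene record carries a list of synonyms; the `GeneRecord` class and the rank order (exact name, then synonym, then description) are illustrative choices, not gget's data model. The two example records use only the identifiers and description quoted in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    ensembl_id: str
    gene_name: str
    description: str = ""
    synonyms: list = field(default_factory=list)


def match_rank(record, term):
    """Return 0 for a gene-name hit, 1 for a synonym hit, 2 for a description hit, else None."""
    term = term.lower()
    if record.gene_name.lower() == term:
        return 0
    if any(synonym.lower() == term for synonym in record.synonyms):
        return 1
    if term in record.description.lower():
        return 2
    return None


def ranked_search(records, term):
    """Return matching records, best-ranked first (ties keep input order)."""
    hits = [(match_rank(record, term), record) for record in records]
    hits = [(rank, record) for rank, record in hits if rank is not None]
    hits.sort(key=lambda pair: pair[0])
    return [record for _, record in hits]


# Identifiers and description as quoted in the issue above.
EXAMPLE = [
    GeneRecord("ENSG00000244558", "KCNK15-AS1", "KCNK15 and WISP2 antisense RNA 1"),
    GeneRecord("ENSG00000064205", "CCN5", synonyms=["WISP2"]),
]
```

With these records, searching for "WISP2" returns CCN5 first and the antisense RNA second.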

Option to BLAST one protein sequence against another

Thank you for the very cool and important package! This will save me hours and hours of computational work.

I was wondering if you can add an option to BLAST two protein sequences against each other and get their e-value etc. I have a list of proteins that I want to compare to each other. If you'd prefer to point me to how I can make this feature and do a pull request, I'm more than happy to do so too!
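The quick start above shows `gget diamond` aligning query sequences against a reference of one or more sequences, and DIAMOND reports an e-value for each hit. One way to compare every protein in a list with every other, sketched here under the assumption that both arguments accept a FASTA path (as the quick-start comment states for the command line), is to write all proteins to one FASTA file, pass it as both query and reference, and then discard self-hits. The `to_fasta` helper and the file name `proteins.fa` are illustrative.

```python
from pathlib import Path


def to_fasta(records, width=60):
    """Format a {name: sequence} mapping as FASTA text, wrapping lines at `width`."""
    lines = []
    for name, sequence in records.items():
        lines.append(f">{name}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    proteins = {
        "seq1": "MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS",
        "seq2": "MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS",
    }
    Path("proteins.fa").write_text(to_fasta(proteins))
    # Then, for example: gget diamond proteins.fa -ref proteins.fa
```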

Multiple sequence alignment for multiple species

What happened?

We want to use gget.muscle for multiple sequence alignments across multiple species. Because the Ensembl IDs/gene names of these species differ in length, the aligned sequences start at different positions in the printed output.

gget version

I import gget into Google Colab each time I run my analysis. In this example the offset is minimal; in other cases it can be more severe, depending on the species.

gget.muscle.docx

Operating System (OS)

Windows, Other (please specify above)

User interface

Google Colab (please include a shareable link above)

Are you using a computer with an Apple M1 chip?

Not M1

What is the exact command that was run?

!pip install --upgrade gget
import gget
gget.search
gget.seq
gget.muscle

Which output/error did you get?

No response
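When an alignment is printed as a name followed by its sequence, names of different lengths shift the sequences. Padding every name to the width of the longest one makes the aligned columns line up in a monospaced font. A minimal sketch; `format_alignment` is a hypothetical helper that takes an already aligned `{name: sequence}` mapping and is not part of gget.

```python
def format_alignment(aligned, gap=2):
    """Return one line per sequence, with names padded to a common width."""
    width = max(len(name) for name in aligned) + gap
    return "\n".join(f"{name.ljust(width)}{sequence}" for name, sequence in aligned.items())


if __name__ == "__main__":
    print(format_alignment({
        "short_id": "MSSSSWLLLSLVAVTAAQ",
        "a_much_longer_gene_name": "MSSSSWLLLSLVEVTAAQ",
    }))
```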

Error running alphafold

Hi, I am running gget version 0.3.7. When I run an AlphaFold prediction, I get this error:
gget alphafold AASEQUENCE
/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/haiku/_src/data_structures.py:37: FutureWarning: jax.tree_structure is deprecated, and will be removed in a future release. Use jax.tree_util.tree_structure instead.
PyTreeDef = type(jax.tree_structure(None))
Fri Aug 12 20:12:30 2022 INFO Validating input sequence(s).
Using the single-chain model.
Fri Aug 12 20:12:30 2022 INFO Finding closest source for reference database.
Jackhmmer search: 5%|██▉ | 7/147 [elapsed: 11:32 remaining: 3:50:48]
Traceback (most recent call last):
  File "/home/ccadmin/anaconda3/envs/gget/bin/gget", line 8, in <module>
    sys.exit(main())
  File "/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/gget/main.py", line 1439, in main
    alphafold(
  File "/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/gget/gget_alphafold.py", line 467, in alphafold
    raw_msa_results = get_msa(
  File "/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/gget/gget_alphafold.py", line 147, in get_msa
    raw_msa_results[db_name].extend(jackhmmer_runner.query(fasta_path))
  File "/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/alphafold/data/tools/jackhmmer.py", line 205, in query
    os.remove(db_local_chunk(i))
FileNotFoundError: [Errno 2] No such file or directory: '/home/ccadmin/tmp/jackhmmer/fcb45c67-8b27-4156-bbd8-9d11512babf2/uniref90_2021_03.fasta.8'
Any idea how to fix this?

gget pdb error in python

What happened?

Hi,
when I used the command gget pdb 1R42 -o 1R42.pdb, I received the error TypeError: pdb() got an unexpected keyword argument 'verbose' (full traceback below).

gget version

0.27.7

Operating System (OS)

Linux

User interface

Command-line

Are you using a computer with an Apple M1 chip?

Not M1

What is the exact command that was run?

gget pdb 1R42 -o 1R42.pdb

Which output/error did you get?

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/gmx_plumed/bin/gget", line 8, in <module>
    sys.exit(main())
  File "/home/xxx/anaconda3/envs/gmx_plumed/lib/python3.8/site-packages/gget/main.py", line 2102, in main
    pdb_results = pdb(
TypeError: pdb() got an unexpected keyword argument 'verbose'

[Errno 104] Connection reset by peer

Running:
gget.ref("mouse")

I got this error:
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
ProtocolError Traceback (most recent call last)
ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
496
497 except (ProtocolError, socket.error) as err:
--> 498 raise ConnectionError(err, request=request)
499
500 except MaxRetryError as e:

ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
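`Connection reset by peer` often signals a transient network or server-side interruption rather than a problem in the calling code, so retrying after a short pause is a reasonable first response. A minimal sketch, assuming the failure is transient; `with_retries` is a hypothetical helper. It catches `OSError` because both the built-in `ConnectionResetError` and the `requests` library's `ConnectionError` (through `RequestException`, which subclasses `IOError`) derive from it.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, exceptions=(OSError,)):
    """Call func() and retry on the given exceptions with exponential backoff.

    The last exception is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            # Wait base_delay, then twice that, then four times, and so on.
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage (requires gget): with_retries(lambda: gget.ref("mouse"))
```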

Error detecting openmm

Hi, as the title says, I tried to run this and installed all the dependencies. But, still, somehow it doesn't detect openmm. Can this be resolved? A screenshot is attached. Thanks.


Frozen terminal

Dear developers,

I found a minor issue running gget alphafold on macOS (11.6.5). My terminal was blocked after "Jackhmmer search" finished. It was unblocked by closing the window showing the plot of "Non-Gap Count" vs "Per-Residue Count Of Non-Gap Amino Acids in the MSA For Sequence 1". Then it proceeded with "Running model_2".

Best,

Lorenzo

Speeding up gget.info by making the PDB search optional

Request type

Extension of existing module

Request description

Dear Devs,
Hi! I really like this package, it's a great project, but I ran into an issue when I added it to a workflow. I need to process a large number of Ensembl IDs, so gget.info would be the perfect fit, but it takes a rather long time for mouse Ensembl IDs, and that seems to be entirely due to the PDB API (which in my case, with mouse IDs, currently only returns NaNs). Would it be possible to make the PDB search optional? I'm currently doing that in a modified version of this function, and that way it takes 5 s rather than 2.5 min per call.
Best,
Matthias

Example command

gget.info('ENSexampleID', pdb=False)

Example return value

Same as usual but without PDB entry (or NaN instead of an entry)

Fails to depict and answer the polymeric forms

I used gget to predict the structure of chlorite dismutase, and it successfully gave me a PDB file of the structure in monomeric form. When I cross-checked it against the PDB database, the structure is listed as a hexameric protein.
Is it necessary to fill this gap?

Specify version in ensembl "search" module

Request type

Extension of existing module

Request description

Hi,
I would like to have the option to specify a version of the queried ensembl database when searching with the "search" module.
As far as I can see, gget uses the most current version but I can see use-cases where one would like to query older versions also.

Example command

gget search -s homo_sapiens -v 96 "ACE2"

Example return value

No response
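The log lines elsewhere on this page show gget querying Ensembl core databases named by species, release and assembly (for example `homo_sapiens_core_109_38` and `homo_sapiens_core_107_38`), so supporting a version would largely mean choosing the release number in that name. A small sketch of that naming, inferred only from those two examples; the helper names are hypothetical, and other species use their own assembly suffix.

```python
import re

# species_core_<release>_<assembly>, as seen in this page's log lines.
CORE_DB = re.compile(r"(?P<species>[a-z_]+?)_core_(?P<release>\d+)_(?P<assembly>\w+)")


def core_db_name(species, release, assembly):
    """Build a name such as homo_sapiens_core_109_38."""
    return f"{species}_core_{int(release)}_{assembly}"


def parse_core_db_name(name):
    """Split a core database name into (species, release, assembly), or None if it does not match."""
    match = CORE_DB.fullmatch(name)
    if match is None:
        return None
    return match["species"], int(match["release"]), match["assembly"]
```

Under this pattern, the requested `-v 96` for human would presumably map to `homo_sapiens_core_96_38`.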

GENCODE GTFs+FASTAs

Request type

Extension of existing module

Request description

Have gget fetch FASTAs and GTFs from GENCODE

Example command

No response

Example return value

No response

Invalid command line Expected -option_name or --option_name, got '-' using gget muscle

What happened?

When I try to execute gget muscle via command line or via python script I get the same error message:

Via command line:

C:\Users\mypath > gget muscle fastas_test.txt

Or via python script:
import gget
file_name = "fastas_test.txt"
muscle_alignment = gget.muscle(file_name)
print(type(muscle_alignment))
print(muscle_alignment)

I get the same error message:

"Fri Aug 18 15:52:58 2023 INFO MUSCLE compiled.
Fri Aug 18 15:52:58 2023 INFO MUSCLE aligning...

Invalid command line
Expected -option_name or --option_name, got '-'"

This is how my fasta file is formatted:

>Sequence1
MKTITLECMAIKLLDHLSEKEEQVKQRAEGEVKPEALQEALEKALKETGGVTVTKH
VRWTDGLTCGQLIKTNHPDKPTRIYVCLRGTKAKIADGAILKEDVIEKSTLNKFDLRG
ELKIEEKLRHLHPENSQSGDYPEAQIADGSKEVTLRQTCPTGWYFLRLLVDKDDKSYIK
ALAEKAAAKKVGELFNRYERPDCTLAKLDYVRVQVNIDAGGVLAALDRAEAIYALGLKA
ETEDGVVKLAADDKAAAPKAASRDPKAAGRVSSDPGVDSDGRGAPGPDGVGDSDSEDIAD
SEDKK
>Sequence2
MSTQQIYTQVKVLQAMRHNGKLTLIIRATGGVVKLAADDKAAAPKAASRDPKAAGRVSS
DPGVDSDGRGAPGPDGVGDSDSEDIADSEDKK

gget version

0.27.0

Operating System (OS)

Windows

User interface

Command-line

Are you using a computer with an Apple M1 chip?

Not M1

What is the exact command that was run?

gget muscle fastas_test.txt

Which output/error did you get?

Fri Aug 18 15:52:36 2023 INFO MUSCLE compiled.
Fri Aug 18 15:52:36 2023 INFO MUSCLE aligning...


Invalid command line
Expected -option_name or --option_name, got '-'
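A quick well-formedness check on the input file rules out one class of problems: every FASTA record should start with a header line beginning with `>`. Note that the message `Expected -option_name or --option_name, got '-'` reads like MUSCLE rejecting its command-line arguments rather than the file contents, so this check is a first step, not a fix. A minimal sketch; `fasta_problems` is a hypothetical helper.

```python
from pathlib import Path

# Letters cover amino acid and nucleotide codes; '*' and '-' allow stops and gaps.
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ*-")


def fasta_problems(text):
    """Return human-readable problems found in FASTA text (empty list if none)."""
    problems = []
    seen_header = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are tolerated in this check
        if line.startswith(">"):
            seen_header = True
            continue
        if not seen_header:
            problems.append(f"line {number}: sequence data before any '>' header")
        bad = sorted(set(line.upper()) - ALLOWED)
        if bad:
            problems.append(f"line {number}: unexpected characters {bad}")
    return problems


if __name__ == "__main__":
    print(fasta_problems(Path("fastas_test.txt").read_text()) or "looks like valid FASTA")
```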

Background for enrichr

Hi!
Would it be possible to specify a background set of genes for the enrichr function?
Best
GJ
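A background matters because enrichment p-values depend on the universe the query genes were drawn from. Enrichr reports Fisher's exact test p-values; the one-sided version of that test on a 2x2 table equals the upper tail of the hypergeometric distribution, sketched below with the standard library. This is a generic illustration, not Enrichr's implementation, and the argument names are illustrative.

```python
from math import comb


def enrichment_p_value(overlap, query_size, term_size, background_size):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background_size: genes in the background universe
    term_size:       background genes annotated to the term
    query_size:      genes in the query list (all drawn from the background)
    overlap:         query genes annotated to the term
    """
    total = comb(background_size, query_size)
    upper = min(query_size, term_size)
    tail = sum(
        comb(term_size, x) * comb(background_size - term_size, query_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total
```

For instance, 3 of 10 query genes falling in a 50-gene term is far less surprising against a 1,000-gene background than against a 20,000-gene one, so restricting the background to, say, genes detected in an experiment can change which terms look significant.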

gget alphafold "zsh: illegal hardware instruction" on M1

Hello,

I'm trying to run gget alphafold on my M1 mac, but am encountering the following error:

zsh: illegal hardware instruction

I noticed other threads (143) that comment on the difficulty of running tensorflow with m1 hardware and was wondering if this might be the issue?

I checked to see what version of tensorflow was installed with pip and found several tensorflow-related packages, but not tensorflow itself, I'm guessing this is why other workarounds don't work (i.e. installing tensorflow alpha, or what is suggested here: https://www.youtube.com/watch?v=WFIZn6titnc) :

tensorboard 2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow-cpu 2.9.1
tensorflow-estimator 2.9.0
tensorflow-io-gcs-filesystem 0.26.0

Is there an easy way to resolve this?

-Alex

gget alphafold: RuntimeError: jaxlib version 0.4.1 is newer than and incompatible with jax version 0.3.25. Please update your jax and/or jaxlib packages

I have installed gget on Ubuntu 20.04. When I run gget setup alphafold, the setup completes without errors:

Thu Jan 12 06:06:31 2023 INFO openmm v7.5.1 already installed.
Thu Jan 12 06:06:31 2023 INFO Installing AlphaFold from source (requires pip and git).
Thu Jan 12 06:06:37 2023 INFO AlphaFold installed succesfully.
Thu Jan 12 06:06:37 2023 INFO Installing pdbfixer from source (requires pip and git).
Thu Jan 12 06:06:41 2023 INFO pdbfixer installed succesfully.
Thu Jan 12 06:06:41 2023 INFO AlphaFold model parameters already downloaded.

However, running
gget alphafold MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH

returns the following error. I have also tried running pip install -U gget followed by pip install -U gget, but I still get the same error:

Traceback (most recent call last):
  File "/home/sutripa/anaconda3/bin/pip", line 11, in <module>
    sys.exit(main())
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_internal/cli/main.py", line 70, in main
    return command.main(cmd_args)
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 98, in main
    return self._main(args)
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 214, in _main
    self.handle_pip_version_check(options)
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 143, in handle_pip_version_check
    session = self._build_session(
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 88, in _build_session
    session = PipSession(
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_internal/network/session.py", line 289, in __init__
    self.headers["User-Agent"] = user_agent()
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_internal/network/session.py", line 132, in user_agent
    linux_distribution = distro.linux_distribution() # type: ignore
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_vendor/distro.py", line 125, in linux_distribution
    return _distro.linux_distribution(full_distribution_name)
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_vendor/distro.py", line 681, in linux_distribution
    self.version(),
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_vendor/distro.py", line 741, in version
    self.lsb_release_attr('release'),
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_vendor/distro.py", line 903, in lsb_release_attr
    return self._lsb_release_info.get(attribute, '')
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_vendor/distro.py", line 556, in __get__
    ret = obj.__dict__[self._fname] = self._f(obj)
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/pip/_vendor/distro.py", line 1014, in _lsb_release_info
    stdout = subprocess.check_output(cmd, stderr=devnull)
  File "/home/sutripa/anaconda3/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/sutripa/anaconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('lsb_release', '-a')' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/sutripa/anaconda3/bin/gget", line 8, in <module>
    sys.exit(main())
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/gget/main.py", line 1551, in main
    alphafold(
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/gget/gget_alphafold.py", line 302, in alphafold
    from alphafold.model import model
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/alphafold/model/model.py", line 20, in <module>
    from alphafold.model import features
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/alphafold/model/features.py", line 19, in <module>
    from alphafold.model.tf import input_pipeline
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/alphafold/model/tf/input_pipeline.py", line 17, in <module>
    from alphafold.model.tf import data_transforms
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/alphafold/model/tf/data_transforms.py", line 18, in <module>
    from alphafold.model.tf import shape_helpers
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/alphafold/model/tf/shape_helpers.py", line 16, in <module>
    import tensorflow.compat.v1 as tf
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/__init__.py", line 51, in <module>
    from ._api.v2 import compat
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/_api/v2/compat/__init__.py", line 37, in <module>
    from . import v1
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/_api/v2/compat/v1/__init__.py", line 30, in <module>
    from . import compat
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/_api/v2/compat/v1/compat/__init__.py", line 37, in <module>
    from . import v1
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/_api/v2/compat/v1/compat/v1/__init__.py", line 47, in <module>
    from tensorflow._api.v2.compat.v1 import lite
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/_api/v2/compat/v1/lite/__init__.py", line 9, in <module>
    from . import experimental
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/_api/v2/compat/v1/lite/experimental/__init__.py", line 8, in <module>
    from . import authoring
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/_api/v2/compat/v1/lite/experimental/authoring/__init__.py", line 8, in <module>
    from tensorflow.lite.python.authoring.authoring import compatible
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/lite/python/authoring/authoring.py", line 43, in <module>
    from tensorflow.lite.python import convert
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/lite/python/convert.py", line 28, in <module>
    from tensorflow.lite.python import util
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/tensorflow/lite/python/util.py", line 55, in <module>
    from jax import xla_computation as _xla_computation
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/jax/__init__.py", line 35, in <module>
    from jax import config as _config_module
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/jax/config.py", line 17, in <module>
    from jax._src.config import config # noqa: F401
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/jax/_src/config.py", line 27, in <module>
    from jax._src import lib
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/jax/_src/lib/__init__.py", line 74, in <module>
    version = check_jaxlib_version(
  File "/home/sutripa/anaconda3/lib/python3.9/site-packages/jax/_src/lib/__init__.py", line 69, in check_jaxlib_version
    raise RuntimeError(msg)
RuntimeError: jaxlib version 0.4.1 is newer than and incompatible with jax version 0.3.25. Please update your jax and/or jaxlib packages.
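The last line of the traceback states the incompatibility directly: the installed `jaxlib` (0.4.1) is newer than `jax` (0.3.25). The sketch below mirrors that one condition as a pre-flight check based on installed package metadata, so the mismatch can be spotted before importing AlphaFold. It assumes this is the only comparison that matters; JAX also enforces a minimum `jaxlib` version, which is not checked here. A typical remedy is to install a `jax`/`jaxlib` pair with matching versions. The helper names are hypothetical.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '0.4.1' (or '0.3.25.dev1') into a comparable tuple such as (0, 4, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def jaxlib_newer_than_jax(jax_version, jaxlib_version):
    """True in the situation reported above: jaxlib ahead of jax."""
    return version_tuple(jaxlib_version) > version_tuple(jax_version)


def installed_version(package):
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    jax_v, jaxlib_v = installed_version("jax"), installed_version("jaxlib")
    if jax_v and jaxlib_v and jaxlib_newer_than_jax(jax_v, jaxlib_v):
        print(f"jaxlib {jaxlib_v} is newer than jax {jax_v}; install matching versions")
```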

Local variable 'db_connection' referenced before assignment

Installed on host via pip install --upgrade gget:

$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:

    System Software Overview:

      System Version: macOS 12.5 (21G72)
      Kernel Version: Darwin 21.6.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: Earl Grey
      User Name: Alex Reynolds (areynolds)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 1 day 8:44

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro16,4
      Processor Name: 8-Core Intel Core i9
      Processor Speed: 2.4 GHz
      Number of Processors: 1
      Total Number of Cores: 8
      L2 Cache (per Core): 256 KB
      L3 Cache: 16 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      System Firmware Version: 1731.140.2.0.0 (iBridge: 19.16.16064.0.0,0)
      OS Loader Version: 540.120.3~19
      Serial Number (system): C02CT0C0PT01
      Hardware UUID: C6082A3D-359C-5F2C-AC84-5068C7891897
      Provisioning UDID: C6082A3D-359C-5F2C-AC84-5068C7891897
      Activation Lock Status: Disabled

Problematic command:

$ python --version
Python 3.8.13
$ gget search -s homo_sapiens 'usf1'
Tue Aug  9 20:03:08 2022 INFO Fetching results from database: homo_sapiens_core_107_38
Tue Aug  9 20:03:11 2022 ERROR The Ensembl server returned the following error: Character set 'utf8' unsupported
Traceback (most recent call last):
  File "/Users/areynolds/miniconda3/bin/gget", line 8, in <module>
    sys.exit(main())
  File "/Users/areynolds/miniconda3/lib/python3.8/site-packages/gget/main.py", line 1223, in main
    gget_results = search(
  File "/Users/areynolds/miniconda3/lib/python3.8/site-packages/gget/gget_search.py", line 172, in search
    df_temp = pd.read_sql(query, con=db_connection)
UnboundLocalError: local variable 'db_connection' referenced before assignment

Using version 0.3.7:

$ gget --version
gget version: 0.3.7
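The traceback shows the server error being logged and execution then reaching `pd.read_sql(query, con=db_connection)` at gget_search.py line 172, which means the name `db_connection` was never assigned. The general fix pattern is to stop with a clear error as soon as connecting fails. A minimal, generic sketch of that pattern, using the standard library's `sqlite3` as a stand-in for the Ensembl MySQL connection; gget's own code is not shown here, and `run_query` is a hypothetical helper.

```python
import sqlite3


def run_query(connect, sql):
    """Connect, run one query, and always close the connection.

    If connect() fails, a ConnectionError is raised immediately, so the
    query is never attempted with an unassigned connection.
    """
    try:
        connection = connect()
    except Exception as err:
        raise ConnectionError(f"could not connect to the database: {err}") from err
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()


if __name__ == "__main__":
    print(run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1"))
```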

cellxgene filter improvements

Request type

Extension of existing module

Request description

The gget_cellxgene module could support additional filtering options, providing the full set of metadata attributes that are available in the Census. In particular, supporting is_primary_data will allow users to ignore cells that are duplicated across multiple CELLxGENE datasets. See https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md#is_primary_data. It would be reasonable to have a default value of True for this attribute.

Also, the current defaults will attempt to load the entire Census into an AnnData object, which requires a considerable amount of RAM (hundreds of GB) and high network bandwidth. You might consider requiring that at least one obs filter or a subset of genes be specified.

Example command

No response

Example return value

No response
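CELLxGENE Census queries take metadata filters as string expressions, for example through the `obs_value_filter` argument of `cellxgene_census.get_anndata` (e.g. `"tissue_general == 'lung' and is_primary_data == True"`). The sketch below shows how a wrapper could compose such a string with `is_primary_data` defaulting to True and refuse to run without at least one narrowing filter, as suggested above. The function name and its arguments are hypothetical, and values containing quote characters are not escaped.

```python
def build_obs_filter(tissue=None, cell_type=None, is_primary_data=True):
    """Compose an obs value filter such as "tissue == 'lung' and is_primary_data == True".

    Raises ValueError when no narrowing filter is given, to avoid pulling the
    entire Census into memory.
    """
    if not (tissue or cell_type):
        raise ValueError("specify at least one of tissue or cell_type")
    clauses = []
    if tissue:
        clauses.append(f"tissue == '{tissue}'")
    if cell_type:
        clauses.append(f"cell_type == '{cell_type}'")
    if is_primary_data is not None:
        # True keeps only the primary copy of cells that appear in several datasets.
        clauses.append(f"is_primary_data == {bool(is_primary_data)}")
    return " and ".join(clauses)
```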

Option to reduce verbosity or turn off logging

Request type

Extension of existing module

Request description

This is a really useful easy-to-use package. I have been using this for relatively large requests and I was wondering if there is a way to reduce the verbosity of the logging info or turn off the logging completely?

Example command

No response

Example return value

No response

Add Uniprot localisation data

Many thanks for this brilliant tool. I was wondering if it would be possible to add the "subcellular location" section of the UniProt entry to the tool's output?

This would be immensely helpful for filtering by subcellular location.

Many thanks, and apologies if it does this already, but I couldn't find this data in the output.
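As a stopgap, the UniProt REST API can return this section directly. The sketch below assumes the `https://rest.uniprot.org/uniprotkb/search` endpoint and the `cc_subcellular_location` return field; the helper names are hypothetical and the network call is kept separate from the parsing so either can be swapped out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def localisation_url(accession):
    """URL requesting one entry's accession and subcellular-location comment as TSV."""
    params = {
        "query": f"accession:{accession}",
        "fields": "accession,cc_subcellular_location",
        "format": "tsv",
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def parse_localisation(tsv_text):
    """Map accession -> subcellular-location text from a two-column TSV response."""
    result = {}
    for line in tsv_text.strip().splitlines()[1:]:  # skip the header row
        accession, _, location = line.partition("\t")
        result[accession] = location
    return result


def fetch_localisation(accession):
    """Query UniProt over the network and parse the response."""
    with urlopen(localisation_url(accession)) as response:
        return parse_localisation(response.read().decode("utf-8"))
```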

Error in gget setup alphafold: Ignored the following versions that require a different python version...

Hello, thank you for this great tool! I am having trouble with the gget setup alphafold command.

I created a conda environment with gget, git, pip and openmm=7.5.1 (attached). The first time I executed the setup command it finished with no errors, but when I tried to run gget.alphafold() it says "Some third-party dependencies are missing". Then, when I run the setup command again this is the error message:

  Thu Jan 19 10:45:08 2023 INFO openmm v7.5.1 already installed.
  Thu Jan 19 10:45:08 2023 INFO Installing AlphaFold from source (requires pip and git).
  Note: switching to '569eb4fea3733b979cb0442750b875759dd5ecc0'.
  
  You are in 'detached HEAD' state. You can look around, make experimental
  changes and commit them, and you can discard any commits you make in this
  state without impacting any branches by switching back to a branch.
  
  If you want to create a new branch to retain commits you create, you may
  do so (now or later) by using -c with the switch command. Example:
  
    git switch -c <new-branch-name>
  
  Or undo this operation with:
  
    git switch -
  
  Turn off this advice by setting config variable advice.detachedHead to false
  
  ERROR: Ignored the following versions that require a different python version: 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.0rc1 Requires-Python >=3.7,<3.10; 1.7.0rc2 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10
  ERROR: Could not find a version that satisfies the requirement scipy==1.7.0 (from versions: 0.8.0, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.16.1, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.6.0, 1.6.1, 1.7.2, 1.7.3, 1.8.0rc1, 1.8.0rc2, 1.8.0rc3, 1.8.0rc4, 1.8.0, 1.8.1, 1.9.0rc1, 1.9.0rc2, 1.9.0rc3, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.10.0rc1, 1.10.0rc2, 1.10.0)
  ERROR: No matching distribution found for scipy==1.7.0
  Thu Jan 19 10:46:34 2023 ERROR AlphaFold installation failed.
  Thu Jan 19 10:46:34 2023 INFO openmm v7.5.1 already installed.
  Thu Jan 19 10:46:34 2023 INFO Installing AlphaFold from source (requires pip and git).
  Note: switching to '569eb4fea3733b979cb0442750b875759dd5ecc0'.
  
  You are in 'detached HEAD' state. You can look around, make experimental
  changes and commit them, and you can discard any commits you make in this
  state without impacting any branches by switching back to a branch.
  
  If you want to create a new branch to retain commits you create, you may
  do so (now or later) by using -c with the switch command. Example:
  
    git switch -c <new-branch-name>
  
  Or undo this operation with:
  
    git switch -
  
  Turn off this advice by setting config variable advice.detachedHead to false
  
  ERROR: Ignored the following versions that require a different python version: 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.0rc1 Requires-Python >=3.7,<3.10; 1.7.0rc2 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10
  ERROR: Could not find a version that satisfies the requirement scipy==1.7.0 (from versions: 0.8.0, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.16.1, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.6.0, 1.6.1, 1.7.2, 1.7.3, 1.8.0rc1, 1.8.0rc2, 1.8.0rc3, 1.8.0rc4, 1.8.0, 1.8.1, 1.9.0rc1, 1.9.0rc2, 1.9.0rc3, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.10.0rc1, 1.10.0rc2, 1.10.0)
  ERROR: No matching distribution found for scipy==1.7.0
  Thu Jan 19 10:47:04 2023 ERROR AlphaFold installation failed.
  Thu Jan 19 10:47:05 2023 ERROR 

Attached environment file: [alphafold_conda.txt](https://github.com/pachterlab/gget/files/10455613/alphafold_conda.txt)

Any suggestions on how to continue?
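In the log, pip skips scipy 1.6.2 through 1.7.1 because they declare `Requires-Python >=3.7,<3.10`, which indicates the environment's interpreter is Python 3.10 or newer. A minimal pre-flight check, using those bounds from the log (the function name is hypothetical, not part of gget):

```python
import sys


def supports_pinned_scipy(version_info=sys.version_info, lower=(3, 7), upper=(3, 10)):
    """True if the interpreter satisfies Requires-Python >=3.7,<3.10 (scipy==1.7.0)."""
    current = tuple(version_info[:2])
    return lower <= current < upper


if __name__ == "__main__":
    if not supports_pinned_scipy():
        print(
            f"Python {sys.version_info.major}.{sys.version_info.minor} cannot install "
            "scipy==1.7.0; try an environment with Python 3.9 or lower."
        )
```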

AlphaFold model parameters download error

Hi! I am hitting an SSL certificate problem when running the AlphaFold setup:

Tue Aug 16 10:18:49 2022 INFO Downloading AlphaFold model parameters (requires 4.1 GB of storage). This might take a few minutes.
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Where are the parameters being downloaded from? Knowing this would help me check whether I have the right certificates and whether they are in the right place. Any additional advice on solving this error would be greatly appreciated! Thank you!

gget seq encounters missing gene name from uniprot and throws type error

What happened?

  • The error occurs only for a small fraction of IDs (I went through about 19,000).
  • Issue: df_uniprot['gene_name'] is NaN (a numpy.float64, whereas gget_seq.py expects a str).

IDs where I encountered the error:

['ENSG00000275740', 'ENSG00000249624', 'ENSG00000288716', 'ENSG00000288712', 
 'ENSG00000288708', 'ENSG00000288706', 'ENSG00000288654', 'ENSG00000288646', 
 'ENSG00000288644', 'ENSG00000288634', 'ENSG00000288626', 'ENSG00000288625', 
 'ENSG00000288623', 'ENSG00000288608', 'ENSG00000288570', 'ENSG00000286224', 
 'ENSG00000286131', 'ENSG00000288629', 'ENSG00000288645', 'ENSG00000284934', 
 'ENSG00000284895', 'ENSG00000284684', 'ENSG00000285976', 'ENSG00000288715',
 'ENSG00000257046']

gget version

0.27.9

Operating System (OS)

Linux, macOS

User interface

Python

Are you using a computer with an Apple M1 chip?

Not M1

What is the exact command that was run?

import gget
gget.seq(['ENSG00000257046'], translate=True)

Which output/error did you get?

Thu Oct 12 18:16:04 2023 INFO Requesting amino acid sequence of the canonical transcript ENST00000540229 of gene ENSG00000257046 from UniProt.
Thu Oct 12 18:16:05 2023 WARNING No reviewed UniProt results were found for ID ENST00000540229. Returning all unreviewed results.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/fruity/miniconda3/envs/biopy/lib/python3.8/site-packages/gget/gget_seq.py", line 379, in seq
    ">"
TypeError: can only concatenate str (not "numpy.float64") to str
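One way to avoid the crash is to coerce each header field to text before concatenating, treating NaN (and None) as an empty field. A minimal sketch with hypothetical helper names; it does not reproduce gget's actual header layout. Note that `numpy.float64` subclasses Python's `float`, so the `isinstance` check also catches the value from the traceback.

```python
import math


def as_text(value, default=""):
    """Return value as a string, mapping None and NaN (incl. numpy.float64 NaN) to default."""
    if value is None:
        return default
    if isinstance(value, float) and math.isnan(value):
        return default
    return str(value)


def fasta_header(*fields):
    """Build a '>'-prefixed header, skipping fields that are missing or NaN."""
    parts = [as_text(field) for field in fields]
    return ">" + " ".join(part for part in parts if part)
```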

Change gget search url to useast Ensembl mirror

What happened?

uswest.ensembl.org was discontinued. Change the URLs returned by gget search to useast.ensembl.org.
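A minimal sketch of the change as a host rewrite (the function name is hypothetical): only URLs whose host is exactly uswest.ensembl.org are touched, and the path and query string are preserved.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "uswest.ensembl.org"
NEW_HOST = "useast.ensembl.org"


def to_useast(url):
    """Point URLs on the discontinued uswest mirror at useast; leave others unchanged."""
    parts = urlsplit(url)
    if parts.hostname == OLD_HOST:
        parts = parts._replace(netloc=parts.netloc.replace(OLD_HOST, NEW_HOST))
    return urlunsplit(parts)
```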

gget version

NA

Operating System (OS)

Not applicable

User interface

Not applicable

Are you using a computer with an Apple M1 chip?

Not applicable

What is the exact command that was run?

No response

Which output/error did you get?

No response

gget blast can not restrict by taxonomy

Request type

Extension of existing module

Request description

I tried to use gget blast and noticed that I cannot filter by taxonomy. It would be nice if you could implement that. Here is the filter box in UniProt BLAST:
[Screenshot: taxonomy filter box in UniProt BLAST]


Looking at the result, this is the command option added to the blast run that may help:

-taxidlist ncbiblast-R20230405-025515-0636-88381305-p1m.taxidlist
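The `-taxidlist` option above comes from the UniProt/EBI run. If gget blast submits to NCBI's BLAST URL API, an organism restriction could plausibly be expressed with that API's `ENTREZ_QUERY` parameter on the `Put` request. A minimal sketch under that assumption (the helper name and defaults are hypothetical; 9606 is the NCBI taxonomy ID for Homo sapiens):

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_params(sequence, program="blastp", database="nr", taxid=None):
    """URL-encoded body for a BLAST 'Put' request, optionally limited to one taxon."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": sequence}
    if taxid is not None:
        # Entrez organism restriction, e.g. txid9606[ORGN] for human.
        params["ENTREZ_QUERY"] = f"txid{taxid}[ORGN]"
    return urlencode(params)
```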

Example command

No response

Example return value

No response

Alphafold running time

What happened?

Dear developers,

I tried to run AlphaFold from the gget package and compared its execution time with the AlphaFold2 implementation in Google Colab.
I submitted the following sequence and used the default parameters in both cases: MAEHGAHFTAASVADDQPSIFEVVAQDSLMTAVRPALQHVVKVLAESNPTHYGFLWRWFDEIFTLLDLLLQQHYLSRTSASFSENFYGLKRIVMGDTHKSQRLASAGLPKQQLWKSIMFLVLLPYLKVKLEKLVSSLREEDEYSIHPPSSRWKRFYRAFLAAYPFVNMAWEGWFLVQQLRYILGKAQHHSPLLRLAGVQLGRLTVQDIQALEHKPAKASMMQQPARSVSEKINSALKKAVGGVALSLSTGLSVGVFFLQFLDWWYSSENQETIKSLTALPTPPPPVHLDYNSDSPLLPKMKTVCPLCRKTRVNDTVLA

The Colab AlphaFold run finished in 15 minutes and required 4.6 GB of RAM and 3.6 GB of GPU memory, while gget alphafold took ~5 h on a desktop computer with an 8 GB GPU and 64 GB of RAM.

If I understood correctly, gget alphafold implements the same simplified AlphaFold version that Google Colab uses. Why is there such a large difference in execution time?
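One possible explanation worth ruling out (not confirmed in this thread) is that JAX, which AlphaFold runs on, is not using the GPU and has fallen back to CPU. A minimal diagnostic sketch with hypothetical helper names: it reports JAX's default backend and times a single call.

```python
import time


def jax_backend():
    """Return JAX's default backend ('cpu', 'gpu', 'tpu'), or None if JAX is not installed."""
    try:
        import jax
    except ImportError:
        return None
    return jax.default_backend()


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    backend = jax_backend()
    print(f"JAX backend: {backend}")
    if backend == "cpu":
        print("JAX sees no GPU; AlphaFold would run on CPU and be much slower.")
```

Running the prediction as `result, seconds = timed(gget.alphafold, sequence)` in the same session makes the comparison with Colab reproducible.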

gget version

0.27.2

Operating System (OS)

Linux

User interface

Command-line

Are you using a computer with an Apple M1 chip?

Not M1

What is the exact command that was run?

gget alphafold MAEHGAHFTAASVADDQPSIFEVVAQDSLMTAVRPALQHVVKVLAESNPTHYGFLWRWFDEIFTLLDLLLQQHYLSRTSASFSENFYGLKRIVMGDTHKSQRLASAGLPKQQLWKSIMFLVLLPYLKVKLEKLVSSLREEDEYSIHPPSSRWKRFYRAFLAAYPFVNMAWEGWFLVQQLRYILGKAQHHSPLLRLAGVQLGRLTVQDIQALEHKPAKASMMQQPARSVSEKINSALKKAVGGVALSLSTGLSVGVFFLQFLDWWYSSENQETIKSLTALPTPPPPVHLDYNSDSPLLPKMKTVCPLCRKTRVNDTVLA

Which output/error did you get?

No response

KeyError: "0000:query"

I have used the example sequence in the alphafold module and it works fine; however, when I give it a custom sequence it gives the error
KeyError: "0000:query".
Could you please advise on this?
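Since the example sequence works, one thing worth ruling out (an assumption, not a confirmed cause) is the formatting of the custom sequence: embedded whitespace or newlines, a pasted FASTA header, or non-standard letters. A minimal normalisation sketch with a hypothetical helper name; it accepts only the 20 standard amino acids and makes no claim about which extra codes gget itself accepts.

```python
import re

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Strip whitespace, uppercase, and reject FASTA headers or non-standard residues."""
    seq = re.sub(r"\s+", "", raw).upper()
    if seq.startswith(">"):
        raise ValueError("Pass the bare sequence, not a FASTA record with a header line.")
    invalid = sorted(set(seq) - STANDARD_AA)
    if invalid:
        raise ValueError(f"Non-standard residue codes in sequence: {invalid}")
    return seq
```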

Improve 'curl is missing' error for gget ref -d (include link to install instructions)

What happened?

When running gget ref in the biocontainer (quay.io/biocontainers/gget:0.27.2--pyhdfd78af_0, built automatically from conda) it fails because curl is not installed.

Solution:

  • Add curl to conda recipe

Related to #53

gget version

0.27.2

User interface

Command-line

Any additional comments on the user interface?

No response

Are you using a computer with an Apple M1 chip?

Not M1

What is the exact command that was run?

gget ref -d -w dna homo_sapiens

Which output/error did you get?

Wed Jan 25 15:21:53 2023 INFO Fetching reference information for homo_sapiens from Ensembl release: 108.
sh: curl: command not found
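A minimal sketch of the requested error message: check for curl on PATH before shelling out and fail with install pointers. The helper name and the two suggested links/commands are assumptions (the curl download page and the conda-forge curl package), not gget's current behaviour; `which` is injectable so the check can be tested.

```python
import shutil

CURL_HELP = "https://curl.se/download.html"


def require_curl(which=shutil.which):
    """Raise a helpful error if curl is not available on PATH."""
    if which("curl") is None:
        raise RuntimeError(
            "gget ref -d requires curl, which was not found on PATH. "
            f"Install it (see {CURL_HELP}) or, in a conda environment, "
            "run 'conda install -c conda-forge curl'."
        )
```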

AlphaFold Multimer

Hi!

I'm enquiring whether support for AlphaFold Multimer is planned for a future release. It would be greatly appreciated!

Esa-Pekka
