pyuniprot's People

Contributors

cebel, christianebeling, cthoyt

pyuniprot's Issues

zcat fails on mac

On macOS, zcat needs to be run as
zcat < filename.gz
instead of just
zcat filename.gz
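
One way to sidestep the platform difference is to avoid shelling out to zcat at all. The sketch below counts lines with Python's standard gzip module; the function name count_lines_gz is hypothetical and not part of pyuniprot.

```python
import gzip


def count_lines_gz(path):
    """Count lines in a gzip-compressed file without calling zcat or gzcat."""
    count = 0
    # gzip.open decompresses on the fly, so the file is never fully in memory.
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            count += 1
    return count
```

This behaves the same on macOS, Linux, and Windows, at the cost of being slower than the native tools on very large files.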

Remove sudo from the installation docs

It is generally not recommended to use sudo for package installation.

I wonder if there are really use cases where the "System wide installation" as noted in the documentation is useful/desired. On the other hand, using sudo with pip on a system Python installation can change packages that are used in system tooling which may be detrimental and hard to recover from.

If sudo really needs to remain in the docs, it would be great to state when to use it and to add a warning cautioning users not to use it unless they understand the risks that come with it.

Add EC Hierarchy Table

This table should contain information about the hierarchy underlying the ECNumber table.

If this information isn't available in UniProt, then we might have to write a parser (perhaps also as a separate package) for ftp://ftp.expasy.org/databases/enzyme/enzclass.txt and match them
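
Independent of where the class names come from, the parent/child structure can be derived from the EC numbers themselves, since each parent class replaces trailing fields with a dash. A minimal sketch, assuming four dot-separated fields with dashes only in trailing positions (ec_ancestors is a hypothetical helper, not an existing pyuniprot function):

```python
def ec_ancestors(ec_number):
    """Return the parent classes of an EC number, nearest parent first."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields: %r" % ec_number)
    # Number of fields that are specified (assumes dashes are trailing only).
    depth = sum(1 for part in parts if part != "-")
    ancestors = []
    for keep in range(depth - 1, 0, -1):
        # Keep the first `keep` fields and pad the rest with dashes.
        ancestors.append(".".join(parts[:keep] + ["-"] * (4 - keep)))
    return ancestors
```

For example, ec_ancestors("1.1.1.1") gives ["1.1.1.-", "1.1.-.-", "1.-.-.-"], which could populate a hierarchy table keyed on the existing ECNumber values.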

Error on update

This happened after about 4 minutes:

In [1]: import pyuniprot

In [2]: pyuniprot.update(taxids=[9606])
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-89507300518a> in <module>()
----> 1 pyuniprot.update(taxids=[9606])

~/dev/pyuniprot/src/pyuniprot/manager/database.py in update(connection, urls, force_download, taxids)
    479 
    480     db = DbManager(connection)
--> 481     db.db_import_xml(urls, force_download, taxids)
    482     db.session.close()
    483 

~/dev/pyuniprot/src/pyuniprot/manager/database.py in db_import_xml(self, url, force_download, taxids)
    175         xml_gzipped_file_path = DbManager.download(url, force_download)
    176         self._create_tables()
--> 177         self.import_xml(xml_gzipped_file_path, taxids)
    178         self.session.close()
    179 

~/dev/pyuniprot/src/pyuniprot/manager/database.py in import_xml(self, xml_gzipped_file_path, taxids)
    184         interval = 100
    185         start = False
--> 186         with open(xml_gzipped_file_path[:-3], 'r') as fd:
    187             for line in fd:
    188                 end_of_file = line.startswith("</uniprot>")

FileNotFoundError: [Errno 2] No such file or directory: '/Users/cthoyt/.pyuniprot/data/uniprot_sprot.xml'

build database failed

$ /usr/bin/pyuniprot update --taxids 9606
WARNING: Update is very time consuming and can take several
hours depending which organisms you are importing!

Import 160939723 lines: 0%| | 0/160939723 [00:00<?, ?it/s]Exception KeyError: KeyError(<weakref at 0x7eff46857b50; to 'tqdm' at 0x7eff468e3c90>,) in <bound method tqdm.__del__ of Import 160939723 lines: 0%| | 0/160939723 [00:00<?, ?it/s]> ignored
Traceback (most recent call last):
  File "/usr/bin/pyuniprot", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pyuniprot/cli.py", line 72, in update
    database.update(taxids=taxids, connection=conn, force_download=force_download, silent=silent)
  File "/usr/lib64/python2.7/site-packages/pyuniprot/manager/database.py", line 836, in update
    db.db_import_xml(urls, force_download, taxids, silent)
  File "/usr/lib64/python2.7/site-packages/pyuniprot/manager/database.py", line 160, in db_import_xml
    self.import_xml(xml_gzipped_file_path, taxids, silent)
  File "/usr/lib64/python2.7/site-packages/pyuniprot/manager/database.py", line 233, in import_xml
    self.insert_entries(entry_xml, taxids)
  File "/usr/lib64/python2.7/site-packages/pyuniprot/manager/database.py", line 254, in insert_entries
    entries = etree.fromstring(entries_xml)
  File "<string>", line 124, in XML
cElementTree.ParseError: mismatched tag: line 127, column 2

Database population fails

After running the update function from a fresh installation in a virtual environment, the following error occurs:

(test_venv) [511] [11:16] [cthoyt@wlan-185:~/dev/pyuniprot]
$ pyuniprot update --taxids 9606,10090,10116
WARNING: Update is very time consuming and can take several hours depending which organisms you are importing!
Traceback (most recent call last):
  File "/Users/cthoyt/.virtualenvs/test_venv/bin/pyuniprot", line 11, in <module>
    load_entry_point('PyUniProt', 'console_scripts', 'pyuniprot')()
  File "/Users/cthoyt/.virtualenvs/test_venv/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cthoyt/.virtualenvs/test_venv/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/cthoyt/.virtualenvs/test_venv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cthoyt/.virtualenvs/test_venv/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cthoyt/.virtualenvs/test_venv/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/cthoyt/dev/pyuniprot/src/pyuniprot/cli.py", line 72, in update
    database.update(taxids=taxids, connection=conn, force_download=force_download, silent=silent)
  File "/Users/cthoyt/dev/pyuniprot/src/pyuniprot/manager/database.py", line 826, in update
    db.db_import_xml(urls, force_download, taxids, silent)
  File "/Users/cthoyt/dev/pyuniprot/src/pyuniprot/manager/database.py", line 155, in db_import_xml
    self.import_xml(xml_gzipped_file_path, taxids, silent)
  File "/Users/cthoyt/dev/pyuniprot/src/pyuniprot/manager/database.py", line 197, in import_xml
    number_of_lines = int(getoutput("{} {} | wc -l".format(zcat_command, xml_gzipped_file_path)))
ValueError: invalid literal for int() with base 10: 'gzcat: /Users/cthoyt/.pyuniprot/data/uniprot_sprot.xml.gz: unexpected end of file\ngzcat: /Users/cthoyt/.pyuniprot/data/uniprot_sprot.xml.gz: uncompress failed\n 125198601'

Here's the pip freeze

(test_venv) [512] [11:16] [cthoyt@wlan-185:~/dev/pyuniprot]
$ pip freeze
certifi==2017.11.5
chardet==3.0.4
click==6.7
configparser==3.5.0
flasgger==0.8.0
Flask==0.12.2
idna==2.6
itsdangerous==0.24
Jinja2==2.10
jsonschema==2.6.0
MarkupSafe==1.0
mistune==0.8.1
numpy==1.13.3
pandas==0.21.0
passlib==1.7.1
PyMySQL==0.7.11
python-dateutil==2.6.1
pytz==2017.3
-e git+https://github.com/cebel/pyuniprot.git@9462a6042c7c9295415a5eb589b77b27cb7c142b#egg=PyUniProt
PyYAML==3.12
requests==2.18.4
six==1.11.0
SQLAlchemy==1.1.15
tqdm==4.19.4
urllib3==1.22
Werkzeug==0.12.2
WTForms==2.1
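
The gzcat message "unexpected end of file" in the ValueError above suggests the downloaded archive was truncated. A check along the following lines could run before the import starts, so that a broken download is reported (or re-fetched) instead of failing inside int(). This is only a sketch; gzip_is_complete is a hypothetical name.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip file at `path` decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard in chunks; only the absence of errors matters.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated stream; OSError/zlib.error: corrupt data or header.
        return False
    return True
```

Reading the full archive takes time for a file of this size, so a real implementation might prefer comparing against a published checksum or file size if one is available.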

Keep track of version of data

Inside the UniProt data directory, there's a file called ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt that stores the release information. It would be good to download this and store it in a "meta" table in the PyUniProt database so it knows if there's been an update. This could be as simple as hashing the file and checking whether the current hash matches the hash from the last update.
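
The hashing idea could look roughly like this. The function names and the way the stored hash is kept are assumptions; downloading reldate.txt and the "meta" table itself are left out.

```python
import hashlib


def release_fingerprint(reldate_bytes):
    """Return a SHA-256 hex digest of the raw contents of reldate.txt."""
    return hashlib.sha256(reldate_bytes).hexdigest()


def needs_update(reldate_bytes, stored_fingerprint):
    """True if the release file differs from the one seen at the last update.

    `stored_fingerprint` is the digest saved previously (None if never saved).
    """
    return stored_fingerprint != release_fingerprint(reldate_bytes)
```

After a successful import, the new digest would be written to the meta table so the next run can compare against it.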

sqlite3.ProgrammingError in query functions

example

q.get_entry()

ProgrammingError: (sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140377400882944 and this is thread id 140377636747008 [SQL: 'SELECT pyuniprot_entry.id AS pyuniprot_entry_id, pyuniprot_entry.dataset AS pyuniprot_entry_dataset, pyuniprot_entry.created AS pyuniprot_entry_created, pyuniprot_entry.modified AS pyuniprot_entry_modified, pyuniprot_entry.version AS pyuniprot_entry_version, pyuniprot_entry.name AS pyuniprot_entry_name, pyuniprot_entry.recommended_full_name AS pyuniprot_entry_recommended_full_name, pyuniprot_entry.recommended_short_name AS pyuniprot_entry_recommended_short_name, pyuniprot_entry.taxid AS pyuniprot_entry_taxid, pyuniprot_entry.gene_name AS pyuniprot_entry_gene_name \nFROM pyuniprot_entry'] [parameters: [{}]]
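
For context, the error comes from sqlite3's default same-thread check. The standalone sketch below uses only the standard library (not pyuniprot or SQLAlchemy) to show the check being disabled with check_same_thread=False; the table and lock are illustrative, and access has to be serialized by the caller once the check is off.

```python
import sqlite3
import threading

# Disable the same-thread check so another thread may use this connection.
connection = sqlite3.connect(":memory:", check_same_thread=False)
lock = threading.Lock()  # serialize access, since sqlite3 no longer guards it

connection.execute("CREATE TABLE entry (name TEXT)")
connection.execute("INSERT INTO entry VALUES ('example')")

results = []


def query_from_worker():
    """Run a query on the shared connection from a different thread."""
    with lock:
        results.extend(connection.execute("SELECT name FROM entry").fetchall())


worker = threading.Thread(target=query_from_worker)
worker.start()
worker.join()
```

With SQLAlchemy, the same flag can be passed through create_engine via connect_args={"check_same_thread": False}; whether that is the right fix for the query functions here, as opposed to creating one session per thread, is a judgment call.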

update database with only specific accession ids

In synthetic biology one often focuses on a subset of proteins across different taxids. Ideally we would have something like

pyuniprot update --accessionids Q14738, A4FU37, ...

and in python

pyuniprot.update(accessionids=['Q14738', 'A4FU37', ...])
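
On the parsing side, such a filter might be a small predicate applied to each entry element before it is inserted. This is a sketch under assumptions: entries are parsed with the standard ElementTree, they use the UniProt XML namespace shown below, and entry_has_accession is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace used by UniProt XML entries (assumed here for the lookup).
UNIPROT_NS = {"up": "http://uniprot.org/uniprot"}


def entry_has_accession(entry, wanted):
    """Return True if any <accession> of `entry` is in the `wanted` collection."""
    accessions = {node.text for node in entry.findall("up:accession", UNIPROT_NS)}
    return bool(accessions & set(wanted))
```

Note that the full XML file would still have to be scanned once; the filter only reduces what gets written to the database.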

AttributeError during package import

My environment:

Python 3.11.5
pyuniprot 0.0.10
numpy 1.26.1

The issue:

When importing the package, I encounter the following error:

>>> import pyuniprot
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mypath/miniconda3/envs/treva/lib/python3.11/site-packages/pyuniprot/__init__.py", line 15, in <module>
    from . import manager
  File "/mypath/miniconda3/envs/treva/lib/python3.11/site-packages/pyuniprot/manager/__init__.py", line 7, in <module>
    from . import database
  File "/mypath/miniconda3/envs/treva/lib/python3.11/site-packages/pyuniprot/manager/database.py", line 41, in <module>
    sqltypes.Text: np.unicode,
                   ^^^^^^^^^^
  File "/mypath/miniconda3/envs/treva/lib/python3.11/site-packages/numpy/__init__.py", line 333, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'unicode'. Did you mean: 'unicode_'?
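
For reference, np.unicode was only an alias for the built-in str; it was deprecated in NumPy 1.20 and removed in 1.24, which matches the numpy 1.26.1 listed above. A possible shim is sketched below (text_type is a hypothetical name); in database.py the mapping entry could then simply use str.

```python
try:
    import numpy as np

    # Old NumPy versions expose the alias; newer ones raise AttributeError,
    # in which case getattr falls back to the built-in str.
    text_type = getattr(np, "unicode", str)
except ImportError:
    text_type = str
```

Since the alias always pointed at str, replacing np.unicode with str directly should behave the same on every NumPy version.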

Remove .idea folder from project

I can't open the project in my PyCharm because there are different settings. It would be nice if you could remove this folder and add it to the .gitignore

Memory problem with update

Machines with low memory (<=8GB) already have problems with

import pyuniprot
pyuniprot.update(taxids=(9606,10090,10116))

Perhaps caching should be avoided, and/or lxml (which causes a memory leak) has to be replaced with a Python library.

AttributeError: 'DbManager' object has no attribute 'session'

Hi,

I'm super new to programming so I believe I'm missing something VERY obvious, so I apologize in advance for this question. In case these are relevant, I'm using a Mac and the PyCharm IDE. I installed PyUniProt via the project interpreter package install. I don't want to bother downloading MySQL/MariaDB, so I plan to use SQLite. So I've put in import pyuniprot and hoped to just proceed to update (although I have also tried with pyuniprot.set_mysql_connection and gotten the same error).
When I put in pyuniprot.update(force_download=True, taxids=[9606, 10090, 10116]) or just pyuniprot.update() I get the below error. Sorry to post with such an elementary question, but there must be something I'm missing. Thanks!!
[Screenshot of the error: Screen Shot 2019-06-06 at 6 04 13 PM]

Error on Update (Redux)

I got farther this time (since the resolution to #1) but I am running into a new problem:

In [1]: import pyuniprot

In [2]: pyuniprot.update(taxids=[9606])
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-89507300518a> in <module>()
----> 1 pyuniprot.update(taxids=[9606])

~/dev/pyuniprot/src/pyuniprot/manager/database.py in update(connection, urls, force_download, taxids)
    567 
    568     db = DbManager(connection)
--> 569     db.db_import_xml(urls, force_download, taxids)
    570     db.session.close()
    571 

~/dev/pyuniprot/src/pyuniprot/manager/database.py in db_import_xml(self, url, force_download, taxids)
    204         xml_gzipped_file_path = DbManager.download(url, force_download)
    205         self._create_tables()
--> 206         self.import_xml(xml_gzipped_file_path, taxids)
    207         self.session.close()
    208 

~/dev/pyuniprot/src/pyuniprot/manager/database.py in import_xml(self, xml_gzipped_file_path, taxids)
    214         start = False
    215 
--> 216         xml_file = gunzip_file(xml_gzipped_file_path)
    217 
    218         with xml_file.open('r') as fd:

~/dev/pyuniprot/src/pyuniprot/manager/database.py in gunzip_file(path)
     80         gzipped_file = gzip.open(str(gzipped_path), 'rb')
     81         extracted_file = extracted_path.open('wb')
---> 82         extracted_file.write(gzipped_file.read())
     83         gzipped_file.close()
     84         extracted_file.close()

OSError: [Errno 22] Invalid argument

Is there a way to do this with context managers? I would love to know if you could write the following code when you're using paths

with gzip.open(str(gzipped_path), 'rb') as gzipped_file, extracted_path.open('wb') as extracted_file:
   extracted_file.write(gzipped_file.read())

Btw, there might also be something already built into shutil for copying a file from one place to another.
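
To make the suggestion concrete: both gzip.open and Path.open return context managers, and shutil.copyfileobj copies between file objects in chunks rather than in a single read/write. The version below is a sketch; its two-argument signature differs from the gunzip_file(path) in the traceback and is an assumption, as is the idea that chunked copying avoids the Errno 22 above.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gzipped_path, extracted_path):
    """Decompress `gzipped_path` into `extracted_path` and return the latter."""
    gzipped_path, extracted_path = Path(gzipped_path), Path(extracted_path)
    # Both handles are closed automatically, even if the copy fails midway.
    with gzip.open(str(gzipped_path), "rb") as src, extracted_path.open("wb") as dst:
        # Copies in fixed-size chunks instead of one huge read()/write().
        shutil.copyfileobj(src, dst)
    return extracted_path
```

Copying in chunks also keeps memory use flat, whereas gzipped_file.read() loads the entire decompressed file at once.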

pip installs broken version of code

On Windows 10 with Python 3.8.5 I've set up a MySQL database and can successfully connect to it from command prompt

> mysql --user=gszep  --password=station --host=localhost pyuniprot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 78
Server version: 8.0.25 MySQL Community Server - GPL

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

I then successfully connected it to the Python library:

>pyuniprot mysql
server name/ IP address database is hosted [localhost]:
MySQL/MariaDB user [pyuniprot_user]: gszep
MySQL/MariaDB password [pyuniprot_passwd]: station
database name [pyuniprot]:
character set [utf8]:
Connection was successful

However, when I attempt to update the database with a small virus:

>pyuniprot update --taxids 133704
WARNING: Update is very time consuming and can take several
hours depending which organisms you are importing!

bin was anderes
346934it [00:11, 30587.45it/s]
Traceback (most recent call last):
  File "c:\programdata\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\programdata\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\Scripts\pyuniprot.exe\__main__.py", line 7, in <module>
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\click\core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\click\core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\click\core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\click\core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\pyuniprot\cli.py", line 72, in update
    database.update(taxids=taxids, connection=conn, force_download=force_download, silent=silent)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\pyuniprot\manager\database.py", line 836, in update
    db.db_import_xml(urls, force_download, taxids, silent)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\pyuniprot\manager\database.py", line 160, in db_import_xml
    self.import_xml(xml_gzipped_file_path, taxids, silent)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\pyuniprot\manager\database.py", line 233, in import_xml
    self.insert_entries(entry_xml, taxids)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\pyuniprot\manager\database.py", line 258, in insert_entries
    self.insert_entry(entry, taxids)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\pyuniprot\manager\database.py", line 280, in insert_entry
    taxid = self.get_taxid(entry)
  File "C:\Users\gszep\AppData\Roaming\Python\Python38\site-packages\pyuniprot\manager\database.py", line 628, in get_taxid
    return int(entry.find(query).get('id'))
AttributeError: 'NoneType' object has no attribute 'get'

On Ubuntu 18.04 with Python 3.6 we get the same error:

Import 173479167 lines:   0%|                         | 346934/173479167
[00:01<12:09, 237194.02it/s]

Traceback (most recent call last):
  File "/home/gszep/.local/bin/pyuniprot", line 8, in <module>
    sys.exit(main())
  File "/home/gszep/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/gszep/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/gszep/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/gszep/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/gszep/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/gszep/.local/lib/python3.6/site-packages/pyuniprot/cli.py", line 72, in update
    database.update(taxids=taxids, connection=conn, force_download=force_download, silent=silent)
  File "/home/gszep/.local/lib/python3.6/site-packages/pyuniprot/manager/database.py", line 836, in update
    db.db_import_xml(urls, force_download, taxids, silent)
  File "/home/gszep/.local/lib/python3.6/site-packages/pyuniprot/manager/database.py", line 160, in db_import_xml
    self.import_xml(xml_gzipped_file_path, taxids, silent)
  File "/home/gszep/.local/lib/python3.6/site-packages/pyuniprot/manager/database.py", line 233, in import_xml
    self.insert_entries(entry_xml, taxids)
  File "/home/gszep/.local/lib/python3.6/site-packages/pyuniprot/manager/database.py", line 258, in insert_entries
    self.insert_entry(entry, taxids)
  File "/home/gszep/.local/lib/python3.6/site-packages/pyuniprot/manager/database.py", line 280, in insert_entry
    taxid = self.get_taxid(entry)
  File "/home/gszep/.local/lib/python3.6/site-packages/pyuniprot/manager/database.py", line 628, in get_taxid
    return int(entry.find(query).get('id'))
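
The failing line chains .get('id') onto entry.find(query), and find returns None when nothing matches. A defensive variant is sketched here; the actual query used by pyuniprot is not shown in the traceback, so it is passed in as a parameter, and get_taxid_or_none is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def get_taxid_or_none(entry, query):
    """Return the integer 'id' of the element matched by `query`, or None."""
    node = entry.find(query)
    # find() returns None when the entry has no matching element.
    if node is None or node.get("id") is None:
        return None
    return int(node.get("id"))
```

The caller could then skip or log entries without a taxonomy reference. This only hides the symptom; why the element is missing (for instance a changed XML layout or namespace) would still need checking.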

Fix documentation syntax

:param str,tuple name: UniProt entry name(s)
:param str,tuple recommended_full_name: recommended full protein name(s)
:param str,tuple recommended_short_name: recommended short protein name(s)
:param str,tuple tissue_in_reference: tissue mentioned in reference
:param str,tuple subcellular_location: subcellular location(s)
:param str,tuple keyword: keyword
:param str,tuple pmid: PubMed identifier
:param str,tuple tissue_specificity: tissue specificities
:param str,tuple disease_comment: disease_comments
:param str,tuple alternative_name:
:param str,tuple db_reference: cross reference identifier
:param str,tuple ec_number: enzyme classification number, e.g. 1.1.1.1
:param str,tuple function_: description of protein functions
:param str,tuple feature_type: feature types
:param str,tuple organism_host: organism hosts
:param str,tuple accession: UniProt accession number
:param str,tuple disease_name: disease name
:param str,tuple gene_name: gene name
:param str,tuple taxid: NCBI taxonomy identifier
:param int,tuple limit: maximum number of results
:param str,tuple sequence: Amino acid sequence

Each of these instances, where the parameter can be one value or several, should be written as

:param str or tuple[str] name: UniProt entry name(s) , etc.

If we ever get awesome and write Python 3.6+ only code, we can use real type annotations too :)
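
As an illustration of what that could look like, here is a small sketch with the typing module. StrOrTuple and as_tuple are hypothetical names, not part of the current query API.

```python
from typing import Optional, Tuple, Union

# A parameter that accepts either one string or a tuple of strings.
StrOrTuple = Union[str, Tuple[str, ...]]


def as_tuple(value: Optional[StrOrTuple]) -> Tuple[str, ...]:
    """Normalize a str-or-tuple argument into a tuple of strings."""
    if value is None:
        return ()
    if isinstance(value, str):
        return (value,)
    return tuple(value)
```

A query function could then declare name: Optional[StrOrTuple] = None and call as_tuple(name) internally, which keeps the docstring and the signature in agreement.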

Use flask-security for logins

I saw that you completely rolled your own login system - it would be much easier and more consistent to use Flask-Security, which takes care of everything (session management, data models, and templates) for you.

Let's pair program this?

AttributeError: 'NoneType' object has no attribute 'get'

Hello,

I am running into an error while trying to retrieve human protein_IDs. I installed the package from the master branch as suggested in Issue #26. Here is the read out from my code:

pyuniprot.set_connection("sqlite:///" + db_name)
pyuniprot.update(force_download = True, taxids=[9606])

Database uniprot.db formed.
Import 174181022 lines: 0%| | 345071/174181022 [00:02<24:58, 116040.97it/s]


AttributeError Traceback (most recent call last)
/var/folders/hw/0w064hss2mv96tfz78bybyrc0000gn/T/ipykernel_37297/3370700064.py in
15 # Pull Uniprot data into created database and filter for gene names
16 pyuniprot.set_connection("sqlite:///" + db_name)
---> 17 pyuniprot.update(force_download = True, taxids=[9606])

~/opt/anaconda3/lib/python3.8/site-packages/pyuniprot/manager/database.py in update(connection, urls, force_download, taxids, silent)
834 config.write(config_file)
835 log.info('create configuration file {}'.format(cfp))
--> 836 else:
837 config.read(cfp)
838 config.set('database', 'sqlalchemy_connection_string', connection)

~/opt/anaconda3/lib/python3.8/site-packages/pyuniprot/manager/database.py in db_import_xml(self, url, force_download, taxids, silent)
158 self._drop_tables()
159 xml_file_path, version_file_path = self.download_and_extract(url, force_download)
--> 160 self._create_tables()
161 self.import_version(version_file_path)
162 self.import_xml(xml_file_path, taxids, silent)

~/opt/anaconda3/lib/python3.8/site-packages/pyuniprot/manager/database.py in import_xml(self, xml_gzipped_file_path, taxids, silent)
231
232 # @Profile
--> 233 def update_entry_dict(self, entry, entry_dict, taxid):
234 """
235

~/opt/anaconda3/lib/python3.8/site-packages/pyuniprot/manager/database.py in insert_entries(failed resolving arguments)
256 ec_numbers = self.get_ec_numbers(entry)
257 alternative_full_names = self.get_alternative_full_names(entry)
--> 258 alternative_short_names = self.get_alternative_short_names(entry)
259 disease_comments = self.get_disease_comments(entry)
260 tissue_specificities = self.get_tissue_specificities(entry)

~/opt/anaconda3/lib/python3.8/site-packages/pyuniprot/manager/database.py in insert_entry(self, entry, taxids)
278 keywords=keywords,
279 ec_numbers=ec_numbers,
--> 280 alternative_full_names=alternative_full_names,
281 alternative_short_names=alternative_short_names,
282 disease_comments=disease_comments,

~/opt/anaconda3/lib/python3.8/site-packages/pyuniprot/manager/database.py in get_taxid(cls, entry)
626 pmid_dict = dict(citation.attrib)
627 if not re.search('^\d+$', pmid_dict['volume']):
--> 628 pmid_dict['volume'] = -1
629
630 del pmid_dict['type'] # not needed because already filtered for PubMed

AttributeError: 'NoneType' object has no attribute 'get'

I noticed that there is no .xml.gz file anywhere in the package site, in case that might be the issue.

Side note - I am only interested in getting ~150 protein IDs so is there a way to update the database with only gene names?

Progress bar for update

Since the update function takes so long, it would be nice to give some kind of feedback like on every 5% of the file that is completed
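
A dependency-free way to get that kind of feedback is to wrap the line iterator and call a reporting function each time another 5% has been consumed. This is a sketch; with_progress and its parameters are hypothetical, and the total line count has to be known beforehand.

```python
def with_progress(lines, total, report, step_percent=5):
    """Yield items from `lines`, calling report(percent) at each step."""
    next_mark = step_percent
    for index, line in enumerate(lines, start=1):
        yield line
        percent = index * 100 // total if total else 100
        # Report every mark passed since the last item (may be several).
        while percent >= next_mark and next_mark <= 100:
            report(next_mark)
            next_mark += step_percent
```

Usage would be along the lines of: for line in with_progress(fd, number_of_lines, lambda p: print(p, "% done")). A library such as tqdm gives a nicer display if an extra dependency is acceptable.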

Memory leak in update function

import pyuniprot
pyuniprot.update(taxids=[9606,10090,10116]) # human, mouse, rat

After ~42k entries in the database, the update process consumes ~12 GB of memory. I assume the problem is in lxml; I found several articles describing similar problems with big XML files. However, here I avoid loading the whole document with iterparse (I tested that: it starts directly at 5 GB of memory consumption and then constantly increases). If the problem isn't solved, it does not seem feasible to load the whole of UniProt.
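
For comparison, a commonly used pattern for bounded-memory parsing is to clear already-processed elements while iterating. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml (a deliberate swap), and iter_entries is a hypothetical name; whether it resolves the leak described here is untested.

```python
import io
import xml.etree.ElementTree as ET


def iter_entries(source, tag="entry"):
    """Yield each <entry> element from `source`, freeing it afterwards."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        # Compare the local tag name so namespaced documents also match.
        if event == "end" and elem.tag.rsplit("}", 1)[-1] == tag:
            yield elem
            # Drop finished children so the tree does not grow with the file.
            root.clear()
```

Each yielded element must be fully processed before the next one is requested, because it is emptied as soon as the generator resumes.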

PYUNIPROT_DIR and PYUNIPROT_DATA_DIR should be more easily configurable

These are hard-coded to use the users home directory, but there are cases where the quota on the home directory is limited.

PYUNIPROT_DIR and PYUNIPROT_DATA_DIR should be configurable as settings and/or as an environment variable.

eg pyuniprot.set_data_dir("/scratch/data/")

or

export PYUNIPROT_DATA_DIR=/scratch/data

captured in Python (eg in constants.py) by:

PYUNIPROT_DATA_DIR = os.environ.get('PYUNIPROT_DATA_DIR', os.path.join(PYUNIPROT_DIR, 'data'))


It's worth noting that you can override these variables by directly setting pyuniprot.manager.database.PYUNIPROT_DIR = 'some/path' and pyuniprot.manager.database.PYUNIPROT_DATA_DIR = 'some/path/data', but this feels like a workaround.
