shawhahnlab / vquest


Automate IMGT V-QUEST usage on imgt.org

License: GNU Affero General Public License v3.0

Python 100.00%

bioinformatics imgt immunology


vquest's Issues

Command-line options from set of integers not parsed correctly

VQUEST options that come from a set of possible integer values need to be handled as such when parsing the command-line arguments, not left as strings. (For example --xv_outputtype 3 should work as the possible values are [0, 1, 2, 3], but argparse defaults to a string which doesn't match the integers in the list.)
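A minimal sketch of one way to handle this, assuming the allowed values for each option are available as a list when the parser is built. `OPTION_CHOICES` and `build_parser` are hypothetical names used for illustration, not the package's own code; only the option name and its values come from the example above.

```python
import argparse

# Hypothetical stand-in for one entry in the options definition; only the
# option name and its allowed values are taken from the issue text.
OPTION_CHOICES = {"xv_outputtype": [0, 1, 2, 3]}


def build_parser():
    parser = argparse.ArgumentParser(prog="vquest")
    for name, choices in OPTION_CHOICES.items():
        # Convert with the type of the first allowed value (int here) so the
        # parsed value is compared against the integer choices, not a string.
        parser.add_argument("--" + name, type=type(choices[0]), choices=choices)
    return parser


args = build_parser().parse_args(["--xv_outputtype", "3"])
print(args.xv_outputtype, type(args.xv_outputtype).__name__)
```

With `type` set, argparse converts the argument before checking it against `choices`, so a value outside the list is still rejected with the usual usage error.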

Command-line options don't match case of actual V-QUEST options

The V-QUEST-provided command-line arguments as defined here (see vquest/data/options.yml) have many all-lowercase entries, whereas the V-QUEST options have mixed case. It does appear that they're case sensitive (at least in the latest version, 3.5.25) and they should match up to avoid conflicting entries in the defaults anyway.

Command line help for main options is confusing

"The options in this section must be specified while the others are optional." This isn't quite right in the context of the command line options since there are defaults applied for some of these. Only a few have to be specified. The description in data/options.yml should be made consistent with data/defaults.yml.

name 'layer_configs' is not defined

This is the code:

from vquest import *
config = layer_configs(DEFAULTS, {"species": "rhesus-monkey", "receptorOrLocusType": "IG", "fileSequences": "test.fastq"})
result = vquest(config)
result.keys()

But I get this error:

name 'layer_configs' is not defined

???
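Whether this is what happens in the installed version is not confirmed here, but one common reason a star import leaves a name undefined is that the package limits its exports with `__all__` (or does not define the name at the top level at all). The sketch below reproduces that general Python behavior with a throwaway module; `demo_pkg` is a hypothetical name and its contents are placeholders, not the real vquest package.

```python
import sys
import types

# Build a throwaway module (hypothetical name "demo_pkg") that defines two
# functions but only exports one of them through __all__.
demo = types.ModuleType("demo_pkg")
exec(
    "__all__ = ['vquest']\n"
    "def vquest(config): return config\n"
    "def layer_configs(*configs): return {}\n",
    demo.__dict__,
)
sys.modules["demo_pkg"] = demo

# A star import only brings in the names listed in __all__.
namespace = {}
exec("from demo_pkg import *", namespace)
star_has_vquest = "vquest" in namespace
star_has_layer_configs = "layer_configs" in namespace

# An explicit import does not depend on __all__.
exec("from demo_pkg import layer_configs", namespace)
explicit_has_layer_configs = "layer_configs" in namespace

print(star_has_vquest, star_has_layer_configs, explicit_has_layer_configs)
```

If that is the cause, importing the name explicitly from the module that defines it would avoid the error without relying on what the star import exposes.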

Package install should not duplicate data directory in sys.prefix

I'm currently using data_files in setup.py to try to include the configuration options, defaults, and conda environment inside the package install, but I think package_data is more applicable, and even then I'm not sure I'm using it right.

https://docs.python.org/3/distutils/setupscript.html#installing-package-data

Often, additional files need to be installed into a package. These files are often data that’s closely related to the package’s implementation, or text files containing documentation that might be of interest to programmers using the package. These files are called package data.
Package data can be added to packages using the package_data keyword argument to the setup() function. The value must be a mapping from package name to a list of relative path names that should be copied into the package.

https://docs.python.org/3/distutils/setupscript.html#installing-additional-files

The data_files option can be used to specify additional files needed by the module distribution: configuration files, message catalogs, data files, anything which doesn’t fit in the previous categories.
[...]
The directory should be a relative path. It is interpreted relative to the installation prefix (Python’s sys.prefix for system installations; site.USER_BASE for user installations). Distutils allows directory to be an absolute installation path, but this is discouraged since it is incompatible with the wheel packaging format. No directory information from files is used to determine the final location of the installed file; only the name of the file is used.

As it stands I end up with both $CONDA_PREFIX/data and $CONDA_PREFIX/lib/python3.9/site-packages/vquest/data (with my conda-based installs). The former is from the sys.prefix handling described above, but I'm not sure why the latter copy (the one that is actually used by the package) ends up installed.

From https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html:

[include_package_data=True] tells setuptools to install any data files it finds in your packages. The data files must be specified via the distutils’ MANIFEST.in file.

I do have include_package_data=True, but don't have a MANIFEST.in. Is include_package_data doing nothing, and the data directory is automatically included for some other reason? This could all use some cleanup.
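One possible cleanup, sketched under assumptions: declare the files with package_data only and drop data_files entirely. The `data/*` glob is a guess at what lives in vquest/data, and `SETUP_KWARGS` is a hypothetical name used so the fragment can be read on its own.

```python
# Sketch of the keyword arguments a setup.py could pass to setuptools.setup().
# The glob pattern is an assumption about the contents of vquest/data.
SETUP_KWARGS = dict(
    name="vquest",
    packages=["vquest"],
    # Relative to the package directory; installed alongside the code in
    # site-packages/vquest/data.
    package_data={"vquest": ["data/*"]},
    # No data_files entry: per the docs quoted above, that option installs
    # relative to sys.prefix, which is where the extra copy comes from.
)

# In setup.py itself this would end with:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```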

Catch-22 with dependencies in setup.py

I import the package in setup.py to check the version, but that imports the dependencies... which maybe haven't been installed yet. From a quick check it may work to move what's in __init__.py elsewhere, make a separate module for __version__, and just import that.
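Another option that avoids importing anything from the package during setup is to read the version out of the file as text. This is a sketch only; `read_version` and the `vquest/version.py` path in the usage note are hypothetical.

```python
import re
from pathlib import Path


def read_version(path):
    """Return the __version__ string assigned in the file at `path`."""
    text = Path(path).read_text(encoding="utf-8")
    # Match a line such as: __version__ = "1.2.3"
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("no __version__ assignment found in %s" % path)
    return match.group(1)
```

In setup.py this could be used as `version=read_version("vquest/version.py")`, so the dependencies are never imported before they are installed.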

HTTPS is needed now

The IMGT server didn't use to (I think?) support HTTPS for V-QUEST requests, but it now requires it. Otherwise a stub HTML file is given in the POST response.

vquest.util.VquestError: Please provide your FASTA formatted sequences in the text area

I got an error when running vquest with the following command:

vquest --species rhesus-monkey --receptorOrLocusType IG --fileSequences test.fasta
Traceback (most recent call last):
  File "/home/user/anaconda3/bin/vquest", line 8, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/lib/python3.7/site-packages/vquest/__main__.py", line 51, in main
    output = request.vquest(config_full)
  File "/home/user/anaconda3/lib/python3.7/site-packages/vquest/request.py", line 68, in vquest
    raise VquestError("; ".join(errors), errors)
vquest.util.VquestError: Please provide your FASTA formatted sequences in the text area

I'm sure the file I entered is a FASTA file; this is my test file.

Output directory should be configurable

Right now the output directory is implicitly the current working directory. It should be a command-line option.
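A rough sketch of what such an option could look like. The flag name `--outdir` and the helper `output_path` are hypothetical; defaulting to the current directory would keep today's behavior unchanged.

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser(prog="vquest")
# "--outdir" is a hypothetical flag name; "." preserves the current behavior.
parser.add_argument(
    "--outdir",
    type=Path,
    default=Path("."),
    help="directory to write output files to (default: current directory)",
)


def output_path(outdir, filename):
    """Create outdir if it does not exist and return the path for filename."""
    outdir.mkdir(parents=True, exist_ok=True)
    return outdir / filename
```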

Empty config file should be handled

Currently if an empty config file is used it results in a cryptic AttributeError: 'NoneType' object has no attribute 'items' from __setup_config, but this should be handled better.
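Assuming the config file is YAML loaded with something like `yaml.safe_load`, an empty file yields `None` rather than an empty mapping, which would explain the error. A small guard along these lines could normalize the result before it is used; `normalize_config` is a hypothetical helper, not an existing function in the package.

```python
def normalize_config(loaded, source="config file"):
    """Return a dict for whatever the YAML loader produced.

    An empty file loads as None, so treat that as "no options given".
    Anything else that is not a mapping gets a readable error.
    """
    if loaded is None:
        return {}
    if not isinstance(loaded, dict):
        raise ValueError(
            "%s must contain a mapping of option names to values, got %s"
            % (source, type(loaded).__name__)
        )
    return loaded
```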

Automated tests should include command-line program usage

Right now it's just unit tests for the Python code but we can do a basic test of the program with something like python -m vquest -h.
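As a sketch of what that basic test could look like, the helpers below run `python -m <module> -h` in a subprocess and check that it exits cleanly and prints usage text. `run_module_help` and `check_help` are hypothetical names; a test case would call `check_help("vquest")`.

```python
import subprocess
import sys


def run_module_help(module):
    """Run `python -m <module> -h` and return the completed process."""
    return subprocess.run(
        [sys.executable, "-m", module, "-h"],
        capture_output=True,
        text=True,
        timeout=60,
    )


def check_help(module):
    """Assert that the module's -h output exits with 0 and shows usage text."""
    result = run_module_help(module)
    assert result.returncode == 0, result.stderr
    assert "usage" in result.stdout.lower()
```

Using `sys.executable` keeps the subprocess on the same interpreter (and environment) as the test run.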

Download in a zip archive (xv_outputtype=1) should be supported

Currently only xv_outputtype=3 ("Download AIRR formatted results") is supported, but option 1 includes much more information across a set of TSV files.
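A minimal sketch of unpacking such an archive in memory, assuming the response body is the zip file's bytes and the members are UTF-8 text. `unpack_zip` is a hypothetical helper, and the file names inside a real V-QUEST archive are not assumed here.

```python
import io
import zipfile


def unpack_zip(data):
    """Map each file name in a zip archive (given as bytes) to its text."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            name: archive.read(name).decode("utf-8")  # encoding is an assumption
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }
```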

Link to HighV-QUEST

IMGT also has HighV-QUEST, a high-throughput interface that requires user registration for use. That should at least get a mention here.

Un-combined output should be available

Currently for submissions with more than 50 sequences the output files are modified to include all results as though they went through in a single submission. Disabling this behavior should be an option, so that separate files are written for each batch.
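The batching and the proposed switch can be sketched as follows, using the 50-sequence limit mentioned above. `batches`, `collect`, and the `combine` flag are hypothetical names for illustration, and each batch result is simplified to a list of rows.

```python
def batches(items, size=50):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def collect(batch_results, combine=True):
    """Merge per-batch results into one list, or keep them separate."""
    if combine:
        return [row for batch in batch_results for row in batch]
    return [list(batch) for batch in batch_results]
```

With `combine=False` the caller gets one list per batch and can write each to its own file.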

Errors received via HTML should be handled

If something's wrong with the request, VQUEST will return HTML instead of a zip file, and div.form_error will describe the problem. This should be handled in vquest. Using HTML from requests-html this could be something roughly like:

    from requests_html import HTML

    # `response` is the response object from the V-QUEST POST request
    if "text/html" in response.headers.get("Content-Type", ""):
        html = HTML(html=response.content)
        errors = [div.text for div in html.find("div.form_error")]
        if errors:
            raise ValueError("VQUEST errors: %s" % "; ".join(errors))