shawhahnlab / vquest
Automate IMGT V-QUEST usage on imgt.org
License: GNU Affero General Public License v3.0
VQUEST options that come from a set of possible integer values need to be handled as such when parsing the command-line arguments, not left as strings. (For example, --xv_outputtype 3 should work since the possible values are [0, 1, 2, 3], but argparse defaults to a string, which doesn't match the integers in the list.)
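One way this could be handled, as a minimal sketch: give argparse an explicit type so the value is converted before it is compared against the choices. The parser below is a stand-alone illustration with a single hard-coded option, not the package's actual parser, which builds its options from vquest/data/options.yml.

```python
import argparse

# Stand-alone illustration; the real option list lives in vquest/data/options.yml.
parser = argparse.ArgumentParser(prog="vquest")
parser.add_argument(
    "--xv_outputtype",
    type=int,              # convert the raw string "3" to the integer 3 first
    choices=[0, 1, 2, 3],  # the converted value is then checked against these
)

args = parser.parse_args(["--xv_outputtype", "3"])
print(args.xv_outputtype)
```

argparse applies the type conversion before the choices check, so the integer list matches without any change to the choices themselves.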
The V-QUEST-provided command-line arguments as defined here (see vquest/data/options.yml) have many all-lowercase entries, whereas the V-QUEST options have mixed case. It does appear that they're case sensitive (at least in the latest version, 3.5.25), and they should match up to avoid conflicting entries in the defaults anyway.
"The options in this section must be specified while the others are optional." This isn't quite right in the context of the command-line options, since there are defaults applied for some of these. Only a few have to be specified. The description in data/options.yml should be made consistent with data/defaults.yml.
This is the code:
from vquest import *
config = layer_configs(DEFAULTS, {"species": "rhesus-monkey", "receptorOrLocusType": "IG", "fileSequences": "test.fastq"})
result = vquest(config)
result.keys()
but I get this error:
name 'layer_configs' is not defined
???
I'm currently using data_files in setup.py to try to include the configuration options, defaults, and conda environment inside the package install, but I think package_data is more applicable, and even then I'm not sure I'm using it right.
https://docs.python.org/3/distutils/setupscript.html#installing-package-data
Often, additional files need to be installed into a package. These files are often data that’s closely related to the package’s implementation, or text files containing documentation that might be of interest to programmers using the package. These files are called package data.
Package data can be added to packages using the package_data keyword argument to the setup() function. The value must be a mapping from package name to a list of relative path names that should be copied into the package.
https://docs.python.org/3/distutils/setupscript.html#installing-additional-files
The data_files option can be used to specify additional files needed by the module distribution: configuration files, message catalogs, data files, anything which doesn’t fit in the previous categories.
[...]
The directory should be a relative path. It is interpreted relative to the installation prefix (Python’s sys.prefix for system installations; site.USER_BASE for user installations). Distutils allows directory to be an absolute installation path, but this is discouraged since it is incompatible with the wheel packaging format. No directory information from files is used to determine the final location of the installed file; only the name of the file is used.
As it stands I end up with both $CONDA_PREFIX/data and $CONDA_PREFIX/lib/python3.9/site-packages/vquest/data (with my conda-based installs). The former is from the sys.prefix handling described above, but I'm not sure why the latter copy (the one that is actually used by the package) ends up installed.
From https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html:
[include_package_data=True] tells setuptools to install any data files it finds in your packages. The data files must be specified via the distutils’ MANIFEST.in file.
I do have include_package_data=True, but don't have a MANIFEST.in. Is include_package_data doing nothing, and the data directory is automatically included for some other reason? This could all use some cleanup.
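For comparison, a package_data-based setup could look roughly like the sketch below. The package name and the file pattern are assumptions for illustration, and the dictionary is only built here, not passed to setuptools, so nothing is installed by running it.

```python
# Keyword arguments that a setup.py could pass on as setuptools.setup(**SETUP_KWARGS).
# The package name and file pattern are assumptions for illustration.
SETUP_KWARGS = {
    "name": "vquest",
    "packages": ["vquest"],
    # Patterns are relative to the package directory, so matching files would
    # land inside the installed package (site-packages/vquest/data/) rather
    # than under sys.prefix as with data_files.
    "package_data": {"vquest": ["data/*.yml"]},
}

print(SETUP_KWARGS["package_data"])
```

With this form the mapping key is the package name and each value is a list of relative path patterns, as described in the package_data documentation quoted above.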
I import the package in setup.py to check the version, but that imports the dependencies... which maybe haven't been installed yet. From a quick check it may work to move what's in __init__.py elsewhere, make a separate module for __version__, and just import that.
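A different approach from the one proposed above, sketched under assumptions: read the version string out of the file as text, so setup.py never imports the package or its dependencies. The helper name read_version and the path in the usage note are hypothetical.

```python
import re
from pathlib import Path


def read_version(path):
    """Return the __version__ string assigned in the file at `path`, without importing it."""
    text = Path(path).read_text(encoding="utf-8")
    # Match a line such as: __version__ = "1.2.3"
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"no __version__ assignment found in {path}")
    return match.group(1)
```

In setup.py this might be called as read_version("vquest/version.py"), assuming the version assignment were moved into such a file.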
The IMGT server didn't use to support HTTPS for V-QUEST requests (I think?) but now requires it. With a plain HTTP request, a stub HTML file is given in the POST response instead.
I got an error when running vquest with the following command:
vquest --species rhesus-monkey --receptorOrLocusType IG --fileSequences test.fasta
Traceback (most recent call last):
  File "/home/user/anaconda3/bin/vquest", line 8, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/lib/python3.7/site-packages/vquest/__main__.py", line 51, in main
    output = request.vquest(config_full)
  File "/home/user/anaconda3/lib/python3.7/site-packages/vquest/request.py", line 68, in vquest
    raise VquestError("; ".join(errors), errors)
vquest.util.VquestError: Please provide your FASTA formatted sequences in the text area
I'm sure the file I entered is a FASTA file; this is my test file.
Right now the output directory is implicitly the current working directory. It should be a command-line option.
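A minimal sketch of what such an option could look like with argparse. The option name --outdir and the output file name are hypothetical; the default keeps the current behavior of writing to the working directory.

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser(prog="vquest")
# Hypothetical option name. The default of "." preserves the existing
# behavior of writing into the current working directory.
parser.add_argument("--outdir", type=Path, default=Path("."))

args = parser.parse_args(["--outdir", "results"])
# Illustrative file name only; output paths would be built from args.outdir.
output_path = args.outdir / "vquest_airr.tsv"
print(output_path)
```

The directory would still need to be created before writing, for example with args.outdir.mkdir(parents=True, exist_ok=True).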
Currently if an empty config file is used it results in a cryptic AttributeError: 'NoneType' object has no attribute 'items' from __setup_config, but this should be handled better.
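The error message is consistent with an empty file being parsed to None (yaml.safe_load returns None for an empty document) and then treated as a dictionary. A sketch of one possible guard follows; merge_configs is a hypothetical helper that does a shallow merge, not the package's own config-layering code.

```python
def merge_configs(*configs):
    """Layer config dicts left to right, treating None (an empty file) as {}."""
    merged = {}
    for config in configs:
        if config is None:
            # An empty config file parses to None; skip it instead of failing.
            continue
        if not isinstance(config, dict):
            raise TypeError(
                f"config must be a mapping, got {type(config).__name__}"
            )
        merged.update(config)  # later configs override earlier ones
    return merged


print(merge_configs({"species": "rhesus-monkey"}, None, {"receptorOrLocusType": "IG"}))
```

Raising a TypeError with a clear message for other non-mapping content is a design choice here; a config file holding a bare list or string would otherwise fail in a similarly cryptic way.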
Right now there are only unit tests for the Python code, but we can do a basic test of the program with something like python -m vquest -h.
Currently only xv_outputtype=3 ("Download AIRR formatted results") is supported, but option 1 includes much more information across a set of TSV files.
IMGT also has HighV-QUEST, a high-throughput interface that requires user registration for use. That should at least get a mention here.
Currently for submissions with more than 50 sequences the output files are modified to include all results as though they went through in a single submission. Disabling this behavior should be an option, so that separate files are written for each batch.
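The batching itself can be sketched with a small generator. This is an illustration only: the helper name chunk and the placeholder sequence IDs are hypothetical, and the batch size of 50 is taken from the description above.

```python
def chunk(items, size=50):
    """Yield consecutive slices of `items` holding at most `size` entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 120 placeholder sequence IDs split into batches of 50, 50, and 20.
sequence_ids = [f"seq{i}" for i in range(120)]
batches = list(chunk(sequence_ids))
print([len(batch) for batch in batches])  # [50, 50, 20]
```

With an option to disable merging, each batch's results could be written to its own numbered file instead of being combined.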
If something's wrong with the request, VQUEST will return HTML instead of a zip file, and div.form_error will describe the problem. This should be handled in vquest. Using HTML from requests-html, this could be something roughly like:
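from requests_html import HTML

# `response` is the requests response object from the V-QUEST POST
if "text/html" in response.headers["Content-Type"]:
    html = HTML(html=response.content)
    # collect the text of every error div on the returned page
    errors = [div.text for div in html.find("div.form_error")]
    if errors:
        raise ValueError("VQUEST errors: %s" % "; ".join(errors))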