nmdp-bioinformatics / py-ard

HLA ARD Reduction in Python

Home Page: https://py-ard.org/

License: GNU Lesser General Public License v3.0

Makefile 1.50% Python 74.66% Gherkin 23.60% Dockerfile 0.23%
hla

py-ard's Introduction

py-ard

Swiss army knife of HLA Nomenclature

Note:

  • Python 3.8 is no longer supported by the latest py-ard versions because the latest Pandas library no longer supports 3.8. If you are using Python 3.8, please use py-ard==1.2.1.

  • ping mode is the default. In ping mode, alleles that do not have a G group are reduced to their corresponding P group.

  • Release 1.1.1 has extensive Serology-related updates and affects Serology-related data. Please rebuild the cache database if you get a missing Serology error.

   pyard-import --re-install

Alternatively, pin py-ard<=1.1.1 in your dependency requirements.


py-ard is ARD reduction for HLA in Python

Human leukocyte antigen (HLA) genes encode cell surface proteins that are important for immune regulation. Exons encoding the Antigen Recognition Domain (ARD) are the most polymorphic region of HLA genes and are important for donor/recipient HLA matching. The history of allele typing methods has played a major role in determining the resolution and ambiguity of reported HLA values. Although HLA nomenclature has not always conformed to the same standard, it is now defined by the WHO Nomenclature Committee for Factors of the HLA System. py-ard is aware of the variation in historical resolutions and groupings and can translate from one representation to another based on the alleles published quarterly by IPD-IMGT/HLA.

Table of Contents

  1. Installation
  2. Using py-ard
  3. Command Line Tools
  4. py-ard REST Webservice
  5. Docker Deployment

Installation

py-ard works with Python 3.9 and higher. On Python 3.8, use py-ard==1.2.1.

Install from PyPi

pip install py-ard

Note: With py-ard version 1.0.0 and higher, the redux API has changed. If your use requires the older API, please install with pip install py-ard==0.9.2

Install With Homebrew

On macOS, py-ard can be installed using the Homebrew package manager. This is handy for using the command-line tools without having to create virtual environments.

The first time, you need to tap the nmdp-bioinformatics tap.

brew tap nmdp-bioinformatics/tap

Install py-ard

brew install py-ard

Homebrew will notify you as new versions of py-ard are released.

Install from source

Check out the py-ard source code.

git clone https://github.com/nmdp-bioinformatics/py-ard.git
cd py-ard

Create and activate a virtual environment, then install the py-ard dependencies.

make venv

source venv/bin/activate

make install

See our Contribution Guide for how to contribute to py-ard.

Using py-ard

Using py-ard from Python code

py-ard can be used in a program to reduce/expand HLA GL String representations. If py-ard encounters an invalid allele, it raises an exception instead of silently returning an empty result.

Initialize py-ard

Import and initialize the pyard package. By default, the latest version of the IPD-IMGT/HLA database is used.

import pyard

ard = pyard.init()

Initialize py-ard with a particular version of the IPD-IMGT/HLA database.

import pyard

ard = pyard.init('3510')

When processing a large number of typings, a cache of previously calculated reductions makes repeated typings reduce faster. The cache size of pre-computed reductions can be changed from the default of 1,000 with the cache_size argument. A larger cache increases the memory footprint but can significantly reduce processing time for large numbers of reductions.

import pyard

max_cache_size = 1_000_000
ard = pyard.init('3510', cache_size=max_cache_size)

By default, the IPD-IMGT/HLA data is stored locally in $TMPDIR/pyard. This may be removed when your computer restarts. You can specify a different, more permanent directory for the cached data.

import pyard

ard = pyard.init('3510', data_dir='/tmp/py-ard')

As MAC data changes frequently, you can refresh the MAC codes for the current IPD-IMGT/HLA database version.

ard.refresh_mac_codes()

You can check the current version of IPD-IMGT/HLA database.

ard.get_db_version()

Reduce Typings

Note: Prior to the 1.0.0 release of py-ard, there were separate redux and redux_gl methods on ard. They have been consolidated so that redux handles both GL Strings and individual alleles.

Reduce a single-locus HLA typing by passing the allele/MAC/XX code and the reduction method to the redux method.

allele = "A*01:01:01"

ard.redux(allele, 'G')
# >>> 'A*01:01:01G'

ard.redux(allele, 'lg')
# >>> 'A*01:01g'

ard.redux(allele, 'lgx')
# >>> 'A*01:01'

Reduce an ambiguous GL String

# Reduce GL String
#
ard.redux("A*01:01/A*01:01N+A*02:AB^B*07:02+B*07:AB", "G")
# 'B*07:02:01G+B*07:02:01G^A*01:01:01G+A*02:01:01G/A*02:02'
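GL String operators nest: `^` separates loci, `|` separates genotype alternatives, `+` separates the two chromosomes, `~` joins phased alleles on a haplotype and `/` lists ambiguous alleles. A minimal sketch of reducing a GL String by splitting on each operator in turn and reducing the individual alleles, assuming a hypothetical `G_TABLE` lookup in place of py-ard's database (py-ard's own output ordering may differ, as the example above shows):

```python
# GL String operators from loosest to tightest binding.
DELIMITERS = ["^", "|", "+", "~", "/"]

# Hypothetical allele -> G group table standing in for py-ard's database.
G_TABLE = {
    "A*01:01": "A*01:01:01G",
    "A*02:01": "A*02:01:01G",
    "B*07:02": "B*07:02:01G",
}


def reduce_allele(allele: str) -> str:
    """Reduce one allele; unknown alleles are returned unchanged in this sketch."""
    return G_TABLE.get(allele, allele)


def reduce_gl(gl: str, level: int = 0) -> str:
    """Split on each GL String operator in turn, then reduce the single alleles."""
    if level == len(DELIMITERS):
        return reduce_allele(gl)
    delim = DELIMITERS[level]
    parts = [reduce_gl(part, level + 1) for part in gl.split(delim)]
    if delim == "/":
        # an allele list is a set: drop duplicates the reduction may create
        parts = sorted(set(parts))
    return delim.join(parts)


print(reduce_gl("A*01:01/A*02:01+A*02:01^B*07:02+B*07:02"))
# A*01:01:01G/A*02:01:01G+A*02:01:01G^B*07:02:01G+B*07:02:01G
```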

You can also reduce serology based typings.

ard.redux('B14', 'lg')
# >>> 'B*14:01g/B*14:02g/B*14:03g/B*14:04g/B*14:05g/B*14:06g/B*14:08g/B*14:09g/B*14:10g/B*14:11g/B*14:12g/B*14:13g/B*14:14g/B*14:15g/B*14:16g/B*14:17g/B*14:18g/B*14:19g/B*14:20g/B*14:21g/B*14:22g/B*14:23g/B*14:24g/B*14:25g/B*14:26g/B*14:27g/B*14:28g/B*14:29g/B*14:30g/B*14:31g/B*14:32g/B*14:33g/B*14:34g/B*14:35g/B*14:36g/B*14:37g/B*14:38g/B*14:39g/B*14:40g/B*14:42g/B*14:43g/B*14:44g/B*14:45g/B*14:46g/B*14:47g/B*14:48g/B*14:49g/B*14:50g/B*14:51g/B*14:52g/B*14:53g/B*14:54g/B*14:55g/B*14:56g/B*14:57g/B*14:58g/B*14:59g/B*14:60g/B*14:62g/B*14:63g/B*14:65g/B*14:66g/B*14:68g/B*14:70Qg/B*14:71g/B*14:73g/B*14:74g/B*14:75g/B*14:77g/B*14:82g/B*14:83g/B*14:86g/B*14:87g/B*14:88g/B*14:90g/B*14:93g/B*14:94g/B*14:95g/B*14:96g/B*14:97g/B*14:99g/B*14:102g'

Valid Reduction Types

Reduction Type   Description
--------------   -------------------------------------------------------------
G                Reduce to G Group Level
P                Reduce to P Group Level
lg               Reduce to 2 field ARD level (append g)
lgx              Reduce to 2 field ARD level
W                Reduce/Expand to full field (4, 3, 2) WHO nomenclature level
exon             Reduce/Expand to 3 field level
U2               Reduce to 2 field unambiguous level
S                Reduce to Serological level
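In the examples above, the lg and lgx forms are the first two fields of the G group name, with lg appending a g. A simplified sketch of that pattern, assuming a hypothetical `G_GROUPS` table; py-ard's real lgx groups come from its own data (the lgx_group table) and ping-mode P-group handling is ignored here. Alleles outside any G group fall back to themselves, so lgx still yields two fields:

```python
# Hypothetical allele -> G group table; real data comes from IPD-IMGT/HLA.
G_GROUPS = {
    "A*01:01:01": "A*01:01:01G",
    "DRB1*14:05:01": "DRB1*14:05:01G",
}


def two_field(name: str) -> str:
    """Keep the locus and first two fields, dropping any G-group suffix."""
    locus, fields = name.split("*")
    fields = fields.rstrip("G")
    return locus + "*" + ":".join(fields.split(":")[:2])


def redux(allele: str, redux_type: str) -> str:
    group = G_GROUPS.get(allele, allele)  # no G group: use the allele itself
    if redux_type == "G":
        return group
    if redux_type == "lgx":
        return two_field(group)
    if redux_type == "lg":
        return two_field(group) + "g"
    raise ValueError(f"unsupported reduction type in this sketch: {redux_type}")


print(redux("A*01:01:01", "G"))       # A*01:01:01G
print(redux("A*01:01:01", "lg"))      # A*01:01g
print(redux("A*01:01:01", "lgx"))     # A*01:01
print(redux("DRB1*14:06:01", "lgx"))  # DRB1*14:06
```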

Perform DRB1 blending with DRB3, DRB4 and DRB5

import pyard

pyard.dr_blender(drb1='HLA-DRB1*03:01+DRB1*04:01', drb3='DRB3*01:01', drb4='DRB4*01:03')
# >>> 'DRB3*01:01+DRB4*01:03'
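The rules dr_blender applies are not described here. As a rough illustration only, the sketch below uses the well-known DR51/DR52/DR53 associations between DRB1 families and the DRB5/DRB3/DRB4 loci to keep the partner typings that the DRB1 genotype predicts. `PARTNER_LOCUS`, `expected_partners` and `blend` are hypothetical; the sketch ignores DRB1 homozygosity, missing typings and inconsistent input, which a real blender has to handle.

```python
# DRB1 families and the DRB3/4/5 locus usually present on the same haplotype
# (DR52 group -> DRB3, DR53 group -> DRB4, DR51 group -> DRB5).
# Families not listed (e.g. 01, 08, 10) carry none of them.
PARTNER_LOCUS = {
    "03": "DRB3", "11": "DRB3", "12": "DRB3", "13": "DRB3", "14": "DRB3",
    "04": "DRB4", "07": "DRB4", "09": "DRB4",
    "15": "DRB5", "16": "DRB5",
}


def expected_partners(drb1: str) -> set:
    """Return the DRB3/4/5 loci implied by a DRB1 genotype like 'DRB1*03:01+DRB1*04:01'."""
    loci = set()
    for allele in drb1.split("+"):
        if allele.startswith("HLA-"):
            allele = allele[len("HLA-"):]
        family = allele.split("*")[1].split(":")[0]
        if family in PARTNER_LOCUS:
            loci.add(PARTNER_LOCUS[family])
    return loci


def blend(drb1: str, **partners: str) -> str:
    """Keep the DRB3/4/5 typings whose locus is expected from DRB1 (simplified)."""
    expected = expected_partners(drb1)
    kept = [typing for locus, typing in sorted(partners.items())
            if typing and locus.upper() in expected]
    return "+".join(kept)


print(blend("HLA-DRB1*03:01+DRB1*04:01", drb3="DRB3*01:01", drb4="DRB4*01:03"))
# DRB3*01:01+DRB4*01:03
```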

MAC Codes

Besides reduction, py-ard can expand MAC codes and look up the MAC representation of an allele list. See MAC Service UI for details.

Expand MAC

You can also use py-ard to expand MAC codes with the expand_mac method on ard.

ard.expand_mac('HLA-A*01:BC')
# 'HLA-A*01:02/HLA-A*01:03'
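MAC expansion depends on whether a MAC value contains a colon: values without one replace only the second field, while values with one replace both fields (this rule is described in the issues further down). A minimal sketch with a hypothetical three-entry table; BJ and BM use the values quoted in that issue and BC matches the expansion shown above. The real expand_mac uses the full downloaded MAC table.

```python
# Hypothetical excerpt of a MAC table (code -> expansion).
MAC_TABLE = {
    "BC": "02/03",
    "BJ": "02/13",
    "BM": "13:05/13:06/13:07/13:09/14:05/14:08",
}


def expand_mac(typing: str) -> str:
    """Expand a typing such as 'A*01:BJ' into a '/'-separated allele list."""
    prefix = ""
    if typing.startswith("HLA-"):
        prefix, typing = "HLA-", typing[len("HLA-"):]
    locus, fields = typing.split("*")
    first_field, code = fields.split(":")
    alleles = []
    for value in MAC_TABLE[code].split("/"):
        if ":" in value:
            # value carries both fields: A*01:BM -> A*13:05, ...
            alleles.append(f"{prefix}{locus}*{value}")
        else:
            # value is just the 2nd field: A*01:BJ -> A*01:02, ...
            alleles.append(f"{prefix}{locus}*{first_field}:{value}")
    return "/".join(alleles)


print(expand_mac("HLA-A*01:BC"))  # HLA-A*01:02/HLA-A*01:03
print(expand_mac("A*01:BM"))      # A*13:05/A*13:06/A*13:07/A*13:09/A*14:05/A*14:08
```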

Lookup MAC

Find the corresponding MAC code for an allele list GL String.

ard.lookup_mac('A*01:02/A*01:01/A*01:03')
# A*01:MN
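Lookup is the reverse mapping: the second fields of a single-family allele list are sorted numerically, joined with /, and matched against the MAC table. A sketch with a hypothetical reverse table holding only the two codes used in this README (MN and AH); it handles only numeric second fields from one locus and one allele family.

```python
# Hypothetical reverse MAC table: sorted 2nd fields -> code.
REVERSE_MAC = {"01/02/03": "MN", "01/07": "AH"}


def lookup_mac(allele_list: str):
    """Return e.g. 'A*01:MN' for 'A*01:02/A*01:01/A*01:03', or None if unknown."""
    alleles = allele_list.split("/")
    loci = {a.split("*")[0] for a in alleles}
    families = {a.split("*")[1].split(":")[0] for a in alleles}
    if len(loci) != 1 or len(families) != 1:
        return None  # this sketch only handles one locus and one allele family
    locus, family = loci.pop(), families.pop()
    seconds = sorted({a.split(":")[1] for a in alleles}, key=int)
    code = REVERSE_MAC.get("/".join(seconds))
    return f"{locus}*{family}:{code}" if code else None


print(lookup_mac("A*01:02/A*01:01/A*01:03"))  # A*01:MN
```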

CWD Reduction

Reduce a MAC code or an allele list GL String to CWD reduced list.

ard.cwd_redux("B*15:01:01/B*15:01:03/B*15:04/B*15:07/B*15:26N/B*15:27")
# => B*15:01/B*15:07

These two methods can be chained to get the MAC code of the CWD-reduced allele list.

ard.lookup_mac(ard.cwd_redux("B*15:01:01/B*15:01:03/B*15:04/B*15:07/B*15:26N/B*15:27"))
# 'B*15:AH'
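Conceptually, CWD reduction truncates each allele to two fields and keeps only those in the CWD (common and well-documented) catalogue. A sketch for allele lists only (not MAC codes), with a hypothetical two-entry CWD set chosen solely to reproduce the example above; py-ard ships the real catalogue.

```python
# Hypothetical CWD set chosen only to reproduce the example above.
CWD_ALLELES = {"B*15:01", "B*15:07"}


def two_field(allele: str) -> str:
    """Keep the locus and first two fields, e.g. B*15:01:03 -> B*15:01."""
    return ":".join(allele.split(":")[:2])


def cwd_redux(allele_list: str) -> str:
    reduced = {two_field(a) for a in allele_list.split("/")}
    kept = reduced & CWD_ALLELES  # keep only alleles in the CWD catalogue
    # numeric sort on the 2nd field (single-family lists only)
    return "/".join(sorted(kept, key=lambda a: int(a.split(":")[1])))


print(cwd_redux("B*15:01:01/B*15:01:03/B*15:04/B*15:07/B*15:26N/B*15:27"))
# B*15:01/B*15:07
```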

Using py-ard from R code

py-ard can also be used from R. Please see the Using pyard from R language page for a detailed walkthrough.

Command Line Tools

Various command line interface (CLI) tools are available to use for managing local IPD-IMGT/HLA cache database, running impromptu reduction queries and batch processing of CSV files.

For all tools, use --imgt-version and --data-dir to specify the IPD-IMGT/HLA database version and the directory where the SQLite files are created.

pyard-import Import the latest IPD-IMGT/HLA database

pyard-import imports and reinstalls the prepared IPD-IMGT/HLA and MAC data.

Use pyard-import -h to see all the options available.

$ pyard-import -h
usage: pyard-import [-h] [--list] [-i IMGT_VERSION] [-d DATA_DIR] [--v2-to-v3-mapping V2_V3_MAPPING] [--refresh-mac] [--re-install] [--skip-mac]

py-ard tool to generate reference SQLite database. Allows updating db with custom V2 to V3 mappings. Displays the list of available IMGT database
versions.

options:
  -h, --help            show this help message and exit
  --list                Show Versions of available IMGT Databases
  -i IMGT_VERSION, --imgt-version IMGT_VERSION
                        Import supplied IMGT_VERSION DB Version
  -d DATA_DIR, --data-dir DATA_DIR
                        Data directory to store imported data
  --v2-to-v3-mapping V2_V3_MAPPING
                        V2 to V3 mapping CSV file
  --refresh-mac         Only refresh MAC data
  --re-install          reinstall a fresh version of database
  --skip-mac            Skip creating MAC mapping

Run pyard-import without any options to download and prepare the latest version of IPD-IMGT/HLA and MAC data.

$ pyard-import
Created Latest py-ard database

Import a particular version of the IMGT database

$ pyard-import --imgt-version 3.29.0
Created py-ard version 3290 database

Import a particular version of the IMGT database and replace the v2 to v3 mapping table from a CSV file.

$ pyard-import --imgt-version 3.29.0 --v2-to-v3-mapping map2to3.csv
Created py-ard version 3290 database
Updated v2_mapping table with 'map2to3.csv' mapping file.

Reinstall a particular IMGT database

pyard-import --imgt-version 3340 --re-install

Replace the v2 to v3 mappings in the Latest IMGT database

$ pyard-import --v2-to-v3-mapping map2to3.csv

Refresh the MAC data for the specified version

$ pyard-import --imgt-version 3450 --refresh-mac

Skip MAC loading

If you don't need MAC data, you can skip loading it with --skip-mac.

$ pyard-import --imgt-version 3150 --skip-mac

pyard-status Show database status

Show the statuses of all py-ard databases

pyard-status goes through all the available databases and checks all the tables that should be present. It shows all the databases, the number of rows in each table, any missing tables and the stored IPD-IMGT/HLA version.

$ pyard-status

Use --data-dir to specify an alternate directory for cached database files.

$ pyard-status  --data-dir ~/.pyard/
IMGT DB Version: Latest (3440)
There is a newer IMGT release than version 3440
Upgrade to latest version '3510' with 'pyard-import --re-install'
File: /Users/pbashyal/.pyard/pyard-Latest.sqlite3
Size: 533.37MB
-------------------------------------------
|Table Name          |Rows                |
|-----------------------------------------|
|dup_g               |                  59|
|dup_lgx             |                   1|
|g_group             |               14223|
|p_group             |               18872|
|lgx_group           |               14223|
|exon_group          |               12934|
|p_not_g             |                1681|
|xx_codes            |                1517|
|who_group           |               30785|
|alleles             |               32504|
|exp_alleles         |                  60|
|who_alleles         |               30523|
|mac_codes           |             1089379|
-------------------------------------------
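pyard-status reads these row counts from the SQLite cache file. The same information can be pulled from any SQLite database with the standard library. In this sketch, an in-memory database with a made-up dup_g schema stands in for a real file such as the pyard-Latest.sqlite3 shown above; `table_row_counts` is a hypothetical helper, not part of py-ard.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Names come from sqlite_master, not from user input.
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}


# Stand-in database; the dup_g schema here is invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dup_g (allele TEXT, g_group TEXT)")
conn.execute("INSERT INTO dup_g VALUES ('A*01:01:01', 'A*01:01:01G')")
print(table_row_counts(conn))  # {'dup_g': 1}
```

For a real cache, pass `sqlite3.connect(path)` with the file path that pyard-status prints.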

pyard Redux quickly

pyard command can be used for quick reductions from the command line. Use --help option to see all the available options.

$ pyard --help
usage: pyard [-h] [-v] [-d DATA_DIR] [-i IMGT_VERSION] [-g GL_STRING]
             [-r {G,P,lg,lgx,W,exon,U2}] [--splits SPLITS]

py-ard tool to redux GL String

options:
  -h, --help            show this help message and exit
  -v, --version         IPD-IMGT/HLA DB Version number
  -d DATA_DIR, --data-dir DATA_DIR
                        Data directory to store imported data
  -i IMGT_VERSION, --imgt-version IMGT_VERSION
                        IPD-IMGT/HLA db to use for redux
  -g GL_STRING, --gl GL_STRING
                        GL String to reduce
  -r {G,P,lg,lgx,W,exon,U2}, --redux-type {G,P,lg,lgx,W,exon,U2}
                        Reduction Method
  --splits SPLITS       Find Broad and Splits

Reduce from command line by specifying any typing with -g or --gl option and the reduction method with -r or --redux-type option.

$ pyard -g 'A*01:AB' -r lgx
A*01:01/A*01:02

$ pyard --gl 'DRB1*08:XX' -r G
DRB1*08:01:01G/DRB1*08:02:01G/DRB1*08:03:02G/DRB1*08:04:01G/DRB1*08:05/ ...

$ pyard -i 3290 --gl 'A1' -r lgx # For a particular version of DB
A*01:01/A*01:02/A*01:03/A*01:06/A*01:07/A*01:08/A*01:09/A*01:10/A*01:12/ ...

If the -r option is omitted, pyard prints the result of every reduction method.

$ pyard -g 'A*01:01:01:01'
Reduction Method: G
-------------------
A*01:01:01G

Reduction Method: P
-------------------
A*01:01P

Reduction Method: lg
--------------------
A*01:01g

Reduction Method: lgx
---------------------
A*01:01

Reduction Method: W
-------------------
A*01:01:01:01

Reduction Method: exon
----------------------
A*01:01:01

Reduction Method: U2
--------------------
A*01:01

py-ard knows about the broad/split relationships of serology and DNA types; use the --splits option of the pyard command to find them.

$ pyard --splits "A*10"
A*10 = A*25/A*26/A*34/A*66

$ pyard --splits B14
B14 = B64/B65
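A broad/split lookup is essentially a mapping from each broad to its splits, plus a reverse index from split to broad. A sketch populated only with relationships quoted in this README (the DR6 entry comes from the issues further down); `SPLITS`, `BROAD_OF` and `describe_splits` are hypothetical, and py-ard derives the real relationships from IPD-IMGT/HLA data.

```python
# Hypothetical broad -> splits table holding only relationships quoted in this README.
SPLITS = {
    "A*10": ["A*25", "A*26", "A*34", "A*66"],
    "B14": ["B64", "B65"],
    "DR6": ["DR13", "DR14"],
}
# Reverse index: split -> broad.
BROAD_OF = {split: broad for broad, splits in SPLITS.items() for split in splits}


def describe_splits(typing: str) -> str:
    """Format a broad typing like the pyard --splits output above."""
    if typing not in SPLITS:
        raise ValueError(f"{typing} is not a known broad in this sketch")
    return f"{typing} = {'/'.join(SPLITS[typing])}"


print(describe_splits("A*10"))  # A*10 = A*25/A*26/A*34/A*66
print(BROAD_OF["B65"])          # B14
```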

pyard-csv-reduce Batch Reduce a CSV file

pyard-csv-reduce can be used to batch process a CSV file with HLA typings. See documentation for detailed information about all the options.
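The snippet below is not pyard-csv-reduce; it only sketches the batch pattern with the standard csv module. The `typing` column name, the added `typing_reduced` column and the `two_field` reducer are hypothetical; with py-ard you would pass a reducer such as `lambda t: ard.redux(t, 'lgx')` instead.

```python
import csv
import io


def two_field(typing: str) -> str:
    """Hypothetical stand-in for a real reduction such as ard.redux(typing, 'lgx')."""
    locus, fields = typing.split("*")
    return locus + "*" + ":".join(fields.split(":")[:2])


def reduce_csv(src, dst, column: str, reducer) -> None:
    """Copy rows from src to dst, adding a '<column>_reduced' column."""
    reader = csv.DictReader(src)
    out_col = column + "_reduced"
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + [out_col])
    writer.writeheader()
    for row in reader:
        row[out_col] = reducer(row[column])
        writer.writerow(row)


src = io.StringIO("id,typing\n1,A*01:01:01:01\n2,B*07:02:01\n")
dst = io.StringIO()
reduce_csv(src, dst, "typing", two_field)
print(dst.getvalue())
```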

py-ard REST Web Service

Run py-ard as a service so that it can be accessed as a REST service endpoint.

To start in debug mode, run the app.py script. The endpoint should then be available at localhost:8080.

$ python app.py
 * Serving Flask app 'app'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8080
 * Running on http://10.0.1.37:8080
Press CTRL+C to quit

Docker deployment of py-ard REST Web Service

For deploying to production, build a Docker image and use that image for deploying to a server.

Build the docker image:

make docker-build

This builds a Docker image named pyard-service:latest.

Build the Docker image and run it with:

make docker

The endpoint should then be available at localhost:8080

py-ard's People

Contributors

dependabot[bot], jbrelsf2-nmdp, mmaiers-nmdp, pbashyal-nmdp, rsajulga-nmdp

py-ard's Issues

Validation: Validate fp number

Because Excel:

$ pyard --gl "A*0.559722222" -r lgx
Traceback (most recent call last):
....
    if isinstance(flags, RegexFlag):
RecursionError: maximum recursion depth exceeded while calling a Python object

Validation: Fails for empty allele

When reducing with ard.redux_gl('A*', 'lgx'), py-ard fails while validating the empty allele:
if isinstance(flags, RegexFlag):
RecursionError: maximum recursion depth exceeded while calling a Python object

Reducing XX codes for HLA-C returned by the NMDP MAC service returns a blank string.

#!/usr/bin/env python
import requests
from pyard import ARD # NMDP allele codes

# initialize ARD object
ard = ARD()

ac = "C*02:XX"

url = "https://hml.nmdp.org/mac/api/decode?typing=" + ac
response = requests.get(url)
gl = response.text
print ("Service Call: " + url)

gl_ars = ard.redux_gl(gl,'lgx')

print ("GL: " + gl)
print ("REDUX: " + gl_ars)

gl = "DRB1*14:01/DRB1*14:54"
gl_ars = ard.redux_gl(gl,'lgx')
print ("DRB1")
print ("GL: " + gl)
print ("REDUX: " + gl_ars)

Output:

Service Call: https://hml.nmdp.org/mac/api/decode?typing=C*02:XX
GL: C*02:02/C*02:03/C*02:04/C*02:05/C*02:06/C*02:07/C*02:08/C*02:09/C*02:10/C*02:11/C*02:12/C*02:13/C*02:14/C*02:15/C*02:16/C*02:17/C*02:18/C*02:19/C*02:20/C*02:21/C*02:22/C*02:23/C*02:24/C*02:25Q/C*02:26/C*02:27/C*02:28/C*02:29/C*02:30/C*02:31/C*02:32/C*02:33/C*02:34/C*02:35/C*02:36/C*02:37/C*02:38N/C*02:39/C*02:40/C*02:42/C*02:43/C*02:44/C*02:45/C*02:46/C*02:47/C*02:48/C*02:49/C*02:50/C*02:51/C*02:52N/C*02:53/C*02:54/C*02:55/C*02:56/C*02:57/C*02:58/C*02:59/C*02:60/C*02:61/C*02:62/C*02:63/C*02:64/C*02:65/C*02:66/C*02:67Q/C*02:68/C*02:69/C*02:70/C*02:71/C*02:72/C*02:73/C*02:74/C*02:75/C*02:76/C*02:77/C*02:78/C*02:79/C*02:80/C*02:81/C*02:82/C*02:83/C*02:84/C*02:85/C*02:86/C*02:87/C*02:88/C*02:89/C*02:90/C*02:91/C*02:92N/C*02:93/C*02:94/C*02:95/C*02:96/C*02:97/C*02:98/C*02:99/C*02:100/C*02:101/C*02:102/C*02:103/C*02:104/C*02:105N/C*02:106/C*02:107/C*02:108/C*02:109/C*02:110/C*02:111/C*02:112/C*02:113/C*02:114/C*02:115/C*02:116/C*02:117/C*02:118/C*02:119/C*02:120/C*02:121N/C*02:122/C*02:123/C*02:124/C*02:125/C*02:126/C*02:127/C*02:128/C*02:129/C*02:130/C*02:131/C*02:132/C*02:133/C*02:134/C*02:135N/C*02:136/C*02:137/C*02:138/C*02:139/C*02:140/C*02:141/C*02:142/C*02:143/C*02:144/C*02:145/C*02:146/C*02:147/C*02:148/C*02:149/C*02:150N/C*02:151/C*02:152/C*02:153/C*02:154/C*02:155/C*02:156/C*02:157/C*02:158/C*02:159/C*02:160/C*02:161/C*02:162/C*02:163/C*02:164/C*02:165N/C*02:166/C*02:167/C*02:168/C*02:169N/C*02:170/C*02:171/C*02:172/C*02:173/C*02:174/C*02:175/C*02:176/C*02:177/C*02:178/C*02:179/C*02:180/C*02:181/C*02:182/C*02:183/C*02:184/C*02:185/C*02:186/C*02:187/C*02:188/C*02:189/C*02:190/C*02:191/C*02:192N/C*02:193N/C*02:194/C*02:195/C*02:196/C*02:197/C*02:198/C*02:199
REDUX:
DRB1
GL: DRB1*14:01/DRB1*14:54
REDUX: DRB1*14:01

need to expand G when using a higher resolution

When expanding to a higher resolution such as W or "exon", G group alleles need to be expanded to their list of member alleles.

>>> ard.redux_gl(ard.redux_gl('C*12:02', 'W'),"exon")
'C*12:02:01/C*12:02:02/C*12:02:03/C*12:02:04/C*12:02:05/C*12:02:06/C*12:02:07/C*12:02:08/C*12:02:09/C*12:02:10/C*12:02:11/C*12:02:12/C*12:02:13/C*12:02:14/C*12:02:15/C*12:02:16/C*12:02:17/C*12:02:18/C*12:02:19/C*12:02:20/C*12:02:21/C*12:02:22/C*12:02:23/C*12:02:24/C*12:02:25/C*12:02:26/C*12:02:27/C*12:02:28/C*12:02:29/C*12:02:30/C*12:02:31/C*12:02:32/C*12:02:33/C*12:02:34/C*12:02:35/C*12:02:36/C*12:02:37/C*12:02:38/C*12:02:39/C*12:02:40/C*12:02:41/C*12:02:42/C*12:02:43'
>>> ard.redux_gl(ard.redux_gl('C*12:02:01G', 'W'),"exon")
'C*12:02:01'
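The fix requested here amounts to expanding G groups into their member alleles before truncating to the target resolution. A sketch with a hypothetical `G_MEMBERS` table (the member names are illustrative, not real group contents); `expand_g`, `to_exon` and `redux_exon` are hypothetical helpers.

```python
# Hypothetical G group -> member alleles table (illustrative names only).
G_MEMBERS = {
    "C*12:02:01G": ["C*12:02:01:01", "C*12:02:01:02", "C*12:02:02"],
}


def expand_g(allele: str) -> list:
    """Replace a G group by its members; other alleles pass through."""
    return G_MEMBERS.get(allele, [allele])


def to_exon(allele: str) -> str:
    """Truncate an allele name to its first three fields."""
    locus, fields = allele.split("*")
    return locus + "*" + ":".join(fields.split(":")[:3])


def redux_exon(allele_list: str) -> str:
    reduced = []
    for allele in allele_list.split("/"):
        for member in expand_g(allele):  # expand G groups before truncating
            exon = to_exon(member)
            if exon not in reduced:      # keep first-seen order, drop duplicates
                reduced.append(exon)
    return "/".join(reduced)


print(redux_exon("C*12:02:01G"))  # C*12:02:01/C*12:02:02
```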

Differentiate allelic expansion with MAC codes

Depending on whether the expanded alleles contain a : or not, the expansion should be treated differently.

	BJ	02/13
	BK	02/14
*	BM	13:05/13:06/13:07/13:09/14:05/14:08
	BN	01/04/08/13
	BP	01/04/05/07/08

A*01:BJ expands to A*01:02/A*01:13
while
A*01:BM expands to A*13:05/A*13:06/A*13:07/A*13:09/A*14:05/A*14:08

The test "test_004_redux_gl" fails

After cloning the repository locally and running the following command:

python -m unittest

the following test fails:
test_004_redux_gl

FAIL: test_004_redux_gl (tests.test_pyard.TestPyard)

Traceback (most recent call last):
File "~/pyARD/tests/test_pyard.py", line 89, in test_004_redux_gl
self.assertTrue(self.ard.redux_gl(glstring, ard_type) == expected_gl)
AssertionError: False is not true

"lgx" and "lg" should only return 2-fields

>>> ard.redux_gl('DRB1*14:06:01', "lgx")
'DRB1*14:06:01'

>>> ard.redux_gl('DRB1*14:05:01', "lgx")
'DRB1*14:05'

14:06:01 returns 3 fields because this allele is not in a G group.
14:05:01 is in a G group, so it returns 2 fields.

14:06:01 should return 2 fields for "lgx" and "lg".

Cannot initialize ARD object

After installing pyard with pip, I tried both

ard = ARD()

and

ard = ARD('3290')

Both gave me this error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1392, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 412, in wrap_socket
    session=session
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 853, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1117, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hzhou/git/pyARD/pyard/pyard.py", line 131, in __init__
    urllib.request.urlretrieve(ars_url, ars_file)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

Then when I tried to install from source, it gave me a huge number of error messages. The last one is shown here.

Command "/Users/hzhou/git/pyARD/venv/bin/python3 /Users/hzhou/git/pyARD/venv/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /private/var/folders/lr/ck2pm9ws4s98mtzbs_4lmbpm0000gp/T/pip-build-env-79s2ysoq/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- wheel setuptools Cython "numpy==1.9.3; python_version=='3.5'" "numpy==1.12.1; python_version=='3.6'" "numpy==1.13.1; python_version>='3.7'"" failed with error code 1 in None

I am using Python 3.7.3 by the way

Validation: Switch to using ValueError

Switch to raising ValueError with messages

  • for an invalid glstring, instead of returning an empty string. Returning an empty string doesn't signal any error.
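Both validation issues above (the Excel-mangled A*0.559722222 and the empty A*) ended in a RecursionError instead of a clear message. A minimal sketch of up-front validation that raises ValueError, assuming a simplified allele-name pattern; it is not py-ard's validator and deliberately rejects MAC, XX, G/P group and serology typings, which a real validator must also accept.

```python
import re

# Simplified pattern (assumption): optional "HLA-" prefix, locus, "*",
# 1-4 numeric fields and an optional expression suffix.
ALLELE_RE = re.compile(r"(?:HLA-)?[A-Z][A-Z0-9]*\*\d{2,3}(?::\d{2,3}){0,3}[NLSCAQ]?")


def validate_allele(allele: str) -> str:
    """Return the allele unchanged, or raise ValueError with a clear message."""
    if not ALLELE_RE.fullmatch(allele):
        raise ValueError(f"Invalid HLA allele: {allele!r}")
    return allele


for candidate in ["A*01:01:01:01", "A*0.559722222", "A*"]:
    try:
        validate_allele(candidate)
        print(candidate, "ok")
    except ValueError as err:
        print(err)
```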

Can't import pyARD on Python 3.7

GenericMeta is not available in Python 3.7's typing module.

Traceback (most recent call last):
  File "ard_test.py", line 3, in <module>
    from pyard import ARD # NMDP allele codes
  File "/usr/local/lib/python3.7/site-packages/pyard/__init__.py", line 25, in <module>
    from .pyard import ARD
  File "/usr/local/lib/python3.7/site-packages/pyard/pyard.py", line 29, in <module>
    from .base_model_ import Model
  File "/usr/local/lib/python3.7/site-packages/pyard/base_model_.py", line 27, in <module>
    from .util import deserialize_model
  File "/usr/local/lib/python3.7/site-packages/pyard/util.py", line 27, in <module>
    from typing import GenericMeta
ImportError: cannot import name 'GenericMeta' from 'typing' (/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/typing.py)

Some relevant links I found:

python/typing#532

python/typing#533

make the data_dir configurable

The current working directory is used as the data_dir, which leaves a >500k-line MAC file behind.
Add a command-line option and/or a config parameter to use a standard directory.
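The py-ard command-line tools now take a -d/--data-dir option, and the default location is $TMPDIR/pyard (see above). A sketch of wiring such an option with argparse; this is illustrative, not py-ard's code, and `ensure_data_dir` is a hypothetical helper.

```python
import argparse
import os
import tempfile

# Mirrors the README's default of $TMPDIR/pyard; tempfile.gettempdir() honours TMPDIR.
DEFAULT_DATA_DIR = os.path.join(tempfile.gettempdir(), "pyard")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch: configurable data directory")
    parser.add_argument("-d", "--data-dir", default=DEFAULT_DATA_DIR,
                        help="Data directory to store imported data")
    return parser.parse_args(argv)


def ensure_data_dir(path: str) -> str:
    """Expand '~' and create the directory if needed; returns the usable path."""
    path = os.path.expanduser(path)
    os.makedirs(path, exist_ok=True)
    return path


args = parse_args(["--data-dir", "~/.pyard"])
print(args.data_dir)  # ~/.pyard ('~' is only expanded by ensure_data_dir)
```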

The test "test_003_mac" fails

After cloning the repository locally and running the following command:

python -m unittest

the following test fails:
test_003_mac

FAIL: test_003_mac (tests.test_pyard.TestPyard)

Traceback (most recent call last):
File "~/pyARD/tests/test_pyard.py", line 80, in test_003_mac
self.assertTrue(self.ard.redux_gl("A*01:AB", 'G') == "A*01:01:01G/A*01:02")
AssertionError: False is not true

pyARD does not handle "XX" codes

Module should either handle XX codes or throw an error that gives instructions on how to do conversion with MAC service.

gl = ard.redux_gl("A*01:AB","g")
gl
'A*01:01/A*01:02'
gl = ard.redux_gl("A*01:XX","g")
gl
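One way to handle XX codes, as this issue suggests, is to expand A*01:XX to every known allele of that family and raise a clear error otherwise. A sketch with a hypothetical `KNOWN_ALLELES` list in place of the IPD-IMGT/HLA allele list; `expand_xx` is a hypothetical helper.

```python
# Hypothetical list of known 2-field alleles; real data comes from IPD-IMGT/HLA.
KNOWN_ALLELES = ["A*01:01", "A*01:02", "A*01:03", "A*02:01", "B*07:02"]


def expand_xx(typing: str) -> str:
    """Expand e.g. 'A*01:XX' to all known alleles of family A*01."""
    locus, fields = typing.split("*")
    family, code = fields.split(":")
    if code != "XX":
        raise ValueError(f"not an XX code: {typing}")
    prefix = f"{locus}*{family}:"
    matches = [a for a in KNOWN_ALLELES if a.startswith(prefix)]
    if not matches:
        raise ValueError(f"no alleles found for {typing}")
    return "/".join(matches)


print(expand_xx("A*01:XX"))  # A*01:01/A*01:02/A*01:03
```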

redux_gl for serology leads to recursion error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 159, in redux_gl
    return "^".join(sorted(set([self.redux_gl(a, redux_type) for a in glstring.split("^")]),
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 159, in <listcomp>
    return "^".join(sorted(set([self.redux_gl(a, redux_type) for a in glstring.split("^")]),
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 180, in redux_gl
    return self.redux_gl(glstring, redux_type)
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 180, in redux_gl
    return self.redux_gl(glstring, redux_type)
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 180, in redux_gl
    return self.redux_gl(glstring, redux_type)
  [Previous line repeated 491 more times]
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 155, in redux_gl
    if not self.isvalid_gl(glstring):
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 394, in isvalid_gl
    return self.isvalid(glstring)
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 358, in isvalid
    if not self.is_mac(allele) and \
  File "/Volumes/Fat/Dropbox/MacBookPro/2020/src/git/py-ard/pyard/pyard.py", line 246, in is_mac
    return re.search(r":\D+", gl) is not None
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 201, in search
    return _compile(pattern, flags).search(string)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 291, in _compile
    if isinstance(flags, RegexFlag):
RecursionError: maximum recursion depth exceeded in __instancecheck__

pyARD should handle serologic nomenclature

It would be useful to expand serology types using the WMDA rel_dna_ser.txt file, then collapse them to a GL string with ARD reduction.

I'm trying to expand HapLogic CT validation data that looks like this:

9639949 1 CAU 0 2 32 0 51 38 1 11:01 13:01 0 2 X 1 06:03 03:01 1 01:01 02:02 0 0 502377203 CAU 0 2 32 0 0 38 51 0 1 11:XX 13:XX 0 0 2 4 0 0 3 6 0 1 01:GHT 0 0 0 0 0

ard.redux_gl - little 'g' redux does not work for GL strings from MAC service

Big 'G' appears to work, but I'm not seeing any reduction for little 'g'.

ac = "A*01:XX"
url = "https://hml.nmdp.org/mac/api/decode?typing="
response = requests.get(url + ac)
gl = response.text
gl_ars = ard.redux_gl(gl,'g')

print (gl)
print (gl_ars)

GL Service output for A*01:XX

A*01:01/A*01:02/A*01:03/A*01:04N/A*01:06/A*01:07/A*01:08/A*01:09/A*01:10/A*01:11N/A*01:12/A*01:13/A*01:14/A*01:15N/A*01:16N/A*01:17/A*01:18N/A*01:19/A*01:20/A*01:21/A*01:22N/A*01:23/A*01:24/A*01:25/A*01:26/A*01:27N/A*01:28/A*01:29/A*01:30/A*01:31N/A*01:32/A*01:33/A*01:35/A*01:36/A*01:37/A*01:38/A*01:39/A*01:40/A*01:41/A*01:42/A*01:43/A*01:44/A*01:45/A*01:46/A*01:47/A*01:48/A*01:49/A*01:50/A*01:51/A*01:52N/A*01:53N/A*01:54/A*01:55/A*01:56N/A*01:57N/A*01:58/A*01:59/A*01:60/A*01:61/A*01:62/A*01:63/A*01:64/A*01:65/A*01:66/A*01:67/A*01:68/A*01:69/A*01:70/A*01:71/A*01:72/A*01:73/A*01:74/A*01:75/A*01:76/A*01:77/A*01:78/A*01:79/A*01:80/A*01:81/A*01:82/A*01:83/A*01:84/A*01:85/A*01:86/A*01:87N/A*01:88/A*01:89/A*01:90/A*01:91/A*01:92/A*01:93/A*01:94/A*01:95/A*01:96/A*01:97/A*01:98/A*01:99/A*01:100/A*01:101/A*01:102/A*01:103/A*01:104/A*01:105/A*01:106/A*01:107/A*01:108/A*01:109/A*01:110/A*01:111/A*01:112/A*01:113/A*01:114/A*01:115/A*01:116/A*01:117/A*01:118/A*01:119/A*01:120/A*01:121/A*01:122/A*01:123N/A*01:124/A*01:125/A*01:126/A*01:127/A*01:128/A*01:129/A*01:130/A*01:131/A*01:132/A*01:133/A*01:134/A*01:135/A*01:136/A*01:137/A*01:138/A*01:139/A*01:140/A*01:141/A*01:142/A*01:143/A*01:144/A*01:145/A*01:146/A*01:147Q/A*01:148/A*01:149/A*01:150/A*01:151/A*01:152/A*01:153/A*01:154/A*01:155/A*01:156/A*01:157/A*01:158/A*01:159/A*01:160N/A*01:161/A*01:162N/A*01:163/A*01:164/A*01:165/A*01:166/A*01:167/A*01:168/A*01:169/A*01:170/A*01:171/A*01:172/A*01:173/A*01:174/A*01:175/A*01:176/A*01:177/A*01:178N/A*01:179N/A*01:180/A*01:181/A*01:182/A*01:183/A*01:184/A*01:185/A*01:186N/A*01:187/A*01:188/A*01:189/A*01:190/A*01:191/A*01:192/A*01:193/A*01:194/A*01:195/A*01:196/A*01:197/A*01:198/A*01:199/A*01:200/A*01:201/A*01:202/A*01:203/A*01:204/A*01:205/A*01:206/A*01:207/A*01:208Q/A*01:209/A*01:210/A*01:211/A*01:212/A*01:213/A*01:214/A*01:215/A*01:216/A*01:217/A*01:218/A*01:219/A*01:220/A*01:221/A*01:222/A*01:223/A*01:224/A*01:225/A*01:226/A*01:227/A*01:228Q/A*01:229/A*01:230/A*01:231/A*01:232/A*01:233/A*01:234/A*01:235/A*01:236/A*01:237/A*01:238/A*01:239/A*01:240N/A*01:241/A*01:242/A*01:243/A*01:244/A*01:245/A*01:246/A*01:247N/A*01:248Q/A*01:249/A*01:250N/A*01:251/A*01:252/A*01:253/A*01:254/A*01:255/A*01:256/A*01:257/A*01:258N/A*01:259/A*01:260/A*01:261

Output from ard.redux_gl

A*01:01/A*01:02/A*01:03/A*01:04N/A*01:06/A*01:07/A*01:08/A*01:09/A*01:10/A*01:100/A*01:101/A*01:102/A*01:103/A*01:104/A*01:105/A*01:106/A*01:107/A*01:108/A*01:109/A*01:110/A*01:111/A*01:112/A*01:113/A*01:114/A*01:115/A*01:116/A*01:117/A*01:118/A*01:119/A*01:11N/A*01:12/A*01:120/A*01:121/A*01:122/A*01:123N/A*01:124/A*01:125/A*01:126/A*01:127/A*01:128/A*01:129/A*01:13/A*01:130/A*01:131/A*01:132/A*01:133/A*01:134/A*01:135/A*01:136/A*01:137/A*01:138/A*01:139/A*01:14/A*01:140/A*01:141/A*01:142/A*01:143/A*01:144/A*01:145/A*01:146/A*01:147Q/A*01:148/A*01:149/A*01:150/A*01:151/A*01:152/A*01:153/A*01:154/A*01:155/A*01:156/A*01:157/A*01:158/A*01:159/A*01:15N/A*01:160N/A*01:161/A*01:162N/A*01:163/A*01:164/A*01:165/A*01:166/A*01:167/A*01:168/A*01:169/A*01:16N/A*01:17/A*01:170/A*01:171/A*01:172/A*01:173/A*01:174/A*01:175/A*01:176/A*01:177/A*01:178N/A*01:179N/A*01:180/A*01:181/A*01:182/A*01:183/A*01:184/A*01:185/A*01:186N/A*01:187/A*01:188/A*01:189/A*01:18N/A*01:19/A*01:190/A*01:191/A*01:192/A*01:193/A*01:194/A*01:195/A*01:196/A*01:197/A*01:198/A*01:199/A*01:20/A*01:200/A*01:201/A*01:202/A*01:203/A*01:204/A*01:205/A*01:206/A*01:207/A*01:208Q/A*01:209/A*01:21/A*01:210/A*01:211/A*01:212/A*01:213/A*01:214/A*01:215/A*01:216/A*01:217/A*01:218/A*01:219/A*01:220/A*01:221/A*01:222/A*01:223/A*01:224/A*01:225/A*01:226/A*01:227/A*01:228Q/A*01:229/A*01:22N/A*01:23/A*01:230/A*01:231/A*01:232/A*01:233/A*01:234/A*01:235/A*01:236/A*01:237/A*01:238/A*01:239/A*01:24/A*01:240N/A*01:241/A*01:242/A*01:243/A*01:244/A*01:245/A*01:246/A*01:247N/A*01:248Q/A*01:249/A*01:25/A*01:250N/A*01:251/A*01:252/A*01:253/A*01:254/A*01:255/A*01:256/A*01:257/A*01:258N/A*01:259/A*01:26/A*01:260/A*01:261/A*01:27N/A*01:28/A*01:29/A*01:30/A*01:31N/A*01:32/A*01:33/A*01:35/A*01:36/A*01:37/A*01:38/A*01:39/A*01:40/A*01:41/A*01:42/A*01:43/A*01:44/A*01:45/A*01:46/A*01:47/A*01:48/A*01:49/A*01:50/A*01:51/A*01:52N/A*01:53N/A*01:54/A*01:55/A*01:56N/A*01:57N/A*01:58/A*01:59/A*01:60/A*01:61/A*01:62/A*01:63/A*01:64/A*01:65/A*01:66/A*01:67/A*01:68/A*01:69/A*01:70/A*01:71/A*01:72/A*01:73/A*01:74/A*01:75/A*01:76/A*01:77/A*01:78/A*01:79/A*01:80/A*01:81/A*01:82/A*01:83/A*01:84/A*01:85/A*01:86/A*01:87N/A*01:88/A*01:89/A*01:90/A*01:91/A*01:92/A*01:93/A*01:94/A*01:95/A*01:96/A*01:97/A*01:98/A*01:99

ac = "A*01:XX"
url = "https://hml.nmdp.org/mac/api/decode?typing="
response = requests.get(url + ac)
gl = response.text
gl_ars = ard.redux_gl(gl,'G')

print (gl)
print (gl_ars)

Output from ard.redux_gl:

A*01:01:01G/A*01:02/A*01:03:01G/A*01:06/A*01:07/A*01:08/A*01:09:01G/A*01:10/A*01:100/A*01:101/A*01:102/A*01:104/A*01:105/A*01:106/A*01:108/A*01:110/A*01:111/A*01:112/A*01:113/A*01:114/A*01:115/A*01:116/A*01:117/A*01:118/A*01:119/A*01:11N/A*01:12/A*01:120/A*01:121/A*01:122/A*01:123N/A*01:124/A*01:125/A*01:126/A*01:127/A*01:128/A*01:129/A*01:13/A*01:130/A*01:131/A*01:133/A*01:134/A*01:135/A*01:136/A*01:137/A*01:138/A*01:139/A*01:14/A*01:140/A*01:143/A*01:144/A*01:145/A*01:146/A*01:147Q/A*01:148/A*01:149/A*01:150/A*01:151/A*01:152/A*01:153/A*01:154/A*01:156/A*01:157/A*01:158/A*01:159/A*01:15N/A*01:160N/A*01:161/A*01:162N/A*01:163/A*01:164/A*01:165/A*01:166/A*01:167/A*01:168/A*01:169/A*01:16N/A*01:17/A*01:170/A*01:171/A*01:172/A*01:173/A*01:174/A*01:175/A*01:176/A*01:178N/A*01:179N/A*01:180/A*01:181/A*01:182/A*01:183/A*01:184/A*01:185/A*01:186N/A*01:187/A*01:188/A*01:189/A*01:18N/A*01:19/A*01:190/A*01:191/A*01:192/A*01:193/A*01:194/A*01:195/A*01:196/A*01:197/A*01:198/A*01:199/A*01:20/A*01:200/A*01:201/A*01:202/A*01:203/A*01:204/A*01:205/A*01:206/A*01:207/A*01:208Q/A*01:209/A*01:21/A*01:210/A*01:211/A*01:213/A*01:214/A*01:215/A*01:216/A*01:218/A*01:219/A*01:220/A*01:221/A*01:222/A*01:223/A*01:224/A*01:225/A*01:226/A*01:227/A*01:228Q/A*01:229/A*01:23/A*01:230/A*01:231/A*01:232/A*01:233/A*01:235/A*01:236/A*01:238/A*01:239/A*01:24/A*01:240N/A*01:241/A*01:242/A*01:243/A*01:244/A*01:245/A*01:247N/A*01:25/A*01:250N/A*01:254/A*01:255/A*01:256/A*01:257/A*01:258N/A*01:259/A*01:26/A*01:260/A*01:27N/A*01:28/A*01:29/A*01:30/A*01:31N/A*01:33/A*01:35/A*01:36/A*01:38/A*01:39/A*01:40/A*01:41/A*01:42/A*01:43/A*01:44/A*01:46/A*01:47/A*01:48/A*01:49/A*01:50/A*01:51/A*01:52N/A*01:53N/A*01:54/A*01:55/A*01:57N/A*01:58/A*01:59/A*01:60/A*01:61/A*01:62/A*01:63/A*01:64/A*01:65/A*01:66/A*01:67/A*01:68/A*01:69/A*01:70/A*01:71/A*01:72/A*01:73/A*01:74/A*01:75/A*01:76/A*01:77/A*01:78/A*01:79/A*01:80/A*01:82/A*01:83/A*01:84/A*01:85/A*01:86/A*01:88/A*01:89/A*01:90/A*01:91/A*01:92/A*01:93/A*01:94/A*01:95/A*01:96/A*01:97/A*01:98/A*01:99

serology doesn't recognize broad/split relationships

In addition to the relationships between individual alleles and serologic types (rel_dna_ser), py-ard also needs to take the broad/split relationships into account.

For instance, DR6 is a broad specificity with splits of DR13 and DR14.
Yet looking up DR6 returns only the few alleles that map directly to the broad itself:

>>> ard.redux_gl('DR6', 'lgx')
'DRB1*14:16/DRB1*14:17/DRB1*14:18/DRB1*14:186'

The desired behavior is for DR6 to return these alleles plus all of the alleles that are in the expansion of DR13 and DR14 as well.

The broad-split relationships are defined here:

https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/wmda/rel_ser_ser.txt
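A minimal sketch of turning that file into a broad-to-splits map, so that a lookup of a broad can also walk its splits. It assumes data lines have the form `LOCUS;ANTIGEN;SPLITS;ASSOCIATED` with `/`-separated splits and `#` comment lines; that layout should be checked against the real file. `parse_broad_splits` and `serology_family` are hypothetical helper names, not py-ard functions.

```python
# ASSUMPTION: data lines look like "LOCUS;ANTIGEN;SPLITS;ASSOCIATED", with
# SPLITS "/"-separated and "#" marking comment lines. Check the real file.


def parse_broad_splits(lines):
    """Map (locus, broad antigen) -> list of split antigens."""
    broad_to_splits = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        if len(fields) < 3 or not fields[2]:
            continue  # this antigen has no splits listed
        locus, broad, splits = fields[0], fields[1], fields[2]
        broad_to_splits[(locus, broad)] = splits.split("/")
    return broad_to_splits


def serology_family(locus, antigen, broad_to_splits):
    """Return the antigen itself plus its splits (and their splits, if any)."""
    family = [antigen]
    for split in broad_to_splits.get((locus, antigen), []):
        family.extend(serology_family(locus, split, broad_to_splits))
    return family


# Toy input modelled on the DR6 -> DR13/DR14 example in this issue.
sample = ["# Locus;Antigen;Splits;Associated", "DR;6;13/14;"]
table = parse_broad_splits(sample)
print(serology_family("DR", "6", table))  # ['6', '13', '14']
```

Each antigen in the returned list would then go through the existing rel_dna_ser lookup, and the resulting allele sets would be combined.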

Implement smartsort for GL strings

ard.redux_gl should not reorder GL strings purely alphanumerically. The 2nd field should be sorted numerically, independently of the 1st field.
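A sketch of one way to get that ordering: a sort key that compares each `:`-separated field by its integer value and uses any trailing expression letters (N, Q, G, ...) only as a tie-breaker. `smart_sort_key` is a hypothetical name and is not the py-ard implementation; it is written as a key function rather than a comparator.

```python
import re

# Numeric part of a field plus any trailing letters (e.g. "11N" -> 11, "N").
_FIELD = re.compile(r"(\d+)([A-Z]*)")


def smart_sort_key(allele):
    """Sort key comparing allele fields numerically, e.g. A*01:12 < A*01:100."""
    locus, _, fields = allele.partition("*")
    key = [locus]
    for field in fields.split(":"):
        m = _FIELD.fullmatch(field)
        if m:
            key.append((int(m.group(1)), m.group(2)))
        else:
            # Non-numeric field (e.g. a MAC code): sort after numeric ones.
            key.append((float("inf"), field))
    return key


gl = "A*01:100/A*01:11N/A*01:12/A*01:02/A*01:01:01G"
print("/".join(sorted(gl.split("/"), key=smart_sort_key)))
# A*01:01:01G/A*01:02/A*01:11N/A*01:12/A*01:100
```

Plain string sorting would put A*01:100 before A*01:11N and A*01:12, which is exactly the ordering visible in the redux_gl output below.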

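import requests
from pyard import ARD

ard = ARD()  # older py-ard API, as imported elsewhere in these issues

ac = "A*01:XX"
url = "https://hml.nmdp.org/mac/api/decode?typing="
response = requests.get(url + ac)
response.raise_for_status()  # fail loudly if the MAC service errors
gl = response.text
gl_ars = ard.redux_gl(gl, 'g')

print(gl)
print(gl_ars)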

GL Service output for A*01:XX

A*01:01/A*01:02/A*01:03/A*01:04N/A*01:06/A*01:07/A*01:08/A*01:09/A*01:10/A*01:11N/A*01:12/A*01:13/A*01:14/A*01:15N/A*01:16N/A*01:17/A*01:18N/A*01:19/A*01:20/A*01:21/A*01:22N/A*01:23/A*01:24/A*01:25/A*01:26/A*01:27N/A*01:28/A*01:29/A*01:30/A*01:31N/A*01:32/A*01:33/A*01:35/A*01:36/A*01:37/A*01:38/A*01:39/A*01:40/A*01:41/A*01:42/A*01:43/A*01:44/A*01:45/A*01:46/A*01:47/A*01:48/A*01:49/A*01:50/A*01:51/A*01:52N/A*01:53N/A*01:54/A*01:55/A*01:56N/A*01:57N/A*01:58/A*01:59/A*01:60/A*01:61/A*01:62/A*01:63/A*01:64/A*01:65/A*01:66/A*01:67/A*01:68/A*01:69/A*01:70/A*01:71/A*01:72/A*01:73/A*01:74/A*01:75/A*01:76/A*01:77/A*01:78/A*01:79/A*01:80/A*01:81/A*01:82/A*01:83/A*01:84/A*01:85/A*01:86/A*01:87N/A*01:88/A*01:89/A*01:90/A*01:91/A*01:92/A*01:93/A*01:94/A*01:95/A*01:96/A*01:97/A*01:98/A*01:99/A*01:100/A*01:101/A*01:102/A*01:103/A*01:104/A*01:105/A*01:106/A*01:107/A*01:108/A*01:109/A*01:110/A*01:111/A*01:112/A*01:113/A*01:114/A*01:115/A*01:116/A*01:117/A*01:118/A*01:119/A*01:120/A*01:121/A*01:122/A*01:123N/A*01:124/A*01:125/A*01:126/A*01:127/A*01:128/A*01:129/A*01:130/A*01:131/A*01:132/A*01:133/A*01:134/A*01:135/A*01:136/A*01:137/A*01:138/A*01:139/A*01:140/A*01:141/A*01:142/A*01:143/A*01:144/A*01:145/A*01:146/A*01:147Q/A*01:148/A*01:149/A*01:150/A*01:151/A*01:152/A*01:153/A*01:154/A*01:155/A*01:156/A*01:157/A*01:158/A*01:159/A*01:160N/A*01:161/A*01:162N/A*01:163/A*01:164/A*01:165/A*01:166/A*01:167/A*01:168/A*01:169/A*01:170/A*01:171/A*01:172/A*01:173/A*01:174/A*01:175/A*01:176/A*01:177/A*01:178N/A*01:179N/A*01:180/A*01:181/A*01:182/A*01:183/A*01:184/A*01:185/A*01:186N/A*01:187/A*01:188/A*01:189/A*01:190/A*01:191/A*01:192/A*01:193/A*01:194/A*01:195/A*01:196/A*01:197/A*01:198/A*01:199/A*01:200/A*01:201/A*01:202/A*01:203/A*01:204/A*01:205/A*01:206/A*01:207/A*01:208Q/A*01:209/A*01:210/A*01:211/A*01:212/A*01:213/A*01:214/A*01:215/A*01:216/A*01:217/A*01:218/A*01:219/A*01:220/A*01:221/A*01:222/A*01:223/A*01:224/A*01:225/A*01:226/A*01:227/A*01:228Q/A*01:229/A*01:230/A*01:231/A*01:232/A*01:233/A*01:234/A*01:235/A*01:236/A*01:237/A*01:238/A*01:239/A*01:240N/A*01:241/A*01:242/A*01:243/A*01:244/A*01:245/A*01:246/A*01:247N/A*01:248Q/A*01:249/A*01:250N/A*01:251/A*01:252/A*01:253/A*01:254/A*01:255/A*01:256/A*01:257/A*01:258N/A*01:259/A*01:260/A*01:261

Output from ard.redux_gl

A*01:01/A*01:02/A*01:03/A*01:04N/A*01:06/A*01:07/A*01:08/A*01:09/A*01:10/A*01:100/A*01:101/A*01:102/A*01:103/A*01:104/A*01:105/A*01:106/A*01:107/A*01:108/A*01:109/A*01:110/A*01:111/A*01:112/A*01:113/A*01:114/A*01:115/A*01:116/A*01:117/A*01:118/A*01:119/A*01:11N/A*01:12/A*01:120/A*01:121/A*01:122/A*01:123N/A*01:124/A*01:125/A*01:126/A*01:127/A*01:128/A*01:129/A*01:13/A*01:130/A*01:131/A*01:132/A*01:133/A*01:134/A*01:135/A*01:136/A*01:137/A*01:138/A*01:139/A*01:14/A*01:140/A*01:141/A*01:142/A*01:143/A*01:144/A*01:145/A*01:146/A*01:147Q/A*01:148/A*01:149/A*01:150/A*01:151/A*01:152/A*01:153/A*01:154/A*01:155/A*01:156/A*01:157/A*01:158/A*01:159/A*01:15N/A*01:160N/A*01:161/A*01:162N/A*01:163/A*01:164/A*01:165/A*01:166/A*01:167/A*01:168/A*01:169/A*01:16N/A*01:17/A*01:170/A*01:171/A*01:172/A*01:173/A*01:174/A*01:175/A*01:176/A*01:177/A*01:178N/A*01:179N/A*01:180/A*01:181/A*01:182/A*01:183/A*01:184/A*01:185/A*01:186N/A*01:187/A*01:188/A*01:189/A*01:18N/A*01:19/A*01:190/A*01:191/A*01:192/A*01:193/A*01:194/A*01:195/A*01:196/A*01:197/A*01:198/A*01:199/A*01:20/A*01:200/A*01:201/A*01:202/A*01:203/A*01:204/A*01:205/A*01:206/A*01:207/A*01:208Q/A*01:209/A*01:21/A*01:210/A*01:211/A*01:212/A*01:213/A*01:214/A*01:215/A*01:216/A*01:217/A*01:218/A*01:219/A*01:220/A*01:221/A*01:222/A*01:223/A*01:224/A*01:225/A*01:226/A*01:227/A*01:228Q/A*01:229/A*01:22N/A*01:23/A*01:230/A*01:231/A*01:232/A*01:233/A*01:234/A*01:235/A*01:236/A*01:237/A*01:238/A*01:239/A*01:24/A*01:240N/A*01:241/A*01:242/A*01:243/A*01:244/A*01:245/A*01:246/A*01:247N/A*01:248Q/A*01:249/A*01:25/A*01:250N/A*01:251/A*01:252/A*01:253/A*01:254/A*01:255/A*01:256/A*01:257/A*01:258N/A*01:259/A*01:26/A*01:260/A*01:261/A*01:27N/A*01:28/A*01:29/A*01:30/A*01:31N/A*01:32/A*01:33/A*01:35/A*01:36/A*01:37/A*01:38/A*01:39/A*01:40/A*01:41/A*01:42/A*01:43/A*01:44/A*01:45/A*01:46/A*01:47/A*01:48/A*01:49/A*01:50/A*01:51/A*01:52N/A*01:53N/A*01:54/A*01:55/A*01:56N/A*01:57N/A*01:58/A*01:59/A*01:60/A*01:61/A*01:62/A*01:63/A*01:64/A*01:65/A*01:66/A*01:67/A*01:68/A*01:69/A*01:70/A*01:71/A*01:72/A*01:73/A*01:74/A*01:75/A*01:76/A*01:77/A*01:78/A*01:79/A*01:80/A*01:81/A*01:82/A*01:83/A*01:84/A*01:85/A*01:86/A*01:87N/A*01:88/A*01:89/A*01:90/A*01:91/A*01:92/A*01:93/A*01:94/A*01:95/A*01:96/A*01:97/A*01:98/A*01:99

Resolve pandas requirement for compatibility with py-gfe

Installing py-ard 0.6.1 with py-gfe 1.1.0 results in an error:

ERROR: Cannot install py-ard==0.6.1 and py-gfe==1.1.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    py-ard 0.6.1 depends on pandas>=1.1.4
    py-gfe 1.1.0 depends on pandas==0.25.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

Also see issue 32 in py-gfe repo:
nmdp-bioinformatics/py-gfe#32

Validation: incorrect processing of MAC designations

>>> ard.redux_gl("C*05:ARAX", "lg")
'C*05:ARAXg'
>>> ard.redux_gl("C*05:ARAXBUG", "lg")
'C*05:ARAXBUg'
>>> ard.redux_gl("C*05:ARAXBUGSUCKS", "lg")
'C*05:ARAXBUGSUCKSg'

All three cases should return invalid MAC.
Note what the 2nd case did with the trailing "G": it came back as a lowercase "g".
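A sketch of the behaviour the issue asks for: split a typing into locus, first field and letter code, and reject it outright when the code is not in the MAC table, instead of passing it through. `expand_mac`, `InvalidMACError` and the `known_macs` table (code to list of second fields) are hypothetical; the real MAC table is far larger and its entries can be more complex.

```python
import re

# locus "*" first field ":" letter code, e.g. "C*05:ARAX"
MAC_PATTERN = re.compile(r"(?P<locus>[A-Z0-9]+)\*(?P<field1>\d+):(?P<code>[A-Z]+)")


class InvalidMACError(ValueError):
    """Raised for a MAC code that is not in the MAC table."""


def expand_mac(typing, known_macs):
    """Expand a MAC typing, refusing codes that are not in known_macs.

    known_macs is a hypothetical code -> list-of-second-fields table.
    """
    m = MAC_PATTERN.fullmatch(typing)
    if m is None:
        raise InvalidMACError(f"not a MAC typing: {typing}")
    code = m.group("code")
    if code not in known_macs:
        # Reject instead of passing the code through, so a trailing "G"
        # can never be mistaken for a G-group suffix.
        raise InvalidMACError(f"invalid MAC: {typing}")
    prefix = f"{m.group('locus')}*{m.group('field1')}:"
    return "/".join(prefix + second for second in known_macs[code])


toy_macs = {"AB": ["01", "02"]}  # invented mapping, for illustration only
print(expand_mac("C*05:AB", toy_macs))  # C*05:01/C*05:02
```

With this check, all three typings from the report raise `InvalidMACError` rather than producing a `g`-suffixed string.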

memory hog

py-ard is a memory hog. Try to reduce its memory footprint. Maybe back the MAC dictionary with a file?
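One way to "tie the dictionary to a file", sketched with the standard-library `sqlite3` module: MAC expansions live in a table on disk and are fetched one code at a time, so only the codes actually queried become Python objects. The table layout and the `create_mac_db` / `lookup_mac` names are assumptions for illustration, not py-ard's schema.

```python
import sqlite3


def create_mac_db(path, mac_items):
    """Store (code, alleles) pairs in SQLite; path may be a file or ':memory:'."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute(
            "CREATE TABLE IF NOT EXISTS mac (code TEXT PRIMARY KEY, alleles TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO mac (code, alleles) VALUES (?, ?)", mac_items
        )
    return conn


def lookup_mac(conn, code):
    """Fetch one code's expansion from disk; None if the code is unknown."""
    row = conn.execute("SELECT alleles FROM mac WHERE code = ?", (code,)).fetchone()
    return row[0] if row else None


conn = create_mac_db(":memory:", [("AB", "01/02")])  # toy row, not real MAC data
print(lookup_mac(conn, "AB"))    # 01/02
print(lookup_mac(conn, "ZZZZ"))  # None
```

`:memory:` is used only to keep the example self-contained; to actually save memory the database would be opened from a file path.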

pickling of the mac generates the file in different locations and leaves behind intermediate files

Related to issue #20: depending on how the code is called, the mac.pickle file ends up either in the cwd or in ./venv/lib/python3.7/site-packages/pyard-0.0.10-py3.7.egg/pyard/mac.pickle.

Pickling does seem to work, but the code in util.py leaves a copy of numeric.v3.zip in the current directory (it doesn't use data_dir for the path), and then leaves a copy of numer.v3.txt in data_dir along with mac.txt and mac.pickle, which puts 4 copies of the MAC data on the filesystem. Add a feature to remove the intermediate files.
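A sketch that avoids intermediate files altogether: the downloaded archive is kept as bytes (for example `requests.get(url).content`), its members are read straight out of the zip in memory, and only `mac.pickle` is written to `data_dir`. The `CODE<TAB>ALLELES` line format is a placeholder assumption and the function name is hypothetical; the real numer.v3.txt layout should be checked.

```python
import io
import os
import pickle
import zipfile


def build_mac_pickle(zip_bytes, data_dir):
    """Parse a MAC archive held in memory and write only data_dir/mac.pickle.

    ASSUMPTION (placeholder parse): each text member has "CODE<TAB>ALLELES"
    lines; adapt this to the real file layout.
    """
    macs = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            with zf.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    parts = line.rstrip("\n").split("\t")
                    if len(parts) == 2:
                        macs[parts[0]] = parts[1]
    # Nothing was extracted to disk, so there is nothing to clean up.
    os.makedirs(data_dir, exist_ok=True)
    out_path = os.path.join(data_dir, "mac.pickle")
    with open(out_path, "wb") as fh:
        pickle.dump(macs, fh)
    return out_path
```

If the archive were too large to hold in memory, extracting into a `tempfile.TemporaryDirectory()` would give the same cleanup guarantee.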

py-ard considers 'DRB3*NNNN' v2 typing

DRB3*NNNN or DRB3:XXXX passes the v2 test and fails reduction.

% pyard --gl "DRB3*NNNN" -r lgx
...
    if isinstance(flags, RegexFlag):
RecursionError: maximum recursion depth exceeded while calling a Python object
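A sketch of a stricter v2 check that would keep placeholders like `DRB3*NNNN` out of the v2 path. It assumes a plain v2 name is a locus, `*`, and at least four digits with an optional trailing expression letter and no `:` separators; `looks_like_v2` is a hypothetical helper, not py-ard's actual test.

```python
import re

# ASSUMPTION: a plain v2 name is locus + "*" + at least four digits, with an
# optional trailing expression letter and no ":" separators.
V2_PATTERN = re.compile(r"[A-Z][A-Z0-9]*\*\d{4,}[A-Z]?")


def looks_like_v2(typing):
    """True only for typings that fit the (assumed) plain v2 pattern."""
    return V2_PATTERN.fullmatch(typing) is not None


for t in ["A*0101", "DRB1*150101", "DRB3*NNNN", "DRB3:XXXX", "A*01:01"]:
    print(t, looks_like_v2(t))
```

This only covers plain numeric v2 names; it would need extending if other v2 forms are also meant to pass.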

redux_gl works for lgx but not G

ard.redux_gl("A*02:01/A*02:786", "lgx")
'A*02:01'

ard.redux_gl("A*02:01/A*02:786", "G")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/b/py-ard/pyard/pyard.py", line 173, in redux_gl
    key=functools.cmp_to_key(smart_sort_comparator)))
  File "/private/tmp/b/py-ard/pyard/smart_sort.py", line 76, in smart_sort_comparator
    a1_f3 = int(a1_fields[2])
ValueError: invalid literal for int() with base 10: '01/A*02'
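The `ValueError` shows a field of `'01/A*02'` reaching `int()`, i.e. a reduced value that was itself a `/`-joined list got sorted as if it were a single allele. A sketch of the fix: split every reduced value on `/` and de-duplicate before sorting. `redux_allele_list` and `reduce_one` are hypothetical, and the toy reductions below are invented.

```python
def redux_allele_list(gl, reduce_one, sort_key=None):
    """Reduce each allele in a "/" list and return a flat, sorted, unique list.

    reduce_one is a hypothetical per-allele reducer whose result may itself
    be a "/"-joined ambiguity, so every result is split again before sorting.
    """
    reduced = set()
    for allele in gl.split("/"):
        for part in reduce_one(allele).split("/"):
            reduced.add(part)
    return "/".join(sorted(reduced, key=sort_key))


# Invented reductions, for illustration only.
toy = {"A*02:01": "A*02:01:01G", "A*02:786": "A*02:01:01G/A*02:786"}
print(redux_allele_list("A*02:01/A*02:786", toy.get))  # A*02:01:01G/A*02:786
```

Any field-aware sort key can be passed as `sort_key`, since it now only ever sees single alleles.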

DRB1*02:XX

Broad codes like DRB1*02:XX should be validated and expanded appropriately (as the combined result of expanding DRB1*15:XX and DRB1*16:XX).
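A sketch of that expansion as a set union over the splits' XX expansions. Only the DRB1*02 to DRB1*15/DRB1*16 relationship comes from the issue; `BROAD_SPLITS`, `expand_broad_xx` and the `expand_xx` callable (standing in for the existing XX lookup) are hypothetical, as are the toy alleles.

```python
# Hypothetical broad -> splits table; only DRB1*02 -> 15/16 is from the issue.
BROAD_SPLITS = {"DRB1*02": ["DRB1*15", "DRB1*16"]}


def expand_broad_xx(typing, expand_xx):
    """Expand "<locus>*<field1>:XX", adding the splits' expansions for a broad."""
    prefix, sep, code = typing.partition(":")
    if sep != ":" or code != "XX":
        raise ValueError(f"not an XX typing: {typing}")
    alleles = set(expand_xx(typing))  # anything listed under the broad itself
    for split in BROAD_SPLITS.get(prefix, []):
        alleles |= set(expand_xx(f"{split}:XX"))
    if not alleles:
        raise ValueError(f"XX typing expands to nothing: {typing}")
    return alleles


toy = {  # invented expansions, for illustration only
    "DRB1*15:XX": {"DRB1*15:01", "DRB1*15:02"},
    "DRB1*16:XX": {"DRB1*16:01"},
}
print(sorted(expand_broad_xx("DRB1*02:XX", lambda t: toy.get(t, set()))))
```

Raising when the union is empty is one way to cover the "validated" half of the request.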

force a rebuild of sqlite3 database on re-install

Any change to the py-ard module (not just a new release version, but even minor changes) should force a rebuild of the sqlite3 database. There are many ways to do this; one is to generate a build id each time py-ard is pip installed and store it in the sqlite3 database.

Then, if the module is re-installed, the build id is checked against the one in the database, and on a mismatch the database is blown away and rebuilt.
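A sketch of the build-id check using the standard-library `sqlite3` module. How the id gets generated at install time is left open (`new_build_id` just returns a fresh UUID); the `meta` table, `open_db` and the `populate` callback are hypothetical names, not py-ard's.

```python
import os
import sqlite3
import uuid


def new_build_id():
    """Hypothetical install-time hook: a fresh id for every install."""
    return uuid.uuid4().hex


def _stored_build_id(db_path):
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT value FROM meta WHERE key = 'build_id'").fetchone()
    except sqlite3.OperationalError:
        row = None  # no meta table: treat the database as stale
    finally:
        conn.close()
    return row[0] if row else None


def open_db(db_path, build_id, populate):
    """Open db_path, deleting and rebuilding it if its build id differs."""
    if os.path.exists(db_path) and _stored_build_id(db_path) != build_id:
        os.remove(db_path)  # stale cache from a different install
    fresh = not os.path.exists(db_path)
    conn = sqlite3.connect(db_path)
    if fresh:
        with conn:
            populate(conn)  # load the reference tables
            conn.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT)")
            conn.execute("INSERT INTO meta VALUES ('build_id', ?)", (build_id,))
    return conn, fresh
```

Storing the id inside the same database file keeps the cache and its provenance together, so a copied or moved cache is still checked correctly.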

handle invalid alleles

pyARD appears to fail ungracefully when reducing a glstring that contains invalid alleles

>>> ard.redux_gl("A*01:01:01:01^B*07:02:01", "G")
'A*01:01:01G^B*07:02:01G'
>>> ard.redux_gl("A*01:01:01:01^B*07:02:01:12", "G")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmaiers/src/git/pyARD-fork/venv/lib/python3.7/site-packages/pyard-0.0.10-py3.7.egg/pyard/pyard.py", line 393, in redux_gl
    return "^".join(sorted(set([self.redux_gl(a, redux_type) for a in glstring.split("^")]), key=functools.cmp_to_key(loci_sort)))
  File "/Users/mmaiers/src/git/pyARD-fork/venv/lib/python3.7/site-packages/pyard-0.0.10-py3.7.egg/pyard/pyard.py", line 57, in loci_sort
    lb = b.split(":")
AttributeError: 'NoneType' object has no attribute 'split'

This could be addressed by implementing glstring validation (in parallel with redux or as part of it), followed by graceful error handling.

It seems to deal with single alleles OK

>>> ard.redux("A*01:99:99", "G")
>>> ard.redux("A*01:99", "G")
'A*01:99'

and

'A*01:99'
>>> ard.redux_gl("A*01:99:99", "G")

but when glstring characters are included (^, +, /, |) it fails
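A sketch of the validation-plus-graceful-error idea: split the GL string on its delimiters (the four listed above plus `~`, the GL String haplotype delimiter), reduce each allele, and raise a clear error for any allele that reduces to `None` instead of letting `None` reach the sort. `redux_gl_checked`, `InvalidAlleleError` and `reduce_one` are hypothetical; the sketch does not sort or de-duplicate.

```python
import re

GL_DELIMITERS = {"^", "|", "+", "~", "/"}


class InvalidAlleleError(ValueError):
    """Raised instead of letting a None reduction crash a later sort."""


def redux_gl_checked(glstring, reduce_one):
    """Reduce each allele in a GL string, keeping the delimiters in place.

    reduce_one is a hypothetical per-allele reducer returning None for
    alleles it does not recognise.
    """
    out = []
    for token in re.split(r"([\^|+~/])", glstring):
        if token in GL_DELIMITERS or token == "":
            out.append(token)
            continue
        reduced = reduce_one(token)
        if reduced is None:
            raise InvalidAlleleError(f"invalid allele {token!r} in {glstring!r}")
        out.append(reduced)
    return "".join(out)


# Toy reductions taken from the working example in this issue.
toy = {"A*01:01:01:01": "A*01:01:01G", "B*07:02:01": "B*07:02:01G"}
print(redux_gl_checked("A*01:01:01:01^B*07:02:01", toy.get))
```

The second call from the report, with `B*07:02:01:12`, would raise `InvalidAlleleError` naming the offending allele.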

bug in py-ard when the lowest numbered allele differs from the group name

>>> from pyard import ARD
>>> ard = ARD(imgt_version="3430")
>>> ard.redux_gl("C*02:10", "lgx")
'C*02:10'
>>> ard.redux_gl("C*02:02:37", "lgx")
'C*02:10'
>>> ard.redux_gl("C*02:02:37", "lg")
'C*02:10g'
>>> ard.redux_gl("C*02:BC", "lg")
'C*02:03g/C*02:10g'
>>> ard = ARD(imgt_version="3290")
>>> ard.redux_gl("C*02:BC", "lg")
'C*02:02g/C*02:03g'
>>> ard = ARD(imgt_version="3430")
>>> ard.redux_gl("C*02:02", "lg")
'C*02:10g'
>>> ard.redux_gl("C*02:02:02", "lg")
'C*02:02g'
>>> ard.redux_gl("C*02:70", "lg")
'C*02:02g'
>>> ard.redux_gl("C*02:168", "lg")
'C*02:10g'
>>> ard = ARD(imgt_version="3290")
>>> ard.redux_gl("C*02:02", "lg")
'C*02:02g'
