poonlab / sierra-local Goto Github PK
View Code? Open in Web Editor NEWRetrieve HIVdb algorithm as XML and apply locally to HIV sequences
License: GNU General Public License v3.0
Retrieve HIVdb algorithm as XML and apply locally to HIV sequences
License: GNU General Public License v3.0
This algorithm assigns integer scores to a sequence for each antiretroviral drug (ARV) for a score-based classifier of resistance.
Running sudo python3 setup.py install
for macOS (Darwin 16.7) yields the following:
/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/dist.py:261: UserWarning: Unknown distribution option: 'include_package_data'
warnings.warn(msg)
running install
running build
running build_py
running build_scripts
running install_lib
running install_scripts
changing mode of /opt/local/Library/Frameworks/Python.framework/Versions/3.5/bin/sierralocal to 755
running install_egg_info
Removing /opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sierralocal-0.0.1-py3.5.egg-info
Writing /opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sierralocal-0.0.1-py3.5.egg-info
Hence the sierralocal
binary is installed to the /opt
directory where Python 3.5 lives, and not /usr/local/bin
. Since /opt/local/..
is not in PATH
, the user can't run the binary without specifying a path.
If network connection is available and newer ASI version exists, then we should automatically download it to the local filesystem. If no network connection is available then program should print a warning and then proceed with the existing ASI file. (Consider distributing ASI in repo?)
>python sierralocal.py seq.fasta -o OUTPUT.json
File "sierralocal.py", line 56
print "scoring",query
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(print "scoring",query)?
As noted by user, this is a Python2->3 issue.
>python sierralocal.py input.fasta -o OUTPUT.json
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\path..\\..\\sierra-local-master./data/HIVDB.xml'
I think the user needed to pull the XML before running sierralocal.py
- provide more informative error message?
This is the nucamino output for the sequence FJ199571:
['FJ199571.A143M.C.616', '595', '695', '1', '303', 'L607M:ATG,V621I:ATT,T625S:AGT,D626E:GAA,H638Q:CAG,L646S:TCA,S668N:AAT,S674N:AAC,Q679H:CAT,K685R:AGG,A689S:TCA', '']
Variable firstAA
takes value 595
. In nucaminohook.NucAminoAligner
, the dictionary self.gene_map
is:
{'PR': (56, 154), 'RT': (155, 594), 'IN': (715, 1003)}
so the sequence apparently falls between RT and IN.
For example, for a sequence with PR and RT:
"consensus": "V",
"position": 134,
"AAs": "T",
should be:
"consensus": "V",
"position": 35,
"AAs": "T",
Mixture mutations were being counted as present, and thus contributing to resistance, more than once in certain cases. This is due to the current implementation not checking if that residue has been counted in both the partial score for combination-DRMs and single DRMs.
This leads to overestimate of resistance with a small specific set of mutations.
Thinking about converting the data structures to Classes:
i.e. list of Query(), one per query sequence, that each have attributes including list of Mutation(), NASeq, scores[ ], length, etc.,
compared to the current (messier but faster) implementation of individuals lists of mutations, scores, sequence headers, etc.
However, using objects results in reduced performance (167 items per second, down from 280 items per second), but is more readable/extensible. It's easier and more clear to work with sequence attributes in this way. Thoughts on the performance/clarity trade-off, @ArtPoon ?
Currently a deletion is identified as K70X
, although the sierra pipeline labels the same feature as K70-
.
Expected behaviour should be to run with the new XML, without having to rerun the script.
This results in the starting codon of PR (CCT) being omitted, as NucAmino assigns this AA position 58 (not 57). Consequently a large number of such sequences go unsubtyped because the resulting sequence is out of alignment.
If the results are the same, then package release to PyPI as version 0.0.2 (#35) and switch to submitting the paper.
Does anyone good with regex know how to capture negative integers as part of a string?
The current expression is: scores = re.findall('[0-9]+(?=\W)', drm.strip())
, which captures only the positive numbers in the drug resistance mutation string. However, this excludes the negative scores which is the last barrier I'm encountering to producing validated results.
Example content which I need to extract '-10'
, among the other numbers from:
Fix hard-coded directory and file paths (e.g. hivdb_file = "./data/HIVDB.xml"
) using Python 3.4+'s pathlib.Path()
. May also need os.path
. This should make sure that all the filepaths work on every OS.
The format of GLOBALRANGE
in the XML file is:
<GLOBALRANGE>
(-INF TO 9 => 1, 10 TO 14 => 2, 15 TO 29 => 3, 30 TO 59 => 4, 60 TO INF => 5)
</GLOBALRANGE>
I have parsed this expression into a dictionary where the orders are keys and the ranges are values. Should the ranges and orders be represented numerically or as strings (as below)?
{'1': '-INF TO 9', '2': '10 TO 14', '3': '15 TO 29', '4': '30 TO 59', '5': '60 TO INF'}
Number of discordant calls went up from 0 to 147. PR and RT results are fine.
I have temporarily removed subtyping and sequence trimming (see #40) but this does not seem to be responsible for the discrepancy. Need to zero in on one of the discordant sequences and see why we're getting different results.
Post-processing pairwise alignment, we need some data structure for recording the location and composition of sequence insertions. For example:
AGTCAT---TTTGGA # reference
AGTCATCCCTTTGGA # query
where AGT
is codon location 0 (zero-indexing), we can record as:
{'rpos': 1, 'seq': 'CCC'}
Ubuntu workstation running 4.15.0-32-generic
sudo apt install python3-pip
in order to get pip3
. This also installed several other packages including gcc
and python3.6
:
The following NEW packages will be installed:
build-essential dh-python dpkg-dev fakeroot g++ g++-7 gcc gcc-7
libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl
libasan4 libatomic1 libc-dev-bin libc6-dev libcilkrts5 libexpat1-dev
libfakeroot libgcc-7-dev libitm1 liblsan0 libmpx2 libpython3-dev
libpython3.6-dev libquadmath0 libstdc++-7-dev libtsan0 libubsan0
linux-libc-dev make manpages-dev python-pip-whl python3-dev
python3-distutils python3-lib2to3 python3-pip python3-setuptools
python3-wheel python3.6-dev
pip3 install sierralocal
ran successfully:
Collecting sierralocal
Downloading https://files.pythonhosted.org/packages/ce/a8/2501b1b3ad9b8abc7994b7ece3d325b3d765401c4ce7f5760bda765f5267/sierralocal-0.0.1.tar.gz (1.7MB)
100% |████████████████████████████████| 1.7MB 1.2MB/s
Building wheels for collected packages: sierralocal
Running setup.py bdist_wheel for sierralocal ... done
Stored in directory: /home/art/.cache/pip/wheels/e1/2f/28/fab709a2a25b279a11774fbae50520a3dfeb370ad940e84c68
Successfully built sierralocal
Installing collected packages: sierralocal
Successfully installed sierralocal-0.0.1
sierralocal
is not in PATH
:
art@Lio:~$ sierralocal
sierralocal: command not found
~/.local
, which is not in the default PATH
after Ubuntu 16.04. Calling the script directly works fine. We can also run the following:
export PATH=~/.local/bin:$PATH
scripts/retrieve_hivdb_data.py
script and ran it:
python3 retrieve_hivdb_data.py RT RT.fa
art@Lio:~$ sierralocal RT.fa
Error: could not retrieve HIVDB XML. Updating...
Unable to update HIVDB XML. Try running this script from the root directory.
Error: could not retrieve APOBEC DRM data. Updating...
Unable to update APOBEC DRMs. Try running this script from the root directory.
Traceback (most recent call last):
File "/home/art/.local/bin/sierralocal", line 11, in <module>
sys.exit(main())
File "/home/art/.local/lib/python3.6/site-packages/sierralocal/__init__.py", line 56, in main
algorithm = HIVdb()
File "/home/art/.local/lib/python3.6/site-packages/sierralocal/hivdb.py", line 47, in __init__
self.root = xml.parse(self.xml_filename).getroot()
File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
tree.parse(source, parser)
File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 586, in parse
source = open(source, "rb")
TypeError: expected str, bytes or os.PathLike object, not NoneType
To make it easier to run sierralocal without sudo privileges.
See #42
The current implementation with HXB2 pol test sequence
jasper@asus:~/git/sierra-local$ python sierra-local.py -i hxb2pol.fasta
results in the following:
{'ETR': 10, 'didanosine': 10, 'abacavir': 5, 'DTG': 5, 'EVG': 10, 'saquinavir/r': 75, 'nevirapine': 60, 'rilpivirine': 10, 'DRV/r': 30, 'SQV/r': 75, 'lopinavir/r': 90, 'AZT': 20, 'IDV/r': 90, 'atazanavir/r': 90, 'indinavir/r': 90, '3TC': 0, 'fosamprenavir/r': 75, 'FPV/r': 75, 'lamivudine': 0, 'TPV/r': 105, 'etravirine': 10, 'D4T': 20, 'ABC': 5, 'raltegravir': 60, 'tenofovir': 5, 'emtricitabine': 0, 'stavudine': 20, 'EFV': 60, 'NFV': 90, 'DDI': 10, 'nelfinavir': 90, 'dolutegravir': 5, 'tipranavir/r': 105, 'elvitegravir': 10, 'LPV/r': 90, 'NVP': 60, 'FTC': 0, 'RPV': 10, 'TDF': 5, 'zidovudine': 20, 'ATV/r': 90, 'efavirenz': 60, 'darunavir/r': 30, 'RAL': 60}
Compared to the output from HIVdb, these scores are higher than the predicted resistances:
sierrapy
gives the correct output.
Double checked this with just score_alg.py, just in case this was an issue with my implementation. Will take a closer look this weekend.
Ideal use case
import sierra
json = sierra.score(<path to FASTA>)
Somehow I've managed to make things a lot slower :-P
Before (commit c51f610..f4684e5):
art@orolo:~/git/sierra-local$ sierralocal test.fa
sierralocal/data/HIVDB_8.6.1.ca26d72f.xml
Found NucAmino binary nucamino-linux-amd64
AF013440.143P8W0.B.3 0 1
AF021278.JIDP2W0.B.14 2 0
Y16151.RO-BCI9.F.24 0 4
AY802655.CA9914.B.51 0 1
AF169102.Vent31B.B.60 0 6
AF153404.MilazzoTO1.B.69 0 3
100 sequences found in file test.fa.
sierralocal/data/HIVDB_8.6.1.ca26d72f.xml
Found HIVdb file sierralocal/data/HIVDB_8.6.1.ca26d72f.xml
Writing JSON to file test-local.json
Time elapsed: 1.8673 seconds (53.553 it/s)
After (last commit cb737b4ef2a99bcbaaafdefecb06c5d15b09dddb
):
art@orolo:~/git/sierra-local$ sierralocal test.fa
HIVdb version 8.6.1
Found NucAmino binary nucamino-linux-amd64
test.fa
100 sequences found in file test.fa.
Writing JSON to file test_results.json
Time elapsed: 6.5858 seconds (15.356 it/s)
https://github.com/hivdb/NucAmino
It's written in the Go programming language, with the option of using a Docker container..
PR outputs are reasonable but RT and INT are not aligned to the references.
Time permitting. I've done this before in other Python scripts:
https://github.com/cfe-lab/mixed-hcv/blob/master/mixed-hcv.py
import Tkinter as tk
import ttk
import tkFileDialog
class App:
def __init__(self, master):
self.__version__ = '0.1'
self.master = master
frame = tk.Frame(master) # simple container to hold other widgets
frame.pack() # resize to contents and make visible
Running sudo python setup.py install
on macOS 10.13.3 (High Sierra) with Python 3.5.4 (via MacPorts) installs the sierralocal
binary to /opt/local/Library/Frameworks/Python.framework/Versions/3.5/bin/sierralocal
.
Looks like pip will do the same thing. See pypa/pip#113
See https://hivdb.stanford.edu/page/hiv-subtyper/ for the process:
- If the distance to the closest matching reference sequence is greater than 11%, the subtype is reported as unknown.
- If the distance to the closest matching reference sequence is below the distance upper-limit for that subtype and the closest matching reference is a pure subtype, CRF01_AE, CRF02_AG, or other non-simple CRF (as defined in the subtype properties table), then this subtype is reported to the user.
- If the distance to the closest matching reference sequence is below the distance upper-limit for that subtype and the closest matching reference is a simple CRF, then the program determines whether the submitted sequence contains sufficient coverage on either side of the established breakpoint(s) for that CRF. If there is sufficient coverage, then the program reports this CRF. Otherwise, the program reports the parent subtype for the region covered by the submitted sequence.
- If the distance to the closest matching reference sequence is above the distance upper-limit for that subtype and the closest matching reference is a CRF01_AE, CRF02_AG, or a simple CRF, we examine the original table for the next-closest matching parent sequence. If the distance to the parent sequence is within 1% of the closest matching sequence, then the parent sequence is reported. If the distance to the parent sequence is more than 1% greater than the distance to the recombinant, then we consider the subtype to be a URF and each of the parents of the recombinant are reported separated by a plus sign. Subtype A is considered to be the parent of CRF01_AE and subtypes A and G are considered to be the parents of CRF02_AG.
- If the distance to the closest matching reference sequence is above the distance upper-limit for that subtype and the closest matching reference is a pure subtype, then this pure subtype is reported to the user with a warning indicating that further analysis using a more sophisticated subtyping program is indicated.
The benchmarks in the first version of the paper used subtype classifications from the sequence data (HIVdb predictions) - this skips the local subtyping step so naturally the algorithm runs much faster. For a fair comparison we need to measure computing time with local subtyping.
Sierra interprets this as a mutation to X (unknown).
NucAmino, the base of our implementation, outputs the gap as a mixture (i.e. not X) which is handled by the rest of the algorithm as the same as any other mixture.
In limited cases, this may lead to a DRM being 'present' at the deletion locus and contributing to the penalty score where the original algorithm would not.
Hi @ArtPoon
I'm trying to run sierra-local, but I get errors if I don't include an XML:
Error: could not retrieve HIVDB XML
Unable to update HIVDB XML. Try manually downloading the HIVdb ASI2.
Traceback (most recent call last):
File "/usr/local/bin/sierralocal", line 11, in <module>
exit_code = main.main()
File "/usr/local/lib/python3.5/dist-packages/sierralocal/main.py", line 157, in main
count, time_elapsed = sierralocal(args.fasta, args.outfile, args.xml, args.skipalign, args.forceupdate)
File "/usr/local/lib/python3.5/dist-packages/sierralocal/main.py", line 111, in sierralocal
algorithm = HIVdb(path=xml, forceupdate=forceupdate)
File "/usr/local/lib/python3.5/dist-packages/sierralocal/hivdb.py", line 58, in __init__
if not os.path.isfile(Path(os.path.dirname(__file__))/'data'/'apobec.tsv'):
File "/usr/lib/python3.5/genericpath.py", line 30, in isfile
st = os.stat(path)
TypeError: argument should be string, bytes or integer, not PosixPath
while including the HIVdb XML gives the following error
Traceback (most recent call last):
File "/usr/local/bin/sierralocal", line 11, in <module>
exit_code = main.main()
File "/usr/local/lib/python3.5/dist-packages/sierralocal/main.py", line 157, in main
count, time_elapsed = sierralocal(args.fasta, args.outfile, args.xml, args.skipalign, args.forceupdate)
File "/usr/local/lib/python3.5/dist-packages/sierralocal/main.py", line 111, in sierralocal
algorithm = HIVdb(path=xml, forceupdate=forceupdate)
File "/usr/local/lib/python3.5/dist-packages/sierralocal/hivdb.py", line 58, in __init__
if not os.path.isfile(Path(os.path.dirname(__file__))/'data'/'apobec.tsv'):
File "/usr/lib/python3.5/genericpath.py", line 30, in isfile
st = os.stat(path)
TypeError: argument should be string, bytes or integer, not PosixPath
Looks like a user permissions problem:
searching path /usr/local/lib/python3.6/dist-packages/sierralocal/data/HIVDB*.xml
Error: could not find local copy of HIVDB XML, attempting download...
Unable to update HIVDB XML. Try manually downloading the HIVdb ASI2.
[Errno 13] Permission denied: '/usr/local/lib/python3.6/dist-packages/sierralocal/data/HIVDB_8.7.9e470b87.xml'
/usr/local/lib/python3.6/dist-packages/sierralocal/data/apobec.tsv
Error: could not retrieve APOBEC DRM data
Unable to update APOBEC DRMs. Try manually downloading the APOBEC DRM TSV into data/apobec.tsv
Traceback (most recent call last):
File "/usr/local/bin/sierralocal", line 11, in <module>
exit_code = main.main()
File "/usr/local/lib/python3.6/dist-packages/sierralocal/main.py", line 174, in main
cleanup=args.cleanup, forceupdate=args.forceupdate)
File "/usr/local/lib/python3.6/dist-packages/sierralocal/main.py", line 107, in sierralocal
algorithm = HIVdb(path=xml, forceupdate=forceupdate)
File "/usr/local/lib/python3.6/dist-packages/sierralocal/hivdb.py", line 76, in __init__
self.root = xml.parse(str(self.xml_filename)).getroot()
File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
tree.parse(source, parser)
File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 586, in parse
source = open(source, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'None'
Installation history:
sudo apt install python3-pip
sudo -H python3 -m pip install --index-url https://test.pypi.org/simple/ sierralocal
If the sequence spans the entire reading frame, we get the correct mutations:
Sequence Name | POL FirstAA | POL LastAA | POL FirstNA | POL LastNA | POL Mutations |
---|---|---|---|---|---|
DJ264.France | 716 | 1003 | 1 | 864 | K729R:AGA,V746I:ATA,S754C:TGC,L816I:ATA,T827V:GTA,T839A:GCT,T840A:GCA,G849N:AAT,I850V:GTC,K851T:ACA,V916I:ATA,T921S:TCA,T933I:ATA,L949I:ATT,N969K:AAG,R984K:AAA,S998G:GGT |
This works fine and leads to correct output.
However, if the sequence is truncated, the mutations we get are off by a constant shift. Trying to figure out the math here to correct this dynamically.
Sequence Name | POL FirstAA | POL LastAA | POL FirstNA | POL LastNA | POL Mutations |
---|---|---|---|---|---|
T29INT.Norway | 795 | 1003 | 1 | 627 | Y798F:TTC,G821A:GCA,T827V:GTA,S834P:CCT,T840A:GCC,V841M:ATG,A848T:ACA,K851Q:CAA,Q852H:CAT,S868A:GCC,K875S:TCC,G878Q:CAG,I897V:GTT,R902RG*:NGA,S910T:ACT,I915L:TTA,V916I:ATA,I919L:TTA,T921S:TCA,D922Q:CAA,K926T:ACA,T933L:TTA,N937K:AAA,L949I:ATT,K955Q:CAG,N969K:AAA,S970G:GGG,K973E:GAG,K988Q:CAA,D993T:ACT,D1001T:ACA,D1003S:AGT |
What we should get from this:
Y83F, G106A, T112V, S119P, T125A, V126M, A133T, K136Q, Q137H, S153A, K160S, G163Q, I182V, R187R*G, S195T, I200L, V201I, I204L, T206S, D207Q, K211T, T218L, N222K, L234I, K240Q, N254K, S255G, K258E, K273Q, D278T, D286T
What we get instead:
D128Q K132T D209S T139L N175K N143K Y4F G27A T33V L155I S40P D199T T46A V47M S176G K179E A54T K57Q Q58H K194Q K161Q S74A D207T K81S G84Q I103V R108R S116T I121L V122I I125L T127S
They're off by 79. This leads to the wrong mutations being scored/omitted in the algorithm
Using the testing script, I ran ~1000 consecutive sequences from HIVdb of each of PR, IN, RT on both sierra-local and sierrapy, and compared the outputs:
Gene | Absolute Match Ratio |
---|---|
PR | 0.983 |
IN | 1.000 |
RT | 0.950 |
Remaining issues are related to polymorphisms and idiosyncrasies in how sierrapy gets its information versus what I get. When an error exists, they are systematically higher than expected by several specific intervals, leading me to suspect that it's the accidental inclusion of a set of specific DRMs in cases where sierrapy would not
See sierra/NucAminoAligner.java
for the method:
/**
* Input sequence may contain non-POL NAs in the beginning and the end, e.g. AF442565, KF134931.
*
* Following code did two things:
*
* 1. Remove large (length > SEQUENCE_TRIM_SITES_CUTOFF) low quality pieces from gene sequence
* 2. Keep small (length <= SEQUENCE_TRIM_SITES_CUTOFF) low quality pieces from gene sequence
*
* A site is considered "low quality" if it:
* - is unusual mutation;
* - has "X" in aas; or
* - has stop codon
*/
private static int[] trimLowQualities(
Sequence sequence, int firstAA, int lastAA,
Collection<Mutation> mutations, Collection<FrameShift> frameShifts) {
int badPcnt;
int trimLeft = 0;
int trimRight = 0;
int problemSites = 0;
int sinceLastBadQuality = 0;
int proteinSize = lastAA - firstAA + 1;
List<Integer> candidates = new ArrayList<>();
List<Boolean> invalidSites = new ArrayList<>(Collections.nCopies(proteinSize, false));
for (Mutation mut : mutations) {
int idx = mut.getPosition() - firstAA;
if (!mut.isUnsequenced() && (
mut.getHighestMutPrevalence() < SEQUENCE_SHRINKAGE_BAD_QUALITY_MUT_PREVALENCE
|| mut.getAAs().equals("X") || mut.isApobecMutation() || mut.hasStop())) {
invalidSites.set(idx, true);
}
}
for (FrameShift fs : frameShifts) {
int idx = fs.getPosition() - firstAA;
invalidSites.set(idx, true);
}
// forward scan for trimming left
for (int idx=0; idx < proteinSize; idx ++) {
if (sinceLastBadQuality > SEQUENCE_SHRINKAGE_WINDOW) {
break;
} else if (invalidSites.get(idx)) {
problemSites ++;
trimLeft = idx + 1;
badPcnt = trimLeft > 0 ? problemSites * 100 / trimLeft : 0;
if (badPcnt > SEQUENCE_SHRINKAGE_CUTOFF_PCNT) {
candidates.add(trimLeft);
}
sinceLastBadQuality = 0;
} else {
sinceLastBadQuality ++;
}
}
trimLeft = candidates.size() > 0 ? candidates.get(candidates.size() - 1) : 0;
candidates.clear();
// backward scan for trimming right
problemSites = 0;
sinceLastBadQuality = 0;
for (int idx=proteinSize-1; idx > -1; idx --) {
if (sinceLastBadQuality > SEQUENCE_SHRINKAGE_WINDOW) {
break;
} else if (invalidSites.get(idx)) {
problemSites ++;
trimRight = proteinSize - idx;
badPcnt = trimRight > 0 ? problemSites * 100 / trimRight : 0;
if (badPcnt > SEQUENCE_SHRINKAGE_CUTOFF_PCNT) {
candidates.add(trimRight);
}
sinceLastBadQuality = 0;
} else {
sinceLastBadQuality ++;
}
}
trimRight = candidates.size() > 0 ? candidates.get(candidates.size() - 1) : 0;
return new int[]{trimLeft, trimRight};
}
Testing on LANL subtype reference pol sequence set:
HIVdb version 8.6.1
Found NucAmino binary /usr/local/lib/python3.6/dist-packages/sierralocal/bin/nucamino-linux-amd64
Aligned pol.fa
Traceback (most recent call last):
File "/usr/local/bin/sierralocal", line 11, in <module>
exit_code = main.main()
File "/usr/local/lib/python3.6/dist-packages/sierralocal/main.py", line 160, in main
count, time_elapsed = sierralocal(args.fasta, args.outfile, args.xml, args.skipalign, args.forceupdate)
File "/usr/local/lib/python3.6/dist-packages/sierralocal/main.py", line 132, in sierralocal
file_trims, subtypes = scorefile(input_file, database, skipalign)
File "/usr/local/lib/python3.6/dist-packages/sierralocal/main.py", line 75, in scorefile
aligner.get_mutations(input_file, do_subtyping=True)
File "/usr/local/lib/python3.6/dist-packages/sierralocal/nucaminohook.py", line 178, in get_mutations
assert gene is not None, "Fatal error in get_mutations"
AssertionError: Fatal error in get_mutations
esp. after testing on Windows and macOS
This would require the following steps:
I've tracked down most of the bugs and errors. Currently sierra-local can process 97-100% of a large sample size verified against sierrapy. However, the main issue seems to be with polymorphic variations.
Example:
Nucamino returns this mutation list for 71-1.Cameroon :
[...], Q95KNRSQH:MRK, [...]
I interpret this currently as Q at residue 95 -> any one of KNRSQH. However, sierrapy interprets this as Q95X. This causes me to wrongfully add a score if any of Q95KNRSQH are associated with a drug resistance. In other cases, sierrapy interprets polymorphisms in the way I currently do, and there are no issues there.
Is there a standard way to determine when an amino acid is considered "unknown--X" vs just a list of polymorphisms?
There's also an issue with sierrapy choosing some polymorphisms over others.
Let's say a sequence has the mutation V82TA:RCC. Sierrapy sometimes chooses to exclude the score for V82T alone, and other times uses either V82A or V82T in its scoring. I'm not exactly sure how they determine this. Will keep digging.
This should be a dictionary. We can base the structure of this on the XML output of the HIVdb algorithm:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<DrugResistance_Interpretation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://hivdb.stanford.edu/DR/schema/sierra.xsd">
<algorithmName>HIVDB</algorithmName>
<algorithmVersion>8.4</algorithmVersion>
<webServiceVersion>2.0</webServiceVersion>
<schemaVersion>1.1</schemaVersion>
<submissionName/>
<dateTime>2017-08-22 18:19:47.978</dateTime>
<result>
<success>true</success>
<inputSequence>
<md5sum>f8a9b4e6cfb7de1b4be1cb0eabd1a1cd</md5sum>
<name>userinput unamed sample: 1</name>
<sequence>CCTCAGGTCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTTGAGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAAAGACAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGGAAATTGAATTGGGCAAGTCAGATTTACCCAGGGATTAAAGTAAGGCAATTATGTAAACTCCTTAGAGGAACCAAAGCACTAACAGAAGTAATACCACTAACAGAAGAAGCAGAG</sequence>
</inputSequence>
<GAHypermutated>false</GAHypermutated>
<geneData>
<gene>PR</gene>
<present>true</present>
<consensus>PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF</consensus>
<alignedNASequence>CCTCAGGTCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTT</alignedNASequence>
<alignedAASequence>PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF</alignedAASequence>
<firstAA>1</firstAA>
<lastAA>99</lastAA>
<subtype>
<type>B (0.00%)</type>
<percentSimilarity>100</percentSimilarity>
</subtype>
<mutation>
<classification>OTHER</classification>
<type>mutation</type>
<mutationString>I3V</mutationString>
<wildType>I</wildType>
<position>3</position>
<nucleicAcid>GTC</nucleicAcid>
<translatedNA>V</translatedNA>
</mutation>
<mutation>
<classification>OTHER</classification>
<type>mutation</type>
<mutationString>N37S</mutationString>
<wildType>N</wildType>
<position>37</position>
<nucleicAcid>AGT</nucleicAcid>
<translatedNA>S</translatedNA>
</mutation>
<quality/>
</geneData>
<geneData>
<gene>RT</gene>
<present>true</present>
<consensus>PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVL</consensus>
<alignedNASequence>CCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTTGAGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAAAGACAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGGAAATTGAATTGGGCAAGTCAGATTTACCCAGGGATTAAAGTAAGGCAATTATGTAAACTCCTTAGAGGAACCAAAGCACTAACAGAAGTAATACCACTAACAGAAGAAGCAGAG</alignedNASequence>
<alignedAASequence>PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAE</alignedAASequence>
<firstAA>1</firstAA>
<lastAA>300</lastAA>
<subtype>
<type>B (0.00%)</type>
<percentSimilarity>100</percentSimilarity>
</subtype>
<mutation>
<classification>OTHER</classification>
<type>mutation</type>
<mutationString>K122E</mutationString>
<wildType>K</wildType>
<position>122</position>
<nucleicAcid>GAA</nucleicAcid>
<translatedNA>E</translatedNA>
</mutation>
<mutation>
<classification>OTHER</classification>
<type>mutation</type>
<mutationString>F214L</mutationString>
<wildType>F</wildType>
<position>214</position>
<nucleicAcid>CTT</nucleicAcid>
<translatedNA>L</translatedNA>
</mutation>
<mutation>
<classification>OTHER</classification>
<type>mutation</type>
<mutationString>A272P</mutationString>
<wildType>A</wildType>
<position>272</position>
<nucleicAcid>CCA</nucleicAcid>
<translatedNA>P</translatedNA>
</mutation>
<mutation>
<classification>OTHER</classification>
<type>mutation</type>
<mutationString>K277R</mutationString>
<wildType>K</wildType>
<position>277</position>
<nucleicAcid>AGG</nucleicAcid>
<translatedNA>R</translatedNA>
</mutation>
<quality/>
</geneData>
<geneData>
<gene>IN</gene>
<present>false</present>
</geneData>
<sequenceQualityCounts>
<insertions>0</insertions>
<deletions>0</deletions>
<ambiguous>0</ambiguous>
<stops>0</stops>
<frameshifts>0</frameshifts>
</sequenceQualityCounts>
<drugScore>
<drugCode>ATV/r</drugCode>
<genericName>atazanavir/r</genericName>
<type>PI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>DRV/r</drugCode>
<genericName>darunavir/r</genericName>
<type>PI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>FPV/r</drugCode>
<genericName>fosamprenavir/r</genericName>
<type>PI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>IDV/r</drugCode>
<genericName>indinavir/r</genericName>
<type>PI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>LPV/r</drugCode>
<genericName>lopinavir/r</genericName>
<type>PI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>NFV</drugCode>
<genericName>nelfinavir</genericName>
<type>PI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>SQV/r</drugCode>
<genericName>saquinavir/r</genericName>
<type>PI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>TPV/r</drugCode>
<genericName>tipranavir/r</genericName>
<type>PI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>ABC</drugCode>
<genericName>abacavir</genericName>
<type>NRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>AZT</drugCode>
<genericName>zidovudine</genericName>
<type>NRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>D4T</drugCode>
<genericName>stavudine</genericName>
<type>NRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>DDI</drugCode>
<genericName>didanosine</genericName>
<type>NRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>FTC</drugCode>
<genericName>emtricitabine</genericName>
<type>NRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>3TC</drugCode>
<genericName>lamivudine</genericName>
<type>NRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>TDF</drugCode>
<genericName>tenofovir</genericName>
<type>NRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>EFV</drugCode>
<genericName>efavirenz</genericName>
<type>NNRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>ETR</drugCode>
<genericName>etravirine</genericName>
<type>NNRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>NVP</drugCode>
<genericName>nevirapine</genericName>
<type>NNRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<drugScore>
<drugCode>RPV</drugCode>
<genericName>rilpivirine</genericName>
<type>NNRTI</type>
<score>0</score>
<resistanceLevel>1</resistanceLevel>
<resistanceLevelText>Susceptible</resistanceLevelText>
<threeStepResistanceLevel>S</threeStepResistanceLevel>
</drugScore>
<scoreTable>
<scoreRow>
<score value="PI"/>
<score value="ATV/r"/>
<score value="DRV/r"/>
<score value="FPV/r"/>
<score value="IDV/r"/>
<score value="LPV/r"/>
<score value="NFV"/>
<score value="SQV/r"/>
<score value="TPV/r"/>
</scoreRow>
<scoreRow>
<score value="Total:"/>
<score class="PI" drug="ATV/r" value="0"/>
<score class="PI" drug="DRV/r" value="0"/>
<score class="PI" drug="FPV/r" value="0"/>
<score class="PI" drug="IDV/r" value="0"/>
<score class="PI" drug="LPV/r" value="0"/>
<score class="PI" drug="NFV" value="0"/>
<score class="PI" drug="SQV/r" value="0"/>
<score class="PI" drug="TPV/r" value="0"/>
</scoreRow>
</scoreTable>
<scoreTable>
<scoreRow>
<score value="NRTI"/>
<score value="ABC"/>
<score value="AZT"/>
<score value="D4T"/>
<score value="DDI"/>
<score value="FTC"/>
<score value="3TC"/>
<score value="TDF"/>
</scoreRow>
<scoreRow>
<score value="Total:"/>
<score class="NRTI" drug="ABC" value="0"/>
<score class="NRTI" drug="AZT" value="0"/>
<score class="NRTI" drug="D4T" value="0"/>
<score class="NRTI" drug="DDI" value="0"/>
<score class="NRTI" drug="FTC" value="0"/>
<score class="NRTI" drug="3TC" value="0"/>
<score class="NRTI" drug="TDF" value="0"/>
</scoreRow>
</scoreTable>
<scoreTable>
<scoreRow>
<score value="NNRTI"/>
<score value="EFV"/>
<score value="ETR"/>
<score value="NVP"/>
<score value="RPV"/>
</scoreRow>
<scoreRow>
<score value="Total:"/>
<score class="NNRTI" drug="EFV" value="0"/>
<score class="NNRTI" drug="ETR" value="0"/>
<score class="NNRTI" drug="NVP" value="0"/>
<score class="NNRTI" drug="RPV" value="0"/>
</scoreRow>
</scoreTable>
</result>
</DrugResistance_Interpretation>
The root of the definitions is the 'DEFINITIONS' element in root.getchildren() (index 3).
>>> root.getchildren()
[<Element 'ALGNAME' at 0x10355e3b8>, <Element 'ALGVERSION' at 0x10355e318>, <Element 'ALGDATE' at 0x10355e368>, <Element 'DEFINITIONS' at 0x10355e408>, <Element 'DRUG' at 0x1037800e8>, <Element 'DRUG' at 0x103780368>, <Element 'DRUG' at 0x1037805e8>, <Element 'DRUG' at 0x103780868>, <Element 'DRUG' at 0x103780ae8>, <Element 'DRUG' at 0x103780d68>, <Element 'DRUG' at 0x103784f48>, <Element 'DRUG' at 0x103784d68>, <Element 'DRUG' at 0x103784ae8>, <Element 'DRUG' at 0x103784868>, <Element 'DRUG' at 0x1037845e8>, <Element 'DRUG' at 0x103784368>, <Element 'DRUG' at 0x1037840e8>, <Element 'DRUG' at 0x103786e08>, <Element 'DRUG' at 0x103786b88>, <Element 'DRUG' at 0x103786908>, <Element 'DRUG' at 0x103786688>, <Element 'DRUG' at 0x103786408>, <Element 'DRUG' at 0x103786188>, <Element 'DRUG' at 0x10377dea8>, <Element 'DRUG' at 0x10377dc28>, <Element 'DRUG' at 0x10377d9a8>, <Element 'MUTATION_COMMENTS' at 0x10377d728>]
>>> root.getchildren()[3]
<Element 'DEFINITIONS' at 0x10355e408>
Currently, I iterate over the definition elements with:
for element in root.getchildren()[3]:
It would be better to get the index of the 'DEFINITIONS' element from the list rather than hardcoding [3]
. The issue I'm having is that I am not able to use a regular expression to look for the item in the list that contains DEFINITION
, since the items are not str type.
>>> type(root.getchildren()[3])
<class 'xml.etree.ElementTree.Element'>
Use ArtPoon/gotoh2 for pairwise alignment to reference
sierralocal.py
doesn't exist any moresubprocess
calls with direct calls to moduleThere are some resources that we have obtained once from the Stanford site, e.g., whether mutations are major or minor resistance variants -- ideally sierra-local should check the existing files against online resources and update dynamically when necessary.
From HIVdb FAQ:
The HIV Drug Resistance Database uses several different classification schemes. The HIVdb program and several of the query pages (such as detailed treatment queries, detailed mutation queries, advanced queries, detailed phenotype queries, and reference page) use a 3-way classification scheme that classifies protease mutations into "Major", "Minor", and "Other", RT mutations into "NRTI", "NNRTI", and "Other" and integrase mutations into "Major", "Minor" and "Other". According to this scheme, "NRTI" and "NNRTI" mutations cannot be further subdivided without the output becoming too unwieldy requiring 5 rather than 3 categories. Moreover, the original sub-classification of mutations into "Major" and "Minor" categories began with the PIs and so most HIV drug resistance researchers and clinicians are comfortable with this classification scheme.
Make sure to process the mutations when writing the JSON into these classifications, which depends on their location. There isn't a readily-available dataset that allows for this. I either have to scrape the HIVdb website (example) or find a way to do it using the current XML-based algo.
@jzpero wrote a small test method under subtyper.main()
that calculates the closest reference sequence to each sequence in the same set -- in other words, a sequence's closest match should be itself. The following sequence fails this criterion for both v0.0.1
and the current refactoring:
AH011123|A|HXB22253-5093|year:1997|country:CM|author:None CRF22_01A1 A
It isn't clear whether the original HIVdb subtyper would obtain the same result.
There's a minor bug where overlapping DRM combinations aren't being selected to give the highest penalty score.
Example:
Combination | DRMs | Score |
---|---|---|
1 | M41L + T215F | 15 |
2 | M41L + T215Y | 5 |
Here, combo 1 should be chosen if the mutation at 215 is a mixture of FY, since the penalty score is the highest. However, the algorithm encounters combo 2 first, then excludes combo 1 since a mutation at 41 and 215 has already been recorded.
solution being explored: record ALL mutations that have scores and are present, and then if overlaps are present, select the one with the maximal absolute score, leaving the other ones behind
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.