genedx / phenopy
Phenotype comparison tools using semantic similarity.
License: Other
Hello,
Thank you for the package.
Two issues when trying to use this in Anaconda on Windows 10:
config_directory = os.path.join(os.environ.get('HOME'), f'.{__project__}')
raises an error on Windows, where HOME is usually not set, and should be changed to
config_directory = os.path.join(os.environ.get('CONDA_PREFIX'), f'.{__project__}')
Hope this helps other people!
Many thanks
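A portable alternative to either environment variable could look like the sketch below. It assumes the config directory only needs to live in the user's home folder; `get_config_directory` is a hypothetical helper, and the `__project__` value is a stand-in for the package's own constant.

```python
from pathlib import Path

# Stand-in for the package's own __project__ constant (assumed value).
__project__ = "phenopy"


def get_config_directory(project: str = __project__) -> Path:
    """Return a per-user config directory without reading $HOME directly."""
    # Path.home() resolves the home directory on Windows as well as on POSIX systems.
    return Path.home() / f".{project}"


config_directory = get_config_directory()
```

Unlike CONDA_PREFIX, this does not depend on a conda environment being active.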
HPO added a "#" to the header line with column names in the phenotype.hpoa file.
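One way to tolerate the header with or without the "#" is sketched below. It assumes a tab-separated file whose first line holds the column names; `read_annotations` is a hypothetical helper, and the two-column sample is illustrative only (the real file has more columns and may carry extra comment lines that this sketch does not handle).

```python
import csv
import io


def read_annotations(handle):
    """Parse tab-separated text whose first line holds the column names,
    with or without a leading '#'."""
    header_line = next(handle).rstrip("\n")
    # Drop the leading '#' (if any) before splitting into column names.
    fieldnames = header_line.lstrip("#").split("\t")
    reader = csv.DictReader(handle, fieldnames=fieldnames, delimiter="\t")
    return list(reader)


# Illustrative sample data, not taken from the real annotation file.
sample = io.StringIO("#DatabaseID\tHPO_ID\nOMIM:000000\tHP:0000006\n")
rows = read_annotations(sample)
```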
# compile list of HPO terms to include in the calculation, term plus children
hpo_id_plus_children = [hpo_id] + list(nx.ancestors(hpo_network, hpo_id))
Related to this, what would the impact be of taking only immediate children rather than all children? That is, are we giving too much IC to less specific terms with this method?
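The difference between the two choices can be seen on a toy graph. This sketch assumes edges point from child to parent, so that `nx.ancestors` returns what the ontology calls descendants; the node names are made up for illustration.

```python
import networkx as nx

# Toy ontology; edges point child -> parent (assumed direction).
hpo_network = nx.DiGraph()
hpo_network.add_edges_from([
    ("grandchild", "child_a"),
    ("child_a", "term"),
    ("child_b", "term"),
    ("term", "root"),
])

hpo_id = "term"
# Every node with a path to hpo_id: all children, at any depth.
all_children = nx.ancestors(hpo_network, hpo_id)
# Direct predecessors only: the immediate children.
immediate_children = set(hpo_network.predecessors(hpo_id))
```

Here `all_children` includes "grandchild" while `immediate_children` does not, so annotations on deeper terms would stop counting toward the parent term if only immediate children were used.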
A fresh install from pip results in:
FileNotFoundError: [Errno 2] No such file or directory: '/miniconda3/lib/python3.7/site-packages/phenopy/data/phenopy.wv.model.txt.gz'
Dear Kevin,
I would like to calculate the similarity for a few genes (~2000).
I annotated these genes with the HPO codes from the human phenotype ontology webpage (http://compbio.charite.de/jenkins/job/hpo.annotations/lastSuccessfulBuild/artifact/util/annotation/genes_to_phenotype.txt).
I reshaped it and got a file like this:
A4GALT . HP:0010970|HP:0000006
AAAS . HP:0040281|HP:0040282|HP:0040283|HP:0011463|HP:0001278|HP:0000972|HP:0012332|HP:0008259|HP:0004322|HP:0001251|HP:0000648|HP:0000007|HP:0002571|HP:0004319|HP:0001263|HP:0008163|HP:0001249|HP:0009916|HP:0003487|HP:0007002|HP:0000252|HP:0001347|HP:0000522|HP:0003676|HP:0000649|HP:0001324|HP:0000953|HP:0001260|HP:0000846|HP:0001250|HP:0007440|HP:0000505|HP:0000982|HP:0001761|HP:0010486|HP:0000830|HP:0007556|HP:0002093|HP:0001430|HP:0001252|HP:0002376|HP:0000612|HP:0000407
AASS . HP:0000119|HP:0000752|HP:0001083|HP:0001903|HP:0003593|HP:0001250|HP:0002161|HP:0000736|HP:0001252|HP:0100543|HP:0000007|HP:0001256|HP:0000750|HP:0001249
ABAT . HP:0025356|HP:0000278|HP:0000098|HP:0007291|HP:0000007|HP:0002415|HP:0001321|HP:0000494|HP:0001347|HP:0006829|HP:0001263|HP:0001274|HP:0001250|HP:0001254|HP:0025430|HP:0003819
ABCA4 . HP:0040280|HP:0040281|HP:0040282|HP:0040283|HP:0040284|HP:0000006|HP:0007663|HP:0000662|HP:0001133|HP:0000608|HP:0000512|HP:0000543|HP:0000007|HP:0007737|HP:0007722|HP:0000510|HP:0007984|HP:0007843|HP:0000548|HP:0000580|HP:0000572|HP:0008035|HP:0000639|HP:0000618|HP:0000405|HP:0000603|HP:0000135|HP:0000493|HP:0000463|HP:0001249|HP:0007703|HP:0000613|HP:0000987|HP:0030329|HP:0000649|HP:0000648|HP:0000551|HP:0008046|HP:0000407|HP:0007704|HP:0007814|HP:0008736|HP:0000035|HP:0008002|HP:0007675|HP:0000431|HP:0000610|HP:0000518|HP:0000602|HP:0001513|HP:0008059|HP:0000501|HP:0000563|HP:0000842|HP:0030500|HP:0001347|HP:0000505|HP:0005978|HP:0011504|HP:0011462|HP:0011463|HP:0003621|HP:0007994
ABCB11 . HP:0040283|HP:0000989|HP:0002014|HP:0003155|HP:0000952|HP:0001081|HP:0003593|HP:0001394|HP:0001744|HP:0001046|HP:0002240|HP:0002630|HP:0002908|HP:0000007|HP:0003819|HP:0004322|HP:0001508|HP:0001406|HP:0001402
which I think is the correct format for phenopy. I then used the command:
phenopy score gene_lists_with_HPO.txt --threads 12 --self
and I got as output something like this:
#query entity_id score
A4GALT A4GALT 1.0
A4GALT ABCD1 0.0
A4GALT ACAT1 0.010405043493187662
A4GALT ACVRL1 0.03336405048957507
A4GALT ADGRG1 0.0
A4GALT AGXT 0.009234121604447244
A4GALT AKT1 0.003509945769583653
A4GALT ALG1 0.0
A4GALT AMER1 0.0
However, the self-similarity score for some genes is not 1, as I had expected it to be. For instance:
ABCB7 ABCB7 0.5558528984777618
Would you expect something like this? How would you explain it?
Should I use a different --summarization-method?
Best regards,
Luca
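To see how many genes are affected, the self-comparisons that fall short of 1.0 can be pulled out of the score output. This is a minimal sketch that assumes the three whitespace-separated columns shown above; `self_scores_below_one` is a hypothetical helper, not part of phenopy.

```python
def self_scores_below_one(lines, tolerance=1e-9):
    """Return {entity: score} for self-comparisons whose score is not 1.0."""
    flagged = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#query entity_id score' header.
        if not line or line.startswith("#"):
            continue
        query, entity_id, score = line.split()
        if query == entity_id and abs(float(score) - 1.0) > tolerance:
            flagged[query] = float(score)
    return flagged


# Rows copied from the output quoted above.
output = """#query entity_id score
A4GALT A4GALT 1.0
A4GALT ABCD1 0.0
ABCB7 ABCB7 0.5558528984777618
"""
flagged = self_scores_below_one(output.splitlines())
```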
020-03-04 01:40:59,906 - phenopy - INFO - Incorrect url specified for HPO files: http://compbio.charite.de/jenkins/job/hpo.annotations.current/lastSuccessfulBuild/artifact/misc_2018/phenotype.hpoa
Hey @vgainullin, can you add the w2v binary to the phenoseries branch?
https://cran.r-project.org/package=SimReg
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4827100/
Should we add this as a similarity metric?
We should be returning the full float.
G.node was removed from NetworkX in version 2.4 and should be refactored here, for example to G.nodes.
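A sketch of the refactor, using `G.nodes` in place of the removed `G.node` attribute. The attribute names and values here are illustrative and are not taken from the phenopy code base.

```python
import networkx as nx

G = nx.DiGraph()
# Attribute name and value are illustrative.
G.add_node("HP:0000006", name="example term")

# Before (fails on NetworkX 2.4 and later): G.node["HP:0000006"]["name"]
attrs = G.nodes["HP:0000006"]
# The returned dict is live, so writes land on the graph itself.
attrs["ic"] = 0.5
```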