cthoyt / cthoyt.github.io

My personal website, served at https://cthoyt.com

License: Creative Commons Attribution 4.0 International

biological-expression-language machine-learning knowledge-graphs knowledge-graph-embeddings drug-discovery drug-repurposing target-prioritization target-validation biocuration ontologies bioinformatics cheminformatics toxicology wikidata obofoundry deep-learning proteochemometrics utterances drug-synergy artificial-intelligence

cthoyt.github.io's Introduction

cthoyt.github.io

Serve Locally

docker run --rm --volume="$PWD:/srv/jekyll" -p 4000:4000 -it jekyll/jekyll:4.2.0 jekyll serve

Note that the 4.2.0 tag is important - 4.2.2 (latest, released ~2022) does not work.

License

CC BY 4.0

cthoyt.github.io's Issues

Unlocking UMLS | Biopragmatics

The Unified Medical Language System (UMLS) is a widely used biomedical and clinical vocabulary maintained by the United States National Library of Medicine. However, it is notoriously difficult to access and work with due to licensing restrictions and its complex download system. In the same vein as my previous posts about DrugBank and ChEMBL, this post describes open source software I’ve developed for downloading and working with this data. It also works for RxNorm, SemMedDB, SNOMED-CT, and any other data accessible through the UMLS Terminology Services (UTS) ticket granting system.

https://cthoyt.com/2023/09/01/umls.html

Add funding list

Making DrugBank Reproducible | Biopragmatics

If you’re reading my blog, there’s a pretty high chance you’ve used DrugBank, a database of drug-target interactions, drug-drug interactions, and other high-granularity information about clinically-studied chemicals. DrugBank has two major problems, though: its data are password-protected, and its license does not allow redistribution. Time to solve these problems once and for all.

https://cthoyt.com/2020/12/14/taming-drugbank.html

Create issues page on github

Discussions and Follow-ups from Biocuration 2024 | Biopragmatics

I’ve just returned from the 17th Annual International Biocuration Conference at the Indian Biological Data Centre (IBDC) in Faridabad, India. I wanted to highlight some of the interesting conversations I had while I was there, and ideas for follow-up. Most were centered around the Bioregistry and the Semantic Mapping Assembler and Reasoner (SeMRA), which I gave an oral presentation on.

https://cthoyt.com/2024/03/11/biocuration2024-discussions.html

Pythagorean Mean Rank Metrics | Biopragmatics

The mean rank (MR) and mean reciprocal rank (MRR) are among the most popular metrics reported for the evaluation of knowledge graph embedding models in the link prediction task. While they are reported on very different intervals ($\text{MR} \in [1,\infty)$ and $\text{MRR} \in (0,1]$), their deep theoretical connection can be elegantly described through the lens of Pythagorean means. This blog post describes ideas Max Berrendorf shared with me that I recently implemented in PyKEEN and later wrote up as a full manuscript.

https://cthoyt.com/2021/04/19/pythagorean-mean-ranks.html
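
To make the connection concrete, here is a minimal sketch in plain Python (standard library only, not PyKEEN's API). The MR is the arithmetic mean of the ranks, and the MRR is the reciprocal of their harmonic mean. The function names and the example ranks are hypothetical, chosen only for illustration.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def mean_rank(ranks):
    """Arithmetic mean of the ranks (MR), lies in [1, inf)."""
    return fmean(ranks)


def mean_reciprocal_rank(ranks):
    """Arithmetic mean of the reciprocal ranks (MRR), lies in (0, 1]."""
    return fmean([1 / r for r in ranks])


def inverse_harmonic_mean_rank(ranks):
    """Reciprocal of the harmonic mean of the ranks, which equals the MRR."""
    return 1 / harmonic_mean(ranks)


# Hypothetical ranks of the true entity across three test triples
example_ranks = [1, 2, 4]

mr = mean_rank(example_ranks)
gmr = geometric_mean(example_ranks)  # geometric mean rank, the third Pythagorean mean
hmr = harmonic_mean(example_ranks)
mrr = mean_reciprocal_rank(example_ranks)
```

For positive ranks, the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean, so the three means give progressively more weight to poorly ranked entities.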

Biosemantics vs. Biopragmatics | Biopragmatics

In language, semantics describes the names and meanings of words. The bioinformatics community has aptly adopted biosemantics as a concept that encompasses the issues with the names and meanings of biological entities, usually in natural language processing and data integration. However, semantics does not capture the context of words, and biosemantics fails to describe the biological context and complex relationships between biological entities.

https://cthoyt.com/2020/01/22/biosemantics-versus-biopragmatics.html

Re-implementing the N2T ARK Resolver | Biopragmatics

Archival Resource Keys (ARKs) are a flavor of persistent identifiers like DOIs, URNs, and Handles that have the benefit of being free, flexible with what metadata gets attached, and natively able to resolve to web pages. Name-to-Thing (N2T) implements a resolver for a variety of ARKs, so this blog post is about how that resolver can be re-implemented with the curies Python package.

https://cthoyt.com/2023/04/11/n2t-ark-resolver.html

Curating Publications on Wikidata | Biopragmatics

This blog post is a tutorial on how to curate the links between a researcher and scholarly works (e.g., pre-prints, publications, presentations) on Wikidata using Scholia and the Author Disambiguator tool.

https://cthoyt.com/2022/02/12/wikidata-publications.html

Add collaborator list

externalise css

add github rss

write post on the banana problem

A local unique identifier is the value within a semantic space. For example, MONDO has the local unique identifier 0005301 for "multiple sclerosis". If you want to make a URI, you take the MONDO URI prefix (http://purl.obolibrary.org/obo/MONDO_) and concatenate the local unique identifier on the end (i.e., http://purl.obolibrary.org/obo/MONDO_0005301). Similarly, if you want to make a compact URI (CURIE), you take the MONDO CURIE prefix (MONDO) and concatenate a colon : then the local unique identifier (i.e., MONDO:0005301).
Unfortunately, there are a lot of places where people mistakenly write a whole CURIE in a place where a local unique identifier should go. This means someone writes MONDO:0005301 where they should have written 0005301. We call this a redundant prefix in the local unique identifier. This is also colloquially called the "banana problem".
Wikidata is one place where this happens. Identifiers.org has also propagated this mistake to many places (though MONDO does not appear in Identifiers.org, it might be the case that the submitter for the Wikidata property was influenced by how other properties did it, which were in turn influenced by Identifiers.org).
TL;DR, Wikidata has a lot of wrong ways of writing LUIDs in its properties referring to ontologies, MONDO being one example.
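
A minimal sketch of the ideas above in plain Python, possibly useful for the post. The function names are hypothetical and the case-insensitive prefix check is an assumption for illustration. This is not how the Bioregistry or the curies package implements it.

```python
# Prefixes taken from the MONDO example above
MONDO_URI_PREFIX = "http://purl.obolibrary.org/obo/MONDO_"
MONDO_CURIE_PREFIX = "MONDO"


def make_uri(uri_prefix, luid):
    """Concatenate the URI prefix and the local unique identifier."""
    return uri_prefix + luid


def make_curie(curie_prefix, luid):
    """Concatenate the CURIE prefix, a colon, and the local unique identifier."""
    return f"{curie_prefix}:{luid}"


def has_redundant_prefix(curie_prefix, luid):
    """Check if a whole CURIE was written where a local unique identifier belongs."""
    # Assumption: compare case-insensitively, so "mondo:0005301" is also caught
    return luid.lower().startswith(curie_prefix.lower() + ":")


def strip_redundant_prefix(curie_prefix, luid):
    """Remove the redundant prefix (the "banana") if present, else return as-is."""
    if has_redundant_prefix(curie_prefix, luid):
        return luid[len(curie_prefix) + 1:]
    return luid


uri = make_uri(MONDO_URI_PREFIX, "0005301")
curie = make_curie(MONDO_CURIE_PREFIX, "0005301")
fixed = strip_redundant_prefix(MONDO_CURIE_PREFIX, "MONDO:0005301")
```

Skipping the cleanup step and building a CURIE directly from the bad value would give MONDO:MONDO:0005301, which is the kind of doubled prefix the "banana problem" name refers to.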

add soundcloud rss

You Should Use a Private Email on Publications | Biopragmatics

While we were recently preparing to submit a manuscript, the lead author said they looked at my last few papers and noticed I always used a private email address instead of an institutional email address. They asked, perplexed, if they should also use my private email address with our submission. The answer was a resounding yes; always use a private email address. Here’s why.

https://cthoyt.com/2022/02/06/use-your-personal-email.html

Archive to Wayback Machine

https://github.com/caltechlibrary/waystation

Comment section

https://utteranc.es/

Inspector Javert’s Xref Database | Biopragmatics

On top of the issue of resolving identifiers to their names, the bioinformatics community has a hard time figuring out when two identifiers from different databases are equivalent. You know who else has the same problem? Inspector Javert. Get ready for a Les Misérables-themed post on how to address this long-standing problem.

https://cthoyt.com/2020/04/19/inspector-javerts-xref-database.html

Add missing events

update personal webpage

EOSC workshop
Bioregistry workshop
2023 Ontology Summit
2nd Mapping Commons workshop

Connecting Preprints to Peer-reviewed Articles on Wikidata | Biopragmatics

After the BioCypher preprint went up on the arXiv, I checked in on the missing co-author items list on the Scholia page that reflects my Wikidata entry. In addition to the several co-authors of the BioCypher manuscript that I don’t know personally, I was curious to see which other papers of mine did not have fully complete co-author annotations. This post has a few SPARQL queries that I used to look into this as well as a few ongoing questions I have about the relationship between distinct entries for preprints and published articles.

https://cthoyt.com/2023/01/02/wikidata-preprints.html