My personal website, served at https://cthoyt.com
docker run --rm --volume="$PWD:/srv/jekyll" -p 4000:4000 -it jekyll/jekyll:latest jekyll serve
CC BY 4.0
Home Page: https://cthoyt.com/
License: Creative Commons Attribution 4.0 International
A local unique identifier is the value within a semantic space. For example, MONDO has the local unique identifier 0005301 for "multiple sclerosis". If you want to make a URI, you take the MONDO URI prefix (http://purl.obolibrary.org/obo/MONDO_) and concatenate the local unique identifier on the end (i.e., http://purl.obolibrary.org/obo/MONDO_0005301). Similarly, if you want to make a compact URI (CURIE), you take the MONDO CURIE prefix (MONDO) and concatenate a colon (:), then the local unique identifier (i.e., MONDO:0005301).
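A minimal sketch of the two concatenations described above, in plain Python. The helper names make_uri and make_curie are hypothetical; the prefixes and the local unique identifier are the MONDO values from the text.

```python
# Prefixes taken from the MONDO example above.
MONDO_URI_PREFIX = "http://purl.obolibrary.org/obo/MONDO_"
MONDO_CURIE_PREFIX = "MONDO"


def make_uri(uri_prefix: str, luid: str) -> str:
    """Build a URI by appending the local unique identifier to the URI prefix (hypothetical helper)."""
    return uri_prefix + luid


def make_curie(curie_prefix: str, luid: str) -> str:
    """Build a CURIE by joining the CURIE prefix and the local unique identifier with a colon (hypothetical helper)."""
    return f"{curie_prefix}:{luid}"


luid = "0005301"  # "multiple sclerosis" in MONDO
print(make_uri(MONDO_URI_PREFIX, luid))  # http://purl.obolibrary.org/obo/MONDO_0005301
print(make_curie(MONDO_CURIE_PREFIX, luid))  # MONDO:0005301
```

Note that the local unique identifier is the only part shared by both forms; everything else comes from the prefixes.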
Unfortunately, there are a lot of places where people mistakenly write a whole CURIE where a local unique identifier should go. For example, someone writes MONDO:0005301 where they should have written 0005301. We call this a redundant prefix in the local unique identifier. This is also colloquially called the "banana problem".
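As an illustration, a redundant prefix can be detected and stripped with a simple string check. This is a minimal sketch assuming the redundant prefix exactly matches the CURIE prefix followed by a colon (case-sensitive); strip_redundant_prefix is a hypothetical name, not an existing API.

```python
def strip_redundant_prefix(luid: str, curie_prefix: str) -> str:
    """Remove a redundant CURIE prefix (a "banana") from a local unique identifier.

    Assumption: the redundant part is exactly "<curie_prefix>:" (case-sensitive).
    Values without it are returned unchanged.
    """
    banana = f"{curie_prefix}:"
    if luid.startswith(banana):
        return luid[len(banana):]
    return luid


print(strip_redundant_prefix("MONDO:0005301", "MONDO"))  # 0005301
print(strip_redundant_prefix("0005301", "MONDO"))  # 0005301
```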
Wikidata is one place where this happens. Identifiers.org has also propagated this mistake to many places. Although MONDO does not appear in Identifiers.org, the submitter of the Wikidata property may have been influenced by how other properties did it, which were in turn influenced by Identifiers.org.
TL;DR: many Wikidata properties referring to ontologies write local unique identifiers (LUIDs) the wrong way, and MONDO is one example.
Archival Resource Keys (ARKs) are a flavor of persistent identifier, like DOIs, URNs, and Handles, that have the benefit of being free, flexible with what metadata gets attached, and natively able to resolve to web pages. Name-to-Thing (N2T) implements a resolver for a variety of ARKs, so this blog post is about how that resolver can be re-implemented with the curies Python package.
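The general shape of such a resolver can be sketched without any dependencies: parse the ARK into its Name Assigning Authority Number (NAAN) and name, then prepend a resolver base URL. This is not the curies-based implementation from the post. It assumes that NAANs are runs of digits, that N2T resolves an ARK appended to https://n2t.net/, and that NAAN_RESOLVERS is a hypothetical per-NAAN override table whose "12345" entry and example.org URL are placeholders.

```python
import re
from typing import Optional, Tuple

# Accepts both "ark:/NAAN/name" and "ark:NAAN/name".
# Assumption: NAANs are treated as digit runs in this sketch.
ARK_PATTERN = re.compile(r"^ark:/?(?P<naan>\d+)/(?P<name>.+)$")

# Assumption: N2T resolves an ARK appended to this base URL.
N2T_BASE = "https://n2t.net/"

# Hypothetical per-NAAN resolvers; "12345" and example.org are placeholders.
NAAN_RESOLVERS = {
    "12345": "https://example.org/",
}


def parse_ark(ark: str) -> Optional[Tuple[str, str]]:
    """Split an ARK into (NAAN, name), or return None if it does not look like an ARK."""
    match = ARK_PATTERN.match(ark.strip())
    if match is None:
        return None
    return match.group("naan"), match.group("name")


def resolve_ark(ark: str) -> str:
    """Return a resolvable URL, preferring a NAAN-specific resolver and falling back to N2T."""
    parsed = parse_ark(ark)
    if parsed is None:
        raise ValueError(f"not an ARK: {ark!r}")
    naan, name = parsed
    normalized = f"ark:/{naan}/{name}"  # normalize to the slash-prefixed form
    base = NAAN_RESOLVERS.get(naan, N2T_BASE)
    return base + normalized


print(resolve_ark("ark:/12345/abc"))  # https://example.org/ark:/12345/abc
print(resolve_ark("ark:67890/xyz/f1"))  # https://n2t.net/ark:/67890/xyz/f1
```

Keeping the per-NAAN table separate from the parsing logic mirrors the prefix-map idea: adding a new resolver is a data change, not a code change.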
While we were recently preparing to submit a manuscript, the lead author said they looked at my last few papers and noticed I always used a private email address instead of an institutional email address. They asked, perplexed, if they should also use my private email address with our submission. The answer was a resounding yes; always use a private email address. Here’s why.
If you’re reading my blog, there’s a pretty high chance you’ve used DrugBank, a database of drug-target interactions, drug-drug interactions, and other high-granularity information about clinically-studied chemicals. DrugBank has two major problems, though: its data are password-protected, and its license does not allow redistribution. Time to solve these problems once and for all.
On top of the issue of resolving identifiers to their names, the bioinformatics community has a hard time figuring out when two identifiers from different databases are equivalent. You know who else has the same problem? Inspector Javert. Get ready for a Les Misérables-themed post on how to address this long-standing problem.
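One common, deliberately naive way to turn pairwise cross-references into groups of equivalent identifiers is a union-find (disjoint set) structure. This sketch is not necessarily the approach taken in the post: it assumes every xref means exact equivalence and merges transitively, which is exactly the assumption that breaks when xrefs are really broad or narrow matches. The identifiers below are hypothetical placeholders, not real mappings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def equivalence_classes(xrefs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers into classes, assuming every xref pair is an exact equivalence."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Register unseen nodes as their own root, then walk to the root with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right in xrefs:
        left_root, right_root = find(left), find(right)
        if left_root != right_root:
            parent[right_root] = left_root  # merge the two classes

    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical xrefs between three made-up databases.
xrefs = [("db1:1", "db2:a"), ("db2:a", "db3:x"), ("db1:2", "db2:b")]
for group in equivalence_classes(xrefs):
    print(sorted(group))
```

Because the merge is transitive, a single wrong xref can collapse two unrelated classes into one; that is the weakness a more careful, evidence-aware approach has to address.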
https://cthoyt.com/2020/04/19/inspector-javerts-xref-database.html
The mean rank (MR) and mean reciprocal rank (MRR) are among the most popular metrics reported for the evaluation of knowledge graph embedding models in the link prediction task. While they are reported on very different intervals (
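Both metrics are computed from the same list of ranks: the 1-based rank of the true entity among all scored candidates for each test triple. A minimal sketch follows; the ranks are hypothetical values chosen for illustration, not results from any model.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Mean rank (MR): the arithmetic mean of the 1-based ranks."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank (MRR): the arithmetic mean of 1 / rank."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / rank for rank in ranks) / len(ranks)


ranks = [1, 2, 4]  # hypothetical ranks for three test triples
print(mean_rank(ranks))  # 7 / 3, about 2.33
print(mean_reciprocal_rank(ranks))  # (1 + 0.5 + 0.25) / 3, about 0.583
```

The reciprocal in the MRR means a single very poor rank barely moves it, while the same rank can dominate the MR.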
This blog post is a tutorial on how to curate the links between a researcher and scholarly works (e.g., pre-prints, publications, presentations) on Wikidata using Scholia and the Author Disambiguator tool.
I’ve just returned from the 17th Annual International Biocuration Conference at the Indian Biological Data Centre (IBDC) in Faridabad, India. I wanted to highlight some of the interesting conversations I had while I was there, and ideas for follow-up. Most were centered around the Bioregistry and the Semantic Mapping Assembler and Reasoner (SeMRA), which I gave an oral presentation on.
https://cthoyt.com/2024/03/11/biocuration2024-discussions.html
In language, semantics describe the names and meanings of words. The bioinformatics community has aptly adopted biosemantics as a concept that encompasses the issues with the names and meanings of biological entities, usually in natural language processing and data integration. However, semantics does not capture the context of words, and biosemantics fails to describe the biological context and complex relationships between biological entities.
https://cthoyt.com/2020/01/22/biosemantics-versus-biopragmatics.html
After the BioCypher preprint went up on the arXiv, I checked in on the missing co-author items list on the Scholia page that reflects my Wikidata entry. In addition to the several co-authors of the BioCypher manuscript that I don’t know personally, I was curious to see which other papers of mine did not have fully complete co-author annotations. This post has a few SPARQL queries that I used to look into this as well as a few ongoing questions I have about the relationship between distinct entries for preprints and published articles.