Comments (8)
Indeed wrong, thanks for spotting! As you can see we neglected references as usual...
from checklistbank.
Will be deployed to prod in 1-2 hours
I'm currently exploring ways to aggregate all taxonomic references (for various values of "all") into a single pile of CSL-JSON documents, then trying to cluster them into sets of the same references. CoL is an obvious source of data, hence the flurry of bug reports regarding references. Thanks for quickly fixing this issue.
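As a rough illustration of the clustering step, here is a minimal sketch that groups CSL-JSON items by DOI where one exists, and otherwise by a normalised title plus year. The function names and the matching key are assumptions made for this example, not the method actually used; real deduplication would need fuzzier matching, and the ASCII-only normalisation below discards titles in non-Latin scripts.

```python
import re
import unicodedata
from collections import defaultdict


def issued_year(item):
    """Return the first year of a CSL-JSON 'issued' field as a string, or None."""
    parts = (item.get("issued") or {}).get("date-parts") or [[]]
    return str(parts[0][0]) if parts[0] else None


def normalise_title(title):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def cluster_key(item, index):
    """Hypothetical matching key: DOI first, then normalised title and year."""
    doi = item.get("DOI")
    if doi:
        return ("doi", doi.lower())
    title = normalise_title(item.get("title") or "")
    if title:
        return ("title-year", title, issued_year(item))
    # Nothing usable to match on: keep the record in a cluster of its own.
    return ("unkeyed", index)


def cluster_references(items):
    """Group CSL-JSON items that share a matching key."""
    clusters = defaultdict(list)
    for index, item in enumerate(items):
        clusters[cluster_key(item, index)].append(item)
    return list(clusters.values())
```

Preferring the DOI keeps records together even when their titles were transcribed differently; the title and year fallback is deliberately strict, so it will under-merge rather than over-merge.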
fix deployed
> I'm currently exploring ways to aggregate all taxonomic references (for various values of "all") into a single pile of CSL-JSON documents, then trying to cluster them into sets of the same references. CoL is an obvious source of data, hence the flurry of bug reports regarding references. Thanks for quickly fixing this issue.
If you just want all COL references as CSL JSON you should be able to use the reference.json file inside an extended ColDP download like https://api.checklistbank.org/dataset/3LR/export.zip?extended=true&format=ColDP
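For anyone wanting to try this, a minimal sketch of pulling `reference.json` out of such an export might look like the following. The URL is the one given above; the function names are made up for this example, and it assumes the file parses as ordinary JSON and that the archive is small enough to hold in memory, neither of which has been checked against a real export.

```python
import io
import json
import urllib.request
import zipfile

# URL quoted in the comment above.
EXPORT_URL = "https://api.checklistbank.org/dataset/3LR/export.zip?extended=true&format=ColDP"


def download_export(url=EXPORT_URL):
    """Fetch the export archive and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def load_references(zip_bytes):
    """Return the parsed contents of reference.json from a ColDP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Match on the file name only, in case it sits in a subfolder.
            if name.lower().endswith("reference.json"):
                with archive.open(name) as handle:
                    return json.load(handle)
    raise FileNotFoundError("no reference.json found in archive")
```

Calling `load_references(download_export())` would then give the parsed references; for a large export, streaming the download to disk first is probably the safer choice.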
I tend to work dataset by dataset as there are a lot of references in CoL that are little more than a name and a year, or are cryptic page citations. For example there's no point looking at Index Fungorum and IPNI (which is why I built Index Fungorum with Literature and IPNI Literature). For the CSL-JSON database I want full bibliographic records. I also have my own citation parser that I find often does better than CoL in converting strings to CSL-JSON.
> I also have my own citation parser that I find often does better than CoL in converting strings to CSL-JSON.
We don't parse citation strings to CSL-JSON much. I had given up on that after starting with anystyle for all input. Adding false parsing results to data someone else publishes was worse than the benefits you get from it, so we stopped doing that. The various input formats unfortunately have a varying degree of parsing. Some just come in as a citation string, which we keep as such. Others come in half parsed as ACEF or DC records from the DwC bibliography extension and usually have title, authors and year extracted, but the rest kind of lumped together. I try to disentangle that a little, but it's in its infancy and I am aware it can be improved a lot. And only a few come with proper parsing via ColDP. I also added an option to resolve DOIs and store that CSL-JSON, but as it's slow it's inactive on nearly all datasets. I was hoping to build a central reference index at some point that does DOI resolution and ultimately provides cleaner versions.
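For readers unfamiliar with DOI resolution to CSL-JSON, one common approach is HTTP content negotiation against `https://doi.org/`, sketched below. This is not necessarily how ChecklistBank does it; the function names are invented for this example, and it assumes the DOI's registration agency supports the CSL-JSON media type (Crossref and DataCite do).

```python
import json
import urllib.parse
import urllib.request

# Media type used to request CSL-JSON via DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_doi_request(doi):
    """Build a request asking the DOI resolver for CSL-JSON."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def resolve_doi(doi, timeout=30):
    """Resolve a DOI and return its metadata as a parsed CSL-JSON object."""
    request = build_doi_request(doi)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Each lookup is a network round trip with redirects, which goes some way to explaining why doing this for every reference in a dataset is slow; caching results by DOI would be the obvious first optimisation.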
This is pretty much the rationale for "Ten years and a million links: building a global taxonomic library connecting persistent identifiers for names, publications and people" https://doi.org/10.3897/BDJ.11.e107914
I have mappings between IPNI name ids and Index Fungorum ids and the corresponding DOIs (and other identifiers). These have been tricky to assemble as those databases don't have full citations, just "micro citations". For animals I have BioNames, which has full bibliographic citations (mostly). I've also been playing with mapping between CoL datasets that have good bibliographic data and external identifiers, such as DOIs, etc. A lot of this mapping data has been added to ChecklistBank.
It seems to me that one way forward would be to have a dataset of names to DOIs (and Wikidata, etc.) with CSL-JSON as well, for all names in CoL that have bibliographic data (whether that bibliographic data comes from the CoL providers themselves, or is external, e.g., the mappings I'm making).
This could be an added benefit of contributing to CoL - the prospect of potentially improved bibliographic data. It would also add value to CoL as users would have more links to the evidence for the names. Part of what I'm doing is capturing PDFs and making sure they are backed up in Internet Archive and/or the Wayback Machine. Obviously, bibliographic data isn't terribly exciting unless it leads to being able to read the articles.
I built a toy connecting ChecklistBank to the literature: Species Cite. It's a proof of concept, and depends on Internet Archive not being flaky (which it is this morning).
Anyway, I know you've lots on your plate as it is, but when and if CoL wants to tackle the literature side of things, I think you might be surprised about just how much data is already available.