Comments (8)

mdoering commented on July 4, 2024

Indeed wrong, thanks for spotting! As you can see we neglected references as usual...

from checklistbank.

mdoering commented on July 4, 2024

Will be deployed to prod in 1-2 hours

rdmpage commented on July 4, 2024

I'm currently exploring ways to aggregate all taxonomic references (for various values of "all") into a single pile of CSL-JSON documents, then trying to cluster them into sets of the same references. CoL is an obvious source of data, hence the flurry of bug reports regarding references. Thanks for quickly fixing this issue.
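
As a rough illustration of the clustering step described above, here is a minimal sketch that groups CSL-JSON items by a crude key (first author's family name, year, normalized title). This is an assumption about one possible approach, not the method actually used here; the function names `reference_key` and `cluster_references` are hypothetical, and real data would likely need fuzzy matching rather than exact keys.

```python
import re
import unicodedata
from collections import defaultdict


def _normalize(text):
    # Strip accents, lowercase, and reduce runs of non-alphanumerics to one space.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def reference_key(item):
    """Build a crude matching key from a CSL-JSON item (hypothetical helper)."""
    authors = item.get("author") or [{}]
    family = authors[0].get("family", "")
    date_parts = (item.get("issued") or {}).get("date-parts") or [[]]
    year = date_parts[0][0] if date_parts[0] else ""
    return (_normalize(family), str(year), _normalize(item.get("title", "")))


def cluster_references(items):
    """Group items that share the same key; returns a list of clusters."""
    clusters = defaultdict(list)
    for item in items:
        clusters[reference_key(item)].append(item)
    return list(clusters.values())
```

Exact-key grouping like this only catches near-identical records; references that are "little more than a name and a year" would collapse into overly broad clusters and probably need to be filtered out first.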

mdoering commented on July 4, 2024

fix deployed

mdoering commented on July 4, 2024

> I'm currently exploring ways to aggregate all taxonomic references (for various values of "all") into a single pile of CSL-JSON documents, then trying to cluster them into sets of the same references. CoL is an obvious source of data, hence the flurry of bug reports regarding references. Thanks for quickly fixing this issue.

If you just want all COL references as CSL-JSON, you should be able to use the reference.json file inside an extended ColDP download like https://api.checklistbank.org/dataset/3LR/export.zip?extended=true&format=ColDP
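
A minimal sketch of reading that file from a downloaded export, using only the Python standard library. The helper name `load_references` is hypothetical, and the sketch assumes that reference.json holds plain JSON and may sit either at the archive root or in a subfolder; check an actual export before relying on either assumption.

```python
import io
import json
import zipfile


def load_references(archive):
    """Return the parsed reference.json from a ColDP zip.

    `archive` may be a file path, a file-like object, or raw bytes.
    """
    if isinstance(archive, bytes):
        archive = io.BytesIO(archive)
    with zipfile.ZipFile(archive) as zf:
        # Assumption: the file is called reference.json, possibly in a subfolder.
        names = [n for n in zf.namelist() if n.rsplit("/", 1)[-1] == "reference.json"]
        if not names:
            raise FileNotFoundError("no reference.json found in archive")
        with zf.open(names[0]) as fh:
            return json.load(fh)
```

The export URL above can be fetched with any HTTP client and the saved file (or the response bytes) passed straight to `load_references`.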

rdmpage commented on July 4, 2024

I tend to work dataset by dataset, as there are a lot of references in CoL that are little more than a name and a year, or are cryptic page citations. For example, there's no point looking at Index Fungorum and IPNI (which is why I built Index Fungorum with Literature and IPNI Literature). For the CSL-JSON database I want full bibliographic records. I also have my own citation parser that I find often does better than CoL in converting strings to CSL-JSON.

mdoering commented on July 4, 2024

> I also have my own citation parser that I find often does better than CoL in converting strings to CSL-JSON.

We don't parse citation strings to CSL-JSON much. I had given up on that after starting with anystyle for all input. Adding false parsing results to data someone else publishes did more harm than the benefits were worth, so we stopped doing that. The various input formats unfortunately come with varying degrees of parsing. Some just come in as a citation string, which we keep as such. Others come in half parsed as ACEF or DC records from the DwC bibliography extension and usually have title, authors and year extracted, but the rest kind of lumped together. I try to disentangle that a little, but it's in its infancy and I am aware it can be improved a lot. And only a few come with proper parsing via ColDP. I also added an option to resolve DOIs and store that CSL-JSON, but as it's slow it is inactive on nearly all datasets. I was hoping to build a central reference index at some point that does DOI resolution and ultimately provides cleaner versions.
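
For context, DOI resolution to CSL-JSON is commonly done through HTTP content negotiation against doi.org. The following is a minimal sketch of that general technique, not ChecklistBank's implementation; the function names are hypothetical, and not every DOI registration agency is guaranteed to return CSL-JSON.

```python
import json
import urllib.parse
import urllib.request

# Media type used for CSL-JSON in DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_doi_request(doi):
    """Build a request asking doi.org for CSL-JSON metadata (hypothetical helper)."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def resolve_doi(doi, timeout=30):
    """Fetch and parse CSL-JSON for one DOI; raises on HTTP or network errors."""
    with urllib.request.urlopen(build_doi_request(doi), timeout=timeout) as resp:
        return json.load(resp)
```

Each lookup is a network round trip with redirects, which is consistent with the remark that resolving DOIs is slow when done for every reference in a dataset.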

rdmpage commented on July 4, 2024

This is pretty much the rationale for "Ten years and a million links: building a global taxonomic library connecting persistent identifiers for names, publications and people" https://doi.org/10.3897/BDJ.11.e107914

I have mappings between IPNI name ids and Index Fungorum ids and the corresponding DOIs (and other identifiers). These have been tricky to assemble as those databases don't have full citations, just "micro citations". For animals I have BioNames, which has full bibliographic citations (mostly). I've also been playing with mapping between CoL datasets that have good bibliographic data and external identifiers, such as DOIs, etc. A lot of this mapping data has been added to ChecklistBank.

It seems to me that one way forward would be to have a dataset of names to DOIs (and Wikidata, etc.) with CSL-JSON as well, for all names in CoL that have bibliographic data (whether that bibliographic data comes from the CoL providers themselves, or is external, e.g., the mappings I'm making).

This could be an added benefit of contributing to CoL: the prospect of potentially improved bibliographic data. It would also add value to CoL, as users would have more links to the evidence for the names. Part of what I'm doing is capturing PDFs and making sure they are backed up in Internet Archive and/or the Wayback Machine. Obviously, bibliographic data isn't terribly exciting unless it leads to being able to read the articles.

I built a toy connecting ChecklistBank to the literature, Species Cite. It's a proof of concept, and depends on Internet Archive not being flaky (which it is this morning).

Anyway, I know you've lots on your plate as it is, but when and if CoL wants to tackle the literature side of things, I think you might be surprised about just how much data is already available.
