trias-project / rinse-registry-checklist

🐞 RINSE - Registry of non-native species in the Two Seas region countries

Home Page: https://trias-project.github.io/rinse-registry-checklist

License: MIT License

CSS 100.00%
checklist dataset gbif invasive-species oscibio r rstats

Contributors

lienreyserhove, peterdesmet

rinse-registry-checklist's Issues

Wrong dates in biological invasions paper

The dates for records of Rhododendron ponticum Linnaeus in the Zieritz et al. (2016) dataset need cleaning: the date of observation is given as 17634 for GB and as 19204 for Belgium. I assume the latter should be 1904, as this is the date of observation in the Netherlands. For GB, however, I have no clue, so I will contact the authors about this and use 1763 for now.
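A minimal Python sketch of how such fixes could be applied: flag years outside a plausible range and substitute the manual corrections above. The (taxon, country) key layout, the country names taken from the checklist's column headers, and the 1500 to 2016 range (2016 being the publication year of the source paper) are all assumptions for illustration.

```python
# Hypothetical manual corrections for the two implausible years discussed
# above; the (taxon, country) keys are an illustrative data layout.
CORRECTIONS = {
    ("Rhododendron ponticum", "great_britain"): 1763,  # provisional, pending reply from the authors
    ("Rhododendron ponticum", "belgium"): 1904,        # same year as the Netherlands record
}


def is_plausible_year(year, earliest=1500, latest=2016):
    """Return True if a year lies in the assumed plausible range.

    Both bounds are assumptions: 2016 is the publication year of the
    source paper and 1500 is an arbitrary lower limit.
    """
    return earliest <= year <= latest


def clean_year(taxon, country, year):
    """Return the manual correction for (taxon, country) if one exists."""
    return CORRECTIONS.get((taxon, country), year)
```

Running `is_plausible_year()` over the whole date column first would show whether other five-digit typos exist before any corrections are hard-coded.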

How to handle infraspecificEpithets

As mentioned in #3, infraspecificEpithet is combined with specificEpithet in the species column of the source data. Together with the genus, this creates a scientific name we can submit to GBIF for parsing.

The resulting name is fine for animal names, which are written as:

Genus specificEpithet infraspecificEpithet

but not for plant names, where the infraspecific rank has to be known and included in the scientific name:

Genus specificEpithet subsp. infraspecificEpithet
Genus specificEpithet var. infraspecificEpithet

Since we do not have the rank of the taxa in this checklist, it is impossible to build proper scientific names for the plants. The best course of action would be:

  • To write the names as Genus specificEpithet infraspecificEpithet, even for plants
  • To not provide a taxonRank for (plant) infraspecific names
  • To not provide a nomenclaturalCode
  • To hope that the name matching at GBIF will match with something valid
  • To indicate this issue in the metadata
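A minimal Python sketch of the first two points, assuming the species column holds only the specific epithet and, optionally, one infraspecific epithet (no hybrid signs or qualifiers). It joins genus and species without a rank marker and returns no taxonRank for plant infraspecific names; no nomenclaturalCode is produced at all. Mapping animal trinomials to subspecies is an assumption, based on subspecies being the only rank below species in zoological nomenclature.

```python
def build_scientific_name(genus, species):
    """Join genus and the species column (which may already contain an
    infraspecific epithet) into one name, without any rank marker."""
    parts = [genus.strip(), species.strip()]
    return " ".join(p for p in parts if p)


def taxon_rank(species, is_plant):
    """Return a taxonRank, or None where it cannot be determined.

    Assumes the species column holds one or two epithets only.
    """
    n_epithets = len(species.split())
    if n_epithets == 1:
        return "species"
    if n_epithets == 2:
        # Plant infraspecific rank (subsp., var., ...) is unknown: leave empty
        return None if is_plant else "subspecies"
    return None  # anything else is left for manual review
```

For example, `build_scientific_name("Genus", "specificEpithet infraspecificEpithet")` gives `"Genus specificEpithet infraspecificEpithet"`, which is the plain form proposed above for both animals and plants.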

Add source "data based solely on DAISIE portal" to distribution

For records with the note:

data based solely on DAISIE portal

Add that exact phrasing as source in the distribution extension. This will allow us to differentiate between distributions for the same taxon, or to exclude them from the unified checklist. Even though it is not the best way to write a source (i.e. not a proper citation), it is referenced as such in the paper (search for "solely") and is thus more easily findable. Alternatively, we could use the proper citation:

DAISIE European Invasive Alien Species Gateway (http://www.europe-aliens.org/)

But that doesn't make clear that the distribution was based solely on that portal. We could also use the field occurrenceRemarks, but the note seems to fit better in source. @qgroom, any preferences?
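A minimal Python sketch of the proposal: copy the exact note into source and use it to drop DAISIE-only distributions, for example for the unified checklist. It assumes each distribution is a dict with a `notes` key and that missing notes (NA in the R data) arrive as `None`; both the record layout and the helper names are hypothetical.

```python
DAISIE_NOTE = "data based solely on DAISIE portal"


def distribution_source(notes):
    """Return the exact DAISIE note as source, or None for other records.

    Missing notes (NA in the R data) are assumed to arrive as None.
    """
    if notes is not None and DAISIE_NOTE in notes:
        return DAISIE_NOTE
    return None


def without_daisie_only(distributions):
    """Drop DAISIE-only distributions, e.g. for the unified checklist."""
    return [d for d in distributions if distribution_source(d.get("notes")) is None]
```

If the proper DAISIE citation were chosen instead, only the returned string would change; the filtering would still rely on the original note.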

nomenclatural code

This field is obligatory, but it is not available in the raw data. As the raw dataset is a compilation of different datasets and phyla/kingdoms, I would be surprised if one single nomenclatural code could be used. Suggestions?

Add scientific name

The scientific name of a species in RINSE is a combination of the columns genus and species. To generate the taxonID, we need the full scientific name. However, when I paste genus and species into scientificName, some of the resulting names are rather odd:

Brachyglottis x 'Sunshine'
Cotoneaster x 'Hybridus pendulus'
Triticosecale x
Hosta agg.
Medicago x varia
Larix x marschlinsii hyb.

I just want to be sure that I'm generating scientificName correctly by pasting genus and species together. I don't think we should change the content unless it is horribly wrong.
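A minimal Python sketch that mimics pasting genus and species together and flags names like those listed above for a manual check. The patterns are an assumption derived only from these examples and are not exhaustive. Medicago x varia is not flagged, because it follows the usual form of a named hybrid (Genus x epithet).

```python
import re

# Patterns derived from the examples above; an assumption, not a full rule set.
ODD_PATTERNS = [
    re.compile(r"'[^']*'"),   # quoted cultivar-style epithet, e.g. 'Sunshine'
    re.compile(r"\bx$"),      # trailing hybrid sign with nothing after it
    re.compile(r"\bagg\.$"),  # aggregate qualifier
    re.compile(r"\bhyb\.$"),  # trailing "hyb." qualifier
]


def paste_name(genus, species):
    """Combine genus and species with a single space, like R's paste()."""
    return f"{genus} {species}".strip()


def is_odd(name):
    """Return True if a name matches any of the odd patterns."""
    return any(p.search(name) for p in ODD_PATTERNS)
```

Applied to the six names above, this flags all of them except Medicago x varia.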

Phylum vs Division

There's a column phylum_devision. As I am an ecologist and not a taxonomist, I kind of forgot what a division is exactly (although I knew that it's a higher level of classification 😅). A quick search gave me this answer:

Phylum vs Division Phylum and division are two taxonomic levels that are used in the biological classification of organisms. Both phylum and division occur below kingdom and above class. The main difference between phylum and division is that phylum is a classification level of the animal kingdom whereas division is the alternative classification level to the phylum in the kingdom Plantae and Fungi. Sometimes the term division is used as the lower classification level of the kingdom Protista and kingdom Monera. The kingdom Animalia comprises 36 phyla. The kingdom Plantae comprises 12 phyla, and the kingdom Fungi comprises 7 phyla.

So, can I put all divisions under the Darwin Core term phylum?

Duplicates in scientific name

Some taxa appear twice in the checklist 😒 (see the table below for a summary of all duplicated taxa). This is because:

  • most duplicated species have contrasting presence information for one country (e.g. Anchusa arvensis). For some of these species, one record has the note "data based solely on DAISIE portal", while the other record lacks this note. In the main article, I found this:

data based solely on DAISIE portal - taxon listed as present by the DAISIE portal but not by any of the other databases consulted; no additional portal was consulted regarding geographical distribution (also see Methods section)

So, does this mean that, in the one case, only the DAISIE portal was consulted, and in the others, all portals were consulted? Which distribution information do we use then? I see that this also affects species presence for Belgium.

  • Some duplicated species are assigned to two kingdoms, i.e. Elachista and Acrophyllum dentatum. I can find information that supports both cases... So if this information is in fact correct, we will get the same taxonID for both species, which is problematic.

For the first problem, I think it is best to contact the authors for advice. For the second problem, I'm not sure what to do. Could I generate a taxonID based on the combination of phylum and scientific name in this case?

| phylum_division | class | genus | species | great_britain | france | belgium | netherlands | environment | notes | scientificName |
|---|---|---|---|---|---|---|---|---|---|---|
| Angiospermae | Eudicotyledoneae | Acrophyllum | dentatum | present | not confirmed | not confirmed | not confirmed | terrestrial | NA | Acrophyllum dentatum |
| Angiospermae | Eudicotyledoneae | Anchusa | arvensis | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Anchusa arvensis |
| Angiospermae | Eudicotyledoneae | Anchusa | arvensis | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Anchusa arvensis |
| Angiospermae | Monocotyledoneae | Avena | sterilis | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Avena sterilis |
| Angiospermae | Monocotyledoneae | Avena | sterilis | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Avena sterilis |
| Angiospermae | Monocotyledoneae | Avena | strigosa | present | present | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Avena strigosa |
| Angiospermae | Monocotyledoneae | Avena | strigosa | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Avena strigosa |
| Angiospermae | NA | Beta | vulgaris | not confirmed | present | not confirmed | not confirmed | terrestrial | NA | Beta vulgaris |
| Angiospermae | NA | Beta | vulgaris | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Beta vulgaris |
| Angiospermae | NA | Brassica | elongata | present | present | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Brassica elongata |
| Angiospermae | NA | Brassica | elongata | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Brassica elongata |
| Angiospermae | NA | Chenopodium | berlandieri | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Chenopodium berlandieri |
| Angiospermae | NA | Chenopodium | berlandieri | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Chenopodium berlandieri |
| Angiospermae | NA | Chenopodium | strictum | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Chenopodium strictum |
| Angiospermae | NA | Chenopodium | strictum | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Chenopodium strictum |
| Angiospermae | Monocotyledoneae | Cynodon | incompletus | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Cynodon incompletus |
| Angiospermae | Monocotyledoneae | Cynodon | incompletus | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Cynodon incompletus |
| Angiospermae | Eudicotyledoneae | Epilobium | x novae-civitatis | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Epilobium x novae-civitatis |
| Angiospermae | Eudicotyledoneae | Epilobium | x novae-civitatis | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Epilobium x novae-civitatis |
| Angiospermae | NA | Hypericum | hircinum | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Hypericum hircinum |
| Angiospermae | NA | Hypericum | hircinum | not confirmed | present | not confirmed | not confirmed | terrestrial | NA | Hypericum hircinum |
| Angiospermae | NA | Lythrum | junceum | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Lythrum junceum |
| Angiospermae | NA | Lythrum | junceum | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Lythrum junceum |
| Angiospermae | NA | Mentha | x piperita | present | not confirmed | present | not confirmed | terrestrial | data based solely on DAISIE portal | Mentha x piperita |
| Angiospermae | NA | Mentha | x piperita | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Mentha x piperita |
| Angiospermae | Monocotyledoneae | Papaver | atlanticum | present | not confirmed | present | not confirmed | terrestrial | data based solely on DAISIE portal | Papaver atlanticum |
| Angiospermae | Monocotyledoneae | Papaver | atlanticum | present | not confirmed | present | present | terrestrial | NA | Papaver atlanticum |
| Angiospermae | NA | Populus | nigra | present | not confirmed | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Populus nigra |
| Angiospermae | NA | Populus | nigra | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Populus nigra |
| Angiospermae | NA | Salix | x sepulcralis | present | present | not confirmed | not confirmed | terrestrial | data based solely on DAISIE portal | Salix x sepulcralis |
| Angiospermae | NA | Salix | x sepulcralis | not confirmed | not confirmed | present | not confirmed | terrestrial | NA | Salix x sepulcralis |
| Arthropoda | Insecta | Cinara | pini | present | present | not confirmed | present | terrestrial | NA | Cinara pini |
| Arthropoda | Insecta | Cinara | pini | present | not confirmed | not confirmed | not confirmed | terrestrial | NA | Cinara pini |
| Arthropoda | Insecta | Elachista | sp. | present | present | present | present | marine | NA | Elachista sp. |
| Bryophyta | Eudicotyledoneae | Acrophyllum | dentatum | present | not confirmed | not confirmed | not confirmed | terrestrial | NA | Acrophyllum dentatum |
| Heterokontophyta | Phaeophyceae | Elachista | sp. | not confirmed | not confirmed | not confirmed | present | marine | NA | Elachista sp. |
| Nematoda | Adenophorea | Xiphinema | rivesi | not confirmed | present | not confirmed | not confirmed | terrestrial | NA | Xiphinema rivesi |
| Nematoda | Adenophorea | Xiphinema | rivesi | not confirmed | present | not confirmed | not confirmed | terrestrial | NA | Xiphinema rivesi |
| Pteridophyta | Pteridopsida | Dicksonia | antarctica | present | not confirmed | not confirmed | not confirmed | terrestrial | NA | Dicksonia antarctica |
| Pteridophyta | Pteridopsida | Dicksonia | antarctica | present | not confirmed | not confirmed | not confirmed | terrestrial | NA | Dicksonia antarctica |
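A minimal Python sketch that groups rows by scientificName and labels each duplicated name with one of the cases seen in the table: a different phylum (Elachista sp., Acrophyllum dentatum), conflicting presence across the four countries (Anchusa arvensis, Cinara pini), or identical presence (Xiphinema rivesi, Dicksonia antarctica). Rows are assumed to be dicts keyed by the column names above; the labels are hypothetical.

```python
from collections import defaultdict

COUNTRIES = ("great_britain", "france", "belgium", "netherlands")


def classify_duplicates(rows):
    """Return {scientificName: label} for every name that occurs more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["scientificName"]].append(row)

    result = {}
    for name, group in groups.items():
        if len(group) < 2:
            continue  # not a duplicate
        phyla = {r["phylum_division"] for r in group}
        presence = {tuple(r[c] for c in COUNTRIES) for r in group}
        if len(phyla) > 1:
            result[name] = "different phylum"
        elif len(presence) > 1:
            result[name] = "conflicting presence"
        else:
            result[name] = "identical presence"
    return result
```

Names in the last group could simply be deduplicated, while the other two need the decisions discussed above.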

Update hybrid taxonRank

All taxa with taxonRank hybrid are hybrid formulas (unlike those in the MAP: trias-project/alien-plants-belgium#39), although not all of them are written out as full hybrid formulas. However, hybrid is not a recognized taxonRank. Once we have guidance on how to populate taxonRank for hybrid formulas, we should update it in this dataset.

Taxa:

scientificName
Acaena anserinifolia x inermis
Aconitum napellus x variegatum
Agrostis gigantea x stolonifera
Artemisia verlotiorum x vulgaris
Avena sativa x sterilis
Chenopodium album x amaranticolor
Chenopodium album x hircinum
Chenopodium album x missouriense
Chenopodium album x probstii
Corylus avellana x maxima
Crataegus heterophylla x monogyna
Crocosmia aurea x pottsii
Crocus tommasinianus x vernus
Dianthus caryophyllus x gratianopolitanus
Dianthus gratianopolitanus x plumarius
Dipsacus fullonum x sativus
Epilobium ciliatum x lanceolatum
Festuca arundinacea x Lolium multiflorum
Festuca rubra x Vulpia myuros
Fumaria densiflora x officinalis
Fumaria officinalis x parviflora
Galanthus elwesii x nivalis
Galanthus elwesii x plicatus
Galanthus nivalis x plicatus
Helianthus rigidus x tuberosus
Heracleum mantegazzianum x sphondylium
Juncus inflexus x pallidus
Lupinus arboreus x nootkatensis x polyphyllus
Lysichiton americanus x camtschatcensis
Nothofagus nervosa x obliqua
Oenothera biennis x cambrica
Populus canadensis x jackii
Populus jackii x nigra
Quercus cerris x robur
Rumex conglomeratus x patientia
Rumex cristatus x palustris
Sorbus aucuparia x intermedia
Tripleurospermum inodorum x maritimum
Verbascum blattaria x phoeniceum
Verbascum nigrum x pyramidatum
Verbascum pyramidatum x thapsus
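A minimal Python sketch that checks whether a name is a hybrid formula (two or more tokens before the first hybrid sign, as in all names listed above) or a named hybrid such as Mentha x piperita. It assumes the hybrid sign is written as a separate lowercase "x" token rather than "×", and it deliberately does not assign a taxonRank, since that is still pending guidance.

```python
def hybrid_type(name):
    """Classify a name as 'hybrid formula', 'named hybrid' or None.

    Assumes the hybrid sign is a separate lowercase 'x' token.
    """
    tokens = name.split()
    if "x" not in tokens:
        return None  # not written as a hybrid
    first_x = tokens.index("x")
    if first_x >= 2:
        # e.g. "Genus epithet x epithet" or "Genus epithet x Genus epithet"
        return "hybrid formula"
    if len(tokens) > first_x + 1:
        # e.g. "Genus x epithet": a named hybrid (nothotaxon)
        return "named hybrid"
    return None  # e.g. a trailing "x" with nothing after it
```

Running this over the list above should return "hybrid formula" for every entry, which would confirm the statement at the start of this issue.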

Cleaning steps references

Some feedback needed:

The Zieritz et al. (2016) checklist has a reference column containing numbers. Two things with respect to that:

  1. The numbers are separated by commas and hyphens. A hyphen indicates a range, i.e. 1-4 refers to references 1, 2, 3 and 4, and we need that expanded list. I haven't yet figured out how to generate these sequences in a way that keeps the code readable. I therefore suggest expanding the sequences in the raw data file rather than in the R script (which would make it messier). As this is a dead dataset, I think this cleaning step won't do any harm.

  2. For some species, about 12 reference numbers are provided, which is a lot. Just to be sure: is it really necessary to include the full references? The fields will be full of text, but I guess there's no way around that, right?
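For point 1, range expansion can stay fairly readable in code. A minimal Python sketch, assuming references are separated by ASCII commas and ranges use an ASCII hyphen (an en dash would need to be normalized first); the same logic could be written in R with `strsplit()` and `seq()`.

```python
def expand_references(refs):
    """Expand a reference string like '1-4, 7' into [1, 2, 3, 4, 7]."""
    numbers = []
    for part in refs.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            numbers.extend(range(start, end + 1))  # inclusive range
        else:
            numbers.append(int(part))
    return numbers
```

Each number could then be looked up in the paper's reference list to fill the full citation.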

Publish under University of Cambridge

The three main authors of the original paper on which this dataset is based are affiliated with:

Department of Zoology, University of Cambridge, Cambridge, United Kingdom
University of Cambridge, Cambridge, United Kingdom

Which is why we also opted to use University of Cambridge in institutionCode (see #9).

The university is not yet registered with GBIF as a data publisher, but University Museum of Zoology Cambridge is: https://www.gbif.org/publisher/d9ccac00-9bc7-11de-a329-b8a03c50a862

I would:

  1. Ask @dbloom and @tucotuco (to ask their contacts) whether it would be OK to update https://www.gbif.org/publisher/d9ccac00-9bc7-11de-a329-b8a03c50a862 to the higher-level University of Cambridge itself. This won't affect any existing dataset, because none have been published.
  2. If 1 doesn't work, ask the authors if it is OK to register under the zoology museum at Cambridge
  3. If that is not OK, see if we can create a new University of Cambridge publisher, even though that would confuse matters at GBIF
  4. If that doesn't work, give up and publish under INBO, which is incorrect

license, rightsHolder and institutionCode

The Neobiota paper is published under the CC BY 4.0 license (https://neobiota.pensoft.net/article/4007/list/8/)

With respect to the data in the supplementary material, I found the following:

This dataset is made available under the Open Database License (http://opendatacommons.org/licenses/odbl/1.0/). The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use this Dataset while maintaining this same freedom for others, provided that the original source and author(s) are credited.

So should we use http://opendatacommons.org/licenses/odbl/1.0/ for both accessRights and license?

change taxonID

Ideally, the taxonID for this checklist should be rinse-registry-checklist:taxon:hash. Currently, it still uses the old form rinse-checklist:taxon:hash.
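A minimal Python sketch of a taxonID built with the new prefix. Hashing the scientificName with MD5 is an assumption about how the hash is made, and the optional phylum argument is one hypothetical way to keep the duplicated names from the Duplicates issue (e.g. Elachista sp. in two phyla) apart. Note that R's `digest::digest()` serializes its input by default, so its hashes only match plain MD5 of the string when called with `serialize = FALSE`.

```python
import hashlib

PREFIX = "rinse-registry-checklist"


def taxon_id(scientific_name, phylum=None, prefix=PREFIX):
    """Build '<prefix>:taxon:<md5 hash>' from the name (and optional phylum).

    MD5 of the name is an assumed hashing scheme, not the project's confirmed one.
    """
    key = scientific_name if phylum is None else f"{phylum} {scientific_name}"
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{prefix}:taxon:{digest}"
```

Changing the prefix alone does not alter the hash part, so existing hashes would carry over as long as the hashed key stays the same.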
