Comments (17)

sckott commented on August 17, 2024

Hey @dmcglinn , answer in a second

from rgbif.

sckott commented on August 17, 2024

Note that occurrencelist and occurrencelist_many now return S3 objects, so you gotta use gbifdata to get the data (or convert it yourself I guess).

So, the problem is that the search is saying I want exactly "Aristolochia serpentaria", when what it seems like you want is that, but with variants, right?

Try this:

library(rgbif)
out <- gbifdata(occurrencelist(scientificname = 'Aristolochia serpentaria*', coordinatestatus = TRUE, maxresults = 1000))

unique(out$taxonName)

[1] Aristolochia serpentaria l. Aristolochia serpentaria   
Levels: Aristolochia serpentaria Aristolochia serpentaria l.

nrow(out)

[1] 96

Notice the asterisk after the taxon name, and that you get two names returned, one with l. , presumably for Linnaeus.

That gives 96 georeferenced records, though, whereas GBIF gives 116 (GBIF does give 431 records, as you said, but not all have lat/long data).

sckott commented on August 17, 2024

Hi again @dmcglinn. Here's the responsible line in the code: https://github.com/ropensci/rgbif/blob/master/R/methods.r#L44

It removes rows that have NA's for both lat and long. And 20 of the 116 records have zeros for both lat and long, even on the GBIF site, see here http://data.gbif.org/ws/rest/occurrence/list?scientificname=Aristolochia%20serpentaria*&coordinatestatus=TRUE

So those zeros get converted to NA's and removed in rgbif since I assumed that people wouldn't be interested in records without lat/long data.
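
For illustration, here is a rough base R sketch of that kind of filtering on made-up data. It is not the actual code in methods.r; the column names `decimalLatitude` and `decimalLongitude` and the helper `drop_missing_coords` are assumptions made up for this example.

```r
# Toy records; the column names are assumed for illustration and may not
# match what rgbif actually returns.
recs <- data.frame(
  taxonName = c("Aristolochia serpentaria", "Aristolochia serpentaria L.",
                "Aristolochia serpentaria"),
  decimalLatitude = c(35.9, 0, NA),
  decimalLongitude = c(-79.0, 0, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat 0/0 coordinates as missing, then drop rows
# where both latitude and longitude are NA.
drop_missing_coords <- function(df) {
  zero_both <- !is.na(df$decimalLatitude) & !is.na(df$decimalLongitude) &
    df$decimalLatitude == 0 & df$decimalLongitude == 0
  df$decimalLatitude[zero_both] <- NA
  df$decimalLongitude[zero_both] <- NA
  keep <- !(is.na(df$decimalLatitude) & is.na(df$decimalLongitude))
  df[keep, , drop = FALSE]
}

georef <- drop_missing_coords(recs)
nrow(georef)
```

Only the first toy record survives here, since the other two have either 0/0 or missing coordinates.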

What do you think?

dmcglinn commented on August 17, 2024

Hey @schamberlain thanks for the help and speedy replies on these issues. It does look like adding the '*' to the species name so that variants were returned was the primary issue, but also changing coordinatestatus to FALSE increased the number of returns as well. The query:

occurrencelist(scientificname = 'Aristolochia serpentaria*', coordinatestatus = FALSE, maxresults = 1e6)

returns 306 records which is identical to

http://data.gbif.org/ws/rest/occurrence/list?scientificname=Aristolochia%20serpentaria*&coordinatestatus=FALSE

but these queries do not return the full 431 items that a normal GBIF species query on Aristolochia serpentaria returns. It appears that this may be due to the fact that a GBIF query returns a broader range of names. Specifically,

library(rgbif)

out = occurrencelist(scientificname = 'Aristolochia serpentaria*', coordinatestatus = FALSE, maxresults = 1e6)

unique(out$taxonName)

[1] "aristolochia serpentaria"                                    
[2] "Aristolochia serpentaria L."                                 
[3] "Aristolochia serpentaria"                                    
[4] "Aristolochia serpentaria L. var. hastata (Nutt.) Duchartre"  
[5] "Aristolochia serpentaria L. var. hastata (Nuttall) Duchartre"
[6] "Aristolochia serpentaria var. hastata Duch."                 
[7] "Aristolochia serpentaria var. serpentaria"                   
[8] "Aristolochia serpentaria BARTON"  

whereas the GBIF query at http://data.gbif.org/ returns these names as well as synonym names such as:

Aristolochia hastata
Aristolochia nashii
Aristolochia convolvulacea Small
Aristolochia serpentaria var. hastata (Nutt.) Duchartre

dmcglinn commented on August 17, 2024

I checked that those additional names were indeed synonyms here: http://www.itis.gov/

sckott commented on August 17, 2024

Interesting. So GBIF.org is giving back synonyms as well as actual matches of the query string, whereas their API does not do that. Let me see if there is a parameter that we could fiddle with to get exactly what they give back.

sckott commented on August 17, 2024

The API docs say

scientificname - count only records where the scientific name matches that supplied - this is based on the scientific name found in the original record from the data provider and does not make use of extra knowledge of possible synonyms or of child taxa. For these functions, use taxonconceptkey.

and

taxonconceptkey - return only records which are for the taxon identified by the supplied numeric key, including any records provided under synonyms of the taxon concerned, and any records for child taxa (e.g. all genera and species within a family).

So the taxonconceptkey parameter does give back synonyms. However, from a user perspective, you would first have to get the taxonconceptkey, which is not ideal.

I'm guessing GBIF.org gets a taxonconceptkey based on your search, then looks up synonyms - but doesn't do this with the API - weird.
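
As a rough sketch of the difference between the two parameters, here is how the two kinds of query URL could be put together in base R. The helper `build_occurrence_url` is made up for this example, and the key value is just a placeholder, not a real taxon concept key; the base URL and parameter names are the ones that appear elsewhere in this thread.

```r
# Hypothetical helper that assembles a query URL for the occurrence list
# endpoint linked earlier in this thread. Parameter names come from the docs
# quoted above; nothing here contacts the server.
build_occurrence_url <- function(...) {
  base <- "http://data.gbif.org/ws/rest/occurrence/list"
  params <- list(...)
  # Percent-encode each value so spaces and '*' are safe in a URL.
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste0(base, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

# Name-based query: matches the name string only, no synonyms.
url_by_name <- build_occurrence_url(
  scientificname = "Aristolochia serpentaria*",
  coordinatestatus = "FALSE"
)

# Key-based query: per the docs, includes synonyms and child taxa.
# 12345 is a placeholder, not the real key for this species.
url_by_key <- build_occurrence_url(taxonconceptkey = 12345)
```

The real key would still have to be looked up first, which is the extra step mentioned above.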

dmcglinn commented on August 17, 2024

Yea that's unfortunate, but I suppose one solution is the following?

library(rgbif)

gbifkey = taxonsearch(scientificname='Aristolochia serpentaria')$gbifkey
name_lkup = taxonget(key = as.numeric(as.character(gbifkey)))
sciname = as.character(subset(name_lkup, select='sciname', subset= rank == 'species' | rank == 'variety')[ ,1])
## to include variants add '*'
sciname = paste(sciname, '*', sep='')

out = occurrencelist_many(scientificname = sciname, coordinatestatus = FALSE, maxresults = 1e6)

out
$NumberFound
[1] 1845

However this now returns many more records than the original gbif.org query.

sckott commented on August 17, 2024

Hmmm, I was trying to get synonyms from ITIS and feed those into GBIF, but GBIF has different synonyms! Anyway, it would be nice if GBIF had a synonyms API.

dmcglinn commented on August 17, 2024

The problem with the approach I proposed is that it does not guarantee that duplicate records are not returned. Does occurrencelist return a unique record identifier field we could filter results on to ensure lack of duplication?
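
Something like the following is what I have in mind, sketched in base R on made-up data. The `occurrenceID` column is hypothetical here; whatever identifier field the query actually returns would take its place.

```r
# Toy results from two overlapping name queries; `occurrenceID` is a
# hypothetical identifier column used only for illustration.
res_a <- data.frame(occurrenceID = c(101, 102, 103),
                    taxonName = "Aristolochia serpentaria",
                    stringsAsFactors = FALSE)
res_b <- data.frame(occurrenceID = c(103, 104),
                    taxonName = "Aristolochia serpentaria L.",
                    stringsAsFactors = FALSE)

combined <- rbind(res_a, res_b)

# Keep the first row seen for each identifier.
deduped <- combined[!duplicated(combined$occurrenceID), , drop = FALSE]
nrow(deduped)
```

Record 103 shows up in both toy result sets, so it is kept only once.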

sckott commented on August 17, 2024

going out for a bit...

dmcglinn commented on August 17, 2024

just posted this pull request to include unique id's with the query results: #29

dmcglinn commented on August 17, 2024

Once #29 is merged the following query will return the same number of results as the GBIF web portal:

library(rgbif)

gbifkey = taxonsearch(scientificname='Aristolochia serpentaria')$gbifkey
name_lkup = taxonget(key = as.numeric(as.character(gbifkey)))

sciname = unique(as.character(subset(name_lkup, select='sciname',
                   subset= rank == 'species' | rank == 'variety')[ ,1]))

sciname = paste(sciname, '*', sep='')

out = occurrencelist_many(scientificname = sciname, maxresults = 1e6)

out 
$NumberFound
[1] 431

The 431 results match the number returned when you do a simple web query for this species.

sckott commented on August 17, 2024

merged your pull, thanks for that!

What do you think @dmcglinn ? Should functions try to match exactly what happens in the GBIF web interface? Or not?

dmcglinn commented on August 17, 2024

I think you should provide the option for this with a new function, see my suggested solution in #30

The primary benefit in my mind is that if someone doesn't want to do the work of sorting out synonymy on their own and then querying each name individually, you can provide the option of using GBIF's internal synonym mapping to complete the query. There is also the added benefit of consistency between the web interface and the R query, but that seems relatively minor (you'll probably just get fewer users complaining that something may have gone wrong). However, more functions in the package mean more maintenance effort, so you may ultimately decide it's not worth it.

sckott commented on August 17, 2024

Thanks for the new function!

Right, we should definitely strive to make it easier for users, which your function does.

I would like to have just one function that does everything with the occurrencelist endpoint, but I imagine that is too difficult b/c there is a lot going on there. Another thing not included is the ability to specify many values for the same parameter, discussed in #28. Hoping that they will change that, since it's wasteful to use named params over and over again.

sckott commented on August 17, 2024

closing this for now
