
rinat's Introduction

rinat: Access iNaturalist data with R

Edmund Hart, Stéphane Guillou


R wrapper for the iNaturalist API to access observation data. The detailed documentation of the API is available on the iNaturalist website. rinat is also part of spocc, our larger species occurrence searching package.

Installation

You can install the latest version available on CRAN with:

install.packages("rinat")

Alternatively, you can install the development version from GitHub with:

remotes::install_github("ropensci/rinat")

Usage

Get observations

get_inat_obs() is the primary function that retrieves observations from iNaturalist. The text or taxon search can be refined by observation date, record quality and location.

It is recommended to set the quality argument to "research" in order to get more reliable data that has been validated by several contributors.
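For example, a research-grade-only search might look like this (an illustrative call; the taxon is arbitrary):

monarchs_rg <- get_inat_obs(taxon_name = "Danaus plexippus", quality = "research")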

Taxon search

To return only records of a specific species or taxonomic group, use the taxon_name argument. For example, to return observations of anything in the family Nymphalidae, restricting the search to the year 2015:

library(rinat)
nymphalidae <- get_inat_obs(taxon_name = "Nymphalidae", year = 2015)
# how many unique taxa?
length(unique(nymphalidae$scientific_name))
## [1] 72

Note that get_inat_obs() will return 100 observations by default. This can be controlled with the maxresults argument.
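For instance, to request up to 500 records for the same search (an illustrative call):

nymphalidae_500 <- get_inat_obs(taxon_name = "Nymphalidae", year = 2015, maxresults = 500)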

Text search

You can also search observations with any string. It will search the entire iNaturalist database, so the search below will return all entries that mention Monarch butterflies, not just Monarch observations.

monarchs <- get_inat_obs(query = "Monarch Butterfly", year = 2021)
# which taxa were returned?
unique(monarchs$scientific_name)
## [1] "Danaus plexippus" "Danaina"

You can combine the fuzzy search with the precise taxon search. For example, to get Monarch butterfly observations that also mention the term “chrysalis”:

monarch_chrysalis <- get_inat_obs(taxon_name = "Danaus plexippus", query = "chrysalis")

Bounding box search

You can also search within a bounding box by giving a simple set of coordinates.

## Search by area
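## Note: the bounds vector is ordered c(swlat, swlng, nelat, nelng),
## i.e. the south-west corner followed by the north-east corner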
bounds <- c(38.44047, -125, 40.86652, -121.837)
deer <- get_inat_obs(query = "Mule Deer", bounds = bounds)
plot(deer$longitude, deer$latitude)

Other functions

More functions are available, notably to access:

  • observations in a project with get_inat_obs_project()
  • details of a single observation with get_inat_obs_id()
  • observations from a single user with get_inat_obs_user()
  • taxa statistics with get_inat_taxon_stats()
  • user statistics with get_inat_user_stats()

More detailed examples are included in the vignette:

vignette("rinat-intro", package = "rinat")

Mapping

Basic maps can be created with the inat_map() function to quickly visualize search results. The plot = FALSE option can be used to avoid displaying the initial plot when further customising it with ggplot2 functions.

library(ggplot2)

## Map 100 spotted salamanders
a_mac <- get_inat_obs(taxon_name = "Ambystoma maculatum", year = 2021)
salamander_map <- inat_map(a_mac, plot = FALSE)

### Further customise the returned ggplot object
salamander_map + borders("state") + theme_bw()

inat_map() is useful for quickly mapping single-species data obtained with rinat. However, more complicated plots are best made from scratch. Here is an example of a customised map that does not make use of it. (Note the use of quality = "research" to restrict the search to more reliable observations.)

## A more elaborate map of Colibri sp.
colibri <- get_inat_obs(taxon_name = "Colibri",
                        quality = "research",
                        maxresults = 500)
ggplot(data = colibri, aes(x = longitude,
                           y = latitude,
                           colour = scientific_name)) +
  geom_polygon(data = map_data("world"),
               aes(x = long, y = lat, group = group),
               fill = "grey95",
               color = "gray40",
               size = 0.1) +
  geom_point(size = 0.7, alpha = 0.5) +
  coord_fixed(xlim = range(colibri$longitude, na.rm = TRUE),
              ylim = range(colibri$latitude, na.rm = TRUE)) +
  theme_bw()


rinat's People

Contributors

emhart, jeroen, karthik, ldalby, maelle, martin-jung, sckott, stevenysw, stragu, vijaybarve


rinat's Issues

failure when call returns >20K observations

Calls that return more than 20K observations fail with "Error in data$status_code : object of type 'closure' is not subsettable". Example (Santa Monica Mountains NRA including BioBlitz observations):
iNatHits <- get_inat_obs(bounds=c(37.43380, -122.93986, 38.22174, -122.27673), maxresults=1000000)
According to http://www.inaturalist.org/pages/api+reference#get-observations, the current iNaturalist API appears to stop after 100 pages (200 × 100 = 20,000 observations) without authentication.
The simple fix is to change the threshold in the error trapping in lines 128:129 to be 20000.

I see 2 approaches to the broader solution.
First, a separate function get_inat_Nobs could be built from this code down to line 124, then returning total_res. The user could use that ping to determine if they need to split their request by year, month, etc., to get each chunk under 20K. [I would document this on the same page as get_inat_obs, with the difference noted in Value, but that might not be your style.] {The above call has total_res == 107544 via this approach.}

The second approach would be to support credentials & authentication as in rgbif. I don't know how difficult this would be to implement. It would respect the intent of the iNaturalist limit to let them know who is scraping large amounts of data. Also, it might allow for the iNaturalist API to vend unobscured coordinates when authentication credentials match their internal project-based permissions. My use case is that NPS is using iNaturalist for all of our BioBlitzes, and may adopt it for all of our species observation data. That will make our data available to the wider community, albeit with obscured coordinates for sensitive species. If we can pass credentials through to get our own data with full coordinates, I can coerce many more NPS folks to use iNaturalist and make their observations visible to the public.

Fix the read.csv component to accommodate diff versions of httr

spocc is throwing errors when calling from=inat when httr > v0.2 is used. The current CRAN version of httr, which most people likely have, is v0.2. The dev version hadley will push soon is v0.3, which adds a text/csv MIME type, so calling content() automatically parses a CSV string to a data.frame.

The line read.csv(textConnection(content(data)), stringsAsFactors = FALSE) errors, as content(data) has already been converted to a data.frame.

Let's get this fixed. For now we can add a line to the spocc DESCRIPTION file to require httr < v0.3, but once rinat is fixed we can remove that requirement from spocc.

use NEWS, and NEWS in releases

hey @emhart - We want all rOpenSci pkgs to consistently keep track of changes, following https://github.com/ropensci/onboarding/blob/master/packaging_guide.md#-news

  • use a NEWS or NEWS.md file to keep track of changes with each version released to CRAN
  • You already git tag - thanks for that
  • use the releases tab on this repo to include the associated NEWS items for each tag/version

Can you please do the three above items from now on? If you don't have time, I'll be happy to do these bits, just lemme know

fxn get_inat_obs with invalid name produces error

> get_inat_obs(taxon_name ="abcxyz", geo = TRUE, quality = "research", maxresults = 10000000)
Error in get_inat_obs(taxon_name = "abcxyz", geo = TRUE, quality = "research",  : 
  Your search returned too many results, please consider breaking it up into smaller chunks by year or month

get_obs_inat returns names that weren't searched for

This call

library(rinat)
tmp <- get_obs_inat('Pinus contorta')
as.character(unique(tmp$Scientific.name))
[1] "Pinus contorta"           "Pinus contorta contorta"  "Pinus contorta murrayana" "Pinus"                   
[5] "Pinus contorta bolanderi" ""                         "Suillus brevipes"         "Amanita muscaria"        
[9] "Laccaria bicolor"  

returns lots of names that weren't searched for. Any thoughts? We could post-process by removing names that don't exactly match the searched names?

inat observation for a whole class

Hi everyone,

I am trying to get all available occurrences for species in the class Actinopterygii via rinat in R. When I search on the inaturalist.org website, I get 137889 observations for 'actinopterygii'. With the code get_inat_obs(taxon_name = "Actinopterygii", geo = TRUE), I get 100 observations in R. I am wondering what the default value for the maxresults option is. Do I need to force maxresults to be higher than 137889? (It does not seem to work.)
Also, is taxon_name = "Actinopterygii" really going to give me all observations for which the class of the organism is Actinopterygii? The observations I am getting so far have an iconic_taxon_name = Actinopterygii column; what does that mean?

Thank you very much for your answers,

Luana

Error in get_inat_obs for taxa having large number of records

The following code gives an error:

test <- get_inat_obs(taxon_name = "Tracheophyta",maxresults = 100)
Error in get_inat_obs(taxon_name = "Tracheophyta", maxresults = 100) : 
  Your search returned too many results, please consider breaking it up into smaller chunks by year or month

but both of the following produce the desired results:

test <- get_inat_obs(taxon_name = "Kirkiaceae",maxresults = 100)
test <- get_inat_obs(taxon_name = "Danaus",maxresults = 100)

request limits information

It would be helpful to have information about rate limits in the documentation.

"We throttle API usage to a max of 100 requests per minute, though we ask that you try to keep it to 60 requests per minute or lower. If we notice usage that has serious impact on our performance we may institute blocks without notification."

Not all functions have a maxresults that helps control this, and users may unknowingly trigger a block.
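For callers, a simple throttling sketch (illustrative only; the taxa are placeholders):

taxa <- c("Danaus plexippus", "Ambystoma maculatum")
results <- lapply(taxa, function(tx) {
  Sys.sleep(1)  # stay near or below 60 requests per minute
  get_inat_obs(taxon_name = tx, maxresults = 100)
})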

CRAN submission feedback: reset options if modified in vignette

From CRAN submission process feedback on 2020-09-17:

Please always make sure to reset to user's options(), working directory or par() after you changed it in examples and vignettes and demos.
e.g.: Vignette
old <- options(digits = 3)
...
options(old)
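The same pattern applies to graphical parameters, e.g. (a sketch):

old_par <- par(mar = c(4, 4, 1, 1))
plot(1:10)
par(old_par)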

get_inat_obs_project(), error in downloading data from iNaturalist collection and umbrella projects

I am encountering an error when using the get_inat_obs_project() function to download data from iNaturalist Collection and Umbrella projects. get_inat_obs_project() works fine for Traditional projects. It seems this bug is related to the new project types and how the data are formatted differently in the iNaturalist API. This is the error I got: Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input.

See example code below:

rm(list= ls())
library(rinat)
###Traditional Projects, can read in info and download data####
df <- get_inat_obs_project("slime", type = "info", raw = F)
df <- get_inat_obs_project("slime", type = "observations", raw = F)

df <- get_inat_obs_project("rascals", type = "info", raw = F)
df <- get_inat_obs_project("rascals", type = "observations", raw = F)

###Umbrella Projects, can read in info, but cannot download data####
df <- get_inat_obs_project("city-nature-challenge-2018", type = "info", raw = F)
df <- get_inat_obs_project("city-nature-challenge-2018", type = "observations", raw = F)

df <- get_inat_obs_project("city-nature-challenge-2019", type = "info", raw = F)
df <- get_inat_obs_project("city-nature-challenge-2019", type = "observations", raw = F)

###Collection Projects, can read in info, but cannot download data####
df <- get_inat_obs_project("global-amphibian-bioblitz", type = "info", raw = F)
df <- get_inat_obs_project("global-amphibian-bioblitz", type = "observations", raw = F)

df <- get_inat_obs_project("ucla-campus-biodiversity", type = "info", raw = F)
df <- get_inat_obs_project("ucla-campus-biodiversity", type = "observations", raw = F)

Push first version to CRAN

@emhart So that we can get spocc pushed soon to CRAN, do you think we can get rinat pushed to CRAN soon? It doesn't have to be perfect. I set a date for milestone v0.1 - do you think we can get it in by then?

untrappable error in get_inat_obs()

I'm trying to pull research grade observations in and around National Park units, looping across the units. When I get to Big South Fork in Kentucky & Tennessee, I get an error that I cannot protect myself from via try():

DL <- try(get_inat_obs(quality="research", 
                       bounds=c(29.93593, -95.03580,  30.96098, -93.85120), 
                       maxresults=100000, meta=FALSE))

The error message is:

Error in data$status_code : object of type 'closure' is not subsettable
In addition: Warning message:
In inat_handle(data) :
  Conent type incorrect, should be 'text/csv; charset=utf-8'

I think the issue is that there are 10854 results available (determined by maxresults = 10, meta = TRUE, which works). In limited testing, calls with maxresults up to 10000 work.

Lines 128:131 test against 100000 results with bounds. Might iNaturalist have lowered that limit?

Without carefully looking, I suspect line 168 is a holdover from earlier versions:

warning(sprintf("Error: HTTP Status %s", data$status_code))

as all other occurrences seem to be x$status_code
Line 164 appears to have the typo "Conent" instead of "Content"


I tested with my proposed edits to lines 164 and 168 and the error message now is

DL <- try(get_inat_obs(quality="research",bounds=c(29.93593, -95.03580,  30.96098, -93.85120), maxresults=100000, meta=TRUE))
Error in textConnection(data) : invalid 'text' argument
In addition: Warning messages:
1: In inat_handle(data) :
  Content type incorrect, should be 'text/csv; charset=utf-8'
2: In inat_handle(data) : Error: HTTP Status 404

So a simple pull request for those 2 edits doesn't solve my problem. Is the 404 because iNaturalist reduced the limit of results using bounds from 100000 to 10000?
And, I still can't trap that error via try().

get_inat_obs_project: "Not found"

get_inat_obs_project() does not return any results, even in the development version of rinat.

get_inat_obs_project(354, type = "observations")

gives

1362  Records
0-200-400-600-800-1000-1200-1400[1] "Not found"

Faulty "too many results" error

When I run:

di = rinat::get_inat_obs(taxon_name="Frageria chiloensis", maxresults=10, quality='research')

I get the following error.

Error in rinat::get_inat_obs(taxon_name = "Frageria chiloensis", maxresults = 10,  : 
  Your search returned too many results, please consider breaking it up into smaller chunks by year or month

I can get other queries to work for many more records, but no matter what I set for maxresults with Frageria chiloensis as the taxon_name search term, I get this error.

use `world` instead of `usa` for default basemap

The current default value for the map argument is usa in function inat_map(), which is a very US-centric choice and creates rubbish maps in many cases when the data is not located in or close to the US.

We should use the world basemap, also provided by the maps package, and mention the other basemaps available by default (france, italy, nz and usa).

This is a breaking change as it would produce different maps in existing scripts using the default argument value (but would not produce an error or warning).

CRAN submission feedback: convert printed text to message() / warning() / others where relevant

From CRAN submission process feedback on 2020-09-17:

You write information messages to the console that cannot be easily suppressed.
It is more R like to generate objects that can be used to extract the information a user is interested in, and then print() that object.
Instead of print()/cat() rather use message()/warning() or if(verbose)cat(..) (or maybe stop()) if you really have to write text to the console. (except for print, summary, interactive functions)
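A sketch of the suggested replacement pattern:

i <- 3  # example page counter
# instead of
cat("Downloading page ", i, "\n")
# prefer
message("Downloading page ", i)  # users can silence this with suppressMessages()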

Error in get_inat_obs : unused argument place_id

Hi all,
I'm using the function rinat::get_inat_obs().
Reading the doc the parameters place_id is supported, but when I launch:

get_inat_obs( query = "Lacerta Bilineata"
            , year = "2020"
            , maxresults = 10
            , place_id = 8670) 

I got:

Error in get_inat_obs(query = "Lacerta Bilineata", year = "2020", maxresults = 10, : unused argument (place_id = 8670)

EDIT:
I am using rinat_0.1.5 and R 4.0.4

Am I doing wrong something?

Thanks

get_inat_obs_project has no way to filter data when you reach 10000 limit.

I use get_inat_obs_project() to get iNat records because it has built-in taxon, area and other filters already set up. The problem is that my project has reached over 10K records, which the API limits, and as far as I can tell get_inat_obs_project() has no means to filter the data.

Conversely, get_inat_obs() has those tools, but for some reason you cannot use project_id as an input.

Any suggestions on how to work around this?

Query annotations as in iNaturalist Search URL's

I would like to be able to use the rinat package to search for observations of a given species that have the annotation value "Fruiting".

The iNaturalist export tool (https://forum.inaturalist.org/t/how-to-use-inaturalists-search-urls-wiki-part-2/18792) allows you to pass the following additional terms into the query to achieve this: "&term_id=12" and "&term_value_id=14", which represent the "Plant Phenology" and "Fruiting" terms respectively.
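For reference, a sketch of passing those two parameters straight to the public iNaturalist v1 API with httr (outside rinat; the taxon name is a placeholder, and the parameter behaviour is assumed from the linked wiki):

library(httr)
library(jsonlite)
res <- GET("https://api.inaturalist.org/v1/observations",
           query = list(taxon_name = "Fragaria chiloensis",  # placeholder taxon
                        term_id = 12,                        # "Plant Phenology"
                        term_value_id = 14))                 # "Fruiting"
fruiting <- fromJSON(content(res, as = "text"))$results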

I was wondering whether this is something that can be achieved in the rinat package?

Thanks for all your work on the package, it's a great resource!

Different executing time of get_inat_obs_user

Hello,
I found out that the time needed to download per-user observations differs greatly between users. Do you know the reason? Is it due to different geographic storage of the data, or is it restricted by the number of API requests (e.g. the issue here)?
Is there any workaround to access a single user's observations faster?

Below are a few tests; why does it take so long for friel with only 300?

Time to download 300 observations of the user friel
> system.time({ table <- rinat::get_inat_obs_user(username = "friel", maxresults = 300) })
user system elapsed
3.92 0.07 94.66

300 observations for zdekanovkov
> system.time({ table <- rinat::get_inat_obs_user(username = "zdekanovkov", maxresults = 300) })
user system elapsed
0.7 0.0 14.8

1500 observations for zdekanovkov - still shorter than for friel
> system.time({ table <- rinat::get_inat_obs_user(username = "zdekanovkov", maxresults = 1500) })
user system elapsed
0.71 0.00 15.67

300 observations for finatic, who has a respectable ~50 000 observations in total (no. 3 user overall; note the time, which is comparable to the first result)
> system.time({ table <- rinat::get_inat_obs_user(username = "finatic", maxresults = 300) })
user system elapsed
4.17 0.08 109.93

get_inat_obs_user error with certain users

Update: This error happens with users with many (>10K) records, though some have <20K, so I don't know that this is a repeat of issue #14.

I get an error with get_inat_obs_user with some users, for example:

temp = get_inat_obs_user("anudibranchmom",
maxresults = 100000)
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match

These are some other users for which I get the same error:
charlie
annikaml
berkshirenaturalist
rcurtis
erikamitchell

Thanks!

Allow bounds to be a spatial object

The get_inat_obs function accepts a bounds argument where the user can specify a bounding box within which to search for observations.

I wonder if it would be possible to allow this to be a spatial object, say a simple feature (sf)? I guess that you often have some kind of spatial object (country or state boundary for example) that you'd use for visualization after having downloaded the observations.
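In the meantime, a caller-side sketch of deriving the bounds vector from an sf object (my_polygon is a placeholder; sf is assumed to be installed):

bb <- sf::st_bbox(my_polygon)
bounds <- unname(c(bb["ymin"], bb["xmin"], bb["ymax"], bb["xmax"]))  # swlat, swlng, nelat, nelng
obs <- get_inat_obs(taxon_name = "Ambystoma maculatum", bounds = bounds)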

The downside would be that it adds sf as a dependency, but on the other hand if sf were imported, then inat_map could be modified to use geom_sf.

Just a suggestion, please feel free to close this issue if you think it is a bad idea. I'm happy to take a shot at a PR implementing it if you'd like.

maxresults

The documentation of the get_inat_obs function does not say what the maximum value of maxresults is. Could you add it? 😸

Mismatch in records downloaded by get_inat_obs_user

The number of records downloaded with get_inat_obs_user can differ from the number of records displayed for a user by iNaturalist.

For example, username = "christiangross4" retrieves 390 records, but www.inaturalist.org shows 430 observations for this user.

Where does the mismatch come from? Is there an additional filter? If so, it would be great to indicate this in the documentation.

Thanks!

get_inat_obs() returns unrelated observations with taxon= some invalid synonyms

With rinat_0.1.4.99, get_inat_obs() using taxon= sometimes returns completely unrelated records. So far it appears that this happens with taxonomic names that are correct (in ITIS) but currently invalid. Note that submitting an accepted name can result in simpleError Your search returned zero results.
Vespertilio linereus:
http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=946985

minimum reproducible example (although different values for maxresults give me different unrelated results):

library(rinat)
oops1 <- get_inat_obs(taxon="Vespertilio linereus", maxresults=10, quality='research')
str(oops1)
'data.frame': 10 obs. of 33 variables:
$ scientific_name : chr "Myiopsitta monachus" "Columbina inca" "Junonia coenia" "Columbina inca" ...
$ datetime : chr "2015-10-02 15:32:27 -0500" "2015-10-01 16:32:41 -0500" "2010-08-31 00:00:00 -0500" "2015-10-01 16:32:33 -0500" ...
$ description : chr "" "" "" "" ...
$ place_guess : chr "" "Paso de Ovejas, Veracruz, México" "Bedford Avenue, Raleigh NC" "Paso de Ovejas, Veracruz, México" ...
$ latitude : num 41.8 19.3 35.8 19.3 36 ...
$ longitude : num -87.7 -96.4 -78.7 -96.4 -75.6 ...
$ tag_list : chr "" "" "" "" ...
$ common_name : chr "Monk Parakeet" "Inca Dove" "Common Buckeye" "Inca Dove" ...
$ url : chr "http://www.inaturalist.org/observations/2034475" "http://www.inaturalist.org/observations/2034451" "http://www.inaturalist.org/observations/2034445" "http://www.inaturalist.org/observations/2034439" ...
$ image_url : chr "http://static.inaturalist.org/photos/2466260/medium.?1443818029" "http://static.inaturalist.org/photos/2466219/medium.JPG?1443817153" "http://static.inaturalist.org/photos/2466210/medium.JPG?1443816918" "http://static.inaturalist.org/photos/2466202/medium.JPG?1443816858" ...
$ user_login : chr "elfaulkner" "aureliomolinahdz" "coatlicue" "aureliomolinahdz" ...
$ id : int 2034475 2034451 2034445 2034439 2034432 2034436 2034401 2034395 2034381 2034342
$ species_guess : chr "Monk parakeet" "Tórtola cola larga" "Common Buckeye" "Tórtola cola larga" ...
$ iconic_taxon_name : chr "Aves" "Aves" "Insecta" "Aves" ...
$ taxon_id : int 19349 3544 48505 3544 48505 424575 366731 67731 67731 4956
$ id_please : chr "false" "true" "false" "true" ...
$ num_identification_agreements : int 1 1 1 1 1 1 1 1 1 1
$ num_identification_disagreements: int 0 0 0 0 0 0 0 0 0 0
$ observed_on_string : chr "2015-10-02 3:32:27 PM CDT" "2015-10-01 16:32:41" "2010-08-31" "2015-10-01 16:32:33" ...
$ observed_on : chr "2015-10-02" "2015-10-01" "2010-08-31" "2015-10-01" ...
$ time_observed_at : chr "2015-10-03 09:32:27 +1300" "2015-10-02 10:32:41 +1300" "" "2015-10-02 10:32:33 +1300" ...
$ time_zone : chr "Central Time (US & Canada)" "Central Time (US & Canada)" "Eastern Time (US & Canada)" "Central Time (US & Canada)" ...
$ positional_accuracy : int 8 NA 805 NA 192 NA 30 23 NA 52
$ geoprivacy : chr "" "" "" "" ...
$ positioning_method : chr "gps" "" "" "" ...
$ positioning_device : chr "gps" "" "" "" ...
$ out_of_range : chr "false" "false" "" "false" ...
$ user_id : int 19145 43120 62388 43120 62388 135242 56928 118621 62971 62971
$ created_at : chr "2015-10-03 09:33:33 +1300" "2015-10-03 09:19:07 +1300" "2015-10-03 09:14:32 +1300" "2015-10-03 09:14:12 +1300" ...
$ updated_at : chr "2015-10-03 09:39:15 +1300" "2015-10-03 09:30:46 +1300" "2015-10-03 09:30:26 +1300" "2015-10-03 09:24:20 +1300" ...
$ quality_grade : chr "research" "research" "research" "research" ...
$ license : chr "CC-BY-NC" "CC-BY-NC" "CC-BY-NC" "CC-BY-NC" ...
$ oauth_application_id : int NA NA NA NA NA NA NA NA NA NA

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rinat_0.1.4.99

loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 digest_0.6.8 MASS_7.3-43 R6_2.1.1 grid_3.2.2 plyr_1.8.3
[7] jsonlite_0.9.17 gtable_0.1.2 magrittr_1.5 scales_0.3.0 httr_1.0.0 ggplot2_1.0.1
[13] stringi_0.5-5 curl_0.9.3 reshape2_1.4.1 fortunes_1.5-2 proto_0.3-10 tools_3.2.2
[19] stringr_1.0.0 munsell_0.4.2 maps_2.3-11 colorspace_1.2-6

My use case is expanding my taxon names to all synonyms and children (subspecies), then submitting to rinat::get_inat_obs() as well as rgbif::get_occ() and rbison::bison(). Given that GBIF, BISON, and iNaturalist sometimes disagree on accepted vs invalid taxa, and don't quite return all observations listed for subspecies when queried at the species level, I need to do this and discard duplicate records to get complete observation holdings.

[Bug] rbind error in get_inat_obs_project()

Followup from this issue #28
While downloading and reading from the table works with the current development build, there is still an rbind error at the end.

library(rinat)
obs <- get_inat_obs_project("flowering-plants-of-ethiopia", type = "observations", raw = T)

1170 Records
0-200-400-600-800-1000-1200Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match

Error in get_inat_obs: Your search returned too many results

Hi! I am looking to use iNaturalist data to search for observations in specific areas, but I keep getting the following error message:
Your search returned too many results, please consider breaking it up into smaller chunks by year or month.
Is there any way to avoid this? Even when I set maxresults = 1000 or even 1, I still get this error.

Pull in from guides?

iNaturalist now has guides. For example, Plants of Mueller State Park. There doesn't seem to be an API for them or a way to get them from rinat. Not sure if I'm missing something, but if not, this could be a useful feature.

Error from inat_handle()

In trying to download all iNat data from Denmark in recent years I encounter an error from get_inat_obs() when I get to 2018. The reprex below takes a while to run - sorry.

library(rinat)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang

dk_bounds <- c(54.559029,  8.076389, 57.751526, 15.193056)

get_inat_obs(bounds = dk_bounds,
               maxresults = 14000,
               year = 2017) -> dk_inat_2017
get_inat_obs(bounds = dk_bounds,
             maxresults = 14000,
             year = 2018) -> dk_inat_2018
#> Warning in inat_handle(data): Content type incorrect, should be 'text/csv;
#> charset=utf-8'
#> Error: object of type 'closure' is not subsettable

Created on 2019-06-03 by the reprex package (v0.3.0)

Right, so the error I get is actually from inat_handle() which is being called from get_inat_obs().
The error object of type 'closure' is not subsettable is due to a bug in one of the warning calls in inat_handle(), also reported in #22. If we ignore that (I still get the error) and strip down get_inat_obs() to only the bits that are needed to make the call to inat_handle(), we have this:

  # Defined in get_inat_obs() line 160-175, but here with the 
  # bug in the warning call fixed:
  inat_handle <- function(x){
    res <- httr::content(x, as = "text")
    if(!x$headers$`content-type` == 'text/csv; charset=utf-8' || x$status_code > 202 || nchar(res)==0 ){
      if(!x$headers$`content-type` == 'text/csv; charset=utf-8'){
        warning("Conent type incorrect, should be 'text/csv; charset=utf-8'")
        NA
      }
      if(x$status_code > 202){
        warning(sprintf("Error: HTTP Status %s", x$status_code))
        NA
      }
      if(nchar(res)==0){
        warning("No data found")
        NA
      }
    } else { res }
  }
  
  
  # Set the arguments for get_inat_handle()
  year = 2018
  bounds =  c(54.559029,  8.076389, 57.751526, 15.193056)
  maxresults = 14000
  
  # This is the stripped down get_inat_obs() until the point where inat_handle() is called
  ## Parsing and error-handling of input strings
  search <- ""
  search <- paste(search,"&year=",year,sep="")
  search <- paste(search,"&swlat=",bounds[1],"&swlng=",bounds[2],"&nelat=",bounds[3],"&nelng=",bounds[4],sep="")
  
  base_url <- "http://www.inaturalist.org/"
  q_path <- "observations.csv"
 
# The bits below are part of a for loop (lines 141-146).
# Up until i = 51 everything goes fine:
  i <- 50
  page_query <- paste(search,"&per_page=200&page=",i,sep="")
  data <-  httr::GET(base_url,path = q_path, query = page_query)
  data <- inat_handle(data)
  class(data)
#> [1] "character"
# But then things go wrong:
  i <- 51
  page_query <- paste(search,"&per_page=200&page=",i,sep="")
  data <-  httr::GET(base_url,path = q_path, query = page_query)
  class(data)
#> [1] "response"
  data
#> Response [https://www.inaturalist.org/observations.csv?&year=2018&swlat=54.559029&swlng=8.076389&nelat=57.751526&nelng=15.193056&per_page=200&page=51]
#>   Date: 2019-06-03 21:17
#>   Status: 404
#>   Content-Type: text/html; charset=utf-8
#>   Size: 11 kB
#> <!DOCTYPE html>
#> <html xmlns="http://www.w3.org/1999/xhtml"
#>     xml:lang="en"
#>     lang="en"
#>     xmlns:fb="http://www.facebook.com/2008/fbml"
#>     xmlns:og="http://ogp.me/ns#">
#>   <head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# inaturali...
#>     <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
#>     <meta http-equiv="Content-Language" content="en">
#>     <title>404 Not Found
#> ...
  data <- inat_handle(data)
#> Warning in inat_handle(data): Conent type incorrect, should be 'text/csv;
#> charset=utf-8'
#> Warning in inat_handle(data): Error: HTTP Status 404

Created on 2019-06-03 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS Mojave 10.14.5        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Copenhagen           
#>  date     2019-06-03                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.4   2019-04-10 [1] CRAN (R 3.6.0)
#>  callr         3.2.0   2019-03-15 [1] CRAN (R 3.6.0)
#>  cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
#>  curl          3.3     2019-01-10 [1] CRAN (R 3.6.0)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.0.2   2019-04-08 [1] CRAN (R 3.6.0)
#>  digest        0.6.19  2019-05-20 [1] CRAN (R 3.6.0)
#>  evaluate      0.13    2019-02-12 [1] CRAN (R 3.6.0)
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
#>  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.6.0)
#>  httr          1.4.0   2018-12-11 [1] CRAN (R 3.6.0)
#>  knitr         1.23    2019-05-18 [1] CRAN (R 3.6.0)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.3   2019-03-20 [1] CRAN (R 3.6.0)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.3.1   2019-05-08 [1] CRAN (R 3.6.0)
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.0   2019-02-14 [1] CRAN (R 3.6.0)
#>  Rcpp          1.0.1   2019-03-17 [1] CRAN (R 3.6.0)
#>  remotes       2.0.4   2019-04-10 [1] CRAN (R 3.6.0)
#>  rlang         0.3.4   2019-04-07 [1] CRAN (R 3.6.0)
#>  rmarkdown     1.13    2019-05-22 [1] CRAN (R 3.6.0)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.1.1   2019-04-23 [1] CRAN (R 3.6.0)
#>  usethis       1.5.0   2019-04-07 [1] CRAN (R 3.6.0)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.7     2019-05-14 [1] CRAN (R 3.6.0)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

So it appears that when we get to result 10000+ (200 results per page times 50), something is not working out. Looks like the API returns a 404, but I can't see exactly why and I am not sure how to debug further. Any suggestions?

EDIT: After posting I noticed that #22 is pretty much the same issue - sorry, I should probably have commented there instead. I've updated this issue with one of the bug fixes suggested by @tphilippi. I'll leave this issue open as it looks like the problem arose in #22 when getting past 100000 observations, while I already see it at 10000, so it might not be 100% identical after all. Please feel free to close this issue if it is considered a duplicate.

Cheers,
Lars

Different results between get_inat_obs() and iNaturalist explore page

I am trying to download observations for research areas. I found that the results of get_inat_obs() differ from the results of the iNaturalist Explore page search.
Take an island (Yakushima Island in Japan) as an example. If we use the function, we get the following results: there are 32 observations covering 31 species in 2022, and the numbers become 8 and 8 when we only keep the research-grade data.

# Get all the observation data in 2022. 
obs_yakushima_2022 <- get_inat_obs(
  query = "Yakushima, Kagoshima, Japan", year = 2022, maxresults = 10000
)
nrow(obs_yakushima_2022)
# 32
length(unique(obs_yakushima_2022$scientific_name))
# 31

# If we only want the research grade data. 
obs_yakushima_2022_research <- get_inat_obs(
  query = "Yakushima, Kagoshima, Japan", year = 2022, quality = "research", 
  maxresults = 10000
)
nrow(obs_yakushima_2022_research)
# 8
length(unique(obs_yakushima_2022_research$scientific_name))
# 8

However, if we do the same query on the iNaturalist Explore page, we get 163 observations covering 118 species in 2022. If we filter to keep only the research-grade data, we have 73 observations covering 55 species.
[Screenshots of the iNaturalist Explore page showing these numbers.]

Is there any way to get the observation data shown on the iNaturalist Explore page?
