limaraf / plantr
An R Package for Managing Species Records from Biological Collections
License: GNU General Public License v3.0
plantR
default world map (country level) @LimaRAF
default LatAm map (county level) @LimaRAF @saramortara @AndreaSanchezTapia
Hi Sara,
I am finishing formatOcc() and looking for the point where you found the vignette error. I noticed a few things about the change from fixFields() to formatDwc():
Some of the names that formatDwc() returns begin with "extensions.http...rs.gbif.org.terms.1.0.Multimedia.http...purl.org.dc.terms."
formatOcc() is running and, since it no longer uses tdwgNames, I could no longer reproduce the error you sent by email. Please pull and check whether you still get the same error.
I get this error when running the command below from the package vignette:
occs <- validateCoord(occs)
Error in `$<-.data.frame`(`*tmp*`, "tmp.order", value = 1:0) :
replacement has 2 rows, data has 0
I have an error running the example from the tutorial:
library("plantR")
spp <- c("Casearia sylvestris",
"Euterpe edulis",
"Trema micrantha")
occs_gbif <- rgbif2(species = spp,
basisOfRecord = "PRESERVED_SPECIMEN",
remove_na = FALSE, limit = 500000)
This gives me an error
Making request to GBIF...
Making request to GBIF...
Making request to GBIF...
Error in names(gbif_data) <- species :
attribut 'names' [3] must be of same length than vector [2]
I run under:
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
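One plausible way to reach this message, offered as an assumption and not as a confirmed diagnosis of rgbif2(): results are collected in a list, a species with no records comes back as NULL, and assigning NULL with `[[<-` silently skips that element, so the list ends up shorter than `species`. The sketch below uses a hypothetical `fetch_one()` stand-in instead of a real GBIF request.

```r
spp <- c("Casearia sylvestris", "Euterpe edulis", "Trema micrantha")

# Hypothetical stand-in for one GBIF request: NULL means "no records found"
fetch_one <- function(sp) {
  if (sp == "Trema micrantha") NULL else data.frame(species = sp)
}

# Fragile pattern: assigning NULL with [[<- does not create the element,
# so only 2 elements exist for 3 species
fragile <- list()
for (i in seq_along(spp)) fragile[[i]] <- fetch_one(spp[i])

# Naming the short list reproduces the same kind of length-mismatch error
try_names <- tryCatch({
  names(fragile) <- spp
  "ok"
}, error = function(e) conditionMessage(e))

# More robust pattern: lapply() keeps NULL results, so names stay aligned;
# empty results are dropped only after naming
robust <- lapply(spp, fetch_one)
names(robust) <- spp
robust <- Filter(Negate(is.null), robust)
names(robust)
```

With this pattern the species that returned nothing simply disappears from the result, and the remaining elements keep the right names.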
Trace back which function causes the problem below:
plantR::prepName("AFA Lira")
"Lira, A.F.A." # ok
plantR::prepName("AF Araujo Lira")
"Lira, A.A." # not ok
plantR::prepName("Ferreira dos Santos, André")
"Ferreira Dos, A." # not ok
Hi Renato,
I am getting the following error when I try to use the validateDup function:
Error in x1$col.year[ids] <- as.character(sapply(strsplit(x1$col.year[ids], :
NAs are not allowed in subscripted assignments
I found the part of the data frame where the error occurs (here: SpeciesOccurrences.csv), but I cannot understand why it occurs.
Could you please help me to solve this error?
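For context, base R raises this message whenever a subscript containing NA is used to assign more than one value. The sketch below reproduces it on a toy vector; `col.year` and `ids` echo the names in the error, but the data are made up and this is not the code inside validateDup().

```r
col.year <- c("1998", "2001-2002", NA, "1987-1988")

# A logical index computed from a column with missing values contains NA
ids <- c(FALSE, TRUE, NA, TRUE)

# Assigning several values through an index with NA fails
failed <- inherits(
  tryCatch(col.year[ids] <- c("2001", "1987"), error = function(e) e),
  "error"
)

# which() drops the NA positions, so the assignment becomes safe
keep <- which(ids)
fixed <- col.year
fixed[keep] <- vapply(strsplit(fixed[keep], "-"),
                      function(y) y[1], character(1))
fixed
```

Checking the rows of the problematic file for missing values in the collection year (or in whatever column feeds the index) is a reasonable first step.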
This is a question I don't want to forget: I see that NAs are being replaced everywhere with "NA_character_" but I haven't reached the point where I understand why.
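One common reason for preferring `NA_character_`, given here as a general R note and not as a statement of the authors' intent: a bare `NA` is logical, so it can change the type of a column or fail type-strict helpers, while `NA_character_` keeps a text column as character.

```r
typeof(NA)             # logical
typeof(NA_character_)  # character

# A column filled with bare NA is logical, not character
col_lgl <- rep(NA, 3)
col_chr <- rep(NA_character_, 3)
is.character(col_lgl)
is.character(col_chr)

# Type-strict helpers such as vapply() reject the logical NA
strict_fails <- inherits(
  tryCatch(vapply(list("a", NA), identity, character(1)),
           error = function(e) e),
  "error"
)
strict_ok <- vapply(list("a", NA_character_), identity, character(1))
```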
formatFamily: step currently done using the 'flora' and taxonStand packages
Returning " J\xFAnior" makes it parse as "J<fa>nior".
All Unicode escape codes begin with \u, in this case "\u00fa".
But most important: we should not be returning non-ASCII characters; if anything it should be nice and plain "Junior". Update: I was wrong, it could be Júnior. The only thing is that Júnior applies only to pt-br names, it is not universal.
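A minimal sketch of the two options discussed above, assuming the suffix is stored as a UTF-8 string: write the accented form with a `\u` escape, or fall back to plain ASCII. The helper `has_non_ascii()` is hypothetical and not part of plantR.

```r
# Portable way to write the accented form in source code
junior <- "J\u00fanior"

# Hypothetical helper: iconv() returns NA when a string cannot be
# represented in ASCII, which flags non-ASCII content
has_non_ascii <- function(x) is.na(iconv(x, from = "UTF-8", to = "ASCII"))

# Plain ASCII fallback for this one character
ascii <- gsub("\u00fa", "u", junior, fixed = TRUE)

has_non_ascii(junior)
has_non_ascii(ascii)
```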
Fields gbif and splink do not match the current output from their sources. Given that we are only dealing with gbif and speciesLink data, I suggest we keep only those data sources. We should have two other columns: dwc with the current DarwinCore standard and a logical column required to specify whether the field is required in the data cleaning procedure. Suggestion: file formarField.csv (and we should keep the script we used to generate it either here or in a different repo) containing the columns: gbif, splink, dwc, and required.
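A minimal sketch of what such a lookup table could look like, assuming one row per field. The column layout follows the suggestion above; the field names in the rows are illustrative placeholders and have not been checked against the current GBIF or speciesLink output.

```r
# Illustrative rows only: the source-specific names are placeholders
fields <- data.frame(
  gbif     = c("recordedBy", "decimalLatitude", "rightsHolder"),
  splink   = c("collector", "latitude", NA),
  dwc      = c("recordedBy", "decimalLatitude", "rightsHolder"),
  required = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# DarwinCore names that the cleaning procedure would insist on
required_dwc <- fields$dwc[fields$required]

# Renaming map from one source to DarwinCore, skipping unmatched fields
splink_map <- setNames(fields$dwc, fields$splink)[!is.na(fields$splink)]
```

Keeping the generating script next to the CSV, as suggested, would make it easy to refresh the table when a source changes its field names.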
Hi everyone,
I am running the plantR standardization and validation functions on occurrence data for the species of the tribe Paullinieae, and errors came up in the functions formatTax() and validateCoord().
I left the script with the errors below. The script downloads all occurrence records for the six genera of the tribe. This takes a while (a little over 200 thousand records), but that way the errors are more certain to show up.
formatTax() is the less problematic function because we already have a list of standardized species names and no longer need to run it. My goal is only to report the error the function gives with this dataset.
The error that appears is: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 182344, 188400
For validateCoord(), the error varies across different tests (please see the end of the script).
Thank you very much once again!
### Clean the environment ####
rm(list=ls())
### Installation ####
if(!require("remotes")) install.packages("remotes")
if(!require("plantR")) remotes::install_github("LimaRAF/plantR")
if(!require("BIEN")) install.packages("BIEN")
if(!require("rgbif")) install.packages("rgbif")
if(!require("stringr")) install.packages("stringr")
### Download occurrence data ####
#### Download occurrence data from BIEN ####
occ.BIEN <- BIEN_occurrence_genus(genus = c("Cardiospermum",
"Lophostigma",
"Paullinia",
"Serjania",
"Thinouia",
"Urvillea"),
cultivated = F,
only.new.world = F,
all.taxonomy = F,
native.status = F,
natives.only = T,
observation.type = T,
political.boundaries = T,
collection.info = T)
dim(occ.BIEN) # 33415 24
## Standardization of character encoding
for (i in 1:ncol(occ.BIEN)){
if(is.character(occ.BIEN[,i])){
Encoding(occ.BIEN[,i]) <- "UTF-8"
}
}
### Download occurrence data from speciesLink ####
occ.splink <- rspeciesLink(basisOfRecord = "PreservedSpecimen",
family = "Sapindaceae",
species = c("Cardiospermum",
"Lophostigma",
"Paullinia",
"Serjania",
"Thinouia",
"Urvillea"),
Scope = "plants",
Synonyms = "species2000",
MaxRecords = 300000)
dim(occ.splink) # 62291 49
## Standardization of character encoding
c.right <- c("À", "Â", "Ã", "Ä", "Å", "Æ", "Ç", "È", "É", "Ê", "Ë",
"Ì", "Î", "Ñ", "Ò", "Ó", "Ô", "Õ", "Ö", "×", "Ø",
"Ù", "Ú", "Û", "Ü", "Þ", "ß", "á", "â", "ã", "ä", "å",
"æ", "ç", "è", "é", "ê", "ë", "ì", "î", "ï", "ð", "ñ", "ò",
"ó", "ô", "õ", "ö", "÷", "ø", "ù", "ú", "û", "ü", "ý", "þ", "ÿ", "í")
c.wrong <- c("À", "Â", "Ã", "Ä", "Å", "Æ", "Ç", "È", "É",
"Ê", "Ë", "Ì", "Î", "Ñ", "Ò", "Ó", "Ô",
"Õ", "Ö", "×", "Ø", "Ù", "Ú", "Û", "Ü", "Þ", "ß",
"á", "â", "ã", "ä", "å", "æ", "ç", "è", "é", "ê",
"ë", "ì", "î", "ï", "ð", "ñ", "ò", "ó", "ô", "õ",
"ö", "÷", "ø", "ù", "ú", "û", "ü", "ý", "þ", "ÿ", "Ã.")
for (i in 1:ncol(occ.splink)){
if(is.character(occ.splink[,i])){
Encoding(occ.splink[,i]) <- "UTF-8"
for(j in 1:length(c.right)){
occ.splink[,i] <- str_replace_all(occ.splink[,i],
pattern = c.wrong[j],
replacement = c.right[j])
}
}
}
### Download occurrence data from GBIF ####
occ.gbif <- rgbif2(dir = "data/plantR",
filename = "output.gbif",
species = c("Thinouia Triana & Planch.",
"Lophostigma Radlk.",
"Cardiospermum L.",
"Paullinia L.",
"Serjania Mill.",
"Urvillea Kunth"),
n.records = 300000,
force = T,
basisOfRecord = "PRESERVED_SPECIMEN")
dim(occ.gbif) # 134568 206
## Standardization of character encoding
for (i in 1:ncol(occ.gbif)){
if(is.character(occ.gbif[,i])){
Encoding(occ.gbif[,i]) <- "UTF-8"
}
}
### Combine different databases ####
### Formatting BIEN database before running the formatDwc() function ####
## Separate "date_collected" into year, month and day
occ.BIEN[,25] <- sapply(strsplit(as.character(occ.BIEN$date_collected), "-"), function(x) (x[1]))
occ.BIEN[,26] <- sapply(strsplit(as.character(occ.BIEN$date_collected), "-"), function(x) (x[2]))
occ.BIEN[,27] <- sapply(strsplit(as.character(occ.BIEN$date_collected), "-"), function(x) (x[3]))
colnames(occ.BIEN)[25:27] <- c("year", "month", "day")
## Prepare other required columns
occ.BIEN[,28] <- occ.BIEN$county
occ.BIEN[,29:30] <- NA
occ.BIEN[,31] <- "Sapindaceae"
colnames(occ.BIEN)[28:31] <- c("municipality", "typeStatus", "scientificNameAuthorship", "family")
## Standardize column names
colnames(occ.BIEN)[colnames(occ.BIEN) == "collection_code"] <- "collectionCode"
colnames(occ.BIEN)[colnames(occ.BIEN) == "catalog_number"] <- "catalogNumber"
colnames(occ.BIEN)[colnames(occ.BIEN) == "record_number"] <- "recordNumber"
colnames(occ.BIEN)[colnames(occ.BIEN) == "recorded_by"] <- "recordedBy"
colnames(occ.BIEN)[colnames(occ.BIEN) == "state_province"] <- "stateProvince"
colnames(occ.BIEN)[colnames(occ.BIEN) == "latitude"] <- "decimalLatitude"
colnames(occ.BIEN)[colnames(occ.BIEN) == "longitude"] <- "decimalLongitude"
colnames(occ.BIEN)[colnames(occ.BIEN) == "identified_by"] <- "identifiedBy"
colnames(occ.BIEN)[colnames(occ.BIEN) == "date_identified"] <- "dateIdentified"
colnames(occ.BIEN)[colnames(occ.BIEN) == "scrubbed_species_binomial"] <- "scientificName"
colnames(occ.BIEN)[colnames(occ.BIEN) == "custodial_institution_codes"] <- "institutionCode"
colnames(occ.BIEN)[colnames(occ.BIEN) == "X.U.FEFF.scrubbed_genus"] <- "scrubbed_genus"
occ.BIEN$dateIdentified <- as.character(occ.BIEN$dateIdentified)
### Combine database using formatDwc() function ####
occs.all <- formatDwc(user_data = occ.BIEN,
splink_data = occ.splink,
gbif_data = occ.gbif,
drop = T, bind_data = T)
dim(occs.all) # 230274 47 records
### Data editing ####
#### Collection codes, people names, collector number and dates ####
## Formatting strings before running the formatOcc() function to avoid this error:
# Error in gsub(x, "", y, fixed = TRUE) : zero-length pattern
occs.all$recordNumber[which(occs.all$recordNumber == "938[=Diary No. 707]")] <- NA
occs.all$verbatimEventDate[which(occs.all$verbatimEventDate == "Sept. 4-'77")] <- NA
occs.all$verbatimEventDate[which(occs.all$verbatimEventDate == "label says \"1841/XIV\"")] <- NA
occs.all$recordedBy[which(occs.all$recordedBy == "M. Nadruz; ,J.F. Baumgratz, M. Bovini, D.S.P. Silva")] <- "M. Nadruz, J.F. Baumgratz, M. Bovini, D.S.P. Silva"
## Replacing "," by ";" to separate names of collectors and identifiers
## Case 1: "M. A. Costa, J. Ribeiro, E. C. Pereira"
## recordedBy
pos.c2semic <- which(sapply(str_locate_all(pattern = "\\.", occs.all$recordedBy), function(x) (x[1])) == 2 &
!is.na(sapply(str_locate_all(pattern = "\\,", occs.all$recordedBy), function(x) (x[1]))))
occs.all$recordedBy[pos.c2semic] <- str_replace_all(occs.all$recordedBy[pos.c2semic], "\\,", "\\;")
## identifiedBy
pos.c2semic.I <- which(sapply(str_locate_all(pattern = "\\.", occs.all$identifiedBy), function(x) (x[1])) == 2 &
!is.na(sapply(str_locate_all(pattern = "\\,", occs.all$identifiedBy), function(x) (x[1]))))
occs.all$identifiedBy[pos.c2semic.I] <- str_replace_all(occs.all$identifiedBy[pos.c2semic.I], "\\,", "\\;")
### Replacing "|" by " | "
## Case 2: "M. A. Costa|J. Ribeiro|E. C. Pereira"
occs.all$recordedBy <- str_replace_all(occs.all$recordedBy, "\\|", " | ")
occs.all$identifiedBy <- str_replace_all(occs.all$identifiedBy, "\\|", " | ")
occs.all.2 <- formatOcc(occs.all)
#### Locality information ####
occs.all.3 <- formatLoc(occs.all.2)
#### Geographical coordinates ####
occs.all.4 <- formatCoord(occs.all.3)
#### Species and family names ####
occs.all.5 <- formatTax(occs.all.4, db = "tpl")
## Error in data.frame(..., check.names = FALSE) :
# arguments imply differing number of rows: 182344, 188400
#### Locality information
occs.all.6 <- validateLoc(occs.all.4)
#### Geographical coordinates
## Test 1
occs.all.7 <- validateCoord(occs.all.6, output = "new.col")
## Error in s2_geography_from_wkb(x, oriented = oriented, check = check) :
# Evaluation error: Found 1 feature with invalid spherical geometry.
# [194] Loop 2 edge 8 crosses loop 3 edge 0.
## Test 2
sf::sf_use_s2(F)
occs.all.7 <- validateCoord(occs.all.6, output = "new.col")
## Error in `$<-.data.frame`(`*tmp*`, "geo.check", value = c("ok_county", :
# replacement has 182344 rows, data has 182346
# In addition: Warning messages:
# 1: In geo.check[is.na(geo.check)] <- `*vtmp*` :
# number of items to replace is not a multiple of replacement length
# 2: In geo.check[is.na(geo.check)] <- tmp1[is.na(geo.check)] :
# number of items to replace is not a multiple of replacement length
## Test 3
sf::sf_use_s2(F)
pos <- which(occs.all.6$recordedBy.new == "Busey, P." &
occs.all.6$recordNumber.new == "422") # These records seem to be causing the error: "Error in `$<-.data.frame`(`*tmp*`, "geo.check", value = c("ok_county",..."
occs.all.6.a <- occs.all.6[-pos,]
occs.all.7 <- validateCoord(occs.all.6.a, output = "new.col", tax.name = "scientificName")
## Error in robustbase::covMcd(df1[use_these, c("lon2", "lat2")], alpha = 1/2, :
# n == p+1 is too small sample size for MCD
The logic of missName is inverted.
missName <- function(x, # the input string
type = NULL, # whether it should be "collector" or "identificator" (should be "determiner")
noName = "Anonymous") # the output string, preset
It has many options for type ("collector","coletor","colector","identificator","identificador","determinador")
but it does not detect these strings.
For example:
missName("s/col", type = "collector")
returns "Anonymous"
(so far so good)
missName("s/col", type = "colector")
returns "Anonymous"
but:
missName("s/colector", type = "collector")
returns "s/colector"
missName("s/colector", type = "colector")
returns "s/colector"
So it is not worth having several options for type (for the package user it could be just collector or determiner); what matters is that the input string can have these variations.
We still need to see how this fits into formatOcc()
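A minimal sketch of the behaviour proposed above, assuming only two values for type and letting the input string carry the spelling variations. The function `miss_name_sketch()` and its regular expressions are hypothetical and are not the package's implementation.

```r
# Hypothetical rewrite: 'type' has two options, the patterns absorb the
# spelling variants found in the input ("s/col", "s/coletor", "s/colector", ...)
miss_name_sketch <- function(x, type = c("collector", "determiner"),
                             noName = "Anonymous") {
  type <- match.arg(type)
  pattern <- switch(type,
    collector  = "^s/\\s*col(l?ec?tor)?\\.?$",
    determiner = "^s/\\s*(det|ident)[a-z]*\\.?$"
  )
  x_clean <- tolower(trimws(x))
  # NA, empty strings and "without collector" notations count as missing
  missing <- is.na(x) | x_clean == "" | grepl(pattern, x_clean)
  x[missing] <- noName
  x
}

collectors <- miss_name_sketch(
  c("s/col", "s/colector", "s/coletor", "Lima, R.A.F.", NA),
  type = "collector"
)
collectors
```

The patterns are deliberately narrow; a real version would need the full list of notations that occur in herbarium labels.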
formatName() takes a vector, but the very first logical test is not vectorized and thus will only obey the first element of the vector.
if (!grepl("[a-z;A-Z], ", x)) {
nome <- x
}
will return
Warning message:
In if (!grepl("[a-z;A-Z], ", x)) { :
the condition has length > 1 and only the first element will be used
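A minimal sketch of a vectorized alternative, assuming the intent is to leave names without a "Surname, Initials" comma untouched. It reuses the pattern from the snippet above; the input names are made up.

```r
x <- c("Lima, R.A.F.", "Sara Mortara", "Sanchez-Tapia, A.")

# grepl() already returns one TRUE/FALSE per element
no_comma <- !grepl("[a-z;A-Z], ", x)

# Logical indexing replaces the scalar if(): only the matching
# elements are copied, the rest stay NA for later processing
nome <- rep(NA_character_, length(x))
nome[no_comma] <- x[no_comma]
nome
```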
All good, @LimaRAF? I will open separate topics for a few problems I am having.
occs <- readData(file = "0428744-210914110416597.zip", path <- "https://api.gbif.org/v1/occurrence/download/request/")
It returns the following error:
Error in if (grepl("verbatim.txt", all.files)) { :
the condition has length > 1
In addition: Warning message:
In data.table::fread(occ.path, na.strings = na.strings, quote = quote, :
Found and resolved improper quoting out-of-sample. First healed line 2040: <<1305097720 38360565-6485-48ac-8a91-10033ff89543 CC_BY_4_0 2016-03-10T11:23:54Z Centro Internacional de Agricultura Tropical - CIAT CEN OCCURRENCE 38360565-6485-48ac-8a91-10033ff89543 19495924 40723 Srgio Duarte Prat Kricun Native PRESENT "Arbol de 4m de altura, flores blanquecinas. Caa - vera"". Co Gilberti N 251.&nf;Doao Herbrio Yerba Mate y TE - EEA Cerro Azul. Cerro Azul, Misiones, Argentina.""". EMBRAPA Recursos Ge>>. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.
Here is the link to my dataset
Your download is available at the following address:
https://api.gbif.org/v1/occurrence/download/request/0428744-210914110416597.zip
However, the same also happens with the example data from ?readData
occs <- readData(file = "0227351-200613084148143.zip", path <- "https://api.gbif.org/v1/occurrence/download/request/")
Error in if (grepl("verbatim.txt", all.files)) { :
the condition has length > 1
In both cases, if I add output = 'occurrence' to the arguments, the error does not happen.
PS: something minor that is not worth opening an issue for: GBIF apparently no longer has a column titled "country" in the DwC file. If I run occs <- formatDwc(gbif_data = occs) on your example data, it returns
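The message "the condition has length > 1" means that `grepl()` returned one value per file while `if` expects a single TRUE or FALSE. A minimal sketch of the usual remedy, wrapping the test in `any()`; the file names below are illustrative and this is not the code inside readData().

```r
# Illustrative listing of files inside a downloaded archive
all.files <- c("occurrence.txt", "verbatim.txt", "multimedia.txt",
               "citations.txt")

# One logical per file
is_verbatim <- grepl("verbatim.txt", all.files, fixed = TRUE)

# any() collapses the vector to a single TRUE/FALSE that if() accepts
if (any(is_verbatim)) {
  verbatim.file <- all.files[is_verbatim]
}
```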
Warning message:
Important columns were not found in the gbif_data:
country
Thanks!
(I have another problem going on with formatTax(), but I will need to reproduce the error because I cleared the console...)
I am having problems using the rspeciesLink() function
occs_splink <- rspeciesLink(species = sp,
Scope = "plants",
basisOfRecord = "PreservedSpecimen",
Synonyms = "flora2020")
Making request to speciesLink...
Error: lexical error: invalid char in json text.
https://api.splink.org.br/recor
(right here) ------^
Hi!
I installed the latest version of plantR today and now the validateCoord() function is no longer working for my dataset.
Code:
occs.all.6 <- read.csv("output.occs.all.6.csv", encoding = "native.enc")
occs.all.7 <- validateCoord(occs.all.6, output = "new.col")
Error in s2_geography_from_wkb(x, oriented = oriented, check = check) :
Evaluation error: Found 1 feature with invalid spherical geometry.
[194] Loop 2 edge 8 crosses loop 3 edge 0.
The dataset is available at:
https://github.com/leilameyer08/plantR/blob/main/output.occs.all.6.csv
Thank you!
I am trying to use this function exactly as shown in the tutorial and I am getting an error.
I had no errors in the previous steps, only at this point, with this function.
I installed the package less than a day ago, so I believe it is up to date.
occs <- checkCoord(occs,
keep.cols = c("geo.check", "NAME_0", "country.gazet"))
Error in s2_geography_from_wkb(x, oriented = oriented, check = check) :
Evaluation error: Found 1 feature with invalid spherical geometry.
[194] Loop 2 edge 8 crosses loop 3 edge 0.
Hi @LimaRAF and everyone!
I am building a database of Melastomataceae for the Quadrilátero Ferrífero of Minas Gerais from the occurrence records of GBIF, speciesLink and BIEN. I am using plantR to clean the records and the package has been working really well! But I ran into two small problems:
1) In the identification quality check, some records without an identifier (given as s.n.) are being classified as high quality ("high"). I think these records should be in the "unknown" category;
2) The validateDup() function is returning the same error reported by WevertonBio:
Error in x1$col.year[ids] <- as.character(sapply(strsplit(x1$col.year[ids], : NAs are not allowed in subscripted assignments
This is the code I am using:
### Installation ####
# install.packages("remotes")
#require("remotes")
#install_github("LimaRAF/plantR")
require("plantR")
#devtools::install_github("bmaitner/RBIEN")
require("BIEN")
### Download occurrence data ###
#### Download occurrence data from BIEN ####
occ.BIEN <- BIEN_occurrence_state(country = "Brazil",
state = "Minas Gerais",
cultivated = F,
new.world = T,
all.taxonomy = T,
native.status = F,
natives.only = T,
observation.type = T,
political.boundaries = T,
collection.info = T)
occ.BIEN <- occ.BIEN[occ.BIEN$scrubbed_family == "Melastomataceae", ]
dim(occ.BIEN)
### Download occurrence data from speciesLink ###
occ.splink <- rspeciesLink(basisOfRecord = "PreservedSpecimen",
family = "Melastomataceae",
country = "Brazil",
stateProvince = "Minas Gerais",
Scope = "plants",
Synonyms = "flora2020",
MaxRecords = 100000)
dim(occ.splink)
### Download occurrence data from GBIF ###
occ.gbif <- rgbif2(dir = "data/raw",
filename = "output.gbif",
species = "Melastomataceae",
country = "BR",
stateProvince = "Minas Gerais",
n.records = 100000,
force = T,
basisOfRecord = "PRESERVED_SPECIMEN")
dim(occ.gbif)
### Combine database using formatDwc() function ###
occs.all <- formatDwc(splink_data = occ.splink,
gbif_data = occ.gbif,
bien_data = occ.BIEN,
fix.encoding = c("splink_data", "gbif_data", "bien_data"),
drop = T, bind_data = T)
dim(occs.all)
### Data editing ###
#### Collection codes, people names, collector number and dates ####
occs.all.2 <- formatOcc(occs.all)
dim(occs.all.2)
#### Locality information ####
occs.all.3 <- formatLoc(occs.all.2)
dim(occs.all.3)
#### Geographical coordinates ####
occs.all.4 <- formatCoord(occs.all.3)
dim(occs.all.4)
#### Species and family names ####
occs.all.5 <- formatTax(occs.all.4, db = "bfo")
dim(occs.all.5)
### Data validation ####
#### Locality information
occs.all.6 <- validateLoc(occs.all.5)
dim(occs.all.6)
#### Geographical coordinates
occs.all.7 <- validateCoord(occs.all.6, output = "new.col")
dim(occs.all.7)
#### Species taxonomy and identification
occs.all.8 <- validateTax(occs.all.7,top.det = 200,
generalist = T)
high_s.n. <- occs.all.8[which(occs.all.8$tax.check == "high" &
occs.all.8$identifiedBy.new == "s.n."),]
dim(high_s.n.) # 8535 records classified with high identification quality, but without an identifier.
#### Duplicate specimens
occs.all.9 <- validateDup(occs.all.8, merge = T, prop = 0.01,
ignore.miss = T, remove = T)
# Error in x1$col.year[ids] <- as.character(sapply(strsplit(x1$col.year[ids], :
# NAs are not allowed in subscripted assignments
Thank you very much once again!
From previous issues:
In getAdmin() you say Argentinean subdivisions are Province instead of State (OK) and Department instead of Municipality, but Department exists between the levels of Province and Municipality (Municipality exists). We should check this (for all countries, by the way).
Not sure why the function rspeciesLink is not returning the field typeStatus, which contains the info on type specimens and is key for the taxonomic validation of the specimens. I know speciesLink provides this info, but it is not in the data object returned by the function. I tried to set the argument Typus = TRUE, but then the function reported an error ("Output is empty. Check your request.").
Any ideas on how to fix it?
Cheers!
Hello everyone,
Thank you for fixing the problem with the validateCoord() function.
I reinstalled plantR, but now a new problem has appeared in formatOcc(), which is not formatting the names of collectors and identifiers correctly. Completely strange names show up in the recordedBy.new and identifiedBy.new columns.
I tested with the package example using Euterpe edulis and it works normally. The problem only appears with my data.
Code:
if(!require("plantR")) remotes::install_github("LimaRAF/plantR")
data <- 'https://raw.github.com/leilameyer08/plantR/master/occs.exemplo.csv'
data <- read.csv(data, row.names = 1, encoding = "UTF-8")
occs.all.2 <- formatOcc(data)
occs.all.2$recordedBy[1] ##"M. A. Costa, J. Ribeiro, P. A. Assunçao & E. C. Pereira"
occs.all.2$recordedBy.new[1] ## "Ackermann"
occs.all.2$identifiedBy[1] ##"Acevedo-Rodríguez, P., (BOT), Smithsonian Institution - National Museum of Natural History (UNITED STATES)"
occs.all.2$identifiedBy.new[1] ##"Sandwith, N.Y."
Thank you!
We just want to have country/state/county/municipality check columns without them being merged into one.
replaceNames.csv starts out assuming that non-ASCII characters have been stripped, for strings such as guyane francaise, perou, franca.
Further down it starts to contain non-ASCII characters, in the Brazilian states.
At the moment the line of code that would strip them is commented out, so spellings with non-ASCII characters are being let through when the dictionaries are created.
We need to either strip them here or complete the table with the non-ASCII spellings.
plantR/data-raw/00_generating_dictionaries.R
Line 137 in b0845b8
Hey,
Below is the error I am getting in formatTax()
occs <- readData(file = "0428744-210914110416597.zip",
path <- "https://api.gbif.org/v1/occurrence/download/request/",
output = 'occurrence') # using only output = 'occurrence' because of what I mentioned in the other thread
#DwC file: format field names for the following formatting steps
occs <- formatDwc(gbif_data = occs)
#Format collection codes, people names, collector number, and dates
occs <- formatOcc(occs)
#Standardize locality info (country, city names)
occs <- formatLoc(occs)
#Geographical coordinates: decimal degrees formatting and retrieves missing coordinates
#from a gazetteer
occs <- formatCoord(occs)
#Format species and family names
occs <- formatTax(occs)
It returns the following (the output below says the family names were replaced, but I imagine the error prevents that, because I checked afterwards and they were not):
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
The following family names were automatically replaced:
|Genus |Old fam. |New fam. |
|:--------------|:----------------|:----------------|
|Cordia |Cordiaceae |Boraginaceae |
|Cunninghamia |Cupressaceae |Rubiaceae |
|Dombeya |Malvaceae |Araucariaceae |
|Ehretia |Ehretiaceae |Boraginaceae |
|Euploca |Heliotropiaceae |Boraginaceae |
|Heisteria |Erythropalaceae |Olacaceae |
|Heliotropium |Heliotropiaceae |Boraginaceae |
|Hydrocotyle |Apiaceae |Araliaceae |
|Lonicera |Caprifoliaceae |Rubiaceae |
|Matourea |Plantaginaceae |Scrophulariaceae |
|Myriopus |Heliotropiaceae |Boraginaceae |
|Nephrolepis |Nephrolepidaceae |Lomariopsidaceae |
|Piriqueta |Turneraceae |Turneraceae |
|Prosopanche |Hydnoraceae |Hydnoraceae |
|Quiina |Quiinaceae |Quiinaceae |
|Sambucus |Adoxaceae |Adoxaceae |
|Tetrastylidium |Strombosiaceae |Olacaceae |
|Tournefortia |Heliotropiaceae |Boraginaceae |
|Turnera |Turneraceae |Turneraceae |
|Varronia |Cordiaceae |Boraginaceae |
|Viburnum |Adoxaceae |Adoxaceae |
|Viviania |Vivianiaceae |Vivianiaceae |
|Ximenia |Ximeniaceae |Olacaceae |
Error in `[.data.table`(families.data, is.na(name.correct), tmp.fam, FALSE) :
The items in the 'by' or 'keyby' list are length(s) (1). Each must be length 10; the same length as there are rows in x (after subsetting if i is provided).
In addition: Warning messages:
1: In gsub(paste0(" ", rank, " "), paste0(" ", rank), x_new, perl = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
2: In gsub(paste0(" ", rank, " "), paste0(" ", rank), x_new, perl = TRUE) :
argument 'replacement' has length > 1 and only the first element will be used
I am on {plantR} 0.1.5 and R 4.2.1.
Cheers!
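The two gsub() warnings in this report say that `pattern` and `replacement` received more than one value, so only the first rank was applied. A minimal sketch of one way around that, assuming `rank` holds several rank abbreviations: apply the same substitution once per rank. The contents of `ranks` and `x_new` are made up, and `fixed = TRUE` is a choice made here so that the dot is matched literally.

```r
ranks <- c("var.", "subsp.")
x_new <- c("Aa bb var. cc", "Dd ee subsp. ff", "Gg hh")

# gsub() is vectorized over x but not over pattern/replacement,
# so loop over the ranks and apply the same substitution for each
for (rk in ranks) {
  x_new <- gsub(paste0(" ", rk, " "), paste0(" ", rk), x_new, fixed = TRUE)
}
x_new
```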
Hi Renato,
I am having a recurring problem, possibly linked to issue #79.
When I try to run the validateCoord() function, I get the error:
occ <- validateCoord(df, output = "new.col")
Error in `$<-.data.frame`(`*tmp*`, "geo.check", value = c("ok_state", :
replacement has 2 rows, data has 3
Sometimes the error comes with ok_state, other times with ok_county. I imagine it is due to some projection problem, where two places are assigned to a single point. I tried adding the command sf::sf_use_s2(F) before validateCoord, but it did not solve it.
I am using version 0.1.5 of plantR and version 1.0-8 of sf.
I am sending you the rows from one of the spreadsheets that is producing this error:
ErrorValidateCoord.csv
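The message itself means that the vector being written into the geo.check column has a different length than the data frame. A minimal sketch of that mechanism and of a key-based alignment that tolerates dropped rows; the `id` column and both toy tables are hypothetical, and this is not a diagnosis of what validateCoord() does internally.

```r
# Toy input: three records
df <- data.frame(id = 1:3,
                 decimalLongitude = c(-47.1, -46.9, -48.3),
                 decimalLatitude = c(-22.8, -23.5, -21.9))

# Toy result of a spatial check that lost one record
res <- data.frame(id = c(1, 3),
                  geo.check = c("ok_state", "ok_county"),
                  stringsAsFactors = FALSE)

# Positional assignment fails because the lengths differ
msg <- tryCatch({
  df$geo.check <- res$geo.check
  NULL
}, error = function(e) conditionMessage(e))

# Aligning by key keeps every input row; unmatched rows get NA
df$geo.check <- res$geo.check[match(df$id, res$id)]
df$geo.check
```

If a point were matched to two places instead, `match()` would keep only the first match, so duplicates would need to be inspected separately.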
Need to remove the TPL database and add a different global backbone to check species synonyms
Error in TPLck(sp = d, infra = infra, corr = corr, diffchar = diffchar, :
Cannot read TPL website.
Names in NCBI/GenBank commonly have a particular notation that fixSpecies() currently does not handle well. Improvements are needed. Examples are:
And also:
without gbif data:
Error in check$species_status[id_authors & !is.na(check$species_status)] <- paste(check$species_status[id_authors & :
NAs are not allowed in subscripted assignments
In addition: Warning messages:
1: In `[<-.factor`(`*tmp*`, prob.ids, value = c("Blechnum polypodioides f. maius", :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, !prob.ids, value = c("Telmatoblechnum serrulatum", :
invalid factor level, NA generated
devtools::check()
returns:
prepare_Rd: fixField.Rd:21-23: Dropping empty section \details
prepare_Rd: fixField.Rd:24-27: Dropping empty section \examples
prepare_Rd: formatCoord.Rd:12-14: Dropping empty section \description
checkRd: (5) formatCoord.Rd:0-15: Must have a \description
prepare_Rd: formatLoc.Rd:12-14: Dropping empty section \description
checkRd: (5) formatLoc.Rd:0-15: Must have a \description
prepare_Rd: missName.Rd:35-37: Dropping empty section \references
prepare_Rd: validateCoord.Rd:12-14: Dropping empty section \description
checkRd: (5) validateCoord.Rd:0-15: Must have a \description
prepare_Rd: validateTax.Rd:12-14: Dropping empty section \description
checkRd: (5) validateTax.Rd:0-15: Must have a \description
(To be discussed)
fixFields() removes optional fields, but this should maybe be optional too, even if the default is TRUE. What if the user wants to keep the optional fields for any reason? The warnings could be saved as metadata.
colNumber() expectations and functioning are not always clear. The documentation mentions a "standardized notation"; which standard notation is this?
fixName() checks for ampersand and "e". Why doesn't it check for "and", "et", "und", "y"?
In TDWG and maybe other functions that modify names, an initial tolower() may be useful to put everything in lower case and capitalize only at the end; otherwise you have to list everything (Van, van, Der, der). I tried to make it work, but capName() does not capitalize initials correctly, so I didn't apply any change.
The separator "," in TDWGNames() should not need to have a space. ";" should do the trick. The space should be put internally. I solved this the wrong way, with a paste0.
capName() could be replaced by stringr::str_to_title(). It works identically but has the same problem as capName(): it does not capitalize initials correctly. By the way, initials are always without space; is this standard mandatory? Initials with space work fine in both functions.
stringr::str_to_title() is still faster, but I decided to keep capName() to avoid new dependencies. Marked as solved for now. Why is formatName() executed before TDWGName if it requires TDWG format? Also, it works with first names, not only initials.
In general the ideal standardized outputs should be described, because DarwinCore documentation is a nightmare (ironic).
getAdmin() has this error message: "input object needs to have a column loc.correct with the locality strings". This should be explained in the documentation. The fact that the name format must follow this "paraguay_paraguari" pattern needs to be explicit too.
Regarding documentation in general: not every function needs to be exported, or maybe similar functions can be documented in the same article, so that the reader can see their similarities and differences. I am thinking of this for all the string formatting inside formatOcc().
prepLoc(): "de la", "del", "du" are missing (lines 37-39).
prepCoord(): possible problems with the decimal separator should not happen at this point, but we need an example. I didn't check the math, but maybe using the package measurements could be a good option.
The projection and datum of the data maps is WGS84, but sf does not understand it as 4326. Not a huge problem, I think.
The decimalDegree transformation will force us to assume that everything is in the same datum (for validation). For now there will be no projection.
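On the capitalization point in the list above (lower-case everything first, capitalize at the end, and get initials right), here is a minimal base-R sketch. The function `cap_name_sketch()` is hypothetical; it only handles unaccented a-z letters and it would also capitalize particles such as "van" or "der", which may not be wanted.

```r
# Lower-case the whole string, then upper-case any letter that follows
# the start of the string, a space, a period or a hyphen
cap_name_sketch <- function(x) {
  gsub("(^|[\\s.\\-])([a-z])", "\\1\\U\\2", tolower(x), perl = TRUE)
}

capped <- cap_name_sketch(c("LIMA, r.a.f.", "sanchez-tapia, a.", "mortara, S"))
capped
```

Because periods count as boundaries, initials written without spaces ("r.a.f.") are capitalized the same way as initials with spaces.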
Cardiospermum (Sapindaceae) produced further errors in the formatDwc function, both with and without the option "drop = TRUE", which discards columns that are not congruent between the databases:
occs_splink <- rspeciesLink(filename = "Cardiospermum_teste_splink.txt", save = TRUE, basisOfRecord = 'PreservedSpecimen', species = "Cardiospermum")
occs_gbif <- rgbif2(filename = "Cardiospermum_teste_gbif.txt", species = "Cardiospermum", n.records = 110000, force = TRUE, save = TRUE)
occs <- formatDwc(splink_data = occs_splink, gbif_data = occs_gbif, drop = TRUE)
occs <- formatDwc(splink_data = occs_splink, gbif_data = occs_gbif)
Error: Can't combine `gbif$coordinatePrecision` and `speciesLink$coordinatePrecision`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: some columns in splink_data do not follow the speciesLink pattern
2: some columns in gbif_data does not follow the gbif pattern!
Afterwards, I ran this code to see more details and got:
rlang::last_error()
<error/vctrs_error_incompatible_type>
Can't combine `gbif$coordinatePrecision` and `speciesLink$coordinatePrecision`.
Backtrace:
as of 2021-02-24:
formatDwc
Error: Can't combine `gbif$dateIdentified` <datetime<UTC>> and `speciesLink$dateIdentified` <character>.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In formatDwc(splink_data = occs_splink, gbif_data = occs_gbif) :
some columns in gbif_data do not follow the gbif pattern
Called from: signal_abort(cnd)
formatOcc:
Warning message:
In last.name[gen.suf] <- gsub("\\s[A-Zà-ýÀ-Ý]\\.", "", name[gen.suf], :
number of items to replace is not a multiple of replacement length
validateCoords [this one is already done]
Error in checkBorders(x1, geo.check = "geo.check", country.shape = country.shape, :
unused arguments (geo.check = "geo.check", country.shape = country.shape, country.gazetteer = country.gazetteer, output = output)
validateTax: badly formatted names
| Identifier | Records |
|---|---|
| (mo, R.L. | 148 |
| Dias, M.C. | 141 |
| (mo, R.L.L. | 140 |
| (aau, S.L. | 112 |
validateDup
Warning message:
To merge geographic info, the input data must contain the following column(s): geo.check. Skipping 'geo' from info to merge.
Hi,
I usually email Renato directly, but this may be a more suitable place. (I wrote this in pt-br; if you prefer English for accessibility reasons, I can rewrite it later and use English from now on.)
I am having a problem with formatDwc() when I use data from rspeciesLink() and readData() (the latter reading a .zip that I downloaded directly from GBIF).
Here is the code:
#GBIF points for SC (Tracheophyta; GBIF)
plants_gbif <- readData(file = 'gbif_plants.zip', path = 'C:/Users/Master/OneDrive - FURB/Mestrado')
#Selecting only the first list
plants_gbif <- plants_gbif$occurrence
#INCT points for all SC plants
plants_inct <- rspeciesLink(Scope = "plants",
basisOfRecord = "PreservedSpecimen",
Synonyms = "flora2020",
stateProvince = "Santa Catarina")
plants_inct2 <- plants_inct
#Preparing input data in the correct format
occs <- formatDwc(gbif_data = plants_gbif,
splink_data = plants_inct2,
drop = TRUE)
This returns the following error:
Error: Can't combine `gbif$eventDate` <datetime> and `speciesLink$eventDate` <character>.
Below is the output of rlang::last_trace():
> rlang::last_trace()
<error/vctrs_error_incompatible_type>
Can't combine `gbif$eventDate` <datetime<UTC>> and `speciesLink$eventDate` <character>.
Backtrace:
x
1. \-plantR::formatDwc(...)
2. \-dplyr::bind_rows(res_list, .id = "data_source")
3. \-vctrs::vec_rbind(!!!dots, .names_to = .id)
4. \-(function () ...
5. \-vctrs::vec_default_ptype2(...)
6. \-vctrs::stop_incompatible_type(...)
7. \-vctrs:::stop_incompatible(...)
8. \-vctrs:::stop_vctrs(...)
If I do not use readData() and call rgbif2() directly instead, the error does not occur for a smaller example (I did not test with all my data because downloading every species would take a while right now, but I can set it running if needed). However, since the GBIF data came straight from the .zip via readData(), they should in theory be the same as those obtained with rgbif2(), right? Another small difference between the GBIF and speciesLink data is that from GBIF I took only vascular plants, while from speciesLink I took any kind of plant. I am not sure this information is useful, but it might help.
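The backtrace above shows dplyr::bind_rows() failing because eventDate is a date-time in one table and character in the other. The following is a minimal sketch of one possible workaround, assuming it is acceptable to store dates as text before merging. `datesToCharacter()` is a hypothetical helper, not a plantR function, and the toy data frames only stand in for the real downloads.

```r
# Hypothetical helper (not part of plantR): convert date and date-time
# columns to character so two tables can be row-bound without type conflicts.
datesToCharacter <- function(x) {
  is_date <- vapply(
    x,
    function(col) inherits(col, c("POSIXct", "POSIXlt", "Date")),
    logical(1)
  )
  if (any(is_date)) {
    x[is_date] <- lapply(x[is_date], as.character)
  }
  x
}

# Toy data standing in for the GBIF and speciesLink tables (assumption)
gbif <- data.frame(eventDate = as.POSIXct("2020-01-15 10:30:00", tz = "UTC"))
splink <- data.frame(eventDate = "2019-07-03", stringsAsFactors = FALSE)

# After the conversion both columns are character and can be combined
gbif <- datesToCharacter(gbif)
combined <- rbind(gbif, splink)
str(combined)
```

Converting to character loses nothing that the speciesLink side had, but any later date arithmetic would need the column parsed again.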
As for the columns obtained from each database, according to the functions I used above:
Columns of the GBIF data obtained with readData():
[1] "gbifID" "abstract" "accessRights"
[4] "accrualMethod" "accrualPeriodicity" "accrualPolicy"
[7] "alternative" "audience" "available"
[10] "bibliographicCitation" "conformsTo" "contributor"
[13] "coverage" "created" "creator"
[16] "date" "dateAccepted" "dateCopyrighted"
[19] "dateSubmitted" "description" "educationLevel"
[22] "extent" "format" "hasFormat"
[25] "hasPart" "hasVersion" "identifier"
[28] "instructionalMethod" "isFormatOf" "isPartOf"
[31] "isReferencedBy" "isReplacedBy" "isRequiredBy"
[34] "isVersionOf" "issued" "language"
[37] "license" "mediator" "medium"
[40] "modified" "provenance" "publisher"
[43] "references" "relation" "replaces"
[46] "requires" "rights" "rightsHolder"
[49] "source" "spatial" "subject"
[52] "tableOfContents" "temporal" "title"
[55] "type" "valid" "institutionID"
[58] "collectionID" "datasetID" "institutionCode"
[61] "collectionCode" "datasetName" "ownerInstitutionCode"
[64] "basisOfRecord" "informationWithheld" "dataGeneralizations"
[67] "dynamicProperties" "occurrenceID" "catalogNumber"
[70] "recordNumber" "recordedBy" "individualCount"
[73] "organismQuantity" "organismQuantityType" "sex"
[76] "lifeStage" "reproductiveCondition" "behavior"
[79] "establishmentMeans" "occurrenceStatus" "preparations"
[82] "disposition" "associatedReferences" "associatedSequences"
[85] "associatedTaxa" "otherCatalogNumbers" "occurrenceRemarks"
[88] "organismID" "organismName" "organismScope"
[91] "associatedOccurrences" "associatedOrganisms" "previousIdentifications"
[94] "organismRemarks" "materialSampleID" "eventID"
[97] "parentEventID" "fieldNumber" "eventDate"
[100] "eventTime" "startDayOfYear" "endDayOfYear"
[103] "year" "month" "day"
[106] "verbatimEventDate" "habitat" "samplingProtocol"
[109] "samplingEffort" "sampleSizeValue" "sampleSizeUnit"
[112] "fieldNotes" "eventRemarks" "locationID"
[115] "higherGeographyID" "higherGeography" "continent"
[118] "waterBody" "islandGroup" "island"
[121] "countryCode" "stateProvince" "county"
[124] "municipality" "locality" "verbatimLocality"
[127] "verbatimElevation" "verbatimDepth" "minimumDistanceAboveSurfaceInMeters"
[130] "maximumDistanceAboveSurfaceInMeters" "locationAccordingTo" "locationRemarks"
[133] "decimalLatitude" "decimalLongitude" "coordinateUncertaintyInMeters"
[136] "coordinatePrecision" "pointRadiusSpatialFit" "verbatimCoordinateSystem"
[139] "verbatimSRS" "footprintWKT" "footprintSRS"
[142] "footprintSpatialFit" "georeferencedBy" "georeferencedDate"
[145] "georeferenceProtocol" "georeferenceSources" "georeferenceVerificationStatus"
[148] "georeferenceRemarks" "geologicalContextID" "earliestEonOrLowestEonothem"
[151] "latestEonOrHighestEonothem" "earliestEraOrLowestErathem" "latestEraOrHighestErathem"
[154] "earliestPeriodOrLowestSystem" "latestPeriodOrHighestSystem" "earliestEpochOrLowestSeries"
[157] "latestEpochOrHighestSeries" "earliestAgeOrLowestStage" "latestAgeOrHighestStage"
[160] "lowestBiostratigraphicZone" "highestBiostratigraphicZone" "lithostratigraphicTerms"
[163] "group" "formation" "member"
[166] "bed" "identificationID" "identificationQualifier"
[169] "typeStatus" "identifiedBy" "dateIdentified"
[172] "identificationReferences" "identificationVerificationStatus" "identificationRemarks"
[175] "taxonID" "scientificNameID" "acceptedNameUsageID"
[178] "parentNameUsageID" "originalNameUsageID" "nameAccordingToID"
[181] "namePublishedInID" "taxonConceptID" "scientificName"
[184] "acceptedNameUsage" "parentNameUsage" "originalNameUsage"
[187] "nameAccordingTo" "namePublishedIn" "namePublishedInYear"
[190] "higherClassification" "kingdom" "phylum"
[193] "class" "order" "family"
[196] "genus" "subgenus" "specificEpithet"
[199] "infraspecificEpithet" "taxonRank" "verbatimTaxonRank"
[202] "vernacularName" "nomenclaturalCode" "taxonomicStatus"
[205] "nomenclaturalStatus" "taxonRemarks" "datasetKey"
[208] "publishingCountry" "lastInterpreted" "elevation"
[211] "elevationAccuracy" "depth" "depthAccuracy"
[214] "distanceAboveSurface" "distanceAboveSurfaceAccuracy" "issue"
[217] "mediaType" "hasCoordinate" "hasGeospatialIssues"
[220] "taxonKey" "acceptedTaxonKey" "kingdomKey"
[223] "phylumKey" "classKey" "orderKey"
[226] "familyKey" "genusKey" "subgenusKey"
[229] "speciesKey" "species" "genericName"
[232] "acceptedScientificName" "verbatimScientificName" "typifiedName"
[235] "protocol" "lastParsed" "lastCrawled"
[238] "repatriated" "relativeOrganismQuantity" "recordedByID"
[241] "identifiedByID" "level0Gid" "level0Name"
[244] "level1Gid" "level1Name" "level2Gid"
[247] "level2Name" "level3Gid" "level3Name"
[250] "iucnRedListCategory" "associatedMedia" "country"
[253] "minimumElevationInMeters" "maximumElevationInMeters" "minimumDepthInMeters"
[256] "maximumDepthInMeters" "geodeticDatum" "verbatimCoordinates"
[259] "verbatimLatitude" "verbatimLongitude" "scientificNameAuthorship"
Columns of the INCT data obtained with rspeciesLink():
[1] "record_id" "modified" "institutionCode" "collectionCode"
[5] "catalogNumber" "basisOfRecord" "kingdom" "family"
[9] "genus" "specificEpithet" "scientificName" "scientificNameAuthorship"
[13] "identifiedBy" "recordedBy" "year" "month"
[17] "day" "country" "stateProvince" "county"
[21] "locality" "decimalLongitude" "decimalLatitude" "verbatimLongitude"
[25] "verbatimLatitude" "minimumElevationInMeters" "occurrenceRemarks" "barcode"
[29] "imagecode" "recordNumber" "maximumElevationInMeters" "infraspecificEpithet"
[33] "typeStatus" "coordinatePrecision" "geoFlag" "phylum"
[37] "order" "yearIdentified" "monthIdentified" "individualCount"
[41] "class" "dayIdentified" "continentOcean" "preparationType"
[45] "previousCatalogNumber" "relatedCatalogItem" "fieldNumber" "minimumDepthInMeters"
[49] "maximumDepthInMeters" "sex"
Thanks in advance.
Codes from the package module on data and validation summaries and species checklist generation
summData()
summFlags()
checklist()
exportData()
While trying to see what was changed in the geographic validation workflow, I found that commit messages such as "fixing minor bugs" signal that there were bugs but do not say which bugs they were or where.
If they were not bugs (the workflow was working) but additions made for specific reasons and changes in coding style, those changes should be understandable to anyone who looks at a specific modification in a file.
When running devtools::check()
we get this warning
Found the following files with non-ASCII characters:
colNumber.R
fixName.R
getYear.R
Portable packages must use only ASCII characters in their R code,
except perhaps in comments.
Use \uxxxx escapes for other characters.
We need to check all these functions so that they do not use non-ASCII characters, and maybe use the textclean package.
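To locate the offending lines before replacing them with \uxxxx escapes, a small base R check can help. This is a minimal sketch: `hasNonASCII()` is a hypothetical helper and the example lines are invented for illustration. For whole files, tools::showNonASCIIfile() prints the lines that contain non-ASCII characters.

```r
# Hypothetical helper: flag elements of a character vector that contain
# non-ASCII characters. iconv() returns NA for strings it cannot convert.
hasNonASCII <- function(x) {
  is.na(iconv(x, from = "UTF-8", to = "ASCII"))
}

# Invented example lines; the second one holds a non-ASCII letter,
# written here with a \u escape so that this snippet itself stays ASCII
code_lines <- c(
  "x <- 1",
  "name <- \"S\u00e3o Paulo\"",
  "# plain comment"
)

flagged <- hasNonASCII(code_lines)
which(flagged)
```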
Hi, I am having a problem loading plantR.
install_github("LimaRAF/plantR")
library("plantR")
library("plantR")
Error in library("plantR") : there is no package called ‘plantR’
@saramortara and @AndreaSanchezTapia,
I have been working on something to get the full description of species (including the reference where the species was published) for the final reporting of the species included in my analyses. Since flora does not do it, I had to do it in Tropicos... But Tropicos does not suggest the valid names in case of synonyms, so you first need to get the valid names using flora::get.tax and then use taxize::get_tpsid to get the species info.
However, the DwC files from flora have the info on species references (extension data table 'Reference'). And then I realized that there is also a table TypesAndSpecimen!!! This info is gold: it would be great for validating the taxonomy of the occurrences and could be added to plantR::validateTax...
I was thus thinking of implementing it here within plantR, but I wanted to talk with you first. I know that you have talked with Gustavo about upgrades to flora for other reasons. Do you think it is better to propose a collaboration to him directly in the flora code (this would require using tables currently not loaded by flora) or to include it all in plantR and then use ??
If checkBorders() will stop when geo.check, country.shape, country.gazetteer are not present, checkCoord() must return these columns by default.
Codes from the package module on search and merge of duplicated specimens
prepDup()
getDup()
mergeDup()
rmDup()
@saramortara @AndreaSanchezTapia
I have had reports of difficulties installing plantR on Windows. First it was an error installing 'gtable', then 'munsell', etc. In the end, the person could not install plantR. Regardless of their R version or the versions of the packages they have installed, and considering the user profile we want to reach, this is a problem that many will have.
Whatever the solution, the root of the problem is, in my view, that we have many dependencies and recursive dependencies. Besides these installation problems, "dependencies are invitations for other people to break your package" (there is some good reading on this subject).
In my view the solution is to progressively eliminate dependencies. I have been trying to do this by creating alternative accessory and internal functions (e.g. rmLatin) or local copies of functions from other packages that have many functions of which we use only one (several examples in acessory_geo.R).
I wrote some code to assess the dependencies (direct and recursive) of the packages we currently (19/3/21) have in DESCRIPTION. My suggestions for eliminating dependencies (packages that depend on many other packages or that we use little) are (in decreasing order of priority):
Imports:
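An assessment of direct and recursive dependencies like the one described above can be sketched with tools::package_dependencies(). This is not the original code from the issue: the package names below are placeholders, and the sketch assumes that the packages of interest are installed locally, so that installed.packages() can serve as the dependency database.

```r
# Dependency metadata for the packages installed locally (no network needed)
db <- installed.packages()

# Placeholder package names: replace with the entries listed under Imports
pkgs <- c("stats", "tools")
dep_types <- c("Depends", "Imports", "LinkingTo")

direct <- tools::package_dependencies(pkgs, db = db, which = dep_types,
                                      recursive = FALSE)
all_deps <- tools::package_dependencies(pkgs, db = db, which = dep_types,
                                        recursive = TRUE)

# One row per package, ordered by the number of recursive dependencies
dep_counts <- data.frame(
  package = pkgs,
  direct = unname(lengths(direct[pkgs])),
  recursive = unname(lengths(all_deps[pkgs])),
  stringsAsFactors = FALSE
)
dep_counts <- dep_counts[order(dep_counts$recursive, decreasing = TRUE), ]
dep_counts
```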
The formatDwc() function fails with the following error, both for data downloaded with the plantR functions and for data downloaded directly from GBIF and speciesLink:
occs_splink <- rspeciesLink(filename = "Lophostigma_teste_splink.txt", save = TRUE, basisOfRecord = 'PreservedSpecimen', species = "Lophostigma")
occs_gbif <- rgbif2(filename = "Lophostigma_teste_gbif.txt", species = "Lophostigma Radlk.", n.records = 110000, force = TRUE, save = TRUE)
occs <- formatDwc(splink_data = occs_splink, gbif_data = occs_gbif, drop = TRUE)
Error in `[.data.frame`(splink_data, , c("yearIdentified", "monthIdentified", :
undefined columns selected
In addition: Warning message:
some columns in splink_data do not follow the speciesLink pattern
@saramortara @AndreaSanchezTapia
Using rgbif2() with "Euterpe edulis" works fine. Using rgbif2() with "Trema micrantha" (or "Casearia sylvestris", or both), I got the following error:
rgbif2(species = "Trema micrantha", save = FALSE)
Making request to GBIF...
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 0, 1
Any idea what might be happening?
Testing the tutorial for a family worked, but not for Sapindaceae. I think the problem is that there are more than 2.5 million records in GBIF for Sapindaceae. The example in the tutorial has a country filter. We ran it in the two ways below. Collecting the information from speciesLink worked. For GBIF, it did not:
familia <- "Sapindaceae"
occs_splink <- rspeciesLink(family = familia) #It's working for specieslink
occs_gbif <- rgbif2(species = familia,
n.records = 2600000) #It's not working for gbif. Maybe the number of records?
Error in names(gbif_data) <- species :
'names' attribute [1] must be the same length as the vector [0]
occs_gbif <- rgbif2(species = familia,
country = "BR",
n.records = 450000) #It's not working, even using the same filter used to do the tutorial
Error in names(gbif_data) <- species :
'names' attribute [1] must be the same length as the vector [0]
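The message above is what R reports when names() is assigned to a list whose length differs from the number of names. One possible cause, which this issue does not confirm, is that queries returning no records are dropped before the naming step. A minimal sketch of a defensive pattern, with `nameResults()` as a hypothetical helper and toy data in place of real GBIF responses:

```r
# Hypothetical helper: keep one slot per query (NULL when nothing came
# back), assign names while lengths still match, then drop empty results.
nameResults <- function(results, species) {
  stopifnot(length(results) == length(species))
  names(results) <- species
  has_data <- vapply(
    results,
    function(x) !is.null(x) && NROW(x) > 0,
    logical(1)
  )
  if (!all(has_data)) {
    message("No records returned for: ",
            paste(species[!has_data], collapse = ", "))
  }
  results[has_data]
}

# Toy results: the second query returned nothing (assumption)
spp <- c("Casearia sylvestris", "Euterpe edulis", "Trema micrantha")
res <- list(data.frame(key = 1), NULL, data.frame(key = 2:3))

named <- nameResults(res, spp)
names(named)
```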
Check which function needs fixing to solve the issue below:
plantR::prepName("Alvares, Roberto")
[1] "Alvares, R." # ok
plantR::prepName("Alvares , Roberto")
[1] "Roberto, A." # inverted
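The two calls above differ only by the space before the comma, so one way to make the input robust is to normalize spacing around commas before the name is parsed. This is a sketch under that assumption, not the fix adopted in the package; `normalizeComma()` is a hypothetical helper.

```r
# Hypothetical helper: standardize the spacing around commas in names
normalizeComma <- function(x) {
  x <- gsub("\\s+,", ",", x)    # drop any spaces before a comma
  x <- gsub(",\\s*", ", ", x)   # keep exactly one space after a comma
  trimws(gsub("\\s+", " ", x))  # collapse repeated spaces, trim the ends
}

fixed <- normalizeComma(c("Alvares , Roberto", "Alvares,Roberto"))
fixed
```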
The user has no way of knowing that we force names such as timor leste or south korea. We should be the ones following current conventions.
This breaks shares_borders using world or spData; either way, this should be controlled. E.g., "cote d'ivoire" is returned as "cote ivoire". The apostrophe is ASCII, so I don't see the need to remove it; on the contrary, removing it breaks things.
Actually, we should return standard country names in our objects, which can be recognized by other packages: either country_codes standards or spData standards. Communication will be much easier.
Check function code related to the problem below:
plantR::prepTDWG("Maria Souza da Silva", get.initials = FALSE, get.prep = TRUE)
[1] "Silva, Maria Souza da" # ok
plantR::prepTDWG("Maria Souza da Silva", get.initials = FALSE, get.prep = TRUE, format = "init_last")
[1] "Maria Souza da Silva" # ok
plantR::prepTDWG("Maria da Silva Souza", get.initials = FALSE, get.prep = TRUE)
[1] "Souza, Maria Silva" # preposition removed!
plantR::prepTDWG("Maria da Silva Souza", get.initials = FALSE, get.prep = TRUE, format = "init_last")
[1] "Maria Silva Souza" # preposition removed!