ropensci / bold

Interface to the Bold Systems barcode webservice

Home Page: https://docs.ropensci.org/bold

License: Other

Makefile 0.66% R 99.34%

api-wrapper barcodes r r-package rstats sequences taxize

bold's Introduction

rOpenSci

This repository has been archived. The former README is now in README-NOT.md.

bold's People

fmichonneau alexxnica katieroserice graceli8 heibl jorisdhondt jdeck88 ghostsofhiroshima lung1984 salix-d cjfields

bold's Issues

Database description switcheroo in bold_identify documentation

The descriptions of these two databases in the documentation for bold_identify() seem to have been switched around, compared to the boldsystems website.

COX1 Every COI barcode record with a species level identification and a minimum sequence length of 500bp. This includes many species represented by only one or two specimens as well as all species with interim taxonomy.
COX1_SPECIES Every COI barcode record on BOLD with a minimum sequence length of 500bp (warning: unvalidated library and includes records without species level identification). This includes many species represented by only one or two specimens as well as all species with interim taxonomy. This search only returns a list of the nearest matches and does not provide a probability of placement to a taxon.

Add user seqID to bold_identify results

I am trying to batch process sequences with bold_identify and then write results to xlsx. I have completed this task but I lose the original sequence IDs when only the sequences are passed to bold_identify. Would it be possible to add an option to include those when bold_identify returns data from the BOLD API?

    > output <- bold_identify (mydata$seqs, db = "COX1", response=FALSE)
    > out20 <- lapply(output, head, n=20)
    > outframe <- do.call("rbind", lapply(out20, data.frame))
    > write.xlsx (outframe, "outframe.xlsx")

This bold package is great, as I prefer de novo clustering with post hoc taxonomy assignment over clustering with a reference database.

Thank you for any help with this, Tim
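
One possible workaround, sketched under the assumption that bold_identify() returns a list whose names mirror the names of the input sequences (as the named-list calls elsewhere in these issues suggest): name the sequences before the call, then copy each list name into a column while binding. The helper bind_with_ids() and the seqID column are hypothetical names, and the mock list below stands in for a real API result.

```r
# Hypothetical helper: bind a named list of data frames into one table,
# storing each list name in a `seqID` column.
bind_with_ids <- function(hits, n = 20) {
  stopifnot(!is.null(names(hits)))
  rows <- Map(function(df, id) {
    df <- head(df, n)                    # keep the top n hits per sequence
    if (nrow(df) == 0) return(NULL)      # skip sequences without hits
    cbind(seqID = id, df, stringsAsFactors = FALSE)
  }, hits, names(hits))
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

# Mock result standing in for a call such as
# bold_identify(setNames(as.list(mydata$seqs), mydata$ids), db = "COX1")
# (`mydata$ids` is an assumed column holding the original sequence IDs)
mock_hits <- list(
  seqA = data.frame(ID = c("x1", "x2"), similarity = c(0.99, 0.97),
                    stringsAsFactors = FALSE),
  seqB = data.frame(ID = "y1", similarity = 0.95, stringsAsFactors = FALSE)
)
outframe <- bind_with_ids(mock_hits)
```

The resulting outframe can be written out as in the original snippet; every row then carries the ID of the query sequence it belongs to.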

add more tests

Import all non-base R fxns

see https://cran.rstudio.com/web/checks/check_results_bold.html

via email, user reports bug in bold_identify

he says apparently due to xml2

link to taxize book in readme/vignette for taxonomy-centered use cases

Add function for the seq matching API

http://www.boldsystems.org/index.php/resources/api?type=idengine

bold_tax_name Error in CRAN version

Dear Scott!

Do you plan to push the dev version to CRAN some time soon? Just wondering, as bold_tax_name() in CRAN (bold v.0.2.0) has this behavior on some queries not found on boldsystems (see below). The code was restructured between the CRAN and GitHub versions, so I couldn't spot the bug fast enough (hence this post without a solution):

> bold_tax_name('Heteroptera')
Error in `[.data.frame`(data, , vars, drop = FALSE) : 
  undefined columns selected

Compared with (expected):

> bold_tax_name('Heteropter')
       input
1 Heteropter
> bold_tax_name('xHeteroptera')
         input
1 xHeteroptera
> bold_tax_name('HeteropteraX')
         input
1 HeteropteraX

Cheers
Johan

Add bold taxonomy API functions

related in taxize: ropensci/taxize#301

Package does not load with install_github

Fix this...

Fungi/plant identification APIs

BOLD responded on 2015-01-31:

We don’t yet have web services for plant and fungi identification but will be making them available within the next 6 months.

keep an eye out for this...ping peggy from email once in the pkg

dual set of parents with genus Ormosia

When I run bold_identify() and bold_identify_parents() on the following sequence, I get two sets of parents, in two sets of columns. The reason is that the genus name for this sequence (Ormosia) is used in both plants and insects. This makes the identification ambiguous (although I know in this case that the sequence is from an insect). I am not sure how to fix this. However, the plant and insect Ormosia ID numbers are different, so I don't see how bold_identify_parents() could get this wrong.

GMGLM411_13
ACTTTATATTTTATTTTTGGGGCATGAGCGGGTATAGTAGGAACTTCCCTAAGAATTTTAATTCGAGCAGAGCTTGGACACCCAGGAGCATTAATTGGTAATGATCAAATTTATAATGTAATTGTTACCGCTCATGCTTTTGTTATAATTTTTTTTATAGTAATACCAATTATAATTGGAGGATTTGGAAATTGATTAGTACCCCTAATATTAGGGGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGATTATTGCCCCCTTCTCTTACTCTTCTTTTAGCAAGTAGTTTAATTGAAAACGGGGCTGGAACAGGTTGAACAGTATATCCCCCGCTATCAGCAGGGATTGCTCATGCCGGAGCTTCAGTTGATTTAGCTATTTTTTCTCTTCATTTAGCAGGAGTTTCTTCAATTTTAGGAGCTGTAAATTTTATTACTACAGTAATTAATATACGATCAACAGGAATTACTTTTGATCGTATACCTTTATTTGTTTGAGCTGTAATTATTACTGCTGTTTTATTATTATTATCTCTCCCAGTTTTAGCAGGAGCTATTACTATACTATTAACAGATCGAAATTTTAATACATCATTTTTTGATCCTGCAGGAGGAGGAGACCCTATTTTATATCAACACTTA

bold_seqspec function gives a mixed set of markers when using marker argument

Session Info

Session info -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.1 (2017-06-30)
 system   x86_64, darwin15.6.0        
 ui       AQUA                        
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/New_York            
 date     2017-10-16                  

Packages -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 package       * version  date       source                               
 ade4            1.7-6    2017-03-23 CRAN (R 3.4.0)                       
 ape           * 4.1      2017-02-14 CRAN (R 3.4.0)                       
 base          * 3.4.1    2017-07-07 local                                
 BiocGenerics  * 0.22.1   2017-10-07 Bioconductor                         
 BiocInstaller * 1.26.1   2017-09-01 Bioconductor                         
 bold          * 0.5.0    2017-07-21 CRAN (R 3.4.1)                       
 colorspace      1.3-2    2016-12-14 CRAN (R 3.4.0)                       
 compiler        3.4.1    2017-07-07 local                                
 crayon          1.3.2    2016-06-28 CRAN (R 3.4.0)                       
 crul            0.3.8    2017-06-15 CRAN (R 3.4.0)                       
 curl            2.8.1    2017-07-21 CRAN (R 3.4.1)                       
 datasets      * 3.4.1    2017-07-07 local                                
 datelife      * 0.2.13   2017-10-13 Github (phylotastic/datelife@ae43b8f)
 devtools        1.13.3   2017-08-02 CRAN (R 3.4.1)                       
 digest          0.6.12   2017-01-27 CRAN (R 3.4.0)                       
 fastmatch       1.1-0    2017-01-28 CRAN (R 3.4.0)                       
 git2r           0.19.0   2017-07-19 CRAN (R 3.4.1)                       
 graphics      * 3.4.1    2017-07-07 local                                
 grDevices     * 3.4.1    2017-07-07 local                                
 grid            3.4.1    2017-07-07 local                                
 httr            1.3.1    2017-08-20 cran (@1.3.1)                        
 igraph          1.1.2    2017-07-21 CRAN (R 3.4.1)                       
 ips             0.0-7    2014-11-10 CRAN (R 3.4.0)                       
 IRanges       * 2.10.5   2017-10-08 Bioconductor                         
 jsonlite        1.5      2017-06-01 CRAN (R 3.4.0)                       
 lattice         0.20-35  2017-03-25 CRAN (R 3.4.1)                       
 magrittr        1.5      2014-11-22 CRAN (R 3.4.0)                       
 Matrix          1.2-10   2017-05-03 CRAN (R 3.4.1)                       
 memoise         1.1.0    2017-04-21 CRAN (R 3.4.0)                       
 methods       * 3.4.1    2017-07-07 local                                
 nlme            3.1-131  2017-02-06 CRAN (R 3.4.1)                       
 parallel      * 3.4.1    2017-07-07 local                                
 phangorn        2.2.0    2017-04-03 CRAN (R 3.4.0)                       
 pkgconfig       2.0.1    2017-03-21 CRAN (R 3.4.0)                       
 plyr            1.8.4    2016-06-08 CRAN (R 3.4.0)                       
 quadprog        1.5-5    2013-04-17 CRAN (R 3.4.0)                       
 R6              2.2.2    2017-06-17 CRAN (R 3.4.0)                       
 Rcpp            0.12.12  2017-07-15 CRAN (R 3.4.1)                       
 reshape         0.8.6    2016-10-21 CRAN (R 3.4.0)                       
 S4Vectors     * 0.14.7   2017-10-08 Bioconductor                         
 seqinr        * 3.4-5    2017-08-01 CRAN (R 3.4.1)                       
 stats         * 3.4.1    2017-07-07 local                                
 stats4        * 3.4.1    2017-07-07 local                                
 stringi         1.1.5    2017-04-07 CRAN (R 3.4.0)                       
 stringr         1.2.0    2017-02-18 CRAN (R 3.4.0)                       
 testthat      * 1.0.2    2016-04-23 CRAN (R 3.4.0)                       
 tools           3.4.1    2017-07-07 local                                
 triebeard       0.3.0    2016-08-04 CRAN (R 3.4.0)                       
 urltools        1.6.0    2016-10-17 CRAN (R 3.4.0)                       
 utils         * 3.4.1    2017-07-07 local                                
 withr           2.0.0    2017-07-28 CRAN (R 3.4.1)                       
 XML           * 3.98-1.9 2017-06-19 CRAN (R 3.4.1)                       
 xml2            1.1.1    2017-01-24 CRAN (R 3.4.0)

Hi! I've been using bold::bold_seqspec() to search for plant and fungi markers. There appears to be an error with the marker argument, since it will output different types of markers for a single marker query:

library(bold)
res <- bold_seqspec(taxon="Arabidopsis", marker="rbcL")
res$markercode

[1] "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" [27] "rbcL" "rbcL" "matK" "rbcL" "matK" "rbcL" "rbcL" "rbcL" "matK" "rbcL" "rbcL" "matK" "rbcL" "matK" "matK" "rbcL"

And searching for these markers with blast shows that they correspond to the gene specified in $markercode:
which(res$markercode=="rbcL") is rbcL in blast
which(res$markercode=="matK") is matK in blast

res2 <- bold_seqspec(taxon="Arabidopsis", marker=c("ITS2"))
res2$markercode # we get a wide mixture of different markers
 [1] "ITS2" "rbcLa" "rbcLa" "ITS2" "ITS2" "rbcLa" "rbcLa" "COI-5P" "ITS2" "rbcLa" "ITS2" "rbcLa" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2"
[21] "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "rbcLa" "matK" "ITS2" "rbcLa" "ITS2" "matK"
[41] "rbcLa" "ITS2" "matK" "ITS2" "rbcLa" "ITS2" "matK" "rbcLa" "rbcLa" "ITS2"

res3 <- bold_seqspec(taxon="Arabidopsis", marker=c("matK"))
res3$markercode # the same problem
 [1] "rbcLa" "matK" "matK" "rbcLa" "rbcLa" "matK" "matK" "rbcLa" "matK" "rbcLa" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK"
[24] "matK" "matK" "matK" "matK" "matK" "matK" "rbcL" "matK" "matK" "matK" "rbcL" "rbcLa" "ITS2" "matK" "ITS2" "rbcLa" "matK" "matK" "rbcLa" "ITS2" "matK" "rbcLa" "ITS2"
[47] "matK" "rbcL" "rbcL" "matK" "matK" "rbcL" "rbcL" "matK" "rbcLa" "matK"
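
Until the marker argument filters as expected, the unwanted rows can be dropped on the client side. The following is a minimal sketch only: filter_marker() is a hypothetical helper, the mock table stands in for a real bold_seqspec() result, and it does not establish whether the mixed output originates in the package or in the BOLD API.

```r
# Hypothetical helper: keep only rows whose markercode matches the request.
filter_marker <- function(res, marker) {
  res[!is.na(res$markercode) & res$markercode %in% marker, , drop = FALSE]
}

# Mock table standing in for bold_seqspec(taxon = "Arabidopsis", marker = "matK")
mock_res <- data.frame(
  processid  = c("P1", "P2", "P3", "P4"),
  markercode = c("rbcLa", "matK", "ITS2", "matK"),
  stringsAsFactors = FALSE
)
matk_only <- filter_marker(mock_res, "matK")
unique(matk_only$markercode)
```

Passing a character vector such as c("rbcL", "rbcLa") keeps several marker codes at once, which may matter because the output above lists "rbcL" and "rbcLa" as separate codes.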

replace httr with crul

Remove methods/stats/utils imports and namespace those calls

Add parentnames to output of bold_identify()

Hi there,

When I use bold_identify, I get the lowest-level taxonomic identification for that sequence (taxonomicidentification field), but it would be very useful if we could get the parentnames for that identification. The BOLD APIs do provide this information if I use three different bold package commands (see below), but I would then have to do some programming in R (not my strength) to insert the parentnames into the bold_identify output table. It seems to me that this would be better done within the bold package, if you fancy it.

Also, maybe I have missed something, but I think it would be nicer to get parentnames from a taxid rather than from the taxonomicidentification field, given the (small) possibility of ambiguity.

thanks,
doug

library(bold)
library(plyr)
testseq <- list(eb4909 = "GAATAAATAATATAAGATTTTGATTACTCCCTCCTTCTTTATTtttATTAATTTTAAGAAATTTTATTGGAACGGGTGTAGGAACCGGATGAACTTTATATCCTCCTTTATCATCTATTGTTGGACATGATTCACCTTCTGTAGATTTAGGAATTttttCTATCCATATTGCTGGAATTTCCTCAATTATAGGATCAATTAATTTTATTGTTACTATTTTAAATATACacacaAaaaCTCATTCACTAAATTTTCTTCCTTTATTCACATGATCAATTTTAATTACAGCAATTCTTCTTCTGTTATCATTACCAGTTCTTGCAGGAGCAATTACTATACTTCTTACAGATCGAAATCTTAATACATCTTtttttGATCCCGCAGGTGGgggggATCCAATTTTATACCAACACTTATTTT")
boldoutput_public <- bold_identify(testseq, db="COX1_SPECIES_PUBLIC")
boldoutput_public.df <- ldply(boldoutput_public, data.frame)
boldoutput_public_tax_name <- bold_tax_name(name=boldoutput_public.df[3,6]) # for a particular identification (third row)
boldoutput_public_tax_name.parents <- bold_tax_id(id=boldoutput_public_tax_name$taxid, includeTree = TRUE)

Change callopts param to ... throughout all fxns

switch XML to xml2

private data API

will be coming next year probably, keep eye out on the api docs

bug in bold_identify_parents?

Hello,

bold_identify_parents works fine for me for now, but one sequence does produce an error.

When running:

hittable <- bold_identify("AACGTTATATTTTATTTTTGGAGCATGATCAGGAATAGTAGGAACTTCTTTAAGAATTTTAATTCGAGCTGAATTAGGTCACCCTGGAACATTAATTGGAGATGACCAAATTTATAATGTTATTGTTACAGCACATGCTTTTGTTATAATTTTTTTTATAGTTATACCAATTTTAATT", db="COX1")
hittable <- bold_identify_parents(hittable, wide=TRUE)

I get the following error:

Error in apply(h, 1, function(x) { : dim(X) must have a positive length

in the bold_identify_parents command. This happens every time I try to run it with this sequence. Could you please look into this? Thanks = )

Here is my R session info:

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] JAMP_0.1     bold_0.4.0   seqinr_3.3-3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8       reshape_0.8.6     assertthat_0.1    R6_2.2.0         
 [5] plyr_1.8.4        jsonlite_1.2      magrittr_1.5      httr_1.2.1       
 [9] stringi_1.1.2     curl_2.3          data.table_1.10.0 xml2_1.1.0       
[13] tools_3.3.2       stringr_1.1.0     ade4_1.7-5

bold_specimens returns no records with format="tsv" for taxa with large numbers of records

Hi @sckott, thanks for fixing issue #46 so promptly, the table format seems fine now.

I have run into another issue when using bold_specimens to retrieve info in tsv format for some taxa. I suspect this is a problem for taxa with large numbers of records only. The command returns a (kind of) NA value and no records and does not generate an error message. xml format seems to return okay.
For example:

> bold_specimen_records <- bold::bold_specimens(taxon="Carabidae", format="tsv")
> bold_specimen_records
[1] NA.
<0 rows> (or 0-length row.names)

Whereas it works fine for a taxon with not many sequence records:

> bold_specimen_records <- bold::bold_specimens(taxon="Athericidae", format="tsv")
> bold_specimen_records[1:5,1:5]
     processid   sampleid recordID   catalognum fieldnum
1   BYRN013-12   Byrn12E1  2895459                ByrnE1
2 GBDP14660-13   KC592679  3662005      PLIM180         
3 GBDP14675-13   KC592664  3662020     PEET1005         
4 GBDP15865-15   KM243490  5649052 NCSU08021521 KM243490
5  MHTAB099-09 ACTDpON10A  1085756              DpONT10A

Session Info

Session info ---------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.0 (2017-04-21)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.0.143)           
 language (EN)                        
 collate  en_AU.UTF-8                 
 tz       Australia/Melbourne         
 date     2017-07-21                  

Packages -------------------------------------------------------------------------
 package       * version    date       source                        
 ape             4.1        2017-02-14 CRAN (R 3.4.0)                
 assertthat      0.2.0      2017-04-11 CRAN (R 3.4.0)                
 backports       1.1.0      2017-05-22 CRAN (R 3.4.0)                
 base          * 3.4.0      2017-04-21 local                         
 bindr           0.1        2016-11-13 cran (@0.1)                   
 bindrcpp        0.2        2017-06-17 cran (@0.2)                   
 bold            0.5.0      2017-07-21 Github (ropensci/bold@48a9fe9)
 codetools       0.2-15     2016-10-05 CRAN (R 3.4.0)                
 commonmark      1.2        2017-03-01 CRAN (R 3.4.0)                
 compiler        3.4.0      2017-04-21 local                         
 crayon          1.3.2      2016-06-28 CRAN (R 3.4.0)                
 crul            0.3.8      2017-06-15 cran (@0.3.8)                 
 curl            2.7        2017-06-26 cran (@2.7)                   
 data.table      1.10.4     2017-02-01 CRAN (R 3.4.0)                
 datasets      * 3.4.0      2017-04-21 local                         
 desc            1.1.0      2017-01-27 CRAN (R 3.4.0)                
 devtools      * 1.13.2     2017-06-02 CRAN (R 3.4.0)                
 digest          0.6.12     2017-01-27 CRAN (R 3.4.0)                
 dplyr           0.7.1      2017-06-22 CRAN (R 3.4.1)                
 foreach         1.4.3      2015-10-13 CRAN (R 3.4.0)                
 git2r           0.18.0     2017-01-01 CRAN (R 3.4.0)                
 glue            1.1.1      2017-06-21 cran (@1.1.1)                 
 graphics      * 3.4.0      2017-04-21 local                         
 grDevices     * 3.4.0      2017-04-21 local                         
 grid            3.4.0      2017-04-21 local                         
 httr            1.2.1      2016-07-03 CRAN (R 3.4.0)                
 iterators       1.0.8      2015-10-13 CRAN (R 3.4.0)                
 jsonlite        1.5        2017-06-01 cran (@1.5)                   
 lattice         0.20-35    2017-03-25 CRAN (R 3.4.0)                
 magrittr        1.5        2014-11-22 CRAN (R 3.4.0)                
 memoise         1.1.0      2017-04-21 CRAN (R 3.4.0)                
 metabarcodedb * 0.0.0.9000 <NA>       local                         
 methods       * 3.4.0      2017-04-21 local                         
 nlme            3.1-131    2017-02-06 CRAN (R 3.4.0)                
 parallel        3.4.0      2017-04-21 local                         
 pbapply       * 1.3-3      2017-07-04 CRAN (R 3.4.1)                
 pkgconfig       2.0.1      2017-03-21 cran (@2.0.1)                 
 plyr            1.8.4      2016-06-08 CRAN (R 3.4.0)                
 R6              2.2.2      2017-06-17 cran (@2.2.2)                 
 Rcpp            0.12.12    2017-07-15 CRAN (R 3.4.1)                
 rentrez         1.1.0      2017-06-01 cran (@1.1.0)                 
 reshape         0.8.6      2016-10-21 CRAN (R 3.4.0)                
 reshape2        1.4.2      2016-10-22 CRAN (R 3.4.0)                
 rlang           0.1.1      2017-05-18 cran (@0.1.1)                 
 roxygen2      * 6.0.1      2017-02-06 CRAN (R 3.4.0)                
 rprojroot       1.2        2017-01-16 CRAN (R 3.4.0)                
 stats         * 3.4.0      2017-04-21 local                         
 stringi         1.1.5      2017-04-07 CRAN (R 3.4.0)                
 stringr       * 1.2.0      2017-02-18 CRAN (R 3.4.0)                
 taxize          0.8.9      2017-07-11 CRAN (R 3.4.1)                
 testthat      * 1.0.2      2016-04-23 CRAN (R 3.4.0)                
 tibble          1.3.3      2017-05-28 cran (@1.3.3)                 
 tools           3.4.0      2017-04-21 local                         
 triebeard       0.3.0      2016-08-04 CRAN (R 3.4.0)                
 urltools        1.6.0      2016-10-17 CRAN (R 3.4.0)                
 utils         * 3.4.0      2017-04-21 local                         
 withr           1.0.2      2016-06-20 CRAN (R 3.4.0)                
 XML             3.98-1.9   2017-06-19 cran (@3.98-1.)               
 xml2            1.1.1      2017-01-24 CRAN (R 3.4.0)

Update vignette with new taxonomy fxns

New v4 beta APIs

http://v4.boldsystems.org/index.php/api_home

See what needs changing, and when to change

Error in bold_seq (str_split: subscript out of bounds)

Session Info

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bold_0.8.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.19      rstudioapi_0.8    xml2_1.2.0        magrittr_1.5      usethis_1.4.0     devtools_2.0.1    pkgload_1.0.1    
 [8] R6_2.3.0          rlang_0.1.6       stringr_1.2.0     plyr_1.8.4        tools_3.3.2       pkgbuild_1.0.2    sessioninfo_1.1.0
[15] cli_1.0.1         withr_2.1.2       remotes_2.0.1     assertthat_0.2.0  digest_0.6.18     rprojroot_1.3-2   httpcode_0.2.0   
[22] crayon_1.3.4      processx_3.2.0    callr_3.0.0       base64enc_0.1-3   fs_1.2.6          ps_1.2.0          triebeard_0.3.0  
[29] crul_0.6.0        curl_3.2          testthat_2.0.1    memoise_1.1.0     glue_1.3.0        stringi_1.1.6     urltools_1.7.1   
[36] desc_1.2.0        backports_1.1.2   prettyunits_1.0.2 reshape_0.8.8     jsonlite_1.5

Hi, I am enjoying playing with this package. Thanks for developing it. When trying this command

bold_arth<-bold_seq(taxon = "Arthropoda")

I get the error

Error in str_split(str_replace(temp[[1]], "\n", "<<<"), "<<<")[[1]][[2]] : 
  subscript out of bounds

This seems to be a different issue from the resolved issue regarding long vectors with bold_seqspec. I see that Arthropoda contains a lot of sequences, but it would still be great to be able to download everything in one go.

Any help would be appreciated.

long vectors not supported yet?

I'm very excited to use the bold R-package but have been running into trouble when trying to pull a large dataset together using the bold_seqspec script. Specifically, I'm trying to grab all COI-5P sequences from BOLD's database. That's a lot of data.

I started by installing the package and ran the following script successfully:

install.packages("bold")
library(bold)
df_tiny <- bold_seqspec(taxon="Echiura", marker="COI-5P")
write.table(df_tiny, file = "/home/R/euthria_bold_seqspec.txt", sep = "\t")

This creates a 53-column, 58-line text file. I've performed this task in both R-studio and from the command line (running Linux 3.13.0-85-generic, R version 3.3.0) successfully for the above script.

However, I wanted to be able to modify the script above to include the biggest group in one chunk - Arthropods - by running these commands:

df_large <- bold_seqspec(taxon="Arthropoda", marker="COI-5P")
write.table(df_large, file = "/home/R/arthropoda_bold_seqspec.txt", sep = "\t")

Unfortunately I get an error message:

Error in rawToChar(content(out, encoding = "UTF-8")) : 
  long vectors not supported yet: raw.c:68

I was under the impression that the relatively recent releases of R have enabled long vectors to be supported. Perhaps more to the point, I wasn't thinking that this was a particularly long vector in terms of columns, but perhaps it does exceed that 900,000 limit in terms of rows (there certainly are more than 900,000 rows in this dataset).

To that end, perhaps you can speak to the maximum number of entries that may be downloaded at once by these scripts (should one exist).

Thanks very much
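
A common way around oversized single responses is to request smaller taxa one at a time and combine the pieces afterwards. The sketch below is an illustration under stated assumptions, not the package's own mechanism: fetch_by_taxon() is a hypothetical helper, the taxon names are placeholders, and the default fetcher assumes bold_seqspec() returns a data frame for each request.

```r
# Hypothetical helper: call `fetch` once per taxon and row-bind the results.
# The default `fetch` wraps bold::bold_seqspec and is only evaluated when used.
fetch_by_taxon <- function(taxa, marker = NULL,
                           fetch = function(taxon, marker) {
                             bold::bold_seqspec(taxon = taxon, marker = marker)
                           }) {
  pieces <- lapply(taxa, function(tx) {
    tryCatch(fetch(tx, marker), error = function(e) {
      warning("download failed for ", tx, ": ", conditionMessage(e))
      NULL
    })
  })
  pieces <- Filter(is.data.frame, pieces)   # drop failed or non-tabular results
  if (length(pieces) == 0) return(data.frame())
  # Keep only columns shared by every piece so rbind() does not fail
  common <- Reduce(intersect, lapply(pieces, names))
  do.call(rbind, lapply(pieces, function(p) p[, common, drop = FALSE]))
}

# Offline demonstration with a mock fetcher; one request fails on purpose
mock_fetch <- function(taxon, marker) {
  if (taxon == "Bad") stop("simulated failure")
  data.frame(taxon = taxon, markercode = marker, stringsAsFactors = FALSE)
}
df_large <- suppressWarnings(
  fetch_by_taxon(c("Insecta", "Bad", "Arachnida"), "COI-5P", fetch = mock_fetch)
)
```

With the real fetcher, a failed taxon produces a warning instead of aborting the whole run, so it can be split further and retried on its own.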

Markers option not working

A minor bug ... selecting markers has no effect in bold_seqspec.
Not such a high priority, as markers other than COI can be discarded manually after the dataframe is downloaded, but took me a while to work out why my sequences would not align!

unique(bold_seqspec(taxon="Melanogrammus aeglefinus", marker="COI-5P")$markercode)

Add travis file

Add more tests

See if we can fix the failure to install the sangerseqR pkg from Bioconductor

see https://cran.rstudio.com/web/checks/check_results_bold.html

network error

bold_identify() is returning a network error (I think this has arisen only in the last few days). The BOLD website is accepting sequences for identification. I'm guessing this is really a BOLD website problem, but I thought I would flag it up.

  testseq <- list(eb4909 = "GAATAAATAATATAAGATTTTGATTACTCCCTCCTTCTTTATTtttATTAATTTTAAGAAATTTTATTGGAACGGGTGTAGGAACCGGATGAACTTTATATCCTCCTTTATCATCTATTGTTGGACATGATTCACCTTCTGTAGATTTAGGAATTttttCTATCCATATTGCTGGAATTTCCTCAATTATAGGATCAATTAATTTTATTGTTACTATTTTAAATATACacacaAaaaCTCATTCACTAAATTTTCTTCCTTTATTCACATGATCAATTTTAATTACAGCAATTCTTCTTCTGTTATCATTACCAGTTCTTGCAGGAGCAATTACTATACTTCTTACAGATCGAAATCTTAATACATCTTtttttGATCCCGCAGGTGGgggggATCCAATTTTATACCAACACTTATTTT")
  boldoutput <- bold_identify(testseq, db="COX1_SPECIES")

Error in FUN(X[[i]], ...) : Internal Server Error (HTTP 500).
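
If the HTTP 500 turns out to be transient, wrapping the call in a retry loop may help; it will not help while the service itself is down. The sketch below assumes nothing about bold internals: with_retry() is a hypothetical helper built on base R's tryCatch(), and the demonstration uses a mock function instead of a network call.

```r
# Hypothetical helper: re-run `f` after an error, pausing between attempts.
with_retry <- function(f, times = 3, wait = 5) {
  stopifnot(times >= 1)
  for (i in seq_len(times)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < times) Sys.sleep(wait)       # wait before the next attempt
  }
  stop("all ", times, " attempts failed; last error: ", conditionMessage(result))
}

# Intended use (needs network access and the bold package):
# boldoutput <- with_retry(function() bold_identify(testseq, db = "COX1_SPECIES"))

# Offline demonstration: a function that fails twice, then succeeds
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("Internal Server Error (HTTP 500)")
  "ok"
}
retry_result <- with_retry(flaky, times = 3, wait = 0)
```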

Make vignette better

It's super basic right now, give more examples

Submit new version to CRAN once httr v0.4 is on CRAN

Don't run tests on CRAN

...

bug in internal parser

bold_seq: trim off returns

e.g., bold_seq(taxon='Coelioxys')[[1]]

$id
[1] "ABEE117-17"

$name
[1] "Coelioxys elongata"

$gene
[1] "ABEE117-17"

$sequence
[1] "------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TTATCATTATATACATATCATCCTTCCCCATCAGTTGATTTAGCAATTTTTTYTTTACATTTATCAGGAATTTYTTYTATTATCGGATCAATAAATTTTATTGTAACAATTTTAATAATAAAAAATTATTCAATAAATTATAATCAAATACCTTTATTTCCATGATCAATTTTAATTACTACAATTTTATTATTATTATCATTACCTGTATTAGCAGGAGCTATTACAATATTATTATTTGATCGTAATTTAAATTCATCATTTTTTGACCCAATAGGAGGAGGAGATCCTATTTTATATCAACATTTATTTTG------------------------------------\r"
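
The trailing "\r" in $sequence above could be removed with a single regular expression. This is a minimal sketch on a shortened mock sequence; trim_returns() and trim_gaps() are hypothetical helpers, and stripping the alignment gaps is an extra assumption that the issue itself does not request.

```r
# Hypothetical helper: drop trailing carriage returns/newlines from sequences.
trim_returns <- function(x) sub("[\r\n]+$", "", x)

# Optional extra step (an assumption, not something the issue asks for):
# also strip leading and trailing alignment gaps ("-").
trim_gaps <- function(x) gsub("^-+|-+$", "", x)

raw_seq   <- "----TTATCATTATATACATATC----\r"
clean_seq <- trim_returns(raw_seq)
ungapped  <- trim_gaps(clean_seq)
```

Both helpers are vectorised, so they can be applied to a whole character vector of sequences at once.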

Update pkg

Looks like api is working again, update code, etc.

Some parsing not working correctly

add slack integration

help files aren't available (0.4.0)

I get this message when trying to read the help files.

Error in fetch(key) : lazy-load database '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/bold/help/bold.rdb' is corrupt

readme xhtml

From kurt hornik

These have README.md files which when converted to (X)HTML using a
current version of pandoc show problems when validated using W3C Markup
Validator, see below.

Most of these problems are caused by using images without giving a name
(so the required alt attribute for <img> is not provided), or using <br>
instead of <br/>.

Pls fix these problems in your README.md files for your next release: in
all cases I inspected, the fixes were obvious and confirmation using
pandoc and W3C markup validator seemed unnecessary.

Please also visit your package check web page at http://cran.r-project.org/web/checks/check_results_PACKAGENAME.html to see if other problems need to be addressed as well.

bold_specimens returning poorly formatted data.frame with option format="tsv"

I am using bold::bold_specimens within a custom function to return specimen records for a given taxon in "tsv" format. I am pretty sure this worked fine in the past for me, however now the resulting data.frame is not well formatted. The header line partly wraps into the first row of the data.frame, resulting in a dodgy row 1 and extra 'NA' entries in other rows that shouldn't be there.

Example:

specimen_table <- bold::bold_specimens(taxon="Anaspididae", format="tsv")
specimen_table
    processid   sampleid           recordID     catalognum    fieldnum
1   image_ids image_urls copyright_licenses      trace_ids trace_links
2 GBCM0002-06   AF048821             468923                           
3                                                                     
4 GBCM0381-06   DQ310660             501348 HBLB 047 (BIO)            
5                                                                     
6  RBGC001-03   MaAna000               4901       MaAna000            
7                                                                     
                institution_storing            bin_uri phylum_taxID phylum_name  class_taxID
1                         run_dates sequencing_centers   directions seq_primers marker_codes
2          Mined from GenBank, NCBI       BOLD:AAF3961           20  Arthropoda           69
3                                                                                           
4          Mined from GenBank, NCBI       BOLD:AAF3962           20  Arthropoda           69
5                                                                                           
6 Biodiversity Institute of Ontario       BOLD:AAF3961           20  Arthropoda           69
7                                                                                           
    class_name order_taxID  order_name family_taxID family_name subfamily_taxID subfamily_name
1                       NA                       NA                          NA             NA
2 Malacostraca         352 Anaspidacea         1697 Anaspididae              NA             NA
3                       NA                       NA                          NA             NA
4 Malacostraca         352 Anaspidacea         1697 Anaspididae              NA             NA
5                       NA                       NA                          NA             NA
6 Malacostraca         352 Anaspidacea         1697 Anaspididae              NA             NA
7                       NA                       NA                          NA             NA
  genus_taxID genus_name species_taxID        species_name subspecies_taxID subspecies_name
1          NA                       NA                                   NA              NA
2        5694  Anaspides          8241 Anaspides tasmaniae               NA              NA
3          NA                       NA                                   NA              NA
4        5694  Anaspides          8241 Anaspides tasmaniae               NA              NA
5          NA                       NA                                   NA              NA
6        5694  Anaspides          8241 Anaspides tasmaniae               NA              NA
7          NA                       NA                                   NA              NA
  identification_provided_by voucher_type tissue_type collectors collectiondate lifestage sex
1                         NA           NA          NA         NA             NA        NA  NA
2                         NA           NA          NA         NA             NA        NA  NA
3                         NA           NA          NA         NA             NA        NA  NA
4                         NA           NA          NA         NA             NA        NA  NA
5                         NA           NA          NA         NA             NA        NA  NA
6                         NA           NA          NA         NA             NA        NA  NA
7                         NA           NA          NA         NA             NA        NA  NA
  reproduction           extrainfo notes lat lon coord_source coord_accuracy country province
1           NA                        NA  NA  NA           NA             NA      NA       NA
2           NA                        NA  NA  NA           NA             NA      NA       NA
3           NA                        NA  NA  NA           NA             NA      NA       NA
4           NA                        NA  NA  NA           NA             NA      NA       NA
5           NA                        NA  NA  NA           NA             NA      NA       NA
6           NA Anaspides tasmaniae    NA  NA  NA           NA             NA      NA       NA
7           NA                        NA  NA  NA           NA             NA      NA       NA
  region exactsite  X
1     NA        NA NA
2     NA        NA NA
3     NA        NA NA
4     NA        NA NA
5     NA        NA NA
6     NA        NA NA
7     NA        NA NA

> str(specimen_table)
'data.frame':	7 obs. of  42 variables:
 $ processid                 : chr  "image_ids" "GBCM0002-06" "" "GBCM0381-06" ...
 $ sampleid                  : chr  "image_urls" "AF048821" "" "DQ310660" ...
 $ recordID                  : chr  "copyright_licenses" "468923" "" "501348" ...
 $ catalognum                : chr  "trace_ids" " " "" "HBLB 047 (BIO)" ...
 $ fieldnum                  : chr  "trace_links" " " "" " " ...
 $ institution_storing       : chr  "run_dates" "Mined from GenBank, NCBI" "" "Mined from GenBank, NCBI" ...
 $ bin_uri                   : chr  "sequencing_centers" "BOLD:AAF3961" "" "BOLD:AAF3962" ...
 $ phylum_taxID              : chr  "directions" "20" "" "20" ...
 $ phylum_name               : chr  "seq_primers" "Arthropoda" "" "Arthropoda" ...
 $ class_taxID               : chr  "marker_codes" "69" "" "69" ...
 $ class_name                : chr  "" "Malacostraca" "" "Malacostraca" ...
 $ order_taxID               : int  NA 352 NA 352 NA 352 NA
 $ order_name                : chr  "" "Anaspidacea" "" "Anaspidacea" ...
 $ family_taxID              : int  NA 1697 NA 1697 NA 1697 NA
 $ family_name               : chr  "" "Anaspididae" "" "Anaspididae" ...
 $ subfamily_taxID           : logi  NA NA NA NA NA NA ...
 $ subfamily_name            : logi  NA NA NA NA NA NA ...
 $ genus_taxID               : int  NA 5694 NA 5694 NA 5694 NA
 $ genus_name                : chr  "" "Anaspides" "" "Anaspides" ...
 $ species_taxID             : int  NA 8241 NA 8241 NA 8241 NA
 $ species_name              : chr  "" "Anaspides tasmaniae" "" "Anaspides tasmaniae" ...
 $ subspecies_taxID          : logi  NA NA NA NA NA NA ...
 $ subspecies_name           : logi  NA NA NA NA NA NA ...
 $ identification_provided_by: logi  NA NA NA NA NA NA ...
 $ voucher_type              : logi  NA NA NA NA NA NA ...
 $ tissue_type               : logi  NA NA NA NA NA NA ...
 $ collectors                : logi  NA NA NA NA NA NA ...
 $ collectiondate            : logi  NA NA NA NA NA NA ...
 $ lifestage                 : logi  NA NA NA NA NA NA ...
 $ sex                       : logi  NA NA NA NA NA NA ...
 $ reproduction              : logi  NA NA NA NA NA NA ...
 $ extrainfo                 : chr  "" " " "" " " ...
 $ notes                     : logi  NA NA NA NA NA NA ...
 $ lat                       : logi  NA NA NA NA NA NA ...
 $ lon                       : logi  NA NA NA NA NA NA ...
 $ coord_source              : logi  NA NA NA NA NA NA ...
 $ coord_accuracy            : logi  NA NA NA NA NA NA ...
 $ country                   : logi  NA NA NA NA NA NA ...
 $ province                  : logi  NA NA NA NA NA NA ...
 $ region                    : logi  NA NA NA NA NA NA ...
 $ exactsite                 : logi  NA NA NA NA NA NA ...
 $ X                         : logi  NA NA NA NA NA NA ...

Any help would be very welcome! Thanks in advance.

Session info here:

> devtools::session_info()
Session info --------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.0 (2017-04-21)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.0.143)           
 language (EN)                        
 collate  en_AU.UTF-8                 
 tz       Australia/Melbourne         
 date     2017-07-19                  

Packages ------------------------------------------------------------------------------------------
 package       * version    date       source                                
 ape             4.1        2017-02-14 CRAN (R 3.4.0)                        
 assertthat      0.2.0      2017-04-11 CRAN (R 3.4.0)                        
 backports       1.0.5      2017-01-18 CRAN (R 3.4.0)                        
 base          * 3.4.0      2017-04-21 local                                 
 bindr           0.1        2016-11-13 cran (@0.1)                           
 bindrcpp        0.2        2017-06-17 cran (@0.2)                           
 bold            0.4.0      2017-01-06 CRAN (R 3.4.0)                        
 codetools       0.2-15     2016-10-05 CRAN (R 3.4.0)                        
 commonmark      1.2        2017-03-01 CRAN (R 3.4.0)                        
 compiler        3.4.0      2017-04-21 local                                 
 crayon          1.3.2      2016-06-28 CRAN (R 3.4.0)                        
 curl            2.7        2017-06-26 cran (@2.7)                           
 data.table      1.10.4     2017-02-01 CRAN (R 3.4.0)                        
 datasets      * 3.4.0      2017-04-21 local                                 
 desc            1.1.0      2017-01-27 CRAN (R 3.4.0)                        
 devtools      * 1.13.2     2017-06-02 CRAN (R 3.4.0)                        
 digest          0.6.12     2017-01-27 CRAN (R 3.4.0)                        
 dplyr           0.7.1      2017-06-22 cran (@0.7.1)                         
 foreach         1.4.3      2015-10-13 CRAN (R 3.4.0)                        
 git2r           0.18.0     2017-01-01 CRAN (R 3.4.0)                        
 glue            1.1.1      2017-06-21 cran (@1.1.1)                         
 graphics      * 3.4.0      2017-04-21 local                                 
 grDevices     * 3.4.0      2017-04-21 local                                 
 grid            3.4.0      2017-04-21 local                                 
 httr            1.2.1      2016-07-03 CRAN (R 3.4.0)                        
 iterators       1.0.8      2015-10-13 CRAN (R 3.4.0)                        
 jsonlite        1.5        2017-06-01 cran (@1.5)                           
 lattice         0.20-35    2017-03-25 CRAN (R 3.4.0)                        
 magrittr        1.5        2014-11-22 CRAN (R 3.4.0)                        
 memoise         1.1.0      2017-04-21 CRAN (R 3.4.0)                        
 metabarcodedb * 0.0.0.9000 2017-07-19 local (griffinp/metabarcodedb@b70ec4c)
 methods       * 3.4.0      2017-04-21 local                                 
 nlme            3.1-131    2017-02-06 CRAN (R 3.4.0)                        
 parallel        3.4.0      2017-04-21 local                                 
 pbapply         1.3-3      2017-07-04 CRAN (R 3.4.1)                        
 pkgconfig       2.0.1      2017-03-21 cran (@2.0.1)                         
 plyr            1.8.4      2016-06-08 CRAN (R 3.4.0)                        
 R6              2.2.2      2017-06-17 cran (@2.2.2)                         
 Rcpp            0.12.11    2017-05-22 cran (@0.12.11)                       
 rentrez         1.1.0      2017-06-01 cran (@1.1.0)                         
 reshape         0.8.6      2016-10-21 CRAN (R 3.4.0)                        
 reshape2        1.4.2      2016-10-22 CRAN (R 3.4.0)                        
 rlang           0.1.1      2017-05-18 cran (@0.1.1)                         
 roxygen2      * 6.0.1      2017-02-06 CRAN (R 3.4.0)                        
 rprojroot       1.2        2017-01-16 CRAN (R 3.4.0)                        
 stats         * 3.4.0      2017-04-21 local                                 
 stringi         1.1.5      2017-04-07 CRAN (R 3.4.0)                        
 stringr         1.2.0      2017-02-18 CRAN (R 3.4.0)                        
 taxize          0.8.8      2017-07-01 cran (@0.8.8)                         
 testthat        1.0.2      2016-04-23 CRAN (R 3.4.0)                        
 tibble          1.3.3      2017-05-28 cran (@1.3.3)                         
 tools           3.4.0      2017-04-21 local                                 
 utils         * 3.4.0      2017-04-21 local                                 
 withr           1.0.2      2016-06-20 CRAN (R 3.4.0)                        
 XML             3.98-1.9   2017-06-19 cran (@3.98-1.)                       
 xml2            1.1.1      2017-01-24 CRAN (R 3.4.0)
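
A possible workaround for the malformed `specimen_table` in this issue, offered as a sketch and not as a fix in the package. In the `str()` output, the first row holds what look like column names from another table (`image_ids`, `directions`, ...) and every other row is empty, while the genuine records have a numeric `phylum_taxID`. The toy data frame below copies only the first four values shown for three of the columns; the filter assumes that every genuine record carries a numeric `phylum_taxID`, which may not hold in general.

```r
# Toy data frame mirroring the first four rows of the str() output
# (only three of the 42 columns are reproduced here).
specimen_table <- data.frame(
  processid    = c("image_ids", "GBCM0002-06", "", "GBCM0381-06"),
  phylum_taxID = c("directions", "20", "", "20"),
  species_name = c("", "Anaspides tasmaniae", "", "Anaspides tasmaniae"),
  stringsAsFactors = FALSE
)

# Assumption: a genuine record has an all-digit phylum_taxID.
is_record <- grepl("^[0-9]+$", specimen_table$phylum_taxID)

# Keep only the genuine records and restore the integer type.
clean <- specimen_table[is_record, , drop = FALSE]
clean$phylum_taxID <- as.integer(clean$phylum_taxID)
rownames(clean) <- NULL

print(clean)
```

This only hides the symptom; the stray header row suggests the response is being parsed as one table when it contains more than one.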

replace xml2::xml_find_one with xml2::xml_find_first
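
For context, a minimal sketch of the replacement call. In xml2, `xml_find_first()` takes over from the deprecated `xml_find_one()` and returns the first node matching an XPath expression. The XML snippet below is made up for illustration and is not real BOLD output.

```r
library(xml2)

# Made-up document with two matching nodes (not actual BOLD output).
doc <- read_xml(
  "<matches><match><ID>A</ID></match><match><ID>B</ID></match></matches>"
)

# xml_find_first() returns only the first node that matches the XPath.
first_id <- xml_find_first(doc, ".//ID")
first_text <- xml_text(first_id)

print(first_text)
```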

dependencies not available

I'm unable to install the package in RStudio on Ubuntu. I get the following error:

ERROR: dependencies ‘xml2’, ‘reshape’, ‘plyr’ are not available for package ‘bold’

Whenever I try to install these packages individually, I am told that they are not available for R version 3.0.2.

Any suggestions?

bioc package isn't installing right, appveyor failing

https://ci.appveyor.com/project/sckott/bold/build/1.0.260#L1038

bold_seq: fail gracefully when BOLD server times out

via #52

`bold_tax_id()` throws an error in some cases

`bold_tax_name()` throws an error in some cases

e.g.,

bold_tax_name("Cordulegaster erronea")    # Throws an error

Error in data.frame(input = y, df, stringsAsFactors = FALSE) : 
  arguments imply differing number of rows: 1, 0

Reported by Bill Buaas
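
The error message can be reproduced without contacting BOLD, which may help narrow down the cause. The sketch below assumes the failing call is `data.frame(input = y, df, ...)` with a zero-row `df`, as the traceback suggests. The helper `bind_input()` and the column names `taxid` and `taxon` are hypothetical and only illustrate one possible guard.

```r
# A zero-row result, as might come back when a name has no match.
empty <- data.frame(taxid = integer(0), taxon = character(0),
                    stringsAsFactors = FALSE)

# Reproduce the reported error and capture its message.
msg <- tryCatch(
  data.frame(input = "Cordulegaster erronea", empty, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical guard: return a one-row frame holding only the input
# when there is nothing to bind it to.
bind_input <- function(input, df) {
  if (NROW(df) == 0) {
    return(data.frame(input = input, stringsAsFactors = FALSE))
  }
  data.frame(input = input, df, stringsAsFactors = FALSE)
}

guarded <- bind_input("Cordulegaster erronea", empty)
print(guarded)
```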

Graceful failure for lists of taxa

Hi Scott, for some reason a small number of taxa return data frames filled with NA. More annoyingly, when one of these taxa is combined in a single query with taxa that work as expected, the whole result is also an NA data frame. Taxa that are not in BOLD return nothing and do not interfere with those that are. I hope this example makes sense.
Cheers!

cod <- "Gadus morhua"
haddock <- "Melanogrammus aeglefinus"
pipefish <- "Nerophis lumbriciformis"
str(bold_seqspec(taxon = cod))                   # taxon in BOLD, but doesn't work (returns a data frame of NA)
str(bold_seqspec(taxon = haddock))               # taxon in BOLD, works (returns nice data frame)
str(bold_seqspec(taxon = pipefish))              # taxon not in BOLD, returns nothing (i.e. works)
str(bold_seqspec(taxon = c(cod, haddock)))       # when combined, both fail and an NA data frame is returned
str(bold_seqspec(taxon = c(haddock, pipefish)))  # when combined, returns just haddock data as expected
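
Until this is handled inside the package, one user-side workaround is to query each taxon separately and keep only the usable results. The sketch below is an assumption about how that could look, not the package's behaviour: `fetch_each()` and `fake_fetch()` are hypothetical helpers, and `fake_fetch()` stands in for `bold_seqspec()` so the example runs offline.

```r
# Query taxa one at a time; drop errors, empty results and all-NA frames.
fetch_each <- function(taxa, fetch) {
  results <- lapply(taxa, function(tx) {
    tryCatch(fetch(tx), error = function(e) NULL)
  })
  names(results) <- taxa
  keep <- vapply(results, function(x) {
    is.data.frame(x) && nrow(x) > 0 && !all(is.na(x))
  }, logical(1))
  results[keep]
}

# Offline stand-in that mimics the three behaviours described above.
fake_fetch <- function(tx) {
  switch(tx,
    "Gadus morhua" = data.frame(processid = NA, species_name = NA),
    "Melanogrammus aeglefinus" = data.frame(
      processid = "X1", species_name = tx, stringsAsFactors = FALSE
    ),
    stop("no records for ", tx)
  )
}

taxa <- c("Gadus morhua", "Melanogrammus aeglefinus", "Nerophis lumbriciformis")
ok <- fetch_each(taxa, fake_fetch)
print(names(ok))
```

With the real package, the second argument would be something like `function(tx) bold_seqspec(taxon = tx)`. This costs one request per taxon, but a single bad taxon can no longer spoil the combined result.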
