I'm very excited to use the bold R package but have been running into trouble when trying to pull a large dataset together using the bold_seqspec
function. Specifically, I'm trying to grab all COI-5P sequences from BOLD's database. That's a lot of data.
I started by installing the package and ran the following script successfully:
install.packages("bold")
library(bold)
df_tiny <- bold_seqspec(taxon="Echiura", marker="COI-5P")
write.table(df_tiny, file = "/home/R/euthria_bold_seqspec.txt", sep = "\t")
This creates a 53-column, 58-line text file. I've run the above script successfully both in RStudio and from the command line (Linux 3.13.0-85-generic, R version 3.3.0).
However, I wanted to modify the script to download the biggest group, Arthropoda, in one chunk by running these commands:
df_large <- bold_seqspec(taxon="Arthropoda", marker="COI-5P")
write.table(df_large, file = "/home/R/arthropoda_bold_seqspec.txt", sep = "\t")
Unfortunately I get an error message:
Error in rawToChar(content(out, encoding = "UTF-8")) :
long vectors not supported yet: raw.c:68
I was under the impression that recent releases of R support long vectors. Perhaps more to the point, I didn't think this was a particularly long vector in terms of columns, but perhaps it exceeds that 900,000 limit in terms of rows (this dataset certainly has more than 900,000 rows).
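For what it's worth, my understanding (which may well be wrong) is that the limit behind this error is counted in bytes rather than rows: rawToChar has to build a single character string, and a single string in R cannot be longer than 2^31 - 1 bytes even though long vectors are otherwise supported. A quick check of that number, using only base R:

```r
# Largest length R allows for a non-long vector (and, as I understand it,
# for a single character string): 2^31 - 1.
max_len <- .Machine$integer.max
print(max_len)

# Expressed in GiB, to compare against the size of the raw response body.
print(max_len / 1024^3)
```

If that is right, the whole Arthropoda response would have to stay under roughly 2 GiB of text to be converted in one go, regardless of how many rows it contains.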
To that end, could you say what the maximum number of records is that these functions can download at once (should such a limit exist)?
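In case it is useful for the discussion, here is a minimal sketch of the workaround I'm considering: requesting one lower-level taxon at a time and row-binding the pieces. The helper name fetch_in_chunks is my own invention (it is not part of bold), the orders listed are only examples rather than a complete list, and I'm assuming every call returns a data frame with the same columns.

```r
# Hypothetical helper (not part of the bold package): call `fetch_fun` once per
# taxon and row-bind whatever comes back. `fetch_fun` is any function that takes
# a single taxon name and returns a data frame.
fetch_in_chunks <- function(taxa, fetch_fun) {
  pieces <- lapply(taxa, function(tx) {
    tryCatch(
      fetch_fun(tx),
      error = function(e) {
        # Report the failure but carry on with the remaining taxa
        warning("Request failed for ", tx, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  # Keep only successful, non-empty data frames
  pieces <- Filter(function(p) is.data.frame(p) && nrow(p) > 0, pieces)
  if (length(pieces) == 0) return(data.frame())
  # Assumes all pieces share the same columns
  do.call(rbind, pieces)
}

# Intended use (needs the bold package and network access):
# library(bold)
# orders <- c("Araneae", "Odonata", "Plecoptera")  # example orders only
# df_parts <- fetch_in_chunks(
#   orders,
#   function(tx) bold_seqspec(taxon = tx, marker = "COI-5P")
# )
```

This keeps each individual response small, but I don't know whether the largest single orders would still hit the same error, which is why I'm asking about the limit.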
Thanks very much