science-for-nature-and-people / bibscan
R package to batch download PDFs from a Web of Science search
Look into why the package downloads all papers in the exported Colandr sheet instead of restricting downloads to the papers selected through the Colandr screening process.
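A possible fix is to filter the Colandr sheet on its screening-status column before downloading. A minimal base-R sketch, assuming the column name `citation_screening_status` (it appears in the parsed column specification later in this thread) and assuming the value `"included"` marks selected records:

```r
# Hypothetical post-filter: keep only records Colandr marked as
# "included", so the downloader skips papers screened out earlier.
filter_screened <- function(colandr_sheet) {
  colandr_sheet[colandr_sheet$citation_screening_status == "included", ,
                drop = FALSE]
}

# Toy example:
sheet <- data.frame(
  citation_title = c("Paper A", "Paper B"),
  citation_screening_status = c("included", "excluded"),
  stringsAsFactors = FALSE
)
filter_screened(sheet)  # keeps only "Paper A"
```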
From @kanedan29: I get an error message that says `Error in dirname(outfilepath) : object 'outfilepath' not found`, even though my output folder definitely exists.
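The error means the variable `outfilepath` was never assigned inside the function, not that the folder is missing. A defensive sketch (the argument name and behavior here are assumptions, not BibScan's actual API):

```r
# Hypothetical guard: fail early with a clear message instead of the
# opaque "object 'outfilepath' not found" error, and create the
# output directory if it does not exist yet.
resolve_outfile <- function(outfilepath = NULL) {
  if (is.null(outfilepath)) {
    stop("`outfilepath` must be supplied; got NULL", call. = FALSE)
  }
  dir <- dirname(outfilepath)
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  normalizePath(outfilepath, mustWork = FALSE)
}
```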
Dillon was trying to use BibScan to access a bunch of papers and couldn't get a few that seemed like they should work. Attached is a .bib of the files that didn't download. I tried them on my machine and BibScan says they don't have DOIs, but when you look at the .bib file some of them clearly do. Does this work for you?
This function does a lot of things that should be delegated to subfunctions.
There are currently many different styles in the package. We should try to make it more homogeneous to help contributors.
From @kanedan29
I’m running into two problems with bibscan. 1.) when I load the library it’s not loading the dependencies, so I get error messages about specific functions.
Steve and I did not get the same number of downloads on Leslie's data.
Check what the discrepancies are and whether they are due to university subscriptions or something else.
See if we can rename the files with more explanatory names.
If the output directory is the same as the input directory, and the .bib file is in that directory, running the package will remove the .bib file
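A sketch of a cleanup guard for the issue above, assuming the cleanup step deletes .bib files from the output directory (the function name and arguments here are hypothetical):

```r
# Hypothetical cleanup guard: refuse to delete anything when the
# input and output directories resolve to the same path, so the
# user's source .bib file survives.
safe_cleanup <- function(indir, outdir) {
  if (normalizePath(indir) == normalizePath(outdir)) {
    warning("input and output directories are identical; skipping cleanup")
    return(invisible(FALSE))
  }
  bibs <- list.files(outdir, pattern = "\\.bib$", full.names = TRUE)
  if (length(bibs)) file.remove(bibs)
  invisible(TRUE)
}
```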
I got the below error when trying to run the main function. Attached are the files I used.
article_pdf_download(infilepath='~/Documents/Temporary/Lesley', colandr=screened_abstracts)
Converting your isi collection into a bibliographic dataframe
Articles extracted 100
Articles extracted 200
Articles extracted 300
Articles extracted 326
Done!
Genereting affiliation field tag AU_UN from C1: Done!
Converting your isi collection into a bibliographic dataframe
Articles extracted 43
Done!
Genereting affiliation field tag AU_UN from C1: Done!
Warning: 1 parsing failure.
# A tibble: 1 x 5
    row col   expected   actual    file
  <int> <chr> <chr>      <chr>     <chr>
1     5 NA    73 columns 9 columns literal data
Error in filter_impl(.data, quo) :
Evaluation error: object 'citation_screening_status' not found.
In addition: Warning messages:
1: In if (grepl("\n", x)) { :
the condition has length > 1 and only the first element will be used
2: In if (grepl("\n", path)) return(path) :
the condition has length > 1 and only the first element will be used
3: In if (grepl("\n", file)) { :
the condition has length > 1 and only the first element will be used
4: Missing column names filled in: 'X73' [73]
5: In if (grepl("\n", file)) { :
the condition has length > 1 and only the first element will be used
6: In if (grepl("\n", file)) { :
the condition has length > 1 and only the first element will be used
7: In rbind(names(probs), probs_f) :
number of columns of result is not a multiple of vector length (arg 2)
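The repeated "condition has length > 1" warnings above come from passing a vector to `if()`: `grepl()` returns one logical per element, while `if()` expects a single TRUE/FALSE and silently uses only the first. A sketch of the usual fix:

```r
# grepl() on a vector of paths returns a logical vector;
# wrapping it in any() restores the scalar condition if() needs.
paths <- c("a/b.pdf", "c\nd.pdf")
has_newline <- any(grepl("\n", paths, fixed = TRUE))
if (has_newline) {
  message("at least one path contains a newline")
}
```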
After manually loading the dependency packages, the following error appeared:
Error in select(., citation_title, citation_authors, citation_journal_name) :
could not find function "select"
Output prior to error:
Converting your isi collection into a bibliographic dataframe
Articles extracted 47
Done!
Genereting affiliation field tag AU_UN from C1: Done!
Parsed with column specification:
cols(
study_id = col_integer(),
deduplication_status = col_character(),
citation_screening_status = col_character(),
fulltext_screening_status = col_character(),
data_extraction_screening_status = col_character(),
data_source_type = col_character(),
data_source_name = col_character(),
data_source_url = col_character(),
citation_title = col_character(),
citation_abstract = col_character(),
citation_authors = col_character(),
citation_journal_name = col_character(),
citation_journal_volume = col_integer(),
citation_pub_year = col_integer(),
citation_keywords = col_character(),
fulltext_filename = col_character(),
fulltext_exclude_reasons = col_character()
)
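The `could not find function "select"` error suggests the package calls dplyr functions without importing them, so it only works when the user happens to have dplyr attached. A sketch of the fix using explicit namespacing (the wrapper function name is hypothetical; the column names follow the Colandr spec above):

```r
# Explicit namespacing (dplyr::select) avoids relying on the user
# having called library(dplyr) before running BibScan.
pick_citation_fields <- function(colandr_sheet) {
  dplyr::select(
    colandr_sheet,
    citation_title, citation_authors, citation_journal_name
  )
}
```

The same effect can be achieved package-wide by adding `importFrom(dplyr, select, filter)` to the NAMESPACE file.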
We need to add several checks to the package to make it more robust to user entries:
It looks like there are three more package dependencies when loading 'BibScan' - rvest, jsonlite, and xml2. I was getting error messages of 'function not found' for a few functions that I think are from those packages - "html_nodes" and "fromJSON" most notably.
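One way to address this is to declare these packages in the DESCRIPTION file so they are installed along with BibScan. A sketch of the relevant fragment (the exact list of required packages is an assumption based on this report):

```
Imports:
    dplyr,
    jsonlite,
    rvest,
    xml2
```

Packages listed under `Imports:` are installed automatically with the package, which avoids the "function not found" errors for `html_nodes` (rvest) and `fromJSON` (jsonlite).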
Add Travis CI to the package.
Seems due to how I set up the crminer cache. Find a better way to do this, or use a post-processing step as plan B. It seems not to disturb the download and file manipulation (at least on OSX/Unix).
With the crminer version of the code, it seems that articles from PLOS ONE are not accessed as PDFs but as HTML documents. Need to investigate why.
Not sure what all of the other dependencies are. But any idea why it takes so long to install?
One user of the BibScan library is asking for tips on how to improve the retrieval rate, so my task for today was to figure out why the retrieval rate was so low. First, I ran the code given to me and got the same number of successful PDF retrievals. Based on the error messages, it appears that the links don't work (I don't know if this is obvious or not, due to my lack of knowledge about this package).
To look into it more, I examined the first ten documents. One problem I noticed was that the documents from Elsevier and Wiley were not working. While trying to figure out why, I landed on this page: CrossRef/rest-api-doc#96. Also, the crminer package documentation says "At least Elsevier and I think Wiley also check your IP address in addition to requiring the authentication token", so maybe that's why these publishers aren't working. For SpringerLink, it says "Page Not Found". For the Cambridge website, it gives me the warning pop-up "Unfortunately you do not have access to this content, please use the Get access link below for information on how to access this content."
These are the websites/links from the first ten rows. Other than these error messages, I'm not really sure what else to look at, since I'm pretty new to how this code (especially crminer) works.