science-for-nature-and-people / bibscan

R package to batch download PDFs from a Web of Science search

R 1.99% TeX 98.01%
literature-review literature-mining webofscience web-of-science bibliographic-database

bibscan's Issues

dirname error

From @kanedan29: I get the error message `Error in dirname(outfilepath) : object 'outfilepath' not found`, even though my output folder definitely exists.

issue with Dillon bib file

Dillon was trying to use BibScan to download a set of papers and couldn't get a few that seemed like they should work. Attached is a .bib of the entries that didn't download. I tried them on my machine and BibScan says they don't have DOIs, but when you look at the .bib file, some of them clearly do. Does this work for you?
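One way to check the .bib file independently of BibScan is to scan it for `doi` fields entry by entry. The sketch below uses only base R; `bib_entries_with_doi` is a hypothetical helper, not a BibScan function, and it assumes each field sits on its own line in the form `doi = {...}`.

```r
# Hypothetical helper (not part of BibScan): for each BibTeX entry in
# `bib_lines`, report whether at least one line looks like a doi field.
bib_entries_with_doi <- function(bib_lines) {
  # An entry is assumed to start on a line such as "@article{key,"
  starts <- grepl("^\\s*@\\w+\\s*\\{", bib_lines, perl = TRUE)
  entry_id <- cumsum(starts)
  # Assumes one field per line, e.g. "doi = {10.1000/xyz}"
  has_doi_line <- grepl("^\\s*doi\\s*=", bib_lines,
                        ignore.case = TRUE, perl = TRUE)
  keep <- entry_id > 0  # ignore anything before the first entry
  unname(as.vector(tapply(has_doi_line[keep], entry_id[keep], any)))
}

# Usage with a file on disk (the file name is a placeholder):
# has_doi <- bib_entries_with_doi(readLines("failed_downloads.bib"))
# table(has_doi)
```

If this reports DOIs for entries that BibScan flags as missing them, the problem is more likely in how the file is parsed than in the file itself.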

harmonize styling

There are currently many different coding styles in the package. We should make the styling more homogeneous so that it is easier to contribute.

Dependencies not loading

From @kanedan29

I’m running into two problems with bibscan. 1.) When I load the library, it doesn’t load the dependencies, so I get error messages about specific functions.

parsing failure

I got the below error when trying to run the main function. Attached are the files I used.

article_pdf_download(infilepath='~/Documents/Temporary/Lesley', colandr=screened_abstracts)

Converting your isi collection into a bibliographic dataframe

Articles extracted   100 
Articles extracted   200 
Articles extracted   300 
Articles extracted   326 
Done!


Genereting affiliation field tag AU_UN from C1:  Done!


Converting your isi collection into a bibliographic dataframe

Articles extracted   43 
Done!


Genereting affiliation field tag AU_UN from C1:  Done!

Warning: 1 parsing failure.
# A tibble: 1 x 5
    row col   expected   actual    file
  <int> <chr> <chr>      <chr>     <chr>
1     5 NA    73 columns 9 columns literal data

Error in filter_impl(.data, quo) : 
  Evaluation error: object 'citation_screening_status' not found.
In addition: Warning messages:
1: In if (grepl("\n", x)) { :
  the condition has length > 1 and only the first element will be used
2: In if (grepl("\n", path)) return(path) :
  the condition has length > 1 and only the first element will be used
3: In if (grepl("\n", file)) { :
  the condition has length > 1 and only the first element will be used
4: Missing column names filled in: 'X73' [73] 
5: In if (grepl("\n", file)) { :
  the condition has length > 1 and only the first element will be used
6: In if (grepl("\n", file)) { :
  the condition has length > 1 and only the first element will be used
7: In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 2)

files.zip

Error in "select" function

  • I also had issues with the dependency packages not loading properly.

After manually loading the dependency packages, the following error appeared:

Error in select(., citation_title, citation_authors, citation_journal_name) :
could not find function "select"


Output prior to error:
Converting your isi collection into a bibliographic dataframe

Articles extracted 47
Done!

Genereting affiliation field tag AU_UN from C1: Done!

Parsed with column specification:
cols(
study_id = col_integer(),
deduplication_status = col_character(),
citation_screening_status = col_character(),
fulltext_screening_status = col_character(),
data_extraction_screening_status = col_character(),
data_source_type = col_character(),
data_source_name = col_character(),
data_source_url = col_character(),
citation_title = col_character(),
citation_abstract = col_character(),
citation_authors = col_character(),
citation_journal_name = col_character(),
citation_journal_volume = col_integer(),
citation_pub_year = col_integer(),
citation_keywords = col_character(),
fulltext_filename = col_character(),
fulltext_exclude_reasons = col_character()
)

Other package dependencies

It looks like there are three more package dependencies when loading 'BibScan' - rvest, jsonlite, and xml2. I was getting error messages of 'function not found' for a few functions that I think are from those packages - "html_nodes" and "fromJSON" most notably.
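Until the package declares these dependencies itself, a workaround is to check for them and attach them by hand before calling BibScan. This is a minimal sketch in base R; `missing_packages` is a hypothetical helper, and the package list comes from this report rather than from the package's DESCRIPTION file.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages named in this issue (assumed, not read from DESCRIPTION)
needed <- c("rvest", "jsonlite", "xml2")
not_installed <- missing_packages(needed)

if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
} else {
  # Attach each package so html_nodes(), fromJSON(), etc. can be found
  invisible(lapply(needed, library, character.only = TRUE))
}
```

The longer-term fix is presumably on the package side: either import these packages properly or call their functions with an explicit namespace, such as `rvest::html_nodes()` and `jsonlite::fromJSON()`.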

`//` in the path of downloaded files

This seems to be due to how I set up the crminer cache. Find a better way to do this, or add a post-processing step as plan B. It does not seem to disturb the download or the file manipulation (at least on OSX/unix).
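For the plan B option, a post-processing step could simply collapse repeated slashes in the returned paths. A minimal sketch, assuming the paths are local file paths (not URLs); `collapse_slashes` is a hypothetical name, not an existing function in the package.

```r
# Hypothetical helper: collapse runs of "/" into a single "/".
# Intended for local file paths only; it would also alter the "//" in a
# URL scheme such as "https://", so do not apply it to URLs.
collapse_slashes <- function(paths) {
  gsub("/{2,}", "/", paths)
}

collapse_slashes("cache//article.pdf")
```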

PLOS One articles are returned as HTML

With the crminer version of the code, it seems that articles from PLOS One are not retrieved as PDFs but as HTML documents. Need to investigate why.
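While investigating, it may help to detect which downloaded files are not really PDFs, regardless of their extension. The sketch below checks the first four bytes of a file for the `%PDF` signature using base R only; `is_pdf_file` is a hypothetical helper, not part of BibScan or crminer.

```r
# Hypothetical helper: TRUE when the file starts with the PDF signature "%PDF".
is_pdf_file <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 4L)
  # Files shorter than four bytes yield a shorter vector and return FALSE
  identical(header, charToRaw("%PDF"))
}

# Usage over a download folder (the folder name is a placeholder):
# files <- list.files("pdfs", full.names = TRUE)
# not_pdf <- files[!vapply(files, is_pdf_file, logical(1))]
```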

Low Retrieval Rate

A user of the BibScan library asked for tips on how to improve the retrieval rate, so my task for today was to figure out why it was so low. First, I ran the code given to me and got the same number of successful PDF retrievals. Based on the error messages, it appears that the links don't work (I don't know whether this is obvious, since I don't know this package well).

To look into it further, I checked the first ten documents. One problem I noticed was that the documents from Elsevier and Wiley were not downloading. While trying to figure out why, I landed on this page: CrossRef/rest-api-doc#96. Also, the crminer package documentation says "At least Elsevier and I think Wiley also check your IP address in addition to requiring the authentication token", so maybe that's why these websites aren't working. For SpringerLink, the page says "Page Not Found". The Cambridge website shows the pop-up warning "Unfortunately you do not have access to this content, please use the Get access link below for information on how to access this content."

These are the websites/links from the first ten rows. Other than these error messages, I'm not sure what else to look at, since I'm pretty new to how this code (especially crminer) works.
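To go beyond the first ten rows, the failures could be tallied per publisher host. The sketch below is base R only and makes two assumptions: that a vector of article URLs is available, and that a matching logical vector records whether each download succeeded. Both helper names are hypothetical.

```r
# Hypothetical helper: extract the host part of each URL.
url_host <- function(urls) {
  sub("^[a-zA-Z][a-zA-Z0-9+.-]*://([^/:?#]+).*$", "\\1", urls)
}

# Hypothetical helper: share of failed downloads per host, worst first.
# `downloaded` is assumed to be a logical vector aligned with `urls`.
failure_rate_by_host <- function(urls, downloaded) {
  host <- url_host(urls)
  rates <- vapply(split(!downloaded, host), mean, numeric(1))
  sort(rates, decreasing = TRUE)
}
```

A host with a failure rate close to 1 would point to a publisher-level restriction, such as the IP check mentioned above, rather than to individual broken links.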
