LiteratureReview

A scraper for scientific literature databases. The supported databases are IEEE Xplore, Science Direct, and ACM. The scraping bots retrieve the link to each search result (that is, each paper) along with its title and other metadata such as keywords, abstract, and paper type (conference, journal, etc.), which makes the systematic literature review process easier.

If you find this work useful, put a star on this repo ⭐

Prerequisites

  • Python 3.9 or higher
  • Chrome browser
  • Chrome WebDriver (ChromeDriver) that matches your installed Chrome version, downloaded and extracted locally
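
A mismatch between Chrome and its web driver is a common cause of start-up failures, so it can help to compare the two version strings before running the bots. The sketch below is only an illustration: the helper names `major_version` and `versions_match` are hypothetical and not part of this repo, the version strings are sample values you would replace with your own, and it assumes that comparing major versions is a sufficient rough check.

```python
import re


def major_version(version_text: str) -> int:
    """Return the major version number found in a version string."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Rough check (assumption): the major versions must be equal."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    # Sample strings only; replace them with the versions on your machine.
    chrome = "Google Chrome 120.0.6099.109"
    driver = "ChromeDriver 120.0.6099.71"
    print(versions_match(chrome, driver))  # prints True
```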

How to use

  1. Go to the official site's advanced search page and build a search query using its form:

    Science Direct

    IEEE Xplore

    ACM

  2. Copy the query text; you will use it to configure the tool.
  3. Clone the repo (creating a virtual environment is recommended) and complete the configuration. You can configure a single bot or all of the bots in one configuration.
git clone https://github.com/ashen007/LiteratureReview.git
  • all bots in a single configuration
{
  "BINARY_LOCATION": "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
  "EXECUTABLE_PATH": "D:\\chromedriver.exe",
  "SCIDIR": {
    "search_term": "insert query string here",
    "link_file_save_to": "./temp/scidir_search_term.json",
    "abs_file_save_to": "./abs/scidir_search_term.json",
    "use_batches": true,
    "batch_size": 8,
    "keep_link_file": true
  },
  "ACM": {
    "search_term": "insert query string here",
    "link_file_save_to": "./temp/acm_search_term.json",
    "abs_file_save_to": "./abs/acm_search_term.json",
    "use_batches": true,
    "batch_size": 8,
    "keep_link_file": true
  },
  "IEEE": {
    "search_term": "insert query string here",
    "link_file_save_to": "./temp/ieee_search_term.json",
    "abs_file_save_to": "./abs/ieee_search_term.json",
    "use_batches": false,
    "batch_size": 8,
    "keep_link_file": true
  }
}
  • or a single bot on its own
{
  "BINARY_LOCATION": "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
  "EXECUTABLE_PATH": "D:\\chromedriver.exe",
  "SCIDIR": {
    "search_term": "insert query string here",
    "link_file_save_to": "./temp/scidir_search_term.json",
    "abs_file_save_to": "./abs/scidir_search_term.json",
    "use_batches": true,
    "batch_size": 8,
    "keep_link_file": true
  }
}
  • BINARY_LOCATION: the path to the chrome.exe file

  • EXECUTABLE_PATH: the path where you downloaded and extracted the Chrome web driver
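
Because a missing key in the configuration would only surface once a bot is already running, a quick pre-flight check can save time. The following is a minimal sketch, not part of this repo: the function names are hypothetical, the configuration file path is whatever you pass in, and it assumes that every key shown in the examples above is required for each bot section that is present.

```python
import json
from pathlib import Path

# Keys taken from the example configurations above.
REQUIRED_TOP_LEVEL = ("BINARY_LOCATION", "EXECUTABLE_PATH")
BOT_SECTIONS = ("SCIDIR", "ACM", "IEEE")
REQUIRED_BOT_KEYS = (
    "search_term",
    "link_file_save_to",
    "abs_file_save_to",
    "use_batches",
    "batch_size",
    "keep_link_file",
)


def find_config_problems(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in config:
            problems.append(f"missing top-level key: {key}")
    bots = [name for name in BOT_SECTIONS if name in config]
    if not bots:
        problems.append("no bot section found (expected SCIDIR, ACM or IEEE)")
    for name in bots:
        for key in REQUIRED_BOT_KEYS:
            if key not in config[name]:
                problems.append(f"{name}: missing key {key}")
    return problems


def load_config(path: str) -> dict:
    """Read a JSON configuration file and fail early if keys are missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = find_config_problems(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_config` with the path to your configuration file before starting a run would then raise a `ValueError` listing every missing key at once.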

  4. Install the dependencies and run main.py:
pip install -r ./requirements.txt
python main.py
  5. That's it.
  6. Save the results into an Excel workbook; they are written automatically to the ./SLR.xlsx file.
   from src.utils import to_excel
   # Map each database name to the abstracts file its bot produced.
   to_excel({"acm": "./abs/acm_search_term.json", "ieee": "./abs/ieee_search_term.json",
             "science_direct": "./abs/scidir_search_term.json"})
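
If you ran only some of the bots, not every abstracts file will exist. One way to handle that, sketched below under assumptions, is to filter the mapping before exporting. The helper `existing_results` is hypothetical and not part of this repo, and whether `to_excel` accepts a mapping with fewer than three entries is an assumption that has not been verified against the source.

```python
from pathlib import Path


def existing_results(paths: dict) -> dict:
    """Keep only the entries whose result file is actually on disk."""
    return {name: path for name, path in paths.items() if Path(path).is_file()}


# Possible usage (assumes to_excel works with a partial mapping):
# from src.utils import to_excel
# to_excel(existing_results({
#     "acm": "./abs/acm_search_term.json",
#     "ieee": "./abs/ieee_search_term.json",
#     "science_direct": "./abs/scidir_search_term.json",
# }))
```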

Contributors

ashen007

literaturereview's Issues

Science Direct is hard to scrape

You may occasionally see failures with the Science Direct database: while scraping, the bot can be automatically redirected to the sign-in page. This is unpredictable and hard to handle because of the site's security policy. When it occurs, stop the job and try again after a few minutes.
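
The "stop and try again after a few minutes" advice can be automated with a simple retry wrapper. This is a minimal sketch rather than anything the repo provides: `run_with_retries` is a hypothetical helper, the five-minute default wait is an arbitrary choice, and it assumes that a failed run surfaces as a Python exception raised by the job you pass in.

```python
import time


def run_with_retries(job, attempts=3, wait_seconds=300, sleep=time.sleep):
    """Call job(); on failure wait and retry, up to `attempts` calls in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception as error:  # assumption: a failed run raises
            last_error = error
            if attempt < attempts:
                # Pause before the next try, as the note above suggests.
                sleep(wait_seconds)
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use you would pass just the job and, if needed, different `attempts` or `wait_seconds` values.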
