danvoinea / gscholar-citations-crawler


Crawl all your citations from Google Scholar

License: Other

Dockerfile 3.16% Makefile 2.74% TeX 2.86% Python 91.23%


Google Scholar Citations

Want to know who has cited your work and in which journals, and compile it all into a list?

This program retrieves all the citations an author has received from other scholars via Google Scholar, stores them in a BibTeX (.bib) file, and optionally downloads the publicly available PDF files associated with those citations.
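A .bib file is plain text made of BibTeX entries: an entry type, a citation key, and a list of `name = {value}` fields. The sketch below (Python 3) only illustrates that format. `to_bibtex` and `write_bib` are hypothetical helpers, not functions from this repository, and the crawler's actual keys and fields may differ.

```python
def to_bibtex(key, entry_type, fields):
    """Format one citation as a BibTeX entry string.

    Hypothetical helper: `key`, `entry_type` and the field names are
    illustrative, not the crawler's actual output scheme.
    """
    lines = ["@%s{%s," % (entry_type, key)]
    for name, value in fields.items():
        # Braces around the value preserve capitalisation in BibTeX.
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


def write_bib(path, entries):
    """Write an iterable of (key, entry_type, fields) triples to a .bib file."""
    with open(path, "w", encoding="utf-8") as fh:
        for key, entry_type, fields in entries:
            fh.write(to_bibtex(key, entry_type, fields) + "\n\n")


# to_bibtex("doe2020", "article", {"title": "A Study", "year": "2020"})
# produces:
# @article{doe2020,
#   title = {A Study},
#   year = {2020},
# }
```

BibTeX tolerates the trailing comma after the last field, which keeps the formatting loop simple.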

Prerequisites

Python 2.7.9+ or Python 3
LaTeX, if you need to produce a final PDF report

Download

Either download the zip file directly, or clone the Git repository from the command line:

$ git clone https://github.com/shiqiezi/google-scholar-citations

Basic Usage

Basic familiarity with the command line is needed. The most basic usage, with all defaults, is:

$ python main.py "https://scholar.google.com/citations?user=lqyGZpQAAAAJ"
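The profile URL identifies the author through its `user` query parameter (`lqyGZpQAAAAJ` in the command above). To check which ID a given URL carries, here is a minimal sketch using only the Python 3 standard library; `scholar_user_id` is a hypothetical helper, not part of `main.py`.

```python
from urllib.parse import urlparse, parse_qs


def scholar_user_id(profile_url):
    """Return the `user` query parameter of a Google Scholar profile URL.

    Hypothetical helper for illustration; raises ValueError if absent.
    """
    query = parse_qs(urlparse(profile_url).query)
    ids = query.get("user")
    if not ids:
        raise ValueError("no 'user' parameter in %r" % profile_url)
    return ids[0]


print(scholar_user_id("https://scholar.google.com/citations?user=lqyGZpQAAAAJ"))
# → lqyGZpQAAAAJ
```

Other query parameters such as `hl=en` are ignored, so the same ID comes back regardless of their order in the URL.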

More Options

$ python main.py [-h] [--request-interval REQUEST_INTERVAL] [--should-download]
               [--download-dir DOWNLOAD_DIR] [--citation-name CITATION_NAME]
               google_scholar_uri

Optional arguments:

  --request-interval REQUEST_INTERVAL
                        # Interval (in seconds) between requests to Google Scholar
  --should-download     # If set, download the PS/PDF files of all citations
  --download-dir DOWNLOAD_DIR
                        # Directory for the downloaded citation PDF files
  --citation-name CITATION_NAME
                        # File name for all your citations in BibTeX format

Positional argument:

  google_scholar_uri    # URL of your Google Scholar profile page
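The synopsis above maps naturally onto Python's `argparse`. The sketch below shows one way that interface could be declared. It assumes `--should-download` is a boolean flag and that the interval is parsed as a float. The default values are placeholders and may not match what `main.py` actually uses.

```python
import argparse


def build_parser():
    """Declare the documented command-line interface.

    Sketch only: defaults below are placeholders, not necessarily the
    values used by main.py.
    """
    parser = argparse.ArgumentParser(
        description="Crawl all your citations from Google Scholar")
    # Positional: the author's Google Scholar profile URL.
    parser.add_argument("google_scholar_uri",
                        help="URL of your Google Scholar profile page")
    parser.add_argument("--request-interval", type=float, default=5.0,
                        help="interval (in seconds) between requests")
    # store_true: the flag takes no value and defaults to False.
    parser.add_argument("--should-download", action="store_true",
                        help="download PS/PDF files of all citations")
    parser.add_argument("--download-dir", default="downloads",
                        help="directory for downloaded PDF files")
    parser.add_argument("--citation-name", default="citations.bib",
                        help="file name for the BibTeX output")
    return parser
```

`argparse` converts the dashed option names to attributes such as `args.request_interval` and `args.should_download`, and generates the `-h` help text automatically.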

Example

$ python main.py --request-interval=50 --should-download "https://scholar.google.com/citations?user=lqyGZpQAAAAJ"

Getting Help

$ python main.py -h # show this help message and exit

Avoid Too Many Requests

Crawl the web responsibly. We suggest setting a large value for --request-interval; requesting too frequently may get you blocked by Google.
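A request interval is usually enforced by sleeping just long enough that consecutive requests are at least N seconds apart. The following is a minimal Python 3 sketch of that idea, not the crawler's own code. `Throttle` is a hypothetical class, and the clock and sleep functions are injectable so the logic can be checked without real waiting.

```python
import time


class Throttle(object):
    """Enforce a minimum interval (in seconds) between successive calls.

    Hypothetical helper: call wait() immediately before each request.
    """

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self._last is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

With `Throttle(50)`, calling `wait()` before each request mirrors `--request-interval=50`. Time spent parsing a page counts toward the interval, so the crawler never waits longer than necessary.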