Academic Bibliography Scraper

Introduction

This is an extensible codebase for posting queries to academic databases programmatically. The results can be stored in a JSON or BibTeX (.bib) file for import into bibliographic managers such as Zotero. Where articles are available for download, the URLs collected during a search can also be used to download the files in bulk. This can save scholars and their research assistants considerable time on large literature searches.
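As a rough illustration of the bulk-download idea, here is a minimal sketch using only the Python standard library. The names `filename_for` and `download_all` are hypothetical and are not part of this repo; the sketch also assumes the collected URLs point directly at downloadable files, which may not hold for every database.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_for(url, index):
    """Derive a local file name from a URL, prefixed with a running index."""
    # Fall back to a generic name when the URL path has no final component.
    name = posixpath.basename(urlparse(url).path) or "article"
    return f"{index:03d}_{name}"


def download_all(urls, dest_dir, fetch=urllib.request.urlretrieve):
    """Download each URL into dest_dir; return (url, error) pairs for failures."""
    os.makedirs(dest_dir, exist_ok=True)
    failures = []
    for i, url in enumerate(urls, start=1):
        target = os.path.join(dest_dir, filename_for(url, i))
        try:
            fetch(url, target)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            failures.append((url, exc))
    return failures
```

The `fetch` parameter is there only so the loop can be exercised without network access; in practice a polite delay between requests would probably be needed as well.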

File structure

Each database is given its own scraper file, named accordingly. We currently have scrapers for the following databases, which are the ones most commonly used in Chinese paleography research:

Database (中文) | English Name | Filename
--- | --- | ---
中國期刊網 | CNKI | cnki.py
武漢大學簡帛網 | Center of Bamboo Silk Manuscripts, Wuhan University | wuhan.py
清華大學出土文獻研究與保護中心 | Research and Conservation Center for Unearthed Texts, Tsinghua University | qinghua.py
復旦大學出土文獻與古文字研究中心 | Fudan University Unearthed and Ancient Characters Research Center | fudan.py

Each file exposes a search function, which can be called collectively by main.py to post multiple queries to multiple databases in bulk.

main.py provides a top-level search function that accepts a list of keywords followed by one or more database names, and passes the queries on to the corresponding scrapers.
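The dispatch pattern described above can be sketched roughly as follows. This is not the code in main.py: `search_all`, `SCRAPERS`, and the stub scrapers are hypothetical stand-ins, and the shape of a result record (a dict with `title`, `database`, and `keyword`) is an assumption made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Article = Dict[str, str]


def _stub_scraper(db_name: str) -> Callable[[str], List[Article]]:
    """Return a stand-in search function; a real scraper would query the site."""
    def search_db(keyword: str) -> List[Article]:
        return [{"title": f"{keyword} ({db_name})",
                 "database": db_name,
                 "keyword": keyword}]
    return search_db


# Hypothetical registry mapping database names to their search functions.
SCRAPERS: Dict[str, Callable[[str], List[Article]]] = {
    name: _stub_scraper(name) for name in ("cnki", "wuhan", "qinghua", "fudan")
}


def search_all(keywords: Iterable[str], *databases: str) -> List[Article]:
    """Run every keyword against every named database and pool the results."""
    keywords = list(keywords)  # allow generators to be reused per database
    results: List[Article] = []
    for db in databases:
        if db not in SCRAPERS:
            raise ValueError(f"unknown database: {db!r}")
        for keyword in keywords:
            results.extend(SCRAPERS[db](keyword))
    return results
```

With the stubs in place, `search_all(['尹至', '郭店'], 'wuhan')` yields one record per keyword. Keeping the scrapers in a registry means a new database only needs a new entry, which fits the one-file-per-database layout.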

Finally, a save_articles function lets the user save the search results as a JSON or BibTeX file for viewing and further processing.
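To make the two output formats concrete, here is a minimal sketch of how such a save step might work. It is not the repo's save_articles: `to_bibtex` and `save_articles_sketch` are hypothetical names, every record is assumed to be a flat dict of strings, and all entries are written as `@article` with generated keys (`ref1`, `ref2`, ...), which is a simplification.

```python
import json


def to_bibtex(articles):
    """Render a list of flat dicts as BibTeX @article entries."""
    entries = []
    for i, article in enumerate(articles, start=1):
        fields = ",\n".join(f"  {key} = {{{value}}}"
                            for key, value in article.items())
        entries.append(f"@article{{ref{i},\n{fields}\n}}")
    return "\n\n".join(entries)


def save_articles_sketch(articles, filename, fmt="bib"):
    """Write articles to '<filename>.<fmt>' and return the path written."""
    if fmt == "json":
        # ensure_ascii=False keeps Chinese titles readable in the output file.
        text = json.dumps(articles, ensure_ascii=False, indent=2)
    elif fmt == "bib":
        text = to_bibtex(articles)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    path = f"{filename}.{fmt}"
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return path
```

Writing with an explicit UTF-8 encoding matters here because titles and author names from these databases are in Chinese.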

Usage

  1. Clone this repo to a local directory with:
git clone https://github.com/sati-bodhi/Academic-Bibliography-Scraper.git
  2. Open main.py, scroll to the end of the file, and change the arguments of the search and save_articles functions as needed.
if __name__ == '__main__':
    rslt = search(['尹至'], 'cnki', 'wuhan', 'qinghua')
    save_articles(rslt, 'search_result', 'bib')
  • Multiple queries can be posted to a single database like so:
if __name__ == '__main__':
    rslt = search(['尹至', '郭店'], 'wuhan')
    save_articles(rslt, 'search_result', 'bib')
  • Search results can be saved as JSON instead of BibTeX by changing the third argument of the save_articles function to 'json':
if __name__ == '__main__':
    rslt = search(['尹至', '郭店'], 'wuhan')
    save_articles(rslt, 'search_result', 'json')
  • The second argument sets the name of the output file, which will be ‘search_result.json’ in the example above.

Further development

Developers are welcome to extend or amend the current codebase by submitting pull requests.

Acknowledgement

I would like to thank Reinderien for helping out with the code and Dr. Pham Lee-Moi for partially funding this project.
