languagecheck

Language checking for scientific papers

This program attempts to assist you in improving your paper before submission.

Features

  • Can analyse any LaTeX paper, including Overleaf projects.
  • Makes automated reports to point you to improvements:
    • Word level:
      • find common grammar mistakes, like wrong prepositions
      • find wordy phrases and suggest replacements
      • a vs an
      • spell-check (using hunspell)
    • Sentence level:
      • find long, wordy sentences
      • check topic sentences
    • Paragraph level:
      • find tense inconsistencies
    • Paper level:
      • check visual impression of paper
  • All analysis is done offline -- your text does not leave your computer.
  • Supports British and American English, but focusses on issues applying to both.

Note that there are false positives. The reports only point out potential issues -- only you can decide whether a change makes sense.

If you find some rules useless (too many false positives), or you want to add more, please send a pull request!
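To illustrate why false positives are unavoidable, here is a minimal sketch of a naive "a vs an" rule. It is not the rule languagecheck actually implements. The function name find_article_issues and the first-letter heuristic are assumptions made for this example only.

```python
import re

VOWELS = "aeiou"

def find_article_issues(text):
    """Return (article, next_word) pairs where a/an disagrees with the
    first letter of the following word (hypothetical, spelling-based rule)."""
    issues = []
    for match in re.finditer(r"\b(a|an)\s+([A-Za-z]+)", text, flags=re.IGNORECASE):
        article = match.group(1).lower()
        word = match.group(2)
        starts_with_vowel = word[0].lower() in VOWELS
        # "a" before a vowel letter, or "an" before a consonant letter, is flagged
        if (article == "a") == starts_with_vowel:
            issues.append((article, word))
    return issues

print(find_article_issues("We fit a absorbed model over an hour with an new code."))
# prints [('a', 'absorbed'), ('an', 'hour'), ('an', 'new')]
```

The rule catches "a absorbed" and "an new", but it also flags the correct "an hour", because it looks at spelling rather than pronunciation. A real rule needs pronunciation data or an exception list, and even then some reports will be wrong.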

Demo output

Example analysis (of an early draft of this paper):

Requirements

  • python
  • convert command (ImageMagick): Install with your distribution's package manager
  • nltk: Install with pip
  • nltk data: Install with python -m nltk.downloader all
  • detex command (usually comes with LaTeX)
  • pyhunspell (optional): Install with pip

Installation

These commands should not give you an error:

$ which convert
$ which python
$ which detex
$ which hunspell
$ ls /usr/share/hunspell/{en_US,en_UK}.{dic,aff}
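The same check can be sketched in Python, which is convenient if you want to run it on several machines. This is only an illustration: the helper missing_commands and the split into required and optional commands are assumptions based on the Requirements list above, not part of languagecheck.

```python
import shutil

# Commands taken from the Requirements list; hunspell is treated as optional
REQUIRED = ["python", "convert", "detex"]
OPTIONAL = ["hunspell"]

def missing_commands(commands):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]

if __name__ == "__main__":
    for cmd in missing_commands(REQUIRED):
        print("missing required command:", cmd)
    for cmd in missing_commands(OPTIONAL):
        print("missing optional command:", cmd)
```

If the script prints nothing, all commands were found.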

Then install the python packages and data:

$ pip install pyhunspell --user
$ pip install nltk --user
$ python -m nltk.downloader cmudict stopwords punkt averaged_perceptron_tagger

Usage

Using directly:

  • Create a PDF from your LaTeX file -> mypaper.pdf
    • For example, run "pdflatex mypaper.tex".
  • Use detex to create a plain-text file -> mypaper.txt
    • For example, run "detex mypaper.tex > mypaper.txt". You need detex installed.
    • This does not capture figure captions. The detex.sh script can help you include them: run "bash detex.sh mypaper.tex". You still need detex installed.
  • Run $ python languagecheck.py mydir/mypaper.txt mydir/mypaper.pdf
  • Open mypaper_index.html in a web browser to see all reports.
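The direct-usage steps can be chained in a small wrapper. The sketch below is not shipped with languagecheck. The names output_paths and run_pipeline are hypothetical, and it assumes that languagecheck.py is in the current directory and that pdflatex and detex are on your PATH.

```python
import subprocess
import sys
from pathlib import Path

def output_paths(tex_file):
    """Return the (.txt, .pdf) paths that sit next to the given .tex file."""
    tex = Path(tex_file)
    return tex.with_suffix(".txt"), tex.with_suffix(".pdf")

def run_pipeline(tex_file, checker="languagecheck.py"):
    """Build the PDF, extract plain text with detex, then run the checker."""
    tex = Path(tex_file).resolve()
    txt, pdf = output_paths(tex)
    # Step 1: create the PDF in the directory of the .tex file
    subprocess.run(["pdflatex", tex.name], cwd=tex.parent, check=True)
    # Step 2: detex writes to stdout, so redirect it into the .txt file
    with open(txt, "w") as out:
        subprocess.run(["detex", tex.name], cwd=tex.parent, stdout=out, check=True)
    # Step 3: run languagecheck on the text and the PDF
    subprocess.run([sys.executable, checker, str(txt), str(pdf)], check=True)
```

Calling run_pipeline("mydir/mypaper.tex") performs the three command-line steps in order and stops at the first one that fails. Like plain detex, it does not capture figure captions.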

Using with Overleaf:

$ bash languagecheck_overleaf.sh <overleaf_url> <name of tex file>
# for example:
$ bash languagecheck_overleaf.sh https://www.overleaf.com/123456789 mypaper.tex
