
readingtools's Introduction

Introduction

This repository contains several scripts I find useful when reading books. They serve two main purposes: extracting new words from texts, and planning the reading process.

Prerequisite

  • Python3
  • spaCy (only needed for the Extracting Word List Flow)

Planning Flow

This flow is used when I want to read a book non-linearly, e.g. a dictionary or a book I have already read. It requires only one script, reading_plan.py, which takes two arguments. The first is the total number of pages to read. The second is either a number or a number followed by the string "day". In the first case, the number is the number of pages you expect to read per day; in the second case, it is the number of days within which you want to finish reading.

python3 reading_plan.py 101 10 # Reading 101 pages with 10 pages a day
python3 reading_plan.py 101 10day # Reading 101 pages within 10 days

The output file is reading_plan.txt.
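The day-by-day split can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual reading_plan.py: the helpers parse_plan_arg and make_plan are hypothetical names, the plan is assumed to be a list of consecutive page ranges, and in the "day" case the daily load is rounded up so the book is finished within the requested number of days. The real script's output format and page ordering may differ.

```python
import math


def parse_plan_arg(arg: str, total_pages: int) -> int:
    """Hypothetical helper: turn the second CLI argument into pages per day.

    "10"    -> read 10 pages a day
    "10day" -> finish within 10 days (daily load rounded up)
    """
    if arg.endswith("day"):
        days = int(arg[: -len("day")])
        return math.ceil(total_pages / days)
    return int(arg)


def make_plan(total_pages: int, pages_per_day: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split pages 1..total_pages into (first, last) ranges, one per day."""
    plan = []
    for first in range(1, total_pages + 1, pages_per_day):
        last = min(first + pages_per_day - 1, total_pages)
        plan.append((first, last))
    return plan


if __name__ == "__main__":
    total = 101
    for arg in ("10", "10day"):
        plan = make_plan(total, parse_plan_arg(arg, total))
        print(f"{arg}: {len(plan)} days")
        for day, (first, last) in enumerate(plan, start=1):
            print(f"  Day {day}: pages {first}-{last}")
```

With the two example commands, 101 pages at 10 pages a day gives 11 days (the last day covers only page 101), while "10day" gives 11 pages a day and finishes in 10 days.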

Extracting Word List Flow

This flow is used when I decide to read books in a foreign language. It requires spaCy. First, put all the text files in the Litterature directory. Then run word_list.py. The script accepts two optional arguments. The first is the language of the files: currently only "en" (the default) for English and "fr" for French are supported, but you can easily add other languages that spaCy supports. The second is the name of the output file, which defaults to "word_list_" + lang + ".txt".

python3 word_list.py en word_list_en.txt
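At its core, the extraction step is a frequency count over lemmas. The sketch below is an assumption about how such a step might look, not the code of word_list.py: read_texts, simple_lemmas and build_word_list are hypothetical names, UTF-8 input is assumed, and simple_lemmas is only a regex stand-in so the example runs without a spaCy model.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Callable, Iterable


def read_texts(directory: str = "Litterature") -> list[str]:
    """Read every .txt file in the directory (UTF-8 assumed)."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(directory).glob("*.txt"))]


def simple_lemmas(text: str) -> list[str]:
    """Stand-in for spaCy lemmatization: lowercase runs of letters (accents kept, digits dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_word_list(
    texts: Iterable[str],
    lemmatize: Callable[[str], list[str]] = simple_lemmas,
) -> list[tuple[str, int]]:
    """Count lemmas across all texts and return (word, count) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(lemmatize(text))
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = ["The cat sat on the mat.", "The dog sat too."]
    for word, count in build_word_list(sample):
        print(word, count)
```

To use real spaCy lemmas instead, you could pass a function such as lambda text: [t.lemma_.lower() for t in nlp(text) if t.is_alpha], with nlp = spacy.load("en_core_web_sm") (or "fr_core_news_sm" for French); the model has to be downloaded separately.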

After you get the word list, you can use percentage_analysis.py to find how many of the most frequent words are needed to cover 90%, 95%, 98%, and 99% of the word occurrences in the texts, and then memorize the frequently used words before reading. The script takes one argument, the word list file name.

python3 percentage_analysis.py word_list_en.txt
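The coverage figures can be computed with a cumulative sum over the sorted frequencies: for each target p, take the smallest k such that the k most frequent words account for at least p% of all word occurrences. A minimal sketch, assuming the counts have already been read from the word list; words_needed is a hypothetical name and not necessarily how percentage_analysis.py is written.

```python
def words_needed(counts: list[int], percents=(90, 95, 98, 99)) -> dict[int, int]:
    """For each target percentage, return how many of the most frequent words
    are needed for their occurrences to reach that share of all occurrences."""
    ordered = sorted(counts, reverse=True)  # most frequent first
    total = sum(ordered)
    result: dict[int, int] = {}
    covered = 0  # occurrences accounted for by the first k words
    k = 0
    for pct in sorted(percents):
        # Integer comparison avoids floating-point error: covered / total < pct / 100
        while k < len(ordered) and covered * 100 < pct * total:
            covered += ordered[k]
            k += 1
        result[pct] = k
    return result


if __name__ == "__main__":
    sample_counts = [50, 20, 10, 10, 5, 3, 1, 1]
    for pct, k in words_needed(sample_counts).items():
        print(f"{pct}%: {k} words")
```

For the sample counts (100 occurrences in total), 4 words cover 90%, 5 cover 95%, 6 cover 98% and 7 cover 99%. Because the targets are processed in increasing order, the cumulative sum is computed only once.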

Finally, before importing the word list into Anki or another program, you can use filter.py to remove the words you already know. The filter list should be placed in the Filters folder. The script takes one required argument and one optional argument: the first is the word list file name, and the second is the language. The filter file should be named "word_filter_" + lang + ".txt". The output file is "filtered_list.txt".

python3 filter.py word_list_en.txt en
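Filtering amounts to a set-membership test that keeps the remaining words in their original (frequency) order. The sketch below assumes the filter file holds one word per line and compares words case-insensitively; both are assumptions, and load_filter and filter_known are hypothetical names rather than the functions in filter.py.

```python
from pathlib import Path


def load_filter(lang: str = "en", folder: str = "Filters") -> set[str]:
    """Read Filters/word_filter_<lang>.txt, assumed to hold one known word per line."""
    path = Path(folder) / f"word_filter_{lang}.txt"
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def filter_known(words: list[str], known: set[str]) -> list[str]:
    """Drop words found in `known` (case-insensitive), keeping the original order."""
    known_lower = {w.lower() for w in known}
    return [w for w in words if w.lower() not in known_lower]


if __name__ == "__main__":
    word_list = ["the", "cat", "Dog", "sat"]
    print(filter_known(word_list, {"The", "dog"}))  # ['cat', 'sat']
```

Using a set for the known words keeps each lookup constant-time on average, so filtering stays fast even for long word lists.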

Then you can use the filtered list to expand your vocabulary.

History

word_list_old.py is a deprecated script that generates the word list using NLTK.

readingtools's People

Contributors

zenith-john
