
Zipf's Law

The pyzipf package tallies the occurrences of words in text files and plots each word's rank against its frequency, alongside a line showing the theoretical distribution predicted by Zipf's Law.

Motivation

Zipf's Law is often stated as an observational pattern seen in the relationship between the frequency and rank of words in a text:

"…the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc." (Wikipedia)

Many books are available to download in plain text format from sites such as Project Gutenberg, so we created this package to qualitatively explore how well different books align with the word frequencies predicted by Zipf's Law.
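
Written as a formula, the idealised statement quoted above says that a word's frequency is inversely proportional to its rank. The notation here is introduced only for illustration and is not taken from the package: $f(r)$ stands for the frequency of the word at rank $r$.

```latex
f(r) = \frac{f(1)}{r}
\qquad\text{so that}\qquad
\frac{f(1)}{f(2)} = 2, \quad \frac{f(1)}{f(3)} = 3, \ \dots
```

Real texts tend to follow this only approximately, which is why the comparison is described here as qualitative.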

Installation

pip install pyzipf

Usage

After installing this package, the following three commands will be available from the command line:

  • countwords for counting the occurrences of words in a text.
  • collate for collating multiple word count files together.
  • plotcounts for visualizing the word counts.
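
As a rough illustration of what counting and collating involve, the sketch below tallies words using only Python's standard library. It is not pyzipf's actual implementation: the function names `count_words` and `collate_counts`, the lowercasing, and the punctuation stripping are all assumptions made for this example.

```python
import string
from collections import Counter


def count_words(text):
    """Tally words in a string (hypothetical helper, not pyzipf's code)."""
    # Assumption: words are whitespace-separated, lowercased, and stripped
    # of leading and trailing punctuation.
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return Counter(w for w in words if w)


def collate_counts(counters):
    """Merge several word tallies into one by summing counts per word."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return total


if __name__ == "__main__":
    a = count_words("The whale, the sea.")
    b = count_words("The count and the night.")
    print(collate_counts([a, b]).most_common(1))  # [('the', 4)]
```

Keeping the per-file tallies separate until the final merge mirrors the command-line workflow, where each text is counted on its own before the results are combined.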

A typical usage scenario would include running the following from your terminal:

countwords dracula.txt > dracula.csv
countwords moby_dick.txt > moby_dick.csv
collate dracula.csv moby_dick.csv > collated.csv
plotcounts collated.csv --outfile zipf-drac-moby.jpg
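
To make the comparison behind the plot concrete, here is a minimal sketch of how ranked counts could be set against the idealised Zipf prediction, in which the word at rank r is expected to occur about 1/r as often as the most frequent word. The function name `zipf_table` and the sample counts are made up for illustration, and pyzipf's own plotting code may compute or draw the theoretical line differently.

```python
def zipf_table(counts):
    """Return (rank, word, observed, predicted) rows for a word->count dict.

    Hypothetical helper: 'predicted' assumes the idealised law
    count(rank) = count(1) / rank.
    """
    # Sort words from most to least frequent so that list position gives rank.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    top = ranked[0][1]  # count of the most frequent word
    return [
        (rank, word, observed, top / rank)
        for rank, (word, observed) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    sample = {"the": 120, "of": 61, "and": 39, "a": 31}  # made-up counts
    for rank, word, observed, predicted in zipf_table(sample):
        print(f"{rank:>2} {word:<4} observed={observed:>4} predicted={predicted:7.1f}")
```

A text that follows Zipf's Law closely would show observed and predicted values that stay near each other as the rank increases.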

Additional information on each command can be found in its docstring or by appending the -h flag, e.g. countwords -h.


Contributing

Interested in contributing? Check out the CONTRIBUTING.md file for guidelines on how to contribute. Please note that this project is released with a Contributor Code of Conduct (CONDUCT.md). By contributing to this project, you agree to abide by its terms. Both of these files can be found in our GitHub repository.

Contributors

  • amira-khan
  • sami-virtanen
