
foadsf / avoid-gpt-phrases


A simple tool to identify and help avoid overused phrases commonly found in AI-generated text, particularly in LaTeX/TeX documents.

License: GNU General Public License v3.0

Batchfile 49.22% Shell 50.78%

avoid-gpt-phrases's People

Contributors: foadsf


avoid-gpt-phrases's Issues

Migrate from Bash/Batch to Python for Improved Word Frequency Analysis

Description:

This issue proposes migrating the word frequency analysis functionality from the current Bash/Batch scripts (count_words.sh and count_words.bat) to a Python-based solution using the pylatexenc and nltk libraries.

Motivation:

  • Modernization: Python runs unchanged on Linux, macOS, and Windows, so one script can replace the separate Bash and Batch versions.
  • Enhanced Functionality: The pylatexenc library provides robust LaTeX parsing capabilities, ensuring accurate text content extraction while effectively handling macros and LaTeX structures.
  • Advanced Text Analysis: nltk provides various natural language processing tools for sophisticated text analysis beyond basic word counting.
  • Maintainability and Extensibility: Python code is generally considered more readable and maintainable, facilitating future enhancements and extensions to the analysis functionality.

Proposed Solution:

This Python script (analyze_latex.py) demonstrates the proposed approach using pylatexenc and nltk to analyze word frequencies in LaTeX documents. The script includes the following features:

  • LaTeX to plain text conversion: Accurately extracts text content from LaTeX documents.
  • Tokenization and stopword removal: This process prepares the text for analysis by breaking it into words and removing common stopwords.
  • Frequency distribution: Calculates word frequencies and allows extraction of specific word counts.
  • Command-line interface: Enables easy usage by specifying the filename and target words as arguments.
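
The four features above could be sketched roughly as follows. This is not the analyze_latex.py script referred to in this issue, only a minimal illustration under stated assumptions: the function names (`latex_to_text`, `tokenize`, `word_frequencies`, `count_targets`, `main`) are hypothetical, the small inline `STOPWORDS` set stands in for nltk's English stopword corpus (which needs a one-time `nltk.download("stopwords")`), and a crude regex fallback is used when pylatexenc or nltk is not installed.

```python
import re
from collections import Counter

try:
    from pylatexenc.latex2text import LatexNodes2Text
    from nltk import FreqDist
    from nltk.tokenize import wordpunct_tokenize
    HAVE_LIBS = True
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    HAVE_LIBS = False

# Hypothetical stand-in for nltk.corpus.stopwords.words("english").
STOPWORDS = frozenset(
    "a an and are as at be by for from in into is it of on or that the "
    "this to we with".split()
)


def latex_to_text(latex):
    """Convert LaTeX source to plain text."""
    if HAVE_LIBS:
        return LatexNodes2Text().latex_to_text(latex)
    # Crude fallback: drop comments, command names, and braces.
    no_comments = re.sub(r"(?<!\\)%.*", "", latex)
    no_commands = re.sub(r"\\[A-Za-z]+\*?", " ", no_comments)
    return re.sub(r"[{}]", " ", no_commands)


def tokenize(text):
    """Split text into lowercase alphabetic tokens, minus stopwords."""
    raw = wordpunct_tokenize(text) if HAVE_LIBS else re.findall(r"\w+", text)
    lowered = (tok.lower() for tok in raw if tok.isalpha())
    return [tok for tok in lowered if tok not in STOPWORDS]


def word_frequencies(latex):
    """Return a Counter-like mapping from word to occurrence count."""
    tokens = tokenize(latex_to_text(latex))
    return FreqDist(tokens) if HAVE_LIBS else Counter(tokens)


def count_targets(latex, targets):
    """Look up the counts of specific target words (case-insensitive)."""
    freqs = word_frequencies(latex)
    return {word.lower(): freqs[word.lower()] for word in targets}


def main(argv):
    """Command-line entry point: FILE.tex WORD [WORD ...]."""
    if len(argv) < 2:
        print("usage: analyze_latex.py FILE.tex WORD [WORD ...]")
        return 1
    with open(argv[0], encoding="utf-8") as handle:
        counts = count_targets(handle.read(), argv[1:])
    for word, count in counts.items():
        print(f"{word}: {count}")
    return 0
```

Calling `main(sys.argv[1:])` under an `if __name__ == "__main__":` guard would turn this into a script that takes the filename and target words as arguments. nltk's `FreqDist` is a `Counter` subclass, so a word that never occurs simply reports a count of 0 in either code path.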
