
tag_eng's Introduction

##Tagging Engine 1.0.2 (01 May 2016)

###Description

A simple tagging engine in Python that takes a user-specified file and returns the most common word, along with a list of the n most common words (where n is user-defined).
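
The core behaviour described here can be sketched with `collections.Counter` from the standard library. This is only an illustration, not the project's actual code: the function name `top_words` and the sample text are assumptions, and the sketch works on a string rather than reading the user-specified file.

```python
from collections import Counter


def top_words(text, n):
    """Return the n most common words in text as (word, count) pairs."""
    # Lowercase so "Tags" and "tags" are counted together, then split on whitespace.
    words = text.lower().split()
    return Counter(words).most_common(n)


sample = "tags tags tags help help find"
print(top_words(sample, 2))  # [('tags', 3), ('help', 2)]
```

The most common word is simply the first entry of the returned list.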

###Applications

Useful for highlighting the top n words in a blog post (to create tags), or for reviewing the relevance of articles, web pages, etc. to a particular topic.

###Some problems to be ironed out:

  • The program currently expects a text file with punctuation marks already stripped out. Any remaining punctuation marks and other non-alpha characters are returned as parts of unique words, which skews the results;
  • Words appearing in both singular and plural form are currently returned as two unique words;
  • Need to add error handling on user input of n (order).
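
One possible way to address the punctuation and input-handling problems is sketched below. The helper names `clean_words` and `parse_order`, the regular expression, and the default of 10 are all assumptions made for illustration. The sketch does not attempt the singular/plural problem.

```python
import re
from collections import Counter


def clean_words(text):
    """Lowercase text and keep only runs of letters (apostrophes allowed inside a word)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def parse_order(raw, default=10):
    """Turn user input for n into a positive int, falling back to default."""
    try:
        n = int(raw)
    except (TypeError, ValueError):
        # Non-numeric input such as "abc" or "3.5" ends up here.
        return default
    return n if n > 0 else default


words = clean_words("Tags, tags... and more tags! Don't skip (tags).")
print(Counter(words).most_common(parse_order("2")))
```

Because the pattern only matches ASCII letters, accented characters and digits are dropped. A real implementation might want a broader pattern.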

###Further improvements to be made:

  • User to specify whether Python/coding reserved words are counted or excluded. The list of excluded words (commonly used Python reserved words) is hard-coded, but the code that actually excludes them still needs to be added;
  • Add ability to handle files with punctuation marks by stripping them out before splitting and counting words (at user request);
  • Add ability to handle plurals and singulars as one word (at user request);
  • Add error handling on user input of n (order);
  • Add ability to handle file paths (file in location other than working directory);
  • Add ability to save outputs to file (user to specify: restricted counts, full counts, or both).
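
As a rough sketch of two of these improvements (excluding reserved words and saving outputs), the following uses only the standard library. It assumes that `keyword.kwlist` is an acceptable stand-in for the project's own hard-coded list, and the names `count_words` and `save_counts` are hypothetical.

```python
import keyword
from collections import Counter
from pathlib import Path

# Assumption: the standard library's keyword list stands in for the
# project's own hard-coded list of excluded words.
EXCLUDED = {kw.lower() for kw in keyword.kwlist}


def count_words(words, exclude_reserved=False):
    """Count words, optionally skipping Python reserved words."""
    if exclude_reserved:
        words = [w for w in words if w not in EXCLUDED]
    return Counter(words)


def save_counts(counts, path, n=None):
    """Write 'word<TAB>count' lines to path; n=None writes the full counts."""
    lines = [f"{word}\t{count}" for word, count in counts.most_common(n)]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Since `save_counts` takes any path, it also works for files outside the working directory. Calling it once with `n` set and once with `n=None` would produce the restricted and full counts respectively.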

###Future developments planned:

  • Allow the user to specify whether they are uploading simple text or html. If html, the program can use html tags (title, headers (h1, h2, etc.), strong or italic) to weight the importance of text (either as words or phrases) more highly than normal, untagged/unformatted text. This will help make the program more useful if being used, for example, to auto-tag a Wordpress blog post;
  • Review the text in pairs and/or triplets of words to identify significant phrases. The user is prompted to confirm whether each one is a significant phrase, and can modify or overwrite it as required before it is appended to a list;
  • The user can be prompted for a file path and filename where the significant phrases list can be saved. This can then be reloaded when using the program multiple times on multiple web pages/articles/etc;
  • The same functionality can be used to test titles and headers for significant phrases.
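
The html weighting idea could look something like the sketch below, which uses `html.parser` from the standard library. The weights in `TAG_WEIGHTS` and the class name `WeightedWordCounter` are purely hypothetical. A word inside a weighted tag is counted as many times as that tag's weight, and untagged text counts once.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: how many times a word inside each tag is counted.
TAG_WEIGHTS = {"title": 5, "h1": 4, "h2": 3, "strong": 2, "em": 2}


class WeightedWordCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self._open_tags = []  # stack of currently open weighted tags

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Remove the most recently opened matching tag, if any.
        for i in range(len(self._open_tags) - 1, -1, -1):
            if self._open_tags[i] == tag:
                del self._open_tags[i]
                break

    def handle_data(self, data):
        # Use the highest weight among the open tags; plain text counts once.
        weight = max((TAG_WEIGHTS[t] for t in self._open_tags), default=1)
        for word in data.lower().split():
            self.counts[word] += weight


parser = WeightedWordCounter()
parser.feed("<h1>Python tags</h1><p>Simple <strong>tags</strong> for a blog</p>")
parser.close()
print(parser.counts.most_common(2))  # [('tags', 6), ('python', 4)]
```

Taking the maximum weight rather than multiplying nested weights is a design choice made here to keep the scores easy to interpret.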

This functionality will be useful for accelerated learning, and for rapid searching/reviewing of many web sites/pages/articles.
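
A minimal sketch of the planned pair/triplet review follows. It only proposes candidate phrases by frequency and leaves out the interactive confirmation step. The function names and the `min_count` threshold are assumptions.

```python
from collections import Counter


def ngrams(words, size):
    """Return consecutive word groups of the given size as tuples."""
    return list(zip(*(words[i:] for i in range(size))))


def candidate_phrases(words, size=2, min_count=2):
    """Return phrases that occur at least min_count times, most common first."""
    counts = Counter(" ".join(gram) for gram in ngrams(words, size))
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]


words = "machine learning makes machine learning tools easier".split()
print(candidate_phrases(words))  # [('machine learning', 2)]
```

Passing `size=3` gives triplets instead of pairs. The returned list is what a user would then be asked to confirm or edit.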
