TextMatch-Visuals

A tool for creating visual representations of word matches between two texts.

This program is based on concepts more commonly found in bioinformatics: the Needleman–Wunsch algorithm, global alignment, and dot-matrix plots. Many areas of literature have long histories of textual study and comparison, so this kind of analysis is nothing new, but TM-V offers a means of visualizing the relations between words in two sections of text. A work can be compared to itself or mapped against other works to find similarities and differences in words and, potentially, phrases.
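To make the alignment idea concrete, here is a minimal Needleman–Wunsch sketch that aligns two sequences of words instead of nucleotides. It is an illustration under stated assumptions, not TM-V's own code: the scores (match +1, mismatch -1, gap -1) and the method name `global_alignment_score` are hypothetical.

```ruby
# Minimal Needleman-Wunsch global alignment over word sequences.
# Hypothetical illustration: these scores are assumptions, not TM-V's values.
MATCH_SCORE    = 1
MISMATCH_SCORE = -1
GAP_SCORE      = -1

# Returns the best global alignment score of two arrays of words.
def global_alignment_score(a, b)
  # score[i][j] holds the best score for aligning a[0...i] with b[0...j].
  score = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (1..a.size).each { |i| score[i][0] = i * GAP_SCORE } # a's prefix against gaps only
  (1..b.size).each { |j| score[0][j] = j * GAP_SCORE } # b's prefix against gaps only

  (1..a.size).each do |i|
    (1..b.size).each do |j|
      pair = a[i - 1] == b[j - 1] ? MATCH_SCORE : MISMATCH_SCORE
      score[i][j] = [
        score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
        score[i - 1][j] + GAP_SCORE, # a[i-1] against a gap
        score[i][j - 1] + GAP_SCORE  # b[j-1] against a gap
      ].max
    end
  end
  score[a.size][b.size]
end

puts global_alignment_score(%w[arms and the man], %w[arms and a man]) # prints 2
```

A full implementation would also trace back through `score` to recover which words were aligned with which; that step is omitted here for brevity.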
TM-V is far from complete. At the moment it matches words either exactly (one-to-one) or through shared stems. Some related words, however, are not linked by the stemmer. For example, sing and sung, although forms of the same verb, are not marked as linked because their stems differ. Stemmers are not built to detect irregular or changing stems, so a stemmer is perhaps not the best tool for mapping morphological links. For now it is an improvement on one-to-one word matching, but I will have to investigate further to find the best way to proceed.
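The gap between stem matching and irregular forms shows up even in a toy example. The suffix-stripping `toy_stem` below is a hypothetical stand-in for a real stemmer such as the Ruby-Stemmer gem, and `match_type` is an assumed name; the point is only that stripping regular suffixes links "sing" and "singing" but can never link "sing" and "sung".

```ruby
# Toy suffix-stripping stemmer: a hypothetical stand-in for a real stemmer such
# as the Ruby-Stemmer gem. It removes only a few regular English suffixes.
STEM_SUFFIXES = %w[ing ed es s].freeze

def toy_stem(word)
  w = word.downcase
  # Only strip a suffix if at least three letters of the word would remain.
  suffix = STEM_SUFFIXES.find { |s| w.end_with?(s) && w.length > s.length + 2 }
  suffix ? w[0...-suffix.length] : w
end

# Classifies a pair of words as described above:
# :exact for a one-to-one match, :stem for a shared stem, nil otherwise.
def match_type(a, b)
  return :exact if a.downcase == b.downcase
  return :stem if toy_stem(a) == toy_stem(b)
  nil
end

p match_type('sing', 'sing')    # => :exact
p match_type('sing', 'singing') # => :stem
p match_type('sing', 'sung')    # => nil (irregular form, different stem)
```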

Use

To generate a textmatch image with TM-V, run the following from the command line:

create_textmatch_image.rb -o [path] -t [path] -s [path]

Running the script with --help prints:

Usage: create_textmatch_image.rb [options], provide paths for text files and path 
for where the output image should be saved, be sure to include the proper file extension, 
.jpg, .png, etc. If no save path is provided, the image will not save.

-o, --text_one [path_to_file]    Input path for first text
-t, --text_two [path_to_file]    Input path for second text
-s, --save_to [path_to_save]     Input the save path

The -o and -t flags specify the files containing the two texts to be compared. To save the image you create, provide a path with the -s flag; if -s is omitted, the textmatch image is only displayed, not saved.
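The usage text above follows the conventions of Ruby's standard `optparse` library. As a hypothetical reconstruction (TM-V's actual script may parse its arguments differently), the options could be handled as follows; `parse_textmatch_options` is an assumed name, and treating -o and -t as required while -s stays optional is an assumption drawn from the description above.

```ruby
require 'optparse'

# Hypothetical reconstruction of the command-line handling; not TM-V's code.
# Returns a hash with :text_one, :text_two and (optionally) :save_to.
def parse_textmatch_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: create_textmatch_image.rb [options]'
    opts.on('-o', '--text_one PATH', 'Input path for first text') { |v| options[:text_one] = v }
    opts.on('-t', '--text_two PATH', 'Input path for second text') { |v| options[:text_two] = v }
    opts.on('-s', '--save_to PATH', 'Input the save path') { |v| options[:save_to] = v }
  end
  parser.parse!(argv.dup) # dup so the caller's array is not mutated

  # Both input texts are required; the save path is optional.
  missing = %i[text_one text_two].reject { |key| options.key?(key) }
  raise ArgumentError, "missing required option(s): #{missing.join(', ')}" unless missing.empty?

  options
end

opts = parse_textmatch_options(%w[-o text_1.txt -t text_2.txt])
puts opts[:save_to].nil? ? 'display only' : "save to #{opts[:save_to]}" # prints "display only"
```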
As an example, I've provided two sample text files containing the following excerpts from the Aeneid.

Text_1
'I sing of arms and the man, he who, exiled by fate,
first came from the coast of Troy to Italy, and to
Lavinian shores – hurled about endlessly by land and sea,
by the will of the gods, by cruel Juno’s remorseless anger,
long suffering also in war, until he founded a city
and brought his gods to Latium: from that the Latin people
came, the lords of Alba Longa, the walls of noble Rome.
Muse, tell me the cause: how was she offended in her divinity,'

Text_2
'There was an ancient city, Carthage (held by colonists from Tyre),
opposite Italy, and the far-off mouths of the Tiber,
rich in wealth, and very savage in pursuit of war.
They say Juno loved this one land above all others,
even neglecting Samos: here were her weapons
and her chariot, even then the goddess worked at,
and cherished, the idea that it should have supremacy
over the nations, if only the fates allowed.'

Comparing Text_1 to itself produces the following image.

[Image: same]

Comparing the two different texts produces this image.

[Image: different]

The black dots represent direct word matches, while the red dots indicate stem matches. You can supply any other texts as .txt files and compare their matching words. It is best, however, to keep the files fairly small: every word of one text is compared with every word of the other, so the running time grows as O(n^2) (strictly, O(n·m) for texts of n and m words).
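The dot plot itself can be sketched as a grid with one cell per pair of words. This is a hypothetical illustration, not TM-V's implementation: `tokenize`, `toy_stem`, `dot_matrix` and `render` are assumed names, the tokenizer simply lowercases and keeps runs of ASCII letters (so "Juno’s" becomes "juno" and "s"), and `toy_stem` is a crude stand-in for a real stemmer. Text output replaces the image: `#` marks an exact match (a black dot), `+` a stem match (a red dot), and `.` no match. The nested loops make the O(n·m) cost visible.

```ruby
# Hypothetical sketch of the dot-matrix step (not TM-V's code).
# toy_stem is a crude stand-in for a real stemmer such as Ruby-Stemmer.
STEM_SUFFIXES = %w[ing ed es s].freeze
DOT_SYMBOLS = { exact: '#', stem: '+' }.freeze # black dot => '#', red dot => '+'

def toy_stem(word)
  suffix = STEM_SUFFIXES.find { |s| word.end_with?(s) && word.length > s.length + 2 }
  suffix ? word[0...-suffix.length] : word
end

# Lowercases the text and keeps runs of ASCII letters as words.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# One row per word of the first text, one column per word of the second:
# n * m comparisons in total.
def dot_matrix(words_a, words_b)
  words_a.map do |a|
    words_b.map do |b|
      if a == b
        :exact
      elsif toy_stem(a) == toy_stem(b)
        :stem
      end # nil when the words are unrelated
    end
  end
end

# Renders the matrix as text, using '.' for empty cells.
def render(matrix)
  matrix.map { |row| row.map { |cell| DOT_SYMBOLS.fetch(cell, '.') }.join }.join("\n")
end

puts render(dot_matrix(tokenize('I sing of arms'), tokenize('singing arms')))
```

Run as written, this prints four rows (`..`, `+.`, `..`, `.#`): "sing" and "singing" share a stem, and "arms" matches exactly.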

To-Do

  • add further grammar support (irregular verb conjugations)
  • add a key of matched words and positions to make the image more legible
  • add Latin language grammar support

Installation

Besides the Ruby standard library, TM-V depends on the Ruby-Stemmer gem, the ImageMagick library, and the RMagick gem.
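A quick way to confirm these dependencies are in place is to try loading them. This is a hypothetical helper, not part of TM-V; it assumes the gems are published as `ruby-stemmer` and `rmagick` and are loaded with `require 'lingua/stemmer'` and `require 'rmagick'` respectively (older RMagick releases used `require 'RMagick'`).

```ruby
# Hypothetical dependency check (not part of TM-V). Maps each gem name to the
# path passed to require; both require paths are assumptions.
DEPENDENCIES = {
  'ruby-stemmer' => 'lingua/stemmer',
  'rmagick'      => 'rmagick'
}.freeze

# Returns { gem_name => true/false } depending on whether require succeeds.
def dependency_status(deps = DEPENDENCIES)
  deps.each_with_object({}) do |(gem_name, path), status|
    begin
      require path
      status[gem_name] = true
    rescue LoadError
      status[gem_name] = false
    end
  end
end

dependency_status.each do |gem_name, ok|
  note = ok ? 'available' : "missing (try: gem install #{gem_name})"
  puts "#{gem_name}: #{note}"
end
```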

Contributors

annakrohn