
iLanguage


A semi-unsupervised, language-independent morphological analyzer, useful for stemming text in an unknown language or for getting a rough estimate of the possible morpheme parses of a word. It uses compression, maximum entropy and field linguistics.
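To illustrate what "possible parses" of a word can look like, here is a minimal sketch that enumerates every way to peel known prefixes and suffixes off a word. It is a naive affix-matching illustration, not iLanguage's compression and maximum-entropy method: `possibleParses` and its `minStem` parameter are hypothetical names, and the affix lists are supplied by hand rather than learned from a corpus.

```javascript
// Hypothetical sketch: list every segmentation of `word` into
// prefix* + stem + suffix*, using hand-supplied affix lists.
// Parses use hyphen notation: "un-" is a prefix, "-s" a suffix.
function possibleParses(word, prefixes, suffixes, minStem = 2) {
  const parses = [];

  // Record the current split, then try peeling one more known suffix.
  function peelSuffixes(stem, pre, suf) {
    parses.push([...pre.map(p => p + "-"), stem, ...suf.map(s => "-" + s)].join(" "));
    for (const s of suffixes) {
      if (s.length > 0 && stem.endsWith(s) && stem.length - s.length >= minStem) {
        peelSuffixes(stem.slice(0, -s.length), pre, [s, ...suf]);
      }
    }
  }

  // For each way of peeling known prefixes, explore all suffix splits.
  function peelPrefixes(rest, pre) {
    peelSuffixes(rest, pre, []);
    for (const p of prefixes) {
      if (p.length > 0 && rest.startsWith(p) && rest.length - p.length >= minStem) {
        peelPrefixes(rest.slice(p.length), [...pre, p]);
      }
    }
  }

  peelPrefixes(word, []);
  return parses;
}

console.log(possibleParses("unrepresentatives", ["un", "re"], ["s"]));
// returns [ "unrepresentatives", "unrepresentative -s", "un- representatives",
//           "un- representative -s", "un- re- presentatives", "un- re- presentative -s" ]
```

Implausible candidates such as "un- re- presentative -s" are expected from pure affix matching; ranking the candidates is where a statistical model would come in.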

Install

$ npm install --save ilanguage

Usage

More examples

var ILanguage = require('ilanguage').ILanguage;
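var lang = new ILanguage();

// The rest of this snippet is adapted from a Jasmine unit test: `expect` is
// Jasmine's assertion function, and `NonContentWords` (the project's
// non-content-word module) must already be in scope; requiring it is not shown.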
var textToTest = {
  orthography: "this will not have any stop words or morphemes, vulgar words or unrepresentative words like banana",
  nonContentWordsArray: "not any",
  userSpecifiedNonContentWords: true,
  userRemovedWordsForThisDocumentArray: ['banana'],
  userRemovedWordsForAllDocumentsArray: ['vulgar'],
  // morphemes: /(^un|^pre|s$|ed$|ing$)/,
  morphemesArray: ["un-", " pre-", " -s", " -ed", " -ing"]
};
NonContentWords.processNonContentWords(textToTest);
expect(NonContentWords.filterText(textToTest).filteredText)
  .toEqual('thi will  have  stop word or morpheme  word or representative word like ');
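To make the expected output above concrete, here is a minimal, self-contained sketch that reproduces it: words in the non-content and user-removed lists are blanked out, punctuation is dropped, and the first matching affix from `morphemesArray` is stripped. The helper names (`escapeRegExp`, `morphemesToRegExp`, `toWordSet`, `filterText`) are hypothetical; this `filterText` is a stand-in, not the library's `NonContentWords.filterText`, whose internals may differ.

```javascript
// Hypothetical sketch of the filtering in the example above. Field names
// mirror the textToTest object, but this is NOT iLanguage's implementation.

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn entries like "un-" (prefix) and "-s" (suffix) into one pattern such as
// /(^un|^pre|s$|ed$|ing$)/. Entries without a hyphen are ignored here.
function morphemesToRegExp(morphemesArray) {
  const alternatives = [];
  for (const raw of morphemesArray) {
    const m = raw.trim();
    if (m.length > 1 && m.endsWith("-")) alternatives.push("^" + escapeRegExp(m.slice(0, -1)));
    else if (m.length > 1 && m.startsWith("-")) alternatives.push(escapeRegExp(m.slice(1)) + "$");
  }
  return alternatives.length ? new RegExp("(" + alternatives.join("|") + ")") : null;
}

// Accept either an array or a whitespace-separated string such as "not any".
function toWordSet(value) {
  const list = Array.isArray(value) ? value : String(value || "").split(/\s+/);
  return new Set(list.filter(Boolean).map(w => w.toLowerCase()));
}

function filterText(doc) {
  const removed = new Set([
    ...toWordSet(doc.nonContentWordsArray),
    ...toWordSet(doc.userRemovedWordsForThisDocumentArray),
    ...toWordSet(doc.userRemovedWordsForAllDocumentsArray),
  ]);
  const morphemes = morphemesToRegExp(doc.morphemesArray || []);
  const words = doc.orthography
    .split(/\s+/)
    // Lowercase and drop punctuation, keeping letters, marks, digits, ' and -.
    .map(token => token.toLowerCase().replace(/[^\p{L}\p{M}\p{N}'-]/gu, ""))
    // Blank out removed words; strip the first matching affix from the rest.
    .map(word => (removed.has(word) ? "" : morphemes ? word.replace(morphemes, "") : word));
  return { filteredText: words.join(" ") };
}

const result = filterText({
  orthography: "this will not have any stop words or morphemes, vulgar words or unrepresentative words like banana",
  nonContentWordsArray: "not any",
  userRemovedWordsForThisDocumentArray: ["banana"],
  userRemovedWordsForAllDocumentsArray: ["vulgar"],
  morphemesArray: ["un-", " pre-", " -s", " -ed", " -ing"],
});
console.log(JSON.stringify(result.filteredText));
// prints "thi will  have  stop word or morpheme  word or representative word like "
```

Because the pattern has no `g` flag, at most one affix is removed per word: "unfolds" would become "folds", not "fold".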

Release History

  • v1.0 April 16 2009 - Initial implementation in bash and perl
  • v2.0 July 3 2010 - Implementation in C++
  • v3.0 April 30 2011 - Implementation in Groovy
  • v4.0 July 20 2012 - Implementation in JavaScript Map Reduce
  • v4.1 November 29 2013 - Added more high level functions for gloss lookup
  • v5.0 January 9 2014 - Implementation in CommonJS

License

This project is released under the Apache 2.0 license, a permissive open source license: you can adapt the code to any use you see fit, as long as you keep its copyright and license notices.

How to edit the code

Code style

If you use Sublime Text, it will manage this for you when you format your code (CMD+SHIFT+P, then "format") before saving. See .editorconfig and .jshintrc for the specific options.

Breakpointing while you work

Open test/SpecRunner.html in a real browser to run the unit tests or to set breakpoints in the code.

Modifying the code

In general, make sure you have the latest Node.js and npm installed. On a Mac you can do this with Homebrew:

brew update
brew upgrade node

Check that Gulp is installed by running gulp --version. If the command isn't found, run npm install -g gulp. For more information about installing the tools, see the Getting Started guide in the Gulp documentation.

  1. Fork the repo.
  2. Clone the repo to your computer.
  3. Run npm install to install all build dependencies.
  4. Run gulp to build this project.

If gulp builds the project without errors, you're ready to go.

Contributing changes

Easy way

  1. Sign up for a GitHub account (GitHub is free for open source).
  2. Click on the "Fork" button to create your own copy.
  3. Leave us a note in our issue tracker to tell us a bit about the bug/feature you want to work on.
  4. Follow the four GitHub Help tutorials to install and use Git on your computer.
  5. Feel free to ask us questions in our issue tracker; we're friendly and welcome open source newbies.
  6. Edit the code on your computer, commit it with a message referencing the issue #xx you created ($ git commit -m "fixes #xx i changed blah blah..."), and push to your origin ($ git push origin master).
  7. Click on the "Pull Request" button and leave us a note about what you changed. We will look at your changes and help you bring them into the project!

Advanced way

  1. Create a new branch for each fix or feature; it is easier to build a focused pull request from a dedicated branch than from your master branch.
  2. Run gulp watch, which reruns the tests as you make changes.
  3. Add failing tests for the change you want to make. Run gulp test to see the tests fail.
  4. Fix stuff.
  5. Look at the terminal output (assuming you ran gulp watch) to see if the tests pass. Repeat steps 2-4 until done.
  6. Open the test/SpecRunner.html unit test file(s) in real browsers (Chrome Canary, Firefox, Safari) to make sure the tests pass everywhere.
  7. Update the documentation to reflect any changes.
  8. Push to your fork and submit a pull request.
