twitter-korean-text-ruby

Ruby interface to twitter-korean-text by Twitter

This gem wraps twitter-korean-text (Scala), the Korean morphological analyzer provided by Twitter, so that it can be used from Ruby.

It is based on twitter-korean-text version 4.4.

Install

$ gem install twitter-korean-text-ruby

If you use a Gemfile:

# Gemfile
gem 'twitter-korean-text-ruby'

Usage

Basic

require 'twitter-korean-text-ruby'

processor = TwitterKorean::Processor.new
# OR with JVM arguments
processor = TwitterKorean::Processor.new('-Xms126M', '-Xmx512M', ...)

# Normalize
processor.normalize("형태소 분석을 합니닼ㅋㅋㅋㅋㅋㅋ")
# => "형태소 분석을 합니다ㅋㅋㅋㅋㅋㅋ"

# Tokenize
processor.tokenize("한국어를 처리하는 예시입니다 ㅋㅋ")
# => ["한국어", "를", " ", "처리", "하는", " ", "예시", "입니", "다", " ", "ㅋㅋ"]

# Stemming
processor.stem("한국어를 처리하는 예시입니다 ㅋㅋ")
# => ["한국어", "를", " ", "처리", "하다", " ", "예시", "이다", " ", "ㅋㅋ"]

# Extract phrases
processor.extract_phrases("한국어를 처리하는 예시입니다 ㅋㅋ")
# => ["한국어", "처리", "처리하는 예시", "예시"]
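
The tokenize result shown above keeps whitespace as separate tokens. A minimal sketch of dropping them follows. It uses the sample output as a literal array of plain strings so that it runs without the gem or a JVM; the variable names are illustrative only.

```ruby
# Sample tokenize output copied from the example above
# (plain strings here, not real TwitterKorean::KoreanToken objects).
tokens = ["한국어", "를", " ", "처리", "하는", " ", "예시", "입니", "다", " ", "ㅋㅋ"]

# Drop tokens that consist only of whitespace.
words = tokens.reject { |t| t.strip.empty? }

puts words.inspect
```

Because the token class is described as a String subclass, the same `reject` call should also work on the array returned by `tokenize`, although that is not verified here.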

Token Information

The token class (TwitterKorean::KoreanToken) inherits from String. Meta information about a token is available through its metadata attribute.

tokens = processor.tokenize("한국어를 처리하는 예시입니다 ㅋㅋ")
token = tokens.first

token #=> 한국어
metadata = token.metadata
metadata #=> "noun, 0, 3"
metadata.pos #=> :noun
metadata.offset #=> 0
metadata.length #=> 3
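
To illustrate the idea of a String subclass that carries metadata, here is a minimal sketch. `SketchToken` and `TokenMetadata` are hypothetical stand-ins written for this example, not the gem's actual classes, and the `:josa` tag and the offsets are assumed values. The sketch shows how such tokens can be filtered by part of speech while still behaving like ordinary strings.

```ruby
# Hypothetical stand-ins for illustration only; the gem's real classes may differ.
TokenMetadata = Struct.new(:pos, :offset, :length) do
  # Render the metadata in the "pos, offset, length" form shown above.
  def to_s
    "#{pos}, #{offset}, #{length}"
  end
end

# A String subclass that carries a metadata object alongside its text.
class SketchToken < String
  attr_reader :metadata

  def initialize(text, metadata)
    super(text)
    @metadata = metadata
  end
end

# Assumed sample data: the :josa tag and the offsets are illustrative.
tokens = [
  SketchToken.new("한국어", TokenMetadata.new(:noun, 0, 3)),
  SketchToken.new("를", TokenMetadata.new(:josa, 3, 1)),
  SketchToken.new("처리", TokenMetadata.new(:noun, 5, 2))
]

# Keep only the tokens tagged as nouns.
nouns = tokens.select { |t| t.metadata.pos == :noun }

puts nouns.inspect
puts tokens.first.metadata.to_s
```

Since each token is still a String, the filtered array compares equal to plain strings and can be passed to any code that expects them.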

Test

rake test

Issue

If the JAVA_HOME path cannot be found, set it explicitly:

export JAVA_HOME=$(java_home_path)
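
A quick way to check from Ruby whether JAVA_HOME is usable before loading the gem is sketched below. `java_home_configured?` is a hypothetical helper written for this example and is not part of the gem; it only tests that the variable is set and points at an existing directory.

```ruby
# Hypothetical helper: returns true when the given environment has a
# JAVA_HOME entry that points at an existing directory.
def java_home_configured?(env = ENV)
  path = env["JAVA_HOME"]
  !path.nil? && !path.empty? && File.directory?(path)
end

unless java_home_configured?
  warn "JAVA_HOME is not set or does not point to a directory; set it before loading the gem."
end
```

The helper accepts any hash-like object, so it can be exercised with a plain Hash instead of the real environment.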

Contribute

This project wraps the Scala code of the twitter-korean-text project for Ruby. Issues and pull requests (with test code included) that fall within that scope are always welcome.

Contributors

keepcosmos
