This project forked from likejazz/korean-sentence-splitter

Split Korean text into sentences using a heuristic algorithm.

License: BSD 3-Clause "New" or "Revised" License

Korean Sentence Splitter

Split Korean text into sentences using a heuristic algorithm. The algorithm was greatly inspired by EungGyun Kim, the Kakao NLP Leader and one of the most brilliant NLP engineers in Korea.

I started this project after being inspired by this article, and it achieved the best result on the test set. It is also very robust to both spoken and written expressions.

Installation

The package is listed in the Python Package Index (PyPI), so you can install it with pip:

$ pip install kss

Usage

import kss

s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습."
for sent in kss.split_sentences(s):
    print(sent)

The result is shown below:

회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.
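
When the input holds several paragraphs, it can help to split each paragraph separately and drop empty results. The helper below is a minimal sketch of that idea, not part of kss: `split_paragraphs` and `split_with_kss` are hypothetical names, and the only kss call assumed is `kss.split_sentences` as shown above.

```python
from typing import Callable, Iterable, List


def split_paragraphs(
    text: str,
    splitter: Callable[[str], Iterable[str]],
) -> List[str]:
    """Split multi-paragraph text into sentences, one paragraph at a time.

    `splitter` is any callable that maps a string to an iterable of
    sentences, for example `kss.split_sentences`.
    """
    sentences: List[str] = []
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip blank lines between paragraphs
        for sent in splitter(paragraph):
            sent = sent.strip()
            if sent:  # drop empty fragments
                sentences.append(sent)
    return sentences


def split_with_kss(text: str) -> List[str]:
    """Hypothetical convenience wrapper; requires `pip install kss`."""
    import kss  # imported lazily so split_paragraphs works without kss

    return split_paragraphs(text, kss.split_sentences)
```

Passing the splitter as an argument keeps the helper testable without the compiled extension. With kss installed, `split_with_kss(s)` should yield the same sentences as the loop above for single-paragraph input.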

Requirements

  • C++11
    • GCC or Clang with C++11 support
  • Python 3

The provided Google Test binary was built on macOS.

Build from scratch

C++

$ mkdir bld
$ cd bld
$ cmake ..
$ make
$ ./sentsplit

NOTICE: The provided Google Test binary was built on macOS only, so you cannot build the test binary on Linux.

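#include <iostream>
#include <string>
#include "sentence_splitter.h"

int main() {
    // Input with three sentences and almost no sentence-final punctuation.
    std::string s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습.";
    // Print each detected sentence on its own line.
    for (const auto& sent : splitSentences(s)) {
        std::cout << sent << std::endl;
    }

    return 0;
}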

The result is shown below:

회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.

Python

The Python wrapper is implemented using Cython. You can build and install it with either of the commands below.

$ python setup.py install --record files.txt
or
$ pip install .

Uninstall

$ xargs rm -rf < files.txt
or
$ pip uninstall kss
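
After installing or uninstalling, a quick check from Python can confirm the state of the environment. The snippet below is a small sketch using only the standard library (Python 3.8+). The helper names `installed_version` and `is_importable` are hypothetical, and it assumes the distribution and module are both named `kss`, as in the pip commands above.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None
```

For example, `installed_version("kss")` should return a version string after a successful install and `None` after `pip uninstall kss`.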

PyPI

$ python setup.py sdist
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
