goktugocal / turkish-syntactic-parser

A small Syntactic Parser for Turkish Language, created with CKY algorithm.

License: MIT License

turkish-syntactic-parser's Introduction

GOKTU_NLP - TURKISH NLP SYNTACTIC PARSER /w CKY ALGORITHM

Note: This toolbox was prepared for the CMPE561 Natural Language Processing course given by Prof. Dr. Tunga Gungor at Boğaziçi University.

Syntactic parsing is the process of analyzing a sentence or a piece of text to determine its grammatical structure. This includes identifying the constituent phrases and the dependencies between words, as well as the role each word plays in the sentence (such as subject, verb, or object). In this project, we developed a CKY parser for Turkish that is driven by grammar rules in Chomsky Normal Form (CNF) and a lexicon. The process of parsing and the generation of the CNF rules and lexicon are described.
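
To make the idea concrete, here is a minimal sketch of CKY chart filling over a CNF grammar. It is not the repository's implementation: the rule table, the lexicon, and the function names `cky_chart` and `is_grammatical` are assumptions made for illustration, with the three toy entries inferred from the example sentence shown in the Usage section.

```python
from itertools import product

# Toy CNF grammar and lexicon (assumptions for illustration only,
# not the contents of the repository's grammar file).
BINARY_RULES = {
    ("PRO1", "VPPAST1"): ["S"],
    ("DAT", "VPPAST1"): ["VPPAST1"],
}
LEXICON = {
    "ben": ["PRO1"],
    "okula": ["DAT"],
    "gittim": ["VPPAST1"],
}


def cky_chart(tokens):
    """Fill an upper-triangular chart; chart[i][j] covers tokens i..j inclusive."""
    n = len(tokens)
    chart = [[[] for _ in range(n)] for _ in range(n)]
    # Spans of length 1 come straight from the lexicon.
    for i, token in enumerate(tokens):
        chart[i][i] = list(LEXICON.get(token, []))
    # Longer spans combine two adjacent shorter spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point: left = i..k, right = k+1..j
                for left, right in product(chart[i][k], chart[k + 1][j]):
                    for parent in BINARY_RULES.get((left, right), []):
                        if parent not in chart[i][j]:
                            chart[i][j].append(parent)
    return chart


def is_grammatical(tokens, start="S"):
    """Accept a sentence when the start symbol spans all of its tokens."""
    if not tokens:
        return False
    return start in cky_chart(tokens)[0][len(tokens) - 1]


chart = cky_chart(["ben", "okula", "gittim"])
for row in chart:
    print(row)
```

The sentence is accepted when `S` appears in the top-right cell, which covers the whole token sequence. Restricting rules to two right-hand-side symbols (CNF) is what lets every span be built from exactly one split point.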

Installation

1- Download the code.

$ git clone https://github.com/GoktugOcal/turkish-syntactic-parser.git

2- Install required packages.

$ pip install -r requirements.txt

Usage

Use from CLI

$ python tr_parse.py -s [<string>]
$ python tr_parse.py -s "Ben okula gittim."

Tokens : ['ben', 'okula', 'gittim']
POS Tags : [['PRO1'], ['DAT'], ['VPPAST1']]
Sentence is grammatically correct.

######### CKY CHART #########
--------  -------  -----------
ben       okula    gittim
['PRO1']  []       ['S']
[]        ['DAT']  ['VPPAST1']
[]        []       ['VPPAST1']
--------  -------  -----------
##### BEST SENTENCE STRUCTURE #####
(S(PRO1 ben ) (VPPAST1(DAT okula ) (VPPAST1 gittim ) ))
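
The bracketed structure above can be read as a tree: each `(` opens a constituent, the first symbol after it is the label, and bare words are leaves. The sketch below turns such a string into nested tuples. The helper names `read_bracketed` and `leaves` are hypothetical and not part of this toolbox, and the reader assumes a well-formed string.

```python
import re


def read_bracketed(text):
    """Turn a bracketed parse string into nested (label, children) tuples."""
    # Tokens are "(", ")", or any run of characters without spaces or parentheses.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1                      # consume the label
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()


def leaves(tree):
    """Collect (tag, word) pairs from left to right."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(leaves(child))
        else:
            pairs.append((label, child))
    return pairs


tree = read_bracketed("(S(PRO1 ben ) (VPPAST1(DAT okula ) (VPPAST1 gittim ) ))")
print(tree[0])       # label of the root constituent
print(leaves(tree))  # tag/word pairs in sentence order
```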

Use in Python

from tr_syntactic_parser.tools.helper import *
from tr_syntactic_parser.tr_parser import TurkishCKYParser

sentence = "..." # put your sentence
sentence = preprocess(sentence) # preprocess the sentence

filename = "tr_syntactic_parser/grammar/grammar.txt" # specify the location of the CNF grammar
parser = TurkishCKYParser(filename) # initialize the parser

parser.parse(sentence) # parse
parser.show_cky_chart() # show filled CKY chart
print("##### BEST SENTENCE STRUCTURE #####")
parser.show_sentence_structure() # show best possible sentence structure
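
The CLI output above suggests that tokens are lowercased and stripped of punctuation before parsing. What the repository's `preprocess` does internally is not shown here, so the following is only a sketch of that kind of normalization. `simple_preprocess` is a hypothetical stand-in, and returning a token list is an assumption.

```python
import string

# Turkish has dotted and dotless i. Plain str.lower() maps "I" to "i" and
# "İ" to "i" plus a combining dot, so both letters are mapped explicitly first.
TURKISH_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def simple_preprocess(sentence):
    """Hypothetical stand-in: Turkish-aware lowercase, drop ASCII punctuation, split."""
    lowered = sentence.translate(TURKISH_LOWER).lower()
    cleaned = lowered.translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


print(simple_preprocess("Ben okula gittim."))
```

Handling the two i letters before calling `lower()` matters for lexicon lookups, since a word such as "IŞIK" would otherwise not match its lowercase lexicon entry.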

Visualization

A parse visualizer class has been implemented using Plotly and Spacy. The parse visualizer has three components.

First of all, run the parser

from tr_syntactic_parser.tools.helper import *
from tr_syntactic_parser.tr_parser import TurkishCKYParser
sentence = "..."
sentence = preprocess(sentence)
filename = "tr_syntactic_parser/grammar/grammar.txt"
parser = TurkishCKYParser(filename)
parser.parse(sentence) # run the parser before reading the tree

terminals = parser.get_terminal_nodes(parser.get_tree()) # Get terminal nodes

All the visualizations can be done easily in a Jupyter Notebook.

POS tag visualizer (powered by Spacy)

Show on Notebook

from tr_syntactic_parser.tools.visualizer import parse_visualizer

visualizer = parse_visualizer() # Initialize 
visualizer.pos_vis(sentence, terminals)

[Image: POS tag visualization]

Sentence structure visualizer (powered by Spacy)

Show on Notebook

from tr_syntactic_parser.tools.visualizer import parse_visualizer

visualizer = parse_visualizer() # Initialize 
visualizer.pos_tree_vis(sentence, parser.tokens, parser.get_tree()) # we need tokens of sentence and root of the tree in that case

[Image: POS tree visualization]

Save Spacy output as PNG

from tr_syntactic_parser.tools.visualizer import parse_visualizer

visualizer = parse_visualizer() # Initialize 

svg = visualizer.pos_vis(sentence, terminals, jupyter=False) # set jupyter=False
# or
svg = visualizer.pos_tree_vis(sentence, parser.tokens, parser.get_tree(), jupyter=False)

from tr_syntactic_parser.tools.helper import spacy_svg2png_save # import function from helpers
spacy_svg2png_save(svg, sentence, output_path = "./") # convert svg to png

Structure tree visualizer (powered by Plotly)

Show on Notebook

from tr_syntactic_parser.tools.visualizer import parse_visualizer

visualizer = parse_visualizer() # Initialize 
visualizer.tree_vis(sentence, parser.tokens, parser.get_tree()) # we need tokens of sentence and root of the tree in that case

[Image: Tree visualization]

Save as

output_file = "..."
visualizer.tree_vis(sentence, parser.tokens, parser.get_tree()).write_image(output_file)

Acknowledgement

Some parts of this tool were created using the Zeyrek Morphology Analyzer and NLTK.
