trivietnlp's Introduction

Table of contents

  1. Introduction
  2. Usage for Python users
  3. Implementation in Python
  4. Experimental results

TriVietNLP: Identify the most popular people in a dataset

Usage for Python users

1. Clone this repository.

git clone https://github.com/khuongvo2305/TriVietNLP

2. Copy your datasets into the /input folder.

3. Run TRIVIETNER.ipynb in Jupyter Notebook or Google Colab.

4. Get the results.

Implementation of TRIVIETNER.ipynb and how to modify it in Python

1. Connect to Google Drive if you are using Google Colab; otherwise skip this step.
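As a minimal sketch of this step, the helper below mounts Google Drive only when the notebook runs inside Colab. The function name `mount_drive_if_colab` and the mount point `/content/drive` are assumptions for illustration, not taken from TRIVIETNER.ipynb.

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; do nothing elsewhere.

    Returns True if a mount was attempted, False outside Colab.
    """
    try:
        # The google.colab package is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)  # prompts for authorization in the notebook
    return True
```

Outside Colab the function simply returns False, so the same notebook cell can run locally in Jupyter.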

2. Extract the input to /data.
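One way to do the extraction in Python is sketched below, assuming the datasets are packed in an archive format that `shutil.unpack_archive` understands (for example .zip or .tar.gz). The helper name `extract_input` and the example archive path are hypothetical.

```python
import shutil
from pathlib import Path


def extract_input(archive_path, dest_dir="data"):
    """Unpack an archive into dest_dir and list the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(dest))
    # Return relative file paths so the caller can check what arrived.
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


# Example call (hypothetical archive name):
# extract_input("input/dataset.zip", "data")
```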

3. Clone VnCoreNLP and Facebook's fastText.

4. Annotate documents:

We use VnCoreNLP to run word segmentation, POS tagging, NER, classification, and dependency parsing on the raw data in data/mnt/doc/doc3/, and save the result as a DataFrame in output/result_labeled_test.csv.
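The sketch below shows how per-token annotations could be flattened into table rows before saving them as CSV. It assumes the annotator returns a dictionary with a `sentences` list of token dictionaries carrying `form`, `posTag`, `nerLabel`, `head`, and `depLabel`; that layout, the column names in `FIELDS`, and the sample tokens are assumptions for illustration and may differ from what the notebook produces.

```python
import csv

# Column order for the flattened table (assumed, not taken from the notebook).
FIELDS = ["doc_id", "sent_id", "index", "form", "posTag", "nerLabel", "head", "depLabel"]


def flatten_annotation(doc_id, annotation):
    """Turn one annotated document into a list of per-token row dicts."""
    rows = []
    for sent_id, sentence in enumerate(annotation["sentences"]):
        for token in sentence:
            row = {"doc_id": doc_id, "sent_id": sent_id}
            for key in FIELDS[2:]:
                row[key] = token.get(key)  # None when the annotator omits a field
            rows.append(row)
    return rows


def write_rows(rows, path):
    """Write the flattened rows to a UTF-8 CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Hand-written sample shaped like an annotation result (not real tool output).
sample = {"sentences": [[
    {"index": 1, "form": "quang_hải", "posTag": "Np", "nerLabel": "B-PER", "head": 2, "depLabel": "sub"},
    {"index": 2, "form": "ghi_bàn", "posTag": "V", "nerLabel": "O", "head": 0, "depLabel": "root"},
]]}
rows = flatten_annotation("doc3/0.txt", sample)
```

The same list of row dictionaries can be passed to `pandas.DataFrame(rows)` if a DataFrame is preferred over the plain CSV writer.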

5. Process annotated documents:

  • Normalize and clean the DataFrames.
  • Find the most popular people and their keywords, and save them in input/4_df_input_getKeyw.csv.
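A minimal sketch of the counting part of this step is shown below. It assumes each token row carries a `form`, an NER tag in `nerLabel` ending in `PER` for person names, and a topic `label` for the source document; these field names and the lowercase normalization are assumptions, and the notebook's own cleaning rules may be more involved.

```python
from collections import Counter


def most_popular_people(rows, top_n=10):
    """Count person mentions per (form, label) and return the most frequent."""
    counts = Counter(
        (row["form"].lower(), row["label"])
        for row in rows
        if row["nerLabel"].endswith("PER")  # keep B-PER / I-PER tokens only
    )
    return [(form, label, n) for (form, label), n in counts.most_common(top_n)]


# Hand-written sample rows (not taken from the real dataset).
sample_rows = [
    {"form": "Trump", "nerLabel": "B-PER", "label": "the_gioi"},
    {"form": "trump", "nerLabel": "B-PER", "label": "the_gioi"},
    {"form": "hà_nội", "nerLabel": "B-LOC", "label": "the_gioi"},
    {"form": "ngọc", "nerLabel": "B-PER", "label": "doi_song"},
]
top_people = most_popular_people(sample_rows)
```

Each returned tuple has the same three columns as the results table: form, label, and counts.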

6. Add more profiles:

You can add more profiles to the labeled profiles stored in PerfectProfile.csv. Each profile has the structure {name: text, alias: list<text>, keyword: list<Dictionary()>, description: text}.
The keyword list holds three dictionaries, each mapping a keyword to its count: keyword[0] contains location keywords, keyword[1] contains person keywords, and keyword[2] contains all other keywords.
Example:
name | alias | keywords | description
Cầu Thủ Quang Hải | [hải,quang_hải] | [{'mỹ_đình':1}, {'quang_hải': 3, 'huỳnh_anh': 3}, {'đối_mặt': 2, 'nhiều': 2, 'áp_lực': 2, 'antifan': 2, 'chuyện': 2, 'tình_cảm': 2}] | Mô tả cầu thủ Quang Hải
In this example the description value reads "Description of footballer Quang Hải".
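To make the profile structure concrete, here is a small sketch that models one profile in Python using the example row. The `Profile` class, the slot constants, and the `add_keyword` helper are hypothetical names introduced for illustration; the project itself stores profiles as rows in PerfectProfile.csv.

```python
from dataclasses import dataclass

# Positions of the three keyword dictionaries, following the description above.
LOCATION, PERSON, OTHER = 0, 1, 2


@dataclass
class Profile:
    name: str
    alias: list
    keywords: list  # [location counts, person counts, other counts]
    description: str = ""

    def add_keyword(self, slot, word, count=1):
        """Increase the count of a keyword in one of the three dictionaries."""
        self.keywords[slot][word] = self.keywords[slot].get(word, 0) + count


quang_hai = Profile(
    name="Cầu Thủ Quang Hải",
    alias=["hải", "quang_hải"],
    keywords=[
        {"mỹ_đình": 1},
        {"quang_hải": 3, "huỳnh_anh": 3},
        {"đối_mặt": 2, "nhiều": 2, "áp_lực": 2, "antifan": 2, "chuyện": 2, "tình_cảm": 2},
    ],
    description="Mô tả cầu thủ Quang Hải",
)
quang_hai.add_keyword(LOCATION, "mỹ_đình")  # the location count goes from 1 to 2
```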

7. Identify people:

We use a heuristic to match the people found in step 5 against the labeled profiles saved in PerfectProfile.csv, and print the results to the console.
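The project's heuristic is not spelled out here, so the sketch below uses a simple stand-in: it scores each profile by the share of a document's keyword counts that also appear in the profile, then picks the highest score. This keyword-overlap score, the function names, and the flattened per-profile `keywords` dictionary are all assumptions for illustration and are not the author's method.

```python
def score_profile(doc_keywords, profile_keywords):
    """Fraction of the document's keyword mass that the profile also contains."""
    total = sum(doc_keywords.values())
    if total == 0:
        return 0.0
    shared = sum(
        min(count, profile_keywords.get(word, 0))
        for word, count in doc_keywords.items()
    )
    return shared / total


def identify(doc_keywords, profiles):
    """Return (best profile name or None, list of scores in profile order)."""
    scores = [score_profile(doc_keywords, p["keywords"]) for p in profiles]
    if not scores or max(scores) == 0:
        return None, scores
    best = max(range(len(scores)), key=scores.__getitem__)
    return profiles[best]["name"], scores


# Hand-written sample profiles and document keywords (illustrative only).
profiles = [
    {"name": "Sơn Tùng M-TP", "keywords": {"ca_sĩ": 3, "mv": 2}},
    {"name": "Cầu Thủ Quang Hải", "keywords": {"mỹ_đình": 1, "bàn_thắng": 2}},
]
doc_keywords = {"ca_sĩ": 2, "mv": 1, "khán_giả": 1}
predicted, scores = identify(doc_keywords, profiles)
```

Returning the full score list alongside the prediction mirrors the console output, which prints one score per profile before the predicted person.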

Results

1. Recognize and display the most popular people in raw datasets:

First 10 lines of the output (header included):
form,label,counts
trump,the_gioi,74
ngọc,doi_song,58
phương,doi_song,39
hồ_duy_hải,phap_luat,24
joe_biden,the_gioi,22
diệu_linh,suc_khoe,21
mai_phương,doi_song,20
hải,phap_luat,19
phanxicô,the_gioi,19

2. Identify most popular people:

We ran our identification algorithm on the 10 documents saved in /input. The results:

Document 0: 
Person: sơn_tùng_m-tp
Score: 
[0.01647653 0.0009299  0.00079761 0.00163723 0.         0.00096869
 0.00207021 0.         0.00162338]
Predict: 
Perdict person: Sơn Tùng M-TP
Description: Nguyễn Thanh Tùng (sinh ngày 5 tháng 7 năm 1994), thường được biết đến với nghệ danh Sơn Tùng M-TP, là một nam ca sĩ, nhạc sĩ và diễn viên người Việt Nam.

Person: nguyễn_thanh_tùng
Score: 
[0.01647653 0.0009299  0.00079761 0.00163723 0.         0.00096869
 0.00207021 0.         0.00162338]
Predict: 
Perdict person: Sơn Tùng M-TP
Description: Nguyễn Thanh Tùng (sinh ngày 5 tháng 7 năm 1994), thường được biết đến với nghệ danh Sơn Tùng M-TP, là một nam ca sĩ, nhạc sĩ và diễn viên người Việt Nam.

-------------------------
Document 1: 
Person: sơn_tùng_m-tp
Score: 
[0.02237971 0.00138163 0.00137664 0.00106146 0.00050607 0.00115116
 0.00148515 0.         0.0018315 ]
Predict: 
Perdict person: Sơn Tùng M-TP
Description: Nguyễn Thanh Tùng (sinh ngày 5 tháng 7 năm 1994), thường được biết đến với nghệ danh Sơn Tùng M-TP, là một nam ca sĩ, nhạc sĩ và diễn viên người Việt Nam.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Reference

VnCoreNLP
Underthesea
Vietnamese Stopwords
