comp472win20p2's Introduction

COMP472WIN20P2

COMP 472 win2020 team project 2

AI with Naive Bayes classification for natural language processing

Team SkAI360

members:

name        student id   email               git id
Yefei Xue   26433979     [email protected]   felixyf0124
Kevin Lin   40002383     [email protected]   AznBoy00

due dates:

Deliverable 1: Apr 5th

Build a variety of models to identify the language of tweets, and report their performance on the initial test set (the one available on Moodle).

Deliverable 2: Apr 20th

Evaluate and analyze the results of the above models on a new test set that will be given to you at demo time. This involves analysis, report writing and an oral presentation.


to run the given models with the given hyper-parameter sets:

python3 main.py

to run our own model:

python3 try.py

all output files are in the directory:

./SkAI360/output/

comp472win20p2's Issues

[Main] Analysis

Description

In order for your analysis to include the results of your models with the test set used at the demo, it will be due as Deliverable 2, after the demos.

task breakdown

  • Report #3
  • other improvements & analysis

PR:

[Main] Building the Models

Description

Building the Models
Write a Python program to build a variety of Naive Bayes classifiers using the training set available on Moodle.
To avoid arithmetic underflow, you should work in log10 space.
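
The underflow problem can be illustrated with a minimal sketch. This is not the project's actual code: the function name `log10_score` and the toy probabilities are hypothetical, chosen only to show why summing log10 values is safer than multiplying raw probabilities.

```python
import math

def log10_score(prior, likelihoods):
    """Return log10(prior) plus the sum of log10 of each likelihood."""
    return math.log10(prior) + sum(math.log10(p) for p in likelihoods)

# Made-up values: 400 n-grams, each with probability 1e-3 under one class.
likelihoods = [1e-3] * 400

# Multiplying the raw probabilities underflows double precision.
product = 0.5
for p in likelihoods:
    product *= p

score = log10_score(0.5, likelihoods)
print(product)  # 0.0, the information is lost
print(score)    # about -1200.3, still usable for comparing classes
```

Because log10 is monotonic, the class with the highest summed score is also the class with the highest product, so the classification decision is unchanged.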

task breakdown

  • 1. The input
  • 2. The Hyper-Parameters #4
  • 3. BYOM: Build Your Own Model
  • input and output handler
  • main compiler

PR:

Hyper-Parameters input

Parent issue #1

Description

All your models will be based on the same Naive Bayes classifier and will differ only in the hyperparameters used.

task breakdown

  • Vocabulary (V)
  • Size of n-grams
  • Smoothing value
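
As a rough sketch of how the three hyperparameters could interact, the snippet below is illustrative only. The issue shows the actual options only as images, so several things here are assumptions: that the n-grams are character n-grams, that the vocabulary is the lowercase letters a-z, and that smoothing is additive (add-delta) over all possible n-grams. The names `char_ngrams` and `smoothed_log10_prob` are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n, vocabulary):
    """Yield character n-grams whose characters are all in the vocabulary."""
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if all(ch in vocabulary for ch in gram):
            yield gram

def smoothed_log10_prob(gram, counts, total, n, vocab_size, delta):
    """Add-delta smoothed log10 probability over vocab_size**n possible n-grams."""
    return math.log10((counts[gram] + delta) / (total + delta * vocab_size ** n))

vocabulary = set("abcdefghijklmnopqrstuvwxyz")  # assumed vocabulary
n, delta = 2, 0.5                               # assumed n-gram size and smoothing value

# Bigrams containing the space are skipped because ' ' is not in the vocabulary.
counts = Counter(char_ngrams("hello world", n, vocabulary))
total = sum(counts.values())

seen = smoothed_log10_prob("he", counts, total, n, len(vocabulary), delta)
unseen = smoothed_log10_prob("zz", counts, total, n, len(vocabulary), delta)
print(total, seen, unseen)
```

With a non-zero smoothing value, an n-gram never seen in training still gets a finite log probability instead of log10(0).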

PR:

[Doc] Report

Parent issue #2

Description

The report will be used to describe your own model and analyse the results of the models. The intended audience of your report is me (your prof) and your TAs. Hence there is no need to explain the theory behind the models.
Your report should focus on your work and the comparison of the performance of the models when the hyper-parameters are modified. Your report should be 4-6 pages (without references and appendices) and use the template provided on Moodle. The report should contain at least the following:

task breakdown

  • 1/2 ~ 1 page: Introduction and technical details.
  • 1/2 ~ 1 page: An analysis of the initial dataset given on Moodle, and the one given at the demo time. If there is anything particular about these datasets that might have an impact on the performance of some models, explain it.
  • 1/2 ~ 1 page: A motivation and description of your model. Explain its hyper-parameters and why you chose them.
  • 2 ~ 3 pages: An analysis of the results of all the models with the demo-test set and the initial test set. In particular, compare and contrast the performance of each model with one another, and with the initial and demo test sets. Please note that your report must be analytical. This means that in addition to stating the facts (e.g. the macro-F1 has this value), you should also analyse them (i.e. explain why some metric seems more appropriate than another, or why your model did not do as well as expected with the test set given at the demo, ...). Tables and graphs would be very welcome here. A confusion matrix would be a great tool for the analysis.
  • 1/2 page: In the case of teamwork, a description of the responsibilities and contributions of each team member.
  • Your report should have a reference section (not included in the page count) that properly cites all relevant resources that you have consulted (books, websites, ...), even if it was just to inspire you. Failure to properly cite your references constitutes plagiarism and will be reported.
  • Use appendices (not included in the page count), if you wish to show additional tables or graphs.
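
For the results analysis, the confusion matrix and macro-F1 mentioned above could be computed along these lines. This is a minimal sketch, not the project's evaluation code: the function names and the language labels in the toy data are made up for illustration.

```python
from collections import Counter

def confusion_matrix(gold, predicted):
    """Count how often each (gold label, predicted label) pair occurs."""
    return Counter(zip(gold, predicted))

def macro_f1(gold, predicted):
    """Average the per-class F1 scores, weighting every class equally."""
    labels = sorted(set(gold) | set(predicted))
    cm = confusion_matrix(gold, predicted)
    f1_scores = []
    for label in labels:
        tp = cm[(label, label)]
        fp = sum(cm[(g, label)] for g in labels if g != label)
        fn = sum(cm[(label, p)] for p in labels if p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy data with made-up labels.
gold = ["en", "en", "es", "es", "pt"]
predicted = ["en", "es", "es", "es", "pt"]

cm = confusion_matrix(gold, predicted)
result = macro_f1(gold, predicted)
print(cm[("en", "es")])  # number of "en" tweets misclassified as "es"
print(result)
```

Macro-F1 gives each class equal weight regardless of how many examples it has, which is one reason it may be more informative than accuracy when the classes are unbalanced.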

PR:
N/A
