COMP 472 win2020 team project 2

Python 100.00%

comp472win20p2's Introduction

COMP472WIN20P2

COMP 472 win2020 team project 2

AI with Naive Bayes classification for natural language processing

Team SkAI360

members:

name	student id	email	git id
Yefei Xue	26433979	[email protected]	felixyf0124
Kevin Lin	40002383	[email protected]	AznBoy00

git repo https://github.com/felixyf0124/COMP472WIN20P2

Due dates:

Deliverable 1: Apr 5th

Build a variety of models to identify the language of tweets, and report their performance on the initial test set (the one available on Moodle).

Deliverable 2: Apr 20th

Evaluate and analyze the results of the above models on a new test set that will be given to you at demo time. This involves analysis, report writing and an oral presentation.

To run the given models with the given hyperparameter sets:

python3 main.py

To run our own model:

python3 try.py

All output files are written to the directory:

./SkAI360/output/

comp472win20p2's Issues

Description

In order for your analysis to include the results of your models with the test set used at the demo, it will be due as Deliverable 2, after the demos.

task breakdown

Report #3
other improvements & analysis

PR:

Description

Building the Models
Write a Python program to build a variety of Naive Bayes classifiers using the training set available on Moodle.
To avoid arithmetic underflow, you should work in log10 space.
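
The underflow problem mentioned above can be illustrated with a minimal sketch. It is not the project's actual code: the function names `log10_score` and `classify`, the model layout (prior, conditional probabilities, probability for unseen features), and the toy numbers are all assumptions made for illustration.

```python
import math

def log10_score(features, prior, cond_probs, unseen_prob):
    """Score one class by summing log10 probabilities.

    Summing logs replaces the product P(class) * prod P(f | class),
    which underflows to 0.0 for long inputs.
    """
    score = math.log10(prior)
    for f in features:
        score += math.log10(cond_probs.get(f, unseen_prob))
    return score

def classify(features, model):
    """Return the class with the highest log10 score.

    `model` maps a class label to (prior, cond_probs, unseen_prob);
    this layout is hypothetical.
    """
    return max(model, key=lambda label: log10_score(features, *model[label]))

# Toy, made-up probabilities for two languages.
toy_model = {
    "en": (0.5, {"t": 0.3, "h": 0.2, "e": 0.4}, 0.01),
    "es": (0.5, {"e": 0.3, "l": 0.3}, 0.01),
}
predicted = classify(list("the"), toy_model)

# Underflow demonstration: 400 features, each with probability 0.1.
raw_product = 0.5
for _ in range(400):
    raw_product *= 0.1          # shrinks below the smallest float, becomes 0.0
log_score = log10_score(["x"] * 400, 0.5, {"x": 0.1}, 0.01)  # stays finite
```

Because log10 is monotonic, comparing the summed scores picks the same class as comparing the raw products would, as long as those products were representable.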

task breakdown

PR:

Description

Your models will all be based on the same Naive Bayes classifier, but will differ only on the hyperparameters used.

task breakdown

Vocabulary (V)
Size of n-grams
Smoothing value
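
A rough sketch of how these three hyperparameters could interact is given below. Several things here are assumptions, since the text above does not specify them: character-level n-grams, a vocabulary of lowercase ASCII letters, additive (add-delta) smoothing, and the helper names `char_ngrams` and `train`.

```python
import math
from collections import Counter

def char_ngrams(text, n, vocab):
    """Return the character n-grams of `text` made only of vocabulary characters."""
    grams = []
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if all(c in vocab for c in gram):
            grams.append(gram)
    return grams

def train(samples, n, vocab, delta):
    """Build per-language log10 probabilities with add-delta smoothing.

    `samples` is a list of (language, text) pairs and `delta` must be > 0.
    Returns {language: (log10 prior, {ngram: log10 prob}, log10 unseen prob)}.
    """
    counts, docs = {}, Counter()
    for lang, text in samples:
        docs[lang] += 1
        counts.setdefault(lang, Counter()).update(char_ngrams(text, n, vocab))
    bins = len(vocab) ** n      # number of possible n-grams over the vocabulary
    model = {}
    for lang, c in counts.items():
        denom = sum(c.values()) + delta * bins
        log_probs = {g: math.log10((k + delta) / denom) for g, k in c.items()}
        log_prior = math.log10(docs[lang] / len(samples))
        model[lang] = (log_prior, log_probs, math.log10(delta / denom))
    return model

vocab = set("abcdefghijklmnopqrstuvwxyz")       # assumed vocabulary
bigrams = char_ngrams("hi you", 2, vocab)       # n-grams containing the space are dropped
toy = train([("en", "the"), ("es", "el")], 1, vocab, 0.5)
```

A larger vocabulary or n-gram size increases the number of possible n-grams, so the same smoothing value moves more probability mass to unseen n-grams.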

PR:

Description

The report will be used to describe your own model and analyse the results of the models. The intended audience of your report is me (your prof) and your TAs. Hence there is no need to explain the theory behind the models.
Your report should focus on your work and the comparison of the performance of the models when the hyper-parameters are modified. Your report should be 4-6 pages (without references and appendices) and use the template provided on Moodle. The report should contain at least the following:

task breakdown

1/2 ~ 1 page: Introduction and technical details.
1/2 ~ 1 page: An analysis of the initial dataset given on Moodle, and the one given at the demo time. If there is anything particular about these datasets that might have an impact on the performance of some models, explain it.
1/2 ~ 1 page: A motivation and description of your model. Explain its hyper-parameters and why you chose them.
2 ~ 3 pages: An analysis of the results of all the models with the demo-test set and the initial test set. In particular, compare and contrast the performance of each model with one another, and with the initial and demo test sets. Please note that your report must be analytical. This means that in addition to stating the facts (e.g. the macro-F1 has this value), you should also analyse them (i.e. explain why some metric seems more appropriate than another, or why your model did not do as well as expected with the test set given at the demo . . . ). Tables and graphs would be very welcome here. A confusion matrix would be a great tool for the analysis.
1/2 page: In the case of teamwork, a description of the responsibilities and contributions of each team member.
Your report should have a reference section (not included in the page count) that properly cites all relevant resources that you have consulted (books, Web sites . . . ), even if it was just to inspire you. Failure to properly cite your references constitutes plagiarism and will be reported.
Use appendices (not included in the page count), if you wish to show additional tables or graphs.
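
The confusion matrix and macro-F1 mentioned in the breakdown above could be computed along these lines. This is only a sketch on made-up labels; the function names are hypothetical and it does not reproduce the project's evaluation code.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))

def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for lab in labels:
        tp = cm[(lab, lab)]
        fp = sum(cm[(t, lab)] for t in labels if t != lab)
        fn = sum(cm[(lab, p)] for p in labels if p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[lab] = 2 * precision * recall / total if total else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Made-up labels for illustration only.
y_true = ["en", "en", "es", "es"]
y_pred = ["en", "es", "es", "es"]
cm = confusion_matrix(y_true, y_pred)
f1 = per_class_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
```

Macro-F1 gives every class the same weight regardless of how many tweets it has, which is one reason it can diverge from accuracy on an imbalanced test set.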

PR:
N/A
