olofjons / ai-lab3

This project forked from dave3625-22h/lab-3

NLP time! Using modern ML libraries to do beginner-level topic modelling on YouTube titles, with data from a Kaggle dataset.

Jupyter Notebook 100.00%

ai-lab3's Introduction

Lab 3 - NLP Topic Modeling (Part 1?)

Warning

Some of you might have felt like we have been throwing you to the sharks in the previous labs. Anyways...

Get ready for sharknado!!!

This lab might seem very hard / very complex.

That's okay. You are not expected to understand all of it, only to follow the instructions given in the notebook. The goal is to give you perspective on how to use the tools you have learned in the previous labs, and to show you a potential NLP use case.

Please do not expect full understanding; rather, try to look at this lab as an exploration of what an NLP use case might look like.

If you want to understand everything, you are free to spend your own time inspecting, testing and playing around with the code to learn from it, but there is no way we could teach you a thorough understanding of the concepts used in this lab in the four hours that you have to work on it. Remember that this is an introductory course.

To give some perspective on how impossible it would be to give you a full understanding: it took me maybe 10 - 15 hours to learn what I needed to use plotly the way it is used in this lab, maybe 10 - 15 hours to understand how the sentence transformer really worked, which functions to use and how to use them, then 10ish hours of playing with UMAP, 5 - 10 hours of looking at clustering algorithms, and 5ish hours to understand how tf-idf worked and how it was used to find topics. And that was after having taken this course fully, learning these things in a context where I was surrounded by mentors and people who knew way more than me. These time estimates also do not count the time needed to figure out that these were the things to use and learn, among all the other things I tried while shaping my understanding. Consider this a curation of things that might be cool to know about, not a list of things you need to understand.

Please understand that we are not giving you these labs for you to learn and understand everything, but more as a way to show you how things can be done. A complete understanding of ML is not in the scope of this course, so it will not be provided, but you are free to learn more about it on your own time, and to use this notebook as a guide on what to learn, or at least what direction to go in when learning.

Please also understand that what we show is a simplified version of a real ML pipeline that takes in real data and provides real, valuable predictions and results. It's just hard to make it much simpler without making it unclear what you are doing or what results the tools you are using can provide.

You are not supposed to understand everything in depth, only to learn how to use the tools and get a reasonable idea of how to apply them to a real-world problem, or at least an idea of where to start.

New Imports

Here are the commands needed to install the new packages:

# Windows specific command
conda install pytorch cudatoolkit=11.6 -c pytorch -c conda-forge

# Mac specific command
conda install pytorch -c pytorch

# These three commands are for both Mac and Windows. Feel free to try to combine them into a one-liner for a bit of a conda challenge
conda install -c conda-forge -c plotly sentence-transformers umap-learn hdbscan plotly
conda install -c conda-forge nbformat
conda install -c conda-forge python-kaleido 
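
Once the installs have finished, a quick way to check that everything landed in the active environment is to ask Python whether each package can be found. This is only a small sketch, not part of the lab notebook: the helper name `missing_packages` is made up for illustration, and the mapping from conda package name to import name is an assumption based on each project's usual import name.

```python
from importlib.util import find_spec

# Conda package name -> Python import name.
# ASSUMPTION: these are the usual import names for the packages installed above.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "sentence-transformers": "sentence_transformers",
    "umap-learn": "umap",
    "hdbscan": "hdbscan",
    "plotly": "plotly",
    "nbformat": "nbformat",
    "python-kaleido": "kaleido",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All lab packages are importable.")
```

Run it from the same conda environment you use for the notebook. If something is listed as missing, re-run the matching install command above.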

Steps to Finish Notebook

TBA

Finished Notebook Screenshot

You should have a topic plot like this:

Topic Plot

And a topic listing per cluster like this:

Topic Listing
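
To give a rough feeling for where such a listing can come from, here is a minimal sketch of scoring words per cluster with a tf-idf style measure. It is not the notebook's code: it assumes the titles have already been assigned cluster labels, treats each cluster as one big document, and uses plain Python instead of any library. The function name `top_words_per_cluster` and the example titles are made up for illustration.

```python
import math
from collections import Counter


def top_words_per_cluster(titles, labels, n_words=3):
    """Rank the words of each cluster with a simple tf-idf style score."""
    # Merge all titles in a cluster into one bag of words.
    cluster_counts = {}
    for title, label in zip(titles, labels):
        cluster_counts.setdefault(label, Counter()).update(title.lower().split())

    n_clusters = len(cluster_counts)
    # In how many clusters does each word appear?
    cluster_freq = Counter(
        word for counts in cluster_counts.values() for word in counts
    )

    topics = {}
    for label, counts in cluster_counts.items():
        total = sum(counts.values())
        # Term frequency inside the cluster times inverse cluster frequency.
        scores = {
            word: (count / total) * math.log(n_clusters / cluster_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        topics[label] = ranked[:n_words]
    return topics


# Hypothetical titles and cluster labels, invented for this example.
titles = [
    "python tutorial for beginners",
    "python tutorial advanced",
    "easy pasta recipe for dinner",
    "quick pasta recipe",
]
labels = [0, 0, 1, 1]
print(top_words_per_cluster(titles, labels, n_words=2))
# prints {0: ['python', 'tutorial'], 1: ['pasta', 'recipe']}
```

Words that show up in every cluster (like "for" here) get a score of zero, which is why they do not end up describing any topic.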

ai-lab3's People

Contributors

ulsin, trashulsin
