This project aims to build a regression model that predicts the number of views for TED Talks videos on the TED website.

Home Page: https://github.com/SarangGami/TED-Talks-Views-Prediction-Supervised-learning

Best-TED-Talks

Project Summary :-

TED = Technology, Entertainment, Design

TED is a nonprofit organization that posts talks online for free and is devoted to spreading powerful ideas on just about any topic. It started as a conference in 1984, conceived by Richard Saul Wurman, but the first event was unsuccessful. Six years later, in 1990, it came back with a bang. At TED, speakers share their views and ideas with society in talks of 18 minutes or less. As of 2015, TED and its sister TEDx chapters had published more than 2,000 talks for free consumption by the masses, and the speaker list boasts the likes of Al Gore, Jimmy Wales, Shahrukh Khan, and Bill Gates. The dataset used in this project contains over 4,000 TED talks, including transcripts in many languages.
            Slogan of TED :- IDEAS WORTH SPREADING

Objective :

  • The main objective is to build a regression model that predicts the number of views of the videos uploaded on the TED website.

Dataset info :

  • Number of records: 4,005
  • Number of features: 19

The dataset contains features like :

  • talk_id: A unique identifier for each TED Talk video.
  • title: The title of the talk.
  • speaker_1: The primary speaker for the talk.
  • all_speakers: A list of all the speakers for the talk.
  • occupations: The occupations of the speakers.
  • about_speakers: Information about the speakers, such as their backgrounds and expertise.
  • recorded_date: The date the talk was recorded.
  • published_date: The date the talk was published on the TED Talks YouTube channel.
  • event: The name of the TED event where the talk was given.
  • native_lang: The language the talk was given in.
  • available_lang: The languages the talk is available in.
  • duration: The length of the video, in seconds.
  • topics: The topics covered in the talk.
  • related talks: Other TED Talks that are related to this talk.
  • url: The URL of the video.
  • description: A brief description of the talk.
  • transcript: A transcript of the talk.

Target Variable :

  • views: The number of views the video has received.

Project Work flow :-

  • Importing necessary libraries
  • Data wrangling
      1. Gathering data 
          - CSV and other files 
          - APIs 
          - Web scraping 
          - Databases 
      2. Assessing data
      3. Cleaning data 
  • EDA and feature engineering
  • Feature transformation and selection
  • Removing multicollinearity
  • Model implementation and pre-processing
      1. Train-test split
      2. Preprocessing using a column transformer
      3. Building the best pipeline
  • Fitting the regression models and hyperparameter tuning
  • Final selection of the model
  • Conclusion
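
The pre-processing and modelling steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, and the column `num_languages` and the stand-in `views` target are hypothetical. Only `duration` and `native_lang` are names taken from the dataset description.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption): the real project uses the TED Talks dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "duration": rng.integers(120, 1200, size=n),       # seconds
    "num_languages": rng.integers(1, 40, size=n),      # hypothetical engineered feature
    "native_lang": rng.choice(["en", "es", "fr"], size=n),
})
# Hypothetical target loosely tied to one feature, plus noise.
df["views"] = 1000.0 * df["num_languages"] + rng.normal(0, 500, size=n)

X = df.drop(columns="views")
y = df["views"]

# 1. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Preprocessing with a column transformer:
#    scale numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["duration", "num_languages"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["native_lang"]),
    ]
)

# 3. Pipeline chaining preprocessing and a regressor
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=42)),
])

model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # R2 on the held-out split
print(f"Test R2 on synthetic data: {test_r2:.3f}")
```

Wrapping the preprocessing inside the pipeline means the scaler and encoder are fitted on the training split only, which avoids leaking test information into the model.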

Algorithms used for ML model implementation :-

  • Linear Regression
  • Ridge Regression (L2)
  • Lasso Regression (L1)
  • DecisionTreeRegressor
  • RandomForestRegressor
  • AdaBoostRegressor
  • GradientBoostingRegressor
  • XGBRegressor
  • VotingRegressor
  • StackingRegressor
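
The last two entries combine several base models into one. As a hedged sketch of how they can be set up with scikit-learn (the choice of base estimators, their settings, and the synthetic data here are assumptions, not the configuration used in the notebook):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption): stands in for the prepared TED features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative base estimators; both ensembles clone them before fitting.
base_estimators = [
    ("ridge", Ridge(alpha=1.0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
]

# VotingRegressor averages the predictions of the base estimators.
voting = VotingRegressor(estimators=base_estimators)

# StackingRegressor trains a final estimator on cross-validated
# predictions of the base estimators.
stacking = StackingRegressor(
    estimators=base_estimators,
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)

scores = {}
for name, estimator in [("voting", voting), ("stacking", stacking)]:
    estimator.fit(X_train, y_train)
    scores[name] = estimator.score(X_test, y_test)  # R2 on the test split
    print(f"{name}: test R2 = {scores[name]:.3f}")
```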

Best Models to Achieve our Objective :-

🥇 RandomForestRegressor with hyperparameter tuning 🥇

[Figure: random forest pipeline]

Training data R2 and Adjusted R2 Score

  • R2 score 0.9108
  • Adjusted R2 score 0.9106

Testing data R2 and Adjusted R2 Score

  • R2 score 0.8977
  • Adjusted R2 score 0.8968

Cross-validation score

  • 0.8974

The performance metrics

  • MAE 0.2613
  • MSE 0.1055
  • RMSE 0.3249

🥈 VotingRegressor with hyperparameter tuning 🥈

[Figure: voting regressor pipeline]

Training data R2 and Adjusted R2 Score

  • R2 score 0.9109
  • Adjusted R2 score 0.9107

Testing data R2 and Adjusted R2 Score

  • R2 score 0.8981
  • Adjusted R2 score 0.8972

Cross-validation score

  • 0.8977

The performance metrics

  • MAE 0.2615
  • MSE 0.1051
  • RMSE 0.3243
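
For reference, the metrics reported for both models can be computed as in the sketch below. It uses the standard definitions, with adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1) for n samples and p features. The function name `regression_metrics` and the toy numbers are illustrative assumptions and are unrelated to the project's results.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return R2, adjusted R2, MAE, MSE and RMSE for paired values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalises R2 for the number of features used.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = ss_res / n
    return {
        "r2": r2,
        "adjusted_r2": adj_r2,
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


# Toy values chosen only to make the arithmetic easy to follow.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.0, 7.5, 9.0, 11.0]
metrics = regression_metrics(y_true, y_pred, n_features=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because adjusted R2 shrinks as more features are added without a matching gain in fit, it sits slightly below plain R2 in the scores listed above.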

The full solution to this capstone project on TED Talks views prediction using supervised learning, which uses 10 different algorithms with detailed explanations and conclusions, is available in the project repository.


