ngocmphan / reddit_immigration

Analysis of subreddits on Canadian immigration programs.

Home Page: https://redditimmigration.streamlit.app/

Python 100.00%
canada data-analysis-python immigration python reddit redditapi

Reddit Immigration

Result Summary Dashboard

[Image: Summary Dashboard]

Topic

This project explores the issues and topics discussed on Reddit about immigrating to Canada. Immigration here covers the early inquiry stage, the application stage, the processing stage, and the results stage.

Problem statement and background

The permanent residence (PR) process is lengthy: it runs from preparing the experience, skills, and finances required by the immigration program to the point where the applicant obtains permanent residence. The whole process can take up to six years and involves significant financial decisions: application fees, lawyer fees, self-sustaining expenses, etc.

Because the PR process requires significant preparation, applicants have many questions about it. The ideal option is to consult a lawyer or hire a legal firm to prepare the application on the applicant's behalf; however, this path is known to be costly. It is therefore very common for people to post their questions about the process on social media or public forums. This project aims to understand the questions and concerns posted on Reddit, with emphasis on commonly raised issues, the topics mentioned, and general reviews of the process or points of improvement.

The PR process can include programs such as Express Entry, the Provincial Nominee Program (PNP, run by individual provinces), and Public Policy Pathways. Besides the PR process, there are other immigration processes for obtaining legal status in Canada, for example the Work Permit, Study Permit, and Visitor Visa processes.

The analysis covers 10 years, from 2013 to 2023; the data was extracted on March 4th, 2023 using PMAW. Note that the ImmigrationCanada subreddit was created on February 5th, 2013, and that the Public Policy Pathway is no longer available.
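The analysis window above can be expressed as Unix timestamps, the same form that `created_utc` uses. This is a minimal sketch, assuming the dates are interpreted in UTC and the window starts on the subreddit's creation date. The names `WINDOW_START`, `WINDOW_END`, `to_epoch`, and `in_window` are illustrative and not taken from the project code.

```python
from datetime import datetime, timezone

# Assumed window bounds, taken from the dates stated above (interpreted as UTC).
WINDOW_START = datetime(2013, 2, 5, tzinfo=timezone.utc)  # subreddit creation date
WINDOW_END = datetime(2023, 3, 4, tzinfo=timezone.utc)    # extraction date


def to_epoch(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(moment.timestamp())


def in_window(created_utc: float) -> bool:
    """Return True when a post's created_utc falls inside the analysis window."""
    return to_epoch(WINDOW_START) <= created_utc <= to_epoch(WINDOW_END)
```

Keeping the bounds as timezone-aware datetimes avoids the local-time offset that naive datetimes would introduce when converted to timestamps.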

Data sources and variables

The data is obtained directly from Reddit through the Reddit API. The project used PRAW (Python Reddit API Wrapper) to access Reddit directly. The following variables are used in the analysis:

  • selftext: Content of the original post.
  • author_fullname: Account name of the author.
  • title: Title of the post.
  • ups: Number of upvotes on the post (note that PMAW reports this as score).
  • link_flair_text: Program category of the post (Express entry, Citizenship, Visitor Visa, Sponsorship, PNP, Meta, Public Policy pathways, Study Permit, Work Permit, Quebec, Covid-19, Other).
  • upvote_ratio: Ratio of upvotes to total votes on the post.
  • media_embed: Embedded links in the post.
  • created_utc: Post creation time as a Unix timestamp.
  • num_comments: Number of comments on the post.
  • id: Unique ID of the post.
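The variables above can be gathered into one record per post. This is a minimal sketch, assuming each post arrives as a plain dictionary and that a record from PMAW carries `score` where `ups` is expected. The helper `normalize_post`, the derived `created_at` field, and the sample record are hypothetical and not part of the project code.

```python
from datetime import datetime, timezone

# Fields listed above; PMAW is noted to report upvotes as "score" rather than "ups".
FIELDS = ("selftext", "author_fullname", "title", "ups", "link_flair_text",
          "upvote_ratio", "media_embed", "created_utc", "num_comments", "id")


def normalize_post(raw: dict) -> dict:
    """Keep only the analysis fields and add a readable UTC timestamp."""
    post = {name: raw.get(name) for name in FIELDS}
    # Fall back to "score" when "ups" is absent (assumed PMAW behaviour).
    if post["ups"] is None:
        post["ups"] = raw.get("score")
    if post["created_utc"] is not None:
        post["created_at"] = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
    else:
        post["created_at"] = None
    return post


# Made-up record for illustration only; not real subreddit data.
sample = {"id": "abc123", "title": "Example title", "score": 7,
          "link_flair_text": "Express Entry", "created_utc": 1677888000}
clean = normalize_post(sample)
```

Missing fields are kept as `None` rather than dropped, so every record has the same keys regardless of which wrapper produced it.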

Language and Tools used

  • Python and related libraries (pandas, matplotlib, etc.)
  • PRAW - Python Reddit API Wrapper
  • PMAW - Python Pushshift API Wrapper

Deliverables

Understanding of the posts and questions asked: Issues/questions, topics, and immigration programs

  • Which topics/programs have the most discussion over time (all time, by year, current year)?
  • What are the most frequently mentioned issues in Canadian immigration applications overall, and by program (all time, by year, current year)?
  • Sentiment analysis: positive or negative reviews of the experience/process by program, including sentiment per program by year.
  • Is there a specific time of year (seasonality) when a given program needs more attention because of the number of questions received?
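The discussion-volume and seasonality questions above both reduce to counting posts per program within a time bucket. This is a minimal sketch, assuming posts are dictionaries with `link_flair_text` and `created_utc`, and assuming that posts without a flair are grouped under "Other". The function name `count_by_program_and_period` and the sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def count_by_program_and_period(posts, period="year"):
    """Count posts per (link_flair_text, year) or (link_flair_text, month)."""
    counts = Counter()
    for post in posts:
        created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        bucket = created.year if period == "year" else created.month
        # Assumption: posts without a flair are counted under "Other".
        program = post.get("link_flair_text") or "Other"
        counts[(program, bucket)] += 1
    return counts


# Made-up records for illustration only; not real subreddit data.
posts = [
    {"link_flair_text": "Express Entry", "created_utc": 1677888000},  # 2023-03-04
    {"link_flair_text": "Express Entry", "created_utc": 1360022400},  # 2013-02-05
    {"link_flair_text": "Study Permit", "created_utc": 1677888000},   # 2023-03-04
    {"link_flair_text": None, "created_utc": 1677888000},             # no flair
]
by_year = count_by_program_and_period(posts, "year")
by_month = count_by_program_and_period(posts, "month")
```

Bucketing by calendar month across all years is one simple way to look for seasonality; it does not adjust for the subreddit's overall growth over time.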

Points of improvement summary

  • Topics that are considered controversial, which might indicate unclear instructions and a lack of resources (all time, by year, current year).
  • Which part of the process is the pain point? Ranked by issue.
  • Which program requires the most improvement? Based on the number of applications and an analysis of pain points.
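The list above does not say how "controversial" is measured. One possible heuristic, offered purely as an assumption, is to treat a post as more controversial when it has many comments and a low `upvote_ratio`, then average that score per program. The functions `controversy_score` and `rank_programs` and the sample records are hypothetical and do not describe the project's actual method.

```python
from collections import defaultdict


def controversy_score(post):
    """Assumed heuristic: many comments combined with a low upvote ratio."""
    return post["num_comments"] * (1.0 - post["upvote_ratio"])


def rank_programs(posts):
    """Return (program, mean score) pairs, highest mean score first."""
    scores = defaultdict(list)
    for post in posts:
        scores[post["link_flair_text"]].append(controversy_score(post))
    means = {program: sum(vals) / len(vals) for program, vals in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up records for illustration only; not real subreddit data.
posts = [
    {"link_flair_text": "PNP", "num_comments": 40, "upvote_ratio": 0.5},
    {"link_flair_text": "PNP", "num_comments": 10, "upvote_ratio": 0.5},
    {"link_flair_text": "Citizenship", "num_comments": 20, "upvote_ratio": 1.0},
]
ranking = rank_programs(posts)
```

Averaging per program keeps a single heavily commented post from dominating the ranking of a program that has many posts; other aggregations, such as the median, would be equally defensible.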

Visualization deliverable: Summary dashboard

The summary dashboard is available at: https://redditimmigration.streamlit.app/
