
mawada-sweis / clustering-analysis


Clustering Analysis is used to analyze the travel behaviors, preferences, and attitudes of New York City citizens.

Python 0.29% Jupyter Notebook 99.71%
cluster-analysis

clustering-analysis's Introduction

Hi 👋, I'm Mawadda

A passionate AI Engineer 🤖 and Backend developer 👨‍💻


mawada-sweis

  • 🔭 I’m currently working on Smart task management system
  • 👯 Proud of the machine learning project Predict car price
  • 🌱 I’m currently learning Power BI, Docker, and other tools.
  • 💬 Ask me about Machine learning, Data science, and Project management.

My Skill Set

Frontend

Bootstrap CSS3 HTML5 Angular

Frameworks and Packages

.Net Core TensorFlow pytorch OpenCV Keras

Backend

Python C# Node.js Flask

DevOps

Git Docker

clustering-analysis's People

Contributors

eleenkmail, mawada-sweis, ramahasiba, zubaidasader

clustering-analysis's Issues

Discuss result

Meet to discuss the clustering results for various algorithms and our progress.

skeleton folder

Having a well-organized project structure makes it easy to understand and make changes.
In general, these are the basic folders we will work on:

data, models, notebooks, utils and src folders.
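As a minimal sketch, the folder layout named above could be created with the standard library. The function name `create_skeleton` and the root path are assumptions for illustration, not part of the repository.

```python
from pathlib import Path

# Folder names taken from the issue text.
FOLDERS = ["data", "models", "notebooks", "utils", "src"]


def create_skeleton(root):
    """Create the basic project folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_skeleton(".")` from the repository root would create any folders that are still missing and leave existing ones untouched.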

EDA

  • Plot the percentage of missing data per column (before and after cleaning).
  • Plot the value counts of each column.
  • Plot the correlation graph.
  • Plot a boxplot for each numeric column.
  • Plot the distributions.
  • Write up insights and conclusions.
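The tables behind several of these plots can be computed with pandas before any plotting. The sketch below uses a toy frame with hypothetical column names (`age`, `mode`, `trips`) and stands in for cleaning with a plain `dropna()`; the real CMS columns and cleaning steps will differ.

```python
import pandas as pd


def missing_percentage(df):
    """Percentage of missing values in each column."""
    return df.isna().mean() * 100


def missing_before_after(raw, cleaned):
    """Side-by-side missing-data percentages before and after cleaning."""
    return pd.DataFrame({
        "before": missing_percentage(raw),
        "after": missing_percentage(cleaned),
    })


# Toy data; column names are placeholders, not the survey's real columns.
raw = pd.DataFrame({
    "age": [25, 40, None, 31],
    "mode": ["bus", "bike", "bus", None],
    "trips": [2, 5, 3, 4],
})
cleaned = raw.dropna()  # stand-in for the real cleaning step

comparison = missing_before_after(raw, cleaned)
mode_counts = raw["mode"].value_counts(dropna=False)
correlation = raw.select_dtypes(include="number").corr()

# With matplotlib installed, each table can be plotted directly, e.g.
# comparison.plot.bar(), mode_counts.plot.bar(), raw.boxplot(), raw.hist().
```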

Transformation

Feature engineering includes:

  • One hot encoding on categorical features (Nominal).
  • Extract the day, month and year features from each date feature.
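Both steps can be sketched with pandas as follows. The helper `engineer_features` and the columns `travel_mode` and `travel_date` are hypothetical; the actual nominal and date columns have to be chosen from the dataset.

```python
import pandas as pd


def engineer_features(df, nominal_cols, date_cols):
    """One-hot encode nominal columns and split date columns into parts."""
    out = df.copy()
    for col in date_cols:
        dates = pd.to_datetime(out[col])
        out[f"{col}_day"] = dates.dt.day
        out[f"{col}_month"] = dates.dt.month
        out[f"{col}_year"] = dates.dt.year
    # The raw date column is dropped once its parts are extracted.
    out = out.drop(columns=date_cols)
    return pd.get_dummies(out, columns=nominal_cols)


# Toy data with placeholder column names.
trips = pd.DataFrame({
    "travel_mode": ["bus", "bike", "bus"],
    "travel_date": ["2019-05-01", "2019-06-15", "2019-06-16"],
})
features = engineer_features(trips, ["travel_mode"], ["travel_date"])
```

One-hot encoding suits nominal features because it does not impose an order on the categories, which matters for distance-based clustering.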

Grouping Data

Group the data based on the household feature to better understand the data.
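A possible shape for this grouping, assuming a household identifier column exists. The names `household_id`, `person_id`, `age`, and `trips` are placeholders, and the chosen aggregates are only examples.

```python
import pandas as pd

# Toy person-level data; all column names are hypothetical.
people = pd.DataFrame({
    "household_id": [1, 1, 2],
    "person_id": [10, 11, 20],
    "age": [30, 40, 50],
    "trips": [2, 3, 4],
})

# One row per household, with a few summary columns.
households = people.groupby("household_id").agg(
    members=("person_id", "nunique"),
    total_trips=("trips", "sum"),
    mean_age=("age", "mean"),
)
```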

Hierarchical model

  • Choose the best threshold value.
  • Display the dendrogram graph.
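A minimal sketch of both tasks with SciPy, on synthetic points. Ward linkage and the "largest gap between merge heights" rule for picking the threshold are assumptions made here for illustration; the issue does not say which linkage or selection rule the project uses.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage


def suggest_threshold(Z):
    """Heuristic: cut halfway across the largest gap between merge heights."""
    heights = Z[:, 2]
    i = int(np.argmax(np.diff(heights)))
    return (heights[i] + heights[i + 1]) / 2


# Two well-separated synthetic groups of three points each.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])

Z = linkage(X, method="ward")
threshold = suggest_threshold(Z)
labels = fcluster(Z, t=threshold, criterion="distance")

# no_plot=True returns the dendrogram layout without drawing it;
# drop the argument (with matplotlib installed) to display the graph.
tree = dendrogram(Z, no_plot=True)
```

On real data the gap heuristic is only a starting point; the dendrogram itself should be inspected before fixing the threshold.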

Cleaning dataset

Clean the data by removing columns that are unnecessary or that contain a large amount of missing data, and by dealing with noisy or inconsistent data.
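One way these steps might look in pandas. The 50% missing-data cutoff, the text normalisation (trim and lower-case), and all column names are assumptions for the sake of the example.

```python
import pandas as pd


def clean_dataset(df, unnecessary=(), text_cols=(), max_missing=0.5):
    """Drop unneeded and mostly-empty columns, then normalise text values."""
    out = df.drop(columns=list(unnecessary), errors="ignore")
    # Keep only columns whose share of missing values is within the limit.
    keep = out.columns[out.isna().mean() <= max_missing]
    out = out.loc[:, keep].copy()
    # Inconsistent spellings such as " Bus" and "bus" become one value.
    for col in text_cols:
        if col in out.columns:
            out[col] = out[col].str.strip().str.lower()
    return out.drop_duplicates()


# Toy data with placeholder column names.
raw = pd.DataFrame({
    "note": ["a", "b", "c", "d"],
    "mostly_missing": [None, None, None, 1.0],
    "mode": [" Bus", "bus", "Bike", "bike "],
    "trips": [2, 2, 1, 1],
})
cleaned = clean_dataset(raw, unnecessary=["note"], text_cols=["mode"])
```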

Scaling dataset

  • Share the final two datasets, scaled with:
    • standard scaler
    • min max scaler
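Producing the two scaled datasets could look like the sketch below, using scikit-learn's `StandardScaler` and `MinMaxScaler`. The helper name `scale_both` and the toy columns are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def scale_both(df):
    """Return (standard-scaled, min-max-scaled) copies of a numeric frame."""
    standard = pd.DataFrame(
        StandardScaler().fit_transform(df), columns=df.columns, index=df.index
    )
    minmax = pd.DataFrame(
        MinMaxScaler().fit_transform(df), columns=df.columns, index=df.index
    )
    return standard, minmax


# Toy numeric data with placeholder column names.
numeric = pd.DataFrame({"age": [20.0, 30.0, 40.0], "trips": [1.0, 5.0, 3.0]})
standard, minmax = scale_both(numeric)
```

Standard scaling centres each column at zero with unit variance, while min-max scaling maps each column to the range 0 to 1; clustering results can differ between the two, which is why both versions are shared.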

K-means model

  • Choose the best K value.
  • Display the Elbow graph.
  • Decide how to display the clusters.
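The elbow graph plots the within-cluster sum of squares (inertia) against K. A minimal sketch with scikit-learn on synthetic blobs follows; the data, the K range, and the seed are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, k_values, seed=0):
    """Fit K-means for each K and return {K: inertia}."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias[k] = model.inertia_
    return inertias


# Three synthetic, well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

inertias = elbow_inertias(X, range(1, 7))

# With matplotlib installed, the elbow graph is
# plt.plot(list(inertias), list(inertias.values()), marker="o").
```

The "best" K is where the curve bends: adding clusters beyond that point reduces inertia only slightly. For displaying clusters of high-dimensional data, one common option is a scatter plot of a 2D projection (for example PCA) coloured by cluster label.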

Basic Dataset information

Provide the basic information about the dataset in the notebook file. It must contain:

  • Descriptive statistics of the dataset.
  • Dataset information.
  • The percentage of missing/noisy/inconsistent data for each column.
  • A suggested handling method for each column, based on the data.
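The notebook cell for this could be sketched as below. The suggestion rules (drop above 50% missing, median for numeric columns, mode otherwise) are illustrative defaults assumed here, not decisions taken in the project, and the column names are placeholders.

```python
import pandas as pd


def suggest_handling(df, drop_above=50.0):
    """Per-column missing percentage with a rule-of-thumb suggestion."""
    rows = []
    for col in df.columns:
        pct = float(df[col].isna().mean() * 100)
        if pct == 0:
            action = "keep"
        elif pct > drop_above:
            action = "drop column"
        elif pd.api.types.is_numeric_dtype(df[col]):
            action = "impute median"
        else:
            action = "impute mode"
        rows.append({"column": col, "missing_pct": pct, "suggestion": action})
    return pd.DataFrame(rows).set_index("column")


# Toy data with placeholder column names.
df = pd.DataFrame({
    "a": [1, 2, None, 4],
    "b": [None, None, None, "x"],
    "c": ["u", "v", None, "w"],
    "d": [1, 2, 3, 4],
})

stats = df.describe(include="all")  # statistical description
df.info()                           # dtypes and non-null counts
report = suggest_handling(df)
```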

Share final dataset

Based on the dataset and our assumptions and understanding:

  • Which columns will we have to keep?
  • How do we deal with samples from other persons in the same household?
  • How do we deal with several data collection methods in the same dataset?

Expected output:
A dataset restricted to one specific data collection method and to only the persons who filled out the survey.
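The expected output amounts to a row filter plus a column selection. In the sketch below, `collection_method`, `is_respondent`, and the other column names are hypothetical stand-ins; the real dataset's fields for collection method and respondent status need to be identified first.

```python
import pandas as pd


def final_dataset(df, keep_cols, method_col, method, respondent_col):
    """Keep one collection method, survey respondents only, chosen columns."""
    mask = (df[method_col] == method) & df[respondent_col]
    return df.loc[mask, keep_cols].reset_index(drop=True)


# Toy data; every column name here is a placeholder.
survey = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "collection_method": ["online", "online", "phone", "online"],
    "is_respondent": [True, False, True, True],
    "age": [30, 12, 55, 41],
    "free_text": ["a", "b", "c", "d"],
})

final = final_dataset(
    survey,
    keep_cols=["person_id", "age"],
    method_col="collection_method",
    method="online",
    respondent_col="is_respondent",
)
```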

Readme file

The Citywide Mobility Survey (CMS)

The Citywide Mobility Survey (CMS) is a survey conducted by the New York City Department of Transportation (DOT) to gather information about the travel behavior, preferences, and attitudes of New York City residents. The survey is conducted periodically, and the data collected is used to inform transportation planning and policy decisions.

Data Objectives

The primary objectives of the CMS data collection are:

  • Identify the factors and experiences that drive transportation choices for New York City residents.
  • Understand current views on the state of transportation within the City.
  • Measure attitudes toward current transportation issues and topics in New York City.

Project Objective

The objective of this project is to analyze the CMS dataset to gain insights into the travel behavior, preferences, and attitudes of New York City residents. Specifically, we aim to:

  • Understand the relationship between residents' behaviors, preferences, attitudes, and traveling methods.
  • Identify any trends or patterns in the data that may be relevant to transportation planning and policy decisions.

By achieving these objectives, we hope to contribute to the ongoing efforts to improve transportation in New York City and enhance the mobility of its residents.
