spolivin / yandex_practicum_projects

Data Science projects completed at Yandex Practicum

Jupyter Notebook 99.98% Python 0.02%
gradient-boosting time-series-analysis pytorch huggingface-transformers natural-language-processing exploratory-data-analysis hyperparameter-optimization hypothesis-testing pandas machine-learning resnet-50 tensorflow

Repository for "Data Science Specialist" Specialization (Yandex Practicum)

This repository contains the projects completed during the 8-month DS/ML/NLP/CV/DL training program at "Yandex Practicum".

Each project has its own dedicated folder containing all related files. Because the materials provided during the course are exclusive, the datasets used in the projects may not be published. However, every Jupyter Notebook with a project solution includes full explanations as well as the data processing results in its executed cells.

Folder structure

The project folders generally follow this structure:

|-- [project_folder_name]
    |-- README.md
    |-- [project_name].ipynb
  • README.md - Markdown file containing the description of the project;
  • *.ipynb - Jupyter Notebook file storing the solution of the project.
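Since the datasets are not published, the folder layout above is the main thing a reader can verify locally after cloning. Below is a minimal sketch of a hypothetical helper, `check_repo`, that reports project folders missing a `README.md` or a notebook. It assumes Python 3.9+ and treats every non-hidden top-level folder as a project folder, which is an assumption: the repository may contain other folders.

```python
from pathlib import Path


def check_project_folder(folder: Path) -> list[str]:
    """Return the problems found in one project folder (empty list means OK)."""
    problems = []
    # Convention from the README: each project folder has a README.md ...
    if not (folder / "README.md").is_file():
        problems.append(f"{folder.name}: missing README.md")
    # ... and a Jupyter Notebook with the solution.
    if not any(folder.glob("*.ipynb")):
        problems.append(f"{folder.name}: no .ipynb notebook found")
    return problems


def check_repo(root: Path) -> dict[str, list[str]]:
    """Check every immediate, non-hidden subfolder of `root` (assumed to be a project)."""
    folders = sorted(p for p in root.iterdir() if p.is_dir() and not p.name.startswith("."))
    return {folder.name: check_project_folder(folder) for folder in folders}


if __name__ == "__main__":
    # Run from the repository root.
    for name, problems in check_repo(Path(".")).items():
        print(name, "OK" if not problems else "; ".join(problems))
```

Returning a dictionary of problem lists, rather than raising on the first issue, lets the script report every non-conforming folder in a single run.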

Projects

| Project name | Description | Libraries used |
|---|---|---|
| Big Cities Music | Comparison of the preferences of "Yandex.Music" users from Moscow and Saint Petersburg depending on the time of day (morning and evening) and the weekday (Monday, Wednesday and Friday). | matplotlib, numpy, pandas, seaborn, IPython |
| Borrowers Solvency Study | Analysis of the factors affecting the creditworthiness of a bank's clients: number of children, family status, total income and loan purpose. | matplotlib, numpy, pandas, seaborn, IPython |
| Real Estate Ads Study | Exploratory data analysis of real estate advertisements in Saint Petersburg and its neighbouring localities. | warnings, matplotlib, numpy, pandas, seaborn, IPython |
| Preferred Tariff Choice | Choosing the most preferable tariff plan among those offered by a mobile network operator, based on its users' behavior patterns. | matplotlib, numpy, pandas, seaborn, IPython, scipy |
| Computer Games Market Analysis | Identifying profit-enhancing patterns in the data and making product-oriented forecasts. | warnings, math, matplotlib, numpy, pandas, seaborn, IPython, scipy |
| Tariff Recommendation System | Building a recommendation system that suggests tariffs to the clients of a mobile operator. | collections, matplotlib, numpy, pandas, seaborn, IPython, joblib, scipy, sklearn |
| Customer Churn | Building a system that predicts whether a client will leave the bank in the near future. | re, collections, copy, matplotlib, numpy, pandas, seaborn, imblearn, IPython, joblib, scipy, sklearn |
| Oil Well Location Choice | Building an ML model that determines the optimal location for drilling a new oil well. | matplotlib, numpy, pandas, seaborn, IPython, sklearn |
| Gold Recovery Prediction [Real project] | Developing an ML model prototype that predicts the recovery rate of gold from gold-bearing ore. | functools, itertools, matplotlib, numpy, pandas, seaborn, IPython, sklearn, tqdm |
| Clients' Personal Data Protection | Developing a data obfuscation algorithm that makes it difficult to recover clients' personal information from the data. | matplotlib, numpy, pandas, seaborn, IPython, sklearn |
| Car Prices Prediction | Building an optimal ML model that determines car prices. | re, time, warnings, pprint, matplotlib, numpy, pandas, seaborn, catboost, IPython, joblib, lightgbm, sklearn, xgboost |
| Forecasting Taxi Orders | Developing a time-series model that forecasts hourly taxi orders to the airport. | itertools, matplotlib, numpy, pandas, seaborn, catboost, IPython, lightgbm, sklearn, statsmodels, xgboost |
| Transformers-based Sentiment Analysis [GPU] | Classifying comments as positive or toxic using the BERT language model with GPU support. | pprint, matplotlib, numpy, pandas, seaborn, torch, transformers, catboost, imblearn, lightgbm, sklearn, tqdm, xgboost |
| Startup Investments | Writing queries of varying complexity against a database containing information about venture capital and startup companies. | SQL/Postgres |
| CV-based People's Age Determination [GPU] | Building a neural network model that determines a person's age from their photos. | os, typing, matplotlib, numpy, pandas, seaborn, IPython, PIL, tensorflow.keras |
| Production Costs Optimization [Diploma project] | Developing a prototype of an ML model that predicts the temperature of steel. | os, copy, pprint, joblib, matplotlib, numpy, pandas, seaborn, catboost, IPython, lightgbm, sklearn, xgboost |
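To illustrate the kind of grouping behind the Big Cities Music comparison, here is a minimal pandas sketch on a tiny hypothetical play log. The column names `city`, `day` and `time_of_day`, and the rows themselves, are assumptions: they are not the project's actual (unpublished) dataset. The sketch counts plays for each weekday and time-of-day slot, with one column per city.

```python
import pandas as pd

# Hypothetical toy log: one row per track play (columns and values are assumptions).
plays = pd.DataFrame({
    "city": ["Moscow", "Moscow", "Saint Petersburg",
             "Moscow", "Saint Petersburg", "Saint Petersburg"],
    "day": ["Monday", "Monday", "Monday", "Friday", "Friday", "Wednesday"],
    "time_of_day": ["morning", "evening", "morning", "evening", "evening", "morning"],
})


def plays_by_slot(df: pd.DataFrame) -> pd.DataFrame:
    """Count plays per (day, time_of_day) slot, one column per city."""
    return (
        df.groupby(["day", "time_of_day", "city"])
        .size()                            # number of plays in each group
        .unstack("city", fill_value=0)     # cities become columns; absent slots -> 0
    )


table = plays_by_slot(plays)
print(table)
```

Comparing the two city columns slot by slot shows where listening activity differs between the cities. `unstack(fill_value=0)` makes a slot with no plays in one city show up as 0 rather than NaN.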

Syllabus

  • Module 1: Introduction to Data Analysis

    • Topics: Basic Python, Data Preprocessing, Exploratory Data Analysis, Statistical Data Analysis
    • Libraries: pandas numpy scipy matplotlib seaborn
  • Module 2: Basics of Machine Learning

    • Topics: Introduction to Machine Learning, Supervised Learning, Machine Learning in Business
    • Libraries: sklearn imblearn
  • Module 3: Advanced Machine Learning

    • Topics: Transformers, Natural Language Processing, Gradient Boosting/Descent, Time Series, Linear Algebra
    • Libraries: catboost lightgbm xgboost statsmodels re pymystem3 nltk transformers torch tqdm
  • Module 4: Machine Learning for Big Data

    • Topics: SQL (Postgres), PySpark, Unsupervised Learning, Computer Vision, Deep Learning
    • Libraries: tensorflow.keras pyspark PIL cv2 pyod