Giter Site home page Giter Site logo

end-to-end-ml-project-with-deployment's Introduction

End to End ML Project

Project Description

This project demonstrates how to build an ML app using FastAPI and Docker.

In this project, we used a dataset called Car Insurance Data, which is a company's annual car insurance data. Our aim in the project is to find real customer behaviors through the data.

If a costumer has claimed his or her loan, the outcome column shows 1; otherwise, it shows 0.

Table Of Contents ✨

Installation


1) Clone the repository

https://github.com/SBeyzaEryilmaz/Task1.git

2) Create the virtual environment and install dependencies

poetry install

See the poetry docs for details on how to add/update dependencies.

3) Run the app

poetry run python main.py

Requirements


main.py Description


This section provides general information about the main.py and the overall CLI.

options:
  -h, --help            show this help message and exit
  --ref_data_path REF_DATA_PATH
                        Path to the dataset
  --tar_data_path TAR_DATA_PATH
                        Path to the dataset
  --model_output_folder MODEL_OUTPUT_FOLDER
                        Folder to save the model
  --monitoring_output_folder MONITORING_OUTPUT_FOLDER
                        Folder to save the monitoring results
  --data_visualization DATA_VISUALIZATION
                        Whether to visualize the data
  --train_model TRAIN_MODEL
                        Whether to train the model
  --monitor_model MONITOR_MODEL
                        Whether to monitor the model

Usage:

python main.py [-h] [--ref_data_path REF_DATA_PATH] [--tar_data_path TAR_DATA_PATH]
                    [--model_output_folder MODEL_OUTPUT_FOLDER] [--monitoring_output_folder MONITORING_OUTPUT_FOLDER]
                    [--data_visualization DATA_VISUALIZATION]
                    [--train_model TRAIN_MODEL] [--monitor_model MONITOR_MODEL]

Data Drift Report


Data drift refers to the phenomenon in which the statistical properties of a dataset used for training a machine learning model change over time. This can occur for various reasons, including shifts in the underlying population, changes in data collection methods, or alterations in the environment in which the data is generated.

When data drift happens, it can have a negative impact on the performance of machine learning models. Models trained on historical data may become less accurate or even fail to make accurate predictions when applied to new data that exhibits drift.

The Data Drift report helps detect and explore changes in the input data. Applies as suitable drift detection method for numerical, categorical or text features.

Plots feature values and distributions for the two datasets.The reference dataset serves as a benchmark. To detect a distribution shift, the change is analyzed by comparing current production data with reference data.

Code Example

data_drift_report = Report(metrics=[
    DataDriftPreset(),
])

data_drift_report.run(reference_data=ref, current_data=cur)
data_drift_report

Dataset Drift Results

The Data Drift reports are a crucial tool for maintaining the performance and reliability of machine learning systems in dynamic environments. They help organizations adapt to changing data distributions and ensure that their models continue to provide accurate predictions.

Classification Performance Report


A classification performance report is a summary and analysis of the performance of a classification machine learning model. This report is used to evaluate how well the model is doing at correctly categorizing or classifying data into different classes or categories.

To run this report, you need to have both target and prediction columns available.

Code Example

column_mapping = ColumnMapping()

column_mapping.target = "target"
column_mapping.prediction = "prediction"

classification_performance_report = Report(metrics=[
    ClassificationPreset(),
])

classification_performance_report.run(reference_data=ref, current_data=cur, column_mapping=column_mapping)

classification_performance_report

Classification Performance Results

A typical classification performance report comprises Accuracy, Precision, Recall, F1-Score, Class Representation, Confusion Matrix and Quality Metrics by Class.

A classification performance report is essential for assessing how well a classification model is performing, understanding its strengths and weaknesses, and making informed decisions about model tuning, feature engineering, or dataset modifications to improve classification accuracy and reliability.

FastAPI


Go to the project folder.

cd Task1

Initiate the FastAPI application.

python -m uvicorn ml_model_monitoring.model_deployment.api:app --reload

Go to http://localhost:8000 in your browser. You should receive the message "Welcome" as a result of a GET request for the '/' route.

Predict (POST) Endpoint (http://localhost:8000/predict)

Currently it does not support batch predictions (will be added in future). As an example JSON file for prediction:

{"ID": 569520,
 "AGE": "65+",
 "GENDER": "female",
 "RACE": "majority",
 "DRIVING_EXPERIENCE": "0-9y",
 "EDUCATION": "high school",
 "INCOME": "upper class",
 "CREDIT_SCORE": 0.629027313918201,
 "VEHICLE_OWNERSHIP": 1.0,
 "VEHICLE_YEAR": "after 2015",
 "MARRIED": 0.0,
 "CHILDREN": 1.0,
 "POSTAL_CODE": 10238,
 "ANNUAL_MILEAGE": 12000.0,
 "VEHICLE_TYPE": "sedan",
 "SPEEDING_VIOLATIONS": 0,
 "DUIS": 0,
 "PAST_ACCIDENTS": 0}

Docker


Go to the project folder.

cd Task1

Create a docker image.

docker build -t my_docker_image .

Run the container.

docker run -p 8080:8080 my_docker_image

Go to http://localhost:8080 in your browser.

Credits


I'd like to thank Frightera for his assistance throughout the project.

License


This project is licensed under the MIT Licence, which allows users and developers to freely use, modify and distribute the project. This license describes the conditions for the use and distribution of the project, as well as a limited warranty given by the copyright holder of the project.

The full text of the MIT License can be found in the LICENSE file.

end-to-end-ml-project-with-deployment's People

Contributors

sbeyzaeryilmaz avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.