
4n4nd / ceph_drive_failure


This project forked from aicoe-aiops/ceph_drive_failure


An AI/ML solution that provides a probability that a hard drive will fail within some pre-defined time period.

License: GNU Lesser General Public License v2.1



Ceph Drive Failure Prediction

Overview

More than 2500 petabytes of data is generated every day by sources such as social media, IoT, and commercial services. Of this, a sizeable chunk is persisted in storage systems (HDDs and SSDs). To ensure that data is not lost or corrupted, large-scale storage solutions often use erasure coding or mirroring. However, these techniques become more difficult and/or expensive to manage at scale.

This project aims to enhance Ceph, a distributed storage system, by giving it the capability to predict the failure of storage devices well in advance. These predictions can then be used to determine when to add/remove replicas. In this way, the fault tolerance may be improved by up to an order of magnitude, since the probability of data loss is generally related to the probability of multiple, concurrent device failures.
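As a rough illustration of why concurrent failures matter, the sketch below computes the chance that every replica of an object is lost in the same window. It rests on a strong simplifying assumption that is not from this project: each device fails independently with the same probability. The function name and the numbers are hypothetical.

```python
def loss_probability(p_fail: float, replicas: int) -> float:
    """Probability that all replicas fail in the same window.

    Assumes independent failures with equal probability per device,
    which is a simplification made only for illustration.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return p_fail ** replicas


# Hypothetical per-device failure probability within one window.
p = 0.01
three_replicas = loss_probability(p, 3)
four_replicas = loss_probability(p, 4)
print(three_replicas, four_replicas)
```

Under these assumptions, each extra replica added ahead of a predicted failure multiplies the loss probability by `p`. Real clusters have correlated failures, so this is only a lower-bound intuition.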

Dataset

The Backblaze Hard Drive dataset will be used for this project. This dataset consists of daily snapshots of basic information, SMART metrics, and status (failure label) for the hard drives in the Backblaze data center. Details about this dataset can be found here. To learn more about the SMART system and SMART metrics, see this Wikipedia article.
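To make the shape of a daily snapshot concrete, here is a minimal sketch that parses a few rows laid out like the Backblaze CSV files (`date`, `serial_number`, `model`, `capacity_bytes`, `failure`, then `smart_*` columns). The sample values and the helper name `read_snapshots` are made up for illustration. Real files carry many more SMART columns.

```python
import csv
import io

# Made-up sample rows in the Backblaze column layout (values are illustrative).
SAMPLE = """date,serial_number,model,capacity_bytes,failure,smart_5_raw,smart_187_raw
2019-01-01,SN001,MODEL_A,4000787030016,0,0,0
2019-01-02,SN001,MODEL_A,4000787030016,1,8,3
2019-01-01,SN002,MODEL_B,8001563222016,0,0,
"""


def read_snapshots(text):
    """Parse daily snapshot rows; empty SMART fields become None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        row = {
            "date": rec["date"],
            "serial_number": rec["serial_number"],
            "model": rec["model"],
            "capacity_bytes": int(rec["capacity_bytes"]),
            "failure": int(rec["failure"]),  # 1 on the day the drive fails
        }
        for key, value in rec.items():
            if key.startswith("smart_"):
                # Not every drive model reports every SMART attribute.
                row[key] = int(value) if value != "" else None
        rows.append(row)
    return rows


rows = read_snapshots(SAMPLE)
print(len(rows), rows[1]["failure"])
```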

Objective

The goal is to create predictive models using the Backblaze dataset to determine when a hard drive will fail. Ideally, the model should be able to predict the health of a hard drive in terms of "good" (>6 weeks till failure), "warning" (2-6 weeks till failure), and "bad" (<2 weeks till failure). This setup is similar to DiskProphet, a disk health prediction solution from ProphetStor.
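The three classes above could be derived from the number of days until a drive's failure date, roughly as sketched below. The function name is hypothetical. Treating a drive with no observed failure as "good" and placing the exact 2-week and 6-week boundaries in "warning" are assumptions, not choices confirmed by the project.

```python
def health_label(days_to_failure):
    """Map days until failure to "good", "warning", or "bad".

    None means no failure was observed for the drive (assumed "good").
    """
    if days_to_failure is None:
        return "good"
    if days_to_failure < 0:
        raise ValueError("days_to_failure must be non-negative")
    if days_to_failure < 14:      # less than 2 weeks
        return "bad"
    if days_to_failure <= 42:     # 2 to 6 weeks, boundaries included
        return "warning"
    return "good"                 # more than 6 weeks


for days in (3, 14, 42, 43, None):
    print(days, health_label(days))
```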

At inference time, 6 days of SMART data (6 rows from the Backblaze dataset) will be available to feed to this multiclass classification model. How the model makes use of this is a design choice. It may predict on all 6 rows individually, generate features from multiple days' data, use only the last day's data, etc. For details on how this model would be integrated into Ceph (API, preprocessing at inference time, etc.) see this.
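One of the options mentioned above, generating features from multiple days of data, might look like the following sketch. The helper `window_features` and the choice of last value, change over the window, and maximum are hypothetical. They are one possible design, not the project's actual preprocessing.

```python
def window_features(rows, attrs):
    """Summarize an ordered window of daily rows (oldest first).

    For each SMART attribute, emit the last value, the change across
    the window, and the maximum. Missing readings (None) are skipped.
    """
    if not rows:
        raise ValueError("need at least one row")
    feats = {}
    for attr in attrs:
        values = [r[attr] for r in rows if r.get(attr) is not None]
        if not values:
            feats[f"{attr}_last"] = None
            feats[f"{attr}_delta"] = None
            feats[f"{attr}_max"] = None
            continue
        feats[f"{attr}_last"] = values[-1]
        feats[f"{attr}_delta"] = values[-1] - values[0]
        feats[f"{attr}_max"] = max(values)
    return feats


# Six made-up daily readings for one drive.
window = [{"smart_5_raw": v} for v in (0, 0, 1, 1, 4, 8)]
print(window_features(window, ["smart_5_raw"]))
```

Using only the last day's data corresponds to keeping just the `_last` features. Predicting on each row individually would skip this aggregation step entirely.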

NOTE: Although the end goal is a multiclass classifier, building a binary classifier ("no fail"/"fail") could be a good starting point in understanding the problem and setup. Additionally, data exploration and insightful analysis could also be useful. These would be welcome contributions to this project as well.

Notebooks/Kernels

The following are some notebooks to get started or to use as utils:

  • data_explorer.ipynb
  • data_cleaner_*.ipynb
  • clustering_and_model_exploration.ipynb
  • multiclass_clf.ipynb

Contact

Karanraj Chauhan
Software Engineer, AI Center of Excellence - Office of the CTO
Red Hat, Inc.
