e-hossam96 / forest-fires-regression


Forest Fires Regression

Introduction

Forest fires have a major impact on the environment. A model that predicts the burned area of a forest in advance can therefore help limit that damage: its predictions can trigger the appropriate level of precaution to contain fires or even prevent them from happening.

This regression project uses the forest fires dataset from the UCI Machine Learning Repository. It aims to analyse the dataset and build regression models, selecting them by their R2 scores. Across the three notebooks of the project, we implement data cleansing, transformation, and modeling as approved in the literature. See References at the end of this README file to find out more.

Dataset Description

Attributes

Features

Spatial Attributes (S)

  • X : x-axis coordinate (from 1 to 9)
  • Y : y-axis coordinate (from 1 to 9)

Temporal Attributes (T)

  • month : Month of the year (January to December)
  • day : Day of the week (Monday to Sunday)

Fire Weather Index Attributes (FWI)

  • FFMC : Fine Fuel Moisture Code
  • DMC : Duff Moisture Code
  • DC : Drought Code
  • ISI : Initial Spread Index

Weather/Meteorological Attributes (M)

  • temp : Outside temperature (in degrees Celsius)
  • RH : Outside relative humidity (in percent)
  • wind : Outside wind speed (in kilometers per hour)
  • rain : Outside rain (in millimeters per square meter)

Target Variable

  • area : Total burned area (in ha)
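The attribute groups above can be written down once and reused whenever a model is trained on a subset of the columns. The following is a minimal sketch: the column names come from the lists above, while `FEATURE_GROUPS`, `TARGET`, and `select_columns` are hypothetical names introduced only for illustration and are not taken from the notebooks.

```python
# Column names as listed in the dataset description, keyed by group letter.
FEATURE_GROUPS = {
    "S": ["X", "Y"],
    "T": ["month", "day"],
    "FWI": ["FFMC", "DMC", "DC", "ISI"],
    "M": ["temp", "RH", "wind", "rain"],
}
TARGET = "area"


def select_columns(*groups):
    """Return the feature columns for the requested groups, in order."""
    columns = []
    for group in groups:
        columns.extend(FEATURE_GROUPS[group])
    return columns


# Example: spatial + temporal + weather features.
stm_columns = select_columns("S", "T", "M")
print(stm_columns)
```

With a pandas DataFrame `df` holding the dataset, `df[select_columns("M")]` would then give the weather-only feature matrix and `df[TARGET]` the burned area.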

Link to the Dataset

UCI ML Repository, Forest Fires Data Set

Notebook 01 Summary

  • In this notebook we explored different regression models in search of good R2 scores, but the data distributions and the correlations among the features, and between the features and the target variable, do not satisfy the assumptions of linear regression models.
  • As suggested in the literature, we also tried different subsets of the full data and tried regressing on only the nonzero observations, but the results were still not good.
  • To capture interactions among the features, we tried several techniques, including adding polynomial and spline features. None produced good results, but the spline transform had a noticeable effect on the R2 score, so we kept the spline transformations and omitted the polynomial ones.
  • In addition, we defined a function, scoringfn, that prints the maximum R2 score obtained when selecting subsets of the full set of attributes. We used two techniques: the first enumerates feature combinations with combinations from the itertools library, and the second uses automatic feature selection from the feature_selection module. For automatic feature selection, we used only the RFE (recursive feature elimination) class, as it gave the best selection process.
  • Lastly, we tried different transformation techniques on the features, including the Box-Cox transformation, but it raised errors, so we kept only np.sin and np.log1p.
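The combinations-based search described above can be sketched as follows. This is not the notebook's scoringfn: `best_feature_subset` and `toy_score` are hypothetical names, and the scoring callback is a made-up stand-in for "fit a model on these columns and return its R2 score", so the numbers it returns are placeholders rather than results.

```python
from itertools import combinations


def best_feature_subset(features, score_fn, min_size=1, max_size=None):
    """Try every feature combination and keep the highest-scoring one.

    score_fn takes a tuple of feature names and returns a number
    (for example, the R2 score of a model fitted on those columns).
    """
    if max_size is None:
        max_size = len(features)
    best_score, best_subset = float("-inf"), ()
    for size in range(min_size, max_size + 1):
        for subset in combinations(features, size):
            score = score_fn(subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset


def toy_score(subset):
    # Placeholder scorer (an assumption for this demo): rewards three
    # arbitrarily chosen columns and penalises every other column.
    rewarded = {"temp", "RH", "wind"}
    return sum(2 if name in rewarded else -1 for name in subset)


weather = ["temp", "RH", "wind", "rain"]
best_score, best_subset = best_feature_subset(weather, toy_score)
print(best_score, best_subset)
```

The search is exhaustive, so its cost grows quickly: the 12 features of this dataset already give 4095 non-empty subsets, each requiring one model fit.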

Notebook 02 Summary

In this notebook we used tree-based regression models to fit the data. After somewhat overfitting the data, we unsurprisingly obtained better R2 scores. The best three models were DecisionTreeRegressor, ExtraTreeRegressor, and ExtraTreesRegressor, with R2 scores of 0.996112, 0.996112, and 0.996112 respectively. To produce these results without repeating any steps, we wrote a Python class, Evaluate_Model, that fits a model and prints its scores on the training and testing datasets.
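A fit-and-score helper of this kind might look like the sketch below. It is not the notebook's Evaluate_Model class: `EvaluateModelSketch`, `MemorizingRegressor`, and the tiny data set are hypothetical and exist only for illustration. The memorizing model is a toy stand-in for a fully grown tree, chosen to show why comparing training and testing scores matters when a model overfits.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class MemorizingRegressor:
    """Toy model: recalls training targets exactly, else predicts the mean."""

    def fit(self, X, y):
        self.lookup_ = {tuple(row): target for row, target in zip(X, y)}
        self.fallback_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.lookup_.get(tuple(row), self.fallback_) for row in X]


class EvaluateModelSketch:
    """Fit any model exposing fit/predict and report train and test R2."""

    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train, self.y_train = X_train, y_train
        self.X_test, self.y_test = X_test, y_test

    def evaluate(self, model):
        model.fit(self.X_train, self.y_train)
        scores = {
            "train_r2": r2_score(self.y_train, model.predict(self.X_train)),
            "test_r2": r2_score(self.y_test, model.predict(self.X_test)),
        }
        name = type(model).__name__
        print(f"{name}: train R2 = {scores['train_r2']:.4f}, "
              f"test R2 = {scores['test_r2']:.4f}")
        return scores


# Made-up illustrative data, not taken from the forest fires dataset.
evaluator = EvaluateModelSketch(
    X_train=[[1], [2], [3], [4]], y_train=[0.0, 1.0, 0.0, 5.0],
    X_test=[[5], [6]], y_test=[2.0, 4.0],
)
scores = evaluator.evaluate(MemorizingRegressor())
```

Because the helper only relies on fit and predict, scikit-learn regressors can be passed in the same way, provided the inputs are in a form they accept.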

Notebook 03 Summary

In this notebook we used a deep learning approach to predict the burned area. We applied different Sequential models with different layers, built with TensorFlow through the Keras library. The results were far from perfect, but the loss and val_loss values converged to their minimum. We also defined some utility_functions to produce the results and plot the progression of the loss and val_loss values.
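As a rough sketch of such a utility, the function below summarizes a training history given as a dictionary of per-epoch lists, which is the shape of the `history` attribute on the object returned by Keras `fit`. `summarize_history` is a hypothetical name, not one of the notebook's functions, and the loss values are made-up placeholders.

```python
def summarize_history(history):
    """Summarize per-epoch 'loss' and 'val_loss' lists.

    Returns the epoch count, the 1-based epoch with the lowest
    validation loss, that loss, and the final values of both curves.
    """
    loss = history["loss"]
    val_loss = history["val_loss"]
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "epochs": len(loss),
        "best_epoch": best_index + 1,
        "best_val_loss": val_loss[best_index],
        "final_loss": loss[-1],
        "final_val_loss": val_loss[-1],
    }


# Placeholder values for illustration only, not results from the notebook.
example_history = {
    "loss": [4.0, 2.0, 1.0, 0.5],
    "val_loss": [4.5, 2.5, 2.0, 2.2],
}
summary = summarize_history(example_history)
print(summary)
```

A gap between the final loss and the final val_loss, or a best epoch well before the last one, suggests the model kept fitting the training data after it stopped improving on the validation data.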

References

  1. Cortez, P., & Morais, A. (2007). A Data Mining Approach to Predict Forest Fires using Meteorological Data.
  2. Model Building with Forest Fire Data: Data Mining, Exploratory Analysis and Subset Selection
  3. Predict The Burned Area Of Forest Fires
  4. Basic regression: Predict fuel efficiency
