
builiding_footprint's Introduction

I couldn't upload the images because of space limitations, but please let me know if you need them. I'll be more than happy to share them.

Building footprint detection in satellite imagery

Findings on the dataset

First, let's visualize the images and their labels (more masks are shown in the code). The images are colored TIFF files, and the masks are binary TIFF files of the buildings.

Figure 01 : Sample images and their binary masks

Let's also check the dimensions of the first 6 entries in the dataset for images

(1023, 1024, 4)
(1023, 1024, 4)
(1024, 1024, 4)
(1024, 1024, 4)
(1024, 1024, 4)
(1024, 1024, 4)

and labels

(1023, 1024)
(1023, 1024)
(1024, 1024)
(1024, 1024)
(1024, 1024)
(1024, 1024)

As we can see, the dimensions aren't always the same, so this has to be corrected. Also, the images have 4 channels, but I will work only with the first 3.

Also, the masks are binary images.

The explanation of the solution

The dimensions of the dataset have to be corrected. For that, each image and mask is padded to a 1024x1024 shape with a layer of zeros, so that all the data has the same dimensions. This is done by the "expand2square" function in the script "functions.py"; I took the code from here. Next, I extracted 256x256 patches from the images using the "patchify" package and the code from here. Then, I used a "U-Net" as the network architecture. I knew it could work for image segmentation since I learned about it during the Deep Learning Specialization course. Here is the link to the GitHub repo for the code. However, their implementation is for a multiclass problem, whereas this project deals with binary masks, hence binary classification. I found another implementation, precisely for binary classification, here. A U-Net works well for this problem because it replaces dense layers with transposed convolution layers, which helps preserve the spatial information that is otherwise lost by using dense layers. It also adds skip connections from the downsampling convolutional layers to the transposed convolutional layers, thus helping preserve information in the network.
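The padding and patching steps can be sketched with plain numpy. This is only an illustration of the idea: the actual expand2square function in functions.py and the patch extraction come from the linked sources, and here non-overlapping tiles are cut with a reshape instead of the patchify package.

```python
import numpy as np

def expand2square(img, size=1024):
    # Pad an (H, W) or (H, W, C) array with zeros so both spatial dims equal `size`.
    pad = [(0, size - img.shape[0]), (0, size - img.shape[1])]
    pad += [(0, 0)] * (img.ndim - 2)          # don't pad the channel axis
    return np.pad(img, pad, mode="constant", constant_values=0)

def extract_patches(img, patch=256):
    # Split a (size, size, C) image into non-overlapping (patch, patch, C) tiles.
    h, w, c = img.shape
    tiles = img.reshape(h // patch, patch, w // patch, patch, c)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)

image = np.random.rand(1023, 1024, 4)         # one of the odd-sized entries
padded = expand2square(image)[..., :3]        # pad, then keep the first 3 channels
patches = extract_patches(padded)
print(padded.shape)    # (1024, 1024, 3)
print(patches.shape)   # (16, 256, 256, 3)
```

With a 1024x1024 input and 256x256 patches, each image yields 16 training tiles.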

Below you can see the scheme of the network:

Figure 01 : U-Net Architecture

The encoder (contracting) part of the network starts with a 256x256 input. In detail, it consists of the repeated application of two 3 x 3 convolutions with "same" padding, each followed by a ReLU, and a 2 x 2 max pooling operation with stride 2 for downsampling. At each downsampling step, the number of feature channels is doubled.

The crop-and-concatenate step crops the feature map from the contracting path and concatenates it with the current feature map on the expanding path to create a skip connection.

At each step of the decoder (expanding) part, the inputs are upsampled by a transposed convolution, followed by two 3 x 3 convolutions. The transposed convolution halves the number of channels while doubling the spatial dimensions.

In the final layer, a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. For binary classification, a sigmoid activation is used.
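The architecture described above can be sketched in Keras. This is an assumption-laden illustration, not the repo's actual code: it uses the TensorFlow/Keras functional API (as in the linked implementations) and is shallower than the original U-Net (two downsampling steps instead of four) to keep it short.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with "same" padding, each followed by a ReLU.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 3), base_filters=64):
    inputs = layers.Input(input_shape)

    # Contracting path: conv block, then 2x2 max pooling; channels double each step.
    c1 = conv_block(inputs, base_filters)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, base_filters * 2)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck
    b = conv_block(p2, base_filters * 4)

    # Expanding path: the transposed convolution halves the channels and doubles
    # the spatial size; the matching encoder feature map is then concatenated
    # (skip connection). With "same" padding no cropping is needed.
    u2 = layers.Conv2DTranspose(base_filters * 2, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.Concatenate()([u2, c2]), base_filters * 2)
    u1 = layers.Conv2DTranspose(base_filters, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), base_filters)

    # Final layer: 1x1 convolution + sigmoid produces the binary mask.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = build_unet()
print(model.output_shape)  # (None, 256, 256, 1)
```

The output has the same spatial size as the input, with one channel holding the per-pixel building probability.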

Images and masks are scaled by dividing by 255, and the validation set is 20% of the original dataset. Binary cross-entropy is used as the loss function, "Adam" (with learning rate = 0.001) as the optimization algorithm, and mean IoU (intersection over union) to evaluate the learning. During compilation, "accuracy" is set as a metric, but it is actually not a good metric to judge convergence from, since the labels contain many true negatives (the black background).

Accuracy = (TP + TN)/(TN + TP + FN + FP)

whereas

IoU = TP / (TP + FP + FN)
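A small numeric example (with hypothetical pixel counts for one 256x256 patch) shows why accuracy is misleading here: a background-heavy patch scores very high on accuracy while the IoU stays modest, because IoU ignores the true negatives.

```python
# Hypothetical confusion counts for a 256x256 patch dominated by background:
TP, TN, FP, FN = 2000, 62000, 800, 736   # total = 65536 pixels

accuracy = (TP + TN) / (TP + TN + FP + FN)
iou = TP / (TP + FP + FN)                # true negatives are excluded

print(round(accuracy, 3))  # 0.977
print(round(iou, 3))       # 0.566
```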

After training is over, the weights of the model are exported, then imported again, and MeanIoU is used to evaluate the model's performance.
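A minimal sketch of this evaluation step, assuming the standard tf.keras API (the actual model, data, and weight-file names from the scripts are not shown here). Note that tf.keras.metrics.MeanIoU expects hard class labels, so the sigmoid probabilities have to be thresholded first.

```python
import numpy as np
import tensorflow as tf

# Hypothetical ground-truth mask and predicted probabilities for 4 pixels:
y_true = np.array([[0, 0, 1, 1]])
y_prob = np.array([[0.1, 0.6, 0.8, 0.3]])

# Threshold the probabilities at 0.5 to get hard 0/1 labels.
y_pred = (y_prob > 0.5).astype("int32")

m = tf.keras.metrics.MeanIoU(num_classes=2)
m.update_state(y_true, y_pred)
print(float(m.result()))  # ~0.3333 (IoU is 1/3 for both classes here)
```

In the real pipeline, y_prob would come from model.predict on the validation patches after loading the saved weights.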

I first tried with grayscale images to test the model, and then I worked with colored images. Colored images gave somewhat higher values of MeanIoU (approx. 0.2-0.3 higher). The results below are for 100 iterations (epochs) on colored images on Google Colab using 1 GPU, which gave MeanIoU = 0.6205597. The results seem promising, but tuning of the parameters is required.

Figure 02 : "Comparing the testing image, its ground truth mask and the predicted mask"

Figure 03 : "Comparing the training and validation loss and accuracy."
Pros and cons of the solution
PROS
  • It was quick to implement since I could find a suitable implementation online to base my model on
  • It gave promising results on the first try (MeanIoU = 0.62)
CONS
  • MeanIoU = 0.62 is still far from the best; values higher than 0.9 could be achieved
  • I had to put all the test images and labels into numpy arrays, which means using more memory. A better approach would be to use iterators.
options for future improvements
  • I could use TensorFlow datasets for more dynamic loading of images: https://www.tensorflow.org/api_docs/python/tf/data/Dataset
  • A test with 5x5 and 7x7 filters to see if the IoU increases. I wouldn't increase the filter size much, since the buildings are small in the images.
  • Reducing the size of the patches to 128x128
  • Playing with the learning rate
  • I am not sure about using regularization techniques like Dropout, since Fig. 03 shows that the training and validation curves are very close, so there are no visible high-variance issues. But it could be tested for a smaller number of iterations.

builiding_footprint's People

Contributors: hbaghramyan
