dynaslum / satelliteimaging
The software for WP1: SatelliteImaging
License: Apache License 2.0
When a tile is not square (does it have 2 mixed classes?), it is labeled as 'Mixed' even though it belongs to one clear class.
Dataset1 (px417m250)
Train 22 classifiers
Record performance in Excel sheet
Import to MATLAB as table and save in MAT file
Generate performance graphs & save them as fig and pdf
Dataset2 (px333m200)
Train 22 classifiers
Record performance in Excel sheet
Import to MATLAB as table and save in MAT file
Generate performance graphs & save them as fig and pdf
Dataset3 (px250m150)
Train 22 classifiers
Record performance in Excel sheet
Import to MATLAB as table and save in MAT file
Generate performance graphs & save them as fig and pdf
Dataset4 (px167m100)
Train 22 classifiers
Record performance in Excel sheet
Import to MATLAB as table and save in MAT file
Generate performance graphs & save them as fig and pdf
Ignoring Dataset5 (px83m50)
Dataset6 (px100m80)
Train a few classifiers
Record performance in Excel sheet
Import to MATLAB as table and save in MAT file
Generate performance graphs & save them as fig and pdf
The classificationLearner app expects tables.
Create datastore from the dataset
Split the datastore into training and testing subsets
Use the Upright flag in BagOfFeatures class as a parameter to create BoVW.
Recompute the SURF features with the Upright flag set to 0.
Follow the Image Category Classifier example, but also use our own scripts.
Use some default choices (SURF, 80% of the strongest features, multi-class SVM), but also our own parameters:
vocabulary sizes of 10, 20 and 50
SURF locations 'Detector' and 'Grid' (default settings: GridStep is [8 8] and BlockWidth is [32 64 96 128] (> 100??))
Apply on data-set 6, 100px = 80m.
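The bag-of-visual-words step above is done with MATLAB's bagOfFeatures; as a language-neutral illustration, the same encoding can be sketched in Python with k-means in place of MATLAB's built-in vocabulary clustering. The function names and the random toy "SURF" descriptors below are illustrative, not repository code:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, vocab_size):
    """Cluster local descriptors (e.g. SURF) into a visual vocabulary."""
    km = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
    km.fit(descriptors)
    return km

def encode_bovw(vocab, image_descriptors):
    """Histogram of visual-word assignments for one image, L1-normalised."""
    words = vocab.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

# toy example: 200 random 64-D descriptors, vocabulary of 10 words
rng = np.random.default_rng(0)
desc = rng.normal(size=(200, 64))
vocab = build_vocabulary(desc, vocab_size=10)
hist = encode_bovw(vocab, desc)
```

Repeating this with vocab_size 10, 20 and 50 mirrors the vocabulary-size sweep listed above.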
Make script skeleton
Balance categories
Loop over some parameters
Implement cross-validation
Improve the output
Publish
Determine the best classifier from the tested vocabulary sizes and SURF location points
Run with the best classifier options, save the model, and publish.
Share with partners
Perhaps the best place for such a utility is
/satsets/util/shapefile.py
See the cell under
Function to display a multipolygon from a shape file on a figure axis with given color and extent
currently cell 22 (plus the relevant import from the first cell, currently 19) from this source notebook
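A sketch of such a display utility, assuming shapely and matplotlib are available; `show_multipolygon` and its signature are hypothetical, not the notebook's actual cell:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt
from shapely.geometry import MultiPolygon, Polygon

def show_multipolygon(ax, multipolygon, color, extent):
    """Draw each polygon of a shapely MultiPolygon on the given axis,
    then fix the view to extent = (xmin, xmax, ymin, ymax)."""
    for poly in multipolygon.geoms:
        x, y = poly.exterior.xy
        ax.fill(x, y, color=color, alpha=0.5)
    ax.set_xlim(extent[0], extent[1])
    ax.set_ylim(extent[2], extent[3])
    return ax

# toy usage: one unit square drawn in red on a 3x3 extent
square = MultiPolygon([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])])
fig, ax = plt.subplots()
show_multipolygon(ax, square, color="red", extent=(0, 3, 0, 3))
```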
Modify demo notebook: toy example JI between multipolygons
Modify demo notebook: JI between 2 shapefiles
Write draft of extended abstract
Generate new figures for the abstract
Save the trained models and confusion matrices for each data set.
px83m50
px167m100
px250m150
px333m200
px417m250
Add missing slums
Merge missing slums to given slums -> all_slums
Improve rough built-up mask
Make slums and built-up disjoint => proper built-up mask
Generate non-built-up mask as all remaining pixels
The ML white paper by MATLAB is available here
Create multiclass_mask2oneclass_masks_Kalyan
Use multiclass_mask2oneclass_masks.
See oneclass_masks2multiclass_mask_Kalyan
Generate at least 1 image per class per dataset visualizing the SURF features (interest points/blobs). Related to issue #22.
We have the HoG feature. But this needs to be summarized into 5 features per window.
Features are:
Mean
Location of the peak
Absolute sine difference between the two highest peaks.
For more info see: https://digital.library.adelaide.edu.au/dspace/bitstream/2440/56300/1/hdl_56300.pdf
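The window summary for the three features listed above can be sketched as follows, assuming a HoG orientation histogram and its bin orientations are already computed per window; `summarize_hog` is a hypothetical name, and the remaining two of the 5 features are not specified here:

```python
import numpy as np

def summarize_hog(hist, bin_angles):
    """Summarise one window's HoG orientation histogram into the three
    listed features: mean magnitude, orientation of the highest peak,
    and |sin| of the angle between the two highest peaks."""
    order = np.argsort(hist)[::-1]          # bins sorted by descending weight
    peak1 = bin_angles[order[0]]
    peak2 = bin_angles[order[1]]
    return np.array([
        hist.mean(),
        peak1,
        abs(np.sin(peak1 - peak2)),
    ])

# toy example: 9 orientation bins over [0, pi)
angles = np.linspace(0, np.pi, 9, endpoint=False)
hist = np.array([1, 5, 2, 0, 7, 1, 0, 3, 1], dtype=float)
feats = summarize_hog(hist, angles)
```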
The segmentation result could be either pixel or grid Binary Mask (BM) or a (multipolygonal) Shape (S).
The ground truth can be either a shape file or sometimes a pixel mask. In this issue, consider comparisons of the same type: BM <-> BM and S <-> S.
Use the scikit-learn Python implementation of the Jaccard similarity score.
Install scikit-learn
Test the simple example from the documentation.
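In current scikit-learn the relevant function is sklearn.metrics.jaccard_score (the older jaccard_similarity_score has been removed). A simple example on two flattened binary masks:

```python
import numpy as np
from sklearn.metrics import jaccard_score

# two binary masks flattened to 1-D label vectors
gt   = np.array([[1, 1, 0],
                 [0, 1, 0]]).ravel()
pred = np.array([[1, 0, 0],
                 [0, 1, 1]]).ravel()

# |intersection| / |union| over the positive class:
# 2 pixels agree on '1', 4 pixels carry a '1' in either mask -> 0.5
ji = jaccard_score(gt, pred)
```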
Source notebook for comparison of label masks.
Function for comparison of label masks.
Demo notebook using the function
Source notebook: toy example Jaccard index multipolygons
Demo notebook: toy example Jaccard index multipolygons
Source notebook for comparison of shape files. Fix corrupt shape file.
Function for shape files.
Demo notebook using the function for shape files
Decide how to do it: via a datastore or not. Check whether the method predict
can work on a single image tile or on a datastore.
Create function and script for predicting the label of each pixel based on tile (100x100px = 80x80m) around it using a learned classifier.
Segment the whole image using codes: 1 = Slum, 2 = NonBuiltUp and 3 = BuiltUp
Convert the 3 single masks to a single multi-class mask where 1 is BuiltUp, 2 is NonBuiltUp and 3 is Slum. Visualize the ground truth indexed image (pure and overlaid on the original image).
Visualize overlaid ground truth and segmentation on top of original image
Fill in missing pixels of the segmentation. Visualize the resulting segmentation image (pure and overlaid on the original image).
De-noise (via majority filter?) the segmentation. Visualize the resulting segmentation image (pure and overlaid on the original image).
Make a script to compare ground truth to partial segmentation
Compare interpolated segmentation to ground truth
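The majority-filter de-noising step above can be sketched with scipy, assuming an integer label image with the class codes used here; `majority_filter` is a hypothetical helper, not repository code:

```python
import numpy as np
from scipy.ndimage import generic_filter

def majority_filter(labels, size=3):
    """De-noise a label image: each pixel takes the most frequent
    label in its size x size neighbourhood (edges use nearest values)."""
    def majority(window):
        vals, counts = np.unique(window, return_counts=True)
        return vals[np.argmax(counts)]
    return generic_filter(labels, majority, size=size, mode="nearest")

seg = np.array([
    [1, 1, 1, 1],
    [1, 2, 1, 1],   # isolated '2' should be smoothed away
    [1, 1, 1, 1],
    [3, 3, 3, 3],
])
clean = majority_filter(seg)
```

The same helper can be run on the stitched segmentation before comparing it against the ground truth.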
Currently the tiles do not overlap; we should be able to overlap them, controlled by the tiling step size.
Perhaps the best place for such a utility is
/satsets/util/shapefile.py
It's a one-liner, but it's nice to have as a utility. See the cell under
Load the contents of the shapefiles as multipolygons.
currently cell 24 (plus the relevant import from the first cell, currently 19) from this source notebook
The function should return also the multipolygon's bounds (extent). See the same notebook.
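A sketch of that loader, assuming shapely (and fiona for the file reading); the helper names are illustrative, not the utility's final API:

```python
from shapely.geometry import MultiPolygon, Polygon, shape
from shapely.ops import unary_union

def as_multipolygon(geoms):
    """Merge a list of shapely polygons into one MultiPolygon and
    return it together with its bounds (xmin, ymin, xmax, ymax)."""
    merged = unary_union(geoms)
    if merged.geom_type == "Polygon":
        merged = MultiPolygon([merged])
    return merged, merged.bounds

def load_shapefile(path):
    """Read every record of a shapefile and merge into a MultiPolygon."""
    import fiona  # assumed to be installed alongside shapely
    with fiona.open(path) as src:
        return as_multipolygon([shape(rec["geometry"]) for rec in src])

# toy usage without a file on disk: two disjoint unit squares
squares = [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
           Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])]
mp, bounds = as_multipolygon(squares)
```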
Dataset 1 (417px=250m)
Dataset 2 (333px=200m)
Dataset 3 (250px=150m)
Dataset 4 (167px=100m)
Dataset 6 (100px=80m)
For all datasets:
Create functions to generate performance plots as required by issue #24.
Write code which takes the 5 datasets and generates 5 imageDatastores with functionality to count labels, show distributions, and preview samples.
See also this MATLAB FileExchange code
Bag of features needs imageSet still!
Make list of issues related to the data preparation to satisfy the MATLAB datastore command requirements.
Label an image as belonging to a certain class if at least 75% of the image pixels are labelled of that class.
Generate sub-images (aka tiles or patches) and make them 5 datasets with 3 classes
Slum, BuiltUp, and NonBuiltUp each corresponding to the following tile grid sizes (use 1/4th of the size as step_size):
250 m = 417 px
200 m = 333 px
150 m = 250 px
100 m = 167 px
50 m = 83 px
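The 75% labelling rule combined with grid tiling can be sketched as follows, assuming an integer class mask; the function name, class-code mapping, and toy mask are illustrative:

```python
import numpy as np

CLASS_NAMES = {1: "Slum", 2: "BuiltUp", 3: "NonBuiltUp"}

def tile_labels(mask, tile_px, step_px, min_fraction=0.75):
    """Slide a tile_px window over the class mask with the given step and
    label a tile only when at least min_fraction of its pixels share one
    class; 'Mixed' tiles are skipped. Yields (row, col, label)."""
    h, w = mask.shape
    for r in range(0, h - tile_px + 1, step_px):
        for c in range(0, w - tile_px + 1, step_px):
            tile = mask[r:r + tile_px, c:c + tile_px]
            vals, counts = np.unique(tile, return_counts=True)
            best = counts.argmax()
            if counts[best] >= min_fraction * tile.size:
                yield r, c, CLASS_NAMES[vals[best]]

# toy 8x8 mask: mostly BuiltUp (2), with one mixed top-left tile
mask = np.full((8, 8), 2)
mask[0:2, 0:4] = 1
labels = list(tile_labels(mask, tile_px=4, step_px=4))
```

Using step_px = tile_px // 4 reproduces the 1/4-tile step size mentioned above.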
vocabulary size 10
create BoVW
convert to table
vocabulary size 20
create BoVW
convert to table
vocabulary size 50
create BoVW
convert to table
Use the best linear SVM classifier trained and tested on Dataset6, i.e. px100m80, so the tile size is 100x100 pixels (80x80 meters). Use the vocabulary size and mode (Detector/Grid) which produced the best validation results.
Function to generate tile image(s) from random location(s) of a class mask, such that at least 80% of the pixels from the tile belong to the desired class. See nonSlumTiling.m
Script to generate 10 random tiles per class
Publish
Script to test the tile class label prediction using the best pre-trained imageCategoryClassifier
on the 10 random tiles from each class
Run the prediction script
Publish
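The >=80%-purity random tiling above (cf. nonSlumTiling.m) might look like this in outline; `random_class_tiles` is a hypothetical Python counterpart, not the repository's MATLAB code:

```python
import numpy as np

def random_class_tiles(mask, class_id, tile_px, n_tiles,
                       min_fraction=0.8, seed=0, max_attempts=10000):
    """Draw random tile positions until n_tiles tiles have been found
    whose pixels are at least min_fraction of class_id."""
    rng = np.random.default_rng(seed)
    h, w = mask.shape
    tiles = []
    attempts = 0
    while len(tiles) < n_tiles and attempts < max_attempts:
        r = rng.integers(0, h - tile_px + 1)
        c = rng.integers(0, w - tile_px + 1)
        tile = mask[r:r + tile_px, c:c + tile_px]
        if (tile == class_id).mean() >= min_fraction:
            tiles.append((int(r), int(c)))
        attempts += 1
    return tiles

# toy mask: top half is class 1, bottom half class 2
mask = np.full((50, 50), 2)
mask[:25, :] = 1
tiles = random_class_tiles(mask, class_id=1, tile_px=10, n_tiles=10)
```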
Tools for dataset preparation
Create BoVW using SURF (default feature extractor) for the 6 training datastores:
250 m = 417 px
200 m = 333 px
150 m = 250 px
100 m = 167 px
80 m = 100 px
50 m = 83 px
Instead of the paper's approach we could use Canny edge detection + probabilistic Hough lines and calculate the features from that:
Generate train, validation and test sub-image datastores for the 5 image datastores and save them as MAT files:
250 m = 417 px
200 m = 333 px
150 m = 250 px
100 m = 167 px
50 m = 83 px
Urban should be called 'built-up',
and Rural should be 'non-built-up'.
Instead of the paper's feature, let's look at the LSD detector from OpenCV:
http://docs.opencv.org/3.1.0/df/dfa/tutorial_line_descriptor_main.html
Function to compute evaluation numbers given vectors of actual and predicted classes or a pre-computed confusion matrix
Function to evaluate the performance of a pre-trained classifier on a test feature set
(Unit-test) script to compute those for the test datasets
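A sketch of the evaluation function, accepting either label vectors or a pre-computed confusion matrix; scikit-learn is assumed only for building the matrix, and the chosen metrics (accuracy, per-class precision/recall) are illustrative:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluation_numbers(y_true=None, y_pred=None, cm=None):
    """Overall accuracy plus per-class precision and recall, computed
    either from actual/predicted class vectors or from a pre-computed
    confusion matrix (rows = actual, columns = predicted)."""
    if cm is None:
        cm = confusion_matrix(y_true, y_pred)
    cm = np.asarray(cm, dtype=float)
    accuracy = np.trace(cm) / cm.sum()
    precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class
    recall = np.diag(cm) / cm.sum(axis=1)      # per actual class
    return accuracy, precision, recall

# toy 3-class example
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
acc, prec, rec = evaluation_numbers(y_true, y_pred)
```

Passing cm= with a saved confusion matrix covers the second use case listed above.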
The new datastore should be of images/tiles of resolution 80m = 100px.
All slum pixels should be used: test whether a tile contains at least 80% slum pixels. If yes => use it for training; if no, check whether it has at least 80% pixels of another class. If yes => use it for training; if not, discard it from training.
See how many slum tiles are obtained above. Randomly sample the same (remaining) number of pixels/80% tiles of the remaining 2 classes: BuiltUp and NonBuiltUp.
Create functions and scripts to do the 'slum conditional' tiling.
Generate training image tiles of the 3 classes.
Clean up by hand some of the bad training images.
Record the new dataset in the Excel sheet.
The feature is implemented in the GLCM source notebook.
It needs to be put into the Python package.
Take care of the tuple of windows.
Make a script for generating a final "segmented" result image by stitching tiles with a given class label.
Add a new Excel sheet to the
C:\Projects\DynaSlum\Results\Classification3Classes\PerformanceComparision\ClassifiersPerformance.odt
Make a function with unit test for tiling an image into different size tiles and assigning class labels to them.