tum-vision / fusenet

This repository is the official release of the code for the paper "FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture", published at the 13th Asian Conference on Computer Vision (ACCV 2016).


fusenet's Introduction

FuseNet

Please refer to the PyTorch implementation for an up-to-date version.

FuseNet is a general deep convolutional neural network (CNN) architecture for training on datasets with RGB-D images. It can be used for semantic segmentation, scene classification and other applications. This repository is the official release of the code for our paper, implemented on top of the BVLC/caffe framework.

Usage

Installation

The code is compatible with an early Caffe version from June 2016. It was developed under Ubuntu 16.04 with CUDA 7.5 and cuDNN v5.0. If you use the program under other Ubuntu distributions, you may need to comment out lines 72--73 in the root CMakeLists.txt file. If you compile under another OS, please use Google as your friend. We mostly tested the program with an Nvidia Titan X GPU. Please note that multi-GPU training is supported.

git clone https://github.com/tum-vision/fusenet.git
mkdir build && cd build
cmake ..
make -j10
make runtest -j10

Training and Testing

We provide all needed python scripts and prototxt files to reproduce our published results under ./fusenet/. A short guideline is given below. For further detailed instructions, check here.

Initialization

Our network architecture is based on the 16-layer VGGNet. However, since we have an extra input channel for depth, we provide a script to compute the initial weights of the depth encoder from the pretrained VGG model; the first-layer depth filters can be obtained by averaging the pretrained filters over the three RGB channels.
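Since the script itself is not shown in this README, here is a minimal sketch of this kind of initialization with the Caffe Python interface; the prototxt/caffemodel paths and the depth-branch layer name conv1_1_d are placeholder assumptions, not names from the released files.

```python
# Sketch: initialize the depth encoder's first conv layer by averaging the
# pretrained VGG-16 RGB filters. All file and layer names are placeholders.
import caffe

net = caffe.Net('fusenet_train.prototxt',           # hypothetical FuseNet definition
                'VGG_ILSVRC_16_layers.caffemodel',  # pretrained VGG-16 weights
                caffe.TRAIN)

rgb_w = net.params['conv1_1'][0].data               # shape: (64, 3, 3, 3)
depth_w = rgb_w.mean(axis=1, keepdims=True)         # average over RGB -> (64, 1, 3, 3)

# 'conv1_1_d' is an assumed name for the first conv layer of the depth branch.
net.params['conv1_1_d'][0].data[...] = depth_w
net.params['conv1_1_d'][1].data[...] = net.params['conv1_1'][1].data  # reuse bias
net.save('fusenet_init.caffemodel')
```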

Data preparation

To store the dataset, we save paired RGB-D images into LMDB. We also scale the original depth images to the range [0, 255]. Optionally, you can further cast the scaled depth values to unsigned char (grayscale) to save memory; if you do not want to lose precision, store the scaled depth as float. To prepare the LMDB, we provide the following Python script for your reference. However, you can also write your own image input layer to grab paired RGB-D images.

demo   ./fusenet/scripts/save_lmdb.py
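For orientation, below is a minimal sketch of such an LMDB writer using the Caffe Python bindings; this is not the released save_lmdb.py, and the 4-channel RGB-D packing and the key format are assumptions.

```python
# Sketch: write paired RGB-D images into LMDB as caffe datums (float data,
# so the scaled depth keeps its precision). Packing and keys are assumptions.
import lmdb
import numpy as np
import caffe
from PIL import Image

def save_rgbd_lmdb(db_path, rgb_files, depth_files):
    env = lmdb.open(db_path, map_size=1 << 40)
    with env.begin(write=True) as txn:
        for i, (rgb_f, d_f) in enumerate(zip(rgb_files, depth_files)):
            rgb = np.asarray(Image.open(rgb_f), dtype=np.float32)    # H x W x 3
            depth = np.asarray(Image.open(d_f), dtype=np.float32)    # H x W
            # scale depth to [0, 255]; cast to uint8 here only if the
            # precision loss is acceptable
            depth = 255.0 * (depth - depth.min()) / max(depth.max() - depth.min(), 1e-6)
            rgbd = np.dstack([rgb, depth]).transpose(2, 0, 1)        # 4 x H x W
            datum = caffe.io.array_to_datum(rgbd)
            txn.put('{:08d}'.format(i).encode(), datum.SerializeToString())
    env.close()
```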

LMDB shuffling

We support LMDB shuffling and recommend shuffling after each epoch during training. To enable this option, set the shuffle flag to true for the DataLayer in the prototxt. Note that we do not support shuffling with LevelDB.

demo   ./fusenet/segmentation/nyuv2_sf1/train.prototxt

Weighted cross-entropy loss

One common technique to handle class imbalance is to weight the loss of each class differently, typically with a higher weight for less frequent classes and a lower weight for more frequent ones. For semantic segmentation, we support this loss weighting in the SoftmaxWithLossLayer by allowing the user to specify a weight for each label. One way to set the weights is according to the inverse class frequency (see our paper for details). We provide the weights used in our paper in ./fusenet/data/.
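As a reference for how inverse-class-frequency weights can be derived (the exact weights used in the paper are the ones provided in ./fusenet/data/), a small sketch; the normalization to mean 1 is a design choice of this sketch:

```python
# Sketch: per-class loss weights from inverse class frequency over the
# training label images.
import numpy as np
from PIL import Image

def inverse_frequency_weights(label_files, num_classes):
    counts = np.zeros(num_classes, dtype=np.float64)
    for f in label_files:
        labels = np.asarray(Image.open(f)).ravel()
        counts += np.bincount(labels, minlength=num_classes)[:num_classes]
    freq = counts / counts.sum()
    weights = 1.0 / np.maximum(freq, 1e-12)   # rarer classes get larger weights
    return weights / weights.mean()           # normalize so the mean weight is 1
```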

Batch normalization

We use batch normalization after each convolution. This is supported by the Caffe BatchNormLayer. Note that we add a ScaleLayer after each BatchNormLayer.
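A sketch of this conv → BatchNorm → Scale → ReLU pattern written with Caffe's Python NetSpec; layer names and hyperparameters are illustrative, not taken from the released prototxt files.

```python
# Sketch: convolution followed by BatchNorm and a learnable Scale (gamma/beta),
# since Caffe's BatchNormLayer only normalizes. Parameters are illustrative.
import caffe
from caffe import layers as L

def conv_bn_relu(bottom, num_output):
    conv = L.Convolution(bottom, num_output=num_output, kernel_size=3, pad=1,
                         weight_filler=dict(type='xavier'))
    bn = L.BatchNorm(conv, in_place=True)
    scale = L.Scale(bn, bias_term=True, in_place=True)  # learnable gamma + beta
    return L.ReLU(scale, in_place=True)

n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 240, 320]))
n.relu1_1 = conv_bn_relu(n.data, 64)
print(n.to_proto())
```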

Testing

To test semantic segmentation performance, we provide Python scripts to calculate the global accuracy, average class accuracy and average intersection-over-union (IoU) score. The implementation is based on the confusion matrix.

demo   ./fusenet/scripts/test_segmentation.py
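The three scores follow directly from the confusion matrix; a minimal sketch (not the released test_segmentation.py):

```python
# Sketch: global accuracy, average class accuracy and average IoU
# from a confusion matrix where conf[i, j] counts true class i predicted as j.
import numpy as np

def segmentation_scores(conf):
    conf = conf.astype(np.float64)
    tp = np.diag(conf)
    gt = conf.sum(axis=1)                        # TP + FN per class
    pred = conf.sum(axis=0)                      # TP + FP per class
    global_acc = tp.sum() / conf.sum()
    with np.errstate(divide='ignore', invalid='ignore'):
        class_acc = np.nanmean(tp / gt)          # classes absent from GT give NaN
        iou = np.nanmean(tp / (gt + pred - tp))  # and are skipped by nanmean
    return global_acc, class_acc, iou
```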

Released Caffemodel

Semantic Image Segmentation

The items marked with ticks are already available for download; the rest will be released soon. Unless otherwise stated, all models are finetuned from the pretrained VGGNet-16Layer model. Stay tuned 🔥

NYUv2 40-class semantic segmentation

For more information about the dataset, check here.

  • FuseNet-SF1:

    This model is trained with the FuseNet Sparse-Fusion1 (SF1) architecture at 320x240 resolution. To obtain the full 640x480 resolution, you can bilinearly upsample the segmentation (see the sketch after this list) or, better, refine it with a CRF.

  • FuseNet-SF5:

    This model is trained with the FuseNet Sparse-Fusion5 (SF5) architecture at 320x240 resolution. It gives 66.0% global pixelwise accuracy, 43.4% average classwise accuracy and 32.7% average classwise IoU.
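The bilinear upsampling mentioned for FuseNet-SF1 above can be applied to the class score maps before the argmax; a minimal sketch assuming OpenCV and C score maps at 320x240 (CRF refinement is not shown):

```python
# Sketch: bilinearly upsample per-class score maps from 320x240 to 640x480,
# then take the per-pixel argmax to get the full-resolution label map.
import cv2
import numpy as np

def upsample_prediction(scores):             # scores: C x 240 x 320 float array
    up = np.stack([cv2.resize(s, (640, 480), interpolation=cv2.INTER_LINEAR)
                   for s in scores])         # C x 480 x 640
    return up.argmax(axis=0)                 # 480 x 640 label map
```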

SUN-RGBD 37-class semantic segmentation

For more information about the dataset, check here.

  • FuseNet-SF5:

    This model is trained at 224x224 resolution. It gives 76.3% global pixelwise accuracy, 48.3% average classwise accuracy and 37.3% average classwise IoU.

Publication

If you use this code or our trained model in your work, please consider citing the following paper.

Caner Hazirbas, Lingni Ma, Csaba Domokos and Daniel Cremers, "FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture", in Proceedings of the 13th Asian Conference on Computer Vision (ACCV), 2016. (pdf)

@inproceedings{fusenet2016accv,
 author    = "C. Hazirbas and L. Ma and C. Domokos and D. Cremers",
 title     = "FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture",
 booktitle = "Asian Conference on Computer Vision",
 year      = "2016",
 month     = "November",
}

License and Contact

BVLC/caffe is released under the BSD 2-Clause license. Our modifications to the original code are released under the GNU General Public License Version 3 (GPLv3).

Contact Lingni Ma ✉️ for questions, comments and reporting bugs.

fusenet's People

Contributors

SummerIcequeen


fusenet's Issues

where is the code in script?

@SummerIcequeen @hazirbas
I want to try your code, but I can't find the scripts referred to in the README: ./fusenet/scripts/save_lmdb.py and ./fusenet/scripts/test_segmentation.py. Will you release the code, please?

How to process depth information ?

Hello,
I want to know how the pretrained model for the depth channel is generated. Do the RGB channel and the depth channel use different pretrained models? Thanks.

get the lower performance

Can you release your evaluation code?
I have tried to train the network, but I get lower IoU and accuracy than reported.
Thanks

calculate TP,FP and FN

Hello,
I am faced with a problem: the size of the labeled image in the .mat file is 640×480, while the size of the predicted label image is 320×240. How can I calculate TP, FP and FN using two images of different sizes?

Thanks

about caffemodel

Hi,
I have found download_model_from_gist.sh in the "scripts" folder, however I could not find the gist_id.
So I wonder: have the caffemodels been shared? And where can I find the prototxt that describes the hyper-parameters?
Thanks.

Problem about dims in training

When I train FuseNet, I get the following error:

F0302 14:54:53.849881  9560 blob.cpp:32] Check failed: shape[i] >= 0 (-1 vs. 0) 
*** Check failure stack trace: ***
    @     0x7f208aedae3d  google::LogMessage::Fail()
    @     0x7f208aedcbc0  google::LogMessage::SendToLog()
    @     0x7f208aedaa23  google::LogMessage::Flush()
    @     0x7f208aedd58e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f208b4b662b  caffe::Blob<>::Reshape()
    @     0x7f208b578673  caffe::SliceLayer<>::Reshape()
    @     0x7f208b5e8d96  caffe::Net<>::Init()
    @     0x7f208b5ea5e1  caffe::Net<>::Net()
    @     0x7f208b432c0a  caffe::Solver<>::InitTrainNet()
I0302 14:54:53.852614  9567 db_lmdb.hpp:53] shuffle data, start epoch 3
    @     0x7f208b434017  caffe::Solver<>::Init()
    @     0x7f208b4343ba  caffe::Solver<>::Solver()
    @     0x7f208b4a1e83  caffe::Creator_SGDSolver<>()
    @           0x40b978  train()
    @           0x408618  main
    @     0x7f208992d830  (unknown)
    @           0x408d89  _start
Aborted (core dumped)

I suspect the problem is the blob dimensions of the net, so I printed the blob sizes:

top size : 3
/home/jason/Jason/Code/Semantic_Segmentation/fusenet/src/caffe/layers/slice_layer.cpp, 56
2, 3, 480, 640
/home/jason/Jason/Code/Semantic_Segmentation/fusenet/src/caffe/layers/slice_layer.cpp, 56
2, 1, 480, 640
/home/jason/Jason/Code/Semantic_Segmentation/fusenet/src/caffe/layers/slice_layer.cpp, 56
2, -1, 480, 640

So I think maybe I organized the data in the wrong way. I use the following list file to make the LMDB dataset, ordering color images, then depth images, then label images, using the default convert_imageset tool offered by Caffe:

00002_colors.png
00003_colors.png
...
00002_depth.png
00003_depth.png
...
00002_label.png
00003_label.png

And I wonder: both the label image and the depth image are single-channel, so how can the net tell them apart?

In the meantime, I do not clearly know how to use the weighted cross-entropy loss; can you help me?

Thanks all the same.

Predict using C++

Hello,
When I use the trained Caffe model to predict with C++ code adapted from the "cpp_classification" example in the Caffe project, the output of the program is a blob whose values are very small, like 0.0009. Are there some lines that need to be corrected?
Thanks.

```cpp
#include <caffe/caffe.hpp>
#ifdef USE_OPENCV
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#endif // USE_OPENCV
// headers as in the cpp_classification example (angle-bracket contents
// were stripped by the page; these are the example's standard includes)
#include <algorithm>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#ifdef USE_OPENCV
using namespace caffe; // NOLINT(build/namespaces)
using std::string;

class MyClassifier {
public:
MyClassifier(const string& model_file,
const string& trained_file,
const string& mean_file);
cv::Mat Predict(const cv::Mat& rgbImg, const cv::Mat& depthImg);
private:
void SetMean(const string& mean_file);
void WrapInputLayer(std::vector<cv::Mat>* input_channels);
void Preprocess(const cv::Mat& rgbImg, const cv::Mat& depthImg, std::vector<cv::Mat>* input_channels);

private:
shared_ptr<Net<float> > net_;
cv::Size input_geometry_rgb;
cv::Size input_geometry_depth;
int num_channels_rgb;
int num_channels_depth;
cv::Mat mean_rgb;
cv::Mat mean_depth;
};

MyClassifier::MyClassifier(const string& model_file,
const string& trained_file,
const string& mean_file) {
#ifdef CPU_ONLY
Caffe::set_mode(Caffe::CPU);
#else
Caffe::set_mode(Caffe::GPU);
#endif

/* Load the network. */
net_.reset(new Net<float>(model_file, TEST));
net_->CopyTrainedLayersFrom(trained_file);

CHECK_EQ(net_->num_inputs(), 2) << "Network should have exactly two input.";
CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";
Blob<float>* input_layer_rgb = net_->input_blobs()[0];
Blob<float>* input_layer_depth = net_->input_blobs()[1];
num_channels_rgb = input_layer_rgb->channels();
num_channels_depth = input_layer_depth->channels();
CHECK(num_channels_rgb == 3)
    << "RGB Input layer should have 3 channels.";
CHECK(num_channels_depth == 1)
<< "Depth Input layer should have 1 channels.";
input_geometry_rgb = cv::Size(input_layer_rgb->width(), input_layer_rgb->height());
input_geometry_depth = cv::Size(input_layer_depth->width(), input_layer_depth->height());

/* Load the binaryproto mean file. */
SetMean(mean_file);
}

/* Load the mean file in binaryproto format. */
void MyClassifier::SetMean(const string& mean_file) {
BlobProto blob_proto;
ReadProtoFromBinaryFileOrDie(mean_file.c_str(), &blob_proto);

/* Convert from BlobProto to Blob<float> */
Blob<float> mean_blob;
mean_blob.FromProto(blob_proto);
//mean prototxt includes 1 label channel: should subtract 1
CHECK_EQ(mean_blob.channels()-1, (num_channels_rgb+num_channels_depth))
    << "Number of channels of mean file doesn't match input layer.";

/* The format of the mean file is planar float data: BGR + grayscale. */
std::vector<cv::Mat> channels_rgb;
std::vector<cv::Mat> channels_depth;
float* data = mean_blob.mutable_cpu_data();
for (int i = 0; i < (num_channels_rgb+num_channels_depth); ++i) {
    /* Extract an individual channel. */
    cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1, data);  // mean blob data is float
    if(i<num_channels_rgb) {
        channels_rgb.push_back(channel);
    }else {
        channels_depth.push_back(channel);
    }
    data += mean_blob.height() * mean_blob.width();
}

/* Merge the separate channels into a single image. */
cv::Mat meanRgb,meanDepth;
cv::merge(channels_rgb, meanRgb);
cv::merge(channels_depth, meanDepth);

/* Compute the global mean pixel value and create a mean image
* filled with this value. */
cv::Scalar channel_mean_rgb = cv::mean(meanRgb);
cv::Scalar channel_mean_depth = cv::mean(meanDepth);
mean_rgb = cv::Mat(input_geometry_rgb, meanRgb.type(), channel_mean_rgb);
mean_depth = cv::Mat(input_geometry_depth, meanDepth.type(), channel_mean_depth);

}

cv::Mat MyClassifier::Predict(const cv::Mat& rgbImg, const cv::Mat& depthImg) {
Blob<float>* input_layer_rgb = net_->input_blobs()[0];
Blob<float>* input_layer_depth = net_->input_blobs()[1];

input_layer_rgb->Reshape(1, num_channels_rgb,
                   input_geometry_rgb.height, input_geometry_rgb.width);
input_layer_depth->Reshape(1, num_channels_depth,
                         input_geometry_depth.height, input_geometry_depth.width);
/* Forward dimension change to all layers. */
net_->Reshape();

std::vector<cv::Mat> input_channels;
WrapInputLayer(&input_channels);

Preprocess(rgbImg,depthImg, &input_channels);

net_->Forward();

/* Copy the output layer to a std::vector */
Blob<float>* output_layer = net_->output_blobs()[0];
const float * result=output_layer->cpu_data();
cv::Mat resultMat(output_layer->height(),output_layer->width(), CV_32FC1,(float *)result);
cv::Mat resultMatUC;
return resultMat;

}

void MyClassifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
Blob<float>* input_layer_rgb = net_->input_blobs()[0];
Blob<float>* input_layer_depth = net_->input_blobs()[1];

int width = input_layer_rgb->width();
int height = input_layer_rgb->height();
float* input_data = input_layer_rgb->mutable_cpu_data();
for (int i = 0; i < input_layer_rgb->channels(); ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);  // wrap one float channel of the blob
    input_channels->push_back(channel);
    input_data += width * height;
}
width = input_layer_depth->width();
height = input_layer_depth->height();
input_data = input_layer_depth->mutable_cpu_data();
for (int i = 0; i < input_layer_depth->channels(); ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);  // wrap one float channel of the blob
    input_channels->push_back(channel);
    input_data += width * height;
}

}

void MyClassifier::Preprocess(const cv::Mat& rgbImg, const cv::Mat& depthImg,
std::vector<cv::Mat>* input_channels) {
/* Convert the input image to the input image format of the network. */
cv::Mat sample_rgb,sample_depth;
if (rgbImg.channels() == 4)
cv::cvtColor(rgbImg, sample_rgb, cv::COLOR_BGRA2BGR);
else if (rgbImg.channels() == 1 )
cv::cvtColor(rgbImg, sample_rgb, cv::COLOR_GRAY2BGR);
else
sample_rgb = rgbImg;
//depth image
if (depthImg.channels() == 4)
cv::cvtColor(depthImg, sample_depth, cv::COLOR_BGRA2GRAY);
else if (depthImg.channels() == 3 )
cv::cvtColor(depthImg, sample_depth, cv::COLOR_BGR2GRAY);
else if (depthImg.channels() == 1){
if(depthImg.type()!=CV_8UC1){
depthImg.convertTo(sample_depth,CV_8UC1,1.0/255);
}else {
sample_depth = depthImg;
}
}

cv::Mat sample_resized_rgb;
if (sample_rgb.size() != input_geometry_rgb)
    cv::resize(sample_rgb, sample_resized_rgb, input_geometry_rgb);
else
    sample_resized_rgb = sample_rgb;

cv::Mat sample_resized_depth;
if (sample_depth.size() != input_geometry_depth)
    cv::resize(sample_depth, sample_resized_depth, input_geometry_depth);
else
    sample_resized_depth = sample_depth;


/* Convert to float before the mean subtraction (the mean Mats are float). */
cv::Mat sample_float_rgb, sample_float_depth;
sample_resized_rgb.convertTo(sample_float_rgb, CV_32FC3);
sample_resized_depth.convertTo(sample_float_depth, CV_32FC1);

cv::Mat sample_normalized_rgb, sample_normalized_depth;
cv::subtract(sample_float_rgb, mean_rgb, sample_normalized_rgb);
cv::subtract(sample_float_depth, mean_depth, sample_normalized_depth);

/* This operation writes the separate BGR planes directly to the RGB input
 * blob of the network, because they are wrapped by the cv::Mat objects
 * in input_channels. */
cv::Mat depth_wrap = (*input_channels)[num_channels_rgb];
cv::split(sample_normalized_rgb, *input_channels);
/* The depth plane must be copied into the wrapped blob memory; push_back
 * would append a Mat that is not backed by the depth input blob. */
sample_normalized_depth.copyTo(depth_wrap);

}

int main(int argc, char** argv) {
// if (argc != 6) {
// std::cerr << "Usage: " << argv[0]
// << " deploy.prototxt network.caffemodel"
// << " mean.binaryproto img.jpg depth.png" << std::endl;
// return 1;
// }

::google::InitGoogleLogging(argv[0]);

// string model_file = argv[1];//deploy.prototxt
// string trained_file = argv[2];//caffemodel
// string mean_file = argv[3];//mean.prototxt
// string file = argv[4];
// string fileDepth = argv[5];//mean.prototxt

string model_file="segmentation/nyu40-sf1/deploy.prototxt";
string trained_file="caffemodels_iter_75000.caffemodel";
string mean_file="db/mean/mean.prototxt";
string file="NYU0015/fullres/NYU0015.jpg";
string fileDepth="NYU0015/fullres/NYU0015.png";

MyClassifier classifier(model_file, trained_file, mean_file);


std::cout << "---------- Prediction for "
        << file << " ----------" << std::endl;

cv::Mat img = cv::imread(file, -1);
cv::Mat imgDepth = cv::imread(fileDepth, -1);
CHECK(!img.empty()) << "Unable to decode image " << file;
cv::Mat result=classifier.Predict(img,imgDepth);

// std::cout<<result.type() <<std::endl;
// std::cout<<result.cols <<std::endl;
// std::cout<<result.rows <<std::endl;
}
#else
int main(int argc, char** argv) {
LOG(FATAL) << "This example requires OpenCV; compile with USE_OPENCV.";
}
#endif // USE_OPENCV
```

Problem about preparing LMDB

Hi.
Could you provide the script for preparing the LMDB (./fusenet/scripts/save_lmdb.py)? The default Caffe LMDB writer does not accept 4-D input images.
Thanks very much!

Problem about LMDB with C++

Hello,
I am not sure whether this is the right way to create an LMDB file to train FuseNet:
I wrote a program to convert images to LMDB using C++ and OpenCV, and I use the NYU part of the SUNRGBD dataset. First I load an RGB image into a cv::Mat with 3 channels, load the corresponding depth file into a cv::Mat with 1 channel (char), and create a label cv::Mat with 1 channel; all the cv::Mats have the same size. Then I use cv::merge to combine the three cv::Mats into a new cv::Mat with 5 channels. Finally I convert the new cv::Mat to a datum and write it into the LMDB file.
If that is the right way, I face another problem: how do I find the right values to fill the label cv::Mat? What I do is parse the index.json file corresponding to each RGB image to get an array of polygons, take the x and y values of each polygon element to create 2D points, and then use drawContours (an OpenCV function) to fill the area of each object in a mask Mat created for each polygon. The value filled into each area is the index of the corresponding label in classes.txt (I use the 40 classes). Finally I add all the masks into one mask, which is the label cv::Mat for the image. Is that right?
Hope for your response.

Depth normalization

Hi @tum-vision @hazirbas
In the paper, it says that you normalize the depth map to [0, 255].
Does this mean mapping the 16-bit range [0, 65535] to [0, 255], or mapping the [min, max] of each depth map to [0, 255]?
By the way, did you also normalize when training with depth only?

Thank you

Try to find save_lmdb.py

Hi, I cannot find the folder called scripts in fusenet mentioned in the description.
Is there any way I can find it?
Thank you.
