
tum-vision / fusenet


This repository is the official code release for the paper "FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture", published at the 13th Asian Conference on Computer Vision (ACCV 2016).

License: Other

CMake 2.78% Makefile 0.70% C++ 80.65% Cuda 5.71% MATLAB 0.90% Python 8.80% Shell 0.39% Dockerfile 0.07%
caffemodel cnn-architecture

fusenet's Issues

Problem about preparing LMDB

Hi.
Could you provide documentation for preparing the LMDB (./fusenet/scripts/save_lmdb.py)? The default Caffe LMDB writer does not accept a 4-D input image.
Thanks very much!

calculate TP,FP and FN

Hello,
I ran into a problem: the labeled image in the .mat file is 640x480, while the predicted label image is 320x240. How should I calculate TP, FP and FN from two images of different sizes?

Thanks
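
One possible way to handle the size mismatch, sketched under assumptions: bring the prediction to the ground-truth size with nearest-neighbour sampling (so no interpolated, non-existent class ids appear), then count per class. The names `LabelImage`, `resizeNearest` and `countForClass` are hypothetical and not part of the repository. Downsampling the ground truth to the prediction size instead is an equally plausible choice, and the thread does not say which one the authors used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major label image: labels[y * width + x] holds a class index.
struct LabelImage {
    int width;
    int height;
    std::vector<int> labels;
};

// Nearest-neighbour resize: every output pixel copies one source pixel,
// so only class ids that already exist in the source can appear.
LabelImage resizeNearest(const LabelImage& src, int dstWidth, int dstHeight) {
    assert(src.width > 0 && src.height > 0 && dstWidth > 0 && dstHeight > 0);
    LabelImage dst{dstWidth, dstHeight,
                   std::vector<int>(static_cast<std::size_t>(dstWidth) * dstHeight)};
    for (int y = 0; y < dstHeight; ++y) {
        const int sy = y * src.height / dstHeight;
        for (int x = 0; x < dstWidth; ++x) {
            const int sx = x * src.width / dstWidth;
            dst.labels[static_cast<std::size_t>(y) * dstWidth + x] =
                src.labels[static_cast<std::size_t>(sy) * src.width + sx];
        }
    }
    return dst;
}

struct Counts {
    long tp = 0;
    long fp = 0;
    long fn = 0;
};

// TP/FP/FN for a single class id, comparing two label images of equal size.
Counts countForClass(const LabelImage& pred, const LabelImage& truth, int cls) {
    assert(pred.width == truth.width && pred.height == truth.height);
    Counts c;
    for (std::size_t i = 0; i < truth.labels.size(); ++i) {
        const bool p = pred.labels[i] == cls;
        const bool t = truth.labels[i] == cls;
        if (p && t) ++c.tp;
        else if (p && !t) ++c.fp;
        else if (!p && t) ++c.fn;
    }
    return c;
}
```

For the sizes in the question, this would mean calling `resizeNearest(pred, 640, 480)` before `countForClass`.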

Try to find save_lmdb.py

Hi, I cannot find the scripts folder in fusenet that the description mentions.
Is there any way I can find it?
Thank you.

Depth normalization

Hi @tum-vision @hazirbas
The paper says that you normalize the depth map to [0,255].
Does that mean mapping the 16-bit range [0,65535] to [0,255], or mapping the [min,max] of each depth map to [0,255]?
By the way, did you also normalize when training with depth only?

Thank you
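
To make the two readings in this question concrete, here is a minimal sketch of both. The function names `scaleFullRange` and `scaleMinMax` are hypothetical, and nothing in this thread confirms which variant (if either) the authors used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reading A: scale the full 16-bit range [0, 65535] to [0, 255].
std::vector<std::uint8_t> scaleFullRange(const std::vector<std::uint16_t>& depth) {
    std::vector<std::uint8_t> out(depth.size());
    for (std::size_t i = 0; i < depth.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(depth[i] * 255u / 65535u);
    }
    return out;
}

// Reading B: stretch this depth map's own [min, max] to [0, 255].
std::vector<std::uint8_t> scaleMinMax(const std::vector<std::uint16_t>& depth) {
    assert(!depth.empty());
    const auto mm = std::minmax_element(depth.begin(), depth.end());
    const unsigned lo = *mm.first;
    const unsigned hi = *mm.second;
    std::vector<std::uint8_t> out(depth.size(), 0);
    if (hi == lo) return out;  // constant map: avoid dividing by zero
    for (std::size_t i = 0; i < depth.size(); ++i) {
        out[i] = static_cast<std::uint8_t>((depth[i] - lo) * 255u / (hi - lo));
    }
    return out;
}
```

The difference matters in practice: reading A keeps absolute distances comparable across images but may use only a small part of the 8-bit range, while reading B uses the full range per image at the cost of losing absolute scale.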

Predict using C++

Hello,
When I run prediction with a trained Caffe model, using C++ code adapted from the "cpp_classification" example in the Caffe project, the result is a "blob" whose values are very small, like 0.0009. Are there lines that need to be corrected?
Thanks.

`#include <caffe/caffe.hpp>
#ifdef USE_OPENCV
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#endif // USE_OPENCV
#include <algorithm>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#ifdef USE_OPENCV
using namespace caffe; // NOLINT(build/namespaces)
using std::string;

class MyClassifier {
public:
MyClassifier(const string& model_file,
const string& trained_file,
const string& mean_file);
cv::Mat Predict(const cv::Mat& rgbImg, const cv::Mat& depthImg);
private:
void SetMean(const string& mean_file);
void WrapInputLayer(std::vector<cv::Mat>* input_channels);
void Preprocess(const cv::Mat& rgbImg, const cv::Mat& depthImg,std::vector<cv::Mat>* input_channels);

private:
shared_ptr<Net<float> > net_;
cv::Size input_geometry_rgb;
cv::Size input_geometry_depth;
int num_channels_rgb;
int num_channels_depth;
cv::Mat mean_rgb;
cv::Mat mean_depth;
};

MyClassifier::MyClassifier(const string& model_file,
const string& trained_file,
const string& mean_file) {
#ifdef CPU_ONLY
Caffe::set_mode(Caffe::CPU);
#else
Caffe::set_mode(Caffe::GPU);
#endif

/* Load the network. */
net_.reset(new Net<float>(model_file, TEST));
net_->CopyTrainedLayersFrom(trained_file);

CHECK_EQ(net_->num_inputs(), 2) << "Network should have exactly two input.";
CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";
Blob<float>* input_layer_rgb = net_->input_blobs()[0];
Blob<float>* input_layer_depth = net_->input_blobs()[1];
num_channels_rgb = input_layer_rgb->channels();
num_channels_depth = input_layer_depth->channels();
CHECK(num_channels_rgb == 3)
    << "RGB Input layer should have 3 channels.";
CHECK(num_channels_depth == 1)
<< "Depth Input layer should have 1 channels.";
input_geometry_rgb = cv::Size(input_layer_rgb->width(), input_layer_rgb->height());
input_geometry_depth = cv::Size(input_layer_depth->width(), input_layer_depth->height());

/* Load the binaryproto mean file. */
SetMean(mean_file);
}

/* Load the mean file in binaryproto format. */
void MyClassifier::SetMean(const string& mean_file) {
BlobProto blob_proto;
ReadProtoFromBinaryFileOrDie(mean_file.c_str(), &blob_proto);

/* Convert from BlobProto to Blob<float> */
Blob<float> mean_blob;
mean_blob.FromProto(blob_proto);
//mean prototxt includes 1 label channel: should subtract 1
CHECK_EQ(mean_blob.channels()-1, (num_channels_rgb+num_channels_depth))
    << "Number of channels of mean file doesn't match input layer.";

/* The format of the mean file is planar cv_8uc4 float BGR + grayscale. */
std::vector<cv::Mat> channels_rgb;
std::vector<cv::Mat> channels_depth;
float* data = mean_blob.mutable_cpu_data();
for (int i = 0; i < (num_channels_rgb+num_channels_depth); ++i) {
    /* Extract an individual channel. */
    cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_8UC1, data);
    if(i<num_channels_rgb) {
        channels_rgb.push_back(channel);
    }else {
        channels_depth.push_back(channel);
    }
    data += mean_blob.height() * mean_blob.width();
}

/* Merge the separate channels into a single image. */
cv::Mat meanRgb,meanDepth;
cv::merge(channels_rgb, meanRgb);
cv::merge(channels_depth, meanDepth);

/* Compute the global mean pixel value and create a mean image
* filled with this value. */
cv::Scalar channel_mean_rgb = cv::mean(meanRgb);
cv::Scalar channel_mean_depth = cv::mean(meanDepth);
mean_rgb = cv::Mat(input_geometry_rgb, meanRgb.type(), channel_mean_rgb);
mean_depth = cv::Mat(input_geometry_depth, meanDepth.type(), channel_mean_depth);

}

cv::Mat MyClassifier::Predict(const cv::Mat& rgbImg, const cv::Mat& depthImg) {
Blob<float>* input_layer_rgb = net_->input_blobs()[0];
Blob<float>* input_layer_depth = net_->input_blobs()[1];

input_layer_rgb->Reshape(1, num_channels_rgb,
                   input_geometry_rgb.height, input_geometry_rgb.width);
input_layer_depth->Reshape(1, num_channels_depth,
                         input_geometry_depth.height, input_geometry_depth.width);
/* Forward dimension change to all layers. */
net_->Reshape();

std::vector<cv::Mat> input_channels;
WrapInputLayer(&input_channels);

Preprocess(rgbImg,depthImg, &input_channels);

net_->Forward();

/* Copy the output layer to a std::vector */
Blob<float>* output_layer = net_->output_blobs()[0];
const float * result=output_layer->cpu_data();
cv::Mat resultMat(output_layer->height(),output_layer->width(), CV_32FC1,(float *)result);
cv::Mat resultMatUC;
return resultMat;

}

void MyClassifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
Blob<float>* input_layer_rgb = net_->input_blobs()[0];
Blob<float>* input_layer_depth = net_->input_blobs()[1];

int width = input_layer_rgb->width();
int height = input_layer_rgb->height();
float* input_data = input_layer_rgb->mutable_cpu_data();
for (int i = 0; i < input_layer_rgb->channels(); ++i) {
    cv::Mat channel(height, width, CV_8UC3, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
}
width = input_layer_depth->width();
height = input_layer_depth->height();
input_data = input_layer_depth->mutable_cpu_data();
for (int i = 0; i < input_layer_depth->channels(); ++i) {
    cv::Mat channel(height, width, CV_8UC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
}

}

void MyClassifier::Preprocess(const cv::Mat& rgbImg, const cv::Mat& depthImg,
std::vector<cv::Mat>* input_channels) {
/* Convert the input image to the input image format of the network. */
cv::Mat sample_rgb,sample_depth;
if (rgbImg.channels() == 4)
cv::cvtColor(rgbImg, sample_rgb, cv::COLOR_BGRA2BGR);
else if (rgbImg.channels() == 1 )
cv::cvtColor(rgbImg, sample_rgb, cv::COLOR_GRAY2BGR);
else
sample_rgb = rgbImg;
//depth image
if (depthImg.channels() == 4)
cv::cvtColor(depthImg, sample_depth, cv::COLOR_BGRA2GRAY);
else if (depthImg.channels() == 3 )
cv::cvtColor(depthImg, sample_depth, cv::COLOR_BGR2GRAY);
else if (depthImg.channels() == 1){
if(depthImg.type()!=CV_8UC1){
depthImg.convertTo(sample_depth,CV_8UC1,1.0/255);
}else {
sample_depth = depthImg;
}
}

cv::Mat sample_resized_rgb;
if (sample_rgb.size() != input_geometry_rgb)
    cv::resize(sample_rgb, sample_resized_rgb, input_geometry_rgb);
else
    sample_resized_rgb = sample_rgb;

cv::Mat sample_resized_depth;
if (sample_depth.size() != input_geometry_depth)
    cv::resize(sample_depth, sample_resized_depth, input_geometry_depth);
else
    sample_resized_depth = sample_depth;


cv::Mat sample_normalized_rgb,sample_normalized_depth;
cv::subtract(sample_resized_rgb, mean_rgb, sample_normalized_rgb);
cv::subtract(sample_resized_depth, mean_depth, sample_normalized_depth);

/* This operation will write the separate BGR planes directly to the
* input layer of the network because it is wrapped by the cv::Mat
* objects in input_channels. */
cv::split(sample_normalized_rgb, *input_channels);
//depth image is 1 channel
input_channels->push_back(sample_resized_depth);

}

int main(int argc, char** argv) {
// if (argc != 6) {
// std::cerr << "Usage: " << argv[0]
// << " deploy.prototxt network.caffemodel"
// << " mean.binaryproto img.jpg depth.png" << std::endl;
// return 1;
// }

::google::InitGoogleLogging(argv[0]);

// string model_file = argv[1];//deploy.prototxt
// string trained_file = argv[2];//caffemodel
// string mean_file = argv[3];//mean.prototxt
// string file = argv[4];
// string fileDepth = argv[5];//depth image

string model_file="segmentation/nyu40-sf1/deploy.prototxt";
string trained_file="caffemodels_iter_75000.caffemodel";
string mean_file="db/mean/mean.prototxt";
string file="NYU0015/fullres/NYU0015.jpg";
string fileDepth="NYU0015/fullres/NYU0015.png";

MyClassifier classifier(model_file, trained_file, mean_file);


std::cout << "---------- Prediction for "
        << file << " ----------" << std::endl;

cv::Mat img = cv::imread(file, -1);
cv::Mat imgDepth = cv::imread(fileDepth, -1);
CHECK(!img.empty()) << "Unable to decode image " << file;
cv::Mat result=classifier.Predict(img,imgDepth);

// std::cout<<result.type() <<std::endl;
// std::cout<<result.cols <<std::endl;
// std::cout<<result.rows <<std::endl;
}
#else
int main(int argc, char** argv) {
LOG(FATAL) << "This example requires OpenCV; compile with USE_OPENCV.";
}
#endif // USE_OPENCV
`
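
A possible explanation for the small values, offered as an assumption rather than a diagnosis: if the network's output blob holds one score or probability plane per class (C x H x W), then reading it as a single H x W float image only shows the first class plane. A label map would instead come from a per-pixel argmax over the class planes. The sketch below shows that step on a plain float buffer; `argmaxOverChannels` is a hypothetical helper, not part of Caffe or this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// scores: planar C x H x W floats, i.e. scores[(c * height + y) * width + x].
// Returns an H x W map holding, per pixel, the index of the highest-scoring class.
std::vector<int> argmaxOverChannels(const float* scores, int channels,
                                    int height, int width) {
    assert(scores != nullptr && channels > 0 && height > 0 && width > 0);
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<int> labels(plane, 0);
    for (std::size_t i = 0; i < plane; ++i) {
        float best = scores[i];  // class 0 is the initial candidate
        for (int c = 1; c < channels; ++c) {
            const float v = scores[c * plane + i];
            if (v > best) {
                best = v;
                labels[i] = c;
            }
        }
    }
    return labels;
}
```

With the code above, this would presumably be called as `argmaxOverChannels(output_layer->cpu_data(), output_layer->channels(), output_layer->height(), output_layer->width())`. Separately, Caffe's own cpp_classification example wraps the input blob as `CV_32FC1` planes and converts the image to float before subtracting the mean, whereas the listing above wraps the float buffers as `CV_8UC3` / `CV_8UC1` and pushes the depth image into the vector instead of writing it into the wrapped plane; those differences may also be worth checking.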

Problem about dims in training

When I train FuseNet, I get the following error:

F0302 14:54:53.849881  9560 blob.cpp:32] Check failed: shape[i] >= 0 (-1 vs. 0) 
*** Check failure stack trace: ***
    @     0x7f208aedae3d  google::LogMessage::Fail()
    @     0x7f208aedcbc0  google::LogMessage::SendToLog()
    @     0x7f208aedaa23  google::LogMessage::Flush()
    @     0x7f208aedd58e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f208b4b662b  caffe::Blob<>::Reshape()
    @     0x7f208b578673  caffe::SliceLayer<>::Reshape()
    @     0x7f208b5e8d96  caffe::Net<>::Init()
    @     0x7f208b5ea5e1  caffe::Net<>::Net()
    @     0x7f208b432c0a  caffe::Solver<>::InitTrainNet()
I0302 14:54:53.852614  9567 db_lmdb.hpp:53] shuffle data, start epoch 3
    @     0x7f208b434017  caffe::Solver<>::Init()
    @     0x7f208b4343ba  caffe::Solver<>::Solver()
    @     0x7f208b4a1e83  caffe::Creator_SGDSolver<>()
    @           0x40b978  train()
    @           0x408618  main
    @     0x7f208992d830  (unknown)
    @           0x408d89  _start
Aborted (core dumped)

I suspect the problem is the dimensions of the net, so I printed the blob sizes:

top size : 3
/home/jason/Jason/Code/Semantic_Segmentation/fusenet/src/caffe/layers/slice_layer.cpp, 56
2, 3, 480, 640
/home/jason/Jason/Code/Semantic_Segmentation/fusenet/src/caffe/layers/slice_layer.cpp, 56
2, 1, 480, 640
/home/jason/Jason/Code/Semantic_Segmentation/fusenet/src/caffe/layers/slice_layer.cpp, 56
2, -1, 480, 640

So I think I may have organized the data incorrectly. I used the following list file to build the LMDB dataset, ordered as color images, then depth images, then label images, with the default convert_imageset tool that ships with Caffe:

00002_colors.png
00003_colors.png
...
00002_depth.png
00003_depth.png
...
00002_label.png
00003_label.png

Also, since the label image and the depth image are both single-channel, how can the net tell them apart?
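
The printed shapes (3, 1 and -1 channels) can be reproduced with simple arithmetic, assuming the Slice layer cuts the channel axis at points 3 and 4. Those slice points are inferred from the printed sizes, not read from the prototxt. Under that assumption each top blob spans the channels between consecutive slice points and the last one takes whatever remains, so channels are distinguished purely by position: 0-2 color, 3 depth, 4 label. `sliceSizes` is a hypothetical helper written for illustration.

```cpp
#include <cassert>
#include <vector>

// Channel counts produced by slicing `channels` at the given slice points:
// each output spans [previous point, next point), and the last output takes
// the remainder. Sizes are deliberately not validated, so a negative
// remainder shows up in the result instead of aborting.
std::vector<int> sliceSizes(int channels, const std::vector<int>& slicePoints) {
    assert(channels > 0);
    std::vector<int> sizes;
    int prev = 0;
    for (int point : slicePoints) {
        sizes.push_back(point - prev);
        prev = point;
    }
    sizes.push_back(channels - prev);
    return sizes;
}
```

With a 3-channel record, `sliceSizes(3, {3, 4})` gives 3, 1, -1, which matches the log; with a 5-channel record it gives 3, 1, 1. That would be consistent with each LMDB entry holding a single color image rather than color, depth and label stacked into one 5-channel record, which is what a plain list of separate files fed to convert_imageset would produce.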

In the meantime, I do not clearly understand how to use weighted cross-entropy. Can you help me?

Thanks all the same.
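
As a rough illustration of what a weighted cross-entropy computes, the sketch below uses median-frequency balancing, one common way to choose class weights for segmentation. Whether this repository uses exactly this scheme, and how the weights are passed to its loss layer, is not stated in this thread, so treat both as assumptions. The frequency here is simplified to the fraction of all pixels belonging to each class, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median-frequency balancing: weight_c = median(freq) / freq_c,
// so rare classes get weights above 1 and frequent classes below 1.
std::vector<double> medianFrequencyWeights(const std::vector<double>& freq) {
    assert(!freq.empty());
    std::vector<double> sorted(freq);
    std::sort(sorted.begin(), sorted.end());
    const std::size_t n = sorted.size();
    const double median = (n % 2 == 1)
        ? sorted[n / 2]
        : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    std::vector<double> weights(n);
    for (std::size_t c = 0; c < n; ++c) {
        assert(freq[c] > 0.0);
        weights[c] = median / freq[c];
    }
    return weights;
}

// Weighted cross-entropy for one pixel: -w[label] * log(prob[label]).
double weightedCrossEntropy(const std::vector<double>& prob, int label,
                            const std::vector<double>& weights) {
    assert(label >= 0 && static_cast<std::size_t>(label) < prob.size());
    assert(prob.size() == weights.size() && prob[label] > 0.0);
    return -weights[label] * std::log(prob[label]);
}
```

The per-pixel losses would then be averaged over the image or batch; the normalization used for that average is another detail this sketch leaves open.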

Problem about LMDB with C++

Hello,
I am not sure whether this is the right way to create an LMDB file to train FuseNet.
I wrote a program that converts images to LMDB using C++ and OpenCV, with the NYU subset of the SUNRGBD dataset. First I load an RGB image into a 3-channel cv::Mat, load the corresponding depth file into a 1-channel cv::Mat (char), and create a 1-channel label cv::Mat. All the cv::Mat objects have the same size. Then I use cv::merge to combine the three into a new 5-channel cv::Mat. Finally, I convert the new cv::Mat to a Datum and write it to the LMDB file.
If that is the right way, I face another problem: how do I find the right values to fill the label cv::Mat? My approach is to parse the index.json file that accompanies each RGB image and get an array of polygons, then use the x and y values of each polygon to create 2D points. For each polygon I fill its area with drawContours (an OpenCV function) in a mask Mat created for that polygon. The fill value is the index of the corresponding label in classes.txt (I use the 40 classes). Finally I add all the masks into one mask, which becomes the label cv::Mat for the image. Is that right?
I look forward to your response.
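
One detail that is easy to get wrong in this pipeline is memory layout. A merged multi-channel cv::Mat stores pixels interleaved (H x W x C), while Caffe's Datum, as far as I understand its cv::Mat conversion helper, expects planar C x H x W bytes. The sketch below shows only that reordering on a plain byte buffer; `interleavedToPlanar` is a hypothetical name, and the channel order (B, G, R, depth, label) is an assumption taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved H x W x C byte image into planar C x H x W order:
// source index (y * W + x) * C + c maps to destination index (c * H + y) * W + x.
std::vector<std::uint8_t> interleavedToPlanar(const std::vector<std::uint8_t>& hwc,
                                              int height, int width, int channels) {
    assert(height > 0 && width > 0 && channels > 0);
    assert(hwc.size() == static_cast<std::size_t>(height) * width * channels);
    std::vector<std::uint8_t> chw(hwc.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < channels; ++c) {
                const std::size_t src =
                    (static_cast<std::size_t>(y) * width + x) * channels + c;
                const std::size_t dst =
                    (static_cast<std::size_t>(c) * height + y) * width + x;
                chw[dst] = hwc[src];
            }
        }
    }
    return chw;
}
```

Regarding the label mask, summing per-polygon masks would produce invalid class ids wherever polygons overlap; drawing each polygon into a single mask so that later polygons overwrite earlier ones avoids that, though which polygon should win is a dataset-specific decision.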

where is the code in script?

@SummerIcequeen @hazirbas
I want to try your code, but I can't find the scripts referred to in the README: /fusenet/scripts/save_lmdb.py and ./fusenet/scripts/test_segmentation.py. Will you release them, please?

about caffemodel

Hi,
I found download_model_from_gist.sh in the "script" folder, but I could not find the gist_id.
Have the caffemodels been shared? If not, where can I find the prototxt that describes the hyper-parameters?
Thanks.

get the lower performance

Can you release your evaluation code?
I tried to train the network, but I got lower IoU and accuracy.
Thanks
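
Until official evaluation code is available, differences in how metrics are computed can themselves explain part of a gap. The sketch below computes global accuracy, mean class accuracy and mean IoU from a confusion matrix, using the usual definitions. It is not the authors' evaluation script, and choices such as skipping classes absent from the ground truth or ignoring unlabeled pixels are assumptions that can change the numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Metrics {
    double globalAcc;
    double meanAcc;
    double meanIoU;
};

// conf[t][p] = number of pixels with ground-truth class t predicted as class p.
Metrics evaluate(const std::vector<std::vector<long>>& conf) {
    const std::size_t n = conf.size();
    assert(n > 0);
    for (const auto& row : conf) assert(row.size() == n);

    long total = 0;
    long correct = 0;
    double accSum = 0.0;
    double iouSum = 0.0;
    std::size_t present = 0;  // classes that occur in the ground truth
    for (std::size_t c = 0; c < n; ++c) {
        long rowSum = 0;  // TP + FN
        long colSum = 0;  // TP + FP
        for (std::size_t k = 0; k < n; ++k) {
            rowSum += conf[c][k];
            colSum += conf[k][c];
        }
        total += rowSum;
        correct += conf[c][c];
        if (rowSum == 0) continue;  // class absent from ground truth: skipped
        ++present;
        accSum += static_cast<double>(conf[c][c]) / rowSum;
        iouSum += static_cast<double>(conf[c][c]) / (rowSum + colSum - conf[c][c]);
    }
    assert(total > 0 && present > 0);
    return {static_cast<double>(correct) / total, accSum / present, iouSum / present};
}
```

The confusion matrix should be accumulated over the whole test set before calling `evaluate`; averaging per-image IoU values instead generally gives a different result.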

How to process depth information ?

Hello,
I want to know how the pre-trained model for the depth channel is generated. Do the RGB channel and the depth channel use different pre-trained models? Thanks.
