thusiyuan / cooperative_scene_parsing

Code for NeurIPS 2018: Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

Home Page: http://siyuanhuang.com/cooperative_parsing/main.html

License: MIT License

Python 45.30% MATLAB 52.09% C 2.13% M 0.07% Forth 0.05% Shell 0.36%

cooperative_scene_parsing's Issues

download link repeated

Hi, thank you for the great work! It seems that two download links for the preprocessed SUNRGBD dataset and GT in this github repo are the same. What's the correct download link for the dataset?

missing visualization functions

Hey, could you also provide the code for the visualization functions show_3dpointcloud, show_3dpointcloud_aligned and show_2dcorner? It would be helpful for debugging purposes. Thank you!

Unable to download preprocess dataset

I followed the instructions in the README and tried to download the preprocessed SUNRGBD dataset from this link: https://drive.google.com/file/d/1QUbq7fRtJtBPkSJbIsZOTwYR5MwtZuiV/view, but it seems that the file is no longer available. Do you know why?

3Dlayout and updated Rtilt

Thank you for sharing the codebase.
I am a bit confused about how to convert the 16-dimensional layout parsed from the SUNRGBD index.json (e.g., as used in evaluation/vis) to your 3Dlayout cuboid in mat format. Would you be able to point me to the code that performs this transformation?
Also, is the updated Rtilt variable related to this transformation? Could you clarify, and possibly share pointers to the code that outputs this value?

Regards.

Test on images outside SUNRGBD dataset

Hey, I tested the model and it works well on the SUNRGBD dataset! Thank you for sharing it. Could you please give any hints about how to apply the model to images outside the SUNRGBD dataset? Should I generate a pickle file following sunrgbd_process.py? Do we need any input other than the RGB image, e.g., the camera intrinsics?

KeyError: 'seg2d'

Thanks for your great work!
I get the following error when I run sunrgbd/sunrgbd_process.py in step 4 of Data.
Traceback (most recent call last):
File "preprocess/sunrgbd/sunrgbd_process.py", line 666, in <module>
main()
File "preprocess/sunrgbd/sunrgbd_process.py", line 660, in main
prepare_data(False, shift=False)
File "preprocess/sunrgbd/sunrgbd_process.py", line 75, in prepare_data
sequence = readsunrgbdframe(image_id=i+1)
File "/home/data4t/wyf/cooperative_scene_parsing/preprocess/sunrgbd/sunrgbd_parser.py", line 136, in readsunrgbdframe
data_frame = SUNRGBDData(img_info['K'], img_info['R_ex'], img_info['R_tilt'], img_info['bdb2d'], img_info['bdb3d'], img_info['gt3dcorner'], img_info['imgdepth'], img_info['imgrgb'], img_info['seg2d'], img_info['sequence_name'], image_id, scene_category)
KeyError: 'seg2d'

Then I print the keys of img_info and get the result as follows.
['R_ex', 'seg2d_path', 'sensor', 'sequence_name', 'imgrgb_path', 'gt3dcorner', 'R_tilt', 'bdb2d', 'bdb3d', 'K', 'imgdepth_path']
It seems that the dict object img_info doesn't have a key called "seg2d", even though I completed the previous three steps according to the instructions.
Could you please tell me what I should do to solve this problem? Thank you very much!
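
The printed keys suggest that the pickle stores `seg2d_path` rather than `seg2d`. A minimal sketch for reporting such mismatches before the frame object is built; the list of expected keys is copied from the traceback above, and `missing_keys` is a hypothetical helper, not part of the repo:

```python
# Keys that readsunrgbdframe reads from img_info, as listed in the traceback.
EXPECTED_KEYS = ['K', 'R_ex', 'R_tilt', 'bdb2d', 'bdb3d', 'gt3dcorner',
                 'imgdepth', 'imgrgb', 'seg2d', 'sequence_name']


def missing_keys(img_info, expected=EXPECTED_KEYS):
    """Map each expected key absent from img_info to its '<key>_path'
    variant if that one is present, else to None."""
    report = {}
    for key in expected:
        if key not in img_info:
            alt = key + '_path'
            report[key] = alt if alt in img_info else None
    return report


# Keys printed in the issue (values do not matter for this check).
found = ['R_ex', 'seg2d_path', 'sensor', 'sequence_name', 'imgrgb_path',
         'gt3dcorner', 'R_tilt', 'bdb2d', 'bdb3d', 'K', 'imgdepth_path']
print(missing_keys(dict.fromkeys(found)))
```

For the keys in this issue, the report shows that `imgdepth`, `imgrgb`, and `seg2d` exist only as `*_path` entries, which may indicate that the pickle comes from a different preprocessing stage than the one the parser expects.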

hard-coded path in the pickle file

You have some hard-coded paths in the pickle files, e.g., in this line: https://github.com/thusiyuan/cooperative_scene_parsing/blob/master/preprocess/sunrgbd/sunrgbd_parser.py#L116

img_info['imgrgb_path']
'/home/siyuan/Documents/Dataset/SUNRGBD_ALL/SUNRGBD/kv2/kinect2data/000002_2014-05-26_14-23-37_260595134347_rgbf000103-resize/image/0000103.jpg'

img_info['imgdepth_path']
'/home/siyuan/Documents/Dataset/SUNRGBD_ALL/SUNRGBD/kv2/kinect2data/000002_2014-05-26_14-23-37_260595134347_rgbf000103-resize/depth/0000103.png'

Could you add your code to generate the pickle files as well? Also, it seems that your preprocessed data contains no depth images; should I download the original SUNRGBD dataset as well?
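
Until the pickles are regenerated, one possible workaround is to rewrite the path prefix after loading. The sketch below assumes that the `*_path` entries are plain strings sharing a common root; `remap_root`, `remap_img_info`, and the new root `/data/SUNRGBD_ALL` are hypothetical names chosen for illustration:

```python
def remap_root(path, old_root, new_root):
    """Replace a leading old_root in path with new_root; other paths pass through."""
    old_root = old_root.rstrip('/')
    if path.startswith(old_root + '/'):
        return new_root.rstrip('/') + path[len(old_root):]
    return path


def remap_img_info(img_info, old_root, new_root):
    """Return a copy of img_info with every string-valued '*_path' entry remapped."""
    return {k: remap_root(v, old_root, new_root)
            if k.endswith('_path') and isinstance(v, str) else v
            for k, v in img_info.items()}


OLD_ROOT = '/home/siyuan/Documents/Dataset/SUNRGBD_ALL'
info = {
    'imgrgb_path': OLD_ROOT + '/SUNRGBD/kv2/kinect2data/'
                   '000002_2014-05-26_14-23-37_260595134347_rgbf000103-resize'
                   '/image/0000103.jpg',
    'K': None,  # non-path entries are left untouched
}
new_info = remap_img_info(info, OLD_ROOT, '/data/SUNRGBD_ALL')
print(new_info['imgrgb_path'])
```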

Any plan to update to python3?

It is 2019 now, and Python 2 will not be maintained after 2020.
It would be great to have a Python 3 version of the code.
Best wishes!!

Error: [Errno 2] No such file or directory: 'metadata/sunrgbd/train.json'

First, thx for your wonderful work.

I couldn't find the train.json file when running the code. The error was reported as follows:

Traceback (most recent call last):
File "train.py", line 89, in <module>
train_loader = sunrgbd_train_loader(opt)
File "/home/yd/cooperative/data/sunrgbd.py", line 134, in sunrgbd_train_loader
return DataLoader(dataset=SUNRGBDDataset(op.join(opt.metadataPath, opt.dataset, 'train.json'), random_flip=True, random_shift=False),
File "/home/yd/cooperative/data/sunrgbd.py", line 34, in __init__
with open(list_file, 'r') as f:
IOError: [Errno 2] No such file or directory: 'metadata/sunrgbd/train.json'

How do I get a train.json file?

Thanks in advance.
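
A small pre-flight check can make this kind of failure easier to diagnose before training starts. The sketch below only tests for file existence; `find_missing` is a hypothetical helper, and the two file names are the ones that appear in error messages on this page, not a complete list of what the code needs:

```python
import os.path as op


def find_missing(metadata_path, dataset, filenames):
    """Return the paths under metadata_path/dataset that are not regular files."""
    paths = [op.join(metadata_path, dataset, name) for name in filenames]
    return [p for p in paths if not op.isfile(p)]


# Paths are relative to the current working directory, as in the traceback.
missing = find_missing('metadata', 'sunrgbd',
                       ['train.json', 'size_avg_category.pickle'])
for path in missing:
    print('missing:', path)
```

Because the paths in the traceback are relative, running the script from a directory other than the repository root would produce the same error even when the files exist.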

FileNotFoundError: 'metadata/sunrgbd/size_avg_category.pickle'

While exploring this project I ran into the following error and could not resolve it:

FileNotFoundError: [Errno 2] No such file or directory: 'metadata/sunrgbd/size_avg_category.pickle'

Full Traceback message is given below:

Traceback (most recent call last):
File "test.py", line 53, in <module>
bins_tensor = to_dict_tensor(dataset_config.bins(), if_cuda=opt.cuda)
File "cooperative_scene_parsing-master/config.py", line 74, in bins
avg_size = pickle.load(open(template_path, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: 'metadata/sunrgbd/size_avg_category.pickle'

Any help to solve above issues is highly appreciated.
Thank You
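
Two things stand out in this traceback. `FileNotFoundError` exists only in Python 3, so the script is being run with Python 3, and the quoted line opens the pickle in text mode (`'r'`), which `pickle.load` rejects on Python 3 even once the file is present. A hedged sketch of a loader that handles both cases; `load_pickle` is a hypothetical helper, and the `latin1` fallback assumes the file was written by Python 2:

```python
import os
import pickle


def load_pickle(path):
    """Load a pickle in binary mode, with a clearer error when the file is absent."""
    if not os.path.isfile(path):
        raise IOError('%s not found; check that the preprocessed metadata '
                      'was downloaded and extracted' % path)
    with open(path, 'rb') as f:  # binary mode is required on Python 3
        try:
            return pickle.load(f)
        except UnicodeDecodeError:
            # Assumption: the pickle was written by Python 2 and contains
            # byte strings; retry with a permissive encoding.
            f.seek(0)
            return pickle.load(f, encoding='latin1')
```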

What's the source of seg2d (input of 'process_msk' in data processing)?

Hi Siyuan:

Thanks for sharing the code of your work!

I noticed that the 2D detector and the data cleaning code for generating the pickle files are not included in this repo. I read into the processing code for the cleaned data to find that the function 'process_msk' utilizes the 2dbbox from the detector (is it so?) and semantic segmentation GT to get reasonable masks from the candidates.

However, the candidates are drawn from polygon input 'seg2d' whose source is not known. Is it from the SUNRGBD dataset? Or is it from the output of the 2D detector?

Does this project include a 2D detector?

If not, are you going to open source your 2D detector code?

local variable 'corner_loss' referenced before assignment

Hi, Siyuan
When I train the bdbnet from scratch with
sh scripts/sunrgbd_train_bdbnet.sh, it reports: local variable 'corner_loss' referenced before assignment.
Thank you!
Yongjie
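
The training code is not quoted here, so the cause in this repo is unknown. In general, this message appears when a variable is assigned only inside a conditional branch and then read unconditionally. An illustrative sketch; the function names and the `use_corner_loss` flag are hypothetical and not taken from the repo:

```python
def total_loss_buggy(size_loss, corners, use_corner_loss):
    if use_corner_loss:
        corner_loss = sum(abs(c) for c in corners)
    # corner_loss is unbound here whenever use_corner_loss is False,
    # which raises UnboundLocalError.
    return size_loss + corner_loss


def total_loss_fixed(size_loss, corners, use_corner_loss):
    corner_loss = 0.0  # defined on every code path
    if use_corner_loss:
        corner_loss = sum(abs(c) for c in corners)
    return size_loss + corner_loss
```

Initializing the variable before the branch, as in the second function, is one common way to avoid the error.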

Are ground-truth files in metadata preprocessed

Hi!

Most of the code uses mat/pickle files present in the metadata/sunrgbd/ folder as ground truth. A few examples of the files being used are:

metadata/sunrgbd/Dataset/data_clean/data_all/ in sunrgbd_parser.py
metadata/sunrgbd/2dbdb/ in sunrgbd_process.py
metadata/sunrgbd/3dlayout/ in sunrgbd_process.py

Are they simply a reorganization of the ground truth data (SUNRGBDMeta2DBB_v2.mat, SUNRGBDMeta3DBB_v2.mat, etc.) provided by the SUNRGBD dataset, or have you performed any additional processing?

Thanks,
Shubham

Preprocessed data for SUNCG

Hey, thank you for your support on this code so far! Do you mind sharing the SUNCG preprocessed data or the code to preprocess the dataset? It will be super helpful if you can release it in the same way as SUNRGBD preprocessed data. Thanks!

The link to the SUNRGBD raw data is broken

The link to the SUNRGBD raw data is broken. Was the raw data downloaded from http://rgbd.cs.princeton.edu/challenge.html?

Thank you for the great work you have done!

Image flipping

Hello Siyuan,

First of all, thanks so much for your work. I learned a lot from reading your paper and code.

My understanding is that each 3D bounding box is parameterized by 3 basis vectors, 3 coefficients, and a 3D centroid. These parameters define the 3D bounding box in the world coordinate system. The extrinsic camera matrix R is the transformation from the world coordinate system to the camera coordinate system, and therefore from p_homo = K * R * P, we can recover 2D image coordinates p_homo from the bounding box corner P in the world space.

If my understanding is correct, when we perform image flipping in dataset preprocessing, we have to flip the 3D bounding box labels in the camera coordinate system, instead of the world coordinate system. However, at this line and this line, it appears to me that you are doing it in the world coordinate system directly.

This sometimes leads to errors. From my observation, changing the logic to the following can reduce such errors:

        # read camera parameters
        K = self.meta['K'][idx]
        R = self.meta['R'][idx]
        yaw, pitch, roll = yaw_pitch_row_from_r(R)
        if flip:
            R_old = R
            R = get_rotation_matrix_from_yaw_pitch_roll(-yaw, pitch, roll)
        else:
            R = get_rotation_matrix_from_yaw_pitch_roll(yaw, pitch, roll)
        # read 3D bounding boxes
        num_boxes = len(self.meta['boxes'][idx])
        raw_basis = np.array([self.meta['boxes'][idx][i]['basis'] for i in range(num_boxes)])
        raw_coeffs = np.array([self.meta['boxes'][idx][i]['coeffs'] for i in range(num_boxes)])
        raw_centroid = np.array([self.meta['boxes'][idx][i]['centroid'] for i in range(num_boxes)])
        if flip:
            for i in range(num_boxes):
                # get 3D corners in the world space
                corners3d = get_corners_of_bb3d_no_index(raw_basis[i],
                                                         raw_coeffs[i],
                                                         raw_centroid[i])
                # get 3D corners in the camera space
                corners3d = np.matmul(R_old, corners3d.transpose()).transpose()
                # flip x axis
                corners3d[:, 0] = -corners3d[:, 0]
                # get 3D corners back in world space
                corners3d = np.matmul(R.transpose(), corners3d.transpose()).transpose()
                # extract centroid, basis, and coeffs from 3D corners
                raw_centroid[i] = corners3d.mean(axis=0)
                b0_with_scale = (corners3d[1] - corners3d[0]) / 2
                c0 = np.linalg.norm(b0_with_scale)
                b0 = b0_with_scale / c0
                b1_with_scale = (corners3d[1] - corners3d[2]) / 2
                c1 = np.linalg.norm(b1_with_scale)
                b1 = b1_with_scale / c1
                b2_with_scale = (corners3d[1] - corners3d[5]) / 2
                c2 = np.linalg.norm(b2_with_scale)
                b2 = b2_with_scale / c2
                raw_basis[i, 0] = -b0 # flip basis 0
                raw_basis[i, 1] = b1
                # keep b2 as [0, -1, 0] to avoid numerical issues
                raw_coeffs[i] = [-c0, c1, c2]

Looking forward to discussing this with you!
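
The argument in this issue can be checked numerically with a toy setup. The sketch below uses the relation p_homo = K * R * P from the text and compares flipping points in camera space against negating the world x coordinate directly. The intrinsics, rotation, and points are made up for illustration, the helper names are hypothetical, and the repo's actual axis conventions may differ. It also assumes that a horizontal image flip maps the pixel column u to 2*cx - u, which holds when the principal point sits at the image center:

```python
import numpy as np

FLIP_X = np.diag([-1.0, 1.0, 1.0])  # mirrors the x axis


def project(K, R, pts_world):
    """Project (N, 3) world points to (N, 2) pixels via p_homo = K * R * P."""
    cam = pts_world @ R.T
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:3]


def flip_via_camera(pts_world, R_old, R_new):
    """World -> camera with R_old, negate camera x, camera -> world with R_new."""
    cam_flipped = (pts_world @ R_old.T) @ FLIP_X
    return cam_flipped @ R_new  # row-vector form of R_new^T applied to each point


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Made-up camera and points, for illustration only.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = rot_x(0.2) @ rot_y(0.3)
pts = np.array([[0.5, 0.2, 4.0], [-1.0, 0.3, 5.0]])

orig = project(K, R, pts)
mirrored = orig * [-1.0, 1.0] + [2 * K[0, 2], 0.0]  # u -> 2*cx - u, v unchanged

via_camera = project(K, R, flip_via_camera(pts, R, R))
naive_world = project(K, R, pts @ FLIP_X)

print('camera-space flip matches mirrored pixels:', np.allclose(via_camera, mirrored))
print('world-space flip matches mirrored pixels: ', np.allclose(naive_world, mirrored))
```

In this toy setup the camera-space flip reproduces the mirrored pixels for any choice of `R_new`, while the world-space flip does so only when R commutes with the x mirror, for example when R is the identity.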

Pretrained weights link is broken

Please check the link to pretrained weights

unable to unzip 'preprocessed ground truth of SUNRGBD dataset'

First, thx for your wonderful work.

When I try to unzip the 'preprocessed ground truth of SUNRGBD dataset' from this link https://drive.google.com/file/d/1QUbq7fRtJtBPkSJbIsZOTwYR5MwtZuiV/view as mentioned on the main page, it outputs:

checkdir error: 2dbdb exists but is not directory
unable to process 2dbdb/image/3413.png.
checkdir error: 2dbdb exists but is not directory
unable to process 2dbdb/image/3411.png.
checkdir error: 2dbdb exists but is not directory
unable to process 2dbdb/image/3398.png.

Do you know why this happens? I am not the only one who has met this problem. See the last comment in #4.

Thanks in advance.
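
The `checkdir error` message means that something named `2dbdb` already exists at the extraction target and is not a directory. One possible cause is a leftover file from an earlier, partial extraction. A sketch that extracts into a fresh, empty directory using the standard `zipfile` module; `extract_fresh` is a hypothetical helper, and it will not help if the archive itself contains a conflicting `2dbdb` file entry:

```python
import os
import zipfile


def extract_fresh(zip_path, dest_dir):
    """Extract zip_path into dest_dir, refusing to reuse a non-empty target."""
    if os.path.exists(dest_dir):
        if not os.path.isdir(dest_dir) or os.listdir(dest_dir):
            raise ValueError('%s exists and is not an empty directory' % dest_dir)
    else:
        os.makedirs(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))
```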

Pretrained weights link not working

Please update the Google Drive weights link