thusiyuan / cooperative_scene_parsing

Code for NeurIPS 2018: Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

Home Page: http://siyuanhuang.com/cooperative_parsing/main.html

License: MIT License

Python 45.30% MATLAB 52.09% C 2.13% M 0.07% Forth 0.05% Shell 0.36%

cooperative_scene_parsing's People

Contributors

thusiyuan

cooperative_scene_parsing's Issues

download link repeated

Hi, thank you for the great work! It seems that the two download links for the preprocessed SUNRGBD dataset and the ground truth in this GitHub repo are the same. What is the correct download link for the dataset?

3Dlayout and updated Rtilt

Thank you for sharing the codebase.
I am a bit confused about how to convert the 16-dimensional layout parsed from the SUNRGBD index.json (e.g., as used in evaluation/vis) into your 3Dlayout cuboid in .mat format. Could you point me to the code that performs this transformation?
Also, is the updated Rtilt variable related to this transformation? Could you clarify, and possibly share pointers to the code that outputs this value?

Regards.

Test on images outside SUNRGBD dataset

Hey, I tested the model and it works well on the SUNRGBD dataset! Thank you for sharing it. Could you give any hints on how to apply the model to images outside the SUNRGBD dataset? Should I generate a pickle file following sunrgbd_process.py? Do we need any input other than the RGB image, e.g., the camera intrinsics?
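
For reference, here is a minimal sketch of what such an input record could look like, mirroring the img_info fields that appear in other issues below; the exact schema should be checked against preprocess/sunrgbd/sunrgbd_process.py, and the intrinsics values are placeholders:

    import pickle
    import numpy as np

    # Hypothetical minimal record for an image outside SUNRGBD; the field
    # names mirror the img_info keys seen elsewhere in this repo, and the
    # intrinsics below are placeholders -- substitute your calibration.
    img_info = {
        'imgrgb_path': '/path/to/my_image.jpg',
        'K': np.array([[529.5, 0.0, 365.0],
                       [0.0, 529.5, 265.0],
                       [0.0, 0.0, 1.0]]),
    }
    with open('my_image_info.pickle', 'wb') as f:
        pickle.dump(img_info, f)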

KeyError: 'seg2d'

Thanks for your great work!
I get the following error when I run sunrgbd/sunrgbd_process.py in step 4 of the Data instructions.
Traceback (most recent call last):
  File "preprocess/sunrgbd/sunrgbd_process.py", line 666, in <module>
    main()
  File "preprocess/sunrgbd/sunrgbd_process.py", line 660, in main
    prepare_data(False, shift=False)
  File "preprocess/sunrgbd/sunrgbd_process.py", line 75, in prepare_data
    sequence = readsunrgbdframe(image_id=i+1)
  File "/home/data4t/wyf/cooperative_scene_parsing/preprocess/sunrgbd/sunrgbd_parser.py", line 136, in readsunrgbdframe
    data_frame = SUNRGBDData(img_info['K'], img_info['R_ex'], img_info['R_tilt'], img_info['bdb2d'], img_info['bdb3d'], img_info['gt3dcorner'], img_info['imgdepth'], img_info['imgrgb'], img_info['seg2d'], img_info['sequence_name'], image_id, scene_category)
KeyError: 'seg2d'

Then I print the keys of img_info and get the result as follows.
['R_ex', 'seg2d_path', 'sensor', 'sequence_name', 'imgrgb_path', 'gt3dcorner', 'R_tilt', 'bdb2d', 'bdb3d', 'K', 'imgdepth_path']
It seems that the dict img_info doesn't have a key called "seg2d", but I have completed the previous three steps according to the instructions.
Could you please tell me how to solve this problem? Thank you very much!
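
Update: as a possible workaround (a sketch only), one could fall back to loading the file behind seg2d_path when the 'seg2d' key is missing. This assumes the path points to a MATLAB file; adjust the loader to the actual format used by sunrgbd_parser.py:

    import scipy.io as sio

    # Hypothetical fallback in readsunrgbdframe: tolerate pickles that
    # store 'seg2d_path' instead of 'seg2d' by loading the file on the fly.
    if 'seg2d' not in img_info and 'seg2d_path' in img_info:
        # assumes a .mat file; check the real format before relying on this
        img_info['seg2d'] = sio.loadmat(img_info['seg2d_path'])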

hard-coded path in the pickle file

You have some hard-coded paths in the pickle files, e.g., in this line: https://github.com/thusiyuan/cooperative_scene_parsing/blob/master/preprocess/sunrgbd/sunrgbd_parser.py#L116

img_info['imgrgb_path']
'/home/siyuan/Documents/Dataset/SUNRGBD_ALL/SUNRGBD/kv2/kinect2data/000002_2014-05-26_14-23-37_260595134347_rgbf000103-resize/image/0000103.jpg'

img_info['imgdepth_path']
'/home/siyuan/Documents/Dataset/SUNRGBD_ALL/SUNRGBD/kv2/kinect2data/000002_2014-05-26_14-23-37_260595134347_rgbf000103-resize/depth/0000103.png'

Could you add your code for generating the pickle files as well? Also, it seems there are no depth images in your preprocessed data; should I download the original SUNRGBD dataset as well?
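
In the meantime, a minimal sketch for rewriting the hard-coded prefix after loading a pickle (the file name here is hypothetical):

    import pickle

    OLD_ROOT = '/home/siyuan/Documents/Dataset/SUNRGBD_ALL/'
    NEW_ROOT = '/path/to/your/SUNRGBD_ALL/'  # your local dataset root

    # 'sample.pickle' stands in for one of the preprocessed files.
    with open('sample.pickle', 'rb') as f:
        img_info = pickle.load(f)

    for key in ('imgrgb_path', 'imgdepth_path', 'seg2d_path'):
        if key in img_info:
            img_info[key] = img_info[key].replace(OLD_ROOT, NEW_ROOT)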

Any plan to update to python3?

It is 2019 now, and Python 2 will not be maintained after 2020.
It would be great to have a Python 3 version of the code.
Best wishes!

Error: [Errno 2] No such file or directory: 'metadata/sunrgbd/train.json'

First, thanks for your wonderful work.

I couldn't find the train.json file when running the code. The error is reported as follows:

Traceback (most recent call last):
  File "train.py", line 89, in <module>
    train_loader = sunrgbd_train_loader(opt)
  File "/home/yd/cooperative/data/sunrgbd.py", line 134, in sunrgbd_train_loader
    return DataLoader(dataset=SUNRGBDDataset(op.join(opt.metadataPath, opt.dataset, 'train.json'), random_flip=True, random_shift=False),
  File "/home/yd/cooperative/data/sunrgbd.py", line 34, in __init__
    with open(list_file, 'r') as f:
IOError: [Errno 2] No such file or directory: 'metadata/sunrgbd/train.json'

How do I get a train.json file?
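
In case it helps, a sketch of how the file could be regenerated, under the assumption that train.json is simply a JSON list of per-sample paths; please verify the actual schema in data/sunrgbd.py:

    import glob
    import json

    # Assumption: train.json is a flat JSON list of per-sample file paths.
    # The directory below is the preprocessed-data folder mentioned in
    # other issues on this repo.
    samples = sorted(glob.glob('metadata/sunrgbd/Dataset/data_clean/data_all/*'))
    with open('metadata/sunrgbd/train.json', 'w') as f:
        json.dump(samples, f)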

Thanks in advance.

FileNotFoundError: 'metadata/sunrgbd/size_avg_category.pickle'

While exploring this project, I ran into the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'metadata/sunrgbd/size_avg_category.pickle' 

Full Traceback message is given below:

Traceback (most recent call last):
  File "test.py", line 53, in <module>
    bins_tensor = to_dict_tensor(dataset_config.bins(), if_cuda=opt.cuda)
  File "cooperative_scene_parsing-master/config.py", line 74, in bins
    avg_size = pickle.load(open(template_path, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: 'metadata/sunrgbd/size_avg_category.pickle'
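
My guess, stated as an assumption only (check how config.py consumes avg_size), is that this pickle stores the mean 3D box coefficients per object category, in which case it could be rebuilt roughly like this:

    import pickle
    from collections import defaultdict
    import numpy as np

    # Assumed schema: {category_name: mean 'coeffs' over all GT boxes}.
    # 'annotations' is a placeholder for an iterable of per-image ground
    # truth dicts whose 'boxes' carry 'classname' and 'coeffs' fields.
    sizes = defaultdict(list)
    for anno in annotations:
        for box in anno['boxes']:
            sizes[box['classname']].append(box['coeffs'])
    avg_size = {c: np.mean(np.stack(v), axis=0) for c, v in sizes.items()}

    with open('metadata/sunrgbd/size_avg_category.pickle', 'wb') as f:
        pickle.dump(avg_size, f, protocol=2)  # protocol 2 stays Python 2 readable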

Any help solving the above issue is highly appreciated.
Thank you!

What's the source of seg2d (input of 'process_msk' in data processing)?

Hi Siyuan:

Thanks for sharing the code of your work!

I noticed that the 2D detector and the data cleaning code for generating the pickle files are not included in this repo. I read through the processing code for the cleaned data and found that the function 'process_msk' uses the 2D bounding boxes from the detector (is that right?) together with the semantic segmentation GT to get reasonable masks from the candidates.

However, the candidates are drawn from the polygon input 'seg2d', whose source is not known. Is it from the SUNRGBD dataset, or is it the output of the 2D detector?
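
For illustration only (the repo's actual logic lives in process_msk), this is how a polygon such as 'seg2d' could be rasterized into a binary candidate mask:

    import numpy as np
    from PIL import Image, ImageDraw

    def polygon_to_mask(polygon_xy, width, height):
        """Rasterize a polygon, given as (x, y) vertices in image
        coordinates, into a boolean mask of shape (height, width)."""
        canvas = Image.new('L', (width, height), 0)
        ImageDraw.Draw(canvas).polygon(polygon_xy, outline=1, fill=1)
        return np.array(canvas, dtype=bool)

    mask = polygon_to_mask([(10, 10), (60, 15), (50, 70)], width=100, height=100)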

Are the ground-truth files in metadata preprocessed?

Hi!

Most of the code uses mat/pickle files present in the metadata/sunrgbd/ folder as ground truth. A few examples of the files being used are:

  • metadata/sunrgbd/Dataset/data_clean/data_all/ in sunrgbd_parser.py
  • metadata/sunrgbd/2dbdb/ in sunrgbd_process.py
  • metadata/sunrgbd/3dlayout/ in sunrgbd_process.py

Are they simply a reorganization of the ground truth data (SUNRGBDMeta2DBB_v2.mat, SUNRGBDMeta3DBB_v2.mat, etc.) provided by the SUNRGBD dataset, or have you performed any additional processing?
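
For comparison, a minimal sketch of loading the official annotations with scipy (field access follows the SUNRGBD toolbox; verify against your download):

    import scipy.io as sio

    # Load the official 3D box ground truth and inspect its top-level keys.
    meta = sio.loadmat('SUNRGBDMeta3DBB_v2.mat',
                       squeeze_me=True, struct_as_record=False)
    print(meta.keys())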

Thanks,
Shubham

Preprocessed data for SUNCG

Hey, thank you for your support of this code so far! Would you mind sharing the SUNCG preprocessed data or the code to preprocess that dataset? It would be super helpful if you could release it in the same way as the SUNRGBD preprocessed data. Thanks!

Image flipping

Hello Siyuan,

First of all, thanks so much for your work. I learned a lot from reading your paper and code.

My understanding is that each 3D bounding box is parameterized by 3 basis vectors, 3 coefficients, and a 3D centroid. These parameters define the 3D bounding box in the world coordinate system. The extrinsic camera matrix R is the transformation from the world coordinate system to the camera coordinate system, and therefore, from p_homo = K * R * P, we can recover the 2D image coordinates p_homo of a bounding box corner P in world space.
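
As a concrete numeric sketch of that projection (all values below are made up for illustration):

    import numpy as np

    K = np.array([[529.5, 0.0, 365.0],   # placeholder intrinsics
                  [0.0, 529.5, 265.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)                        # world-to-camera rotation
    P = np.array([1.0, 0.5, 3.0])        # a box corner in world coordinates
    p_homo = K @ R @ P                   # p_homo = K * R * P
    u, v = p_homo[:2] / p_homo[2]        # dehomogenize to pixel coordinates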

If my understanding is correct, when we perform image flipping in dataset preprocessing, we have to flip the 3D bounding box labels in the camera coordinate system, instead of the world coordinate system. However, at this line and this line, it appears to me that you are doing it in the world coordinate system directly.

This sometimes leads to errors. From my observation, changing the logic to the following reduces them:

        # read camera parameters
        K = self.meta['K'][idx]
        R = self.meta['R'][idx]
        yaw, pitch, roll = yaw_pitch_row_from_r(R)
        if flip:
            R_old = R
            R = get_rotation_matrix_from_yaw_pitch_roll(-yaw, pitch, roll)
        else:
            R = get_rotation_matrix_from_yaw_pitch_roll(yaw, pitch, roll)
        # read 3D bounding boxes
        num_boxes = len(self.meta['boxes'][idx])
        raw_basis = np.array([self.meta['boxes'][idx][i]['basis'] for i in range(num_boxes)])
        raw_coeffs = np.array([self.meta['boxes'][idx][i]['coeffs'] for i in range(num_boxes)])
        raw_centroid = np.array([self.meta['boxes'][idx][i]['centroid'] for i in range(num_boxes)])
        if flip:
            for i in range(num_boxes):
                # get 3D corners in the world space
                corners3d = get_corners_of_bb3d_no_index(raw_basis[i],
                                                         raw_coeffs[i],
                                                         raw_centroid[i])
                # get 3D corners in the camera space
                corners3d = np.matmul(R_old, corners3d.transpose()).transpose()
                # flip x axis
                corners3d[:, 0] = -corners3d[:, 0]
                # get 3D corners back in world space
                corners3d = np.matmul(R.transpose(), corners3d.transpose()).transpose()
                # extract centroid, basis, and coeffs from 3D corners
                raw_centroid[i] = corners3d.mean(axis=0)
                b0_with_scale = (corners3d[1] - corners3d[0]) / 2
                c0 = np.linalg.norm(b0_with_scale)
                b0 = b0_with_scale / c0
                b1_with_scale = (corners3d[1] - corners3d[2]) / 2
                c1 = np.linalg.norm(b1_with_scale)
                b1 = b1_with_scale / c1
                b2_with_scale = (corners3d[1] - corners3d[5]) / 2
                c2 = np.linalg.norm(b2_with_scale)
                b2 = b2_with_scale / c2
                raw_basis[i, 0] = -b0  # flip basis 0
                raw_basis[i, 1] = b1
                # keep b2 as [0, -1, 0] to avoid numerical issues
                raw_coeffs[i] = [-c0, c1, c2]
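
As a quick sanity check of the geometry above, with made-up numbers: flipping the x axis in camera space should mirror every projected column about the principal point, i.e. u' = 2 * cx - u:

    import numpy as np

    K = np.array([[529.5, 0.0, 365.0],   # placeholder intrinsics
                  [0.0, 529.5, 265.0],
                  [0.0, 0.0, 1.0]])

    def column(p_cam):
        """Pixel column of a point already in camera coordinates."""
        q = K @ p_cam
        return q[0] / q[2]

    p = np.array([0.4, -0.2, 2.5])              # a corner in camera space
    p_flipped = p * np.array([-1.0, 1.0, 1.0])  # flip the x axis
    assert np.isclose(column(p_flipped), 2 * K[0, 2] - column(p))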

Looking forward to discussing this with you!

unable to unzip 'preprocessed ground truth of SUNRGBD dataset'

First, thanks for your wonderful work.

When I try to unzip the 'preprocessed ground truth of SUNRGBD dataset' from this link https://drive.google.com/file/d/1QUbq7fRtJtBPkSJbIsZOTwYR5MwtZuiV/view, as mentioned on the main page, it outputs:

checkdir error: 2dbdb exists but is not directory
unable to process 2dbdb/image/3413.png.
checkdir error: 2dbdb exists but is not directory
unable to process 2dbdb/image/3411.png.
checkdir error: 2dbdb exists but is not directory
unable to process 2dbdb/image/3398.png.

Do you know why this happens? I am not the only one who has met this problem; see the last comment on #4.
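
As a first diagnostic, a small sketch that checks whether the archive itself contains both a plain file named 2dbdb and entries under 2dbdb/ (the archive name is a placeholder):

    import zipfile

    # List conflicting entries: a plain '2dbdb' file would block the
    # creation of the '2dbdb/' directory during extraction.
    with zipfile.ZipFile('sunrgbd_gt.zip') as zf:  # placeholder name
        names = zf.namelist()
        print('plain file 2dbdb present:', '2dbdb' in names)
        print('entries under 2dbdb/:', sum(n.startswith('2dbdb/') for n in names))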

Thanks in advance.
