
shanice-l / gdrnpp_bop2022


PyTorch Implementation of GDRNPP, winner (most of the awards) of the BOP Challenge 2022 at ECCV'22

License: Apache License 2.0

Python 32.84% Shell 0.10% C++ 44.33% Cuda 0.13% C 21.97% CMake 0.38% GLSL 0.24%
pose-estimation pytorch

gdrnpp_bop2022's People

Contributors

belalhmedan90, goodfella47, lolrudy, rainbowend, shanice-l, tzsombor95, wangg12



gdrnpp_bop2022's Issues

Index out of bounds when loading the dataset for training

Hi,
I am getting an index-out-of-bounds error when starting pose estimation training on the tless dataset with the command:
./core/gdrn_modeling/train_gdrn.sh configs/gdrn/tlessPbrSO/convnext_AugCosyAAEGray_DMask_amodalClipBox_tless/1.py 0

The error usually occurs on the first few attempts, but after a couple of tries training starts without raising it again. Sometimes a segmentation fault is thrown instead.
Here is the index error:

20230131_152655|ERR|__main__@233: An error has been caught in function '<module>', process 'MainProcess' (127486), thread 'MainThread' (140000745686208):
Traceback (most recent call last):

> File "./core/gdrn_modeling/main_gdrn.py", line 233, in <module>
    main(args)
    │    └ Namespace(config_file='configs/gdrn/tlessPbrSO/convnext_AugCosyAAEGray_DMask_amodalClipBox_tless/1.py', dist_url='tcp://127.0...
    └ <function main at 0x7f5399fdaaf0>

  File "./core/gdrn_modeling/main_gdrn.py", line 199, in main
    Lite(
    └ <class '__main__.Lite'>

  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 408, in _run_impl
    return run_method(*args, **kwargs)
           │           │       └ {}
           │           └ (Namespace(config_file='configs/gdrn/tlessPbrSO/convnext_AugCosyAAEGray_DMask_amodalClipBox_tless/1.py', dist_url='tcp://127....
           └ functools.partial(<bound method LightningLite._run_with_strategy_setup of <__main__.Lite object at 0x7f53d3435070>>, <bound m...
  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 413, in _run_with_strategy_setup
    return run_method(*args, **kwargs)
           │           │       └ {}
           │           └ (Namespace(config_file='configs/gdrn/tlessPbrSO/convnext_AugCosyAAEGray_DMask_amodalClipBox_tless/1.py', dist_url='tcp://127....
           └ <bound method Lite.run of <__main__.Lite object at 0x7f53d3435070>>

  File "./core/gdrn_modeling/main_gdrn.py", line 185, in run
    self.do_train(cfg, args, model, optimizer, renderer=renderer, resume=args.resume)
    │    │        │    │     │      │                   │                │    └ False
    │    │        │    │     │      │                   │                └ Namespace(config_file='configs/gdrn/tlessPbrSO/convnext_AugCosyAAEGray_DMask_amodalClipBox_tless/1.py', dist_url='tcp://127.0...
    │    │        │    │     │      │                   └ <lib.egl_renderer.egl_renderer_v3.EGLRenderer object at 0x7f5397dcc040>
    │    │        │    │     │      └ LiteRanger (
    │    │        │    │     │        Parameter Group 0
    │    │        │    │     │            N_sma_threshhold: 5
    │    │        │    │     │            alpha: 0.5
    │    │        │    │     │            betas: (0.95, 0.999)
    │    │        │    │     │            eps: 1e-05
    │    │        │    │     │            initial_lr:...
    │    │        │    │     └ _LiteModule(
    │    │        │    │         (_module): GDRN_DoubleMask(
    │    │        │    │           (backbone): FeatureListNet(
    │    │        │    │             (stem_0): Conv2d(3, 128, kernel_size=(4, 4),...
    │    │        │    └ Namespace(config_file='configs/gdrn/tlessPbrSO/convnext_AugCosyAAEGray_DMask_amodalClipBox_tless/1.py', dist_url='tcp://127.0...
    │    │        └ Config (path: configs/gdrn/tlessPbrSO/convnext_AugCosyAAEGray_DMask_amodalClipBox_tless/1.py): {'OUTPUT_ROOT': 'output', 'OUT...
    │    └ <function GDRN_Lite.do_train at 0x7f539a341820>
    └ <__main__.Lite object at 0x7f53d3435070>

  File "/home/faps/GDRNPP_pose_est/pose_est_gdrnpp/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine.py", line 275, in do_train
    data = next(data_loader_iter)
                └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f53971af250>

  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
           │    └ <function _MultiProcessingDataLoaderIter._next_data at 0x7f53e0bb5b80>
           └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f53971af250>
  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
           │    │             └ <torch._utils.ExceptionWrapper object at 0x7f537a1d97c0>
           │    └ <function _MultiProcessingDataLoaderIter._process_data at 0x7f53e0bb5ca0>
           └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f53971af250>
  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
    │    └ <function ExceptionWrapper.reraise at 0x7f547514a790>
    └ <torch._utils.ExceptionWrapper object at 0x7f537a1d97c0>
  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
          └ IndexError('Caught IndexError in DataLoader worker process 0.\nOriginal Traceback (most recent call last):\n  File "/home/fap...

IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/faps/GDRNPP_pose_est/pose_est_gdrnpp/core/gdrn_modeling/../../core/gdrn_modeling/datasets/data_loader_online.py", line 765, in __getitem__
    dataset_dict = self._get_sample_dict(idx)
  File "/home/faps/GDRNPP_pose_est/pose_est_gdrnpp/core/gdrn_modeling/../../core/base_data_loader.py", line 118, in _get_sample_dict
    start_addr = 0 if idx == 0 else self._addr[idx - 1].item()
IndexError: index 181319007 is out of bounds for axis 0 with size 17223

The index changes when trying again.

Has this happened to anyone before? Or do you have any idea what might cause this?

Thank you!

Hello, is your pretrained model damaged?

When training the detector, I encounter the following error. Is the pretrained model you uploaded incomplete or damaged?

For example, running the following statements raises an error:

init_checkpoint = "./pretrained_models/yolox/yolox_x.pth"
torch.load(init_checkpoint, map_location=torch.device("cuda"))

Traceback (most recent call last):
  File "gdrnpp_bop2022/det/yolox/tools/boardtest.py", line 6, in <module>
    torch.load(init_checkpoint, map_location=torch.device("cuda"))
  File "/home/dell/anaconda3/envs/gdrn/lib/python3.8/site-packages/torch/serialization.py", line 705, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/home/dell/anaconda3/envs/gdrn/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
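
That PytorchStreamReader error usually means the checkpoint file on disk is truncated or corrupted, e.g. by an interrupted download. A quick integrity check using only the standard library (a sketch; compare the printed size against the one shown on the download page):

import os
import zipfile

ckpt = "./pretrained_models/yolox/yolox_x.pth"
print(os.path.getsize(ckpt))     # compare with the size listed on the download page
print(zipfile.is_zipfile(ckpt))  # PyTorch >= 1.6 checkpoints are zip archives;
                                 # False means the file is incomplete or corrupt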

Registering custom dataset

I am trying to train on my own custom dataset, but there seems to be an issue when registering the dataset for detectron2. I have already added the files in the det/yolox/data/datasets folder and the /ref folder, registered them in dataset_factory.py, and written a config file for the dataset.
The error I get looks like this:

[0329_144020 det.yolox.engine.yolox_setup@121]: Full config saved to /home/faps/GDRNPP_pose_est/pose_est_gdrnpp/output/yolox/bop_pbr/yolox_x_640_augCozyAAEhsv_ranger_30_epochs_Stator_test/config.yaml
[0329_144020 det.yolox.engine.yolox_setup@125]: seed: -1
[0329_144020 d2.utils.env@41]: Using a generated random seed 21122847
[0329_144020 det.yolox.engine.yolox_setup@146]: Used mmcv backend: cv2
[0329_204020@Stator_test:411] DBG register dataset: Stator_train
ERROR [0329_144020 d2.engine.launch@84]: An error has been caught in function 'launch', process 'MainProcess' (430301), thread 'MainThread' (139953541154048):
Traceback (most recent call last):

  File "./det/yolox/tools/main_yolox.py", line 77, in <module>
    args=(args,),
          -> Namespace(config_file='configs/yolox/bop_pbr/yolox_x_640_augCozyAAEhsv_ranger_30_epochs_Stator_test.py', dist_url='tcp://127....

> File "/home/faps/anaconda3/envs/GDRNPP/lib/python3.7/site-packages/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
    |          -> (Namespace(config_file='configs/yolox/bop_pbr/yolox_x_640_augCozyAAEhsv_ranger_30_epochs_Stator_test.py', dist_url='tcp://127...
    -> <function main at 0x7f4852409440>

  File "./det/yolox/tools/main_yolox.py", line 50, in main
    cfg = setup(args)
          |     -> Namespace(config_file='configs/yolox/bop_pbr/yolox_x_640_augCozyAAEhsv_ranger_30_epochs_Stator_test.py', dist_url='tcp://127....
          -> <function setup at 0x7f49782177a0>

  File "./det/yolox/tools/main_yolox.py", line 43, in setup
    register_datasets_in_cfg(cfg)
    |                        -> {'train': {'output_dir': '/home/faps/GDRNPP_pose_est/pose_est_gdrnpp/output/yolox/bop_pbr/yolox_x_640_augCozyAAEhsv_ranger_30...
    -> <function register_datasets_in_cfg at 0x7f4852400680>

  File "/home/faps/GDRNPP_pose_est/pose_est_gdrnpp/det/yolox/tools/../../../det/yolox/data/datasets/dataset_factory.py", line 106, in register_datasets_in_cfg
    assert "DATA_CFG" in cfg and name in cfg.DATA_CFG, "no cfg.DATA_CFG.{}".format(name)
                         |       |       |                                         -> 'Stator_test'
                         |       |       -> {'train': {'output_dir': '/home/faps/GDRNPP_pose_est/pose_est_gdrnpp/output/yolox/bop_pbr/yolox_x_640_augCozyAAEhsv_ranger_30...
                         |       -> 'Stator_test'
                         -> {'train': {'output_dir': '/home/faps/GDRNPP_pose_est/pose_est_gdrnpp/output/yolox/bop_pbr/yolox_x_640_augCozyAAEhsv_ranger_30...

AssertionError: no cfg.DATA_CFG.Stator_test

Is there a step I am missing to register the dataset?

Thank you for your help!
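
For reference, the assertion indicates the YOLOX config must carry a DATA_CFG dict keyed by each dataset name it uses. A hedged sketch of the shape it appears to expect (field names and paths are hypothetical; check one of the repo's existing BOP configs for the exact keys):

# Hypothetical addition to the YOLOX config: one DATA_CFG entry per dataset
# name referenced in the config, keyed exactly as the name is spelled.
DATA_CFG = dict(
    Stator_train=dict(ann_file="datasets/Stator/train.json"),
    Stator_test=dict(ann_file="datasets/Stator/test.json"),
)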

Some pretrained model files missing for refinement

I am now trying to train the pose refinement model, and get this error:

FileNotFoundError: [Errno 2] No such file or directory: 'model_weights/detector/detector-bop-itodd-pbr--509908/config.yaml' In call to configurable 'load_detector' (<function load_detector at 0x7fef1e5d6670>)

The pretrained models available on OneDrive don't include the detector files. How do I solve this?

keypoints_3d.pkl not found

In order to visualize the test results, the code calls the function get_keypoints_3d in the ref folder (the example below is for the lm dataset):

def get_keypoints_3d():
    """key is str(obj_id) generated by
    core/roi_pvnet/tools/lm/lm_1_compute_keypoints_3d.py."""
    keypoints_3d_path = osp.join(model_dir, "keypoints_3d.pkl")
    assert osp.exists(keypoints_3d_path), keypoints_3d_path
    kpts_dict = mmcv.load(keypoints_3d_path)
    return kpts_dict

However, the path referenced in the docstring, "core/roi_pvnet/tools/lm/lm_1_compute_keypoints_3d.py", doesn't exist in the repo, so I can't generate the keypoints_3d.pkl file, and the file itself doesn't exist either.

How can I generate this file to visualize the estimated poses?
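
If the original script is unavailable, the file can plausibly be regenerated by running farthest point sampling (FPS) over each object model's vertices (a hedged sketch, not the repo's tool; the dict key str(obj_id) follows get_keypoints_3d above, while the model path, the trimesh dependency, and the choice of 8 keypoints are assumptions):

import os.path as osp
import mmcv
import numpy as np
import trimesh

def fps(points: np.ndarray, k: int) -> np.ndarray:
    """Greedy farthest point sampling over a point set."""
    sel = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        sel.append(int(dist.argmax()))
        dist = np.minimum(dist, np.linalg.norm(points - points[sel[-1]], axis=1))
    return points[sel]

model_dir = "datasets/BOP_DATASETS/lm/models"  # hypothetical location
kpts_dict = {}
for obj_id in range(1, 16):  # lm has 15 objects
    mesh = trimesh.load(osp.join(model_dir, f"obj_{obj_id:06d}.ply"))
    kpts_dict[str(obj_id)] = fps(np.asarray(mesh.vertices), 8)
mmcv.dump(kpts_dict, osp.join(model_dir, "keypoints_3d.pkl"))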

Model download from OneDrive is slow

Dear authors,

Thanks for releasing your excellent work.

I'm reproducing this repo on my own server, but the model files are large and downloading from OneDrive is very slow (about 200 KB/s). I would be very grateful if you could provide a Baidu cloud link.

Your reply would be much appreciated!
And congratulations on winning first place in the BOP 2022 challenge!

EGL error when starting to train gdrnpp

Training command

(base) root@a3c636c20700:/workspace/gdrnpp_bop2022# CUDA_VISIBLE_DEVICES=0 python ./core/gdrn_modeling/main_gdrn.py     --config-file configs/gdrn/tless/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_tless.py --num-gpus 1 --opts MODEL.WEIGHTS=output/gdrn/tless/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_tless/model_final_wo_optim.pth --resume

Error log

20221220_030409|d2.utils.env@41: Using a generated random seed 10091570
20221220_030409|core.utils.default_args_setup@162: Used mmcv backend: cv2
20221220_030409|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so.1 <CDLL 'libEGL.so.1', handle 556faa62b6a0 at 0x7f9a566023d0>
20221220_030409|DBG|OpenGL.platform.ctypesloader@65: Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 556fab0e1ee0 at 0x7f9a566de0d0>
--- Logging error in Loguru Handler #2 ---
Record was: {'elapsed': datetime.timedelta(seconds=5, microseconds=660273), 'exception': (type=<class 'OpenGL.raw.EGL._errors.EGLError'>, value=EGLError( err=EGL_BAD_MATCH (12297), baseOperation = eglCreateContext ), traceback=<traceback object at 0x7f9b05980550>), 'extra': {}, 'file': (name='main_gdrn.py', path='./core/gdrn_modeling/main_gdrn.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 233, 'message': "An error has been caught in function '<module>', process 'MainProcess' (132473), thread 'MainThread' (140308821803200):", 'module': 'main_gdrn', 'name': '__main__', 'process': (id=132473, name='MainProcess'), 'thread': (id=140308821803200, name='MainThread'), 'time': datetime(2022, 12, 20, 3, 4, 9, 967196, tzinfo=datetime.timezone(datetime.timedelta(0), 'UTC'))}
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/loguru/_logger.py", line 1226, in catch_wrapper
    return function(*args, **kwargs)
  File "./core/gdrn_modeling/main_gdrn.py", line 205, in main
    ).run(args, cfg)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 402, in _run_impl
    return run_method(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 409, in _run_with_strategy_setup
    return run_method(*args, **kwargs)
  File "./core/gdrn_modeling/main_gdrn.py", line 155, in run
    renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 290, in get_renderer
    use_cache=True,
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 218, in init_context
    self._egl_context = eglCreateContext(self._egl_display, configs[0], EGL_NO_CONTEXT, context_attributes)
  File "/opt/conda/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_BAD_MATCH,
        baseOperation = eglCreateContext,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7f9a55df9680>,
                <OpenGL._opaque.EGLConfig_pointer object at 0x7f9a55e599e0>,
                <OpenGL._opaque.EGLContext_pointer object at 0x7f9a5a804b00>,
                <OpenGL.arrays.lists.c_int_Array_7 object at 0x7f9a55e59cb0>,
        ),
        result = <OpenGL._opaque.EGLContext_pointer object at 0x7f9b05ae2680>
)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/loguru/_handler.py", line 175, in emit
    self._queue.put(str_record)
  File "/opt/conda/lib/python3.7/multiprocessing/queues.py", line 358, in put
    obj = _ForkingPickler.dumps(obj)
  File "/opt/conda/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/opt/conda/lib/python3.7/site-packages/loguru/_recattrs.py", line 73, in __reduce__
    pickle.dumps(self.value)
ValueError: ctypes objects containing pointers cannot be pickled
--- End of logging error ---
--- Logging error in Loguru Handler #3 ---
(the same EGLError traceback and pickling failure are repeated verbatim for the second handler)
--- End of logging error ---

I noticed that @wangg12 had solved this error in this comment: DLR-RM/AugmentedAutoencoder#19 (comment)
Would you mind sharing the solution?
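
A minimal headless-EGL smoke test, independent of the repo's renderer, can help separate driver/container problems from gdrnpp itself (a sketch; EGL_BAD_MATCH at eglCreateContext often points to a config/API mismatch or to a container without GPU EGL support):

from OpenGL.EGL import (
    EGL_DEFAULT_DISPLAY,
    EGL_VENDOR,
    EGL_VERSION,
    eglGetDisplay,
    eglInitialize,
    eglQueryString,
)

# If even this fails inside the container, the problem is the EGL/driver
# setup (e.g. no GPU EGL library visible), not the gdrnpp code.
display = eglGetDisplay(EGL_DEFAULT_DISPLAY)
assert display, "no EGL display available"
assert eglInitialize(display, None, None), "eglInitialize failed"
print(eglQueryString(display, EGL_VENDOR))
print(eglQueryString(display, EGL_VERSION))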

visualize results

Hi,
I want to quickly test the model without diving into details. Everything is fine: the pretrained model loads successfully and the evaluation runs correctly. However, the numbers are not intuitive, and I couldn't figure out how to visualize the results. Can you explain this a little? Also, this repo has a lot of configuration options; it would help to explain some of the parameters in detail.
Thanks

Missing lmo/train_pbr/xyz_crop/000000/000000_000006-xyz.pkl

Dr. Wang, Dr. Liu,

I have a question. When running YOLOX training on the lmo dataset with the command below, a file-not-found error occurs:

main_yolox.py --config-file configs/yolox/bop_pbr/yolox_x_640_augCozyAAEhsv_ranger_30_epochs_lmo_pbr_lmo_bop_test.py --num-gpus 1

AssertionError: /home/t3080/workspace_Pose/gdrnpp_bop2022/datasets/BOP_DATASETS/lmo/train_pbr/xyz_crop/000000/000000_000006-xyz.pkl

In the open-source GDR-Net, xyz_crop was a separate download. Does the xyz_crop here also need to be downloaded separately, or does the project include a tool to generate it?

Thanks!
George

Questions about mask_trunc and the crop/resize

Hello Dr. Wang,

When the training data is processed during iteration, besides mask_visible and mask_full there is also a mask_trunc parameter. What does this parameter mean?

When applying DZI to the RoI and resizing, I noticed that CDPN explicitly handles the case where the scaled bbox extends beyond the image boundary, whereas gdrnpp does the crop with cv2's affine-warp functions. Does this approach automatically handle a scaled bbox that goes out of bounds?

I'd appreciate your answer :)
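
On the second point, here is a minimal sketch (hypothetical numbers) of why the affine-warp crop tolerates boxes that extend past the image border: cv2.warpAffine fills destination pixels that have no source correspondence with the border value (zero by default, BORDER_CONSTANT), so no explicit boundary handling is needed.

import cv2
import numpy as np

img = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
cx, cy, s, out_res = 620.0, 30.0, 200.0, 256  # box center near the top-right corner

# Forward affine map taking the s x s window centered at (cx, cy)
# to an out_res x out_res crop.
scale = out_res / s
M = np.array([[scale, 0.0, out_res / 2 - scale * cx],
              [0.0, scale, out_res / 2 - scale * cy]], dtype=np.float32)
crop = cv2.warpAffine(img, M, (out_res, out_res))  # out-of-image regions become 0
print(crop.shape)  # (256, 256, 3)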

About the refinement method

Dear authors,

Thanks again for your previous reply, but I still have a question: have you tried other refinement methods such as RNNPose or RePOSE? If so, how do they perform?

Distributed Training Slower

Hi,

I have two RTX A6000 GPUs available for training (device IDs 0 and 1).
I run the GDRN training as "./core/gdrn_modeling/train_gdrn.sh <config_file> 0,1". Training starts as usual, but it is much slower (takes almost twice as long) than with just one GPU. The terminal also shows this warning:
"[W reducer.cpp:313] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance."
Please note that there are no errors in the output; it is just far too slow.
Can anyone please help me with this issue?

Producing detectron2-compatible input data from synthetic data generated by BlenderProc2

Hi,

I created a custom dataset with https://github.com/DLR-RM/BlenderProc/blob/main/README_BlenderProc4BOP.md in order to train a custom model with this repository.
Is there a script that converts the data generated by that tool into the input format needed to train detectron2?

If not, can you point me to an exact folder and data-structure description so that I can run the training script?
The last error I got was "AssertionError: Dataset 'lmo_pbr_train' is empty!", but I don't know exactly what to put where, or in what format.

Many thanks for your help!

Question: Several instances of the same object

Hello,

Thank you for the support you have been providing in solving issues regarding this repository!
The detection algorithm works well for objects that are not cluttered, similar to the image in #13. What changes to the training images or to the model parameters/hyperparameters would you recommend for images with a large number of objects in a box? Here is an image for reference.

[image: 000177]

bop_render not working

Hello Dr. Wang,

During testing, bop_render seems to be required. I found a file named bop_render in the project root, but it appears to be a symlink.

I found bop_renderer at https://github.com/thodan/bop_renderer and compiled it successfully following the tutorial, but running the test still raises AttributeError: module 'bop_renderer' has no attribute 'Renderer'.

I tried importing the .so file produced by the bop_renderer build, but got a "file too short" error. Could you advise how this bop_render should be imported?

Below are my project's file structure, the compiled bop_render folder structure, and the error when importing the .so file (the .so I import is at bop_renderer/build/bop_renderer.cpython-37m-x86_64-linux-gnu.so).

[screenshots: project layout, bop_renderer build directory, import error]

I'd appreciate your answer :)

Potential dataset issue

Dr. Wang, Dr. Liu,

After "loading dataset dicts: ycbv_train_real" completes, the following warning appears:

20230111_114259|WRN|core.gdrn_modeling.datasets.ycbv_d2@276: Filtered out 2716 instances without valid box. There might be issues in your dataset generation process.

Could you explain what "Filtered out 2716 instances without valid box" means? Does it affect the subsequent training?

Thanks!
George

Questions about the ycbv and tless datasets

Hello Dr. Wang,
I ran into the following issues when using the ycbv and tless datasets:

  1. In the following code in core/gdrn_modeling/datasets/ycbv_d2.py (which registers the ycbv_train_real dataset):
update_cfgs = {
    "ycbv_train_real": {
        "ann_files": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/image_sets/train.txt")],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_real")],
    },
    "ycbv_train_real_aligned_Kuw": {
        "ann_files": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/image_sets/train.txt")],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_real")],
        "align_K_by_change_pose": True,
    },
    "ycbv_train_real_uw": {
        "ann_files": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/image_sets/train_real_uw.txt")],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_real")],
    },
    "ycbv_train_real_uw_every10": {
        "ann_files": [
            osp.join(
                DATASETS_ROOT,
                "BOP_DATASETS/ycbv/image_sets/train_real_uw_every10.txt",
            )
        ],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_real")],
    },
    "ycbv_train_real_cmu": {
        "ann_files": [
            osp.join(
                DATASETS_ROOT,
                "BOP_DATASETS/ycbv/image_sets/train_real_cmu.txt",
            )
        ],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_real")],
    },
    "ycbv_train_real_cmu_aligned_Kuw": {
        "ann_files": [
            osp.join(
                DATASETS_ROOT,
                "BOP_DATASETS/ycbv/image_sets/train_real_cmu.txt",
            )
        ],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_real")],
        "align_K_by_change_pose": True,
    },
    "ycbv_train_synt": {
        "ann_files": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/image_sets/train_synt.txt")],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_synt")],
    },
    "ycbv_train_synt_50k": {
        "ann_files": [
            osp.join(
                DATASETS_ROOT,
                "BOP_DATASETS/ycbv/image_sets/train_synt_50k.txt",
            )
        ],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_synt")],
    },
    "ycbv_train_synt_30k": {
        "ann_files": [
            osp.join(
                DATASETS_ROOT,
                "BOP_DATASETS/ycbv/image_sets/train_synt_30k.txt",
            )
        ],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_synt")],
    },
    "ycbv_train_synt_100": {
        "ann_files": [
            osp.join(
                DATASETS_ROOT,
                "BOP_DATASETS/ycbv/image_sets/train_synt_100.txt",
            )
        ],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/train_synt")],
    },
    "ycbv_test": {
        "ann_files": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/image_sets/keyframe.txt")],
        "image_prefixes": [osp.join(DATASETS_ROOT, "BOP_DATASETS/ycbv/test")],
        "with_xyz": False,
        "filter_invalid": False,
    },
}

This references txt files such as train_real_uw.txt and train_real_uw_every10.txt, but I only found train.txt in the image_sets folder (the image sets come from the original YCB-Video release; BOP does not provide them). How can train_real_uw.txt, train_real_uw_every10.txt, and the other txt files be obtained?

  2. In the following code in core/gdrn_modeling/datasets/tless_d2.py (which registers the tless_train_primesense dataset):
SPLITS_TLESS = dict(
    tless_train_primesense=dict(
        name="tless_train_primesense",
        dataset_root=osp.join(DATASETS_ROOT, "BOP_DATASETS/tless/train_primesense"),
        models_root=osp.join(DATASETS_ROOT, "BOP_DATASETS/tless/models_cad"),
        objs=ref.tless.objects,
        scale_to_meter=0.001,
        with_masks=True,  # (load masks but may not use it)
        with_depth=True,  # (load depth path here, but may not use it)
        height=400,
        width=400,
        cache_dir=osp.join(PROJ_ROOT, ".cache"),
        use_cache=True,
        num_to_load=-1,
        filter_invalid=False,
        ref_key="tless",
    ),
    tless_train_primesense_rescaled=dict(
        name="tless_train_primesense_rescaled",
        dataset_root=osp.join(DATASETS_ROOT, "BOP_DATASETS/tless/train_primesense_rescaled"),
        models_root=osp.join(DATASETS_ROOT, "BOP_DATASETS/tless/models_cad"),
        objs=ref.tless.objects,
        scale_to_meter=0.001,
        with_masks=True,  # (load masks but may not use it)
        with_depth=False,  # NOTE: we didn't prepare depth yet
        height=540,
        width=720,
        cache_dir=osp.join(PROJ_ROOT, ".cache"),
        use_cache=True,
        num_to_load=-1,
        filter_invalid=False,
        ref_key="tless",
    ),
)

Here, the dataset_root of tless_train_primesense_rescaled, osp.join(DATASETS_ROOT, "BOP_DATASETS/tless/train_primesense_rescaled"),
is not found in the datasets provided by BOP. How is it obtained?

I'd appreciate your answer :)

The meaning of depth_factor

Hello Dr. Wang,
During dataset registration (taking lmo_train_pbr as an example), depth_factor = 1000.0 / cam_dict[str_im_id]["depth_scale"] is computed. What does this parameter mean and what is it used for?

The official explanation of depth_scale is that multiplying the depth map by it gives the true depth; I don't quite understand what depth_factor is for.

I'd appreciate your answer :)
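
For reference: in BOP, raw depth values times depth_scale give depth in millimeters, so dividing the raw values by depth_factor = 1000 / depth_scale converts them directly to meters. A minimal worked sketch (values hypothetical):

depth_scale = 0.1                     # from scene_camera.json: raw * depth_scale = depth in mm
depth_factor = 1000.0 / depth_scale   # = 10000.0

raw = 25000                           # one raw pixel value from the depth image
depth_mm = raw * depth_scale          # 2500.0 mm
depth_m = raw / depth_factor          # 2.5 m, the same depth in meters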

install lietorch during environment creation (branch: pose_refine)

Hi,

I am trying to use the iterative refinement in the pose_refine branch, but I can't create the required environment using:
conda env create -f environment.yaml

Indeed, I get the following error:

× Encountered error while trying to install package.  
╰─> lietorch

So I installed it with conda:

conda install -c lietorch lietorch

But then, when I run training, I get the following error:

Traceback (most recent call last):
  File "/home/jmoquet/gdrnpp_bop2022_refine/gdrnpp_bop2022/train.py", line 18, in <module>
    from lietorch import SE3
  File "/home/jmoquet/anaconda3/envs/refine/lib/python3.9/site-packages/lietorch/__init__.py", line 75, in <module>
    _init_backend()
  File "/home/jmoquet/anaconda3/envs/refine/lib/python3.9/site-packages/lietorch/__init__.py", line 55, in _init_backend
    torch.ops.load_library(libpath)
  File "/home/jmoquet/anaconda3/envs/refine/lib/python3.9/site-packages/torch/_ops.py", line 104, in load_library
    ctypes.CDLL(path)
  File "/home/jmoquet/anaconda3/envs/refine/lib/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory

I would like to know whether this branch is up to date, and whether you know how to install the correct version of the lietorch library.

Thanks

KeyError: timm/convnext_base does not exist

Dr. Wang, Dr. Liu,

When the network model is built before training starts, an error is returned: the key timm/convnext_base cannot be found in the BACKBONES dict.

I checked the code. requirements.txt installs the latest timm; my version is 0.8.2.dev0. This version of timm no longer has a standalone convnext_base; instead there are four tagged variants:
'timm/convnext_base.fb_in1k'
'timm/convnext_base.fb_in22k'
'timm/convnext_base.fb_in22k_ft_in1k'
'timm/convnext_base.fb_in22k_ft_in1k_384'

Which one do you recommend as the backbone?

Thanks!
George
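
As a quick check (not the authors' recommendation), the installed timm version can list its convnext_base variants, and recent timm releases usually resolve an untagged name to a default pretrained tag; a minimal sketch:

import timm

# List the convnext_base variants known to the installed timm version.
print(timm.list_models("convnext_base*", pretrained=True))

# Recent timm usually resolves the untagged name to a default tag,
# so this may still work even though only tagged checkpoints exist.
model = timm.create_model("convnext_base", pretrained=True,
                          features_only=True, out_indices=(3,))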

Compile error in egl_renderer

Hello, while setting up the environment I hit an error when compiling egl_renderer:
[screenshot of the compiler error]
System environment: Ubuntu 16.04
Looking forward to your guidance!

Recommendations for training on a custom dataset

Hello,
I am trying to train the algorithm on my custom dataset (50,000 images, all synthetically generated with PBR). I am currently stuck on training the YOLO-X model; training with the default settings produced a very poor model that cannot predict anything (see image). Do you have any suggestions for the dataset or the training parameters (e.g. number of images in the dataset, learning rate, LR scheduler type)?

In the image, all dark gray parts are to be detected and classified; the other T-LESS parts are used as distractors and should be ignored.

Any kind of ideas/tips are welcome and appreciated.

[image: bboxes]

can't find the xyz_crop

Hello, sorry to bother you.
What is the xyz_crop in the ycbv dataset, and where can I find it?
Thank you

How to generate the test_bboxes file

Dr. Wang, Dr. Liu,

During testing, the system appears to evaluate directly on the data in the test_bboxes files you provide. Looking closely at that data, it seems slightly inaccurate: for example, test scene 00048 contains 5 objects, but the corresponding scene 48 in the test_bboxes file contains more than 5, and checking the object ids shows some objects that never appear in that scene. My question: is the program you used to generate test_bboxes open-sourced? Or do we need to generate it ourselves with YOLOX or detectron2, following the test_bboxes data format?

Many thanks!
George
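
For anyone generating the file themselves, a hedged sketch of the JSON layout, with field names inferred from the files shipped with the repo (verify against one of the provided test_bboxes files before relying on this):

import json

# Hypothetical detections, keyed by "scene_id/im_id"; bbox_est is [x, y, w, h].
detections = {
    "48/1": [
        {"obj_id": 1, "bbox_est": [100.0, 120.0, 60.0, 80.0],
         "score": 0.98, "time": 0.05},
    ],
}
with open("yolox_x_640_custom_test.json", "w") as f:
    json.dump(detections, f, indent=2)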

A question about the nominal batch size

Hello Dr. Wang,
The config files have a REFERENCE_BS parameter, which your comment in core/gdrn_modeling/engine/engine.py calls the nominal batch size.

From what I have read, a nominal batch size works like this: if the actual batch size is 16, then 64/16 = 4, so backpropagation updates the weights only once every 4 iterations; in the code this is represented by the accumulate_iter variable.

However, in the code the network parameters are updated at every iteration, independent of accumulate_iter, while the gradients are only zeroed every accumulate_iter iterations. Does this work better?

[screenshot of the training loop]

In principle, both the parameter update and the gradient zeroing should happen once every accumulate_iter iterations, right?

I'd appreciate your answer :)
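
For comparison, the textbook gradient-accumulation pattern steps and zeroes together; a minimal runnable sketch (a toy model, not the repo's loop):

import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = [torch.randn(16, 8) for _ in range(8)]  # toy stand-in batches
accumulate_iter = 4  # e.g. REFERENCE_BS // IMS_PER_BATCH

for it, batch in enumerate(data_loader):
    loss = model(batch).pow(2).mean() / accumulate_iter  # scale to match the large batch
    loss.backward()                                      # gradients accumulate across iterations
    if (it + 1) % accumulate_iter == 0:
        optimizer.step()       # update once per accumulate_iter iterations...
        optimizer.zero_grad()  # ...and zero the gradients together with the step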

About training time

Hello to both of you. While reproducing this work I ran into questions about training time and would like to know whether they are normal.
With two RTX 3090s, training on all ycbv objects takes about 14 days.
Training a single tless (PBR) object takes about 16 hours (IMS_PER_BATCH=48).
Following https://github.com/THU-DA-6D-Pose-Group/GDR-Net/blob/main/tools/lm/lm_pbr_1_gen_xyz_crop.py for offline rendering to generate xyz_crop for the tless dataset, about four fifths of the way through it had already used 400 GB of disk space.
I would therefore like to ask whether such training times are normal and, if so, how training could be sped up.

Looking forward to your reply! Thanks!

models_info.json

Hello,
I'm running this repo with a pretrained model; when I run the command it asks for models_info.json. Where can I find that?

A question about the rotation transform

Hello Dr. Wang,
The allo_to_ego_mat_torch function in core/utils/utils.py is used to transform the rotation. What is this function for? I couldn't quite follow it. I'd appreciate your answer :)
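
For context, a sketch of the standard allocentric-to-egocentric correction used by GDR-Net-style methods (assumed semantics, not a copy of the repo's function): the network predicts an allocentric rotation, which looks the same regardless of where the object sits in the image; it is converted to egocentric by the rotation that takes the camera's optical axis onto the ray through the object center.

import numpy as np

def allo_to_ego(rot_allo: np.ndarray, trans: np.ndarray) -> np.ndarray:
    cam_ray = np.array([0.0, 0.0, 1.0])          # optical axis
    obj_ray = trans / np.linalg.norm(trans)      # unit ray toward the object center
    angle = np.arccos(np.clip(cam_ray.dot(obj_ray), -1.0, 1.0))
    axis = np.cross(cam_ray, obj_ray)
    if np.linalg.norm(axis) < 1e-8:              # object on the optical axis
        return rot_allo
    axis /= np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])     # cross-product matrix of axis
    # Rodrigues' formula: rotation by `angle` about `axis`
    r_corr = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    return r_corr @ rot_allo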

fastfunc no longer available

pip install fastfunc no longer works, but fastfunc is still imported by
/gdrnpp_bop2022/core/gdrn_modeling/../../lib/render_vispy/model3d.py.
A fix is needed.
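
fastfunc mainly provided fast ufunc.at-style accumulation, so if the package cannot be installed, NumPy's built-in (slower) equivalent may serve as a stopgap; a hedged sketch, assuming the call site uses fastfunc.add.at:

import numpy as np

# fastfunc.add.at(target, idx, vals) accumulates vals into target at possibly
# repeated indices; np.add.at has the same semantics, just slower.
target = np.zeros(5)
idx = np.array([0, 1, 1, 4])
vals = np.array([1.0, 2.0, 3.0, 4.0])
np.add.at(target, idx, vals)
print(target)  # [1. 5. 0. 0. 4.]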

Pose Refinement Error

Hi, there are a few issues with refinement. The first has already been reported (KeyError: ["coor_x"]). I resolved it by adding "or cfg.TEST.USE_DEPTH_REFINE" on line 203 of GDRN_double_mask. Now I have another issue:


File "/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/engine/gdrn_evaluator.py", line 543, in process_depth_refine
    query_img_norm = query_img_norm.numpy() * ren_mask * depth_sensor_mask_crop
                     │              │         │          └ array([[False, False, False, ...,  True,  True,  True],
                     │              │         │                   [False, False, False, ...,  True,  True,  True],
                     │              │         │                   [False...
                     │              │         └ array([[False, False, False, ..., False, False, False],
                     │              │                  [False, False, False, ..., False, False, False],
                     │              │                  [False...
                     │              └ <method 'numpy' of 'torch._C._TensorBase' objects>
                     └ tensor([[0.0402, 0.0875, 0.1061,  ..., 0.1182, 0.1044, 0.0653],
                               [0.0913, 0.1657, 0.1896,  ..., 0.2083, 0.1792, 0.1090...

ValueError: operands could not be broadcast together with shapes (64,64) (256,256) 

It seems that depth_sensor_mask_crop is (256, 256), while the other two tensors are (64, 64).

How do I solve this?
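
One hedged workaround (not verified against the repo): resize the sensor-depth mask to the 64x64 output resolution with nearest-neighbor interpolation before combining the three arrays, e.g.:

import cv2
import numpy as np

# depth_sensor_mask_crop is a (256, 256) boolean mask; the network outputs are 64x64.
depth_sensor_mask_crop = np.ones((256, 256), dtype=bool)
out_res = 64
mask_small = cv2.resize(
    depth_sensor_mask_crop.astype(np.uint8),
    (out_res, out_res),
    interpolation=cv2.INTER_NEAREST,  # nearest keeps the mask binary
).astype(bool)
print(mask_small.shape)  # (64, 64)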

When doing pose refinement, are the args of `test_gdrn_depth_refine.sh` the same as for `test_gdrn.sh`?

If I use the command ./core/gdrn_modeling/test_gdrn_depth_refine.sh configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py 0 output/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo/model_final_wo_optim.pth, with the same args as for test_gdrn.sh, the following error occurs.

  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/engine/gdrn_evaluator.py", line 81, in __init__
    texture_paths=self.data_ref.texture_paths if cfg.TEST.DEBUG else None,
                  │    │        │                └ Config (path: configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py): {'OUTPUT_ROOT...
                  │    │        └ None
                  │    └ <module 'ref.lmo_full' from '/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../ref/lmo_full.py'>
                  └ <core.gdrn_modeling.engine.gdrn_evaluator.GDRN_Evaluator object at 0x7fa7238bb410>

  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/config.py", line 50, in __getattr__
    raise ex
          └ AttributeError("'ConfigDict' object has no attribute 'DEBUG'")

AttributeError: 'ConfigDict' object has no attribute 'DEBUG'

It seems there is no TEST.DEBUG in configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py.
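
A hedged workaround until the config gains the key: either add an explicit DEBUG flag to the TEST dict in the config, or read the key defensively at the call site (mmcv Config objects support dict-style .get()):

# Option 1: in the config file, give TEST an explicit DEBUG flag.
TEST = dict(EVAL_PERIOD=0, VIS=False, TEST_BBOX_TYPE="est", DEBUG=False)

# Option 2: at the call site, fall back to False when the key is absent.
debug = cfg.TEST.get("DEBUG", False)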

Error during refine

Hello Dr. Wang,

While refining on lmo, a KeyError "coor_x" occurs.

[screenshot of the traceback]

I saw the related error mentioned in #8, but it remains unresolved to this day.

The command line I used (gdrnpp_double_mask model):

./core/gdrn_modeling/test_gdrn_depth_refine.sh configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05mlL1_DMask_amodalClipBox_classAware_lmo.py 2 output/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo/model_0223929.pth

I'd appreciate your answer :)

Order of YOLOX training and the subsequent training

Hi Shanice,

Running YOLOX on the ycbv dataset works right away. With batch_size = 8 (GPU: RTX 3080 Ti, 12 GB) it takes a bit over a day; batch_size = 16 runs out of GPU memory. A few questions:

1) Is YOLOX training a required preliminary step?
2) After YOLOX training finishes, are any other data preprocessing steps needed before the actual pose training? For example, training on some datasets (lmo) requires the xyz_crop data. Does the farthest point sampling for each dataset also need to be precomputed?
3) The whole training pipeline seems to involve quite a few steps; could you provide a more detailed end-to-end training procedure?

Many thanks!
George

using pose_refine

Hi! The repo does not have the script mentioned in the readme (save_gdrn.sh). I have the csv predictions from the main branch, but when I try to load them here for refinement, I get _pickle.UnpicklingError: unpickling stack underflow.
I'm guessing they need to be converted to a different format first? Which script does that?

test_bboxes

Hi, in the README.md you give instructions to download the 'test_bboxes' folder for each dataset. Could you explain what exactly those files are for? And is there a way to obtain them for a custom dataset? I have already trained YOLOX on that dataset, so is there a way to generate these .json files?

Thank you for your help!

A bug in geo_head?

Hello Dr. Wang and Dr. Liu,
In geo_head's TopDownDoubleMaskXyzRegionHead, I printed the self.feature variable and found that a ConvModule applies GN twice, as shown below:
[screenshot of the printed module structure]

combine inferences into a full pipeline

Hi,
I have trained my own custom model with synthetic data and would like to test it on my own real data. As far as I can tell, there is currently no code snippet that combines the data loading, detection, GDRN, and refinement steps.
I see that there is a TODO in the core/gdrn_modeling/demo/demo_gdrn.py file. When do you expect this script to be ready? Or is there already a code snippet I could start from to build a pipeline that loads the RGB and depth images and outputs the transformation matrices?

Thanks for your answers.

dockerfile

Hi,

great work implementing this pose estimation architecture and sharing it with us.
However, I am having trouble getting it to run due to dependency issues.
Is there a Dockerfile available to get it running quickly?

Help is appreciated

Question about the camera intrinsics after cropping

Hello Dr. Wang,
The get_K_crop_resize function in core/utils/camera_geometry.py transforms the intrinsics for the cropped and resized image. For the code marked with the red box below, the references I found online all compute it with this formula:

new_K[:, [0, 1], 2] = original principal point - (original image size - cropped image size) / 2

But you use: original principal point - top-left corner of the bbox. Why is it done this way?

If the crop is at the bottom-right of the original image, this value would be negative; how can an intrinsic parameter be negative?
[screenshot of the code with the red box]
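
For reference, a sketch of the general crop-then-resize intrinsics update (the centered formula above is the special case where the crop origin is ((W - w)/2, (H - h)/2); a principal point outside the crop is legitimate and simply comes out negative). Values are hypothetical:

import numpy as np

def crop_resize_K(K, x0, y0, crop_wh, out_wh):
    """Update intrinsics for a crop with top-left (x0, y0) followed by a resize."""
    sx = out_wh[0] / crop_wh[0]
    sy = out_wh[1] / crop_wh[1]
    K_new = K.copy()
    K_new[0, 2] -= x0                    # shift the principal point into crop coordinates
    K_new[1, 2] -= y0
    K_new[:2] *= np.array([[sx], [sy]])  # then scale rows 0 and 1 by the resize factors
    return K_new

K = np.array([[572.4, 0.0, 325.3],
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]])
print(crop_resize_K(K, x0=300.0, y0=200.0, crop_wh=(128, 128), out_wh=(256, 256)))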

A small issue with saving visualization data during training

Hello Dr. Wang,
When saving visualizations during training (the if cfg.TRAIN.VIS_IMG: section of self.do_train() in core/gdrn_modeling/engine/engine.py):

  1. dataformats="HWC" needs to be added to the tbx_writer.add_image call, otherwise the error Cannot handle this data type: (1, 1, 256), |u1 occurs.
  2. During training, out_dict is an empty dict, so visualization will fail. The forward pass needs to add coor_x, coor_y, coor_z, etc. to out_dict (similar to how out_dict is updated during testing).
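
On point 1, a minimal runnable sketch of the fix (a standalone SummaryWriter; the repo's tbx_writer is assumed to wrap the same API): add_image defaults to CHW, so a channel-last uint8 image needs dataformats="HWC".

import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/vis_demo")
img_hwc = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
# Without dataformats="HWC", add_image assumes CHW and rejects this array.
writer.add_image("vis/img", img_hwc, global_step=0, dataformats="HWC")
writer.close()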

[egl OffscreenContext:] Bindless Texture Not supported

Thanks for continuing the work on GDR. I enjoyed working with the initial version a year ago and look forward to trying the revised version.

When I start the training, I receive the following error message from egl_offscreen_context.py:227:

RuntimeError: Bindless Textures not supported

The entire log of the program is:

root@n156egjsrm:/notebooks/gdrnpp_bop2022# ./core/gdrn_modeling/train_gdrn.sh /notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd.py 0
++ dirname ./core/gdrn_modeling/train_gdrn.sh
+ this_dir=./core/gdrn_modeling
+ CFG=/notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd.py
+ CUDA_VISIBLE_DEVICES=0
+ IFS=,
+ read -ra GPUS
+ NGPU=1
+ echo 'use gpu ids: 0 num gpus: 1'
use gpu ids: 0 num gpus: 1
+ NCCL_DEBUG=INFO
+ OMP_NUM_THREADS=1
+ MKL_NUM_THREADS=1
+ PYTHONPATH=./core/gdrn_modeling/../..:
+ CUDA_VISIBLE_DEVICES=0
+ python ./core/gdrn_modeling/main_gdrn.py --config-file /notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd.py --num-gpus 1
/usr/local/lib/python3.9/dist-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
You requested to import horovod which is missing or not supported for your OS.
[1126_222408@main_gdrn:216] soft limit:  500000 hard limit:  1048576
[1126_222408@main_gdrn:227] Command Line Args: Namespace(config_file='/notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd.py', resume=False, eval_only=False, launcher='none', local_rank=0, fp16_allreduce=False, use_adasum=False, num_gpus=1, num_machines=1, machine_rank=0, dist_url='tcp://127.0.0.1:49152', opts=None, strategy=None)
[1126_222408@main_gdrn:101] optimizer_cfg: {'type': 'Ranger', 'lr': 0.0008, 'weight_decay': 0.01}
[1126_222409@itodd_pbr:358] DBG register dataset: itodd_train_pbr
[1126_222409@itodd_bop_test:379] DBG register dataset: itodd_bop_test
20221126_142410|core.utils.default_args_setup@123: Rank of current process: 0. World size: 1
20221126_142410|core.utils.default_args_setup@124: Environment info:
----------------------  ----------------------------------------------------------------
sys.platform            linux
Python                  3.9.13 (main, May 23 2022, 22:01:06) [GCC 9.4.0]
numpy                   1.23.1
detectron2              0.6 @/usr/local/lib/python3.9/dist-packages/detectron2
Compiler                GCC 9.4
CUDA compiler           CUDA 11.2
detectron2 arch flags   6.1
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.12.0+cu116 @/usr/local/lib/python3.9/dist-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   Quadro P5000 (arch=6.1)
Driver version          510.73.05
CUDA_HOME               /usr/local/cuda
Pillow                  9.3.0
torchvision             0.13.0+cu116 @/usr/local/lib/python3.9/dist-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                  0.1.5.post20221122
iopath                  0.1.9
cv2                     4.6.0
----------------------  ----------------------------------------------------------------
PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

20221126_142410|core.utils.default_args_setup@126: Command line arguments: Namespace(config_file='/notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd.py', resume=False, eval_only=False, launcher='none', local_rank=0, fp16_allreduce=False, use_adasum=False, num_gpus=1, num_machines=1, machine_rank=0, dist_url='tcp://127.0.0.1:49152', opts=None, strategy=None)
20221126_142410|core.utils.default_args_setup@128: Contents of args.config_file=/notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd.py:
_base_ = ["../../_base_/gdrn_base.py"]

OUTPUT_DIR = "output/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd"
INPUT = dict(
    DZI_PAD_SCALE=1.5,
    TRUNCATE_FG=False,
    CHANGE_BG_PROB=0.5,
    COLOR_AUG_PROB=0.8,
    MIN_SIZE_TRAIN=960,
    MAX_SIZE_TRAIN=1280,
    MIN_SIZE_TEST=960,
    MAX_SIZE_TEST=1280,
    COLOR_AUG_TYPE="code",
    COLOR_AUG_CODE=(
        "Sequential(["
        # Sometimes(0.5, PerspectiveTransform(0.05)),
        # Sometimes(0.5, CropAndPad(percent=(-0.05, 0.1))),
        # Sometimes(0.5, Affine(scale=(1.0, 1.2))),
        "Sometimes(0.5, CoarseDropout( p=0.2, size_percent=0.05) ),"
        "Sometimes(0.4, GaussianBlur((0., 3.))),"
        "Sometimes(0.3, pillike.EnhanceSharpness(factor=(0., 50.))),"
        "Sometimes(0.3, pillike.EnhanceContrast(factor=(0.2, 50.))),"
        "Sometimes(0.5, pillike.EnhanceBrightness(factor=(0.1, 6.))),"
        "Sometimes(0.3, pillike.EnhanceColor(factor=(0., 20.))),"
        "Sometimes(0.5, Add((-25, 25), per_channel=0.3)),"
        "Sometimes(0.3, Invert(0.2, per_channel=True)),"
        "Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),"
        "Sometimes(0.5, Multiply((0.6, 1.4))),"
        "Sometimes(0.1, AdditiveGaussianNoise(scale=10, per_channel=True)),"
        "Sometimes(0.5, iaa.contrast.LinearContrast((0.5, 2.2), per_channel=0.3)),"
        "Sometimes(0.5, Grayscale(alpha=(0.0, 1.0))),"  # maybe remove for det
        "], random_order=True)"
        # cosy+aae
    ),
)

SOLVER = dict(
    IMS_PER_BATCH=48,
    TOTAL_EPOCHS=40,  # 25
    LR_SCHEDULER_NAME="flat_and_anneal",
    ANNEAL_METHOD="cosine",  # "cosine"
    ANNEAL_POINT=0.72,
    OPTIMIZER_CFG=dict(_delete_=True, type="Ranger", lr=8e-4, weight_decay=0.01),
    WEIGHT_DECAY=0.0,
    WARMUP_FACTOR=0.001,
    WARMUP_ITERS=1000,
)

DATASETS = dict(
    TRAIN=("itodd_train_pbr",),
    TEST=("itodd_bop_test",),
    DET_FILES_TEST=("datasets/BOP_DATASETS/itodd/test/test_bboxes/yolox_x_640_itodd_pbr_itodd_bop_test.json",),
    DET_TOPK_PER_OBJ=100,
    DET_THR=0.05,
)

DATALOADER = dict(
    # Number of data loading threads
    NUM_WORKERS=8,
    FILTER_VISIB_THR=0.3,
)

MODEL = dict(
    LOAD_DETS_TEST=True,
    PIXEL_MEAN=[0.0, 0.0, 0.0],
    PIXEL_STD=[255.0, 255.0, 255.0],
    BBOX_TYPE="AMODAL_CLIP",  # VISIB or AMODAL
    POSE_NET=dict(
        NAME="GDRN_double_mask",
        XYZ_ONLINE=True,
        NUM_CLASSES=28,
        BACKBONE=dict(
            FREEZE=False,
            PRETRAINED="timm",
            INIT_CFG=dict(
                type="timm/convnext_base",
                pretrained=True,
                in_chans=3,
                features_only=True,
                out_indices=(3,),
            ),
        ),
        ## geo head: Mask, XYZ, Region
        GEO_HEAD=dict(
            FREEZE=False,
            INIT_CFG=dict(
                type="TopDownDoubleMaskXyzRegionHead",
                in_dim=1024,  # this is num out channels of backbone conv feature
            ),
            NUM_REGIONS=64,
            XYZ_CLASS_AWARE=True,
            MASK_CLASS_AWARE=True,
            REGION_CLASS_AWARE=True,
        ),
        PNP_NET=dict(
            INIT_CFG=dict(norm="GN", act="gelu"),
            REGION_ATTENTION=True,
            WITH_2D_COORD=True,
            ROT_TYPE="allo_rot6d",
            TRANS_TYPE="centroid_z",
        ),
        LOSS_CFG=dict(
            # xyz loss ----------------------------
            XYZ_LOSS_TYPE="L1",  # L1 | CE_coor
            XYZ_LOSS_MASK_GT="visib",  # trunc | visib | obj
            XYZ_LW=1.0,
            # mask loss ---------------------------
            MASK_LOSS_TYPE="L1",  # L1 | BCE | CE
            MASK_LOSS_GT="trunc",  # trunc | visib | gt
            MASK_LW=1.0,
            # full mask loss ---------------------------
            FULL_MASK_LOSS_TYPE="L1",  # L1 | BCE | CE
            FULL_MASK_LW=1.0,
            # region loss -------------------------
            REGION_LOSS_TYPE="CE",  # CE
            REGION_LOSS_MASK_GT="visib",  # trunc | visib | obj
            REGION_LW=1.0,
            # pm loss --------------
            PM_LOSS_SYM=True,  # NOTE: sym loss
            PM_R_ONLY=True,  # only do R loss in PM
            PM_LW=1.0,
            # centroid loss -------
            CENTROID_LOSS_TYPE="L1",
            CENTROID_LW=1.0,
            # z loss -----------
            Z_LOSS_TYPE="L1",
            Z_LW=1.0,
        ),
    ),
)

VAL = dict(
    DATASET_NAME="itodd",
    SCRIPT_PATH="lib/pysixd/scripts/eval_pose_results_more.py",
    TARGETS_FILENAME="test_targets_bop19.json",
    ERROR_TYPES="mspd,mssd,vsd,ad,reS,teS",
    RENDERER_TYPE="cpp",  # cpp, python, egl
    SPLIT="test",
    SPLIT_TYPE="",
    N_TOP=-1,  # SISO: 1, VIVO: -1 (for LINEMOD, 1/-1 are the same)
    EVAL_CACHED=False,  # if the predicted poses have been saved
    SCORE_ONLY=False,  # if the errors have been calculated
    EVAL_PRINT_ONLY=False,  # if the scores/recalls have been saved
    EVAL_PRECISION=False,  # use precision or recall
    USE_BOP=True,  # whether to use bop toolkit
    SAVE_BOP_CSV_ONLY=True,
)


TEST = dict(EVAL_PERIOD=0, VIS=False, TEST_BBOX_TYPE="est")  # gt | est

20221126_142410|core.utils.default_args_setup@135: Running with full config:
Config (path: /notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd.py): {'OUTPUT_ROOT': 'output', 'OUTPUT_DIR': 'output/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd', 'EXP_NAME': '', 'DEBUG': False, 'SEED': -1, 'CUDNN_BENCHMARK': True, 'IM_BACKEND': 'cv2', 'VIS_PERIOD': 0, 'INPUT': {'FORMAT': 'BGR', 'MIN_SIZE_TRAIN': 960, 'MAX_SIZE_TRAIN': 1280, 'MIN_SIZE_TRAIN_SAMPLING': 'choice', 'MIN_SIZE_TEST': 960, 'MAX_SIZE_TEST': 1280, 'WITH_DEPTH': False, 'BP_DEPTH': False, 'AUG_DEPTH': False, 'NORM_DEPTH': False, 'DROP_DEPTH_RATIO': 0.2, 'DROP_DEPTH_PROB': 0.5, 'ADD_NOISE_DEPTH_LEVEL': 0.01, 'ADD_NOISE_DEPTH_PROB': 0.9, 'COLOR_AUG_PROB': 0.8, 'COLOR_AUG_TYPE': 'code', 'COLOR_AUG_CODE': 'Sequential([Sometimes(0.5, CoarseDropout( p=0.2, size_percent=0.05) ),Sometimes(0.4, GaussianBlur((0., 3.))),Sometimes(0.3, pillike.EnhanceSharpness(factor=(0., 50.))),Sometimes(0.3, pillike.EnhanceContrast(factor=(0.2, 50.))),Sometimes(0.5, pillike.EnhanceBrightness(factor=(0.1, 6.))),Sometimes(0.3, pillike.EnhanceColor(factor=(0., 20.))),Sometimes(0.5, Add((-25, 25), per_channel=0.3)),Sometimes(0.3, Invert(0.2, per_channel=True)),Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),Sometimes(0.5, Multiply((0.6, 1.4))),Sometimes(0.1, AdditiveGaussianNoise(scale=10, per_channel=True)),Sometimes(0.5, iaa.contrast.LinearContrast((0.5, 2.2), per_channel=0.3)),Sometimes(0.5, Grayscale(alpha=(0.0, 1.0))),], random_order=True)', 'COLOR_AUG_SYN_ONLY': False, 'RANDOM_FLIP': 'none', 'WITH_BG_DEPTH': False, 'BG_DEPTH_FACTOR': 10000.0, 'BG_TYPE': 'VOC_table', 'BG_IMGS_ROOT': 'datasets/VOCdevkit/VOC2012/', 'NUM_BG_IMGS': 10000, 'CHANGE_BG_PROB': 0.5, 'TRUNCATE_FG': False, 'BG_KEEP_ASPECT_RATIO': True, 'DZI_TYPE': 'uniform', 'DZI_PAD_SCALE': 1.5, 'DZI_SCALE_RATIO': 0.25, 'DZI_SHIFT_RATIO': 0.25, 'SMOOTH_XYZ': False}, 'DATASETS': {'TRAIN': ('itodd_train_pbr',), 'TRAIN2': (), 'TRAIN2_RATIO': 0.0, 'DATA_LEN_WITH_TRAIN2': True, 'PROPOSAL_FILES_TRAIN': (), 'PRECOMPUTED_PROPOSAL_TOPK_TRAIN': 2000, 'TEST': ('itodd_bop_test',), 'PROPOSAL_FILES_TEST': (), 'PRECOMPUTED_PROPOSAL_TOPK_TEST': 1000, 'DET_FILES_TRAIN': (), 'DET_TOPK_PER_OBJ_TRAIN': 1, 'DET_TOPK_PER_IM_TRAIN': 30, 'DET_THR_TRAIN': 0.0, 'DET_FILES_TEST': ('datasets/BOP_DATASETS/itodd/test/test_bboxes/yolox_x_640_itodd_pbr_itodd_bop_test.json',), 'DET_TOPK_PER_OBJ': 100, 'DET_TOPK_PER_IM': 30, 'DET_THR': 0.05, 'INIT_POSE_FILES_TEST': (), 'INIT_POSE_TOPK_PER_OBJ': 1, 'INIT_POSE_TOPK_PER_IM': 30, 'INIT_POSE_THR': 0.0, 'SYM_OBJS': ['bowl', 'cup', 'eggbox', 'glue'], 'EVAL_SCENE_IDS': None}, 'DATALOADER': {'NUM_WORKERS': 8, 'PERSISTENT_WORKERS': False, 'MAX_OBJS_TRAIN': 120, 'ASPECT_RATIO_GROUPING': False, 'SAMPLER_TRAIN': 'TrainingSampler', 'REPEAT_THRESHOLD': 0.0, 'FILTER_EMPTY_ANNOTATIONS': True, 'FILTER_EMPTY_DETS': True, 'FILTER_VISIB_THR': 0.3, 'REMOVE_ANNO_KEYS': []}, 'SOLVER': {'IMS_PER_BATCH': 48, 'REFERENCE_BS': 48, 'TOTAL_EPOCHS': 40, 'OPTIMIZER_CFG': {'type': 'Ranger', 'lr': 0.0008, 'weight_decay': 0.01}, 'GAMMA': 0.1, 'BIAS_LR_FACTOR': 1.0, 'LR_SCHEDULER_NAME': 'flat_and_anneal', 'WARMUP_METHOD': 'linear', 'WARMUP_FACTOR': 0.001, 'WARMUP_ITERS': 1000, 'ANNEAL_METHOD': 'cosine', 'ANNEAL_POINT': 0.72, 'POLY_POWER': 0.9, 'REL_STEPS': (0.5, 0.75), 'CHECKPOINT_PERIOD': 5, 'CHECKPOINT_BY_EPOCH': True, 'MAX_TO_KEEP': 5, 'CLIP_GRADIENTS': {'ENABLED': False, 'CLIP_TYPE': 'value', 'CLIP_VALUE': 1.0, 'NORM_TYPE': 2.0}, 'SET_NAN_GRAD_TO_ZERO': False, 'AMP': {'ENABLED': 
False}, 'WEIGHT_DECAY': 0.01, 'OPTIMIZER_NAME': 'Ranger', 'BASE_LR': 0.0008, 'MOMENTUM': 0.9}, 'TRAIN': {'PRINT_FREQ': 100, 'VERBOSE': False, 'VIS': False, 'VIS_IMG': False}, 'VAL': {'DATASET_NAME': 'itodd', 'SCRIPT_PATH': 'lib/pysixd/scripts/eval_pose_results_more.py', 'RESULTS_PATH': '', 'TARGETS_FILENAME': 'test_targets_bop19.json', 'ERROR_TYPES': 'mspd,mssd,vsd,ad,reS,teS', 'RENDERER_TYPE': 'cpp', 'SPLIT': 'test', 'SPLIT_TYPE': '', 'N_TOP': -1, 'EVAL_CACHED': False, 'SCORE_ONLY': False, 'EVAL_PRINT_ONLY': False, 'EVAL_PRECISION': False, 'USE_BOP': True, 'SAVE_BOP_CSV_ONLY': True}, 'TEST': {'EVAL_PERIOD': 0, 'VIS': False, 'TEST_BBOX_TYPE': 'est', 'PRECISE_BN': {'ENABLED': False, 'NUM_ITER': 200}, 'AMP_TEST': False, 'COLOR_AUG': False, 'USE_PNP': False, 'SAVE_RESULTS_ONLY': False, 'PNP_TYPE': 'ransac_pnp', 'USE_DEPTH_REFINE': False, 'DEPTH_REFINE_ITER': 2, 'DEPTH_REFINE_THRESHOLD': 0.8, 'USE_COOR_Z_REFINE': False}, 'DIST_PARAMS': {'backend': 'nccl'}, 'MODEL': {'DEVICE': 'cuda', 'WEIGHTS': '', 'PIXEL_MEAN': [0.0, 0.0, 0.0], 'PIXEL_STD': [255.0, 255.0, 255.0], 'LOAD_DETS_TEST': True, 'BBOX_CROP_REAL': False, 'BBOX_CROP_SYN': False, 'BBOX_TYPE': 'AMODAL_CLIP', 'EMA': {'ENABLED': False, 'INIT_CFG': {'decay': 0.9999, 'updates': 0}}, 'POSE_NET': {'NAME': 'GDRN_double_mask', 'XYZ_ONLINE': True, 'XYZ_BP': True, 'NUM_CLASSES': 28, 'USE_MTL': False, 'INPUT_RES': 256, 'OUTPUT_RES': 64, 'BACKBONE': {'FREEZE': False, 'PRETRAINED': 'timm', 'INIT_CFG': {'type': 'timm/convnext_base', 'in_chans': 3, 'features_only': True, 'pretrained': True, 'out_indices': (3,)}}, 'DEPTH_BACKBONE': {'ENABLED': False, 'FREEZE': False, 'PRETRAINED': 'timm', 'INIT_CFG': {'type': 'timm/resnet18', 'in_chans': 1, 'features_only': True, 'pretrained': True, 'out_indices': (4,)}}, 'FUSE_RGBD_TYPE': 'cat', 'NECK': {'ENABLED': False, 'FREEZE': False, 'LR_MULT': 1.0, 'INIT_CFG': {'type': 'FPN', 'in_channels': [256, 512, 1024, 2048], 'out_channels': 256, 'num_outs': 4}}, 'GEO_HEAD': {'FREEZE': False, 'LR_MULT': 1.0, 'INIT_CFG': {'type': 'TopDownDoubleMaskXyzRegionHead', 'in_dim': 1024, 'up_types': ('deconv', 'bilinear', 'bilinear'), 'deconv_kernel_size': 3, 'num_conv_per_block': 2, 'feat_dim': 256, 'feat_kernel_size': 3, 'norm': 'GN', 'num_gn_groups': 32, 'act': 'GELU', 'out_kernel_size': 1, 'out_layer_shared': True}, 'XYZ_BIN': 64, 'XYZ_CLASS_AWARE': True, 'MASK_CLASS_AWARE': True, 'REGION_CLASS_AWARE': True, 'MASK_THR_TEST': 0.5, 'NUM_REGIONS': 64}, 'PNP_NET': {'FREEZE': False, 'LR_MULT': 1.0, 'INIT_CFG': {'type': 'ConvPnPNet', 'norm': 'GN', 'act': 'gelu', 'num_gn_groups': 32, 'drop_prob': 0.0, 'denormalize_by_extent': True}, 'WITH_2D_COORD': True, 'COORD_2D_TYPE': 'abs', 'REGION_ATTENTION': True, 'MASK_ATTENTION': 'none', 'ROT_TYPE': 'allo_rot6d', 'TRANS_TYPE': 'centroid_z', 'Z_TYPE': 'REL'}, 'LOSS_CFG': {'XYZ_LOSS_TYPE': 'L1', 'XYZ_LOSS_MASK_GT': 'visib', 'XYZ_LW': 1.0, 'FULL_MASK_LOSS_TYPE': 'L1', 'FULL_MASK_LW': 1.0, 'MASK_LOSS_TYPE': 'L1', 'MASK_LOSS_GT': 'trunc', 'MASK_LW': 1.0, 'REGION_LOSS_TYPE': 'CE', 'REGION_LOSS_MASK_GT': 'visib', 'REGION_LW': 1.0, 'NUM_PM_POINTS': 3000, 'PM_LOSS_TYPE': 'L1', 'PM_SMOOTH_L1_BETA': 1.0, 'PM_LOSS_SYM': True, 'PM_NORM_BY_EXTENT': False, 'PM_R_ONLY': True, 'PM_DISENTANGLE_T': False, 'PM_DISENTANGLE_Z': False, 'PM_T_USE_POINTS': True, 'PM_LW': 1.0, 'ROT_LOSS_TYPE': 'angular', 'ROT_LW': 0.0, 'CENTROID_LOSS_TYPE': 'L1', 'CENTROID_LW': 1.0, 'Z_LOSS_TYPE': 'L1', 'Z_LW': 1.0, 'TRANS_LOSS_TYPE': 'L1', 'TRANS_LOSS_DISENTANGLE': True, 'TRANS_LW': 0.0, 'BIND_LOSS_TYPE': 'L1', 'BIND_LW': 0.0}}, 
'KEYPOINT_ON': False, 'LOAD_PROPOSALS': False}, 'EXP_ID': 'convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd', 'RESUME': False}
Global seed set to 11295325
20221126_142411|core.utils.default_args_setup@144: Full config saved to output/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_itodd.py
20221126_142411|d2.utils.env@41: Using a generated random seed 11295325
20221126_142411|core.utils.default_args_setup@162: Used mmcv backend: cv2
20221126_142411|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so.1 <CDLL 'libEGL.so.1', handle 8630300 at 0x7f537f987820>
WARN
20221126_142411|ERR|__main__@233: An error has been caught in function '<module>', process 'MainProcess' (5946), thread 'MainThread' (139998642771776):
Traceback (most recent call last):

> File "/notebooks/gdrnpp_bop2022/./core/gdrn_modeling/main_gdrn.py", line 233, in <module>
    main(args)
    │    └ Namespace(config_file='/notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClip...
    └ <function main at 0x7f530caa1b80>

  File "/notebooks/gdrnpp_bop2022/./core/gdrn_modeling/main_gdrn.py", line 199, in main
    Lite(
    └ <class '__main__.Lite'>

  File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/lite/lite.py", line 408, in _run_impl
    return run_method(*args, **kwargs)
           │           │       └ {}
           │           └ (Namespace(config_file='/notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalCli...
           └ functools.partial(<bound method LightningLite._run_with_strategy_setup of <__main__.Lite object at 0x7f53f92ee310>>, <bound m...

  File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/lite/lite.py", line 413, in _run_with_strategy_setup
    return run_method(*args, **kwargs)
           │           │       └ {}
           │           └ (Namespace(config_file='/notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalCli...
           └ <bound method Lite.run of <__main__.Lite object at 0x7f53f92ee310>>

  File "/notebooks/gdrnpp_bop2022/./core/gdrn_modeling/main_gdrn.py", line 155, in run
    renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
               │            │    │                   │                       └ 0
               │            │    │                   └ ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '...
               │            │    └ <module 'ref.itodd' from '/notebooks/gdrnpp_bop2022/core/gdrn_modeling/../../ref/itodd.py'>
               │            └ Config (path: /notebooks/gdrnpp_bop2022/configs/gdrn/itodd_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_class...
               └ <function get_renderer at 0x7f530cfde820>

  File "/notebooks/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 280, in get_renderer
    ren = EGLRenderer(
          └ <class 'lib.egl_renderer.egl_renderer_v3.EGLRenderer'>

  File "/notebooks/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
    │               │                       └ 0
    │               └ <class 'lib.egl_renderer.glutils.egl_offscreen_context.OffscreenContext'>
    └ <lib.egl_renderer.egl_renderer_v3.EGLRenderer object at 0x7f537f987100>

  File "/notebooks/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
    │    └ <function OffscreenContext.init_context at 0x7f530e98e940>
    └ <lib.egl_renderer.glutils.egl_offscreen_context.OffscreenContext object at 0x7f537f987370>

  File "/notebooks/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 227, in init_context
    raise RuntimeError("Bindless Textures not supported")

RuntimeError: Bindless Textures not supported

I was able to install all the required packages in scripts/install_deps.sh and also compiled the egl_renderer via sh ./lib/egl_renderer/compile_cpp_egl_renderer.sh. After compiling the egl_renderer, the query_devices utility identifies the GPU:

root@n156egjsrm:/notebooks/gdrnpp_bop2022/lib/egl_renderer# ./build/query_devices
query devices:
num devices: 1

However, the example program egl_renderer_v3 fails with the same error as the training program:

root@n156egjsrm:/notebooks/gdrnpp_bop2022# python -m lib.egl_renderer.egl_renderer_v3
/usr/local/lib/python3.9/dist-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
WARN
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/notebooks/gdrnpp_bop2022/lib/egl_renderer/egl_renderer_v3.py", line 1415, in <module>
    renderer = EGLRenderer(
  File "/notebooks/gdrnpp_bop2022/lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/notebooks/gdrnpp_bop2022/lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/notebooks/gdrnpp_bop2022/lib/egl_renderer/glutils/egl_offscreen_context.py", line 227, in init_context
    raise RuntimeError("Bindless Textures not supported")
RuntimeError: Bindless Textures not supported

The program is run in a Google Colab-type environment.

Do you have any idea how to fix this issue?
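For context, the exception comes from lib/egl_renderer/glutils/egl_offscreen_context.py, which raises once the freshly created GL context does not advertise the GL_ARB_bindless_texture extension; virtualized GPUs in Colab-type containers often lack it even when EGL device enumeration succeeds (which is why query_devices still passes). Below is a minimal standalone probe, a sketch of my own that assumes PyOpenGL's EGL bindings (installed by scripts/install_deps.sh); it requests a legacy OpenGL context, where glGetString(GL_EXTENSIONS) is valid:

    # Standalone probe (sketch, not the repo's code): create a surfaceless
    # legacy OpenGL context via EGL and check for GL_ARB_bindless_texture.
    import ctypes
    from OpenGL import EGL, arrays
    from OpenGL.GL import GL_EXTENSIONS, glGetString

    dpy = EGL.eglGetDisplay(EGL.EGL_DEFAULT_DISPLAY)
    major, minor = EGL.EGLint(), EGL.EGLint()
    assert EGL.eglInitialize(dpy, ctypes.pointer(major), ctypes.pointer(minor))

    cfg_attribs = arrays.GLintArray.asArray([
        EGL.EGL_SURFACE_TYPE, EGL.EGL_PBUFFER_BIT,    # offscreen pbuffer
        EGL.EGL_RENDERABLE_TYPE, EGL.EGL_OPENGL_BIT,  # desktop GL, not GLES
        EGL.EGL_NONE,
    ])
    configs = (EGL.EGLConfig * 1)()
    num_cfg = EGL.EGLint()
    assert EGL.eglChooseConfig(dpy, cfg_attribs, configs, 1, ctypes.pointer(num_cfg))

    EGL.eglBindAPI(EGL.EGL_OPENGL_API)
    ctx = EGL.eglCreateContext(dpy, configs[0], EGL.EGL_NO_CONTEXT,
                               arrays.GLintArray.asArray([EGL.EGL_NONE]))
    EGL.eglMakeCurrent(dpy, EGL.EGL_NO_SURFACE, EGL.EGL_NO_SURFACE, ctx)

    exts = (glGetString(GL_EXTENSIONS) or b"").split()
    print("GL_ARB_bindless_texture:", b"GL_ARB_bindless_texture" in exts)

If this prints False, the driver simply does not expose bindless textures, and the EGL renderer cannot run on that GPU; the realistic options are a different GPU/driver or a renderer that does not require the extension.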

Question about training parameters

Hi Dr. Wang and Dr. Liu,

What do the three training parameters XYZ_CLASS_AWARE=True, MASK_CLASS_AWARE=True, REGION_CLASS_AWARE=True in the config file mean? My understanding is that they control whether the class each pixel belongs to must be estimated in the xyz map, mask, and region outputs. Is that understanding correct? If so, I have a follow-up question:

In double-mask GDRNPP, the input to ConvNext is the crop produced by the object detector, resized to 256, so the image fed to ConvNext already carries the object's class information (this can also be seen in the test_bboxes files you provide). In principle, then, there should be no need to estimate a class for every pixel; separating foreground from background should suffice. Yet the config still sets XYZ_CLASS_AWARE=True, MASK_CLASS_AWARE=True, REGION_CLASS_AWARE=True. Why is that?

The config file in question is ./configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py
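For readers with the same question: a "class-aware" head emits one set of output channels per class, and the class id provided by the detector selects the slice that is trained and evaluated; the alternative is a single class-agnostic output shared by all objects. A toy illustration of that indexing (my own sketch, not the repo's code):

    # Class-aware head output: one mask per class, [bs, num_classes, H, W];
    # the detector's class id picks the channel used for the loss/prediction.
    import torch

    bs, num_classes, out_res = 2, 28, 64
    mask_logits = torch.randn(bs, num_classes, out_res, out_res)  # class-aware output
    cls_ids = torch.tensor([3, 17])          # per-RoI class ids from the detector
    batch_inds = torch.arange(bs)
    per_roi_mask = mask_logits[batch_inds, cls_ids]  # [bs, out_res, out_res]

One common motivation (my reading, not the authors' statement) is that per-class output channels let a single shared network specialize its geometry predictions for each object without interference between classes, even though each crop contains only one known object.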

Train gdrn lmo_pbr with WITH_DEPTH=True, but got error

Hi,

I tried to train the GDRN model with the lmo_pbr config file, modifying the INPUT dict with the WITH_DEPTH=True param:

OUTPUT_DIR = "output/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo"
INPUT = dict(
    WITH_DEPTH=True,
    DZI_PAD_SCALE=1.5,
    TRUNCATE_FG=False,
    CHANGE_BG_PROB=0.5,
    COLOR_AUG_PROB=0.8,
    COLOR_AUG_TYPE="code",
    COLOR_AUG_CODE=(
        "Sequential(["
        # Sometimes(0.5, PerspectiveTransform(0.05)),
        # Sometimes(0.5, CropAndPad(percent=(-0.05, 0.1))),
        # Sometimes(0.5, Affine(scale=(1.0, 1.2))),
        "Sometimes(0.5, CoarseDropout( p=0.2, size_percent=0.05) ),"
        "Sometimes(0.4, GaussianBlur((0., 3.))),"
        "Sometimes(0.3, pillike.EnhanceSharpness(factor=(0., 50.))),"
        "Sometimes(0.3, pillike.EnhanceContrast(factor=(0.2, 50.))),"
        "Sometimes(0.5, pillike.EnhanceBrightness(factor=(0.1, 6.))),"
        "Sometimes(0.3, pillike.EnhanceColor(factor=(0., 20.))),"
        "Sometimes(0.5, Add((-25, 25), per_channel=0.3)),"
        "Sometimes(0.3, Invert(0.2, per_channel=True)),"
        "Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),"
        "Sometimes(0.5, Multiply((0.6, 1.4))),"
        "Sometimes(0.1, AdditiveGaussianNoise(scale=10, per_channel=True)),"
        "Sometimes(0.5, iaa.contrast.LinearContrast((0.5, 2.2), per_channel=0.3)),"
        "Sometimes(0.5, Grayscale(alpha=(0.0, 1.0))),"  # maybe remove for det
        "], random_order=True)"
        # cosy+aae
    ),
)

SOLVER = dict(
    IMS_PER_BATCH=12,
    TOTAL_EPOCHS=50,  # 30
    LR_SCHEDULER_NAME="flat_and_anneal",
    ANNEAL_METHOD="cosine",  # "cosine"
    ANNEAL_POINT=0.72,
    OPTIMIZER_CFG=dict(_delete_=True, type="Ranger", lr=8e-4, weight_decay=0.01),
    WEIGHT_DECAY=0.0,
    WARMUP_FACTOR=0.001,
    WARMUP_ITERS=1000,
)

DATASETS = dict(
    TRAIN=("lmo_pbr_train",),
    TEST=("lmo_bop_test",),
    DET_FILES_TEST=("datasets/BOP_DATASETS/lmo/test/test_bboxes/yolox_x_640_lmo_pbr_lmo_bop_test.json",),
)

DATALOADER = dict(
    # Number of data loading threads
    NUM_WORKERS=8,
    FILTER_VISIB_THR=0.3,
)

MODEL = dict(
    LOAD_DETS_TEST=True,
    PIXEL_MEAN=[0.0, 0.0, 0.0],
    PIXEL_STD=[255.0, 255.0, 255.0],
    BBOX_TYPE="AMODAL_CLIP",  # VISIB or AMODAL
    POSE_NET=dict(
        NAME="GDRN_double_mask",
        XYZ_ONLINE=True,
        NUM_CLASSES=2,
        BACKBONE=dict(
            FREEZE=False,
            PRETRAINED="timm",
            INIT_CFG=dict(
                type="timm/convnext_base",
                pretrained=True,
                in_chans=3,
                features_only=True,
                out_indices=(3,),
            ),
        ),
        ## geo head: Mask, XYZ, Region
        GEO_HEAD=dict(
            FREEZE=False,
            INIT_CFG=dict(
                type="TopDownDoubleMaskXyzRegionHead",
                in_dim=1024,  # this is num out channels of backbone conv feature
            ),
            NUM_REGIONS=64,
            XYZ_CLASS_AWARE=True,
            MASK_CLASS_AWARE=True,
            REGION_CLASS_AWARE=True,
        ),
        PNP_NET=dict(
            INIT_CFG=dict(norm="GN", act="gelu"),
            REGION_ATTENTION=True,
            WITH_2D_COORD=True,
            ROT_TYPE="allo_rot6d",
            TRANS_TYPE="centroid_z",
        ),
        LOSS_CFG=dict(
            # xyz loss ----------------------------
            XYZ_LOSS_TYPE="L1",  # L1 | CE_coor
            XYZ_LOSS_MASK_GT="visib",  # trunc | visib | obj
            XYZ_LW=1.0,
            # mask loss ---------------------------
            MASK_LOSS_TYPE="L1",  # L1 | BCE | CE
            MASK_LOSS_GT="trunc",  # trunc | visib | gt
            MASK_LW=1.0,
            # full mask loss ---------------------------
            FULL_MASK_LOSS_TYPE="L1",  # L1 | BCE | CE
            FULL_MASK_LW=1.0,
            # region loss -------------------------
            REGION_LOSS_TYPE="CE",  # CE
            REGION_LOSS_MASK_GT="visib",  # trunc | visib | obj
            REGION_LW=1.0,
            # pm loss --------------
            PM_LOSS_SYM=True,  # NOTE: sym loss
            PM_R_ONLY=True,  # only do R loss in PM
            PM_LW=1.0,
            # centroid loss -------
            CENTROID_LOSS_TYPE="L1",
            CENTROID_LW=1.0,
            # z loss -----------
            Z_LOSS_TYPE="L1",
            Z_LW=1.0,
        ),
    ),
)

VAL = dict(
    DATASET_NAME="lmo",
    SCRIPT_PATH="lib/pysixd/scripts/eval_pose_results_more.py",
    TARGETS_FILENAME="test_targets_bop19.json",
    ERROR_TYPES="vsd",  # "mspd,mssd,vsd,ad,reS,teS"
    RENDERER_TYPE="python",  # cpp, python, egl
    SPLIT="test",
    SPLIT_TYPE="",
    N_TOP=1,  # SISO: 1, VIVO: -1 (for LINEMOD, 1/-1 are the same)
    EVAL_CACHED=False,  # if the predicted poses have been saved
    SCORE_ONLY=False,  # if the errors have been calculated
    EVAL_PRINT_ONLY=False,  # if the scores/recalls have been saved
    EVAL_PRECISION=False,  # use precision or recall
    USE_BOP=True,  # whether to use bop toolkit
)

TEST = dict(
    EVAL_PERIOD=0,
    VIS=True,
    USE_PNP=True,
    USE_DEPTH_REFINE=True,
    TEST_BBOX_TYPE="est",  # gt | est
)

I started training, but I got the following error:

File "/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/models/GDRN_double_mask.py", line 102, in forward
conv_feat = self.backbone(x) # [bs, c, 8, 8]
│ └ tensor([[[[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
│ [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1....
└ GDRN_DoubleMask(
(backbone): FeatureListNet(
(stem_0): Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
(stem_1): ...

File "/home/itqs/anaconda3/envs/bop/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
│ │ └ {}
│ └ (tensor([[[[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
│ [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1...
└ <bound method FeatureListNet.forward of FeatureListNet(
(stem_0): Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
(stem...
File "/home/itqs/anaconda3/envs/bop/lib/python3.8/site-packages/timm/models/features.py", line 232, in forward
return list(self._collect(x).values())
│ │ └ tensor([[[[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
│ │ [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1....
│ └ <function FeatureDictNet._collect at 0x7ff6633fdca0>
└ FeatureListNet(
(stem_0): Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
(stem_1): LayerNorm2d((128,), eps=1e-06, elem...
File "/home/itqs/anaconda3/envs/bop/lib/python3.8/site-packages/timm/models/features.py", line 203, in _collect
x = module(x)
│ └ tensor([[[[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
│ [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1....
└ Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
File "/home/itqs/anaconda3/envs/bop/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
│ │ └ {}
│ └ (tensor([[[[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
│ [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1...
└ <bound method Conv2d.forward of Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))>
File "/home/itqs/anaconda3/envs/bop/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
│ │ │ │ └ Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
│ │ │ └ Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
│ │ └ tensor([[[[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
│ │ [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1....
│ └ <function Conv2d._conv_forward at 0x7ff68116e430>
└ Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
File "/home/itqs/anaconda3/envs/bop/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
│ │ │ │ │ │ └ (4, 4)
│ │ │ │ │ └ Conv2d(3, 128, kernel_size=(4, 4), stride=(4, 4))
│ │ │ │ └ Parameter containing:
│ │ │ │ tensor([-2.7726e-02, -5.4137e-02, -5.3593e-02, -5.1021e-02, -2.3165e-02,
│ │ │ │ -5.1199e-02, -3.0773e+...
│ │ │ └ Parameter containing:
│ │ │ tensor([[[[ 9.2363e-02, -7.1596e-02, 3.0232e-02, -1.5385e-03],
│ │ │ [ 1.9584e-02, -9.8770e-02, -3...
│ │ └ tensor([[[[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
│ │ [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1....
│ └ <built-in method conv2d of type object at 0x7ff6f1d1aec0>
└ <module 'torch.nn.functional' from '/home/itqs/anaconda3/envs/bop/lib/python3.8/site-packages/torch/nn/functional.py'>

RuntimeError: Given groups=1, weight of size [128, 3, 4, 4], expected input[8, 4, 256, 256] to have 3 channels, but got 4 channels instead

What should I change to be able to train with depth images?
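For what it's worth, the traceback says the ConvNext RGB backbone (in_chans=3) received a 4-channel tensor, i.e. the depth map was concatenated onto the RGB crop while no branch was configured to consume it. The itodd config dump earlier on this page shows a dedicated DEPTH_BACKBONE block (in_chans=1) plus a FUSE_RGBD_TYPE switch; a sketch of enabling it is below. Whether this alone suffices for depth training is my assumption, not something the authors have confirmed:

    MODEL = dict(
        POSE_NET=dict(
            # hypothetical fix: give depth its own 1-channel backbone instead of
            # concatenating it onto the 3-channel RGB input (field names taken
            # from the itodd config dump above)
            DEPTH_BACKBONE=dict(
                ENABLED=True,
                PRETRAINED="timm",
                INIT_CFG=dict(
                    type="timm/resnet18",
                    in_chans=1,  # depth is a single channel
                    features_only=True,
                    pretrained=True,
                    out_indices=(4,),
                ),
            ),
            FUSE_RGBD_TYPE="cat",  # how RGB and depth features are fused
        ),
    )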

Using a custom dataset in the BOP format

I am trying to train the network using a custom dataset. I've read I need to have it organized in the BOP format, which I've (almost) successfully done. My data is in .h5 format, and I was able to extract it into a similar structure.

My only problem is the file scene_gt_info.json, which contains info like bbox_obj and bbox_visib that, according to the website, needs depth information to calculate the visible parts/pixels, etc. I wanted to train the RGB-only GDRNet, since I generated only RGB data. It seems GDRNet itself does not strictly need the depth info, but then how can I generate a file similar to scene_gt_info.json to feed to the network?
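In case it helps: bbox_obj and bbox_visib can be computed from rendered masks rather than captured depth. The BOP toolkit's calc_gt_info.py derives them by rendering each model at its GT pose; for purely synthetic RGB data, the scene depth it uses for visibility can itself be rendered. A minimal sketch of the bbox step, with a helper name of my own (not the toolkit's API):

    import numpy as np

    def mask_to_bbox(mask: np.ndarray):
        """[x, y, width, height] in the BOP convention for a binary mask."""
        ys, xs = np.nonzero(mask)
        if xs.size == 0:
            return [-1, -1, -1, -1]  # convention for fully invisible objects
        return [int(xs.min()), int(ys.min()),
                int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1)]

    # bbox_obj from the full (amodal) rendered mask,
    # bbox_visib from the mask clipped by occluders and image borders:
    # info = {"bbox_obj": mask_to_bbox(full_mask),
    #         "bbox_visib": mask_to_bbox(visib_mask), ...}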

403 forbidden when downloading ycbv dataset

wget http://ptak.felk.cvut.cz/6DB/public/bop_datasets/ycbv_models.zip
--2022-12-07 17:34:55-- http://ptak.felk.cvut.cz/6DB/public/bop_datasets/ycbv_models.zip
Resolving proxy.cse.cuhk.edu.hk (proxy.cse.cuhk.edu.hk)... 137.189.90.241, 137.189.90.217, 137.189.88.229
Connecting to proxy.cse.cuhk.edu.hk (proxy.cse.cuhk.edu.hk)|137.189.90.241|:8000... connected.
Proxy request sent, awaiting response... 403 Forbidden
2022-12-07 17:34:55 ERROR 403: Forbidden.
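Note that the 403 here is returned by an institutional proxy (proxy.cse.cuhk.edu.hk), not necessarily by the BOP server itself. Two things worth trying are bypassing the proxy and fetching over HTTPS; a quick sketch, where the HTTPS URL pattern is an assumption based on the BOP challenge download page and not verified here:

    # Sketch: drop proxy env vars for this fetch and try the HTTPS endpoint.
    import os
    import urllib.request

    for var in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        os.environ.pop(var, None)  # bypass the institutional proxy

    url = "https://bop.felk.cvut.cz/media/data/bop_datasets/ycbv_models.zip"
    urllib.request.urlretrieve(url, "ycbv_models.zip")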

pose_refine environment

Hi! The environment.yaml file does not seem to be doing the job; I'm getting a lot of errors. Some packages in it don't specify versions, which is causing the issues. I've tried various versions of the packages but could not find the right combination. Could you share a more detailed package list than the yaml file? It would also be great if you could share the anaconda environment using conda-pack.
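On the conda-pack point, for anyone who does get a working environment: the export can be driven from Python as well as the CLI. A sketch, assuming the working env is named pose_refine and conda-pack is installed:

    import conda_pack

    # Pack the (assumed) working env into a relocatable archive to share.
    conda_pack.pack(name="pose_refine", output="pose_refine.tar.gz")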

ModuleNotFoundError: No module named 'bop_renderer'

Hi, I have a problem when evaluating the pretrained model on the T-LESS dataset.
Is there any way to fix it?

import bop_renderer
ModuleNotFoundError: No module named 'bop_renderer'
Traceback (most recent call last):
  File "/home/lab/Desktop/gdrnpp_bop2022-main/lib/pysixd/scripts/eval_pose_results_more.py", line 301, in <module>
    raise RuntimeError("Calculation of pose errors failed.")
RuntimeError: Calculation of pose errors failed.
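For reference, VAL.RENDERER_TYPE="cpp" is what imports the compiled bop_renderer module during evaluation. The lmo config pasted in the depth-training issue above lists the accepted values ("cpp, python, egl"), so if building bop_renderer is not an option, switching the eval renderer should sidestep the import, presumably at some speed cost (that trade-off is my assumption):

    # In the eval config (values per the "cpp, python, egl" comment above):
    VAL = dict(
        RENDERER_TYPE="python",  # "cpp" requires the compiled bop_renderer on PYTHONPATH
    )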
