yvanyin / metric3d

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."

Home Page: https://jugghm.github.io/Metric3Dv2/

License: Creative Commons Zero v1.0 Universal

Python 99.90% Shell 0.10%
3d-reconstruction 3d-scenes monocular-depth monocular-depth-estimation depth depth-map metric-depth-estimation single-image-depth-prediction zero-shot-transfer zero-shot

metric3d's Introduction

🚀 Metric3D Project 🚀

Official PyTorch implementation of Metric3Dv1 and Metric3Dv2:

[1] Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

[2] Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation


🏆 Champion of the CVPR 2023 Monocular Depth Estimation Challenge

News

  • [2024/4/25] Weights for ViT-giant2 model released!
  • [2024/4/11] Training codes are released!
  • [2024/3/18] HuggingFace 🤗 GPU version updated!
  • [2024/3/18] Project page released!
  • [2024/3/18] Metric3D V2 models released, supporting metric depth and surface normal now!
  • [2023/8/10] Inference codes, pre-trained weights, and demo released.
  • [2023/7] Metric3D accepted by ICCV 2023!
  • [2023/4] Champion of the 2nd Monocular Depth Estimation Challenge at CVPR 2023

🌼 Abstract

Metric3D is a strong and robust geometry foundation model for high-quality and zero-shot metric depth and surface normal estimation from a single image. It excels at solving in-the-wild scene reconstruction. It can directly help you measure the size of structures from a single image. Now it achieves SOTA performance on over 10 depth and normal benchmarks.


📝 Benchmarks

Metric Depth

Our models rank 1st on the KITTI and NYU benchmarks.

Method | Backbone | KITTI δ1 ↑ | KITTI δ2 ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU δ1 ↑ | NYU δ2 ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
ZoeDepth | ViT-Large | 0.971 | 0.995 | 0.053 | 2.281 | 0.082 | 0.953 | 0.995 | 0.077 | 0.277 | 0.033
ZeroDepth | ResNet-18 | 0.968 | 0.996 | 0.057 | 2.087 | 0.083 | 0.954 | 0.995 | 0.074 | 0.269 | 0.103
IEBins | SwinT-Large | 0.978 | 0.998 | 0.050 | 2.011 | 0.075 | 0.936 | 0.992 | 0.087 | 0.314 | 0.031
DepthAnything | ViT-Large | 0.982 | 0.998 | 0.046 | 1.985 | 0.069 | 0.984 | 0.998 | 0.056 | 0.206 | 0.024
Ours | ViT-Large | 0.985 | 0.998 | 0.999 | 1.985 | 0.064 | 0.989 | 0.998 | 0.047 | 0.183 | 0.020
Ours | ViT-giant2 | 0.989 | 0.998 | 1.000 | 1.766 | 0.060 | 0.987 | 0.997 | 0.045 | 0.187 | 0.015

Affine-invariant Depth

Even compared to recent affine-invariant depth methods (Marigold and Depth Anything), our metric-depth (and normal) models still show superior performance.

Method | #Data for Pretrain and Train | KITTI AbsRel ↓ | KITTI δ1 ↑ | NYUv2 AbsRel ↓ | NYUv2 δ1 ↑ | DIODE-Full AbsRel ↓ | DIODE-Full δ1 ↑ | ETH3D AbsRel ↓ | ETH3D δ1 ↑
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
OmniData (v2, ViT-L) | 1.3M + 12.2M | 0.069 | 0.948 | 0.074 | 0.945 | 0.149 | 0.835 | 0.166 | 0.778
Marigold (LDMv2) | 5B + 74K | 0.099 | 0.916 | 0.055 | 0.961 | 0.308 | 0.773 | 0.127 | 0.960
DepthAnything (ViT-L) | 142M + 63M | 0.076 | 0.947 | 0.043 | 0.981 | 0.277 | 0.759 | 0.065 | 0.882
Ours (ViT-L) | 142M + 16M | 0.042 | 0.979 | 0.042 | 0.980 | 0.141 | 0.882 | 0.042 | 0.987
Ours (ViT-g) | 142M + 16M | 0.043 | 0.982 | 0.043 | 0.981 | 0.136 | 0.895 | 0.042 | 0.983

Surface Normal

Our models also show powerful performance on normal benchmarks.

Method | NYU 11.25° ↑ | NYU Mean ↓ | NYU RMS ↓ | ScanNet 11.25° ↑ | ScanNet Mean ↓ | ScanNet RMS ↓ | iBims 11.25° ↑ | iBims Mean ↓ | iBims RMS ↓
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
EESNU | 0.597 | 16.0 | 24.7 | 0.711 | 11.8 | 20.3 | 0.585 | 20.0 | -
IronDepth | - | - | - | - | - | - | 0.431 | 25.3 | 37.4
PolyMax | 0.656 | 13.1 | 20.4 | - | - | - | - | - | -
Ours (ViT-L) | 0.688 | 12.0 | 19.2 | 0.760 | 9.9 | 16.4 | 0.694 | 19.4 | 34.9
Ours (ViT-g) | 0.662 | 13.2 | 20.2 | 0.778 | 9.2 | 15.3 | 0.697 | 19.6 | 35.2

🌈 DEMOs

Zero-shot monocular metric depth & surface normal

Zero-shot metric 3D recovery

Improving monocular SLAM

🔨 Installation

One-line Installation

For the ViT models, use the following environment:

pip install -r requirements_v2.txt

For the ConvNeXt-L model, use:

pip install -r requirements_v1.txt

Dataset Annotation Components

With off-the-shelf depth datasets, we need to generate JSON annotations compatible with this codebase, organized as follows:

dict(
	'files': list(
		dict(
			'rgb': 'data/kitti_demo/rgb/xxx.png',
			'depth': 'data/kitti_demo/depth/xxx.png',
			'depth_scale': 1000.0,  # the depth scale of the GT depth image
			'cam_in': [fx, fy, cx, cy],
		),

		dict(
			...
		),

		...
	)
)

To generate such annotations, please refer to the "Inference" section.
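For illustration, here is a minimal sketch of how such an annotation file could be written. The paths, intrinsics, and output filename are placeholders for illustration only, not the exact contents of data/gene_annos_kitti_demo.py.

    import json
    import os

    # Hypothetical dataset layout -- adjust to your own data.
    rgb_dir = 'data/kitti_demo/rgb'
    depth_dir = 'data/kitti_demo/depth'
    cam_in = [707.0, 707.0, 604.0, 180.5]  # example [fx, fy, cx, cy] in pixels

    files = []
    for name in sorted(os.listdir(rgb_dir)):
        files.append(dict(
            rgb=os.path.join(rgb_dir, name),
            depth=os.path.join(depth_dir, name),  # optional if no GT depth is available
            depth_scale=1000.0,                   # scale of the GT depth image
            cam_in=cam_in,
        ))

    with open('data/kitti_demo/test_annotations.json', 'w') as f:
        json.dump(dict(files=files), f)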

Configs

In mono/configs we provide different config setups.

The intrinsics of the canonical camera are set below:

    canonical_space = dict(
        img_size=(512, 960),
        focal_length=1000.0,
    ),

where cx and cy are set to half of the image width and height, respectively.
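For intuition, here is a minimal sketch (with hypothetical names, not the repo's exact API) of how a depth map predicted in this canonical space can be mapped back to the real camera's metric depth; averaging fx and fy follows the behaviour of the test script mentioned later in the issues.

    CANONICAL_FOCAL = 1000.0  # focal length of the canonical camera, as configured above

    def to_metric_depth(canonical_depth, fx, fy):
        """Rescale a depth map predicted in the canonical space to the real camera's metric depth."""
        real_focal = (fx + fy) / 2.0  # do_test.py averages fx and fy
        return canonical_depth * real_focal / CANONICAL_FOCAL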

Inference settings are defined as

    depth_range=(0, 1),
    depth_normalize=(0.3, 150),
    crop_size = (512, 1088),

where images are first resized to the crop_size and then fed into the model.
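A minimal sketch of that resize step, assuming a keep-aspect-ratio resize followed by padding; the helper name, padding value, and return convention are illustrative, not the repo's code.

    import cv2
    import numpy as np

    def resize_and_pad(rgb, crop_size=(512, 1088), pad_value=0):
        """Fit the image inside crop_size while keeping the aspect ratio, then pad to crop_size."""
        h, w = rgb.shape[:2]
        scale = min(crop_size[0] / h, crop_size[1] / w)
        resized = cv2.resize(rgb, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_LINEAR)
        pad_h = crop_size[0] - resized.shape[0]
        pad_w = crop_size[1] - resized.shape[1]
        padded = np.pad(resized, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=pad_value)
        return padded, scale  # the same scale factor also applies to the intrinsics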

✈️ Training

Please refer to training/README.md.

✈️ Inference

News: PyTorch Hub is supported

Now you can use Metric3D via PyTorch Hub with just a few lines of code:

import torch
model = torch.hub.load('yvanyin/metric3d', 'metric3d_vit_small', pretrain=True)
pred_depth, confidence, output_dict = model.inference({'input': rgb})

Supported models: metric3d_convnext_large, metric3d_vit_small, metric3d_vit_large, metric3d_vit_giant2.

We also provide a minimal working example in hubconf.py, which hopefully makes everything clearer.
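Building on the snippet above, a slightly fuller usage sketch is shown below. It is a sketch only: the dummy input stands in for a properly preprocessed image (see hubconf.py for the recommended input size and normalization), the focal lengths are hypothetical, and the final rescaling assumes the canonical focal length of 1000 from the configs section (ignoring any intrinsics change due to resizing).

    import torch

    # Dummy preprocessed input, [B, 3, H, W]; in practice follow hubconf.py's preprocessing.
    rgb = torch.zeros(1, 3, 512, 1088)
    fx, fy = 707.0, 707.0  # hypothetical real-camera focal lengths in pixels

    model = torch.hub.load('yvanyin/metric3d', 'metric3d_vit_small', pretrain=True)
    model.eval()
    with torch.no_grad():
        pred_depth, confidence, output_dict = model.inference({'input': rgb})

    # Map the canonical-space prediction back to the real camera's metric depth
    # (canonical focal length 1000, see the configs section above).
    pred_depth_metric = pred_depth * ((fx + fy) / 2.0) / 1000.0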

Download Checkpoint

Model | Encoder | Decoder | Link
--- | --- | --- | ---
v1-T | ConvNeXt-Tiny | Hourglass-Decoder | Coming soon
v1-L | ConvNeXt-Large | Hourglass-Decoder | Download
v2-S | DINO2reg-ViT-Small | RAFT-4iter | Download
v2-L | DINO2reg-ViT-Large | RAFT-8iter | Download
v2-g | DINO2reg-ViT-giant2 | RAFT-8iter | Download 🤗

Dataset Mode

  1. put the trained ckpt file model.pth in weight/.
  2. generate data annotation by following the code data/gene_annos_kitti_demo.py, which includes 'rgb', (optional) 'intrinsic', (optional) 'depth', (optional) 'depth_scale'.
  3. change the 'test_data_path' in test_*.sh to the *.json path.
  4. run source test_kitti.sh or source test_nyu.sh.

In-the-Wild Mode

  1. put the trained ckpt file model.pth in weight/.
  2. change the 'test_data_path' in test.sh to the image folder path.
  3. run source test_vit.sh for transformer models and source test.sh for ConvNet models. If no intrinsics are provided, we fall back to 9 default focal-length settings.

Metric3D and Droid-Slam

If you are interested in combining Metric3D with a monocular visual SLAM system to achieve metric SLAM, you can refer to this repo.

❓ Q & A

Q1: Why do the depth maps look good but the point clouds are distorted?

Because the focal length is not set properly! Please find a proper focal length by modifying the code here yourself.
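For reference, unprojecting a metric depth map into a point cloud with pinhole intrinsics typically looks like the sketch below (illustrative names, not the repo's exact code); a wrong fx/fy here is exactly what distorts the point cloud.

    import numpy as np

    def unproject(depth, fx, fy, cx, cy):
        """Back-project a metric depth map of shape (H, W) into an (H*W, 3) point cloud."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) / fx * depth
        y = (v - cy) / fy * depth
        return np.stack([x, y, depth], axis=-1).reshape(-1, 3)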

Q2: Why is point-cloud generation so slow?

Because the images are too large! Use smaller ones instead.

Q3: Why are the predicted depth maps not satisfactory?

First make sure all black padding regions at the image boundaries are cropped out, then try again. Besides, Metric3D is not almighty: some objects (chandeliers, drones, ...) and camera views (aerial view, BEV, ...) do not occur frequently in the training datasets. We will look deeper into this and release more powerful solutions.

📧 Citation

@article{hu2024metric3dv2,
  title={Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation},
  author={Hu, Mu and Yin, Wei and Zhang, Chi and Cai, Zhipeng and Long, Xiaoxiao and Chen, Hao and Wang, Kaixuan and Yu, Gang and Shen, Chunhua and Shen, Shaojie},
  journal={arXiv preprint arXiv:2404.15506},
  year={2024}
}
@inproceedings{yin2023metric,
  title={Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image},
  author={Yin, Wei and Zhang, Chi and Chen, Hao and Cai, Zhipeng and Yu, Gang and Wang, Kaixuan and Chen, Xiaozhi and Shen, Chunhua},
  booktitle={ICCV},
  year={2023}
}

License and Contact

The Metric3D code is under a 2-clause BSD License for non-commercial usage. For further questions, contact Dr. Wei Yin [[email protected]] and Mr. Mu Hu [[email protected]].

metric3d's People

Contributors

jugghm, yvanyin, zachl1, zhipengcai


metric3d's Issues

Measure object size

Hi,

Could you explain how to measure structure sizes from the point cloud, to compare them with the GT as shown in Figure 1 and Figure 7 of the paper?

Is the mean [123.675, 116.28, 103.53] in BGR or RGB order?

Hello, thank you for your kind words!

But I have a question while using it: in the code, when normalizing the image, is mean = torch.tensor([123.675, 116.28, 103.53]) in RGB or BGR order?

I found that the line rgb_origin = cv2.imread(an['rgb'])[:, :, ::-1].copy().astype(np.float32) has already converted the default BGR format returned by cv2.imread() to RGB when reading the image.

Then, in the line rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB) at https://github.com/YvanYin/Metric3D/blob/377e6c6642d0aca7aaa5a19e58fdcf5d0fd3d910/mono/utils/do_test.py#L189C5-L189C47, there is another conversion, which would swap the image back to BGR order. However, the comment there says "BGR->RGB", so is the mean used for normalization in RGB or BGR order?

If I understand correctly, this normalization is meant to match the pre-trained ConvNeXt. However, there seems to be some ambiguity here: did you normalize the data in the same way during training, following the BGR order?
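To make the two code paths being compared easier to see, here is a minimal sketch of the operations described in this question (the file path is a placeholder, and only the mean subtraction under discussion is shown). [123.675, 116.28, 103.53] is the standard ImageNet mean in RGB channel order.

    import cv2
    import numpy as np
    import torch

    # Path 1 (quoted above): cv2.imread returns BGR; [:, :, ::-1] flips it to RGB.
    rgb_origin = cv2.imread('your_image.png')[:, :, ::-1].copy().astype(np.float32)

    # Path 2 (do_test.py, as quoted): applying BGR->RGB again to an already-RGB array
    # simply swaps the channels back, which is the ambiguity raised here.
    rgb_swapped = cv2.cvtColor(rgb_origin, cv2.COLOR_BGR2RGB)

    # Mean subtraction with the values discussed in this issue (ImageNet mean, RGB order).
    mean = torch.tensor([123.675, 116.28, 103.53]).view(3, 1, 1)
    rgb = torch.from_numpy(rgb_origin.transpose(2, 0, 1)) - mean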

Question about canonical camera parameters

Thanks for your great work.
When I dove into the code around the canonical ratio, I found that it is normalized by the real camera's focal length f_x when preparing the GT depth labels (transform.py), but by (f_x + f_y) / 2 in the test script (do_test.py). Could this inconsistency lead to confusion in the depth?
Looking forward to your reply!

How to combine with Droid-slam?

Hi, thanks for contributing such great work! I was wondering how you combined monocular depth with DROID-SLAM. Is the monocular depth used for DROID-SLAM initialization, or do you just take the pose provided by DROID-SLAM and fuse it with TSDF?

How can we test our own images?

Hi, thanks for your excellent work. I'm wondering how we can test the metric depth of our own images. Besides, if I have an in-the-wild image, how can I measure the size of the objects in it? For example, how do you get the size of the table in Figure 1?

[Questions regarding focal length and input depth scale]

Hi! thank you for the great work!!

I was just wondering: if my depth is in mm, can I get the resulting PLY in mm as well? If so, what should I change?

Is it right that I change the focal_length in convlarge.0.3_150.py to 1.0?

And for the gene_annos_*.py files, should I change the depth_scale to 1.0?

Thank you in advance!
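For context, the depth_scale convention implied by the annotation section above is that the stored depth image is divided by depth_scale to obtain meters; a minimal sketch (hypothetical helper, not an authoritative answer to this issue):

    import cv2

    def load_gt_depth(path, depth_scale=1000.0):
        """Read a 16-bit depth PNG and convert it to meters using the annotation's depth_scale."""
        raw = cv2.imread(path, cv2.IMREAD_UNCHANGED).astype('float32')
        return raw / depth_scale  # e.g. millimetre-valued PNGs with depth_scale=1000.0 give meters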

In-the-Wild Mode: ground truth 5 m in a horizontal scene but 16.5 m predicted; a person's height predicted as about 5 m in a tilted scene

I am seeing large absolute-depth errors and would like to ask whether this is a configuration error or a generalization issue. @JUGGHM
Background: the use case is photographing a wall head-on in an indoor parking garage, with a known ground-truth distance to the wall.
Procedure: following the In-the-Wild Mode instructions in the README, I set the intrinsics to [520, 520, 472.4, 276.2] (in pixels) for an input resolution of 960 x 540. The camera's native resolution is 1920 x 1080, whose intrinsics are twice [520, 520, 472.4, 276.2].
Problem: with a ground-truth distance of 5 m to the wall, the prediction is 16.5 m; the image contains basically only a car front on the left and some racks on the ceiling. With a ground truth of 8 m, the prediction is 21.4 m; the background is similar, with two car fronts on the left.
What causes the inaccurate distances? Is it a configuration error or a model generalization issue?
Also, the paper normalizes to a focal length of 1000: is that in micrometers or in pixels? The 'intrinsic' parameters configured in the code are in pixels, right?

In addition, with a ceiling-mounted camera indoors at a pitch of roughly 20°, the predicted height of people is off by a lot, at around 5 m, while people at different distances in the same image get similar heights. Assume a true person height of 1.7 m. What causes this bias?

Thanks for any answers!

How is the confidence modelled?

Hi, thanks for the great work. I am curious how you trained your network to predict confidence. Does one of the losses mentioned in the paper supervise the confidence estimate? If not, could you please share the loss function you used to supervise the confidence?

Furthermore, it seems the confidence is predicted in the range [0, 1]; however, the network block that outputs the confidences ends in only a convolutional layer, and I see no sigmoid layer. How come?

Thanks!

How to use the CSTM_label method

Thanks to the authors for the excellent work!
I would like to reproduce the training, but I am not very familiar with PyTorch; could you give me some advice?
I also have a few small questions:
1. Does the released inference code use only the CSTM_label method (transforming the depth labels), and not CSTM_image?
pred_depth = pred_depth * normalize_scale / scale_info

So if I train, do I just need to follow the CSTM_label method and convert the depth maps by the focal-length ratio? How is the training resolution set, what is its relation to crop_size, and can the network be trained with inputs of different resolutions?

2. For CSTM_label, the actual ratio is label_scale_factor = cano_label_scale_ratio * resize_label_scale_ratio, which has an extra resize_label_scale_ratio compared with the cano_label_scale_ratio in the paper. The resize_label_scale_ratio comes from adjusting the input image to crop_size (resize first, then pad). Could you explain this label_scale_factor in more detail?
3. Are the intrinsics used as a model input during inference or training?
The project builds a camera model; it constructs cam_model with content [x_center, y_center, fov_x, fov_y]:

cam_model = build_camera_model(reshape_h, reshape_w, intrinsic),
cam_model = np.stack([x_center, y_center, fov_x, fov_y], axis=2)

The data dict is built with the input image and cam_model:

    data = dict(
        input=input,
        cam_model=cam_model,
    )

But it seems cam_model is not actually used during inference? What is the purpose of this cam_model? Is it involved in inference or training?

Thanks!

NYU scene reconstruction

The poses are from COLMAP; how do you align their scale to the real-world scale?
The depths are in meters, but the COLMAP poses are in arbitrary units, so there must be some sort of alignment.

Reproducibility Zero-shot results

Thank you for your work!

I can reproduce your results on KITTI and NYU, but not on the other zero-shot datasets.
I tried the 3 different configs you provide under mono/configs, but could not reproduce them.
I would like to know what config and pre-processing you use, especially the crop_size and the canonical focal length.

Thank you in advance.

The use of DIML dataset

Hi, thanks for your amazing work!

I noticed that you use ~122K training images from the DIML dataset; however, the official raw DIML dataset seems to contain 2M training images in total. Could you please explain how you obtained these 122K images from DIML? Thank you very much.

Question

Sorry to bother you again. The first figure is the test result on the images downloaded directly from GitHub, i.e. the original nyu_demo images; the second figure is the test result on 10 images I picked from the NYU dataset. Only the test images were changed, yet the delta metrics under w/o match are all 0, while median match and global match give normal values. Is there an explanation for this result?

A question about the meaning of depth, thanks!

Hello, thanks for your great work! I have a small question: for each pixel, does the predicted depth mean the Euclidean distance from the object's surface point to the camera's optical center, or the perpendicular distance to the image plane?

Model inference acceleration

Can the model be accelerated for inference with TensorRT or by other means?

Another example on real data


So I compared the results against the real distances (the focal length is from an iPhone). It seems there is still a large gap: the relative error is 33% and 36% in the two measurements.

Some examples in Figure 7 also show this large error, so I guess this is to be expected. This is not a bug, just a report.

Question about the distance in ply

Hi, thanks for your great work!
I find that when measuring distances, the error near the position where I captured the image (~2 m) is larger than at positions far away (~17 m):
the near position differs from the ground truth by about 0.1 m;
the far position differs from the ground truth by about 0.05 m.
I am very confused by this.

How to remove the inference progress output

Dear author,
I want to predict depth on our own dataset with your model. There are more than 100 thousand images in the dataset. We would like to remove the progress output printed during prediction, such as 00:00, ?it/s, but I can't find the code that prints it.

How to predict depth for images from different cameras at the same time

Dear author,
I want to predict depth for many images captured by different cameras. Feeding a single image to the model each time takes a long time, so I want to feed a batch of images at once. The problem is that the cameras have different intrinsics and cannot share a single camera model. I tried the inference function, but it fails in get_func('mono.model.model_pipelines.' + model_type)(cfg).

Plan about code release

Hi, thanks for your excellent work! I'm sure it will be helpful to the computer vision field. I'm wondering when you will release the code.

defective prediction for large focal length

          During training, most of the data does not have a focal length of 1000 either. For Taskonomy, for example, most focal lengths are around 500-700, and test datasets such as NYU are not around 500 either, so the average is not 1000. The ablation on this part could be explored further.

Originally posted by @YvanYin in #3 (comment)
Hi, thanks for your great work! I think I hit a duplicate of this issue when testing on a different dataset. When running inference on outdoor images with 'intrinsic': [1966.9, 1969.5, 948.7, 498.4], the predicted depth is unexpectedly poor. I tried different crop_size values, but it didn't help much; the RMSE is about 10 m against sparse LiDAR measurements.


For the SHIFT dataset, where 'intrinsic': [640, 640, 640, 400], the result is much more reasonable, with an RMSE of about 7 m.

This performance difference seems to be related to the training data, could you share your ideas on this? Thank you!

Unit of the model's output depth

Is the unit of the output depth map meters? Measured in meters, the error is not small.

Missing depth

Hello! When running test.sh, I cannot obtain the depth.

Incorrect scale

Thank you for your excellent work! We use the correct intrinsics, but the scale is wrong: the ground-truth (GT) width between the two lines is 3.5 m, but the estimated width is 15-25 m. We have tried adjusting crop_size, but found it to be a sensitive parameter; different values have a large influence on the results. Can you give me some advice on solving these problems? Thank you.

Fine-tuning the model on the NYU dataset

Thanks for your amazing work!

  1. I would really like to know the performance if you fine-tune the model on the NYU dataset; I guess you could win the championship.
    Have you ever tried this? If so, please tell me!
  2. Will you make the training code available? Thank you very much!
  3. I am confused about the unit of fx and fy: is it mm or pixels?
    Looking forward to your reply!

The output shows there is no gt_depth

@JUGGHM Hello, I have two questions:
1. I have an indoor dataset, so it seems I should run dataset mode, but I notice that this mode requires depth data as input, which confuses me. If I really should run this mode, what should I do?
2. I ran in-the-wild mode on this indoor dataset and set the intrinsics; the code runs to completion, but it reports that gt_depth data is missing. Is something wrong here? I saw a similar question, but that reply did not resolve my confusion, so I would like to ask further.

About the model's generalization ability

Hi, I see that this paper uses a large number of RGB-D datasets. Is the strong generalization mainly due to mixing a large amount of data, or does it also come from normalizing the different cameras into a canonical one? I previously worked on single-modality RGB depth estimation for indoor scenes and mixed several public RGB-D datasets together with some RGB-D data I collected in my own experimental scenes (plain mixing, without this camera normalization). However, the generalization is still not good: as soon as an object appears in a different scene, my predicted depth error becomes very large. I wonder whether using your camera normalization method, while mixing only indoor-scene data, could improve generalization?

Using Mapillary Dataset

Hi there, very inspiring work! I've been telling everyone around me that this is my favourite paper recently, and I personally regard Metric3D as one of the milestones in MDE.

I have a question about the Mapillary Planet-Scale Depth Dataset used in this paper: what is the scale of the ground truth in the Mapillary dataset? When we read depth images and convert them into metric depth maps, we divide them by a scaling factor, e.g. metric_depth = depth / 512 or similar, and provide metric-depth supervision. I tried to find this scaling factor for a long time and couldn't get any clue; I wonder how you get metric depth from the Mapillary dataset?

Looking forward to your reply!
Mochu

Single-machine multi-GPU training or inference

Thank you for contributing such great work.
I want to use Metric3D for single-machine multi-GPU training or testing, but I found that some information is missing, such as dist_params in the configuration.
Do you have any suggestions on this?

Test case

Could you give me code to test a single image?
I would appreciate your reply.

ScanNet evaluation set

Hello, thanks for your great work. Could you let me know which subset of ScanNet you used for evaluation? It would be even better if you could share the file list.

[Q] Replicate Dense-SLAM Mapping

Hello, excellent work!!
By the way, may I ask you about the Dense-SLAM experiment in the paper?

To my understanding, the weights for the "Droid" part come from an off-the-shelf model trained on TartanAir (released here: https://github.com/princeton-vl/DROID-SLAM), and the other hyperparameters follow the demo configuration of that repository. Is this correct?

In addition, I'm curious about the input image: what image size is fed into the Droid module? Or, following one of the cited papers, is it reshaped to a somewhat smaller size, like [192, 640]?

Thank you very much for your support!

How is the camera model (4-channel map) used during model inference?

Thanks for contributing such good work! When I studied the test code (do_test.py), I found that you convert the camera intrinsics into a 4-channel map (principal-point location and FoV map) as the camera model, which is then passed into the model inference.

However, when I checked the decoder code (HourGlassDecoder.py), the camera model does not seem to be used. Where is the camera model used during inference? Is it possible to perform inference as described in the paper using only the camera intrinsics, without generating a camera model?

Looking forward to your reply and tips!

Question about the formula in Section 3.1 of the paper

In Section 3.1 of the paper, regarding the geometric relation between depth, object area, imaged object area, and focal length: why is the ratio \hat{S} / \hat{S}' used, rather than the square root of \hat{S} / \hat{S}'? Suppose the object is a circle and its projection is also a circle, and the depth is twice the focal length; then the area of the circle should be 4 times the projected area (as shown in the figure I attached). I cannot figure this out; please advise.

How to combine Metric3d and Droid-slam?

Hello, I ran Metric3D and DROID-SLAM separately yesterday, but I have a question: can you provide some ideas on how to combine the two? Is the depth map estimated by Metric3D and then fed into DROID-SLAM to assist the mapping? Is this done to give the map a metric scale?
