apple / arkitscenes

This repo accompanies the research paper "ARKitScenes: A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data" and contains the data, scripts to visualize and process assets, and training code described in the paper.

License: Other

Languages: Python 99.20%, Shell 0.80%

Contributors

afshindn, arik1089, gofinge, levintsky, peterzhefu, sekunde


arkitscenes's Issues

Incorrect empty gt verification?

It looks like here we need to replace len(gt) == 0 with len(gt) == 0 or len(gt.get('data', [])) == 0; otherwise, for scene 42897846, we get ValueError: need at least one array to concatenate further down. Am I missing something?
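
For illustration, the proposed guard would look roughly like this (a sketch only; the surrounding loader function and its return convention are hypothetical, not the repo's actual code):

import json

def load_gt(path):
    with open(path) as f:
        gt = json.load(f)
    # Treat both an empty annotation and an annotation with an empty 'data' list
    # as "no ground truth", so a downstream np.concatenate is never called with
    # zero arrays.
    if len(gt) == 0 or len(gt.get('data', [])) == 0:
        return None
    return gt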

Pretrained Depth Upsampling Model

Dear ARKitScene Team,

Last time I viewed the GitHub repo, I thought I had seen pretrained depth upsampling models.
I must be mistaken, as I cannot find them on GitHub now.

Is it possible to release the pretrained depth upsampling models?

Thank you in advance.

Error in downloading [Bug Report]

Hi, I noticed that the variable 'splits' is not defined anywhere in your download_data.py.

running:

python3 download_data.py upsampling --download_dir .../ARkitScenes --split Validation --video_id_csv depth_upsampling/upsampling_train_val_splits.csv 

yields the following error:

...
...
Downloading file /home/nandometzger/data/ARkitScenes/upsampling/Validation/48458667.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 53.7M  100 53.7M    0     0  18.1M      0  0:00:02  0:00:02 --:--:-- 18.1M
Unzipping zip file /home/nandometzger/data/ARkitScenes/upsampling/Validation/48458667.zip
Traceback (most recent call last):
  File "/home/nandometzger/misc/ARKitScenes/download_data.py", line 273, in <module>
    download_data(args.dataset,
  File "/home/nandometzger/misc/ARKitScenes/download_data.py", line 202, in download_data
    if dataset == 'upsampling' and VALIDATION in splits:
NameError: name 'splits' is not defined
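
A hypothetical workaround sketch (not the maintainers' fix): make sure `splits` exists before the check at download_data.py line 202 is reached, for example by deriving it from the split requested on the command line. Only VALIDATION and the failing `if` come from the traceback above; everything else here is an assumption.

VALIDATION = "Validation"

def resolve_splits(cli_split, csv_splits=None):
    """Return the set of splits to download, preferring those listed in the CSV."""
    return set(csv_splits) if csv_splits else {cli_split}

splits = resolve_splits("Validation")
dataset = "upsampling"
if dataset == "upsampling" and VALIDATION in splits:
    print("upsampling/Validation handling would run here")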

gt depth and gt color

Hi, I downloaded the upsampling data; there are four types of assets: confidence, lowres_depth, highres_depth and wide.
Q1: Does highres_depth correspond to the wide RGB images?
Q2: Is lowres_depth captured by the iPhone, and does it also correspond to the wide images?

raw IMU information

Hi, thanks for your great work on 3D understanding.
Do you have plans to add raw IMU data to the open dataset to enable more tasks (e.g. VIO)?

Duplicate scans / multiple rooms in some scenes

Hi team,

Thanks for releasing the Faro point clouds. It looks like some scenes such as 421006 have 2 copies of the point cloud, each containing 4 scans -

  • 169952, 169953, 169955, 169962
  • 170029, 170030, 170034, 170036

A visualization of the two copies was attached to the issue.

Additionally, some scenes such as 422013 have two rooms that are far apart (also shown in an attached image).

Can you let us know how to handle these?

No raw assets given for video id xxxxxxx

Hi, I tried to download the raw dataset. The command is:

python download_data.py raw --split Training --video_id_csv raw/raw_train_val_splits.csv --download_dir .

but it only shows "Warning: No raw assets given for video id xxxxxxxx" for all IDs in the CSV file.

About lowres_wide.traj for ultra and vga images

Hello,
Thank you for your sharing!
Could you please provide some details about this dataset, such as:

  1. Whether lowres_wide.traj can be used as the extrinsics for the images in the vga_wide and ultra_wide directories.
  2. The total size of the raw data downloaded with download_data.py; it seems to be over 7 TB.

How should I synchronize the .mov video and low-resolution depthmaps?

I'm trying to download the RAW split of ARKitScenes (the frames in the depth upsampling split are too sparse for me). I find that most RAW scenes contain only a .mov video and low-resolution depth maps, so I want to synchronize the video with the depth maps. How should I do that? How are the timestamps of the depth maps and the .mov videos synchronized?
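
For what it's worth, once per-frame timestamps are available for both streams (the depth-map timestamps can be parsed from the filenames; how the .mov frame timestamps are recovered is a separate question), the matching itself can be a plain nearest-neighbour search with a tolerance. A rough sketch, with the tolerance value purely illustrative:

import numpy as np

def match_timestamps(ts_a, ts_b, tol=1 / 30):
    """For each timestamp in ts_a, return the index of the closest one in ts_b,
    or -1 if the gap exceeds tol (seconds)."""
    ts_b = np.asarray(ts_b)
    order = np.argsort(ts_b)
    ts_b_sorted = ts_b[order]
    idx = np.searchsorted(ts_b_sorted, ts_a)
    matches = []
    for t, i in zip(ts_a, idx):
        cands = [j for j in (i - 1, i) if 0 <= j < len(ts_b_sorted)]
        best = min(cands, key=lambda j: abs(ts_b_sorted[j] - t))
        matches.append(order[best] if abs(ts_b_sorted[best] - t) <= tol else -1)
    return matches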

Raw Dataset Size

Hi,

thank you for providing this massive dataset! What is approximately the size of the raw data? Is it around 13 TB?

Thanks in advance!

Access to https://docs-assets.developer.apple.com/ml-research/datasets/arkitscenes/v1 denied

I would like to download the dataset, but access to https://docs-assets.developer.apple.com/ml-research/datasets/arkitscenes/v1 is denied.

When using the example command, e.g.

python download_data.py raw --split Training --video_id 47334522 --download_dir raw

I get the error message
KeyError: 'ARkitscense_url'

and if I try to access the URL in a browser I get:

AccessDenied Access Denied FY8DYK2G1GDQK3YN T9z6QNoG8j1Tv7a6BG8IqEsyp+cqlx0Avgj7wKSq4kWbsCKyZZn/I3J8aP0OWdjAsu9OxP5yWI4=

Many thanks in advance for your help.

Question about the depth data collected from Apple device.

Hello, this is really nice work!
However, as far as I know, the LiDAR in Apple devices can only acquire 9x64 points at a time, so I wonder how you acquire the depth map in real time.
Is it generated by fusing the depth from the LiDAR sensor with other information (such as RGB and IMU) through the sceneDepth API?

Question about camera orientation (portrait and landscape)

I am trying to extract frames from the Raw dataset and running into trouble/confusion related to the orientation of the images, which varies between portrait and landscape modes from video to video.

Here are a few questions I have on this topic.

  1. Is there any annotation or other way of determining the correct orientation for the Raw images/annotations/intrinsics? Most seem to be rotated by -90 degrees, but not all as far as I can tell. Videos in landscape mode mostly seem not to be rotated, but could occasionally be upside down in my tests.

  2. Are the Raw videos always in the "correct" orientation? They seem to be at a glance, so I have assumed this for now.

  3. Is it known whether the camera operators switch between landscape and portrait modes in the middle of a video? If it's not known, was such switching intentional?

Thank you and sorry if this is covered somewhere in the code that I missed.
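
As a side note on question 1: if frames do need to be rotated, the pinhole intrinsics have to be rotated with them. A minimal sketch for a 90-degree clockwise rotation (illustrative only; whether the "-1" is needed depends on your pixel-centre convention, and the camera pose must also be rotated about the optical axis):

import numpy as np

def rotate_90cw(image, fx, fy, cx, cy):
    h, w = image.shape[:2]
    rotated = np.rot90(image, k=-1)          # 90 degrees clockwise
    # old pixel (u, v) maps to (u', v') = (h - 1 - v, u)
    return rotated, fy, fx, (h - 1) - cy, cx  # new fx, fy, cx, cy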

Downloading of raw videos fails for some video IDs

Hi, do you have any idea why the following command fails on some video IDs and succeeds on others?

Failed

python download_data.py raw --split Training--video_id 40777060 --download_dir .

Succeeded

python download_data.py raw --split Validation --video_id 48458667 --download_dir .

Thanks, Noam

The upsampling dataset

Hi, @PeterZheFu

I downloaded the upsampling dataset; there are four types of assets: confidence, lowres_depth, highres_depth and wide_rgb.

Is highres_depth the ground truth, or the output of the upsampling network (MSG/MSPF)?

If I want to train the network, how do I generate the metadata.csv file, and what does this file contain?

confidence map

Dear authors of the ARKitScenes dataset,

I would like to say thank you for publishing this valuable dataset. I am writing to ask about the method that you used to create the confidence map. Specifically, I am wondering if there is a way to replicate the same method that ARKit uses to create the confidence map for other images.

Thank you for your time and assistance.

RAW download links are broken?

Thank you for providing a well-maintained dataset! And the super helpful APIs.

I get "Warning: No raw assets given for video id" for all RAW downloads.

Cannot download dataset: 403 error

curl: (22) The requested URL returned error: 403
Error downloading https://docs-assets.developer.apple.com/ml-research/datasets/arkitscenes/v1/raw/Training/42897688/42897688.mov, error: Command 'curl https://docs-assets.developer.apple.com/ml-research/datasets/arkitscenes/v1/raw/Training/42897688/42897688.mov -o /scratch/quanta/Datasets/ARKitScenes_temp/raw/Training/42897688/42897688.mov.tmp --fail' returned non-zero exit status 22.

Download data is too slow

Can you provide or share a link to download the dataset directly? Downloading with download_data.py is slow.

Camera pose alignment

Hi,

Thanks for releasing this dataset. Since the highres depth maps are too sparse for me, I'm trying to re-render them using the given camera parameters and mesh. I assume the .traj file contains the extrinsics of each frame. I'm wondering about the convention of the rotation and translation: which coordinate system do they use, and which axis faces the front of the camera? Also, what are the near and far distances of each depth map?

Best,

Question about .traj

Thank you for the large-scale indoor dataset. I noticed that only about 1/6 of the images have pose parameters in the lowres_wide.traj file. Are there more posed images? Do vga_wide.traj files exist?

Depth Map Downsampling

Hi, thanks for sharing such a great dataset!

I was looking at the depth map upsampling task introduced by this dataset. However, the current resolution of the ground-truth depth map (1920x1440) is a bit too high for my use case. So I was wondering whether you have already implemented a downsampling script for depth maps, since they are not as straightforward to downsample as RGB images.

Thank you!
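
One possible interim approach (an illustrative sketch, not an official script) is block-wise downsampling with a statistic that skips invalid pixels, instead of bilinear averaging across depth discontinuities:

import numpy as np

def downsample_depth(depth, factor, reduce=np.median):
    """depth: (H, W) array, 0 = invalid; factor must divide H and W."""
    h, w = depth.shape
    assert h % factor == 0 and w % factor == 0
    blocks = depth.reshape(h // factor, factor, w // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h // factor, w // factor, -1)
    out = np.zeros((h // factor, w // factor), dtype=depth.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            valid = blocks[i, j][blocks[i, j] > 0]
            if valid.size:
                out[i, j] = reduce(valid)
    return out

Passing reduce=np.min instead keeps the closest depth per block, which can be preferable near occlusion boundaries.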

Encounter 403 Forbidden When Downloading Mesh File

Hi, thanks for the great work again! I am trying to download the mesh file of each scan produced by ARKit (the low-resolution one). Unfortunately, I get a 403 Forbidden error when downloading these files (with the download list limited to mesh only), and I worry about data loss. The attached image shows some of my logs.

Downloading of mesh files

Hi, when I try to download mesh files with

python3 download_data.py raw --split Validation --video_id 48458667 --download_dir . --raw_dataset_assets mesh

I always get errors like this:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>S35X8MRZXWAHDNCD</RequestId><HostId>RUdvpESfdl/jZ20rR9qMA67MmnhqACL1JTE6CXWbyn1qXnqby7vU6nYra1HBRIUX2ypgy2mxLfM=</HostId></Error>

Detection Evaluation Table 2

Hi all,

Thanks for the great dataset, this is very helpful!

I have a few questions regarding the evaluation shown in Table 2 in the paper:

  1. Which intersection-over-union (IoU) threshold t did you use to compute mAP@t ? I assume it is either 0.5 or 0.25 as in VoteNet?
  2. Is the IoU computed over axis-aligned bounding boxes, or oriented bounding boxes?
  3. On which split (val or test) are these numbers reported? Since test is currently not available, what are the numbers on validation?
  4. Are the per-point annotations available?

Thanks a lot!

Best,
Francis
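
For reference on question 2, if the boxes are treated as axis-aligned, the IoU entering mAP@t can be computed as below (an illustrative helper, not the paper's evaluation code; oriented boxes would need a polygon clipping step instead):

import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """Boxes given as (cx, cy, cz, dx, dy, dz) in the same coordinate frame."""
    ca, da = np.asarray(box_a[:3]), np.asarray(box_a[3:])
    cb, db = np.asarray(box_b[:3]), np.asarray(box_b[3:])
    inter = np.clip(np.minimum(ca + da / 2, cb + db / 2)
                    - np.maximum(ca - da / 2, cb - db / 2), 0, None)
    inter_vol = inter.prod()
    return inter_vol / (da.prod() + db.prod() - inter_vol)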

Add GitHub Repository description

Hey ARKitScenes developers, I looked at other repositories in the Apple organization and most of them have a GitHub description. Why not this one?


Question about raw Faro high-resolution XYZRGB point cloud

Thank you for the impressive large-scale indoor dataset collecting work. It's a significant dataset with many possibilities for high-level application scenarios, and I like it.

It is thoughtful to generate ground-truth high-resolution depth maps by discarding geometry for which a direct line of sight from the novel viewpoint cannot be guaranteed.

But I think that if we could access the raw Faro high-resolution XYZRGB point clouds, the dataset would have even more potential, for example for point cloud completion tasks.

Would it be possible to access more of the raw data collected in your well-designed capture process? It would let us explore more meaningful settings for 3D understanding on such a large-scale real indoor dataset.

ARCamera poses

Hi, thanks for your great work!
I'm working on a data-collection app with ARKit. I noticed that you estimate the ground-truth poses instead of using the ARCamera poses (ARCamera.transform). Why? Is it because the ARCamera poses are inaccurate?

Transformation between ARKit mesh and Point Cloud

Hello,

It looks like the 3D bounding boxes can only be used on the ARKit reconstructions.
Could you provide the transformation between this mesh and the point clouds, so that the bounding boxes can be used on the point cloud as well?
Could you also explain the difference between data['segments']['obbAligned'] and data['segments']['obb'] in the annotations file?


Thanks!
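
Regarding the obbAligned entry, a sketch for expanding one annotated box into its eight world-space corners is shown below; the centroid / axesLengths / normalizedAxes field names are assumptions about the annotation JSON, not something confirmed here:

import itertools
import numpy as np

def obb_corners(obb):
    centroid = np.asarray(obb["centroid"])            # (3,)
    half = np.asarray(obb["axesLengths"]) / 2         # (3,) half-extents
    axes = np.asarray(obb["normalizedAxes"]).reshape(3, 3)  # rows = box axes
    signs = np.array(list(itertools.product([-1, 1], repeat=3)))  # (8, 3)
    return centroid + (signs * half) @ axes           # (8, 3) corner coordinates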

Mesh and high-resolution point-clouds in the dataset

Thank you for the useful dataset!
I'm wondering about the source of the mesh data in the 3dod dataset. Were FARO LiDAR scanners used to create these detailed meshes? Also, do these meshes relate to the high-resolution point clouds in the raw data? Your clarification would be appreciated.

.traj Rotation Format

Hello, I want to know whether the rotation is stored as Euler angles in xyz order, and whether the coordinate system is world-to-camera (w2c).
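
A parsing sketch under one possible reading of the format, assuming each .traj line stores a timestamp, an axis-angle (Rodrigues) rotation rather than Euler angles, and a translation giving a world-to-camera transform; this reading is an assumption to be confirmed by the maintainers, not an official answer:

import numpy as np
from scipy.spatial.transform import Rotation

def read_traj(path):
    poses = {}
    with open(path) as f:
        for line in f:
            vals = [float(v) for v in line.split()]
            ts, rot_vec, t = vals[0], vals[1:4], vals[4:7]
            T = np.eye(4)
            T[:3, :3] = Rotation.from_rotvec(rot_vec).as_matrix()
            T[:3, 3] = t
            poses[round(ts, 3)] = T  # world-to-camera under the assumption above
    return poses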

Rotated videos

Hi,
Great dataset!
Some of the videos are flipped or rotated by 90 degrees. Are there any tags to detect such videos and rotate them back?
Thanks!

About the camera extrinsics of provided highres depth map

Hi,

Thanks for providing the highres depth maps! They are really useful. However, I found some problems regarding the timestamps.

For each highres depth map, I can't find the corresponding camera extrinsics in the .traj file. For example, in Validation/42446103 the highres depth folder contains 42446103_78067.167.png, but timestamp 78067.167 does not appear in the .traj file; I can only find extrinsics for 78067.183 and 78067.083.

Best,

Problems with annotation-files

Since I found missing objects in the object annotation files, I suggest starting a thread where we collect such issues.

In the sets 40776203 and 40776204 (Training) the bed in the corner is not labeled. It has the following transformation:

  • in 40776203: center=[ 3.087, -1.471, -1.220 ], dimension=[ 2.320, 1.680, 0.461], rot=[-0.527, 0, 0]
  • in 40776204: center=[ 0.585, -1.064, -1.141 ], dimension=[ 2.320, 1.680, 0.461], rot=[ 1.043, 0, 0]
    (Screenshots bed_missing and bed_included were attached.)

Watertight textured mesh

Hi,

Thank you for sharing the dataset. In Section 3.2 of the paper, you mention that a watertight textured mesh can be generated by stereographic projection. Could you share these meshes or the related code, please?

Question about the high resolution wide image?

Thanks for sharing this useful dataset.

One question: this dataset lacks high-resolution wide images at 60 fps, whereas low-resolution wide images at 60 fps are provided. In my case, high-resolution wide images at 60 fps would be helpful, since the current high-resolution wide images are too sparse. I believe a dense image sequence would be very helpful for indoor reconstruction. Could you please provide it?

Normalized Bounding Boxes across Dataset

Hi ARKitScenes Team,

I'm planning to use this dataset along with the Scan2CAD dataset (https://github.com/skanti/Scan2CAD) for model training. It also has bounding boxes for objects in each scene. However, I need to normalize the bboxes so that theirs and the bboxes from the ARKit annotation files live in the same space (scale, coordinates, etc.).

Do you have any idea how to normalize these two datasets (as for scene datasets with bbox labels in general) so that the extracted bboxes have the same scale (a chair bbox in ARKitScenes would have roughly the same scale as a chair bbox in ScanNet)?

Thanks a lot!

highres_depth in raw dataset

Dear ARKitScenes developers,

Thanks for the excellent work! I am particularly interested in using high-resolution depth maps in the raw dataset. The ones in the upsampling subset are too sparse for my use case, unfortunately.

However, after I downloaded one of the sequences in the raw dataset using the command below:
python3 download_data.py raw --split Training --video_id 41048190 --download_dir /tmp/Datasets_ssd/
I couldn't find highres_depth. Could you please advise if I missed anything?

On a separate note, I also wonder whether there are any plans to release the point clouds captured by the Faro laser scanner, and how the confidence for the depth maps was computed, if that can be disclosed.

.ply file with original color

Hi ARKitScenes researchers,

I am new to 3D point clouds, so I don't know how to apply the original scene colors to the .ply mesh. Could you please give some instructions? I would like to load the .ply file into Unity 3D for my research project.

Thanks,

laser scan alignment

Hi,
I tried to align point clouds from different viewpoints in the laser scan data, but found that some of them cannot be aligned using the provided poses, for example the scene with id 421659. Is there anything I should take care of when aligning the point clouds?

Best,

Camera coordinates convention

Hi, thank you for releasing this dataset.
I'd like to know the coordinate convention used to store the camera trajectory rotations. What are the directions corresponding to the X, Y, and Z rotation axes? Could you provide this information in the docs, as you did for the Hypersim dataset?
