
Comments (8)

simonfuhrmann commented on July 17, 2024

Can you use some of ARKit's code directly to generate a rotation matrix from your quaternion, and compare it with the get_rot_from_quaternion implementation? One thing to note: if ARKit's representation is actually a 4-vector axis-angle representation, the normalization that happens here [1] is probably not correct (and the conversion is probably not correct either):

https://github.com/simonfuhrmann/mve/blob/master/libs/mve/bundle_io.cc#L53-L54
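
For comparison, here is the textbook conversion for a unit quaternion stored as (w, x, y, z) (a sketch, not taken from bundle_io.cc; if ARKit orders the components differently, or actually hands you an axis-angle 4-vector, the outputs will disagree in a telling way):

// Hamilton convention, scalar part first: q = (w, x, y, z), assumed unit length.
// Writes the 3x3 rotation matrix into rot in row-major order.
void quat_to_rot (float const* q, float* rot)
{
    float const w = q[0], x = q[1], y = q[2], z = q[3];
    rot[0] = 1.0f - 2.0f * (y * y + z * z);
    rot[1] = 2.0f * (x * y - w * z);
    rot[2] = 2.0f * (x * z + w * y);
    rot[3] = 2.0f * (x * y + w * z);
    rot[4] = 1.0f - 2.0f * (x * x + z * z);
    rot[5] = 2.0f * (y * z - w * x);
    rot[6] = 2.0f * (x * z - w * y);
    rot[7] = 2.0f * (y * z + w * x);
    rot[8] = 1.0f - 2.0f * (x * x + y * y);
}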


simonfuhrmann commented on July 17, 2024

Nice work. I'd be curious to see a fusion of all the depth maps with fssr.


simonfuhrmann commented on July 17, 2024

Hi. I'm not entirely sure what causes this. If you are certain the issue lies with the rotation values, then maybe ARKit uses a different quaternion convention, and get_rot_from_quaternion cannot be used without modification to match it.

There are, unfortunately, quite a few ways to express rotations. Some notations (axis-angle) use a normalized axis and a rotation angle (in degrees or radians) around that axis, four values in total. Others encode the rotation angle as the length of the axis itself, three values in total. So a few things you could check: whether the axis is properly normalized; whether you're using the correct angle encoding (degrees vs. radians); whether it rotates the right way; and whether the fourth dimension of the quaternion is in the correct place (some encodings put the scalar part in position 0 with the axis in 1-3, others put the axis in 0-2 with the scalar part in 3). Just some ideas...
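
If it helps to probe these hypotheses, here is a small sketch that builds a scalar-first unit quaternion from an axis-angle pair, so you can feed it each candidate interpretation of ARKit's four values and compare:

#include <cmath>

// Builds a unit quaternion (w, x, y, z) from an axis and an angle in
// radians. The axis is normalized here; in the 3-value encoding the axis
// length *is* the angle, so read it before normalizing.
void axis_angle_to_quat (float ax, float ay, float az, float angle, float* q)
{
    float const len = std::sqrt(ax * ax + ay * ay + az * az);
    float const s = std::sin(angle / 2.0f) / len;
    q[0] = std::cos(angle / 2.0f);
    q[1] = ax * s;
    q[2] = ay * s;
    q[3] = az * s;
}

To test the degrees hypothesis, multiply the angle by pi / 180 first; to test the scalar-last hypothesis, move q[0] to the end.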


romsahel commented on July 17, 2024

Thank you for the quick response!

Indeed, I had tried different ways to convert from the quaternion, but none gave satisfactory results.
However, following your idea, I re-checked the ARKit API: the camera extrinsic parameters are actually accessible as a matrix, so I extracted that and used it directly in MVE (I just had to invert some axes to fix the convention), and it works much better!
So thank you! The position/rotation now seems good: I can move around, and the points overlap for the most part.

Unfortunately, I'm still running into problems:

  • On datasets where the camera did not rotate (translation only), the result is pretty good and the mesh is clean; I just get some kind of spherical distortion. To illustrate, here is a comparison between the mesh produced by MVE and the one produced by ARKit; the edge at the bottom is supposed to be a straight wall.
    [Screenshot: MVE mesh vs. ARKit mesh, showing the curved wall edge]
    I suspected the distortion parameter, but changing its value (to a fixed average) in the CameraInfo seems to have no effect. Should I undistort the images as well as change the parameter? I thought scene2pset only used the images to colorize the point set.
    Or could it just be a side effect of having points of view in only one direction (the camera direction vectors are quasi-parallel across views)?

  • I'm getting even more problematic results in scenes with rotation: although the scene is coherent, lots of artifacts appear. Again, to illustrate, on the left is the result from the dataset with no rotation (distorted but clean); on the right I moved freely around the couch (much messier):
    [Screenshot: translation-only result (left) vs. free movement around the couch (right)]
    Since translation alone seems fine, I suspect the other camera parameters, but I don't see where they could be wrong (apart from the distortion parameter):

mve::CameraInfo cameraInfo;
// focal is the focal length in pixels; normalize it by the larger side of the image
cameraInfo.flen = rgbWidth > rgbHeight
    ? info.focalX / static_cast<float>(rgbWidth)
    : info.focalY / static_cast<float>(rgbHeight);
// principalPointOffset is the offset from the top-left corner of the image frame, in pixels
cameraInfo.ppoint[0] = info.principalPointOffsetX / static_cast<float>(rgbWidth);
cameraInfo.ppoint[1] = info.principalPointOffsetY / static_cast<float>(rgbHeight);
// pixels are square on iPhone and iPad, but compute the pixel aspect anyway
cameraInfo.paspect = info.focalX / info.focalY;
// use a fixed value for now, obtained by averaging the lensDistortionCenter value
// in a photo session, since ARKit does not provide proper lens distortion parameters;
// lensDistortionCenter is the offset of the lens distortion center from the
// top-left corner of the image
cameraInfo.dist[0] = 948.0f / static_cast<float>(rgbWidth);
cameraInfo.dist[1] = 720.0f / static_cast<float>(rgbHeight);
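
And for completeness, roughly how I set the extrinsic part from ARKit's camera-to-world transform (a sketch under my assumptions: m holds the 4x4 matrix in row-major order, i.e. transposed from ARKit's column-major simd_float4x4, and the convention fix flips the camera's Y and Z axes):

// Inverts the camera-to-world pose and flips Y/Z, going from ARKit's
// -Z-forward/+Y-up camera to the +Z-forward/+Y-down convention MVE expects.
void set_extrinsics (mve::CameraInfo& cam, float const* m)
{
    for (int row = 0; row < 3; ++row)
    {
        // rot = diag(1, -1, -1) * transpose(upper-left 3x3 of m)
        float const sign = (row == 0) ? 1.0f : -1.0f;
        for (int col = 0; col < 3; ++col)
            cam.rot[row * 3 + col] = sign * m[col * 4 + row];
    }
    // trans = -rot * camera_center, the center being m's last column
    for (int row = 0; row < 3; ++row)
        cam.trans[row] = -(cam.rot[row * 3 + 0] * m[3]
            + cam.rot[row * 3 + 1] * m[7]
            + cam.rot[row * 3 + 2] * m[11]);
}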

Sorry for the wall of text! On the off chance you have an idea, it would be a huge help.


romsahel commented on July 17, 2024

Okay, regarding my first point: I've found the depthmap_convert_conventions function, which solved the distortion problem!
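
In case it helps someone else, this is roughly the call (a sketch; I'm assuming ARKit delivers plain z-depth and that depthImage is the mve::FloatImage holding the depth map):

#include "mve/depthmap.h"

// MVE stores depth as distance along the viewing ray rather than along the
// optical axis; converting multiplies each z-depth by the length of the
// unprojected ray, i.e. depth * |K^-1 * (x, y, 1)|.
math::Matrix3f invproj;
cameraInfo.fill_inverse_calibration(invproj.begin(), rgbWidth, rgbHeight);
mve::image::depthmap_convert_conventions<float>(depthImage, invproj, true);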

Now for the second point: I think it's actually due to ARKit's positions and rotations not being stable enough throughout the AR session, so the recorded poses are not fully coherent. It just doesn't show up in the translation-only dataset because SLAM algorithms are more sensitive to rotation.


simonfuhrmann commented on July 17, 2024

Sorry for the late response. Yes, there are two ways to represent depth maps: using "depth" values or "range" values. I'm glad you figured that one out. Regarding the remaining alignment issue, I'm not sure I have enough context to be of help, but let me throw in some ideas.

If you receive the depth maps from ARKit, then the radial distortion parameters of the camera model cannot be the issue; the depth maps are assumed to be undistorted. If the color images are used to colorize the depth maps, they also have to be undistorted the same way; otherwise, the color won't align with the geometry.

Does ARKit do some sort of depth map alignment that it doesn't roll into the exported rotation? If so, that would explain the misalignment. And if ARKit is used multiple times (you said "sessions" above), why/how would ARKit guarantee that multiple sessions produce consistent geometry?


romsahel commented on July 17, 2024

Hi,
Sorry for not answering earlier. As you said, radial distortion was not the problem, and ARKit does the depth-map/RGB alignment for us. So it was really just a matter of anchoring the camera poses throughout the session and saving them at the end. I'm getting a pretty satisfactory result now!
I haven't tested running across multiple sessions yet, but I think that would only depend on ARKit's ability to remain consistent through persistent sessions; MVE should not see any difference between a single-session scan and a multi-session one.
Anyway, thank you very much for your help and for your amazing work on this library!


romsahel commented on July 17, 2024

It's not easy to show a 3D result in 2D, but here is a GIF of a quick scan (27 seconds, 53 frames):
[GIF: screen recording of the reconstructed scan]

I'm not sure it's the best example, since I think the best use case is scanning larger surfaces rather than small objects (the LiDAR resolution is rather low), but it performs pretty well.

