
Comments (10)

xyjbaal commented on June 28, 2024

I'm honored that you would pay attention to my work.

I'm happy to discuss it with you. Forgive my poor English.

  1. I have uploaded a slide on prediction, which I hope will be helpful.
    (NMS picks only one point (the center point) to correspond to an object.)

  2. Is the middle of your part empty, like a ring? If the middle is mostly empty, then the way to find the center point will be difficult to apply to your object.

  3. I believe your idea is right and feasible. You can add m more branches, like the center score branch, to predict the scores of m key points.

The method (center score + NMS) is somewhat similar to the keypoint prediction.
(But I haven't read the related papers on 3D keypoint prediction, so please forgive me if I misunderstood it.)
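To make this concrete, here is a minimal NumPy sketch of the center score + NMS step as described above (the names center_nms, r_nms, and score_th are illustrative, not taken from the FPCC code):

```python
# Minimal sketch of "center score + NMS": take the highest-scoring point,
# suppress everything within r_nms of it, and repeat. One surviving point
# (the center) then corresponds to one object instance.
import numpy as np

def center_nms(points, scores, r_nms, score_th):
    """points: (N, 3) point cloud; scores: (N,) predicted center scores."""
    order = np.argsort(scores)[::-1]            # highest score first
    suppressed = np.zeros(len(points), dtype=bool)
    centers = []
    for i in order:
        if suppressed[i] or scores[i] < score_th:
            continue
        centers.append(i)                        # keep this point as a center
        dist = np.linalg.norm(points - points[i], axis=1)
        suppressed |= dist < r_nms               # suppress its neighborhood
    return centers                               # indices: one center per instance
```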

I have a question about your idea: if there are two close parts, A and B, and we find three key points for each part, naming them a1, a2, a3 and b1, b2, b3, how do we determine that a1 and b2 belong to different parts?

Maybe I should update my knowledge of 3D keypoint prediction.
(I remember that human-body keypoint prediction on images seems to have a method for determining which person each keypoint belongs to.)


Levaru commented on June 28, 2024

Thank you for the quick reply! Your English is very good :)
If you are unsure about some sentences, then I can highly recommend translating with DeepL. It's way better than Google Translate but I haven't tried it with Japanese yet.

  1. The slide is very helpful; I see that my understanding was mostly correct.
  2. Yes, the middle is empty, but I believe this is a general problem for shapes like this one. A closed box or a cube would have the same problem, since we only see the outer surface. The parts need to be thin in one (e.g. disk) or two (e.g. cylinder) dimensions. Or maybe I misunderstood something.
  3. What do you mean when you say branch? An additional neural network layer?

I have a question about your idea: if there are two close parts, A and B, and we find three key points for each part, naming them a1, a2, a3 and b1, b2, b3, how do we determine that a1 and b2 belong to different parts?

You wouldn't be able to determine it directly. Basically, the idea is that we no longer segment the parts but instead use the DGCNN backbone to predict and collect the keypoints. The (distance) relationships between the keypoints, (a1-a2) & (a2-a3) & (a3-a1), should provide a better match than, for example, (b2-a2) & (a2-a3) & (a3-b2). Of course, there could be a situation (inaccuracies, missing or bad points, etc.) where (b2-a2) & (a2-a3) & (a3-b2) would be the better match, but I hope that the probability of that is low.
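To sketch the distance-consistency idea (my own illustration; group_keypoints, template_d, and tol are hypothetical names, and greedy selection is just one possible strategy):

```python
# Score every candidate triple of detected keypoints against the template
# distances (a1-a2, a2-a3, a3-a1) measured on the CAD model, then keep the
# best-matching, non-overlapping triples: each kept triple ~ one part.
import numpy as np
from itertools import combinations

def group_keypoints(kps, template_d, tol):
    """kps: (N, 3) detected keypoints; template_d: the 3 sorted template distances."""
    candidates = []
    for triple in combinations(range(len(kps)), 3):
        p = kps[list(triple)]
        d = np.sort([np.linalg.norm(p[0] - p[1]),
                     np.linalg.norm(p[1] - p[2]),
                     np.linalg.norm(p[2] - p[0])])
        err = np.abs(d - template_d).max()
        if err < tol:
            candidates.append((err, triple))
    groups, used = [], set()
    for err, triple in sorted(candidates):       # best (lowest error) first
        if used.isdisjoint(triple):
            groups.append(triple)
            used.update(triple)
    return groups
```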

Another problem (but one that is easily solved) is that some positions are indeterminate. For example: we have a pointcloud where only the top surface of a box is visible, and we find the keypoints for the 4 corners. The matching algorithm could now position the box upside down (flipped to the wrong side). This can easily be solved, though, by checking whether the keypoints would even be visible in this box position. I made a quick sketch to show what I mean:

[Image: FPCCDiagram]
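A minimal version of that visibility check could look like the following (a sketch under assumptions: an organized depth map and pinhole intrinsics fx, fy, cx, cy, none of which are part of the original discussion):

```python
# Reject a candidate pose if it places model keypoints in front of the
# measured surface: such keypoints would have been visible, but nothing
# was observed there (e.g. the "upside down box" case).
import numpy as np

def pose_is_plausible(model_kps, pose, depth, fx, fy, cx, cy, eps=0.005):
    """model_kps: (K, 3) keypoints in model frame; pose: 4x4 model-to-camera."""
    kps_cam = (pose[:3, :3] @ model_kps.T).T + pose[:3, 3]
    for X, Y, Z in kps_cam:
        u, v = int(fx * X / Z + cx), int(fy * Y / Z + cy)
        if not (0 <= v < depth.shape[0] and 0 <= u < depth.shape[1]):
            continue                  # outside the image: no evidence either way
        d_meas = depth[v, u]
        if d_meas > 0 and d_meas > Z + eps:
            return False              # keypoint floats in front of the surface
    return True
```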

The method (center score + NMS) is somewhat similar to the keypoint prediction.
(But I haven't read the related papers on 3D keypoint prediction, so please forgive me if I misunderstood it.)
[...]
Maybe I should update my knowledge of 3D keypoint prediction.
(I remember that human-body keypoint prediction on images seems to have a method for determining which person each keypoint belongs to.)

I'm also not familiar with 3D keypoint prediction. I'm just trying to increase the speed and accuracy of my binpicking project. Right now I'm using an object detection model on the 2D image to segment out the parts inside the pointcloud. But this requires a lot of images, and it takes ages to label them, so I was researching other segmentation methods, and that's how I found your paper. The ability to automatically simulate and label several thousand pointclouds is amazing compared to drawing countless bounding boxes for multiple days.


xyjbaal commented on June 28, 2024

Thank you. DeepL is really better than Google Translate.

I suddenly realized I had made a big mistake.
The five parts used in the paper are flat or thin, so there is no problem using the geometric center point as the reference point for calculating the center scores of the other points.
BUT if it is a box or a cube, as you described, there is a big problem.
If it is a box-shaped or cube-shaped object, we should use the center of the visible part as the reference point to calculate the center score.
If it is a ring-shaped object, I think my method is not feasible.

I drew a sketch to explain my meaning.
(I didn't even know we could upload images)

[Image: bug]

Image annotation is really time-consuming. This is why I try to segment parts directly from point clouds.

At the moment, I have two ideas about how to speed things up.

  1. In fact, DGCNN is a relatively slow feature extractor. You may consider replacing it with another one. I've improved DGCNN, and the new feature extractor can be about twice as fast.
    However, I found that sparse convolution can be even faster. I wonder if we can use sparse convolution as a feature extractor. Sparse convolution seems to run only on Linux and only on specific types of GPU.

  2. Our final goal is to estimate the pose. Whether you use ICP or another method, there is a downsampling step. So consider downsampling the point cloud of the whole scene first, then segmenting it, and passing the segmentation result to ICP to calculate the pose (without downsampling again). See the sketch below.
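For what it's worth, idea 2 as a short Open3D sketch (segment_instances stands in for the FPCC inference step and is hypothetical; the file paths, voxel size, and ICP distance are placeholders):

```python
# Downsample the whole scene once, segment it, then run ICP per instance
# on the already-downsampled points (no second downsampling).
import numpy as np
import open3d as o3d

scene = o3d.io.read_point_cloud("scene.pcd")          # placeholder path
model = o3d.io.read_point_cloud("part_model.pcd")     # placeholder path

scene_down = scene.voxel_down_sample(voxel_size=0.002)
labels = segment_instances(np.asarray(scene_down.points))  # hypothetical FPCC call

poses = []
for inst in np.unique(labels[labels >= 0]):
    inst_cloud = scene_down.select_by_index(np.where(labels == inst)[0])
    reg = o3d.pipelines.registration.registration_icp(
        model, inst_cloud, max_correspondence_distance=0.005,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    poses.append(reg.transformation)                   # 4x4 pose per instance
```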


xyjbaal commented on June 28, 2024

What do you mean when you say branch? An additional neural network layer?

Yes, add several structures like the center score branch to predict other key points.

Some of my thoughts:

  1. I use the center point as the reference point because it is well defined, and the local information around the center point is not easily disturbed by the points of other parts.
  2. If I used a corner as the key point, the local information around the corner could easily be disturbed by other parts' points, so I wonder whether the network can learn it.
  3. It may be feasible if multiple points near the middle of the object are used as reference points.


Levaru commented on June 28, 2024

If it is a box-shaped or cube-shaped object, we should use the center of the visible part as the reference point to calculate the center score.

So it's not necessary for the "center" to be a geometric center? How would this work if the "box" has a trapezoidal shape or any other similar shape that is not symmetric? On which side do I put the "center"? If the surface with the center point is on the bottom, then there wouldn't be any point with a high center score visible. Or am I misunderstanding something?

(P.S.: After reading through your paper again I finally got it (I think). My example with the trapezoid shape and the resulting occlusion is stupid, because in that case one should actually use the geometric center.)

What I'm still unsure about is how to choose d_max for the dataset generation and training. If it's too small, I risk the center points being occluded in certain positions, and if it's too big, the center point will be imprecise. The values in your code for the ring and gear shaft also don't seem to correspond to any dimension of the parts' bounding boxes.

In fact, DGCNN is a relatively slow feature extractor. You may consider replacing it with another one. I've improved DGCNN, and the new feature extractor can be about twice as fast.

Yes, I noticed that it's slower than a 2D object detection model, but the time saved on annotating images is really worth it. Is the improved DGCNN already included in this paper's code, or is it the one in your other repository?

However, I found that sparse convolution can be even faster. I wonder if we can use sparse convolution as a feature extractor. Sparse convolution seems to run only on Linux and only on specific types of GPU.

I'm going to look into that, thank you for the tip!

Our final goal is to estimate the pose. Whether you use ICP or another method, there is a downsampling step. So consider downsampling the point cloud of the whole scene first, then segmenting it, and passing the segmentation result to ICP to calculate the pose.

I'm already doing all of that. At the beginning I was actually performing the pose estimation on the whole pointcloud, which was quite slow and inaccurate. My current problem is the segmentation accuracy.

I'm sorry if I'm bothering you with my questions. I believe your method is really great for some shapes and I'm just trying to figure out the limitations. And again, thank you very very much for your answers and thoughts, so far!


xyjbaal commented on June 28, 2024

So it's not necessary for the "center" to be a geometric center? How would this work if the "box" has a trapezoidal shape or any other similar shape that is not symmetric? On which side do I put the "center"? If the surface with the center point is on the bottom, then there wouldn't be any point with a high center score visible. Or am I misunderstanding something?

I think that if it's a box, maybe you should use the center of the visible part as the reference point instead of the geometric center. E.g., if face A of the box is visible, then the reference point is the center of face A; if face B of the box is visible, then the reference point is the center of face B.
[Image: center]

This is what I did when segmenting a new object, the wave-dissipating block. https://www.fudotetra.co.jp/solution/block/tetrapod/

What I'm still unsure about is how to choose d_max for the dataset generation and training. If it's too small, I risk the center points being occluded in certain positions, and if it's too big, the center point will be imprecise. The values in your code for the ring and gear shaft also don't seem to correspond to any dimension of the parts' bounding boxes.

The center score is calculated by the equation s_i = 1 - (||p_i - c_i|| / d)^β; here I set d = d_max, because d_max is more general and easy to define clearly.
In hindsight, I would have treated d as a hyperparameter and chosen an optimal value for each part.
As you said, it is possible to use a different d for each part to get the best result.
It is obviously not reasonable to use, say, the gear's d_max to calculate the center scores of a different part.
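That labeling equation as a small NumPy helper (clipping at zero is my reading, consistent with the remark below that a small d leaves many points with a score of 0):

```python
import numpy as np

def center_scores(points, center, d, beta):
    """s_i = 1 - (||p_i - c_i|| / d)^beta, clipped to [0, 1]."""
    dist = np.linalg.norm(points - center, axis=1)
    return np.clip(1.0 - (dist / d) ** beta, 0.0, 1.0)
```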

What we need to consider is a balancing problem.

Problem with a large d:

  1. In the prediction phase, you need a larger r_nms to suppress other points. This could cause the center of one object to be suppressed by a nearby one; for example, two gears that are parallel and next to each other.

Problems with a small d:

  1. Many points get a score of 0, and the network cannot learn from them.

  2. In the prediction phase, some object points will be occluded (as you said).

The above problems are weaknesses of the NMS algorithm, and I'm trying to modify it. For now, it seems you can simply try different values of d. I think you can gradually reduce d, as I did in the ablation experiment: use a smaller d for the dataset and find a suitable combination of (d, screening radius r_nms, minimum threshold θ_th).

Is the improved DGCNN already included in this paper's code, or is it the one in your other repository?

I expect to publish the new network structure by September. But I don't think it will be faster than sparse convolution.

I'm sorry if I'm bothering you with my questions. I believe your method is really great for some shapes and I'm just trying to figure out the limitations. And again, thank you very very much for your answers and thoughts, so far!

I understand this. FPCC can't be applied to all parts.
Hopefully, we can find a more general method for more objects.
I'm pleased to discuss this with you.
Discussing ideas with people from all over the world is the fascinating part of IT.


Levaru commented on June 28, 2024

I think that if it's a box, maybe you should use the center of the visible part as the reference point instead of the geometric center. E.g., if face A of the box is visible, then the reference point is the center of face A; if face B of the box is visible, then the reference point is the center of face B.

But if I have a full bin of those parts, both sides can be visible depending on the position. So I would need to train both reference points but this would mess up the segmentation, wouldn't it? In that case FPCC would not be suitable for parts where the geometric center is not on the part (like rings) and parts where surfaces are far away from the center (like boxes). Or do you mean that I would need to train two separate models and perform the inference twice?

There is one last thing that I don't understand (maybe it's a knowledge gap):
In the paper you used a d_max value of 0.08 for the gear shaft. The resulting sphere with such a radius would look like this:
[Image: screenshotfpcc]

Looking at Figure 8, this appears to be the case:
[Image: screenshotfpcc2]

In the paper, in section [3.2. Inference Phase] the following is mentioned:

Note that, we find the nearest center point c_k of point p_i in the feature embedding space, and then calculate Euclidean distance d(p_i, c_k) between p_i and c_k in 3D space. If d(p_i, c_k) exceeds d_max, p_i is regarded as noise, and an instance label will not be assigned to p_i.

When applied to the gear shaft with d_max = 0.08, this would mean the ends are never assigned a label, no? But judging from the images in the paper (Figure 7 and Figure 8), this doesn't appear to be the case. How does that work?

Looking at the code in fpcc_test.py, max_feature_distance is set to None. I guess the images from the paper were generated with the same setting; at least, that would explain it from my understanding.
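My reading of the quoted rule, as a sketch (assign_labels and its arguments are illustrative, not the actual fpcc_test.py code):

```python
# Assign each point to the nearest center in the feature embedding space;
# if max_feature_distance is set, points farther than that from their
# center in 3D are treated as noise (label -1). None disables the check.
import numpy as np

def assign_labels(points, embeddings, center_idx, max_feature_distance=None):
    d_feat = np.linalg.norm(
        embeddings[:, None] - embeddings[center_idx][None], axis=2)  # (N, K)
    labels = np.argmin(d_feat, axis=1)          # nearest center in feature space
    if max_feature_distance is not None:
        d_xyz = np.linalg.norm(points - points[center_idx][labels], axis=1)
        labels[d_xyz > max_feature_distance] = -1   # regarded as noise
    return labels
```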

I'm pleased to discuss this with you.
Discussing ideas with people from all over the world is the fascinating part of IT.

I agree :)


xyjbaal commented on June 28, 2024

FPCC isn't suitable for parts where the geometric center is not on the part (like rings).
For the box, I think FPCC might be OK.
Although the center point of each box is different, it might be feasible.
[Image: center2]

Looking at the code in fpcc_test.py, max_feature_distance is set to None. I guess the images from the paper were generated with the same setting; at least, that would explain it from my understanding.

Your understanding is correct.
If you want to enable max_feature_distance, its value should be set strictly to the maximum radius of the object.

The results in the paper should not have used max_feature_distance.


Levaru commented on June 28, 2024

Your understanding is correct.
If you want to enable max_feature_distance, its value should be set strictly to the maximum radius of the object.
The results in the paper should not have used max_feature_distance.

I see, this explains a lot. I was very confused at first because of the difference between the code and paper.

FPCC isn't suitable for parts where the geometric center is not on the part (like rings).
For the box, I think FPCC might be OK.
Although the center point of each box is different, it might be feasible.

I'm sorry, but I still don't understand what you mean by using the center of the visible part. The visible part is always different depending on the object's position inside the bin.


xyjbaal commented on June 28, 2024

Yes, the center point of each object is different.
You can find the center (c1, c2, c3, ...) of the visible part of each object, and then calculate the center scores of the points of the 1st object from c1, the 2nd object from c2, and so on.
I'm not sure if this method will work for your objects.
But I think it's worth a try.
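A rough sketch of this label-generation step (my own illustration; instance_ids is a hypothetical per-point ground-truth instance array):

```python
# For each ground-truth instance, use the mean of its visible points as the
# reference center c_k, then compute that instance's center scores from c_k.
import numpy as np

def visible_center_labels(points, instance_ids, d, beta):
    scores = np.zeros(len(points))
    for k in np.unique(instance_ids):
        if k < 0:
            continue                            # skip background/noise points
        mask = instance_ids == k
        c_k = points[mask].mean(axis=0)         # center of the visible part
        dist = np.linalg.norm(points[mask] - c_k, axis=1)
        scores[mask] = np.clip(1.0 - (dist / d) ** beta, 0.0, 1.0)
    return scores
```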
