
alexma011 / pytorch-polygon-rnn

111 stars · 7 watchers · 42 forks · 50 KB

PyTorch implementation of Polygon-RNN (http://www.cs.toronto.edu/polyrnn/poly_cvpr17/)

License: GNU General Public License v3.0

Python 100.00%
pytorch polygon-rnn cvpr-2017 deep-learning instance-segmentation

pytorch-polygon-rnn's People

Contributors

alexma011, kongsea


pytorch-polygon-rnn's Issues

Visualizing the net-structure when training the net

Hi @AlexMa011, thanks for your great work. When I use the `make_dot` function from the graphviz package to visualize the net structure, I get a weird result.
First, I added two lines of code to the train.py script as follows:

    r = net(x,x1,x2,x3)
    g = make_dot(r, params=dict(net.named_parameters()))
    g.render('graph',view=False)
    result = r.contiguous().view(-1, 28*28+3)

When I run train.py, the program stops at the line `g = make_dot(r, params=dict(net.named_parameters()))`, so I killed the thread with Ctrl+C. I can see
a dot-format file called "graph"; here is its content:

digraph { graph [size="12,12"] node [align=left fontsize=12 height=0.2 ranksep=0.1 shape=box style=filled] 140010753187920 [label=GatherBackward] 140010753109776 -> 140010753187920 140010753109776 [label=ViewBackward] 140010753242896 -> 140010753109776 140010753242896 [label=ThAddmmBackward] 140010753243024 -> 140010753242896 140010753243024 [label=ExpandBackward] 140010753190480 -> 140010753243024 140010753190480 [label=BroadcastBackward] 140010753243280 -> 140010753190480 140010753243280 [label="module.model1.0.weight (64, 3, 3, 3)" fillcolor=lightblue] 140010753243152 -> 140010753190480 140010753243152 [label="module.model1.0.bias (64)" fillcolor=lightblue] 140010753243344 -> 140010753190480 140010753243344 [label="module.model1.1.weight (64)" fillcolor=lightblue] 140010753243408 -> 140010753190480 140010753243408 [label="module.model1.1.bias (64)" fillcolor=lightblue] 140010753243472 -> 140010753190480 140010753243472 [label="module.model1.3.weight (64, 64, 3, 3)" fillcolor=lightblue] 140010753243536 -> 140010753190480 140010753243536 [label="module.model1.3.bias (64)" fillcolor=lightblue] 140010753243600 -> 140010753190480 140010753243600 [label="module.model1.4.weight (64)" fillcolor=lightblue] 140010753243664 -> 140010753190480 140010753243664 [label="module.model1.4.bias (64)" fillcolor=lightblue] 140010753243728 -> 140010753190480 140010753243728 [label="module.model1.7.weight (128, 64, 3, 3)" fillcolor=lightblue] 140010753243792 -> 140010753190480 140010753243792 [label="module.model1.7.bias (128)" fillcolor=lightblue] 140010753243856 -> 140010753190480 140010753243856 [label="module.model1.8.weight (128)" fillcolor=lightblue] 140010753243920 -> 140010753190480 140010753243920 [label="module.model1.8.bias (128)" fillcolor=lightblue] 140010753243984 -> 140010753190480 140010753243984 [label="module.model1.10.weight (128, 128, 3, 3)" fillcolor=lightblue] 140010753244048 -> 140010753190480 140010753244048 [label="module.model1.10.bias (128)" 
fillcolor=lightblue] 140010753244112 -> 140010753190480 140010753244112 [label="module.model1.11.weight (128)" fillcolor=lightblue] 140010753244176 -> 140010753190480 140010753244176 [label="module.model1.11.bias (128)" fillcolor=lightblue] 140010753244240 -> 140010753190480 140010753244240 [label="module.model2.0.weight (256, 128, 3, 3)" fillcolor=lightblue] 140010753244304 -> 140010753190480 140010753244304 [label="module.model2.0.bias (256)" fillcolor=lightblue] 140010753244368 -> 140010753190480 140010753244368 [label="module.model2.1.weight (256)" fillcolor=lightblue] 140010753244432 -> 140010753190480 140010753244432 [label="module.model2.1.bias (256)" fillcolor=lightblue] 140010753244496 -> 140010753190480 140010753244496 [label="module.model2.3.weight (256, 256, 3, 3)" fillcolor=lightblue] 140010753244560 -> 140010753190480 140010753244560 [label="module.model2.3.bias (256)" fillcolor=lightblue] 140010753244624 -> 140010753190480 140010753244624 [label="module.model2.4.weight (256)" fillcolor=lightblue] 140010753244688 -> 140010753190480 140010753244688 [label="module.model2.4.bias (256)" fillcolor=lightblue] 140010753244752 -> 140010753190480 140010753244752 [label="module.model2.6.weight (256, 256, 3, 3)" fillcolor=lightblue] 140010753244816 -> 140010753190480 140010753244816 [label="module.model2.6.bias (256)" fillcolor=lightblue] 140010753244880 -> 140010753190480 140010753244880 [label="module.model2.7.weight (256)" fillcolor=lightblue] 140010753244944 -> 140010753190480 140010753244944 [label="module.model2.7.bias (256)" fillcolor=lightblue] 140010753245008 -> 140010753190480 140010753245008 [label="module.model3.0.weight (512, 256, 3, 3)" fillcolor=lightblue] 140010753245072 -> 140010753190480 140010753245072 [label="module.model3.0.bias (512)" fillcolor=lightblue] 140010753245136 -> 140010753190480 140010753245136 [label="module.model3.1.weight (512)" fillcolor=lightblue] 140010753286224 -> 140010753190480 140010753286224 
[label="module.model3.1.bias (512)" fillcolor=lightblue] 140010753286288 -> 140010753190480 140010753286288 [label="module.model3.3.weight (512, 512, 3, 3)" fillcolor=lightblue] 140010753286352 -> 140010753190480 140010753286352 [label="module.model3.3.bias (512)" fillcolor=lightblue] 140010753286416 -> 140010753190480 140010753286416 [label="module.model3.4.weight (512)" fillcolor=lightblue] 140010753286480 -> 140010753190480 140010753286480 [label="module.model3.4.bias (512)" fillcolor=lightblue] 140010753286544 -> 140010753190480 140010753286544 [label="module.model3.6.weight (512, 512, 3, 3)" fillcolor=lightblue] 140010753286608 -> 140010753190480 140010753286608 [label="module.model3.6.bias (512)" fillcolor=lightblue] 140010753286672 -> 140010753190480 140010753286672 [label="module.model3.7.weight (512)" fillcolor=lightblue] 140010753286736 -> 140010753190480 140010753286736 [label="module.model3.7.bias (512)" fillcolor=lightblue] 140010753286800 -> 140010753190480 140010753286800 [label="module.model4.1.weight (512, 512, 3, 3)" fillcolor=lightblue] 140010753286864 -> 140010753190480 140010753286864 [label="module.model4.1.bias (512)" fillcolor=lightblue] 140010753286928 -> 140010753190480 140010753286928 [label="module.model4.2.weight (512)" fillcolor=lightblue] 140010753286992 -> 140010753190480 140010753286992 [label="module.model4.2.bias (512)" fillcolor=lightblue] 140010753287056 -> 140010753190480 140010753287056 [label="module.model4.4.weight (512, 512, 3, 3)" fillcolor=lightblue] 140010753287120 -> 140010753190480 140010753287120 [label="module.model4.4.bias (512)" fillcolor=lightblue] 140010753287184 -> 140010753190480 140010753287184 [label="module.model4.5.weight (512)" fillcolor=lightblue] 140010753287248 -> 140010753190480 140010753287248 [label="module.model4.5.bias (512)" fillcolor=lightblue] 140010753287312 -> 140010753190480 140010753287312 [label="module.model4.7.weight (512, 512, 3, 3)" fillcolor=lightblue] 140010753287376 -> 
140010753190480 140010753287376 [label="module.model4.7.bias (512)" fillcolor=lightblue] 140010753287440 -> 140010753190480 140010753287440 [label="module.model4.8.weight (512)" fillcolor=lightblue] 140010753287504 -> 140010753190480 140010753287504 [label="module.model4.8.bias (512)" fillcolor=lightblue] 140010753287568 -> 140010753190480 140010753287568 [label="module.convlayer1.0.weight (128, 128, 3, 3)" fillcolor=lightblue] 140010753287632 -> 140010753190480 140010753287632 [label="module.convlayer1.0.bias (128)" fillcolor=lightblue] 140010753287696 -> 140010753190480 140010753287696 [label="module.convlayer1.2.weight (128)" fillcolor=lightblue] 140010753287760 -> 140010753190480 140010753287760 [label="module.convlayer1.2.bias (128)" fillcolor=lightblue] 140010753287824 -> 140010753190480 140010753287824 [label="module.convlayer2.0.weight (128, 256, 3, 3)" fillcolor=lightblue] 140010753287888 -> 140010753190480 140010753287888 [label="module.convlayer2.0.bias (128)" fillcolor=lightblue] 140010753287952 -> 140010753190480 140010753287952 [label="module.convlayer2.2.weight (128)" fillcolor=lightblue] 140010753288016 -> 140010753190480 140010753288016 [label="module.convlayer2.2.bias (128)" fillcolor=lightblue] 140010753288080 -> 140010753190480 140010753288080 [label="module.convlayer3.0.weight (128, 512, 3, 3)" fillcolor=lightblue] 140010753288144 -> 140010753190480 140010753288144 [label="module.convlayer3.0.bias (128)" fillcolor=lightblue] 140010753288208 -> 140010753190480 140010753288208 [label="module.convlayer3.2.weight (128)" fillcolor=lightblue] 140010753288272 -> 140010753190480 140010753288272 [label="module.convlayer3.2.bias (128)" fillcolor=lightblue] 140010753288336 -> 140010753190480 140010753288336 [label="module.convlayer4.0.weight (128, 512, 3, 3)" fillcolor=lightblue] 140010753288400 -> 140010753190480 140010753288400 [label="module.convlayer4.0.bias (128)" fillcolor=lightblue] 140010753288464 -> 140010753190480 140010753288464 
[label="module.convlayer4.2.weight (128)" fillcolor=lightblue] 140010753288528 -> 140010753190480 140010753288528 [label="module.convlayer4.2.bias (128)" fillcolor=lightblue] 140010753288592 -> 140010753190480 140010753288592 [label="module.convlayer5.0.weight (128, 512, 3, 3)" fillcolor=lightblue] 140010753288656 -> 140010753190480 140010753288656 [label="module.convlayer5.0.bias (128)" fillcolor=lightblue] 140010753288720 -> 140010753190480 140010753288720 [label="module.convlayer5.2.weight (128)" fillcolor=lightblue] 140010753288784 -> 140010753190480 140010753288784 [label="module.convlayer5.2.bias (128)" fillcolor=lightblue] 140010753288848 -> 140010753190480 140010753288848 [label="module.linear2.weight (787, 1568)" fillcolor=lightblue] 140010753288912 -> 140010753190480 140010753288912 [label="module.linear2.bias (787)" fillcolor=lightblue] 140010753288976 -> 140010753190480 140010753288976 [label="module.lstmlayer.weight_ih_l0 (6272, 7846)" fillcolor=lightblue] 140010753289040 -> 140010753190480 140010753289040 [label="module.lstmlayer.weight_hh_l0 (6272, 1568)" fillcolor=lightblue] 140010753289104 -> 140010753190480 140010753289104 [label="module.lstmlayer.bias_ih_l0 (6272)" fillcolor=lightblue] 140010753289168 -> 140010753190480 140010753289168 [label="module.lstmlayer.bias_hh_l0 (6272)" fillcolor=lightblue] 140010753289232 -> 140010753190480 ...... }
Finally, I used the command `dot -Tpdf graph -o graph_.pdf` to convert the dot file to PDF for easier viewing.
I got a weird result:
[attached screenshot: the rendered graph]
I don't know what's wrong with my code and would appreciate your help or any suggestions for visualizing the net structure of Polygon-RNN. Thank you again!

How does the `newdataset` handle the polygon-processing?

Hi!

I am currently trying to figure out what the following lines of code do. In the `newdataset` class, how are you encoding each polygon, and how should the output of `newdataset.__getitem__` be interpreted?

point_num = len(json_file['polygon'])
polygon = np.array(json_file['polygon'])
point_count = 2
# img_array = np.zeros([data_num, 3, 224, 224])
label_array = np.zeros([self.length, 28 * 28 + 3])
label_index_array = np.zeros([self.length])
if point_num < self.length - 3:
    for points in polygon:
        index_a = int(points[0] / 8)
        index_b = int(points[1] / 8)
        index = index_b * 28 + index_a
        label_array[point_count, index] = 1
        label_index_array[point_count] = index
        point_count += 1
    label_array[point_count, 28 * 28] = 1
    label_index_array[point_count] = 28 * 28
    for kkk in range(point_count + 1, self.length):
        if kkk % (point_num + 3) == point_num + 2:
            index = 28 * 28
        elif kkk % (point_num + 3) == 0:
            index = 28 * 28 + 1
        elif kkk % (point_num + 3) == 1:
            index = 28 * 28 + 2
        else:
            index_a = int(polygon[kkk % (point_num + 3) - 2][0] / 8)
            index_b = int(polygon[kkk % (point_num + 3) - 2][1] / 8)
            index = index_b * 28 + index_a
        label_array[kkk, index] = 1
        label_index_array[kkk] = index
else:
    scale = point_num * 1.0 / (self.length - 3)
    index_list = (np.arange(0, self.length - 3) * scale).astype(int)
    for points in polygon[index_list]:
        index_a = int(points[0] / 8)
        index_b = int(points[1] / 8)
        index = index_b * 28 + index_a
        label_array[point_count, index] = 1
        label_index_array[point_count] = index
        point_count += 1
    for kkk in range(point_count, self.length):
        index = 28 * 28
        label_array[kkk, index] = 1
        label_index_array[kkk] = index
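My reading of the encoding above, as a hedged sketch (this is my interpretation, not the author's explanation): each vertex in the 224×224 crop is quantized to a 28×28 grid by integer-dividing its coordinates by 8, flattened row-major into an index in [0, 783], and one-hot encoded into a vector of length 28*28+3. The three extra slots (indices 784-786) look like special tokens (end-of-polygon plus two start positions, which would also explain why `point_count` starts at 2):

```python
import numpy as np

GRID = 28          # output grid resolution
CELL = 8           # 224 / 28: pixels per grid cell
EOS = GRID * GRID  # index 784: apparent end-of-polygon token

def encode_vertex(x, y):
    """Quantize a vertex from 224x224 pixel space to a flat 28x28 grid index."""
    index_a = int(x / CELL)          # grid column
    index_b = int(y / CELL)          # grid row
    return index_b * GRID + index_a  # row-major flattening

def one_hot(index, size=GRID * GRID + 3):
    """One-hot label vector matching label_array's row layout."""
    v = np.zeros(size)
    v[index] = 1
    return v

# A vertex at pixel (100, 60) lands in grid cell (col 12, row 7) -> index 208.
idx = encode_vertex(100, 60)
print(idx)  # 208
```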

If you could provide some insight in your thought process, that would be lovely!
Cheers in advance!

LICENSE

Hi there,

Is this an MIT or Apache license?

Horia

Differences to Original Polygon RNN Paper

Hi,

Thanks for the re-implementation you provided. I have two questions about your code:
1. How are you estimating the first vertex of the polygon?
2. Why do you have an LSTM layer after the two-layer ConvLSTM? Is this following the method in the paper, or is there another reason?
Thanks!

Could you please share the datasets?

It is exciting to read this beautiful code, so I starred it immediately. However, I do not understand how to find and prepare the polygon data from the website; in other words, I haven't found any JSON files in the expected format.
Could you please share the datasets, so that we can train and test with this code more easily? Thanks for your kind work!

Running Correction code

Is it possible to run the correction method described in the paper using this code? Essentially, I want to feed in a corrected polygon and generate new predictions from it.

Cityscapes Results

Hi Alex!

Thanks for trying this out. One of the reasons why we have a first point network in our model instead of using any first point in the RNN (using a start token) is because a polygon is circularly symmetrical (which is not the case for language), and therefore the first point is not a well defined object.

It'd be interesting to know what scores you got on Cityscapes since I see you said in another issue that it was better than the polygon-rnn from CVPR 2017.

Thanks,
Amlan

Training is throwing an error

I am getting an error during training. Please help me:

$ python train.py --gpu_id 0 --batch_size 8  --lr 0.0001
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    from test import test
  File "/home/ai/Documents/vineet/pytorch-polygon-rnn/test.py", line 14, in <module>
    from utils.utils import img2tensor
ImportError: No module named 'utils.utils'

I think there should be a file (utils.py) in the utils folder containing img2tensor, iou, and getbboxfromkps. Please include these functions in the utils folder, or change test.py so that

from utils.utils import img2tensor
from utils.utils import iou, getbboxfromkps

can be replaced with other functions.
Or do I need to install packages beyond those in requirements?
Please correct me if I am wrong.
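Until the missing file is added, here is a minimal stand-in for two of the helpers, under my own guesses about their contracts (img2tensor: HWC uint8 image to a normalized CHW float array, which torch.from_numpy could then wrap; iou: overlap ratio of two binary masks). These signatures are assumptions, not the repository's actual API, and getbboxfromkps is omitted because its contract can't be inferred from the import alone:

```python
import numpy as np

def img2tensor(img):
    """Hypothetical stand-in: HWC uint8 image -> CHW float32 array in [0, 1]."""
    return np.transpose(img.astype(np.float32) / 255.0, (2, 0, 1))

def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean masks of the same shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

img = np.full((224, 224, 3), 255, dtype=np.uint8)
print(img2tensor(img).shape)  # (3, 224, 224)
```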

Testing on Cityscapes: the channels are wrong

It is exciting to read this beautiful code, so I starred it immediately. I ran the code following your steps, but why am I getting this error?
Traceback (most recent call last):
  File "test.py", line 130, in <module>
    re = net.module.test(xx, 60)
  File "/home/mbw/poly1/pytorch-polygon-rnn-master/model.py", line 203, in test
    output1 = self.model1(input_data1)
  File "/home/mbw/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mbw/anaconda3/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/mbw/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mbw/anaconda3/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight[64, 3, 3, 3], so expected input[1, 4, 224, 224] to have 3 channels, but got 4 channels instead
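The error says the first convolution's weights expect 3 input channels, but the input tensor has 4, which usually means the image was loaded with an alpha channel (RGBA). A minimal sketch of the likely fix, assuming NCHW layout (with PIL you could instead call `Image.open(path).convert('RGB')` before converting to a tensor):

```python
import numpy as np

# Hypothetical repro of the shape mismatch: a PNG loaded with its alpha
# channel yields 4 channels, while the first conv layer expects 3.
rgba = np.random.rand(1, 4, 224, 224).astype(np.float32)  # NCHW with alpha

# Keep only the R, G, B planes and drop alpha before feeding the network.
rgb = rgba[:, :3, :, :]
print(rgb.shape)  # (1, 3, 224, 224)
```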

Images/performance metrics

Hi,

Thanks for your work on this. Are you able to provide some metrics and/or images of the results you are getting with your implementation in the README?
