ibm / matrix-capsules-with-em-routing

A TensorFlow implementation of "Matrix Capsules with EM Routing" by Hinton et al. (2018).

License: Apache License 2.0

Python 99.71% Shell 0.29%
capsules hinton matrix-capsules em-routing dynamic-routing capsnet capsule-networks

matrix-capsules-with-em-routing's People

Contributors: ashleygritzman, imgbotapp

matrix-capsules-with-em-routing's Issues

failed to run cuBLAS routine cublasGemmBatchedEx issue

Hi Ashley,
Thanks for your great work.
When I ran the code, it failed with the following output:

2019-10-03 13:25:00.047383: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-10-03 13:25:00.477672: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-03 13:25:00.478676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.695
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.53GiB
2019-10-03 13:25:00.478708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-10-03 13:25:08.880050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-03 13:25:08.880113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-10-03 13:25:08.880130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-10-03 13:25:08.881112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7286 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-03 13:25:09.813176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-10-03 13:25:09.813211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-03 13:25:09.813215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-10-03 13:25:09.813218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-10-03 13:25:09.813350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7286 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-03 13:26:52.896021: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
2019-10-03 13:26:52.897701: E tensorflow/stream_executor/cuda/cuda_blas.cc:2574] Internal: failed BLAS call, see log for details
2019-10-03 13:26:53 CRITICAL: Traceback (most recent call last):
File "/home/jeff/anaconda2/envs/tf_36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
return fn(args)
File "/home/jeff/anaconda2/envs/tf_36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/jeff/anaconda2/envs/tf_36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[3612672,4,4], b.shape=[3612672,4,4], m=4, n=4, k=4, batch_size=3612672
[[{{node tower_0/lyr.conv_caps1/votes/MatMul}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/lyr.conv_caps1/votes/Tile_1, tower_0/lyr.conv_caps1/votes/Tile, ^swap_out_tower_0/gradients/tower_0/lyr.conv_caps1/votes/MatMul_grad/MatMul_1_0, ^swap_out_tower_0/gradients/tower_0/lyr.conv_caps1/votes/MatMul_grad/MatMul_1)]]
[[{{node tower_0/class_caps/activation_out/_23}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1082_tower_0/class_caps/activation_out", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

My computer system information:
Linux Ubuntu 16.04
Nvidia GPU GeForce GTX 1070, 8G
CUDA 9.0/cuDNN 7.3
Python 3.6.8
Tensorflow version: 1.11.0-gpu

I first met this problem with CUDA 9.2 and cuDNN 7.6; I downgraded to CUDA 9.0 and cuDNN 7.3, but the issue remains.
I also tried reducing 'batch_size' from 64 to 2, but the problem is the same. Any idea why it fails?
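For reference, one thing I still want to try is letting TensorFlow allocate GPU memory on demand rather than reserving nearly all of it up front (a minimal TF 1.x sketch, not specific to this repo; I am not sure it addresses the cuBLAS failure):

import tensorflow as tf

# Allocate GPU memory on demand instead of claiming almost all of it
# at session creation; large up-front allocations are one common
# trigger for Blas/cuBLAS launch failures.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build and run the training graph here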

pip requirements need to be fixed

I tried to use pip install -r requirements.txt to install the dependencies.
However, some of the pinned versions cannot be resolved:

mkl-fft==1.0.12 (only 1.0.6 is available)
mkl-random==1.0.2 (only 1.0.1.1)
mkl-service==2.0.2 (not found)
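One possible workaround (assuming the code does not depend on those exact patch releases) is to relax the pins in requirements.txt to the nearest versions pip can resolve:

mkl-fft>=1.0.6
mkl-random>=1.0.1

mkl-service may need to come from conda rather than pip, since the pinned build does not appear to be available on PyPI.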

spatial_routing_matrix = utl.create_routing_map(child_space=1, k=1, s=1) ?

Hi Ashley,
In layers.py, 'def fc_caps()' creates the spatial routing matrix with 'spatial_routing_matrix = utl.create_routing_map(child_space=1, k=1, s=1)', where child_space is 1. But I don't think it has to be 1 at this point: given the tensor shape flow leading up to it, (64, 7, 7, 8, *) ---> (64, 5, 5, 16, *), the child_space should be 5 instead of 1.
Moreover, with child_space=1 the generated spatial_routing_matrix has shape (1, 1), which would make the subsequent 'em_routing()' incorrect.
What do you think? Maybe my reasoning is wrong somewhere.
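To illustrate my concern with a minimal numpy sketch (assuming create_routing_map returns a binary child-to-parent connectivity matrix of shape (child_space**2, parent_space**2)):

import numpy as np

# With child_space=1, k=1, s=1 the routing map degenerates to a single
# entry: one child position connected to one parent position.
trivial_map = np.ones((1 * 1, 1 * 1))   # shape (1, 1)

# Whereas with a 5x5 child grid feeding a fully connected capsule layer,
# I would expect every spatial position to connect to the one parent:
expected_map = np.ones((5 * 5, 1))      # shape (25, 1)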

Kindly regards
Jeff

Routing by agreement with Transformer-based for NMT

Hello all :)

I'm trying to use routing by agreement with a Transformer-based model for an NMT task. The proposed idea is to use each attention head's output as an input capsule for a capsule network, fusing the semantic and spatial information from the different heads to help improve the correctness of the output sentence. As below:

[figure: routing]

The implementation code is here, and the PyTorch issue is here.

I have gotten quite bad results so far. Kindly, I would appreciate any suggestions on what to work on.
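To make the setup concrete, here is a minimal sketch of how I form the input capsules from the attention heads (hypothetical shapes; the real model uses the Transformer's own dimensions):

import tensorflow as tf

batch, seq_len, num_heads, d_head = 16, 32, 8, 64

# One (batch, seq_len, d_head) output tensor per attention head.
head_outputs = [tf.random_normal((batch, seq_len, d_head))
                for _ in range(num_heads)]

# Stack the heads as input capsules: (batch, seq_len, num_heads, d_head).
# A routing-by-agreement layer then fuses the num_heads capsules at each
# position into a single fused representation.
input_capsules = tf.stack(head_outputs, axis=2)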

I look forward to your feedback.

Try to continue to train from a checkpoint

Hi guys,

When I try to continue training the network from a checkpoint directory, I use the "load_dir" flag:
python3 train_val.py --load_dir=./logs/smallNORB/20200103_/train/checkpoint
But the code returns:
"load_ckpt directory exists but cannot find a valid checkpoint to resore, consider using the reset flag"
I have checked the directory and there are checkpoints from previous training.
Did I make a mistake somewhere in this process?
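For reference, this is how I checked what TensorFlow considers the latest checkpoint (a quick TF 1.x snippet; note that latest_checkpoint expects the training directory containing the 'checkpoint' index file, not the 'checkpoint' file itself, in case that matters):

import tensorflow as tf

# Returns the newest checkpoint prefix recorded in the 'checkpoint'
# index file, or None if no valid checkpoint is found.
ckpt = tf.train.latest_checkpoint("./logs/smallNORB/20200103_/train")
print(ckpt)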

cost_j_h = (beta_v + 0.5*tf.log(var_j)) ?

Hi Ashley,

For 'def m_step()' in em_routing.py, I see that you compute 'cost_j_h = (beta_v + 0.5*tf.log(var_j)) * rr_prime_sum * layer_norm_factor' before 'cost_j = tf.reduce_sum(cost_j_h, axis=-1, keepdims=True, name="cost_j")'.
My question is whether this leads to beta_v being counted 'h' times, because 'beta_v + 0.5*tf.log(var_j)' broadcasts beta_v over all h elements of the last dimension.
According to formula (2) in the 'Matrix Capsules with EM Routing' paper, it should be something like (beta_v + sum of cost_j_h) instead of sum of (beta_v + cost_j_h). What do you think? Maybe I am wrong.
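For reference, my reading of the M-step in the paper (paper notation, where beta_u plays the role of beta_v here) is:

\begin{aligned}
\left(\sigma_h^j\right)^2 &= \frac{\sum_i r_{ij}\,\bigl(V_{ih}^j - \mu_h^j\bigr)^2}{\sum_i r_{ij}} \\
\mathrm{cost}_h &\leftarrow \bigl(\beta_u + \log \sigma_h^j\bigr) \sum_i r_{ij} \\
a_j &\leftarrow \mathrm{logistic}\Bigl(\lambda\bigl(\beta_a - \textstyle\sum_h \mathrm{cost}_h\bigr)\Bigr)
\end{aligned}

So the question is whether beta_u is meant to be added once per component h (as the broadcast does) or once per capsule j.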

Kindly regards
Jeff
