
lmnt-com / haste

Haste: a fast, simple, and open RNN library

License: Apache License 2.0



haste's Issues

Install fails on AWS DLAMI v. 29.0

As you know the code does not work on K80 for me now, so I went on to experiment on Amazon EC2.

I have launched a p2.xlarge instance with Deep Learning AMI (Amazon Linux 2) Version 29.0.

I executed the following commands:

source activate tensorflow2_p36
git clone https://github.com/lmnt-com/haste
cd haste
make haste_tf
pip install haste_tf-*.whl
python
import haste_tf

Now I get the following error:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/haste_tf/__init__.py", line 22, in <module>
    from .gru import GRU
  File "/home/ec2-user/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/haste_tf/gru.py", line 32, in <module>
    LIB = tf.load_op_library(pkg_resources.resource_filename(__name__, 'libhaste_tf.so'))
  File "/home/ec2-user/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/load_library.py", line 57, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: libcudart.so.10.0: cannot open shared object file: No such file or directory

Experiments with adding things to LD_LIBRARY_PATH all failed. As a matter of fact, LD_LIBRARY_PATH already contains the CUDA libs, and libcudart.so.10.0 is there.
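A quick way to check what the dynamic loader actually sees, from the same Python environment (standard library only):

import ctypes
ctypes.CDLL('libcudart.so.10.0')  # raises OSError if the loader cannot find it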

Zoneout remains during eval()

I noticed that zoneout is still applied even after I call model.eval(), and I'm assuming this is not the desired behavior. I'm therefore manually changing the zoneout value to 0 during evaluation, roughly as sketched below. I only tried it with IndRNN in PyTorch.
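Note that toggling a plain zoneout attribute is my assumption based on the constructor argument, not a documented API:

import haste_pytorch as haste

model = haste.IndRNN(input_size=128, hidden_size=256, zoneout=0.1)

def set_zoneout(module, rate):
    # Haste layers are torch.nn.Module subclasses, so .modules() walks them.
    for m in module.modules():
        if hasattr(m, 'zoneout'):
            m.zoneout = rate

model.eval()
set_zoneout(model, 0.0)   # remember to restore the rate before model.train()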

PIP Install

With PyTorch, I am used to installing via pip without needing to build the library. Would it be possible to support that? I don't have much experience with building libraries, which is why I see this as important.

It also makes it much easier to work with this library on a laptop, which is my primary development environment.

lib/blas.h:25:50: error: field initializer is not constant

Hi,

I was trying to build haste_pytorch, but it failed at the first step ("make haste_pytorch"). Here is the error message I got:

$ make haste_pytorch 
nvcc -ccbin g++ -gencode arch=compute_37,code=compute_37 -gencode arch=compute_60,code=compute_60 -c lib/lstm_forward_gpu.cu.cc -o lib/lstm_forward_gpu.o -std=c++11 -x cu -Xcompiler -fPIC -I/usr/include/eigen3 -I/lib/cuda-9.0/include -Ilib -O3
lib/blas.h:25:50: error: field initializer is not constant
   static constexpr decltype(cublasHgemm)* gemm = cublasHgemm;
                                                  ^
lib/blas.h:30:53: error: field initializer is not constant
   static constexpr decltype(cublasSgemm)* gemm = cublasSgemm;
                                                     ^
lib/blas.h:35:53: error: field initializer is not constant
   static constexpr decltype(cublasDgemm)* gemm = cublasDgemm;
                                                     ^
make: *** [Makefile:30: haste] Error 1

Failing to install / work

Using PyTorch 1.5, CUDA 10.1, and Python 3.7.7 in a clean virtual environment, with the haste-0.4 code from under releases.

It properly compiles libhaste.a; then, tracing through the makefile, it attempts to execute:

python setup.py haste_pytorch -q bdist_wheel

which results in:

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: invalid command 'bdist_wheel'

After manually running "setup.py haste_pytorch install", since there is no bdist_wheel in the /tmp Python packaging directory (the bdist_wheel command is provided by the wheel package, which isn't installed in this environment):

----> 1 import haste_pytorch                                                    
                                                                                
.../testing/lib/python3.7/site-packages/haste_pytorch-0.
4.0-py3.7-linux-x86_64.egg/haste_pytorch/__init__.py in <module>                
     19                                                                         
     20                                                                         
---> 21 from .gru import GRU                                                    
     22 from .indrnn import IndRNN                                              
     23 from .lstm import LSTM                                                  
                                                                                
.../testing/lib/python3.7/site-packages/haste_pytorch-0.
4.0-py3.7-linux-x86_64.egg/haste_pytorch/gru.py in <module>                     
     17                                                                         
     18                                                                         
---> 19 import haste_pytorch_lib as LIB                                         
     20 import torch                                                            
     21 import torch.nn as nn       

ImportError: libc10.so: cannot open shared object file: No such file or directory

Creating and installing an egg gets the same error.

Numerical precision problem encountered

Hello, I found there may be some numerical precision problems in some of the RNN routines.

I compiled haste_pytorch and modified the check function 'self_consistency' in haste/validation/pytorch.py as shown below. I use the function 'cal_err_pointwise' to compute a pointwise relative error between two tensors; this way I know how close two tensors are and can check whether the outputs of the RNNs are reasonable.
I compared the output tensors of the CUDA routine and the TorchScript routine in the PyTorch wrapper, and found that the max relative error is on the order of 0.1 to 1, which is fairly large. Below are my check function and results.

import numpy as np
import torch

def cal_err_pointwise(a, b):
  if a is None or b is None:
      return None
  a = a.cpu().detach().numpy()
  b = b.cpu().detach().numpy()
  if np.all(np.equal(a, b)):
      return 0.0, 0.0  # keep the (mean, max) shape even for exact matches
  diff = a - b
  denom = np.abs(b) + np.ones_like(b) * 1e-10  # avoid division by zero
  ratio = np.abs(diff) / denom
  err_mean = np.mean(ratio)
  err_max = np.max(ratio)
  return err_mean, err_max


cal_err = cal_err_pointwise

def self_consistency(rnn, x):
  x_cuda = x.clone().cuda()
  x_cpu = x.clone().cpu()
  x_cuda.requires_grad_(True)
  x_cpu.requires_grad_(True)

  y1, _ = rnn.cuda().forward(x_cuda)
  y1.backward(torch.ones_like(y1))
  y2, _ = rnn.cpu().forward(x_cpu)
  y2.backward(torch.ones_like(y2))

  g1 = x_cpu.grad.data
  g2 = x_cuda.grad.data

  print('-' * 8 + " self consistency " + '-' * 8)
  print("output rel err (mean, max) : {0}".format(cal_err(y1, y2)))
  print("grad   rel err (mean, max) : {0}".format(cal_err(g1, g2)))
  print(torch.max(torch.abs(y1.cpu()-y2.cpu())))
  print(torch.max(torch.abs(g1.cpu()-g2.cpu())))

My check results; the values wrapped in ** are the large relative errors:

[indrnn]
-------- self consistency --------
output rel err (mean, max) : (1.4625937e-06, **0.28469396**)
grad   rel err (mean, max) : (**0.0006560313**, **669.45105**)
tensor(6.5565e-07, grad_fn=<MaxBackward1>)
tensor(3.3379e-06)

[layer_norm_gru]
-------- self consistency --------
output rel err (mean, max) : (5.0523818e-06, **4.9286914**)
grad   rel err (mean, max) : (1.2550343e-05, **4.330911**)
tensor(2.5630e-06, grad_fn=<MaxBackward1>)
tensor(1.2398e-05)

[layer_norm_indrnn]
-------- self consistency --------
output rel err (mean, max) : (1.2726755e-06, **0.087144986**)
grad   rel err (mean, max) : (6.146369e-06, **0.6983186**)
tensor(1.1474e-06, grad_fn=<MaxBackward1>)
tensor(4.7684e-06)

[layer_norm_lstm]
-------- self consistency --------
output rel err (mean, max) : (5.336079e-06, **0.3929247**)
grad   rel err (mean, max) : (2.4412904e-05, **1.0231713**)
tensor(1.4633e-05, grad_fn=<MaxBackward1>)
tensor(0.0470)

[lstm]
-------- self consistency --------
output rel err (mean, max) : (1.0738823e-06, **0.34573397**)
grad   rel err (mean, max) : (2.7127278e-06, **0.19694966**)
tensor(6.2585e-07, grad_fn=<MaxBackward1>)
tensor(3.4571e-06)
    native consistency
tensor(1.4901e-07, device='cuda:0', grad_fn=<MaxBackward1>)
tensor(9.5367e-07, device='cuda:0')

haste_pytorch: Gradient for kernel/recurrent_kernel becomes zero when trained on gpu

Hi, I have been trying haste_pytorch (the training speed of haste is phenomenal!), but I found that the gradients for kernel/recurrent_kernel become zero when the model is trained on the GPU. Below is a simple code snippet I used to test this:

import torch
import haste_pytorch as haste

lstm_layer = haste.LSTM(input_size=128, hidden_size=256, batch_first=True)
output = torch.nn.Linear(256*5, 1)

lstm_layer.cuda()
output.cuda()

x = torch.rand([1, 5, 128]).cuda()
target = torch.zeros(1).cuda()
loss_func = torch.nn.MSELoss()
optim = torch.optim.Adam(list(lstm_layer.parameters()) + list(output.parameters()))

for i in range(5):
    y, _ = lstm_layer(x)
    y = y.contiguous().view(1,-1)
    y = output(y).squeeze()

    loss = loss_func(y, target)
    loss.backward()
    optim.step()
    for n, p in lstm_layer.named_parameters():
        print(n, p.grad)
    optim.zero_grad()

Print out:
kernel tensor([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0') recurrent_kernel tensor([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0') bias tensor([-1.8202e-10, 3.7714e-09, 2.8942e-09, ..., 1.0455e-08, 2.6969e-09, 1.6647e-08], device='cuda:0')

The gradients for kernel/recurrent_kernel become non-zero once the "cuda()" calls are replaced with "cpu()".

I'd be most grateful if you could provide some insight on this.

Many thanks for your help.

Make haste does not work (C++ compile error - field initializer is not constant)

Downloading haste and trying to build it with 'make haste' results in the following C++ compiler errors:


lib/blas.h:25:50: error: field initializer is not constant
static constexpr decltype(cublasHgemm)* gemm = cublasHgemm;
^
lib/blas.h:30:53: error: field initializer is not constant
static constexpr decltype(cublasSgemm)* gemm = cublasSgemm;
^
lib/blas.h:35:53: error: field initializer is not constant
static constexpr decltype(cublasDgemm)* gemm = cublasDgemm;
^
make: *** [haste] Error 1

My assumption (which might well be wrong) is that in the CUDA Toolkit versions I use (tried 10.2 and 10.1) NVIDIA changed the signatures of cublas[H/S/D]gemm and they no longer work as constant initializers.

Can you please look into it, or at least provide the exact version of CUDA you have successfully built haste against?

haste_pytorch does not install properly with conda cudatoolkit?

Hi, I've been trying to install haste on my 3 machines, with only one success. (One of them installed, but segfaults as soon as I move tensors to the GPU for use with haste; the other just errors out at some g++ compile step during the wheel build.)

So I moved to Docker, using repo2docker to make the process easier for my coworkers. However, the image build still crashes at the haste-pytorch build stage, which uses conda for the build process. Since the haste PyPI documentation states that CUDA Toolkit 10.1+ is required, I assumed the conda-provided cudatoolkit would suffice, but it seems that for the build-from-scratch process this is not the case?

If I'm doing something wrong/misinterpreting, feedback would be more than welcome!

Support zoneout on lstm cell state and add recurrent dropout

Hi, any plans regarding these two requests?

  1. LSTM zoneout on the cell state, the same as for the hidden state:

    if zoneout_prob:
        if training:
            h[-1] = (h[-1] - h[-2]) * zoneout_mask[t] + h[-2]
        else:
            h[-1] = zoneout_prob * h[-2] + (1 - zoneout_prob) * h[-1]

  2. add recurrent dropout, the same as in Keras (a sketch follows below):
    https://github.com/tensorflow/tensorflow/blob/fcc4b966f1265f466e82617020af93670141b009/tensorflow/python/keras/layers/recurrent.py#L2450-L2459

Thanks!
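For reference, a rough sketch of what the Keras-style recurrent dropout in point 2 does, written as a plain PyTorch loop; the cell, shapes, and names here are illustrative, not Haste internals:

import torch

def rnn_with_recurrent_dropout(cell, x, h0, dropout_prob, training):
    # x: [time, batch, input], h0: [batch, hidden]
    if training and dropout_prob > 0:
        # One mask per sequence (Keras-style), rescaled to preserve expectations.
        mask = (torch.rand_like(h0) > dropout_prob).float() / (1.0 - dropout_prob)
    else:
        mask = torch.ones_like(h0)
    h, outputs = h0, []
    for t in range(x.size(0)):
        h = cell(x[t], h * mask)   # dropout on the recurrent connection only
        outputs.append(h)
    return torch.stack(outputs), h

# e.g. cell = torch.nn.RNNCell(128, 256)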

Segmentation fault on Cuda 10.0

Hello @sharvil ,

I've been using the CPU IndRNN implementation of haste during my development. The results have been great.

When I moved to production, which uses the CUDA version of IndRNN, I hit a segmentation fault at the very first start of training, every time.

I tried to get a stack trace of the error, attached below, but it does not seem very informative. Is there anything that can fix this?

Python 3.8, GPU T4 16 GB, PyTorch 1.6, CUDA 10.0.

Edit: after upgrading to CUDA 10.2, it worked.

WARNING:root:Start a training with 400 max_iterations, using a INDRNN with 2 layers , and 384 of hidden size.
[Thread 0x7fff87baf700 (LWP 5801) exited]
[Thread 0x7fff8a3b0700 (LWP 5800) exited]
[Thread 0x7fff8abb1700 (LWP 5799) exited]
[Thread 0x7fff9fdc4700 (LWP 5798) exited]
[Thread 0x7fffa05c5700 (LWP 5797) exited]
[Thread 0x7fffa2dc6700 (LWP 5796) exited]
  0%|          | 0/1 [00:00<?, ?it/s]
Thread 1 "python3.8" received signal SIGSEGV, Segmentation fault.
__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:65
65      ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007fff4bde2132 in ?? () from /usr/local/cuda/lib64/libcublas.so.10.0
#2  0x00007fff4bddb289 in ?? () from /usr/local/cuda/lib64/libcublas.so.10.0
#3  0x00007fff4bdde15f in cublasSgemm_v2 () from /usr/local/cuda/lib64/libcublas.so.10.0
#4  0x00007fff60c9376d in haste::v0::indrnn::ForwardPass<float>::Run (this=this@entry=0x7fffffff9968, steps=steps@entry=934, W=<optimized out>, u=0x7fff01a5b600, b=b@entry=0x7fff01a5bc00, x=x@entry=0x7ffefe800000, 
    h=0x7ffefed5e400, workspace=0x7ffefeebce00, zoneout_prob=zoneout_prob@entry=0.5, zoneout_mask=0x7ffefec00000) at lib/indrnn_forward_gpu.cu.cc:139
#5  0x00007fff60c75510 in (anonymous namespace)::<lambda()>::<lambda()>::operator() (__closure=<optimized out>) at frameworks/pytorch/indrnn.cc:60
#6  (anonymous namespace)::<lambda()>::operator() (__closure=<optimized out>) at frameworks/pytorch/indrnn.cc:60
#7  (anonymous namespace)::indrnn_forward (training=<optimized out>, zoneout_prob=<optimized out>, x=..., h0=..., kernel=..., recurrent_scale=..., bias=..., zoneout_mask=...) at frameworks/pytorch/indrnn.cc:60
#8  0x00007fff60c77493 in pybind11::detail::argument_loader<bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>::call_impl<at::Tensor, at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor), 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, pybind11::gil_scoped_release> (f=<optimized out>, this=0x7fffffff9aa0)
    at /usr/local/lib/python3.8/dist-packages/torch/include/pybind11/cast.h:1931
#9  pybind11::detail::argument_loader<bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>::call<at::Tensor, pybind11::gil_scoped_release, at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor)>(at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor)) && (f=<optimized out>, this=0x7fffffff9aa0)
    at /usr/local/lib/python3.8/dist-packages/torch/include/pybind11/cast.h:1908
#10 void pybind11::cpp_function::initialize<at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor), at::Tensor, bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, pybind11::name, pybind11::scope, pybind11::sibling, char [15], pybind11::call_guard<pybind11::gil_scoped_release> >(at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor), at::Tensor (*)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [15], pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (__closure=<optimized out>, call=...)
    at /usr/local/lib/python3.8/dist-packages/torch/include/pybind11/pybind11.h:155
#11 0x00007fff60c5f47d in pybind11::cpp_function::dispatcher (self=<optimized out>, args_in=
    (True, <float at remote 0x7ffff6964110>, <Tensor at remote 0x7fff8f1b3440>, <Tensor at remote 0x7fff8f206380>, <Parameter at remote 0x7fff8f17e680>, <Parameter at remote 0x7fff8f17e700>, <Parameter at remote 0x7fff8f17e740>, <Tensor at remote 0x7fff8f206a80>), kwargs_in=0x0) at /usr/local/lib/python3.8/dist-packages/torch/include/pybind11/pybind11.h:620
#12 0x00000000004fac34 in cfunction_call_varargs (kwargs=<optimized out>, args=<optimized out>, func=<built-in method indrnn_forward of PyCapsule object at remote 0x7fff8f260d80>) at ../Objects/call.c:742
#13 PyCFunction_Call () at ../Objects/call.c:772
#14 0x000000000055adf7 in do_call_core (kwdict=0x0, 
    callargs=(True, <float at remote 0x7ffff6964110>, <Tensor at remote 0x7fff8f1b3440>, <Tensor at remote 0x7fff8f206380>, <Parameter at remote 0x7fff8f17e680>, <Parameter at remote 0x7fff8f17e700>, <Parameter at remote 0x7fff8f17e740>, <Tensor at remote 0x7fff8f206a80>), func=<built-in method indrnn_forward of PyCapsule object at remote 0x7fff8f260d80>, tstate=<optimized out>) at ../Python/ceval.c:4983
#15 _PyEval_EvalFrameDefault () at ../Python/ceval.c:3559
#16 0x0000000000555060 in PyEval_EvalFrameEx (throwflag=0, 
    f=Frame 0x7fff8f162040, for file /usr/local/lib/python3.8/dist-packages/haste_pytorch/indrnn.py, line 60, in forward (ctx=<IndRNNFunctionBackward at remote 0x7fff8f1f2c80>, training=True, zoneout_prob=<float at remote 0x7ffff6964110>, inputs=(<Tensor at remote 0x7fff8f1b3440>, <Tensor at remote 0x7fff8f206380>, <Parameter at remote 0x7fff8f17e680>, <Parameter at remote 0x7fff8f17e700>, <Parameter at remote 0x7fff8f17e740>, <Tensor at remote 0x7fff8f206a80>))) at ../Python/ceval.c:741
#17 _PyEval_EvalCodeWithName () at ../Python/ceval.c:4298
#18 0x00000000004f9d9d in _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fff8f22ed78, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:435
#19 0x00000000004fbae2 in PyVectorcall_Call (kwargs=0x0, tuple=<optimized out>, callable=<function at remote 0x7fff8f26f5e0>) at ../Objects/call.c:199
#20 PyObject_Call (kwargs=0x0, args=<optimized out>, callable=<function at remote 0x7fff8f26f5e0>) at ../Objects/call.c:227
#21 PyEval_CallObjectWithKeywords (kwargs=0x0, args=<optimized out>, callable=<function at remote 0x7fff8f26f5e0>) at ../Objects/call.c:809
#22 PyObject_CallObject () at ../Objects/call.c:817
#23 0x00007ffff1d06a45 in THPFunction_apply(_object*, _object*) () from /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so
#24 0x00000000004facba in cfunction_call_varargs (kwargs=<optimized out>, args=<optimized out>, func=<built-in method apply of FunctionMeta object at remote 0x446cd20>) at ../Objects/call.c:757
#25 PyCFunction_Call () at ../Objects/call.c:772
#26 0x00000000004f95d9 in _PyObject_MakeTpCall () at ../Objects/call.c:159
#27 0x000000000055ad0e in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=<built-in method apply of FunctionMeta object at remote 0x446cd20>) at ../Include/cpython/abstract.h:125
#28 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0xb0a990) at ../Python/ceval.c:4963
#29 _PyEval_EvalFrameDefault () at ../Python/ceval.c:3469
#30 0x00000000004f9d0a in function_code_fastcall (globals=<optimized out>, nargs=4, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
#31 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fff8f1677f0, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:410

TF: Dropout still applies on inference if training = False

I think it affects all layers that have 'dropout': at inference time,
tf.nn.dropout(weights['recurrent_kernel'], rate=self.dropout)
is passed on to your C++ / CUDA part. Something of this sort would be more appropriate:
tf.nn.dropout(weights['recurrent_kernel'], rate=(self.dropout if training else 0.0))

I am not sure if zoneout is applied correctly under the hood either; does the 'training' parameter to the CUDA part switch it off correctly? I hacked around it by amending the relevant code as follows:
h, c, _ = LIB.haste_lstm(
    x,
    weights['kernel'],
    tf.nn.dropout(weights['recurrent_kernel'], rate=(self.dropout if training else 0.0)),
    weights['bias'],
    zoneout_mask if training else tf.zeros([0, 0, 0], dtype=self.dtype),
    training=training,
    zoneout_prob=(self.zoneout if training else 0.0))

Install on pip on systems without cuda

Hello,
In our CI we install haste_pytorch, but it fails because we do not have CUDA. On #2 you said it is possible to run in CPU-only scenarios.

Would it be possible to install haste_pytorch without requiring CUDA?

Multiple Layers LSTM

I really want to use the Haste API to build a multi-layer LSTM. But I realized that, in order to do so, you need the intermediate hidden states of the previous layer to feed the next layer, and I see no trivial solution to this with your API. I wonder how the haste developer community deals with multiple LSTM layers.
Thank you so much for your time, and have a nice day.
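Edit: for anyone else hitting this, a minimal sketch of manual stacking. It assumes (as the README examples suggest) that haste.LSTM returns the full per-timestep output sequence, so the next layer can consume it directly:

import torch
import haste_pytorch as haste

layer1 = haste.LSTM(input_size=128, hidden_size=256).cuda()
layer2 = haste.LSTM(input_size=256, hidden_size=256).cuda()

x = torch.rand([25, 5, 128]).cuda()  # [time, batch, features]
h1, _ = layer1(x)                    # hidden states for every timestep
y, state = layer2(h1)                # next layer consumes the full sequence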

haste_tf\libhaste_tf.so not found

Hi, I tried to use haste for TF to test recurrent_dropout.
However, I got the error message below while running
import haste_tf as haste

I am using Windows 10 and Anaconda. I installed via pip.

That's the stack trace:

NotFoundError                             Traceback (most recent call last)
<ipython-input-1-9cc6bdc626a2> in <module>
----> 1 import haste_tf as haste
      2 import tensorflow as tf
      3 
      4 import tensorflow_addons as tfa
      5 from tensorflow import keras

~\anaconda3\envs\tf_nightly_env\lib\site-packages\haste_tf\__init__.py in <module>
     20 
     21 from ._version import __version__  # generated in setup.py
---> 22 from .gru import GRU
     23 from .gru_cell import GRUCell
     24 from .indrnn import IndRNN

~\anaconda3\envs\tf_nightly_env\lib\site-packages\haste_tf\gru.py in <module>
     30 
     31 
---> 32 LIB = tf.load_op_library(pkg_resources.resource_filename(__name__, 'libhaste_tf.so'))
     33 
     34 

~\anaconda3\envs\tf_nightly_env\lib\site-packages\tensorflow\python\framework\load_library.py in load_op_library(library_filename)
     56     RuntimeError: when unable to load the library or get the python wrappers.
     57   """
---> 58   lib_handle = py_tf.TF_LoadLibrary(library_filename)
     59   try:
     60     wrappers = _pywrap_python_op_gen.GetPythonWrappers(

Stateful in Pytorch

Inspecting Pytorch's source code, I don't think stateful=True is supported, though there's a custom implementation, so it appears doable. Any planned support? It's crucial in my application

README PyTorch example

This is minor, but the three PyTorch layers defined in the README should be put on the GPU, e.g.

norm_lstm_layer = haste.LayerNormLSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05).cuda()

since the input is a CUDA tensor.

Bidirectional masking?

Hi!

I found that this library supports the lengths parameter for the LSTM. What does this mean for the backward component of a bidirectional LSTM, if there is padding at the end of the sequence? Is the padding ignored?

Thanks!

HASTE produces wrong gradients on K80 device

I have tested HASTE on two different instance types on AWS (for reproducibility):

p2.xlarge (K80 instance)
p3.2xlarge (V100 instance)

Both instances were using stock Deep Learning AMI (Amazon Linux 2) Version 29.0 - ami-0b0b075706e19de29

The following sequence of commands was used to install HASTE:

(0) Change the symlink of /usr/local/cuda to point from /usr/local/cuda-10.0 to /usr/local/cuda-10.1 (see another issue: without this, HASTE does not install properly).
(1) source activate tensorflow2_p36
(2) git clone https://github.com/lmnt-com/haste
(3) cd haste
(4) make haste_tf
(5) pip install haste_tf-*.whl

then from a Jupyter notebook the following:

%env CUDA_VISIBLE_DEVICES=0
import numpy as np
import pickle
import tensorflow as tf
#gpus = tf.config.experimental.list_physical_devices('GPU')
#tf.config.experimental.set_memory_growth(gpus[0], True)
import haste_tf as haste
from tensorflow.python.keras import layers as L
from tensorflow.python.keras import backend as K

embedding_size = 100 #n_channels
lstm_nunits = 200
ntimestamps = 300
batch_size = 16

class HasteLSTM(tf.keras.layers.Layer):
    def __init__(self, num_units, dropout, zoneout, shape):
        super(HasteLSTM, self).__init__()
        self.haste_lstm = haste.LSTM(num_units=num_units, dropout=dropout, zoneout=zoneout, direction='unidirectional')
        self.haste_lstm.build(shape)

    def call(self, inputs, training):
        return self.haste_lstm(inputs, training=training)

haste_lstm = HasteLSTM(lstm_nunits, 0.00, 0.00, [batch_size, ntimestamps, embedding_size])

#not really a CuDNN but a normal LSTM, so number of parameters matches
cudnn_lstm = L.LSTM(lstm_nunits, return_sequences = True, unit_forget_bias = False)



dummy_input  = tf.random.normal([batch_size, ntimestamps, embedding_size])
dummy_target = np.zeros(shape=(batch_size, ntimestamps, lstm_nunits))

for i in range(dummy_target.shape[0]):
    for j in range(dummy_target.shape[1]):
        dummy_target[i,j,np.random.randint(0, lstm_nunits)] = 1 #one in random position for each timestamp


input_ = L.Input(shape = [ntimestamps, embedding_size])
model_ = haste_lstm(input_, training = True)
if isinstance(model_, tuple): model_ = model_[0] #take only output, no states
model_ = K.softmax(model_) #simple classification task

model_haste = tf.keras.Model(inputs=input_, outputs=model_, name='haste_model')

input_ = L.Input(shape = [ntimestamps, embedding_size])
model_ = cudnn_lstm(input_, training = True)
if isinstance(model_, tuple): model_ = model_[0] #take only output, no states
model_ = K.softmax(model_) #simple classification task

model_cudnn = tf.keras.Model(inputs=input_, outputs=model_, name='cudnn_model')

total_trainable = 0
haste_trainable = []
for w in haste_lstm.haste_lstm.trainable_variables:
    K.set_value(w, np.zeros_like(w.numpy()))
    haste_trainable.append(w)
    total_trainable += w.numpy().flatten().shape[0]
print("HASTE has total %d trainable variables!" % total_trainable)

total_trainable = 0
cudnn_trainable = []
for w in cudnn_lstm.trainable_weights:
    K.set_value(w, np.zeros_like(w.numpy()))
    cudnn_trainable.append(w)
    total_trainable += w.numpy().flatten().shape[0]
print("CuDNN has total %d trainable variables!" % total_trainable)


#check HASTE gradients on the dummy example
with tf.GradientTape() as tape:
    prediction = model_haste(dummy_input, training=True)
    loss = tf.keras.losses.categorical_crossentropy(dummy_target, prediction)
    
gradients = tape.gradient(loss, haste_trainable)

print("HASTE maxabs of each grad:")
for grad in gradients:
    print (np.max(np.abs(grad)))
    

print("Non-HASTE maxabs of each grad:")
#check CuDNN (actually - plain LSTM) gradients on the dummy example
with tf.GradientTape() as tape:
    prediction = model_cudnn(dummy_input, training=True)
    loss = tf.keras.losses.categorical_crossentropy(dummy_target, prediction)
    
gradients = tape.gradient(loss, cudnn_trainable)
for grad in gradients:
    print (np.max(np.abs(grad)))

On p2.xlarge (K80) the following is the output:

env: CUDA_VISIBLE_DEVICES=0
HASTE has total 240800 trainable variables!
CuDNN has total 240800 trainable variables!
HASTE maxabs of each grad:
0.0
0.0
Non-HASTE maxabs of each grad:
6.3259706
0.0
7.397908

On p3.2xlarge (V100) the following is the output:

env: CUDA_VISIBLE_DEVICES=0
HASTE has total 240800 trainable variables!
CuDNN has total 240800 trainable variables!
HASTE maxabs of each grad:
7.004616
6.2311497
Non-HASTE maxabs of each grad:
6.231148
0.0
7.0048447

Gradients appear to be broken on the K80 device.

[Feature] Windows build support

Logs; the ultimate error reads:

process_begin: CreateProcess(NULL, ar -crv libhaste.a lib/*.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [haste] Error 2

Any suggestions or alternate install methods? I don't know which "file" isn't found -- thanks.


Details:

  • First it complained about cl.exe, so I added "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\" (where cl.exe lives) to PATH
  • Using GNU Make 3.81 -- Win 10 -- Windows SDK 10.0 -- VSC 2017 & 2019
  • Have CUDA 10.1, cuDNN 7.6
  • Ran cd path_to_haste; make haste_tf in Anaconda Powershell Prompt virtualenv

Back propagation in Haste

I am using PyTorch, and all I want to know is how to optimize the parameters of haste_lstm. Suppose optim is a PyTorch optimizer. How do I pass the parameters of haste to optim, so that backpropagation also updates the haste parameters?
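For context, a minimal sketch based on snippets elsewhere in this tracker: haste layers are regular torch.nn.Module subclasses, so their parameters are handed to an optimizer in the usual PyTorch way:

import torch
import haste_pytorch as haste

haste_lstm = haste.LSTM(input_size=128, hidden_size=256).cuda()
optim = torch.optim.Adam(haste_lstm.parameters(), lr=1e-3)

x = torch.rand([25, 5, 128]).cuda()
y, _ = haste_lstm(x)
loss = y.mean()

optim.zero_grad()
loss.backward()   # gradients flow into haste's kernel/recurrent_kernel/bias
optim.step()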

layer_norm_gru_cell

In layer_norm_gru_cell.py, in build(), for TF versions 2.3 and 2.5:

self.gamma = v1.get_variable('gamma', initializer=v1.initializers.ones())

is throwing an error:

The initializer passed is not valid. It should be a callable with no arguments and the shape should not be provided, or an instance of `tf.keras.initializers.*` and `shape` should be fully defined.

Is it necessary to add a shape=1 to the get_variable() call?
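A sketch of what that fix might look like; the shape value here is a stand-in for the cell's hidden size, and this is not a confirmed fix:

import tensorflow.compat.v1 as v1
v1.disable_eager_execution()

num_units = 256  # stand-in for the cell's hidden size
gamma = v1.get_variable('gamma', shape=[num_units],
                        initializer=v1.initializers.ones())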

Can't run haste layers in Keras

Hello,

I know this seems more like a debugging problem / a problem on my side, but I get the following error message when running my code, and it only appears when running with a haste layer:

Traceback (most recent call last):
  File "<string>", line 1331, in haste_lstm
  File "<string>", line 1379, in haste_lstm_eager_fallback
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 280, in args_to_matching_eager
    ret = [ops.convert_to_tensor(t, dtype, ctx=ctx) for t in l]
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 280, in <listcomp>
    ret = [ops.convert_to_tensor(t, dtype, ctx=ctx) for t in l]
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1540, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 339, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 265, in constant
    allow_broadcast=True)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 276, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 301, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/keras_tensor.py", line 274, in __array__
    'Cannot convert a symbolic Keras input/output to a numpy array. '
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/snap/pycharm-professional/237/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/snap/pycharm-professional/237/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/time-series-on-joints-emg/src/all_in_one_file.py", line 394, in <module>
    x, state = haste1(x, training=True)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/haste_tf/base_rnn.py", line 115, in __call__
    result, state = self.fw_layer(inputs, sequence_length, training)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/haste_tf/lstm.py", line 218, in __call__
    zoneout_prob=self.zoneout)
  File "<string>", line 1339, in haste_lstm
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 122, in dispatch
    result = dispatcher.handle(op, args, kwargs)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1450, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
    input_list)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
    inputs, input_masks, args, kwargs)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
    return self._infer_output_signature(inputs, args, kwargs, input_masks)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 863, in _infer_output_signature
    outputs = call_fn(inputs, *args, **kwargs)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1327, in _call_wrapper
    return self._call_wrapper(*args, **kwargs)
  File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1359, in _call_wrapper
    result = self.function(*args, **kwargs)
TypeError: haste_lstm() missing 1 required positional argument: 'training'

I construct the model with the following code:

inputs = k_l.Input(shape=(train_x.shape[1], train_x.shape[2]))
direction = 'unidirectional' if args.model == 'GRU' else 'bidirectional'
haste1 = haste.LSTM(args.hidden_size, direction=direction, zoneout=0.1, dropout=args.dropout_time)
fc1 = k_l.Dense(args.dense_layers[0], activation='relu', kernel_initializer='he_uniform')
dr1 = k_l.Dropout(0.2)
fc2 = k_l.Dense(1)

x, state = haste1(inputs, training=True)
x = fc1(x)
x = dr1(x)
outputs = fc2(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss=loss_func, optimizer=optimizer)
model_hist = model.fit(train_x, train_y, epochs=args.epochs, batch_size=args.batch_size, verbose=1,
                       validation_data=val_data, callbacks=keras_callbacks)

train_x numpy array shape is (21788, 1000, 4)
OS: Ubuntu 20.04
Python version: 3.7
Keras: 2.4.3
Tensorflow: 2.4.1
numpy: 1.19.5
GPU: GTX 1060
CUDA: 11.2

Normally I wouldn't post error messages like these on GitHub, but since the code runs without the haste layer, I suspect the cause of the error lies somewhere close to it; this repo seems like the best place to ask, and I didn't find any solutions elsewhere. I hope you can help me; I'd really like to try out your implementation on my dataset.

haste_tf compilation fails with "‘bfloat16’ in namespace ‘Eigen’ does not name a type"

First of all, thank you for this wonderful library.

I'm unable to compile haste_tf with Tensorflow 2.7.0 and the most recent haste codebase from the master branch. The root error seems to be: ‘bfloat16’ in namespace ‘Eigen’ does not name a type. Installation with pip looks successful at first glance, but silently fails to build libhaste_tf.so. I've assumed there's something wrong with my setup and tried the Colab linked in the README, but it fails with the same errors. I'd appreciate any assistance.

Activation function in IndRNN

Hello.

Is it possible to apply an activation function between hidden states in IndRNN in the TensorFlow framework?

Currently I don't see any argument similar to "activation" among the keyword arguments:

Keyword Arguments:
  kernel_initializer: (optional) the initializer to use for the input
    matrix weights. Defaults to `glorot_uniform`.
  recurrent_initializer: (optional) the initializer to use for the
    recurrent scale weights. Defaults to uniform random in [-0.5, 0.5].
    Note that this initialization scheme is different than in the original
    authors' implementation. See https://github.com/lmnt-com/haste/issues/7
    for details.
  bias_initializer: (optional) the initializer to use for the bias vector.
    Defaults to `zeros`.
  kernel_transform: (optional) a function with signature
    `(kernel: Tensor) -> Tensor` that transforms the kernel before it is
    used. Defaults to the identity function.
  recurrent_transform: (optional) a function with signature
    `(recurrent_scale: Tensor) -> Tensor` that transforms the recurrent
    scale vector before it is used. Defaults to the identity function.
  bias_transform: (optional) a function with signature
    `(bias: Tensor) -> Tensor` that transforms the bias before it is used.
    Defaults to the identity function.
  zoneout: (optional) float, sets the zoneout rate for Zoneout
    regularization. Defaults to 0.
  dtype: (optional) the data type for this layer. Defaults to `tf.float32`.
  name: (optional) string, the name for this layer.

Support for PyTorch packed sequences

It seems like Haste does not support packed sequences for dealing with variable-length sequences in PyTorch. Any chance this gets implemented in the near future? Thanks for the great work.
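In the meantime, a possible workaround sketch: unpack the PackedSequence into a padded batch plus lengths. The commented-out call assumes Haste's LSTM accepts a lengths argument, as the bidirectional-masking issue above suggests; treat that signature as unverified:

import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

seqs = [torch.rand(7, 128), torch.rand(5, 128), torch.rand(3, 128)]
packed = pack_sequence(seqs)                     # variable-length batch
padded, lengths = pad_packed_sequence(packed)    # [time, batch, features] plus lengths
# y, state = lstm_layer(padded, lengths=lengths) # assumed signature, unverified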

Testing?

Hey!

What is the testing strategy you're using? I'm having a hard time determining whether I can trust your implementation. (I reviewed the validation directory, and it seemed like the tests mainly focus on the core implementation. It wasn't clear that things like Zoneout or LayerNorm are tested as well.)

Also, thank you for supporting the CPU! While most of our training is performed on a GPU, I still test and run my code on a CPU, because it's more productive to develop on my laptop than in the cloud.

Thanks!

[Pytorch] Pass in hidden and cell state in forward pass?

Is it possible to pass in the hidden and cell state to the (LayerNorm)LSTM's forward pass? To be clear:

y, state = norm_lstm_layer(x)    # current API
y, state = norm_lstm_layer(x, (h0, c0))  # desired API, also same as nn.LSTM's API

Maybe I'm missing something, because it seems like a pretty standard use case. For example (a sketch of case 1 follows this list):

  1. At inference time, I am decoding one time step at a time (in order to use the previous output as the current input). That means I need to pass in the last hidden and cell state as well.
  2. Initializing the LSTM with some representation.
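For comparison, here is use case 1 written with torch.nn.LSTM, whose API does accept an initial state; the request is for haste's layers to mirror this:

import torch

lstm = torch.nn.LSTM(input_size=256, hidden_size=256)
h = torch.zeros(1, 5, 256)            # [layers, batch, hidden]
c = torch.zeros(1, 5, 256)
x_t = torch.rand(1, 5, 256)           # a single timestep

for _ in range(10):                   # decode step by step, threading state
    x_t, (h, c) = lstm(x_t, (h, c))   # previous output becomes next input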

Thanks! And this looks great, by the way; I've had many annoyances implementing custom RNNs (layer norm, recurrent dropout, etc.) in PyTorch in the past.

Building on google colab gives lots of warnings

I am working in google colab with pytorch and haste and running into a weird issue.

The code below gives RuntimeError: CUDA error: invalid configuration argument when feature_sizes=[512]. It works fine for feature_sizes=[256], feature_sizes=[128], etc.
It may be related to CUDA kernel grid sizes; see pytorch/pytorch#28927.
The error is on the logits = self.fc_logits(hidden_states) line.

I received a TON of warnings when building haste on Google Colab. Is it possible the package wasn't built properly?

Also when doing haste.__version__ I get the error AttributeError: module 'haste_pytorch' has no attribute '__version__'

Pytorch version torch.__version__ is '1.5.0+cu101'

CODE:

import torch
import torch.nn as nn
import haste_pytorch as haste

class LSTMEmbedDiscNet(nn.Module):
    """
    An LSTM discriminator that operates on word indexes.
    IMPORTANT: feature_sizes=[512] gives RuntimeError: CUDA error: invalid configuration argument.
    Maybe related to https://github.com/pytorch/pytorch/pull/28927
    Use feature_sizes=[256] or lower.
    """

    def __init__(self, feature_sizes=[512], vocab_size=5726, use_layer_norm=True, trainable_embedding_size=64, dropout=0.1, pad_token=0, embedding_source=None, vocab_file=None, position_dim=8):
        super().__init__()
        self._feature_sizes = feature_sizes
        self._vocab_size = vocab_size
        self._use_layer_norm = use_layer_norm
        self._trainable_embedding_size = trainable_embedding_size
        self._embedding_source = embedding_source
        self._vocab_file = vocab_file
        self._dropout = dropout
        self._pad_token = pad_token
        self._position_dim = position_dim
        if self._embedding_source:
            assert vocab_file

        self.define_module()

    def define_module(self):
        if self._embedding_source:
            assert False # TODO
        else:
            self.embed = nn.Embedding(self._vocab_size, self._trainable_embedding_size)
        self.drop = nn.Dropout(p=self._dropout)
        self.fc_embed_hidden = nn.Linear(self._trainable_embedding_size + self._position_dim, self._feature_sizes[0])
        self.encoder_cell = haste.LayerNormLSTM(input_size=self._feature_sizes[0], hidden_size=self._feature_sizes[0], batch_first=True)
        self.fc_logits = nn.Linear(self._feature_sizes[0], 1)

    def forward(self, sequence, sequence_length):
        """Connect to the graph.

        Args:
            sequence: A [batch_size, max_sequence_length] tensor of int. For example
            the indices of words as sampled by the generator.
            sequence_length: A [batch_size] tensor of int. Length of the sequence.
            is_training: Boolean, False to disable dropout.

        Returns:
            A [batch_size, max_sequence_length, feature_size] tensor of floats. For
            each sequence in the batch, the features should (hopefully) allow to
            distinguish if the value at each timestep is real or generated.
        """
        device = sequence.device
        batch_size, max_sequence_length = sequence.size()

        embeddings = self.drop(self.embed(sequence)) # batch_size, max_sequence_length, self._embedding_size
        embeddings_pos = append_position_signal(embeddings, self._position_dim) # external helper, not shown in this snippet
        lstm_inputs = self.fc_embed_hidden(embeddings_pos) # batch_size, max_sequence_length, self._feature_sizes[0]

        hidden_states, _ = self.encoder_cell(lstm_inputs)
        logits = self.fc_logits(hidden_states)
        logits_flat = logits.squeeze(2)

        # Mask past first PAD symbol
        # mask = utils.get_mask_past_symbol(sequence, self._pad_token)
        # masked_logits_flat = logits_flat * mask
        # return masked_logits_flat
        return logits_flat

def test_lstm_embed_disc_net():
    d_batch = 4
    d_max_seq_len = 52
    device = torch.device('cuda:0')

    model = LSTMEmbedDiscNet().to(device).train()
    d_vocab = model._vocab_size
    assert model

    texts = torch.randint(low=0, high=d_vocab, size=(d_batch, d_max_seq_len)).to(device)
    text_lens = torch.randint(low=3, high=d_max_seq_len, size=(d_batch,)).to(device)

    logits = model(texts, text_lens)
    assert logits.size() == (d_batch, d_max_seq_len)

test_lstm_embed_disc_net()

ERROR:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-54-11fbf5027e87> in <module>()
     79     assert logits.size() == (d_batch, d_max_seq_len)
     80 
---> 81 test_lstm_embed_disc_net()

5 frames

<ipython-input-54-11fbf5027e87> in test_lstm_embed_disc_net()
     76     text_lens = torch.randint(low=3, high=d_max_seq_len, size=(d_batch,)).to(device)
     77 
---> 78     logits = model(texts, text_lens)
     79     assert logits.size() == (d_batch, d_max_seq_len)
     80 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-54-11fbf5027e87> in forward(self, sequence, sequence_length)
     55 
     56         hidden_states, _ = self.encoder_cell(lstm_inputs)
---> 57         logits = self.fc_logits(hidden_states)
     58         logits_flat = logits.squeeze(2)
     59 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py in forward(self, input)
     85 
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88 
     89     def extra_repr(self):

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1610         ret = torch.addmm(bias, input, weight.t())
   1611     else:
-> 1612         output = input.matmul(weight.t())
   1613         if bias is not None:
   1614             output += bias

RuntimeError: CUDA error: invalid configuration argument

Reproducibility

Thanks for your implementation.

Are there reproducibility benchmarks vs. the CuDNN variants (i.e., do the layers behave approximately the same with the same configurations)?

Pypi release

Hello,
I'm in awe of this library. I got great results with the LayerNorm LSTM.

Do you have plans to release it on PyPI? That would allow more users to use it!

Problem installing on Ubuntu 20.04, tensorflow 2.2

I'm getting an issue while installing using make haste_tf or just make:

File "setup.py", line 54
with open(f'tf/_version.py', 'wt') as f:
^
SyntaxError: invalid syntax
make: *** [Makefile:61: haste_tf] Error 1

Not sure what the problem might be here. I can train models using the GPU in TensorFlow, so I don't think it's a CUDA or TensorFlow problem, but I'm not sure.

CUDA error: an illegal memory access was encountered

When I run an RNN from the examples (e.g., GRU, IndRNN), I get an illegal memory access error.

import torch 
import haste_pytorch as haste 

x = torch.rand([25, 5, 128]).cuda() 

gru_layer = haste.GRU(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05) 
gru_layer.cuda()               
y, state = gru_layer(x)        
y.mean().backward()

Results in:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-66139d497ca7> in <module>
      7 gru_layer.cuda()
      8 y, state = gru_layer(x)
----> 9 y.mean().backward()

RuntimeError: CUDA error: an illegal memory access was encountered

I'm using Pytorch 1.7.1+cu110 and Python 3.7.3.
Haste is from GitHub master, compiled with make haste_pytorch.

Supporting RWKV (a RNN that can match transformer LM & zero-shot performance at 1B+ params)

Hi guys. I am working on RWKV, which might be the only RNN (no attention!) that can match transformer LM & zero-shot performance at 1B+ params:
https://www.reddit.com/r/MachineLearning/comments/vzr6ie/r_rwkv3_scaling_rnn_to_15b_and_reach_transformer/

I am using some CUDA in my project too. Perhaps we can collaborate to promote RNNs and scale them to 100B+ params :)

The RWKV discord: https://discord.gg/bDSBUMeFpc

Github: https://github.com/BlinkDL/RWKV-LM

CUDA stuff: https://github.com/BlinkDL/RWKV-CUDA

Install on Windows

Hi,

I am trying to install haste_pytorch on Windows using pip as follows:

pip install haste_pytorch

However, I am getting this error:

WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Collecting haste_pytorch
  Using cached haste_pytorch-0.5.0rc0.tar.gz (44 kB)
Using legacy setup.py install for haste-pytorch, since package 'wheel' is not installed.
Installing collected packages: haste-pytorch
    Running setup.py install for haste-pytorch ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\program files (x86)\microsoft visual studio\shared\python37_64\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\helfa\\AppData\\Local\\Temp\\pip-install-vk46qkk_\\haste-pytorch\\setup.py'"'"'; __file__='"'"'C:\\Users\\helfa\\AppData\\Local\\Temp\\pip-install-vk46qkk_\\haste-pytorch\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\helfa\AppData\Local\Temp\pip-record-3mvw8fjq\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\program files (x86)\microsoft visual studio\shared\python37_64\Include\haste-pytorch'
         cwd: C:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\
    Complete output (46 lines):
    c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\site-packages\setuptools\dist.py:454: UserWarning: Normalizing '0.5.0-rc0' to '0.5.0rc0'
      warnings.warn(tmpl.format(**locals()))
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.7
    creating build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\base_rnn.py -> build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\gru.py -> build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\indrnn.py -> build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\layer_norm_gru.py -> build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\layer_norm_indrnn.py -> build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\layer_norm_lstm.py -> build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\lstm.py -> build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\_version.py -> build\lib.win-amd64-3.7\haste_pytorch
    copying frameworks\pytorch\__init__.py -> build\lib.win-amd64-3.7\haste_pytorch
    running build_ext
    C:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\utils\cpp_extension.py:334: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
      warnings.warn(msg.format('we could not find ninja.'))
    'make' is not recognized as an internal or external command,
    operable program or batch file.
    C:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\utils\cpp_extension.py:270: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
      warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
    building 'haste_pytorch_lib' extension
    creating build\temp.win-amd64-3.7
    creating build\temp.win-amd64-3.7\Release
    creating build\temp.win-amd64-3.7\Release\frameworks
    creating build\temp.win-amd64-3.7\Release\frameworks\pytorch
    C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\gru.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\gru.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
    gru.cc
    C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\indrnn.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\indrnn.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
    indrnn.cc
    C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\layer_norm_gru.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_gru.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
    layer_norm_gru.cc
    C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\layer_norm_indrnn.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_indrnn.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
    layer_norm_indrnn.cc
    C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\layer_norm_lstm.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_lstm.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
    layer_norm_lstm.cc
    C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\lstm.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\lstm.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
    lstm.cc
    C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\support.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\support.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
    support.cc
    C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:. "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib64" "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64" /LIBPATH:C:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\lib "/LIBPATH:c:\program files (x86)\microsoft visual studio\shared\python37_64\libs" "/LIBPATH:c:\program files (x86)\microsoft visual studio\shared\python37_64\PCbuild\amd64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" haste.lib cublas.lib cudart.lib c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_haste_pytorch_lib build\temp.win-amd64-3.7\Release\frameworks/pytorch\gru.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\indrnn.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_gru.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_indrnn.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_lstm.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\lstm.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\support.obj /OUT:build\lib.win-amd64-3.7\haste_pytorch_lib.cp37-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.7\Release\frameworks/pytorch\haste_pytorch_lib.cp37-win_amd64.lib
    LINK : fatal error LNK1181: cannot open input file 'haste.lib'
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.26.28801\\bin\\HostX86\\x64\\link.exe' failed with exit status 1181
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\program files (x86)\microsoft visual studio\shared\python37_64\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\helfa\\AppData\\Local\\Temp\\pip-install-vk46qkk_\\haste-pytorch\\setup.py'"'"'; __file__='"'"'C:\\Users\\helfa\\AppData\\Local\\Temp\\pip-install-vk46qkk_\\haste-pytorch\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\helfa\AppData\Local\Temp\pip-record-3mvw8fjq\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\program files (x86)\microsoft visual studio\shared\python37_64\Include\haste-pytorch' Check the logs for full command output.

IndRNNs

I'd like to suggest support for IndRNNs; in my experiments on EEG seizure classification with very long sequences, they've dominated LSTMs & GRUs. While already much faster than those, IndRNNs would still benefit from a CuDNN-like speedup in large stacks, and from Layer Normalization when working with 1000+ timesteps.
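
For context, the IndRNN recurrence replaces the full recurrent weight matrix with a single vector u, so each unit only sees its own history (this is what keeps gradients stable over very long sequences):

    h_t = act(W x_t + u * h_{t-1} + b)    # * is elementwise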

Minimal tf.keras code below; default weight initialization should be handled differently (I can clarify post-approval).


IndRNN Cell
from tensorflow.python.keras import activations
from tensorflow.python.keras import backend as K
from tensorflow.python.keras import constraints
from tensorflow.python.keras import initializers
from tensorflow.python.keras import regularizers
from tensorflow.python.keras.engine.base_layer import Layer
from tensorflow.python.keras.utils import tf_utils
from tensorflow.python.ops import math_ops
from tensorflow.python.training.tracking import data_structures
from tensorflow.python.util.tf_export import keras_export
from tensorflow.python.keras.layers.recurrent import DropoutRNNCellMixin



@keras_export(v1=['keras.layers.IndRNNCell'])
class IndRNNCell(DropoutRNNCellMixin, Layer):
  def __init__(self,
               units,
               activation='tanh',
               use_bias=True,
               recurrent_clip_min=-1,
               recurrent_clip_max=-1,
               kernel_initializer='glorot_normal',
               recurrent_initializer=None,
               bias_initializer='zeros',
               kernel_regularizer=None,
               recurrent_regularizer=None,
               bias_regularizer=None,
               kernel_constraint=None,
               recurrent_constraint=None,
               bias_constraint=None,
               dropout=0.,
               recurrent_dropout=0.,
               implementation=1,
               **kwargs):
    super(IndRNNCell, self).__init__(**kwargs)
    
    # If either clipping bound is unset, disable recurrent clipping entirely.
    if recurrent_clip_min is None or recurrent_clip_max is None:
      recurrent_clip_min = None
      recurrent_clip_max = None

    self.units = units
    self.activation = activations.get(activation)
    self.use_bias = use_bias
    self.recurrent_clip_min = recurrent_clip_min
    self.recurrent_clip_max = recurrent_clip_max

    self.kernel_initializer = initializers.get(kernel_initializer)
    if recurrent_initializer is None:
      # Per the IndRNN paper, default the recurrent weights to U(-1, 1).
      self.recurrent_initializer = initializers.RandomUniform(-1.0, 1.0)
    else:
      self.recurrent_initializer = initializers.get(recurrent_initializer)
    self.bias_initializer = initializers.get(bias_initializer)

    self.kernel_regularizer = regularizers.get(kernel_regularizer)
    self.recurrent_regularizer = regularizers.get(recurrent_regularizer)
    self.bias_regularizer = regularizers.get(bias_regularizer)

    self.kernel_constraint = constraints.get(kernel_constraint)
    self.recurrent_constraint = constraints.get(recurrent_constraint)
    self.bias_constraint = constraints.get(bias_constraint)

    self.dropout = min(1., max(0., dropout))
    self.recurrent_dropout = min(1., max(0., recurrent_dropout))

    self.state_size = data_structures.NoDependency([self.units])
    self.output_size = self.units

  @tf_utils.shape_type_conversion
  def build(self, input_shape):
    input_dim = input_shape[-1]
    self.timesteps = input_shape[1]
    self._process_recurrent_clip()  # clip-bound setup; helper omitted from this snippet

    self.kernel = self.add_weight(
        shape=(input_dim, self.units),
        name='kernel',
        initializer=self.kernel_initializer,
        regularizer=self.kernel_regularizer,
        constraint=self.kernel_constraint)
    self.recurrent_kernel = self.add_weight(
        shape=(self.units,),
        name='recurrent_kernel',
        initializer=self.recurrent_initializer,
        regularizer=self.recurrent_regularizer,
        constraint=self.recurrent_constraint)

    if self.use_bias:
      self.bias = self.add_weight(
          shape=(self.units,),
          name='bias',
          initializer=self.bias_initializer,
          regularizer=self.bias_regularizer,
          constraint=self.bias_constraint)
    else:
      self.bias = None    
    self.built = True


  def call(self, inputs, states, training=None):
    h_tm1 = states[0]  # previous memory state

    dp_mask = self.get_dropout_mask_for_cell(inputs, training, count=1)
    rec_dp_mask = self.get_recurrent_dropout_mask_for_cell(
        h_tm1, training, count=1)

    if 0. < self.dropout < 1.:
      inputs = inputs * dp_mask[0]
    if 0. < self.recurrent_dropout < 1.:
      h_tm1 = h_tm1 * rec_dp_mask[0]
    
    h = K.dot(inputs, self.kernel)
    h += math_ops.multiply(h_tm1, self.recurrent_kernel)
    if self.use_bias:
      h = K.bias_add(h, self.bias)

    h = self.activation(h)
    return h, [h]
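
For completeness, a minimal smoke test for the cell above (a sketch: DemoIndRNNCell and the no-op clipping stub are mine, since _process_recurrent_clip isn't shown in this snippet):

import tensorflow as tf

# Hypothetical wiring: stub out the omitted clipping helper and wrap the
# cell in the generic tf.keras RNN layer.
class DemoIndRNNCell(IndRNNCell):
  def _process_recurrent_clip(self):
    pass  # clipping disabled for this smoke test

cell = DemoIndRNNCell(128)
layer = tf.keras.layers.RNN(cell, return_sequences=True)

x = tf.random.normal([8, 200, 64])  # (batch, timesteps, features)
y = layer(x)                        # -> (8, 200, 128)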

Biases in final IndRNN layer are 0

I assume this is not supposed to happen, but I checked the model's parameters after training and these were the values from the final IndRNN layer in my model:
rnn2.bias Parameter containing: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)

This is my module:
self.rnn = haste.IndRNN(input_dim, hidden_dim, batch_first=True, zoneout=0.1, return_state_sequence=True)
self.rnn2 = haste.IndRNN(hidden_dim, 64, batch_first=True, zoneout=0.075)
self.d1 = nn.Dropout(0.15)

And forward function:
out, (hn) = self.rnn(x)
out, (hn) = self.rnn2(self.d1(out))
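
A quick probe to see whether that bias ever receives gradient (hypothetical: input_dim and the model instance are assumed from the module above):

import torch

# One dummy forward/backward pass, then inspect the gradient reaching
# the final layer's bias.
x = torch.randn(4, 100, input_dim, device='cuda')
out, (hn) = model.rnn(x)
out, (hn) = model.rnn2(model.d1(out))
out.sum().backward()
print(model.rnn2.bias.grad.abs().max())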

Will Haste work without CUDA?

Can this be used without CUDA? Specifically, I develop locally on my laptop and train on an HPC cluster. Will the Haste LSTM also run on my laptop? (Slower is fine; I just want to maintain one codebase.)
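
In case the kernels turn out to be GPU-only, here's a sketch of the device-conditional fallback I have in mind (assuming the import name haste_pytorch and a constructor mirroring nn.LSTM; both are assumptions):

import torch
import torch.nn as nn

try:
    import haste_pytorch as haste
except ImportError:
    haste = None

def make_lstm(input_size, hidden_size):
    # Use Haste's fused CUDA kernel when a GPU is available; otherwise
    # fall back to the stock (slower, CPU-capable) implementation.
    if haste is not None and torch.cuda.is_available():
        return haste.LSTM(input_size, hidden_size, batch_first=True)
    return nn.LSTM(input_size, hidden_size, batch_first=True)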
