lmnt-com / haste
Haste: a fast, simple, and open RNN library
License: Apache License 2.0
As you know the code does not work on K80 for me now, so I went on to experiment on Amazon EC2.
I have launched a p2.xlarge instance with Deep Learning AMI (Amazon Linux 2) Version 29.0.
I executed the following commands:
source activate tensorflow2_p36
git clone https://github.com/lmnt-com/haste
cd haste
make haste_tf
pip install haste_tf-*.whl
python
import haste_tf
Now I get the following error:
Experiments with adding things to LD_LIBRARY_PATH all failed. As a matter of fact, LD_LIBRARY_PATH already contains the CUDA libraries, including libcudart.so.10.0.
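To see which directories on LD_LIBRARY_PATH actually hold a given CUDA runtime library, a small scan like the one below can help. This is only a diagnostic sketch and not part of Haste; the helper name `find_library_dirs` and the file-name pattern are assumptions made for illustration.

```python
import fnmatch
import os


def find_library_dirs(pattern, search_path=None):
    """Return (directory, filename) pairs on search_path matching pattern.

    search_path defaults to the LD_LIBRARY_PATH environment variable.
    Hypothetical helper, for illustration only.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and directories that do not exist
        for name in sorted(os.listdir(directory)):
            if fnmatch.fnmatch(name, pattern):
                matches.append((directory, name))
    return matches


if __name__ == "__main__":
    # List every CUDA runtime library found on LD_LIBRARY_PATH.
    for directory, name in find_library_dirs("libcudart.so*"):
        print(directory, name)
```

If the version the extension was linked against does not appear in the output, the loader will not find it no matter how many other CUDA versions are on the path.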
I noticed that the zoneout is still applied even after I call model.eval() and I'm assuming that this is not the desired behavior. I'm therefore manually changing the zoneout value to 0 during evaluation. I only tried it for IndRNN in pytorch.
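The behaviour expected above can be sketched in plain Python. This is a toy model of zoneout under the assumption that evaluation should simply pass the new state through; it is not Haste's implementation, and `zoneout_step` is a hypothetical name.

```python
import random


def zoneout_step(h_prev, h_new, zoneout_prob, training, rng=random):
    """Mix previous and candidate hidden states element-wise.

    In training mode each unit keeps its previous value with probability
    zoneout_prob. Outside training mode the candidate state is returned
    unchanged, which matches the manual workaround of setting zoneout to 0.
    """
    if not training or zoneout_prob == 0.0:
        return list(h_new)
    return [
        prev if rng.random() < zoneout_prob else new
        for prev, new in zip(h_prev, h_new)
    ]


h_prev = [0.0, 0.0, 0.0, 0.0]
h_new = [1.0, 2.0, 3.0, 4.0]

# Evaluation: deterministic, no units are zoned out.
print(zoneout_step(h_prev, h_new, zoneout_prob=0.5, training=False))

# Training: a random subset of units keeps the previous value.
print(zoneout_step(h_prev, h_new, zoneout_prob=0.5, training=True,
                   rng=random.Random(0)))
```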
It seems that bidirectional RNNs and multi-layer RNNs aren't supported. Do you have plans to support these?
With PyTorch, I am used to installing via pip without needing to build the library. Would it be possible to support that? I don't have much experience with building libraries, which is why I see this as important.
It also makes it much easier to work with this library on a laptop, which is my primary development environment.
Hi,
I was trying to build the haste_pytorch but it failed to build the code at the first step ("make haste_pytorch"). Here is the error message I got:
$ make haste_pytorch
nvcc -ccbin g++ -gencode arch=compute_37,code=compute_37 -gencode arch=compute_60,code=compute_60 -c lib/lstm_forward_gpu.cu.cc -o lib/lstm_forward_gpu.o -std=c++11 -x cu -Xcompiler -fPIC -I/usr/include/eigen3 -I/lib/cuda-9.0/include -Ilib -O3
lib/blas.h:25:50: error: field initializer is not constant
static constexpr decltype(cublasHgemm)* gemm = cublasHgemm;
^
lib/blas.h:30:53: error: field initializer is not constant
static constexpr decltype(cublasSgemm)* gemm = cublasSgemm;
^
lib/blas.h:35:53: error: field initializer is not constant
static constexpr decltype(cublasDgemm)* gemm = cublasDgemm;
^
make: *** [Makefile:30: haste] Error 1
During inference, I won't have a CUDA enabled GPU, so will my saved model work with default PyTorch LSTMs?
Using pytorch 1.5, cuda 10.1, python 3.7.7 in a clean virtual environment, using the haste-0.4 code under releases.
It compiles libhaste.a properly; then, tracing through the Makefile, it attempts to execute:
python setup.py haste_pytorch -q bdist_wheel
which results in:
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: invalid command 'bdist_wheel'
Since there is no bdist_wheel in the /tmp Python packaging directory, I manually ran "setup.py haste_pytorch install", after which the import fails:
----> 1 import haste_pytorch
.../testing/lib/python3.7/site-packages/haste_pytorch-0.4.0-py3.7-linux-x86_64.egg/haste_pytorch/__init__.py in <module>
19
20
---> 21 from .gru import GRU
22 from .indrnn import IndRNN
23 from .lstm import LSTM
.../testing/lib/python3.7/site-packages/haste_pytorch-0.4.0-py3.7-linux-x86_64.egg/haste_pytorch/gru.py in <module>
17
18
---> 19 import haste_pytorch_lib as LIB
20 import torch
21 import torch.nn as nn
ImportError: libc10.so: cannot open shared object file: No such file or directory
Creating and installing an egg gets the same error.
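The `invalid command 'bdist_wheel'` message usually means the `wheel` package is not installed in the active environment, because that is the package that provides the command in a fresh virtual environment like this one. A quick check, as a sketch (the helper name `has_module` is an assumption for illustration):

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if not has_module("wheel"):
        print("The 'wheel' package is missing; try: pip install wheel")
    else:
        print("'wheel' is installed, so bdist_wheel should be available")
```

This only addresses the bdist_wheel step; the later libc10.so import error is a separate problem.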
Hello, I found what may be numerical precision problems in some of the RNN routines.
I compiled haste_pytorch and modified the check function 'self_consistency' in haste/validation/pytorch.py as shown below. I use the function 'cal_err_pointwise' to compute a pointwise relative error between two tensors, so I can see how close two tensors are and check whether the RNN outputs are reasonable.
I compared the output tensors of the CUDA routine and the TorchScript routine in the PyTorch wrapper, and found that the max relative error is on the order of 0.1 to 1, which is fairly large. Below are my check function and results.
def cal_err_pointwise(a, b):
    if a is None or b is None:
        return None
    a = a.cpu().detach().numpy()
    b = b.cpu().detach().numpy()
    if np.all(np.equal(a, b)):
        return 0.0
    diff = a - b
    denom = np.abs(b) + np.ones_like(b) * 1e-10
    ratio = np.abs(diff) / denom
    err_mean = np.mean(ratio)
    err_max = np.max(ratio)
    return err_mean, err_max
cal_err = cal_err_pointwise
def self_consistency(rnn, x):
    x_cuda = x.clone().cuda()
    x_cpu = x.clone().cpu()
    x_cuda.requires_grad_(True)
    x_cpu.requires_grad_(True)
    y1, _ = rnn.cuda().forward(x_cuda)
    y1.backward(torch.ones_like(y1))
    y2, _ = rnn.cpu().forward(x_cpu)
    y2.backward(torch.ones_like(y2))
    g1 = x_cpu.grad.data
    g2 = x_cuda.grad.data
    print('-' * 8 + " self consistency " + '-' * 8)
    print("output rel err (mean, max) : {0}".format(cal_err(y1, y2)))
    print("grad rel err (mean, max) : {0}".format(cal_err(g1, g2)))
    print(torch.max(torch.abs(y1.cpu()-y2.cpu())))
    print(torch.max(torch.abs(g1.cpu()-g2.cpu())))
My check results follow; the values wrapped in ** are the large relative errors.
[indrnn]
-------- self consistency --------
output rel err (mean, max) : (1.4625937e-06, **0.28469396**)
grad rel err (mean, max) : (**0.0006560313**, **669.45105**)
tensor(6.5565e-07, grad_fn=<MaxBackward1>)
tensor(3.3379e-06)
[layer_norm_gru]
-------- self consistency --------
output rel err (mean, max) : (5.0523818e-06, **4.9286914**)
grad rel err (mean, max) : (1.2550343e-05, **4.330911**)
tensor(2.5630e-06, grad_fn=<MaxBackward1>)
tensor(1.2398e-05)
[layer_norm_indrnn]
-------- self consistency --------
output rel err (mean, max) : (1.2726755e-06, **0.087144986**)
grad rel err (mean, max) : (6.146369e-06, **0.6983186**)
tensor(1.1474e-06, grad_fn=<MaxBackward1>)
tensor(4.7684e-06)
[layer_norm_lstm]
-------- self consistency --------
output rel err (mean, max) : (5.336079e-06, **0.3929247**)
grad rel err (mean, max) : (2.4412904e-05, **1.0231713**)
tensor(1.4633e-05, grad_fn=<MaxBackward1>)
tensor(0.0470)
[lstm]
-------- self consistency --------
output rel err (mean, max) : (1.0738823e-06, **0.34573397**)
grad rel err (mean, max) : (2.7127278e-06, **0.19694966**)
tensor(6.2585e-07, grad_fn=<MaxBackward1>)
tensor(3.4571e-06)
native consistency
tensor(1.4901e-07, device='cuda:0', grad_fn=<MaxBackward1>)
tensor(9.5367e-07, device='cuda:0')
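One thing worth keeping in mind when reading numbers like these: a pointwise relative error can become very large wherever the reference value is close to zero, even when the absolute difference is tiny. A combined test of the form |a - b| <= atol + rtol * |b| avoids that. The sketch below uses made-up values and hypothetical helper names (`max_errors`, `all_close`); it does not by itself show that the results above are fine, and an absolute error such as the 0.0470 reported for layer_norm_lstm would still deserve a closer look.

```python
def max_errors(a, b, eps=1e-10):
    """Return (max absolute error, max pointwise relative error)."""
    max_abs = max(abs(x - y) for x, y in zip(a, b))
    max_rel = max(abs(x - y) / (abs(y) + eps) for x, y in zip(a, b))
    return max_abs, max_rel


def all_close(a, b, rtol=1e-4, atol=1e-5):
    """Combined test: |a - b| <= atol + rtol * |b| for every element."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


# Made-up values: the last pair is tiny, so a very small absolute
# difference dominates the relative error.
reference = [0.5, -1.25, 2.0e-6]
computed = [0.5000003, -1.2500004, 2.6e-6]

max_abs, max_rel = max_errors(computed, reference)
print("max abs err:", max_abs)
print("max rel err:", max_rel)
print("all close:", all_close(computed, reference))
```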
Hi, I have been trying out haste_pytorch (the training speed of haste is phenomenal!), but I found that the gradients for kernel/recurrent_kernel become zero when the model is trained on the GPU. Below is a simple code snippet I used to test this:
import torch
import haste_pytorch as haste
lstm_layer = haste.LSTM(input_size=128, hidden_size=256, batch_first = True)
output = torch.nn.Linear(256*5, 1)
lstm_layer.cuda()
output.cuda()
x = torch.rand([1, 5, 128]).cuda()
target = torch.zeros(1).cuda()
loss_func = torch.nn.MSELoss()
optim = torch.optim.Adam(list(lstm_layer.parameters()) + list(output.parameters()))
for i in range(5):
    y, _ = lstm_layer(x)
    y = y.contiguous().view(1,-1)
    y = output(y).squeeze()
    loss = loss_func(y, target)
    loss.backward()
    optim.step()
    # print the gradient of every LSTM parameter after each step
    for n, p in lstm_layer.named_parameters():
        print(n, p.grad)
    optim.zero_grad()
Print out:
kernel tensor([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
recurrent_kernel tensor([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
bias tensor([-1.8202e-10, 3.7714e-09, 2.8942e-09, ..., 1.0455e-08, 2.6969e-09, 1.6647e-08], device='cuda:0')
The gradients for kernel/recurrent_kernel become non-zero once "cuda()" is replaced by "cpu()".
I would be most grateful if you could provide some insight into this.
Many thanks for your help.
Downloading haste and trying to build it with 'make haste' results in the following C++ compiler error(s):
My assumption (which might well be wrong) is that in the CUDA Toolkit versions I use (tried 10.2 and 10.1) NVIDIA changed the signatures of cublas[H/S/D]gemm, so they no longer work as constant initializers.
Can you please look into it, or at least provide the exact version of CUDA you have successfully built haste against?
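When comparing builds across machines it helps to report the exact toolkit release that nvcc itself announces. The sketch below parses the "release X.Y" text that `nvcc --version` prints; the function names are assumptions for illustration, and the script returns None when nvcc is not on PATH.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Return the (major, minor) release reported by nvcc, or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # nvcc is not on PATH
    result = subprocess.run([nvcc, "--version"], capture_output=True,
                            text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_release())
```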
Hi, I've been trying to install haste on my 3 machines, with only one success. (One of them installed, but hits a segmentation fault as soon as I move tensors to the GPU to be used with haste; the other one errors out at some g++ compiler step while building the wheel.)
So I've tried to move to Docker, with repo2docker, to make the process easier for my coworkers. However, the image build still crashes at the haste-pytorch build stage while using conda for the build process. As the haste PyPI documentation states that CUDA Toolkit 10.1+ is required, I assumed the conda-provided CUDA toolkit would suffice, but it seems that in the build-from-scratch process this is not the case?
If I'm doing something wrong or misinterpreting something, feedback would be more than welcome!
Hi, are there any plans regarding these two requests?
Apply LSTM zoneout to the cell state in the same way as to the hidden state:
haste/frameworks/pytorch/layer_norm_lstm.py
Lines 62 to 66 in 9da2454
Add recurrent dropout in the same way as Keras:
https://github.com/tensorflow/tensorflow/blob/fcc4b966f1265f466e82617020af93670141b009/tensorflow/python/keras/layers/recurrent.py#L2450-L2459
Thanks!
Hello @sharvil ,
I have been using the CPU IndRNN implementation of haste during development. The results have been great.
When I moved to production, which will presumably use the CUDA version of IndRNN, I hit a segmentation fault every time at the very start of training.
I tried to get a stack trace of the error, attached below, but it does not seem very informative. Is there anything that can fix this?
Python 3.8, GPU T4 16 GB, PyTorch 1.6, CUDA 10.0.
Edit: After upgrading to 10.2, it worked.
WARNING:root:Start a training with 400 max_iterations, using a INDRNN with 2 layers , and 384 of hidden size.
[Thread 0x7fff87baf700 (LWP 5801) exited]
[Thread 0x7fff8a3b0700 (LWP 5800) exited]
[Thread 0x7fff8abb1700 (LWP 5799) exited]
[Thread 0x7fff9fdc4700 (LWP 5798) exited]
[Thread 0x7fffa05c5700 (LWP 5797) exited]
[Thread 0x7fffa2dc6700 (LWP 5796) exited]
0%| | 0/1 [00:00<?, ?it/s]
Thread 1 "python3.8" received signal SIGSEGV, Segmentation fault.
__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:65
65 ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:65
#1 0x00007fff4bde2132 in ?? () from /usr/local/cuda/lib64/libcublas.so.10.0
#2 0x00007fff4bddb289 in ?? () from /usr/local/cuda/lib64/libcublas.so.10.0
#3 0x00007fff4bdde15f in cublasSgemm_v2 () from /usr/local/cuda/lib64/libcublas.so.10.0
#4 0x00007fff60c9376d in haste::v0::indrnn::ForwardPass<float>::Run (this=this@entry=0x7fffffff9968, steps=steps@entry=934, W=<optimized out>, u=0x7fff01a5b600, b=b@entry=0x7fff01a5bc00, x=x@entry=0x7ffefe800000,
h=0x7ffefed5e400, workspace=0x7ffefeebce00, zoneout_prob=zoneout_prob@entry=0.5, zoneout_mask=0x7ffefec00000) at lib/indrnn_forward_gpu.cu.cc:139
#5 0x00007fff60c75510 in (anonymous namespace)::<lambda()>::<lambda()>::operator() (__closure=<optimized out>) at frameworks/pytorch/indrnn.cc:60
#6 (anonymous namespace)::<lambda()>::operator() (__closure=<optimized out>) at frameworks/pytorch/indrnn.cc:60
#7 (anonymous namespace)::indrnn_forward (training=<optimized out>, zoneout_prob=<optimized out>, x=..., h0=..., kernel=..., recurrent_scale=..., bias=..., zoneout_mask=...) at frameworks/pytorch/indrnn.cc:60
#8 0x00007fff60c77493 in pybind11::detail::argument_loader<bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>::call_impl<at::Tensor, at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor), 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, pybind11::gil_scoped_release> (f=<optimized out>, this=0x7fffffff9aa0)
at /usr/local/lib/python3.8/dist-packages/torch/include/pybind11/cast.h:1931
#9 pybind11::detail::argument_loader<bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>::call<at::Tensor, pybind11::gil_scoped_release, at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor)>(at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor)) && (f=<optimized out>, this=0x7fffffff9aa0)
at /usr/local/lib/python3.8/dist-packages/torch/include/pybind11/cast.h:1908
#10 void pybind11::cpp_function::initialize<at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor), at::Tensor, bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, pybind11::name, pybind11::scope, pybind11::sibling, char [15], pybind11::call_guard<pybind11::gil_scoped_release> >(at::Tensor (*&)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor), at::Tensor (*)(bool, float, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [15], pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (__closure=<optimized out>, call=...)
at /usr/local/lib/python3.8/dist-packages/torch/include/pybind11/pybind11.h:155
#11 0x00007fff60c5f47d in pybind11::cpp_function::dispatcher (self=<optimized out>, args_in=
(True, <float at remote 0x7ffff6964110>, <Tensor at remote 0x7fff8f1b3440>, <Tensor at remote 0x7fff8f206380>, <Parameter at remote 0x7fff8f17e680>, <Parameter at remote 0x7fff8f17e700>, <Parameter at remote 0x7fff8f17e740>, <Tensor at remote 0x7fff8f206a80>), kwargs_in=0x0) at /usr/local/lib/python3.8/dist-packages/torch/include/pybind11/pybind11.h:620
#12 0x00000000004fac34 in cfunction_call_varargs (kwargs=<optimized out>, args=<optimized out>, func=<built-in method indrnn_forward of PyCapsule object at remote 0x7fff8f260d80>) at ../Objects/call.c:742
#13 PyCFunction_Call () at ../Objects/call.c:772
#14 0x000000000055adf7 in do_call_core (kwdict=0x0,
callargs=(True, <float at remote 0x7ffff6964110>, <Tensor at remote 0x7fff8f1b3440>, <Tensor at remote 0x7fff8f206380>, <Parameter at remote 0x7fff8f17e680>, <Parameter at remote 0x7fff8f17e700>, <Parameter at remote 0x7fff8f17e740>, <Tensor at remote 0x7fff8f206a80>), func=<built-in method indrnn_forward of PyCapsule object at remote 0x7fff8f260d80>, tstate=<optimized out>) at ../Python/ceval.c:4983
#15 _PyEval_EvalFrameDefault () at ../Python/ceval.c:3559
#16 0x0000000000555060 in PyEval_EvalFrameEx (throwflag=0,
f=Frame 0x7fff8f162040, for file /usr/local/lib/python3.8/dist-packages/haste_pytorch/indrnn.py, line 60, in forward (ctx=<IndRNNFunctionBackward at remote 0x7fff8f1f2c80>, training=True, zoneout_prob=<float at remote 0x7ffff6964110>, inputs=(<Tensor at remote 0x7fff8f1b3440>, <Tensor at remote 0x7fff8f206380>, <Parameter at remote 0x7fff8f17e680>, <Parameter at remote 0x7fff8f17e700>, <Parameter at remote 0x7fff8f17e740>, <Tensor at remote 0x7fff8f206a80>))) at ../Python/ceval.c:741
#17 _PyEval_EvalCodeWithName () at ../Python/ceval.c:4298
#18 0x00000000004f9d9d in _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fff8f22ed78, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:435
#19 0x00000000004fbae2 in PyVectorcall_Call (kwargs=0x0, tuple=<optimized out>, callable=<function at remote 0x7fff8f26f5e0>) at ../Objects/call.c:199
#20 PyObject_Call (kwargs=0x0, args=<optimized out>, callable=<function at remote 0x7fff8f26f5e0>) at ../Objects/call.c:227
#21 PyEval_CallObjectWithKeywords (kwargs=0x0, args=<optimized out>, callable=<function at remote 0x7fff8f26f5e0>) at ../Objects/call.c:809
#22 PyObject_CallObject () at ../Objects/call.c:817
#23 0x00007ffff1d06a45 in THPFunction_apply(_object*, _object*) () from /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so
#24 0x00000000004facba in cfunction_call_varargs (kwargs=<optimized out>, args=<optimized out>, func=<built-in method apply of FunctionMeta object at remote 0x446cd20>) at ../Objects/call.c:757
#25 PyCFunction_Call () at ../Objects/call.c:772
#26 0x00000000004f95d9 in _PyObject_MakeTpCall () at ../Objects/call.c:159
#27 0x000000000055ad0e in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=<built-in method apply of FunctionMeta object at remote 0x446cd20>) at ../Include/cpython/abstract.h:125
#28 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0xb0a990) at ../Python/ceval.c:4963
#29 _PyEval_EvalFrameDefault () at ../Python/ceval.c:3469
#30 0x00000000004f9d0a in function_code_fastcall (globals=<optimized out>, nargs=4, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
#31 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fff8f1677f0, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:410
I think it affects all layers which have 'dropout': on inference
tf.nn.dropout(weights['recurrent_kernel'], rate = self.dropout)
is passed on to your C++ / CUDA part. Something of this sort is more appropriate:
tf.nn.dropout(weights['recurrent_kernel'], rate = (self.dropout if training else 0.0))
I am not sure whether zoneout is applied correctly under the hood either: does the 'training' parameter passed to the CUDA part switch it off correctly? I hacked around it by amending the relevant code as follows:
h, c, _ = LIB.haste_lstm(
    x,
    weights['kernel'],
    tf.nn.dropout(weights['recurrent_kernel'], rate = (self.dropout if training else 0.0)),
    weights['bias'],
    zoneout_mask if training else tf.zeros([0, 0, 0], dtype=self.dtype),
    training=training,
    zoneout_prob=(self.zoneout if training else 0.0))
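The gating idea in this workaround, independent of TensorFlow, looks roughly like the following. It is a plain-Python sketch of inverted dropout with a training switch; the function names are hypothetical and the weights are made-up numbers.

```python
import random


def dropout(values, rate, rng=random):
    """Inverted dropout: zero each value with probability rate, rescale the rest."""
    if rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


def recurrent_weights(weights, rate, training, rng=random):
    """Apply dropout only in training mode, as in the workaround above."""
    return dropout(weights, rate if training else 0.0, rng)


weights = [0.2, -0.4, 0.6, -0.8]
# Inference: the rate is forced to 0.0, so the weights pass through untouched.
print(recurrent_weights(weights, rate=0.05, training=False))
# Training: some entries may be zeroed, the rest are scaled by 1 / (1 - rate).
print(recurrent_weights(weights, rate=0.05, training=True))
```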
Hello,
In our CI we install haste_pytorch, but the installation fails because we do not have CUDA. In #2 you said that it is possible to run in CPU-only scenarios.
Would it be possible to install haste_pytorch without requiring CUDA?
I really wanted to use the Haste API in order to do a multiple layer LSTM. But I realized that, in order to do so you need the intermediate hidden states of a previous layer in order to feed the next layer, and I see no trivial solution for the problem with the support of your API. I wonder how the haste community of developers deals with multiple layers of LSTM.
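A common way to stack recurrent layers is to feed the full output sequence of one layer (its hidden state at every time step) into the next. The sketch below illustrates only that wiring with a toy stand-in layer; `ToyRNN` and `run_stack` are hypothetical names and have nothing to do with the Haste API.

```python
import math


class ToyRNN:
    """Stand-in recurrent layer: returns every hidden state plus the last one."""

    def __init__(self, input_size, hidden_size, scale=0.1):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.scale = scale  # fixed weight value, only to keep the toy deterministic

    def __call__(self, sequence):
        h = [0.0] * self.hidden_size
        outputs = []
        for x_t in sequence:
            drive = self.scale * sum(x_t)
            h = [math.tanh(drive + self.scale * h_j) for h_j in h]
            outputs.append(h)
        return outputs, h


def run_stack(layers, sequence):
    """Feed each layer's full output sequence into the next layer."""
    states = []
    for layer in layers:
        sequence, last_state = layer(sequence)
        states.append(last_state)
    return sequence, states


layers = [ToyRNN(4, 8), ToyRNN(8, 8)]
inputs = [[0.1, 0.2, 0.3, 0.4] for _ in range(5)]  # 5 time steps, 4 features
outputs, states = run_stack(layers, inputs)
print(len(outputs), len(outputs[0]), len(states))
```

With real layers the same loop applies as long as each layer returns its per-step outputs along with its final state.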
Thank you so much for your time, and have a nice day.
Hi, I tried to use haste for TF to test recurrent_dropout.
However, I got an error message on
import haste_tf as haste
I am using Windows 10 and Anaconda. I installed via pip.
This is the stack trace:
NotFoundError Traceback (most recent call last)
<ipython-input-1-9cc6bdc626a2> in <module>
----> 1 import haste_tf as haste
2 import tensorflow as tf
3
4 import tensorflow_addons as tfa
5 from tensorflow import keras
~\anaconda3\envs\tf_nightly_env\lib\site-packages\haste_tf\__init__.py in <module>
20
21 from ._version import __version__ # generated in setup.py
---> 22 from .gru import GRU
23 from .gru_cell import GRUCell
24 from .indrnn import IndRNN
~\anaconda3\envs\tf_nightly_env\lib\site-packages\haste_tf\gru.py in <module>
30
31
---> 32 LIB = tf.load_op_library(pkg_resources.resource_filename(__name__, 'libhaste_tf.so'))
33
34
~\anaconda3\envs\tf_nightly_env\lib\site-packages\tensorflow\python\framework\load_library.py in load_op_library(library_filename)
56 RuntimeError: when unable to load the library or get the python wrappers.
57 """
---> 58 lib_handle = py_tf.TF_LoadLibrary(library_filename)
59 try:
60 wrappers = _pywrap_python_op_gen.GetPythonWrappers(
Inspecting PyTorch's source code, I don't think stateful=True is supported, though there's a custom implementation, so it appears doable. Is support planned? It's crucial in my application.
This is minor, but the three PyTorch layers defined in the README should be put on the GPU, e.g.
norm_lstm_layer = haste.LayerNormLSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05).cuda()
since the input is a CUDA tensor.
Hi!
I found that this library supports the lengths parameter for the LSTM. What does this mean for the backward component of a bidirectional LSTM, if there is padding at the end of the sequence? Is the padding ignored?
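For reference, "ignoring the padding" in the backward direction usually means reversing only the valid part of each sequence and leaving the padding at the end, so the backward pass starts at the last real token. The sketch below shows that convention on plain lists; whether Haste does this is exactly the open question here, and `reverse_padded` is a hypothetical helper.

```python
def reverse_padded(sequence, length):
    """Reverse the first `length` steps and leave trailing padding in place."""
    return sequence[:length][::-1] + sequence[length:]


PAD = 0
batch = [[1, 2, 3, PAD, PAD], [4, 5, 6, 7, 8]]
lengths = [3, 5]

reversed_batch = [reverse_padded(seq, n) for seq, n in zip(batch, lengths)]
print(reversed_batch)
```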
Thanks!
I have tested HASTE on two different instance types on AWS (for reproducibility):
p2.xlarge (K80 instance)
p3.2xlarge (V100 instance)
Both instances were using stock Deep Learning AMI (Amazon Linux 2) Version 29.0 - ami-0b0b075706e19de29
The following sequence of commands was used to install HASTE:
(0) Change the symlink /usr/local/cuda to point to /usr/local/cuda-10.1 instead of /usr/local/cuda-10.0 (see the other issue: without this, HASTE does not install properly).
(1) source activate tensorflow2_p36
(2) git clone https://github.com/lmnt-com/haste
(3) cd haste
(4) make haste_tf
(5) pip install haste_tf-*.whl
then from jupyter notebook the following:
%env CUDA_VISIBLE_DEVICES=0
import numpy as np
import pickle
import tensorflow as tf
#gpus = tf.config.experimental.list_physical_devices('GPU')
#tf.config.experimental.set_memory_growth(gpus[0], True)
import haste_tf as haste
from tensorflow.python.keras import layers as L
from tensorflow.python.keras import backend as K
embedding_size = 100 #n_channels
lstm_nunits = 200
ntimestamps = 300
batch_size = 16
class HasteLSTM(tf.keras.layers.Layer):
    def __init__(self, num_units, dropout, zoneout, shape):
        super(HasteLSTM, self).__init__()
        self.haste_lstm = haste.LSTM(num_units = num_units, dropout = dropout, zoneout = zoneout, direction='unidirectional')
        self.haste_lstm.build(shape)
    def call(self, inputs, training):
        return self.haste_lstm(inputs, training = training)
haste_lstm = HasteLSTM(lstm_nunits, 0.00, 0.00, [batch_size, ntimestamps, embedding_size])
#not really a CuDNN but a normal LSTM, so number of parameters matches
cudnn_lstm = L.LSTM(lstm_nunits, return_sequences = True, unit_forget_bias = False)
dummy_input = tf.random.normal([batch_size, ntimestamps, embedding_size])
dummy_target = np.zeros(shape=(batch_size, ntimestamps, lstm_nunits))
for i in range(dummy_target.shape[0]):
    for j in range(dummy_target.shape[1]):
        dummy_target[i,j,np.random.randint(0, lstm_nunits)] = 1 #one in random position for each timestamp
input_ = L.Input(shape = [ntimestamps, embedding_size])
model_ = haste_lstm(input_, training = True)
if isinstance(model_, tuple): model_ = model_[0] #take only output, no states
model_ = K.softmax(model_) #simple classification task
model_haste = tf.keras.Model(inputs=input_, outputs=model_, name='haste_model')
input_ = L.Input(shape = [ntimestamps, embedding_size])
model_ = cudnn_lstm(input_, training = True)
if isinstance(model_, tuple): model_ = model_[0] #take only output, no states
model_ = K.softmax(model_) #simple classification task
model_cudnn = tf.keras.Model(inputs=input_, outputs=model_, name='cudnn_model')
total_trainable = 0
haste_trainable = []
for w in haste_lstm.haste_lstm.trainable_variables:
    K.set_value(w, np.zeros_like(w.numpy()))
    haste_trainable.append(w)
    total_trainable += w.numpy().flatten().shape[0]
print("HASTE has total %d trainable variables!" % total_trainable)
total_trainable = 0
cudnn_trainable = []
for w in cudnn_lstm.trainable_weights:
    K.set_value(w, np.zeros_like(w.numpy()))
    cudnn_trainable.append(w)
    total_trainable += w.numpy().flatten().shape[0]
print("CuDNN has total %d trainable variables!" % total_trainable)
#check HASTE gradients on the dummy example
with tf.GradientTape() as tape:
    prediction = model_haste(dummy_input, training=True)
    loss = tf.keras.losses.categorical_crossentropy(dummy_target, prediction)
gradients = tape.gradient(loss, haste_trainable)
print("HASTE maxabs of each grad:")
for grad in gradients:
    print (np.max(np.abs(grad)))
print("Non-HASTE maxabs of each grad:")
#check CuDNN (actually - plain LSTM) gradients on the dummy example
with tf.GradientTape() as tape:
    prediction = model_cudnn(dummy_input, training=True)
    loss = tf.keras.losses.categorical_crossentropy(dummy_target, prediction)
gradients = tape.gradient(loss, cudnn_trainable)
for grad in gradients:
    print (np.max(np.abs(grad)))
On p2.xlarge (K80) the following is the output:
env: CUDA_VISIBLE_DEVICES=0
HASTE has total 240800 trainable variables!
CuDNN has total 240800 trainable variables!
HASTE maxabs of each grad:
0.0
0.0
Non-HASTE maxabs of each grad:
6.3259706
0.0
7.397908
On p3.2xlarge (V100) the following is the output:
env: CUDA_VISIBLE_DEVICES=0
HASTE has total 240800 trainable variables!
CuDNN has total 240800 trainable variables!
HASTE maxabs of each grad:
7.004616
6.2311497
Non-HASTE maxabs of each grad:
6.231148
0.0
7.0048447
Gradients appear to be broken on K80 device.
Logs; the ultimate error reads:
process_begin: CreateProcess(NULL, ar -crv libhaste.a lib/*.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [haste] Error 2
Any suggestions or alternate install methods? Don't know which "file" isn't found -- thanks.
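The CreateProcess failure names `ar -crv libhaste.a lib/*.o` as the command, so the "file" that cannot be found may be the `ar` executable itself rather than one of its inputs. A quick way to check which build tools are visible on PATH, as a sketch (the helper name and the tool list are assumptions for illustration):

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tools mentioned in this thread; adjust the list for your setup.
    print(missing_tools(["make", "ar", "cl", "nvcc"]))
```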
Details:
"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\" where cl.exe is
cd path_to_haste; make haste_tf in Anaconda Powershell Prompt
virtualenv
I am using PyTorch, and all I want to know is how to optimize the parameters of haste_lstm. Suppose optim is a PyTorch optimizer. How do I pass the parameters of haste to optim, so that backpropagation also updates the haste parameters?
It would be nice to support some form of layer normalization in LSTM and GRU layer (example https://github.com/pytorch/pytorch/blob/master/benchmarks/fastrnns/custom_lstms.py#L171)
In the layer_norm_gru_cell.py in build() for tf-version 2.3 and 2.5:
self.gamma = v1.get_variable('gamma', initializer=v1.initializers.ones())
is throwing an error:
The initializer passed is not valid. It should be a callable with no arguments and the shape should not be provided or an instance of `tf.keras.initializers.*' and `shape` should be fully defined.
Is it necessary to add shape=1 to the get_variable() call?
Hello,
I know this seems more like a debugging problem on my side, but I get the following error message when running my code, and it only appears when running it with a haste layer:
Traceback (most recent call last):
File "<string>", line 1331, in haste_lstm
File "<string>", line 1379, in haste_lstm_eager_fallback
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 280, in args_to_matching_eager
ret = [ops.convert_to_tensor(t, dtype, ctx=ctx) for t in l]
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 280, in <listcomp>
ret = [ops.convert_to_tensor(t, dtype, ctx=ctx) for t in l]
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
return func(*args, **kwargs)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1540, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 339, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 265, in constant
allow_broadcast=True)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 276, in _constant_impl
return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 301, in _constant_eager_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/keras_tensor.py", line 274, in __array__
'Cannot convert a symbolic Keras input/output to a numpy array. '
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/snap/pycharm-professional/237/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/snap/pycharm-professional/237/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/time-series-on-joints-emg/src/all_in_one_file.py", line 394, in <module>
x, state = haste1(x, training=True)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/haste_tf/base_rnn.py", line 115, in __call__
result, state = self.fw_layer(inputs, sequence_length, training)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/haste_tf/lstm.py", line 218, in __call__
zoneout_prob=self.zoneout)
File "<string>", line 1339, in haste_lstm
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 122, in dispatch
result = dispatcher.handle(op, args, kwargs)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1450, in handle
return TFOpLambda(op)(*args, **kwargs)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 863, in _infer_output_signature
outputs = call_fn(inputs, *args, **kwargs)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1327, in _call_wrapper
return self._call_wrapper(*args, **kwargs)
File "/mnt/SSD/Marko/Dokumente/Uni/SoSe21/MA/LSTM_testproject/envs/LSTM_testproject/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1359, in _call_wrapper
result = self.function(*args, **kwargs)
TypeError: haste_lstm() missing 1 required positional argument: 'training'
I construct the model with the following code:
inputs = k_l.Input(shape=(train_x.shape[1], train_x.shape[2]))
direction = 'unidirectional' if args.model == 'GRU' else 'bidirectional'
haste1 = haste.LSTM(args.hidden_size, direction=direction, zoneout=0.1, dropout=args.dropout_time)
fc1 = k_l.Dense(args.dense_layers[0], activation='relu', kernel_initializer='he_uniform')
dr1 = k_l.Dropout(0.2)
fc2 = k_l.Dense(1)
x, state = haste1(inputs, training=True)
x = fc1(x)  # feed the LSTM output forward instead of the raw inputs
x = dr1(x)
outputs = fc2(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss=loss_func, optimizer=optimizer)
model_hist = model.fit(train_x, train_y, epochs=args.epochs, batch_size=args.batch_size, verbose=1,
validation_data=val_data, callbacks=keras_callbacks)
train_x numpy array shape is (21788, 1000, 4)
OS: Ubuntu 20.04
Python version: 3.7
Keras: 2.4.3
Tensorflow: 2.4.1
numpy: 1.19.5
GPU: GTX 1060
CUDA: 11.2
Normally I wouldn't post those error messages on github, but as the code would run without the haste layer, I suspect that the cause of the error lies somewhere close to it, and this repo seems to be the best place to ask and I didn't find any solutions elsewhere. I hope you can help me, I'd really like to try out your implementation for my dataset.
First of all, thank you for this wonderful library.
I'm unable to compile haste_tf with TensorFlow 2.7.0 and the most recent haste codebase from the master branch. The root error seems to be: ‘bfloat16’ in namespace ‘Eigen’ does not name a type. Installation with pip looks successful at first glance, but silently fails to build libhaste_tf.so. I've assumed there's something wrong with my setup and tried the Colab linked in the README, but it fails with the same errors. I'd appreciate any assistance.
Can the TensorFlow version get an equivalent of the PyTorch "return_state_sequence" = True option, similar to return_sequences=True in the original TensorFlow / Keras layers?
Hello.
Is it possible to apply any activation function between hidden states in IndRNN in tensorflow framework?
Currently I don't see any argument similar to "activation"
Keyword Arguments:
  kernel_initializer: (optional) the initializer to use for the input
    matrix weights. Defaults to `glorot_uniform`.
  recurrent_initializer: (optional) the initializer to use for the
    recurrent scale weights. Defaults to uniform random in [-0.5, 0.5].
    Note that this initialization scheme is different than in the original
    authors' implementation. See https://github.com/lmnt-com/haste/issues/7
    for details.
  bias_initializer: (optional) the initializer to use for the bias vector.
    Defaults to `zeros`.
  kernel_transform: (optional) a function with signature
    `(kernel: Tensor) -> Tensor` that transforms the kernel before it is
    used. Defaults to the identity function.
  recurrent_transform: (optional) a function with signature
    `(recurrent_scale: Tensor) -> Tensor` that transforms the recurrent
    scale vector before it is used. Defaults to the identity function.
  bias_transform: (optional) a function with signature
    `(bias: Tensor) -> Tensor` that transforms the bias before it is used.
    Defaults to the identity function.
  zoneout: (optional) float, sets the zoneout rate for Zoneout
    regularization. Defaults to 0.
  dtype: (optional) the data type for this layer. Defaults to `tf.float32`.
  name: (optional) string, the name for this layer.
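To make the role of the transform arguments in the docstring above more concrete, here is a minimal NumPy sketch. It assumes the standard IndRNN recurrence h_t = tanh(W x_t + u * h_{t-1} + b), where u is an element-wise recurrent scale vector. The names `indrnn_forward` and `clip_scale`, the fixed tanh activation, and the time-major layout are illustrative assumptions, not Haste's actual implementation.

```python
import numpy as np

def indrnn_forward(x, kernel, recurrent_scale, bias,
                   kernel_transform=lambda w: w,
                   recurrent_transform=lambda u: u,
                   bias_transform=lambda b: b):
    """Run an IndRNN-style recurrence over x of shape [T, N, C].

    Each transform is applied once to its parameter before the parameter
    is used; all three default to the identity function.
    Returns the hidden states with shape [T, N, H].
    """
    W = kernel_transform(kernel)              # [C, H]
    u = recurrent_transform(recurrent_scale)  # [H], element-wise scale
    b = bias_transform(bias)                  # [H]
    T, N, _ = x.shape
    h = np.zeros((N, W.shape[1]), dtype=x.dtype)
    outputs = []
    for t in range(T):
        # Each hidden unit only sees its own previous value (u * h),
        # not a full recurrent matrix product.
        h = np.tanh(x[t] @ W + u * h + b)
        outputs.append(h)
    return np.stack(outputs)

def clip_scale(u, limit=1.0):
    """Hypothetical recurrent_transform: keep |u| <= limit."""
    return np.clip(u, -limit, limit)

rng = np.random.default_rng(0)
T, N, C, H = 5, 2, 3, 4
x = rng.standard_normal((T, N, C)).astype(np.float32)
kernel = rng.standard_normal((C, H)).astype(np.float32)
recurrent_scale = rng.uniform(-0.5, 0.5, size=H).astype(np.float32)
bias = np.zeros(H, dtype=np.float32)

y = indrnn_forward(x, kernel, recurrent_scale, bias,
                   recurrent_transform=clip_scale)
print(y.shape)
```

Passing a transform instead of editing the weights directly keeps the stored parameter unconstrained while the value actually used in the recurrence stays in the desired range.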
It seems like Haste does not support packed sequences for dealing with variable-length sequences in PyTorch. Any chance this gets implemented in the near future? Thanks for the great work.
Hey!
What is the testing strategy that you're using? I'm having a hard time determining if I can trust your implementation. (I reviewed the validation directory and it seemed like the tests mainly focused on the core implementation. It wasn't clear things like Zoneout or LayerNorm were being tested as well.)
Also, thank you for supporting a CPU! For us, while most training is performed on a GPU, I still test and run my code on a CPU because it's more productive to use my laptop instead of the cloud to develop.
Thanks!
Is there a way to expose the LayerNormGRUCell CUDA implementation to Python?
Is it possible to pass in the hidden and cell state to the (LayerNorm)LSTM's forward pass? To be clear:
y, state = norm_lstm_layer(x) # current API
y, state = norm_lstm_layer(x, (h0, c0)) # desired API, also same as nn.LSTM's API
Maybe I'm missing something because it seems like a pretty standard use case. For example,
Thanks! And this looks great btw; I've had many annoyances having to implement custom rnn's (layer norm, recurrent dropout, etc.) in pytorch in the past.
I am working in Google Colab with PyTorch and haste and running into a weird issue.
The code below gives RuntimeError: CUDA error: invalid configuration argument when feature_sizes=[512]. It works fine for feature_sizes=[256], feature_sizes=[128], etc.
Maybe related to CUDA kernel grid sizes: pytorch/pytorch#28927
The error is on the logits = self.fc_logits(hidden_states) line.
I received a TON of warnings when building haste on Google Colab. Is it possible the package wasn't properly built?
Also, when doing haste.__version__ I get the error AttributeError: module 'haste_pytorch' has no attribute '__version__'
PyTorch version (torch.__version__) is '1.5.0+cu101'.
CODE:
import torch
import torch.nn as nn
import haste_pytorch as haste

# NOTE: append_position_signal is a helper from the reporter's own codebase;
# it is not defined in this snippet.


class LSTMEmbedDiscNet(nn.Module):
    """
    An LSTM discriminator that operates on word indexes.
    IMPORTANT: feature_sizes=[512] gives RuntimeError: CUDA error: invalid configuration argument.
    Maybe related to https://github.com/pytorch/pytorch/pull/28927
    Use feature_sizes=[256] or lower.
    """

    def __init__(self, feature_sizes=[512], vocab_size=5726, use_layer_norm=True, trainable_embedding_size=64, dropout=0.1, pad_token=0, embedding_source=None, vocab_file=None, position_dim=8):
        super().__init__()
        self._feature_sizes = feature_sizes
        self._vocab_size = vocab_size
        self._use_layer_norm = use_layer_norm
        self._trainable_embedding_size = trainable_embedding_size
        self._embedding_source = embedding_source
        self._vocab_file = vocab_file
        self._dropout = dropout
        self._pad_token = pad_token
        self._position_dim = position_dim
        if self._embedding_source:
            assert vocab_file
        self.define_module()

    def define_module(self):
        if self._embedding_source:
            assert False  # TODO
        else:
            self.embed = nn.Embedding(self._vocab_size, self._trainable_embedding_size)
        self.drop = nn.Dropout(p=self._dropout)
        self.fc_embed_hidden = nn.Linear(self._trainable_embedding_size + self._position_dim, self._feature_sizes[0])
        self.encoder_cell = haste.LayerNormLSTM(input_size=self._feature_sizes[0], hidden_size=self._feature_sizes[0], batch_first=True)
        self.fc_logits = nn.Linear(self._feature_sizes[0], 1)

    def forward(self, sequence, sequence_length):
        """Connect to the graph.
        Args:
          sequence: A [batch_size, max_sequence_length] tensor of int. For example
            the indices of words as sampled by the generator.
          sequence_length: A [batch_size] tensor of int. Length of the sequence.
          is_training: Boolean, False to disable dropout.
        Returns:
          A [batch_size, max_sequence_length, feature_size] tensor of floats. For
          each sequence in the batch, the features should (hopefully) allow to
          distinguish if the value at each timestep is real or generated.
        """
        device = sequence.device
        batch_size, max_sequence_length = sequence.size()
        embeddings = self.drop(self.embed(sequence))  # batch_size, max_sequence_length, self._embedding_size
        embeddings_pos = append_position_signal(embeddings, self._position_dim)
        lstm_inputs = self.fc_embed_hidden(embeddings_pos)  # batch_size, max_sequence_length, self._feature_sizes[0]
        hidden_states, _ = self.encoder_cell(lstm_inputs)
        logits = self.fc_logits(hidden_states)
        logits_flat = logits.squeeze(2)
        # Mask past first PAD symbol
        # mask = utils.get_mask_past_symbol(sequence, self._pad_token)
        # masked_logits_flat = logits_flat * mask
        # return masked_logits_flat
        return logits_flat


def test_lstm_embed_disc_net():
    d_batch = 4
    d_max_seq_len = 52
    device = torch.device('cuda:0')
    model = LSTMEmbedDiscNet().to(device).train()
    d_vocab = model._vocab_size
    assert model
    texts = torch.randint(low=0, high=d_vocab, size=(d_batch, d_max_seq_len)).to(device)
    text_lens = torch.randint(low=3, high=d_max_seq_len, size=(d_batch,)).to(device)
    logits = model(texts, text_lens)
    assert logits.size() == (d_batch, d_max_seq_len)


test_lstm_embed_disc_net()
ERROR:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-54-11fbf5027e87> in <module>()
79 assert logits.size() == (d_batch, d_max_seq_len)
80
---> 81 test_lstm_embed_disc_net()
5 frames
<ipython-input-54-11fbf5027e87> in test_lstm_embed_disc_net()
76 text_lens = torch.randint(low=3, high=d_max_seq_len, size=(d_batch,)).to(device)
77
---> 78 logits = model(texts, text_lens)
79 assert logits.size() == (d_batch, d_max_seq_len)
80
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
<ipython-input-54-11fbf5027e87> in forward(self, sequence, sequence_length)
55
56 hidden_states, _ = self.encoder_cell(lstm_inputs)
---> 57 logits = self.fc_logits(hidden_states)
58 logits_flat = logits.squeeze(2)
59
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py in forward(self, input)
85
86 def forward(self, input):
---> 87 return F.linear(input, self.weight, self.bias)
88
89 def extra_repr(self):
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1610 ret = torch.addmm(bias, input, weight.t())
1611 else:
-> 1612 output = input.matmul(weight.t())
1613 if bias is not None:
1614 output += bias
RuntimeError: CUDA error: invalid configuration argument
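A note on reading the traceback above: CUDA kernel launches are asynchronous, so the Python line where the error surfaces (here the nn.Linear call) is not necessarily the operation that failed. An earlier kernel launch could be the real source. One way to narrow this down is to set the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized, which makes launches synchronous so errors are reported at the launch that caused them. The sketch below only changes where the error is reported and is not a fix; the commented-out lines stand in for the reproduction above.

```python
import os

# Must be set before the CUDA context is created; the simplest way is to set
# it before importing torch (or restart the Colab runtime and set it first).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch                   # import only after the variable is set
# test_lstm_embed_disc_net()     # re-run the failing reproduction

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

With synchronous launches the run is slower, so this is best used only while debugging.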
Thanks for your implementation.
Are there reproducibility benchmarks vs. the CuDNN variants (do the layers behave approx. the same w/ same configurations)?
Hello,
I'm in awe of this library. I got great results with LayerNorm LSTM.
Do you have plans for releasing on PyPI? That would allow more users to use it!
I'm getting an error while installing using make haste_tf or just make:
File "setup.py", line 54
with open(f'tf/_version.py', 'wt') as f:
^
SyntaxError: invalid syntax
make: *** [Makefile:61: haste_tf] Error 1
Not sure what the problem might be here. I can train models on the GPU in TensorFlow, so I don't think it's a CUDA or TensorFlow problem, but I'm not sure.
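The caret in the error above points at an f-string, which is syntax added in Python 3.6. One possible cause, offered as an assumption rather than a confirmed diagnosis, is that make invokes setup.py with an older interpreter (for example, a python command that resolves to Python 2). A small check like the following, run with the same interpreter that make uses, would confirm or rule this out. The helper name supports_f_strings is hypothetical, and the script deliberately avoids f-strings so it also runs on old interpreters.

```python
import sys

def supports_f_strings(version_info=sys.version_info):
    """Return True if this interpreter can parse f-strings (Python 3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)

# Show which interpreter is actually running and whether it is new enough.
print("interpreter:", sys.executable)
print("version:", sys.version.split()[0])
print("f-strings supported:", supports_f_strings())
```

If this reports an older version, running the build from an environment where python is 3.6 or newer may resolve the SyntaxError.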
Everything is identical, except that replacing the PyTorch LSTM with one of your implementations leads to a NaN loss.
When I run an RNN from the example (e.g., GRU, IndRNN), I get an illegal memory access error.
import torch
import haste_pytorch as haste
x = torch.rand([25, 5, 128]).cuda()
gru_layer = haste.GRU(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
gru_layer.cuda()
y, state = gru_layer(x)
y.mean().backward()
Results in:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-66139d497ca7> in <module>
7 gru_layer.cuda()
8 y, state = gru_layer(x)
----> 9 y.mean().backward()
RuntimeError: CUDA error: an illegal memory access was encountered
I'm using PyTorch 1.7.1+cu110 and Python 3.7.3.
Haste is from the GitHub master, compiled by make haste_pytorch.
Hi guys. I am working on RWKV, which might be the only RNN (no attention!) that can match transformer LM & zero-shot performance at 1B+ params:
https://www.reddit.com/r/MachineLearning/comments/vzr6ie/r_rwkv3_scaling_rnn_to_15b_and_reach_transformer/
I am using some CUDA in my project too. Probably we can collaborate to promote RNN and scale it to 100B+ params :)
The RWKV discord: https://discord.gg/bDSBUMeFpc
Github: https://github.com/BlinkDL/RWKV-LM
CUDA stuff: https://github.com/BlinkDL/RWKV-CUDA
Hi,
I am trying to install haste_pytorch on Windows using pip as follows:
pip install haste_pytorch
However, I am getting this error:
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Collecting haste_pytorch
Using cached haste_pytorch-0.5.0rc0.tar.gz (44 kB)
Using legacy setup.py install for haste-pytorch, since package 'wheel' is not installed.
Installing collected packages: haste-pytorch
Running setup.py install for haste-pytorch ... error
ERROR: Command errored out with exit status 1:
command: 'c:\program files (x86)\microsoft visual studio\shared\python37_64\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\helfa\\AppData\\Local\\Temp\\pip-install-vk46qkk_\\haste-pytorch\\setup.py'"'"'; __file__='"'"'C:\\Users\\helfa\\AppData\\Local\\Temp\\pip-install-vk46qkk_\\haste-pytorch\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\helfa\AppData\Local\Temp\pip-record-3mvw8fjq\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\program files (x86)\microsoft visual studio\shared\python37_64\Include\haste-pytorch'
cwd: C:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\
Complete output (46 lines):
c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\site-packages\setuptools\dist.py:454: UserWarning: Normalizing '0.5.0-rc0' to '0.5.0rc0'
warnings.warn(tmpl.format(**locals()))
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\base_rnn.py -> build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\gru.py -> build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\indrnn.py -> build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\layer_norm_gru.py -> build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\layer_norm_indrnn.py -> build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\layer_norm_lstm.py -> build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\lstm.py -> build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\_version.py -> build\lib.win-amd64-3.7\haste_pytorch
copying frameworks\pytorch\__init__.py -> build\lib.win-amd64-3.7\haste_pytorch
running build_ext
C:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\utils\cpp_extension.py:334: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
'make' is not recognized as an internal or external command,
operable program or batch file.
C:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\utils\cpp_extension.py:270: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
building 'haste_pytorch_lib' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\frameworks
creating build\temp.win-amd64-3.7\Release\frameworks\pytorch
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\gru.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\gru.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
gru.cc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\indrnn.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\indrnn.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
indrnn.cc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\layer_norm_gru.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_gru.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
layer_norm_gru.cc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\layer_norm_indrnn.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_indrnn.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
layer_norm_indrnn.cc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\layer_norm_lstm.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_lstm.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
layer_norm_lstm.cc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\lstm.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\lstm.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
lstm.cc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\helfa\AppData\Local\Temp\pip-install-vk46qkk_\haste-pytorch\lib "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\torch\csrc\api\include -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\TH -IC:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\include\THC "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-Ic:\program files (x86)\microsoft visual studio\shared\python37_64\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpframeworks/pytorch\support.cc /Fobuild\temp.win-amd64-3.7\Release\frameworks/pytorch\support.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=haste_pytorch_lib -D_GLIBCXX_USE_CXX11_ABI=0
support.cc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:. "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib64" "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64" /LIBPATH:C:\Users\helfa\AppData\Roaming\Python\Python37\site-packages\torch\lib "/LIBPATH:c:\program files (x86)\microsoft visual studio\shared\python37_64\libs" "/LIBPATH:c:\program files (x86)\microsoft visual studio\shared\python37_64\PCbuild\amd64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" haste.lib cublas.lib cudart.lib c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_haste_pytorch_lib build\temp.win-amd64-3.7\Release\frameworks/pytorch\gru.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\indrnn.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_gru.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_indrnn.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\layer_norm_lstm.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\lstm.obj build\temp.win-amd64-3.7\Release\frameworks/pytorch\support.obj /OUT:build\lib.win-amd64-3.7\haste_pytorch_lib.cp37-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.7\Release\frameworks/pytorch\haste_pytorch_lib.cp37-win_amd64.lib
LINK : fatal error LNK1181: cannot open input file 'haste.lib'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.26.28801\\bin\\HostX86\\x64\\link.exe' failed with exit status 1181
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\program files (x86)\microsoft visual studio\shared\python37_64\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\helfa\\AppData\\Local\\Temp\\pip-install-vk46qkk_\\haste-pytorch\\setup.py'"'"'; __file__='"'"'C:\\Users\\helfa\\AppData\\Local\\Temp\\pip-install-vk46qkk_\\haste-pytorch\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\helfa\AppData\Local\Temp\pip-record-3mvw8fjq\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\program files (x86)\microsoft visual studio\shared\python37_64\Include\haste-pytorch' Check the logs for full command output.
RNNs use the zoneout mask of the first step to compute gradients for all steps.
For example, maybe we could change GRU backward from
haste/lib/gru_backward_gpu.cu.cc
Line 371 in 3d92d70
zoneout_mask ? zoneout_mask + i * NH : nullptr
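To illustrate the indexing in the expression above: offsetting the mask pointer by i * NH selects the mask that belongs to time step i instead of always reading the first step's mask. The NumPy sketch below assumes the mask is stored as one contiguous buffer of T * N * H values and that NH means batch size times hidden size. Both are assumptions about the layout, and step_mask is a hypothetical helper, not part of Haste.

```python
import numpy as np

def step_mask(zoneout_mask, i, batch_size, hidden_size):
    """Return the [N, H] zoneout mask for time step i from a flat buffer.

    Mirrors `zoneout_mask ? zoneout_mask + i * NH : nullptr`:
    no mask stays no mask, otherwise advance by i * NH elements.
    """
    if zoneout_mask is None:
        return None
    NH = batch_size * hidden_size
    return zoneout_mask[i * NH:(i + 1) * NH].reshape(batch_size, hidden_size)

T, N, H = 3, 2, 4
rng = np.random.default_rng(0)
mask = (rng.random((T, N, H)) > 0.1).astype(np.float32)  # 1 = update, 0 = keep
flat = mask.reshape(-1)                                   # contiguous buffer

for i in range(T):
    # Each step must see its own slice, not the slice of step 0.
    print(i, np.array_equal(step_mask(flat, i, N, H), mask[i]))
```

Using the same per-step slice in the backward pass that the forward pass used keeps the gradients consistent with the units that were actually zoned out at each step.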
I'd like to suggest support for IndRNNs; in my experiments on EEG seizure classification w/ very long sequences, they've dominated LSTMs & GRUs. While already also much faster, IndRNNs would benefit from a CuDNN-like speedup in large stacks, and from Layer Normalization for working w/ 1000+ timesteps.
Minimal tf.keras code below; default weight initialization should be handled differently - can clarify post-approval.
from tensorflow.python.keras import activations
from tensorflow.python.keras import backend as K
from tensorflow.python.keras import constraints
from tensorflow.python.keras import initializers
from tensorflow.python.keras import regularizers
from tensorflow.python.keras.engine.base_layer import Layer
from tensorflow.python.keras.utils import tf_utils
from tensorflow.python.ops import math_ops
from tensorflow.python.training.tracking import data_structures
from tensorflow.python.util.tf_export import keras_export
from tensorflow.python.keras.layers.recurrent import DropoutRNNCellMixin


@keras_export(v1=['keras.layers.IndRNNCell'])
class IndRNNCell(DropoutRNNCellMixin, Layer):

    def __init__(self,
                 units,
                 activation='tanh',
                 use_bias=True,
                 recurrent_clip_min=-1,
                 recurrent_clip_max=-1,
                 kernel_initializer='glorot_normal',
                 recurrent_initializer=None,
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 recurrent_regularizer=None,
                 bias_regularizer=None,
                 kernel_constraint=None,
                 recurrent_constraint=None,
                 bias_constraint=None,
                 dropout=0.,
                 recurrent_dropout=0.,
                 implementation=1,
                 **kwargs):
        super(IndRNNCell, self).__init__(**kwargs)

        # Clipping is disabled unless both bounds are given.
        if recurrent_clip_min is None or recurrent_clip_max is None:
            recurrent_clip_min = None
            recurrent_clip_max = None

        self.units = units
        self.activation = activations.get(activation)
        self.use_bias = use_bias
        self.recurrent_clip_min = recurrent_clip_min
        self.recurrent_clip_max = recurrent_clip_max

        self.kernel_initializer = initializers.get(kernel_initializer)
        # Check the constructor argument here; the attribute is not set yet.
        if recurrent_initializer is None:
            self.recurrent_initializer = initializers.RandomUniform(-1.0, 1.0)
        else:
            self.recurrent_initializer = initializers.get(recurrent_initializer)
        self.bias_initializer = initializers.get(bias_initializer)

        self.kernel_regularizer = regularizers.get(kernel_regularizer)
        self.recurrent_regularizer = regularizers.get(recurrent_regularizer)
        self.bias_regularizer = regularizers.get(bias_regularizer)

        self.kernel_constraint = constraints.get(kernel_constraint)
        self.recurrent_constraint = constraints.get(recurrent_constraint)
        self.bias_constraint = constraints.get(bias_constraint)

        # Clamp dropout rates to [0, 1].
        self.dropout = min(1., max(0., dropout))
        self.recurrent_dropout = min(1., max(0., recurrent_dropout))
        self.state_size = data_structures.NoDependency([self.units])
        self.output_size = self.units

    @tf_utils.shape_type_conversion
    def build(self, input_shape):
        input_dim = input_shape[-1]
        self.timesteps = input_shape[1]
        # Note: _process_recurrent_clip is not defined in this snippet.
        self._process_recurrent_clip()

        # Input-to-hidden weights: a full matrix.
        self.kernel = self.add_weight(
            shape=(input_dim, self.units),
            name='kernel',
            initializer=self.kernel_initializer,
            regularizer=self.kernel_regularizer,
            constraint=self.kernel_constraint)
        # Hidden-to-hidden weights: one scalar per unit (a vector, not a matrix).
        self.recurrent_kernel = self.add_weight(
            shape=(self.units,),
            name='recurrent_kernel',
            initializer=self.recurrent_initializer,
            regularizer=self.recurrent_regularizer,
            constraint=self.recurrent_constraint)

        if self.use_bias:
            self.bias = self.add_weight(
                shape=(self.units,),
                name='bias',
                initializer=self.bias_initializer,
                regularizer=self.bias_regularizer,
                constraint=self.bias_constraint)
        else:
            self.bias = None
        self.built = True

    def call(self, inputs, states, training=None):
        h_tm1 = states[0]  # previous memory state

        dp_mask = self.get_dropout_mask_for_cell(inputs, training, count=1)
        rec_dp_mask = self.get_recurrent_dropout_mask_for_cell(
            h_tm1, training, count=1)

        if 0. < self.dropout < 1.:
            inputs = inputs * dp_mask[0]
        if 0. < self.recurrent_dropout < 1.:
            h_tm1 = h_tm1 * rec_dp_mask[0]

        # h_t = activation(x_t . W + u * h_{t-1} + b), with u applied element-wise.
        h = K.dot(inputs, self.kernel)
        h += math_ops.multiply(h_tm1, self.recurrent_kernel)
        if self.use_bias:
            h = K.bias_add(h, self.bias)

        h = self.activation(h)
        return h, [h]
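The recurrence computed in `call` above can be sketched in plain Python for a single, unbatched example. This is only an illustration of the element-wise recurrent weight, assuming a tanh activation and no dropout; the helper names `indrnn_step` and `indrnn_run` are hypothetical and not part of Keras or Haste.

```python
import math


def indrnn_step(x, h_prev, kernel, recurrent_kernel, bias):
    """One IndRNN step: h[j] = tanh(sum_i x[i]*kernel[i][j] + u[j]*h_prev[j] + b[j])."""
    h = []
    for j in range(len(recurrent_kernel)):
        pre = sum(x[i] * kernel[i][j] for i in range(len(x)))  # input projection
        pre += recurrent_kernel[j] * h_prev[j]  # unit j only sees its own past state
        pre += bias[j]
        h.append(math.tanh(pre))
    return h


def indrnn_run(xs, kernel, recurrent_kernel, bias):
    """Run the cell over a sequence, starting from a zero state."""
    h = [0.0] * len(recurrent_kernel)
    for x in xs:
        h = indrnn_step(x, h, kernel, recurrent_kernel, bias)
    return h


# One input feature, two units; the input only feeds unit 0.
kernel = [[1.0, 0.0]]
recurrent_kernel = [0.5, 0.5]
bias = [0.0, 0.0]
final_state = indrnn_run([[1.0], [0.0]], kernel, recurrent_kernel, bias)
```

Because the recurrent weight is a vector rather than a matrix, unit 1 never receives anything from unit 0 through the recurrence, so its state stays at zero in this example.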
I assume this is not supposed to happen, but I checked the model's parameters after training and these were the values from the final IndRNN layer in my model:
rnn2.bias Parameter containing: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)
This is my module:
self.rnn = haste.IndRNN(input_dim, hidden_dim, batch_first=True, zoneout=0.1, return_state_sequence=True)
self.rnn2 = haste.IndRNN(hidden_dim, 64, batch_first=True, zoneout=0.075)
self.d1 = nn.Dropout(0.15)
And forward function:
out, (hn) = self.rnn(x)
out, (hn) = self.rnn2(self.d1(out))
Can this be used without CUDA? Specifically, I have my laptop for local development and my HPC for actual training. Will the Haste LSTM work on my local machine as well? (Slower, which is fine, but that way I can maintain one codebase.)
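Whether Haste itself runs without CUDA is not answered here. Independently of that, one way to keep a single codebase is to pick the layer implementation at runtime. The sketch below only decides which backend is usable; the function name `choose_lstm_backend` is hypothetical, and it assumes the Haste PyTorch package is importable as `haste_pytorch`.

```python
import importlib.util


def choose_lstm_backend():
    """Return 'haste', 'torch', or 'none' depending on what is usable here."""
    if importlib.util.find_spec("torch") is None:
        return "none"  # PyTorch is not installed at all
    import torch

    # Assumption: the Haste PyTorch bindings install as `haste_pytorch`.
    has_haste = importlib.util.find_spec("haste_pytorch") is not None
    if has_haste and torch.cuda.is_available():
        return "haste"  # GPU machine with Haste built and installed
    return "torch"      # fall back to torch.nn.LSTM, e.g. on a laptop


backend = choose_lstm_backend()
```

A model could then construct either the Haste LSTM or `torch.nn.LSTM` based on `backend`. Parameter layouts may differ between the two, so checkpoints are not guaranteed to be interchangeable.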