
public_plstm's Introduction

Phased LSTM

This is the official repository for "Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences," presented orally at NIPS 2016, by Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu.

Rule of Thumb

In general, if you are using ~1000 timesteps or more in your input sequence, you can benefit from PLSTM.

If you're only answering bAbI tasks or doing negative log-likelihood on some paragraph of text, you're unlikely to see improvement from this model. However, for long sequences (e.g., whole-text summarization), or sequences which are fusing input from multiple sensors with different timing (e.g., one going at 3 Hz and the other at 25 Hz), this model is both natural and efficient.
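
For the multi-sensor case, the natural input format is a single event stream sorted by timestamp, with the timestamp fed to the network alongside each sample. A minimal sketch of that bookkeeping (hypothetical rates and arrays, not code from this repo):

import numpy as np

# Two hypothetical sensors running at different rates
t_slow = np.arange(0.0, 10.0, 1.0/3.0)   # ~3 Hz sensor
t_fast = np.arange(0.0, 10.0, 1.0/25.0)  # ~25 Hz sensor
x_slow = np.random.randn(len(t_slow))
x_fast = np.random.randn(len(t_fast))

# Merge both streams into one sequence of (timestamp, sensor_id, value)
# sorted by time; the timestamps then drive the PLSTM time gate.
events = sorted([(t, 0, v) for t, v in zip(t_slow, x_slow)] +
                [(t, 1, v) for t, v in zip(t_fast, x_fast)])
timestamps = np.array([e[0] for e in events])  # the PLSTM's time input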

Making it work well for speech and NLP is still experimental and ongoing work. If this is of interest to you, let me know and I can give you an update.

Now available in TensorFlow and Keras!

Freq Task 1

To run the first task, run the shell script a_freq_task.sh. It should load the first task with default parameters, training each model under each condition for 70 epochs. Afterwards, you can open A_Freq_Task.ipynb to render the results, which should show the following:

[Figure: Freq Task A results]

Freq Task 2

To run the second task, run the shell script b_freq_combo_task.sh. It should load the second task with default parameters, training each model with the more complex stimuli for 300 epochs (a long time!). Afterwards, you can open B_Freq_Combo_Task.ipynb to render the results, which should show the following:

[Figure: Freq Combo Task results]

It runs the same Python file as in task 1, but the data iterator is changed to be the more complex version.

PLSTM Notes

The essence of the PLSTM code (plstm.py) is the following lines:

import theano.tensor as T

def calc_time_gate(time_input_n):
    # shift_broadcast, period_broadcast, on_mid_broadcast, on_end_broadcast,
    # and off_slope are the layer's timing parameters, set up elsewhere in
    # plstm.py so that they broadcast against the batch dimension.
    # Broadcast the time across all units
    t_broadcast = time_input_n.dimshuffle([0,'x'])
    # Get the time within the period
    in_cycle_time = T.mod(t_broadcast + shift_broadcast, period_broadcast)
    # Find the phase
    is_up_phase = T.le(in_cycle_time, on_mid_broadcast)
    is_down_phase = T.gt(in_cycle_time, on_mid_broadcast)*T.le(in_cycle_time, on_end_broadcast)
    # Set the mask
    sleep_wake_mask = T.switch(is_up_phase, in_cycle_time/on_mid_broadcast,
                        T.switch(is_down_phase,
                            (on_end_broadcast-in_cycle_time)/on_mid_broadcast,
                                off_slope*(in_cycle_time/period_broadcast)))

    return sleep_wake_mask

This creates the rhythmic mask from time_input_n, a vector of times: one time, shared across all neurons, for each sample in the batch. The timestamp is broadcast into a 2-tensor of shape [batch_size, num_neurons], which contains the timestamp at each neuron for each item in the batch (at one timestep), and is stored in t_broadcast. From this we calculate in_cycle_time, which ranges between 0 and the period length for each neuron. We then use in_cycle_time to determine whether each neuron is in the is_up_phase, the is_down_phase, or the off phase, and T.switch applies the correct transformation for each phase.
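
The same gate is easy to sanity-check in plain NumPy. A minimal sketch with made-up parameter values (not code from this repo):

import numpy as np

def time_gate(t, period, shift, r_on, off_slope=1e-3):
    # Time within the current cycle, in [0, period)
    in_cycle = np.mod(t + shift, period)
    on_mid = 0.5 * r_on * period  # peak of the triangular opening
    on_end = r_on * period        # end of the open phase
    up = in_cycle / on_mid                 # rising half of the triangle
    down = (on_end - in_cycle) / on_mid    # falling half
    off = off_slope * (in_cycle / period)  # small leak while closed
    return np.where(in_cycle < on_mid, up,
                    np.where(in_cycle < on_end, down, off))

t = np.linspace(0, 100, 1000)
gate = time_gate(t, period=50.0, shift=10.0, r_on=0.05)
# gate ramps from 0 up to 1 and back within 5% of each 50-step cycle,
# and stays near the small off_slope leak otherwise.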

Once the mask is generated, we simply mask the cell state with the sleep-wake cycle (plstm.py):

def step_masked(input_n, time_input_n, mask_n, cell_previous, hid_previous, *args):
    cell, hid = step(input_n, time_input_n, cell_previous, hid_previous, *args)

    # Get time gate openness
    sleep_wake_mask = calc_time_gate(time_input_n)

    # Sleep if off, otherwise stay a bit on
    cell = sleep_wake_mask*cell + (1.-sleep_wake_mask)*cell_previous
    hid = sleep_wake_mask*hid + (1.-sleep_wake_mask)*hid_previous

    # (handling of mask_n for padded timesteps is omitted from this excerpt)
    return [cell, hid]
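
Note that the gate never closes completely: the off_slope leak in calc_time_gate keeps the mask slightly above zero during the off phase, so information and gradients can still trickle through while a neuron is asleep.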

Implementation notes

PLSTM was originally written in Theano, and there are some subtle differences between frameworks such as Theano and TensorFlow. Some issues worth keeping in mind:

  • Make sure r_on can't be negative
  • Make sure the period can't be negative
  • Check what mod(-1, 5) evaluates to in your framework, to make sure it lines up with your intuition (e.g., negative-symmetric or cyclical); see the sketch after this list
  • Think about whether or not you want to abs the phase shift
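
For instance, a quick check of the mod convention (Python and NumPy use floored mod; C-family languages truncate):

import numpy as np

# Floored mod: the result takes the sign of the divisor, so negative
# times wrap around cyclically -- which is what the time gate wants.
print(-1 % 5)         # 4
print(np.mod(-1, 5))  # 4
# Truncated mod (C, C++, Java) would give -1 here, which puts
# in_cycle_time outside [0, period) for negative or shifted times.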

Also note that this implementation doesn't take advantage of any sparse BLAS code. The latest TensorFlow code has some good cuSPARSE support, and the gemvi sparse routines are great for computing the dense-matrix × sparse-vector products that Phased LSTM needs; they should absolutely offer speedups at the sparsity levels shown here. But, as far as I know, no one has yet publicly implemented this.

Default parameters

Generally, for "standard" tasks, you have an input of several hundred to a couple thousand steps and your neurons tend to be overcomplete. For this situation, the default parameters given here are pretty good (sketched in code after this list):

  • Period drawn from np.exp(np.random.uniform(1, 6)), i.e., roughly 2.72 to 403 timesteps per cycle, where the chance of getting a period between 5 and 10 is the same as the chance of getting a period between 50 and 100.
  • An on ratio of around 5%. For hard problems, you can turn on learning for this parameter, which gradually expands r_on towards 100% (why not? The neuron can do better if it is on more often; an interesting avenue of research is adding an L2 cost to r_on, which is equivalent to having SGD find an accurate solution while minimizing compute cost). Alternatively, you can fix it at 10%, which generally seems like another good number so far.
  • A phase shift drawn from all possible phase shifts. If you don't cover all phase shifts, or don't have enough neurons, you'll have "holes" in time where no neurons are paying attention.
  • The "timestamp" for a standard input is the integer time index, ranging from 0 to num_timesteps.
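
A minimal NumPy sketch of these defaults (illustrative names, not the repo's exact code):

import numpy as np

num_neurons = 128
num_timesteps = 1000

# Periods log-uniform over roughly 2.72 to 403 timesteps per cycle
period = np.exp(np.random.uniform(1.0, 6.0, size=num_neurons))

# Phase shifts covering all possible offsets within each period
shift = np.random.uniform(0.0, 1.0, size=num_neurons) * period

# Open for ~5% of each cycle
r_on = 0.05 * np.ones(num_neurons)

# The "timestamp" for a standard input: just the integer time index
timestamps = np.arange(num_timesteps)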

Other Tasks

Other tasks are coming soon, when I can clean them up.

Citation

Please use this citation if the code or paper was useful in your work:

@inproceedings{neil2016phased,
  title={Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences},
  author={Neil, Daniel and Pfeiffer, Michael and Liu, Shih-Chii},
  booktitle={Advances in Neural Information Processing Systems},
  pages={3882--3890},
  year={2016}
}

Installation

Requires Lasagne and Theano. Other versions will be linked as the industrious community of brilliant ML programmers ports the implementation...

Reach out to me!

If you have any questions about this, please reach out to me at: [email protected]


public_plstm's Issues

Information: Theano no longer supports downsample used by the latest Lasagne 0.1

Hey Danny,

I was trying to run your script a_freq_task.sh and hit the following error. It turns out the downsample module is no longer supported by Theano but is still imported by the latest Lasagne release (0.1). You can solve the problem by installing the development version of Lasagne with the command below. Just FYI.
pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip

Reference
aigamedev/scikit-neuralnetwork#235 (comment)

Error

(plstm) nicole@polarsnow:~/git/public_plstm(master)$ a_freq_task.sh 
Traceback (most recent call last):
  File "freq_task.py", line 3, in <module>
    import lasagne
  File "/home/nicole/anaconda3/envs/plstm/lib/python2.7/site-packages/lasagne/__init__.py", line 19, in <module>
    from . import layers
  File "/home/nicole/anaconda3/envs/plstm/lib/python2.7/site-packages/lasagne/layers/__init__.py", line 7, in <module>
    from .pool import *
  File "/home/nicole/anaconda3/envs/plstm/lib/python2.7/site-packages/lasagne/layers/pool.py", line 6, in <module>
    from theano.tensor.signal import downsample
ImportError: cannot import name downsample

Question: How to use Phased LSTM for regression data?

Hey Danny,

I could send you a mail directly but I guess it's better to keep track here so that other people can have a look.

In the Phased LSTM paper, you mostly discussed classification problems.

Do you have any ideas how Phased LSTM could be used for a regression problem with asynchronous data?

Let's say I have a sensor that sends data asynchronously. I would like to be able to predict the next data point. But its value depends on the time of arrival: it's not the same if it comes 1 second after or 10 minutes after.

When it's synchronous data, it's quite easy because you can assume that data points are spaced every minute. In this case, it would just be forecasting at t+1. But when it comes to asynchronous data, I guess it's trickier, as the next point could come 15 seconds or 25 seconds later. So if we just give the next data point's value to predict, the model will not have the information of when that data point actually arrived.

A toy example:

data_point_1 {value = 0.02, timestamp = 0010}
data_point_2 {value = 0.04, timestamp = 0023}
data_point_3 {value = 0.01, timestamp = 0035}
data_point_4 {value = -0.02, timestamp = 0060}
data_point_5 {value = 0.04, timestamp = 0076}
data_point_6 {value = 0.09, timestamp = 0078}
data_point_7 {value = 0.03, timestamp = 0090}
data_point_8 {value = 0.01, timestamp = 0101}
data_point_9 {value = 0.02, timestamp = 0102}

Let's predict:
data_point_10 {value = 0.05, timestamp = 0106}

We can give data_point_1 up to data_point_9 to the network, along with their timestamps, as inputs. The network can figure out the frequencies and phases of the signals (strength of Phased LSTM!).

But how do we give the target? If we just give data_point_10.value, 0.05, it does not mean much since the timestamp is omitted. I guess we want to give the timestamp too.

For inference we would just query the model with {data_point_1 to data_point_9} and data_point_10.query_timestamp = 0106 (or possibly data_point_9.timestamp + forecasting_time_ahead in the general case), and hope to match data_point_10.value.
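
A minimal sketch of that framing (hypothetical shapes and names, not from this repo): feed the observed (value, timestamp) pairs as input steps, append the query timestamp as a final step with a dummy value, and train the output at that step against the observed value.

import numpy as np

# Hypothetical asynchronous stream of (timestamp, value) pairs
stream = [(10, 0.02), (23, 0.04), (35, 0.01), (60, -0.02), (76, 0.04),
          (78, 0.09), (90, 0.03), (101, 0.01), (102, 0.02)]
query_t, target = 106, 0.05  # data_point_10

# One step per observed event, plus a final "query" step whose value
# channel is a dummy (0) and whose timestamp is the query time.
values = np.array([v for _, v in stream] + [0.0])[None, :, None]  # [batch, steps, features]
times = np.array([t for t, _ in stream] + [query_t])[None, :]     # [batch, steps]

# Train so the network output at the final step matches `target`; the
# time gate sees query_t, so the prediction is conditioned on when we
# are asking, not just on the history.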

Am I correct? How could I improve my thinking?

Thanks!

Sorry :)

Hey Daniel,

I wrote to your email address, but you stopped responding to me after my question about derivatives.

Several months later, I understand such a question might have been impolite, so I might have ended up banned.
I am sorry for that; I was just 23 :)

A few months ago I successfully implemented Phased LSTM, and it works great with no bugs.

Thanks once again for inventing it, and again sorry if I appeared tactless.

Can we use PLSTM for a prediction task on unevenly spaced time-series data?

I am trying to understand the intuition behind your paper, and I might be wrong since I am not very experienced in the area. Am I wrong to conclude that I can DIRECTLY use PLSTM for a prediction task on unevenly spaced time-series data, where events don't happen at regular intervals but at irregular time steps?

If yes, as shown in your N-MNIST example, I can feed these irregular time steps to the PLSTM, right?
