Giter Site home page Giter Site logo

rnn-seq2seq-learning's Introduction

The repository contains the source code, data, and some of the experimental notebooks for the paper accepted at the 16 ICGI, titled Learning Transductions and Alignments with RNN Seq2seq Models. Through unified training/evaluation conditions and comprehensive experiments, I compared the learning results across tasks, different model configurations, and generalization performance etc., and highlighted factors that influence the learning capabilities and generalization capacity of RNN seq2seq models.

For more details or everything related to the experiments, including data, training logs, trained models, and experimental results, please check this project folder On Google Drive or this OSF project, which contains a complete copy of everything.

Transduction Tasks

For a given string, the four transduction tasks can be defined as follows:

  • Identity: the given string itself. Ex: abc ===> abc
  • Reversal: the reverse of the given string. Ex: abc ===> cba
  • Total reduplication: two copies of the given string. Ex: abc ===> abcabc
  • Quadratic copying: make n copies of the given string of length n. Ex: abc ===> abcabcabc

These four functions have been traditionally studied under the viewpoint of Finite State Tranducers (FSTs) and characterized accordingly. The FST-theoretic characterizations propose the following complexity hierarchy for these tasks: quadratic copying (polyregular function) > total reduplication (regular) > reversal (regular) > identity (rational).

Learning Input-target Alignments

RNN seq2seq models take an encoder-decoder structure, where the decoder only “writes” after the encoder “reads” all the input symbols, unlike the read-and-write operation seen in FSTs.

The following figure shows the conjectured mechanism for RNN seq2seq models learning identity and reversal functions. Other two functions can be learnt in a similar manner, and input specified reduplication additionally requires counting.

Basic Findings

  • Generalization abilities: RNN seq2seq models, attentional or not, are prone to learning a function that fits the training or in-distribution data. Their out-of-distribution generalization abilities are highly limited. In other words, they are not learning the underlying data generation functions, probably because of the inherent limitation of auto-regressive models.

    • Please note that, task complexity is strongly tied to the structure of the learner. For example, RNNs of few hundred parameters can easily learn the function of identity whereas RNNs seq2seq models cannot. See: RNNs-learn-identity.
  • Attention: makes learning alignment between input and target sequences significantly more efficient and robust, but does not overcome the out-of-distribution generalization limitation.

  • Task complexity: for attention-less RNN seq2seq models, it is empirically found that quadratic copying > total reduplication > identity > reversal. The relative complexity between identity and reversal is due to the long-term dependency learning issue that comes with RNNs trained with gradient descent and backpropagation.

  • RNN variants: The effect of RNN variants on the seq2seq models is a complicated one and interacts with other factors, e.g., attention and the task to learn. Generally, GRU and LSTM are generally more expressive than SRNN. SRNN cannot count.

The following figure shows full-sequence accuracy (on unseen examples) per input length across the four tasks for the three types of RNN seq2seq models, where only input lengths 6-15 are seen during training.

Reproduce the results

To reproduce the results, simply download everything in the project folder, upload the folder to your Google Drive, and re-run the notebook in notebooks in a GPU runtime. Alternatively, you can also save the project folder to your Google Drive by clicking the "Add shortcut to Drive" button. In doing so, you should be able to run any of these notebooks successfully, but the results will not be saved on your Google Drive (unless you download the entire folder and upload it to your Google Drive).

It is recommened that you subscribe to Google Pro+ in order to reproduce the results.

Citation

@misc{zw2023rnn-seq2seq,
  doi = {10.48550/ARXIV.2303.06841},
  url = {https://arxiv.org/abs/2303.06841},
  author = {Wang, Zhengxiang},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Learning Transductions and Alignments with RNN Seq2seq Models},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution 4.0 International}
}

Reuse the code

  • The project folder provides a very neat but customizable pipeline to conduct experiments using RNN seq2seq models. Just make sure that your data is also saved in an identical format as the one provided inside the data folder.

  • If you prefer highly customized codebase to run experiments on command line, consider the following two repositories of mine:

rnn-seq2seq-learning's People

Contributors

jaaack-wang avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.