
dasguptar / treelstm.pytorch


Tree LSTM implementation in PyTorch

License: MIT License

Python 68.29% Shell 2.74% Java 27.57% Dockerfile 1.40%
deep-learning deeplearning machine-learning machinelearning pytorch recursive-neural-networks tree-lstm treelstm

treelstm.pytorch's People

Contributors

dasguptar, huangshenno1, jizg, soumith, vinhdv


treelstm.pytorch's Issues

Does current TreeLSTM support batch size?

It seems that batch sizes are still not supported in the code? In the forward function of ChildSumTreeLSTM, it looks like only a single tree can be processed per forward pass.

    def forward(self, tree, inputs):
        # recursively process all children first
        for idx in range(tree.num_children):
            self.forward(tree.children[idx], inputs)

        if tree.num_children == 0:
            # leaf node: start from zero child states
            child_c = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
            child_h = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
        else:
            # internal node: stack the children's (c, h) states
            child_c, child_h = zip(*map(lambda x: x.state, tree.children))
            child_c, child_h = torch.cat(child_c, dim=0), torch.cat(child_h, dim=0)

        tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
        return tree.state

Can the model run on multiple GPUs?

I ran the sentiment model successfully. My GPUs are two 1080 Tis, and I get about 14% utilization on GPU 0. Is there a way to run it on multiple GPUs? I implemented a model in TensorFlow Fold, but it does not seem to support multiple GPUs either.

Cannot find packages

lib\CollapseUnaryTransformer.java:3: error: package edu.stanford.nlp.ling does not exist
import edu.stanford.nlp.ling.Label;
^
lib\CollapseUnaryTransformer.java:4: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.Tree;
^
lib\CollapseUnaryTransformer.java:5: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.TreeTransformer;
^
lib\CollapseUnaryTransformer.java:6: error: package edu.stanford.nlp.util does not exist
import edu.stanford.nlp.util.Generics;

...
What can I do?

How can I get parses in the same format for sentences in German?

Hi,

I am trying to use this model to parse sentences in German with the dependency parser used in this code.

So, the DependencyParse.java file has the following lines:

public static final String TAGGER_MODEL = "stanford-tagger/models/english-left3words-distsim.tagger";
public static final String PARSER_MODEL = "edu/stanford/nlp/models/nndep/english_SD.gz";

Is it enough to change those lines in order to specify a German tagger and parser?

Thanks in advance for any help,

Why zero out embeddings for special words if they are absent in vocab

Hi,

I noticed that in main.py, you zero out the embeddings for special words if they are absent from the vocabulary:

# zero out the embeddings for padding and other special words if they are absent in vocab
for idx, item in enumerate([Constants.PAD_WORD, Constants.UNK_WORD, Constants.BOS_WORD, Constants.EOS_WORD]):
    emb[idx].zero_()

Is there any reason for doing so? Why not use random normal vectors?
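
For illustration, a minimal sketch of the alternative being asked about, with hypothetical sizes rather than the repo's actual code:

    import torch

    # toy stand-ins for the vocabulary size and embedding dimension used in main.py
    vocab_size, emb_dim = 2412, 300
    emb = torch.zeros(vocab_size, emb_dim)   # pretrained GloVe vectors would be copied in here

    # the alternative asked about above: fill the special-word rows (PAD/UNK/BOS/EOS,
    # conventionally indices 0-3) with small random normal values instead of zeros
    for idx in range(4):
        emb[idx].normal_(mean=0.0, std=0.05)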

Thanks.

Two differences from the original implementation

I got the same result as you, ~0.846 Pearson score. After checking the original implementation, I found two differences.

  • In your trainer.py file,
def train(self, dataset):
        self.model.train()
        self.optimizer.zero_grad()
        loss, k = 0.0, 0
        indices = torch.randperm(len(dataset))
        for idx in tqdm(range(len(dataset)),desc='Training epoch '+str(self.epoch+1)+''):
            ltree,lsent,rtree,rsent,label = dataset[indices[idx]]
            linput, rinput = Var(lsent), Var(rsent)
            target = Var(map_label_to_target(label,dataset.num_classes))
            if self.args.cuda:
                linput, rinput = linput.cuda(), rinput.cuda()
                target = target.cuda()
            output = self.model(ltree,linput,rtree,rinput)
            err = self.criterion(output, target)
            loss += err.data[0]
            err.backward()           # <------------
            k += 1
            if k%self.args.batchsize==0:
                self.optimizer.step()
                self.optimizer.zero_grad()
        self.epoch += 1
        return loss/len(dataset)

You call .backward() for each sample in the mini-batch and then perform one update step with self.optimizer.step(). Since backward() accumulates gradients automatically, it seems you need to average both the losses and the gradients over the mini-batch. So I think the arrowed line above should be changed to

(err/self.args.batchsize).backward()
  • The original implementation does not really update the embeddings. It does not include the embedding parameters in the model; all of the model's parameters are optimized with Adagrad, while the embedding parameters are updated directly with gradient * learning_rate, but that learning rate is set to 0 (a sketch follows this list).
    Furthermore, I did some simple calculations: there are more than 700,000 embedding parameters versus 286,505 other model parameters. Considering that the training set has only 4,500 examples, it is too small to fine-tune the embeddings.
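
A minimal sketch of reproducing that behaviour, using a toy model; the real SimilarityTreeLSTM's attribute names may differ:

    import torch.nn as nn
    import torch.optim as optim

    # toy stand-in for the model; only the embedding attribute matters for this sketch
    class ToyModel(nn.Module):
        def __init__(self, vocab_size=2412, in_dim=300, mem_dim=150):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, in_dim)
            self.proj = nn.Linear(in_dim, mem_dim)

    model = ToyModel()

    # mirror the original implementation: keep the embedding fixed (embedding learning rate 0)
    model.emb.weight.requires_grad_(False)

    # hand only the remaining trainable parameters to Adagrad
    optimizer = optim.Adagrad([p for p in model.parameters() if p.requires_grad],
                              lr=0.05, weight_decay=1e-4)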

After making the two modifications above, I get a 0.854 Pearson score and 0.274 MSE with Adagrad (learning_rate=0.05).

Docker image is broken!

OS: macOS Mojave
Docker Edition: Version 18.03.1-ce-mac65 (24312)
Channel: stable

I tried to build the Docker image in order to run the lib without depending on my Mac setup. The image build is actually broken because the links and procedures for fetching dependencies have not been updated:

Step 10/11 : RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]
 ---> Running in 90832d3e48fe
torch-0.4.0-cp36-cp36m-linux_x86_64.whl is not a supported wheel on this platform.
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/bash -c pip install -r requirements.txt' returned a non-zero code: 1

I fixed the issue by changing the Docker build file:
1- Remove this line:
RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]

2- Run the container in the interactive mode:
docker run -it [IMAGE-NAME]

3- Install Python 3.5, pip3, and their dependencies manually, then run main.py.

This is how I did it to make sure I was installing the right things; however, the best solution would be to make all of these changes at the image-build level.

Trying to understand cparents.txt in Constituency parsing

I have downloaded the SICK data and obtained the dependency and constituency parses with the fetch_and_preprocess.sh file.

I am now trying to understand what information is generated in the cparents.txt file.
This is an example:

a.txt -> Two dogs are fighting
a.cparents.txt -> 5 5 7 7 6 0 6

If I am not mistaken, I should be able to build the parse tree from cparents.txt. Is that right? And what would the tree for this example look like?
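
If the file follows the usual parent-pointer convention (entry k is the 1-indexed parent of node k, and 0 marks the root), then the tree can be rebuilt from it. A hypothetical helper, not the repo's own reader, to illustrate:

    # rebuild a tree from a space-separated parent list such as "5 5 7 7 6 0 6"
    def build_tree(parent_line):
        parents = [int(p) for p in parent_line.split()]
        children = {i: [] for i in range(1, len(parents) + 1)}   # nodes are 1-indexed
        root = None
        for node, parent in enumerate(parents, start=1):
            if parent == 0:
                root = node
            else:
                children[parent].append(node)
        return root, children

    root, children = build_tree("5 5 7 7 6 0 6")
    print(root)         # 6  (the root node)
    print(children[6])  # [5, 7]
    print(children[5])  # [1, 2]  -> leaves covering the tokens "Two" and "dogs"

Under this reading, nodes 1-4 are the leaves for the four tokens, node 5 spans "Two dogs", node 7 spans "are fighting", and node 6 is the root.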

Thanks for any help in advance

map_label_to_target should initialize a zero tensor

Your map_label_to_target for the SICK dataset initializes an uninitialized (effectively random) tensor.

def map_label_to_target(label,num_classes):
    target = torch.Tensor(1,num_classes) # this is not a zero tensor
    ceil = int(math.ceil(label))
    floor = int(math.floor(label))
    if ceil==floor:
        target[0][floor-1] = 1
    else:
        target[0][floor-1] = ceil - label
        target[0][ceil-1] = label - floor
    return target

However, in the original treelstm (Lua), the author initializes a zero tensor:

local targets = torch.zeros(batch_size, self.num_classes)
for j = 1, batch_size do
  local sim = dataset.labels[indices[i + j - 1]] * (self.num_classes - 1) + 1
  local ceil, floor = math.ceil(sim), math.floor(sim)
  if ceil == floor then
    targets[{j, floor}] = 1
  else
    targets[{j, floor}] = ceil - sim
    targets[{j, ceil}] = sim - floor
  end
end
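
A corrected Python version along those lines might look like this (a sketch, not the repo's actual fix):

    import math
    import torch

    def map_label_to_target(label, num_classes):
        # start from an all-zero tensor, as in the original Lua implementation
        target = torch.zeros(1, num_classes)
        ceil, floor = int(math.ceil(label)), int(math.floor(label))
        if ceil == floor:
            target[0][floor - 1] = 1
        else:
            target[0][floor - 1] = ceil - label
            target[0][ceil - 1] = label - floor
        return target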

ChildSumTreeLSTM: fx and fh linear layers are declared but not used

Lines 21, 22:

self.fx = nn.Linear(self.in_dim,self.mem_dim)
self.fh = nn.Linear(self.mem_dim,self.mem_dim)

But they are never used.

I think you intended to use them in lines 38 and 39
(perhaps a typo of ix for fx):

fx = F.torch.unsqueeze(self.ix(inputs),1)
f = F.torch.cat([self.ih(child_hi)+fx for child_hi in child_h], 0)
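
For reference, a sketch of how the per-child forget gates of the Child-Sum TreeLSTM (Tai et al.) could be computed with those two layers; this illustrates the equation f_k = sigmoid(W_f x + U_f h_k) and is not necessarily the author's intended code:

    import torch
    import torch.nn as nn

    in_dim, mem_dim, num_children = 300, 150, 3      # hypothetical sizes
    fx = nn.Linear(in_dim, mem_dim)                  # W_f x   (input -> forget gate)
    fh = nn.Linear(mem_dim, mem_dim)                 # U_f h_k (child state -> forget gate)

    inputs = torch.randn(1, in_dim)                  # current node's input x
    child_h = torch.randn(num_children, mem_dim)     # children's hidden states
    child_c = torch.randn(num_children, mem_dim)     # children's memory cells

    # one forget gate per child; fx(inputs) broadcasts over the child dimension
    f = torch.sigmoid(fh(child_h) + fx(inputs))
    c_from_children = torch.sum(f * child_c, dim=0, keepdim=True)   # gated sum of child cells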

Error while Compiling

Ubuntu 18.04 Java 11.0.3
running (as part of fetch_and_preprocess.sh)
javac -cp $CLASSPATH lib/*.java -Xlint:unchecked

lib/CollapseUnaryTransformer.java:17: error: error while writing CollapseUnaryTransformer: /home/eduard_ergenzinger/treelstm.pytorch/lib/CollapseUnaryTransformer.class
public class CollapseUnaryTransformer implements TreeTransformer {
       ^
lib/ConstituencyParse.java:58: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                     ^
  where T is a type-variable:
    T extends HasWord declared in class PTBTokenizer
lib/ConstituencyParse.java:58: warning: [unchecked] unchecked conversion
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                     ^
  required: PTBTokenizer<Word>
  found:    PTBTokenizer
lib/DependencyParse.java:57: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
        PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                       ^
  where T is a type-variable:
    T extends HasWord declared in class PTBTokenizer
lib/DependencyParse.java:57: warning: [unchecked] unchecked conversion
        PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                       ^
  required: PTBTokenizer<Word>
  found:    PTBTokenizer
1 error
4 warnings

How to make it work with dynamic batching?

This implementation can only process one sample at a time, so performance is limited because GPU utilization is low. Is there a possibility of making treelstm support dynamic batching so that the GPU can be fully utilized?

Why move output to cpu?

I noticed that in the test function of the trainer module you write:

output = output.data.squeeze().cpu()

Why do you move the output to the CPU?

By the way, in SimilarityTreeLSTM in the model module:
output = self.similarity(lstate, rstate)

Why not use
output = self.similarity(lhidden, rhidden)

Nodes' hidden representations?

Hello, not an issue, but what's the easiest way to extract the learned hidden embeddings for each node in a ChildSum tree? New to PyTorch, so forgive my ignorance.

Thanks!

Checkpoint saving may not be appropriate.

In your code:

        if best < test_pearson:
            best = test_pearson
            checkpoint = {
                'model': trainer.model.state_dict(), 
                'optim': trainer.optimizer,
                'pearson': test_pearson, 'mse': test_mse,
                'args': args, 'epoch': epoch
                }
            logger.debug('==> New optimum found, checkpointing everything now...')
            torch.save(checkpoint, '%s.pt' % os.path.join(args.save, args.expname))

test_pearson is used instead of dev_pearson, but test_pearson should not be used to choose the best model.
I got a test result of Pearson 0.8616 / MSE 0.2626 from the model with the highest dev_pearson score.
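
A sketch of the suggested change, where dev_pearson is assumed to be computed on the dev split each epoch (the variable names here are illustrative, not the repo's):

        # fragment of the epoch loop in main.py; dev_pearson / best_dev are assumed
        # to be computed above by evaluating on the dev split
        if best_dev < dev_pearson:
            best_dev = dev_pearson
            checkpoint = {
                'model': trainer.model.state_dict(),
                'optim': trainer.optimizer.state_dict(),
                'dev_pearson': dev_pearson,
                'test_pearson': test_pearson, 'test_mse': test_mse,
                'args': args, 'epoch': epoch
                }
            logger.debug('==> New optimum on dev found, checkpointing everything now...')
            torch.save(checkpoint, '%s.pt' % os.path.join(args.save, args.expname))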

IndexError: index 54 is out of bounds for dimension 0 with size 54

tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)

len(inputs) == 54
tree.idx == 54

tree.idx = idx - 1

More information:

inputs[tree.idx] tensor([[ 3.7410e-02,  5.7619e-02,  3.3822e-01,  ..., -3.5774e-02,
         -7.8579e-02,  1.0644e-02],
        [-2.5287e-02, -2.5835e-01, -7.5715e-02,  ...,  1.2864e-01,
          1.3856e-01,  3.3581e-01],
        [-5.4430e-02, -1.6442e-01, -6.7605e-02,  ...,  1.7388e-01,
         -3.9886e-01, -1.3006e-02],
        ...,
        [-2.5433e-02, -8.0709e-02,  6.2163e-01,  ...,  2.7345e-01,
         -5.6782e-02,  1.8956e-01],
        [-2.4587e-01,  8.9087e-03, -1.5240e-03,  ..., -3.2474e-01,
          1.1630e-02, -1.3252e-01],
        [ 4.9405e-04, -3.5795e-01, -2.2226e-01,  ..., -9.1428e-02,
          2.2649e-01, -2.0806e-01]], device='cuda:0',
       grad_fn=<EmbeddingBackward>)
Traceback (most recent call last):
  File "main.py", line 185, in <module>
    main()
  File "main.py", line 155, in main
    train_loss = trainer.train(train_dataset)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/trainer.py", line 29, in train
    output = self.model(linput, rtree, rinput)
  File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 90, in forward
    rstate, rhidden = self.childsumtreelstm(rtree, rinputs)
  File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  [Previous line repeated 10 more times]
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 48, in forward
    tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
IndexError: index 54 is out of bounds for dimension 0 with size 54

classpath error

My current environment:
Windows 10
Python 3.6
PyTorch 0.4
IDE: PyCharm

I try to run preprocess-sick.py and get an error that the class cannot be found or loaded. I then copied the java command into a Windows cmd window, and the same error was raised. The relevant code:

    cmd = ('java -cp %s DependencyParse -tokpath %s -parentpath %s -relpath %s %s < %s'
           % (cp, tokpath, parentpath, relpath, tokenize_flag, filepath))
    os.system(cmd)
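
One thing worth checking on Windows: the Java classpath separator is ';' there but ':' on Linux/macOS, so a classpath joined with ':' makes java report that it cannot find or load the main class. A small, hedged sketch of building the classpath portably (the entries below are placeholders, not the script's real paths):

    import os

    # placeholder classpath entries; substitute the jar paths actually used by the script
    classpath_entries = ['lib', os.path.join('lib', 'stanford-parser.jar')]

    # os.pathsep is ';' on Windows and ':' on POSIX, matching what `java -cp` expects
    cp = os.pathsep.join(classpath_entries)
    print(cp)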

Any plans to support constituency trees?

Constituency trees showed slightly lower performance in the paper, but there are surely people, myself included, who want to use them in the belief that phrase structures are appropriate for different purposes :)

download

I can't access nlp.stanford.edu. Could you send me a copy? Thank you.

Matrix problem

  File ".../treelstm.pytorch/model.py", line 36, in node_forward
    u = F.tanh(self.ux(inputs)+self.uh(child_h_sum))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear.apply(input, self.weight, self.bias)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 12, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: matrix and matrix expected at 

I think you may have missed an unsqueeze operation?

        i = F.sigmoid(self.ix(inputs)+self.ih(child_h_sum.unsqueeze(0)))
        o = F.sigmoid(self.ox(inputs)+self.oh(child_h_sum.unsqueeze(0)))
        u = F.tanh(self.ux(inputs)+self.uh(child_h_sum.unsqueeze(0)))

Sizes do not match

When I run python main.py, I get the following error message:

Namespace(batchsize=25, cuda=True, data='data/sick/', epochs=15, expname='test',
glove='data/glove/', hidden_dim=50, input_dim=150, lr=0.01, mem_dim=75, num_classes=5, optim='adagrad', save='checkpoints/', seed=123, sparse=False, wd=0.0001)
==> SICK vocabulary size : 2412
==> Size of train data : 4500
==> Size of dev data : 500
==> Size of test data : 4927
Traceback (most recent call last):
  File "main.py", line 157, in <module>
    main()
  File "main.py", line 126, in main
    model.childsumtreelstm.emb.state_dict()['weight'].copy_(emb)
RuntimeError: sizes do not match at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THC/THCTensorCopy.cu:31

The platform is Arch Linux with CUDA 8.0.

I would appreciate it for any reply.

How to run it on GPU?

I ran pip install -r req....

but couldn't run it with python main.py --cuda.

The traceback ends with:
AssertionError: Torch not compiled with CUDA enabled

lr is different from the original paper

I noticed that the lr in the code is 0.01, while the paper uses 0.05 with Adagrad. I tried training the model with 0.05, but the training loss doesn't decrease at all; maybe that is due to the high lr?

Why did you set lr to 0.01? And since the lr differs, maybe there is a bug in the code?
