
dasguptar / treelstm.pytorch


Tree LSTM implementation in PyTorch

License: MIT License

Python 68.29% Shell 2.74% Java 27.57% Dockerfile 1.40%
deep-learning deeplearning machine-learning machinelearning pytorch recursive-neural-networks tree-lstm treelstm

treelstm.pytorch's People

Contributors

dasguptar, huangshenno1, jizg, soumith, vinhdv


treelstm.pytorch's Issues

Does current TreeLSTM support batch size?

It seems that batch sizes are still not supported in the code? In the forward function of ChildSumTreeLSTM, it looks like only a single tree can be processed per forward pass.

    def forward(self, tree, inputs):
        # recursively process all children first
        for idx in range(tree.num_children):
            self.forward(tree.children[idx], inputs)

        if tree.num_children == 0:
            # leaf node: start from zero child states
            child_c = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
            child_h = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
        else:
            # internal node: stack the children's (c, h) states
            child_c, child_h = zip(*map(lambda x: x.state, tree.children))
            child_c, child_h = torch.cat(child_c, dim=0), torch.cat(child_h, dim=0)

        tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
        return tree.state

Can the model run on multiple GPUs?

I ran the sentiment model successfully. My GPUs are two 1080 Tis, and I get about 14% utilization on GPU 0. Is there a way to run it on multiple GPUs? I implemented a model in TensorFlow Fold, but it does not seem to support multiple GPUs either.

Cannot find packages

lib\CollapseUnaryTransformer.java:3: error: package edu.stanford.nlp.ling does not exist
import edu.stanford.nlp.ling.Label;
^
lib\CollapseUnaryTransformer.java:4: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.Tree;
^
lib\CollapseUnaryTransformer.java:5: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.TreeTransformer;
^
lib\CollapseUnaryTransformer.java:6: error: package edu.stanford.nlp.util does not exist
import edu.stanford.nlp.util.Generics;

...
What can I do?

How can I get parses in the same format for sentences in German?

Hi,

I am trying to use this model to parse sentences in German with the dependency parser used in this code.

So, the DependencyParse.java file has the following lines:

public static final String TAGGER_MODEL = "stanford-tagger/models/english-left3words-distsim.tagger";
public static final String PARSER_MODEL = "edu/stanford/nlp/models/nndep/english_SD.gz";

Is it enough to change those lines in order to specify a German tagger and parser?

Thanks in advance for any help,

Why zero out embeddings for special words if they are absent in vocab

Hi,

I noticed that in main.py, you zero out the embeddings for special words if they are absent from the vocabulary:

# zero out the embeddings for padding and other special words if they are absent in vocab
for idx, item in enumerate([Constants.PAD_WORD, Constants.UNK_WORD, Constants.BOS_WORD, Constants.EOS_WORD]):
    emb[idx].zero_()

Is there any reason for doing so? Why not use random normal vectors?
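
For illustration, a minimal sketch of the alternative being asked about, with hypothetical sizes rather than the repo's actual code:

    import torch

    # toy stand-ins for the vocabulary size and embedding dimension used in main.py
    vocab_size, emb_dim = 2412, 300
    emb = torch.zeros(vocab_size, emb_dim)   # pretrained GloVe vectors would be copied in here

    # the alternative asked about above: fill the special-word rows (PAD/UNK/BOS/EOS,
    # conventionally indices 0-3) with small random normal values instead of zeros
    for idx in range(4):
        emb[idx].normal_(mean=0.0, std=0.05)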

Thanks.

Two differences from the original implementation

I got the same result as you, ~0.846 Pearson score. After checking the original implementation, I found two differences.

  • In your trainer.py file,
def train(self, dataset):
        self.model.train()
        self.optimizer.zero_grad()
        loss, k = 0.0, 0
        indices = torch.randperm(len(dataset))
        for idx in tqdm(range(len(dataset)),desc='Training epoch '+str(self.epoch+1)+''):
            ltree,lsent,rtree,rsent,label = dataset[indices[idx]]
            linput, rinput = Var(lsent), Var(rsent)
            target = Var(map_label_to_target(label,dataset.num_classes))
            if self.args.cuda:
                linput, rinput = linput.cuda(), rinput.cuda()
                target = target.cuda()
            output = self.model(ltree,linput,rtree,rinput)
            err = self.criterion(output, target)
            loss += err.data[0]
            err.backward()           # <------------
            k += 1
            if k%self.args.batchsize==0:
                self.optimizer.step()
                self.optimizer.zero_grad()
        self.epoch += 1
        return loss/len(dataset)

You call .backward() for each sample in the mini-batch and then perform one update step with self.optimizer.step(). Since backward() accumulates gradients automatically, it seems you need to average both the losses and the gradients over the mini-batch. So I think the arrowed line above should be changed to

(err/self.args.batchsize).backward()
  • The original implementation does not really update the embeddings. It does not include the embedding parameters in the model; all of the model's parameters are optimized with Adagrad, while the embedding parameters are updated directly with gradient * learning_rate, but that learning rate is set to 0 (a sketch follows this list).
    Furthermore, I did some simple calculations: there are more than 700,000 embedding parameters versus 286,505 other model parameters. Considering that the training set has only 4,500 examples, it is too small to fine-tune the embeddings.
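
A minimal sketch of reproducing that behaviour, using a toy model; the real SimilarityTreeLSTM's attribute names may differ:

    import torch.nn as nn
    import torch.optim as optim

    # toy stand-in for the model; only the embedding attribute matters for this sketch
    class ToyModel(nn.Module):
        def __init__(self, vocab_size=2412, in_dim=300, mem_dim=150):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, in_dim)
            self.proj = nn.Linear(in_dim, mem_dim)

    model = ToyModel()

    # mirror the original implementation: keep the embedding fixed (embedding learning rate 0)
    model.emb.weight.requires_grad_(False)

    # hand only the remaining trainable parameters to Adagrad
    optimizer = optim.Adagrad([p for p in model.parameters() if p.requires_grad],
                              lr=0.05, weight_decay=1e-4)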

After making the two modifications above, I get a 0.854 Pearson score and 0.274 MSE with Adagrad (learning_rate=0.05).

Docker image is broken!

OS: macOS Mojave
Docker Edition: Version 18.03.1-ce-mac65 (24312)
Channel: stable

I tried to build the Docker image in order to run the lib without depending on my Mac setup. The image build is actually broken because the links and procedures for fetching dependencies have not been updated:

Step 10/11 : RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]
 ---> Running in 90832d3e48fe
torch-0.4.0-cp36-cp36m-linux_x86_64.whl is not a supported wheel on this platform.
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/bash -c pip install -r requirements.txt' returned a non-zero code: 1

I fixed the issue by changing the Docker build file:
1- Remove this line:
RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]

2- Run the container in the interactive mode:
docker run -it [IMAGE-NAME]

3- Install Python 3.5, pip3, and their dependencies manually, then run main.py.

This is how I did it to make sure I was installing the right things; however, the best solution would be to make all of these changes at the image-build level.

Trying to understand cparents.txt in Constituency parsing

I have downloaded the SICK data and obtained the dependency and constituency parses with the fetch_and_preprocess.sh file.

I am now trying to understand what information is generated in the cparents.txt file.
This is an example:

a.txt -> Two dogs are fighting
a.cparents.txt -> 5 5 7 7 6 0 6

If I am not mistaken, I should be able to build the parse tree from cparents.txt. Is that right? And what would the tree for this example look like?
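
If the file follows the usual parent-pointer convention (entry k is the 1-indexed parent of node k, and 0 marks the root), then the tree can be rebuilt from it. A hypothetical helper, not the repo's own reader, to illustrate:

    # rebuild a tree from a space-separated parent list such as "5 5 7 7 6 0 6"
    def build_tree(parent_line):
        parents = [int(p) for p in parent_line.split()]
        children = {i: [] for i in range(1, len(parents) + 1)}   # nodes are 1-indexed
        root = None
        for node, parent in enumerate(parents, start=1):
            if parent == 0:
                root = node
            else:
                children[parent].append(node)
        return root, children

    root, children = build_tree("5 5 7 7 6 0 6")
    print(root)         # 6  (the root node)
    print(children[6])  # [5, 7]
    print(children[5])  # [1, 2]  -> leaves covering the tokens "Two" and "dogs"

Under this reading, nodes 1-4 are the leaves for the four tokens, node 5 spans "Two dogs", node 7 spans "are fighting", and node 6 is the root.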

Thanks for any help in advance

map_label_to_target should initialize a zero tensor

Your map_label_to_target for the SICK dataset initializes an uninitialized (effectively random) tensor.

def map_label_to_target(label,num_classes):
    target = torch.Tensor(1,num_classes) # this is not a zero tensor
    ceil = int(math.ceil(label))
    floor = int(math.floor(label))
    if ceil==floor:
        target[0][floor-1] = 1
    else:
        target[0][floor-1] = ceil - label
        target[0][ceil-1] = label - floor
    return target

However, in the original treelstm (Lua), the author initializes a zero tensor:

local targets = torch.zeros(batch_size, self.num_classes)
for j = 1, batch_size do
  local sim = dataset.labels[indices[i + j - 1]] * (self.num_classes - 1) + 1
  local ceil, floor = math.ceil(sim), math.floor(sim)
  if ceil == floor then
    targets[{j, floor}] = 1
  else
    targets[{j, floor}] = ceil - sim
    targets[{j, ceil}] = sim - floor
  end
end
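
A corrected Python version along those lines might look like this (a sketch, not the repo's actual fix):

    import math
    import torch

    def map_label_to_target(label, num_classes):
        # start from an all-zero tensor, as in the original Lua implementation
        target = torch.zeros(1, num_classes)
        ceil, floor = int(math.ceil(label)), int(math.floor(label))
        if ceil == floor:
            target[0][floor - 1] = 1
        else:
            target[0][floor - 1] = ceil - label
            target[0][ceil - 1] = label - floor
        return target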

ChildSumTreeLSTM: fx and fh linear layers are declared but not used

Lines 21, 22:

self.fx = nn.Linear(self.in_dim,self.mem_dim)
self.fh = nn.Linear(self.mem_dim,self.mem_dim)

But they are never used.

I think you intended to use them in lines 38 and 39
(perhaps a typo of ix for fx):

fx = F.torch.unsqueeze(self.ix(inputs),1)
f = F.torch.cat([self.ih(child_hi)+fx for child_hi in child_h], 0)
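
For reference, a sketch of how the per-child forget gates of the Child-Sum TreeLSTM (Tai et al.) could be computed with those two layers; this illustrates the equation f_k = sigmoid(W_f x + U_f h_k) and is not necessarily the author's intended code:

    import torch
    import torch.nn as nn

    in_dim, mem_dim, num_children = 300, 150, 3      # hypothetical sizes
    fx = nn.Linear(in_dim, mem_dim)                  # W_f x   (input -> forget gate)
    fh = nn.Linear(mem_dim, mem_dim)                 # U_f h_k (child state -> forget gate)

    inputs = torch.randn(1, in_dim)                  # current node's input x
    child_h = torch.randn(num_children, mem_dim)     # children's hidden states
    child_c = torch.randn(num_children, mem_dim)     # children's memory cells

    # one forget gate per child; fx(inputs) broadcasts over the child dimension
    f = torch.sigmoid(fh(child_h) + fx(inputs))
    c_from_children = torch.sum(f * child_c, dim=0, keepdim=True)   # gated sum of child cells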

Error while Compiling

Ubuntu 18.04 Java 11.0.3
running (as part of fetch_and_preprocess.sh)
javac -cp $CLASSPATH lib/*.java -Xlint:unchecked

lib/CollapseUnaryTransformer.java:17: error: error while writing CollapseUnaryTransformer: /home/eduard_ergenzinger/treelstm.pytorch/lib/CollapseUnaryTransformer.class
public class CollapseUnaryTransformer implements TreeTransformer {
       ^
lib/ConstituencyParse.java:58: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                     ^
  where T is a type-variable:
    T extends HasWord declared in class PTBTokenizer
lib/ConstituencyParse.java:58: warning: [unchecked] unchecked conversion
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                     ^
  required: PTBTokenizer<Word>
  found:    PTBTokenizer
lib/DependencyParse.java:57: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
        PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                       ^
  where T is a type-variable:
    T extends HasWord declared in class PTBTokenizer
lib/DependencyParse.java:57: warning: [unchecked] unchecked conversion
        PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                       ^
  required: PTBTokenizer<Word>
  found:    PTBTokenizer
1 error
4 warnings

How to make it work with dynamic batching?

This implementation can only process one sample at a time, so performance is limited because GPU utilization is low. Is there a possibility of making treelstm support dynamic batching so that the GPU can be fully utilized?

Why move output to cpu?

I noticed that in the test function of the trainer module you write:

output = output.data.squeeze().cpu()

Why do you move the output to the CPU?

By the way, in SimilarityTreeLSTM in the model module:
output = self.similarity(lstate, rstate)

Why not use
output = self.similarity(lhidden, rhidden)

Nodes' hidden representations?

Hello, not an issue, but what's the easiest way to extract the learned hidden embeddings for each node in a ChildSum tree? New to PyTorch, so forgive my ignorance.

Thanks!

Checkpoint saving may not be appropriate.

In your code:

        if best < test_pearson:
            best = test_pearson
            checkpoint = {
                'model': trainer.model.state_dict(), 
                'optim': trainer.optimizer,
                'pearson': test_pearson, 'mse': test_mse,
                'args': args, 'epoch': epoch
                }
            logger.debug('==> New optimum found, checkpointing everything now...')
            torch.save(checkpoint, '%s.pt' % os.path.join(args.save, args.expname))

test_pearson is used instead of dev_pearson, but test_pearson should not be used to choose the best model.
I got a test result of Pearson 0.8616 / MSE 0.2626 from the model with the highest dev_pearson score.
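
A sketch of the suggested change, where dev_pearson is assumed to be computed on the dev split each epoch (the variable names here are illustrative, not the repo's):

        # fragment of the epoch loop in main.py; dev_pearson / best_dev are assumed
        # to be computed above by evaluating on the dev split
        if best_dev < dev_pearson:
            best_dev = dev_pearson
            checkpoint = {
                'model': trainer.model.state_dict(),
                'optim': trainer.optimizer.state_dict(),
                'dev_pearson': dev_pearson,
                'test_pearson': test_pearson, 'test_mse': test_mse,
                'args': args, 'epoch': epoch
                }
            logger.debug('==> New optimum on dev found, checkpointing everything now...')
            torch.save(checkpoint, '%s.pt' % os.path.join(args.save, args.expname))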

IndexError: index 54 is out of bounds for dimension 0 with size 54

tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)

len(inputs) == 54
tree.idx == 54

tree.idx = idx - 1

More information:

inputs[tree.idx] tensor([[ 3.7410e-02,  5.7619e-02,  3.3822e-01,  ..., -3.5774e-02,
         -7.8579e-02,  1.0644e-02],
        [-2.5287e-02, -2.5835e-01, -7.5715e-02,  ...,  1.2864e-01,
          1.3856e-01,  3.3581e-01],
        [-5.4430e-02, -1.6442e-01, -6.7605e-02,  ...,  1.7388e-01,
         -3.9886e-01, -1.3006e-02],
        ...,
        [-2.5433e-02, -8.0709e-02,  6.2163e-01,  ...,  2.7345e-01,
         -5.6782e-02,  1.8956e-01],
        [-2.4587e-01,  8.9087e-03, -1.5240e-03,  ..., -3.2474e-01,
          1.1630e-02, -1.3252e-01],
        [ 4.9405e-04, -3.5795e-01, -2.2226e-01,  ..., -9.1428e-02,
          2.2649e-01, -2.0806e-01]], device='cuda:0',
       grad_fn=<EmbeddingBackward>)
Traceback (most recent call last):
  File "main.py", line 185, in <module>
    main()
  File "main.py", line 155, in main
    train_loss = trainer.train(train_dataset)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/trainer.py", line 29, in train
    output = self.model(linput, rtree, rinput)
  File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 90, in forward
    rstate, rhidden = self.childsumtreelstm(rtree, rinputs)
  File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  [Previous line repeated 10 more times]
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 48, in forward
    tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
IndexError: index 54 is out of bounds for dimension 0 with size 54

classpath error

My current environment:
Windows 10
Python 3.6
PyTorch 0.4
IDE: PyCharm

I try to run preprocess-sick.py and get an error that the class cannot be found or loaded. I then copied the java command into a Windows cmd window, and the same error was raised. The relevant code:

    cmd = ('java -cp %s DependencyParse -tokpath %s -parentpath %s -relpath %s %s < %s'
           % (cp, tokpath, parentpath, relpath, tokenize_flag, filepath))
    os.system(cmd)
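
One thing worth checking on Windows: the Java classpath separator is ';' there but ':' on Linux/macOS, so a classpath joined with ':' makes java report that it cannot find or load the main class. A small, hedged sketch of building the classpath portably (the entries below are placeholders, not the script's real paths):

    import os

    # placeholder classpath entries; substitute the jar paths actually used by the script
    classpath_entries = ['lib', os.path.join('lib', 'stanford-parser.jar')]

    # os.pathsep is ';' on Windows and ':' on POSIX, matching what `java -cp` expects
    cp = os.pathsep.join(classpath_entries)
    print(cp)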

Any plans to support constituency trees?

Constituency trees showed slightly lower performance in the paper, but there are surely people, myself included, who want to use them in the belief that phrase structures are appropriate for different purposes :)

download

I can't access nlp.stanford.edu. Could you send me a copy? Thank you.

Matrix problem

  File ".../treelstm.pytorch/model.py", line 36, in node_forward
    u = F.tanh(self.ux(inputs)+self.uh(child_h_sum))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear.apply(input, self.weight, self.bias)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 12, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: matrix and matrix expected at 

I think you may have missed an unsqueeze operation?

        i = F.sigmoid(self.ix(inputs)+self.ih(child_h_sum.unsqueeze(0)))
        o = F.sigmoid(self.ox(inputs)+self.oh(child_h_sum.unsqueeze(0)))
        u = F.tanh(self.ux(inputs)+self.uh(child_h_sum.unsqueeze(0)))

Sizes do not match

When I run python main.py, I get the following error message:

Namespace(batchsize=25, cuda=True, data='data/sick/', epochs=15, expname='test',
glove='data/glove/', hidden_dim=50, input_dim=150, lr=0.01, mem_dim=75, num_classes=5, optim='adagrad', save='checkpoints/', seed=123, sparse=False, wd=0.0001)
==> SICK vocabulary size : 2412
==> Size of train data : 4500
==> Size of dev data : 500
==> Size of test data : 4927
Traceback (most recent call last):
  File "main.py", line 157, in <module>
    main()
  File "main.py", line 126, in main
    model.childsumtreelstm.emb.state_dict()['weight'].copy_(emb)
RuntimeError: sizes do not match at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THC/THCTensorCopy.cu:31

The platform is Arch Linux with CUDA 8.0.

I would appreciate it for any reply.

How to run it on GPU?

I ran pip install -r req....

but couldn't run it with python main.py --cuda.

The traceback ends with:
AssertionError: Torch not compiled with CUDA enabled

lr is different from the original paper

I noticed that the lr in the code is 0.01, while the paper uses 0.05 with Adagrad. I tried training the model with 0.05, but the training loss doesn't decrease at all; maybe that is due to the high lr?

Why did you set lr to 0.01? And since the lr differs, maybe there is a bug in the code?
