dasguptar / treelstm.pytorch
Tree LSTM implementation in PyTorch
License: MIT License
It seems batch size is still not supported in the code? In the forward function of ChildSumTreeLSTM, it looks like only a single tree can be processed per forward pass.
```python
def forward(self, tree, inputs):
    for idx in range(tree.num_children):
        self.forward(tree.children[idx], inputs)
    if tree.num_children == 0:
        child_c = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
        child_h = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
    else:
        child_c, child_h = zip(*map(lambda x: x.state, tree.children))
        child_c, child_h = torch.cat(child_c, dim=0), torch.cat(child_h, dim=0)
    tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
    return tree.state
```
I ran the sentiment model successfully. My GPUs are two 1080 Tis, and I see about 14% utilization on GPU 0. Is there a way to run it on multiple GPUs? I implemented a model in TensorFlow Fold, but it seems that it can't support multi-GPU either.
lib\CollapseUnaryTransformer.java:3: error: package edu.stanford.nlp.ling does not exist
import edu.stanford.nlp.ling.Label;
                            ^
lib\CollapseUnaryTransformer.java:4: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.Tree;
                             ^
lib\CollapseUnaryTransformer.java:5: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.TreeTransformer;
                             ^
lib\CollapseUnaryTransformer.java:6: error: package edu.stanford.nlp.util does not exist
import edu.stanford.nlp.util.Generics;
...
What can I do?
Just copy the 'lib' dir from https://github.com/stanfordnlp/treelstm into this repo's main directory.
Hi,
I am trying to use this model to parse sentences in German with the dependency parser that is used in this code.
The DependencyParse.java file has the following lines:
public static final String TAGGER_MODEL = "stanford-tagger/models/english-left3words-distsim.tagger";
public static final String PARSER_MODEL = "edu/stanford/nlp/models/nndep/english_SD.gz";
Is it enough to change those lines in order to specify a German tagger and parser?
Thanks in advance for any help,
Hi,
I noticed that in main.py, you zero out the embeddings for padding and other special words if they are absent in the vocabulary:
# zero out the embeddings for padding and other special words if they are absent in vocab
for idx, item in enumerate([Constants.PAD_WORD, Constants.UNK_WORD, Constants.BOS_WORD, Constants.EOS_WORD]):
    emb[idx].zero_()
Is there any reason for doing so? Why not use random normal vectors?
Thanks.
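For concreteness, here is a minimal sketch of the alternative the question suggests: filling those special-word rows with small random-normal values instead of zeros. It reuses the names from the loop quoted above, and the 0.05 standard deviation is an assumption rather than anything from the repo.

```python
# Sketch only: random-normal init for the special-word rows; the repo itself
# zeroes these rows as shown above. The std of 0.05 is an assumed value.
for idx, item in enumerate([Constants.PAD_WORD, Constants.UNK_WORD,
                            Constants.BOS_WORD, Constants.EOS_WORD]):
    emb[idx].normal_(0, 0.05)
```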
I got the same result as you, ~0.846 Pearson score. After checking the original implementation, I found two differences.
def train(self, dataset):
    self.model.train()
    self.optimizer.zero_grad()
    loss, k = 0.0, 0
    indices = torch.randperm(len(dataset))
    for idx in tqdm(range(len(dataset)), desc='Training epoch ' + str(self.epoch + 1) + ''):
        ltree, lsent, rtree, rsent, label = dataset[indices[idx]]
        linput, rinput = Var(lsent), Var(rsent)
        target = Var(map_label_to_target(label, dataset.num_classes))
        if self.args.cuda:
            linput, rinput = linput.cuda(), rinput.cuda()
            target = target.cuda()
        output = self.model(ltree, linput, rtree, rinput)
        err = self.criterion(output, target)
        loss += err.data[0]
        err.backward()  # <------------
        k += 1
        if k % self.args.batchsize == 0:
            self.optimizer.step()
            self.optimizer.zero_grad()
    self.epoch += 1
    return loss / len(dataset)
You call .backward() for each sample in the mini-batch, and then perform one update step with self.optimizer.step(). Since backward() accumulates gradients automatically, it seems you need to average both the losses and the gradients over the mini-batch. So I think the arrow line above should be changed to
(err/self.args.batchsize).backward()
0.700000, and 286505 for the other model parameters. Considering that the size of the training set is just 4500, it is too small to fine-tune the embeddings. After I made the two above modifications, I can get a 0.854 Pearson score and 0.274 MSE with Adagrad (learning_rate=0.05).
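As a concrete illustration of my reading of the two points above (not the author's code), the inner loop would average the loss over the mini-batch before calling backward(), and the embedding layer would be frozen rather than fine-tuned:

```python
# Sketch of the two suggested modifications, using the variable names from the
# quoted train() loop above.
err = self.criterion(output, target)
loss += err.data[0]
(err / self.args.batchsize).backward()   # average the gradients over the mini-batch

# And, when building the model, keep the embeddings fixed instead of fine-tuning
# them (the attribute path below is an assumption about where the embedding lives):
# model.childsumtreelstm.emb.weight.requires_grad = False
```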
OS: macOS Mojave
Docker Edition: Version 18.03.1-ce-mac65 (24312)
Channel: stable
I tried to build the Docker image in order to run the lib without depending on my Mac setup. The image build is actually broken because the links and procedures for fetching dependencies have not been updated:
Step 10/11 : RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]
 ---> Running in 90832d3e48fe
torch-0.4.0-cp36-cp36m-linux_x86_64.whl is not a supported wheel on this platform.
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/bash -c pip install -r requirements.txt' returned a non-zero code: 1
I fixed the issue by changing the Docker build file:
1. Remove this line:
RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]
2. Run the container in interactive mode:
docker run -it [IMAGE-NAME]
3. Install Python 3.5, pip3, and their dependencies manually, then run main.py.
This is how I've done it to make sure I am installing the right things; however, the best solution would be to make all of these changes at the image-build level.
I have downloaded the SICK data and obtained the dependency and constituency parses with the fetch_and_preprocess.sh file.
I am now trying to understand what information is generated in the cparents.txt file.
This is an example:
a.txt -> Two dogs are fighting
a.cparents.txt -> 5 5 7 7 6 0 6
If I am not mistaken, from cparents.txt I should be able to build the parse tree. Is that right? And what would the tree for this example look like?
Thanks for any help in advance
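Not an authoritative answer, but a minimal sketch of how such a parent-pointer line can be turned back into a tree, assuming the convention used by the repo's dataset code: nodes are numbered from 1, each position holds the index of its parent, and a parent of 0 marks the root.

```python
# Sketch only: rebuild a tree from a parents line such as "5 5 7 7 6 0 6".
def build_tree(parent_line):
    parents = list(map(int, parent_line.split()))
    nodes = {i: {'idx': i, 'children': []} for i in range(1, len(parents) + 1)}
    root = None
    for i, p in enumerate(parents, start=1):
        if p == 0:
            root = nodes[i]                      # parent 0 marks the root
        else:
            nodes[p]['children'].append(nodes[i])
    return root

# For "Two dogs are fighting": node 6 is the root, node 5 covers tokens 1-2
# ("Two dogs"), and node 7 covers tokens 3-4 ("are fighting").
root = build_tree("5 5 7 7 6 0 6")
```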
treelstm.pytorch/treelstm/dataset.py
Line 63 in 228a314
Your map_label_to_target for the SICK dataset initializes the target as an uninitialized (random) tensor:
def map_label_to_target(label, num_classes):
    target = torch.Tensor(1, num_classes)  # this is not zero tensor
    ceil = int(math.ceil(label))
    floor = int(math.floor(label))
    if ceil == floor:
        target[0][floor-1] = 1
    else:
        target[0][floor-1] = ceil - label
        target[0][ceil-1] = label - floor
    return target
However, in treelstm, the author initializes a zero tensor:
local targets = torch.zeros(batch_size, self.num_classes)
for j = 1, batch_size do
  local sim = dataset.labels[indices[i + j - 1]] * (self.num_classes - 1) + 1
  local ceil, floor = math.ceil(sim), math.floor(sim)
  if ceil == floor then
    targets[{j, floor}] = 1
  else
    targets[{j, floor}] = ceil - sim
    targets[{j, ceil}] = sim - floor
  end
end
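A minimal sketch of the corresponding fix on the PyTorch side (my adaptation, not a patch from the repo): start from a zero tensor so the entries that are never assigned stay 0 instead of holding garbage values.

```python
import math
import torch

def map_label_to_target(label, num_classes):
    # zero-initialized, unlike torch.Tensor(1, num_classes) which is uninitialized
    target = torch.zeros(1, num_classes)
    ceil, floor = int(math.ceil(label)), int(math.floor(label))
    if ceil == floor:
        target[0][floor - 1] = 1
    else:
        target[0][floor - 1] = ceil - label
        target[0][ceil - 1] = label - floor
    return target
```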
Lines 21, 22:
self.fx = nn.Linear(self.in_dim, self.mem_dim)
self.fh = nn.Linear(self.mem_dim, self.mem_dim)
But they are never used. I think you intended to use them in lines 38, 39 (perhaps a typo of ix for fx):
fx = F.torch.unsqueeze(self.ix(inputs), 1)
f = F.torch.cat([self.ih(child_hi) + fx for child_hi in child_h], 0)
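For clarity, this is the change the report seems to be suggesting (my guess at the intended lines, not a confirmed patch): route the forget-gate computation through the declared fx/fh layers.

```python
# Presumed intent: use the fx/fh projections for the forget gates instead of ix/ih.
fx = F.torch.unsqueeze(self.fx(inputs), 1)
f = F.torch.cat([self.fh(child_hi) + fx for child_hi in child_h], 0)
```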
Ubuntu 18.04 Java 11.0.3
Running (as part of fetch_and_preprocess.sh):
javac -cp $CLASSPATH lib/*.java -Xlint:unchecked
lib/CollapseUnaryTransformer.java:17: error: error while writing CollapseUnaryTransformer: /home/eduard_ergenzinger/treelstm.pytorch/lib/CollapseUnaryTransformer.class
public class CollapseUnaryTransformer implements TreeTransformer {
       ^
lib/ConstituencyParse.java:58: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                     ^
  where T is a type-variable:
    T extends HasWord declared in class PTBTokenizer
lib/ConstituencyParse.java:58: warning: [unchecked] unchecked conversion
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                     ^
  required: PTBTokenizer<Word>
  found:    PTBTokenizer
lib/DependencyParse.java:57: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                     ^
  where T is a type-variable:
    T extends HasWord declared in class PTBTokenizer
lib/DependencyParse.java:57: warning: [unchecked] unchecked conversion
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                     ^
  required: PTBTokenizer<Word>
  found:    PTBTokenizer
1 error
4 warnings
This implementation can only process one sample at a time, so performance is limited because GPU utilization is low. Is there a possibility of making the TreeLSTM support dynamic batching so that the GPU can be fully utilized?
I noticed that in the test function of the trainer module you write
output = output.data.squeeze().cpu()
Why do you move output to the CPU?
By the way, in SimilarityTreeLSTM of the model module you have
output = self.similarity(lstate, rstate)
Why not use
output = self.similarity(lhidden, rhidden)
Hello, not an issue, but what's the easiest way to extract the learned hidden embeddings for each node in a ChildSum tree? New to PyTorch, so forgive my ignorance.
Thanks!
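Not an official API, but given that ChildSumTreeLSTM.forward (quoted earlier) stores each node's (c, h) pair on tree.state, one hedged way to pull out the per-node hidden states after a forward pass is a plain traversal:

```python
# Sketch only: gather the hidden state of every node after the model has been
# run on (tree, inputs); field names follow the snippets quoted in this thread.
def collect_hidden_states(tree, out=None):
    if out is None:
        out = {}
    for idx in range(tree.num_children):
        collect_hidden_states(tree.children[idx], out)
    c, h = tree.state            # set by ChildSumTreeLSTM.forward
    out[tree.idx] = h.detach()   # hidden vector for the node at position tree.idx
    return out
```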
In your code:
if best < test_pearson:
    best = test_pearson
    checkpoint = {
        'model': trainer.model.state_dict(),
        'optim': trainer.optimizer,
        'pearson': test_pearson, 'mse': test_mse,
        'args': args, 'epoch': epoch
    }
    logger.debug('==> New optimum found, checkpointing everything now...')
    torch.save(checkpoint, '%s.pt' % os.path.join(args.save, args.expname))
Here test_pearson is used instead of dev_pearson, but test_pearson should not be used to choose your best model.
I got a test result (Pearson: 0.8616, MSE: 0.2626) from the checkpoint with the highest dev_pearson score.
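As a sketch of the suggested change (names such as dev_pearson are assumptions, mirroring the block quoted above): select the checkpoint on the dev split and only report the test metrics for that checkpoint.

```python
# Sketch only: model selection on the dev split rather than the test split.
if best < dev_pearson:
    best = dev_pearson
    checkpoint = {
        'model': trainer.model.state_dict(),
        'optim': trainer.optimizer,
        'pearson': test_pearson, 'mse': test_mse,
        'args': args, 'epoch': epoch
    }
    logger.debug('==> New optimum found, checkpointing everything now...')
    torch.save(checkpoint, '%s.pt' % os.path.join(args.save, args.expname))
```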
The difference might be because of differences in the way the word embeddings are updated.
Can you be more specific?
tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
len(inputs) == 54
tree.idx == 54
treelstm.pytorch/treelstm/dataset.py
Line 70 in 228a314
More information:
inputs[tree.idx] tensor([[ 3.7410e-02, 5.7619e-02, 3.3822e-01, ..., -3.5774e-02,
-7.8579e-02, 1.0644e-02],
[-2.5287e-02, -2.5835e-01, -7.5715e-02, ..., 1.2864e-01,
1.3856e-01, 3.3581e-01],
[-5.4430e-02, -1.6442e-01, -6.7605e-02, ..., 1.7388e-01,
-3.9886e-01, -1.3006e-02],
...,
[-2.5433e-02, -8.0709e-02, 6.2163e-01, ..., 2.7345e-01,
-5.6782e-02, 1.8956e-01],
[-2.4587e-01, 8.9087e-03, -1.5240e-03, ..., -3.2474e-01,
1.1630e-02, -1.3252e-01],
[ 4.9405e-04, -3.5795e-01, -2.2226e-01, ..., -9.1428e-02,
2.2649e-01, -2.0806e-01]], device='cuda:0',
grad_fn=<EmbeddingBackward>)
Traceback (most recent call last):
File "main.py", line 185, in <module>
main()
File "main.py", line 155, in main
train_loss = trainer.train(train_dataset)
File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/trainer.py", line 29, in train
output = self.model(linput, rtree, rinput)
File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 90, in forward
rstate, rhidden = self.childsumtreelstm(rtree, rinputs)
File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
self.forward(tree.children[idx], inputs)
File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
self.forward(tree.children[idx], inputs)
File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
self.forward(tree.children[idx], inputs)
[Previous line repeated 10 more times]
File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 48, in forward
tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
IndexError: index 54 is out of bounds for dimension 0 with size 54
my current environment:
windows 10
python 3.6
pytorch 0.4
IDE pycharm
I tried to run preprocess-sick.py and got an error: could not find or load the main class.
Then I copied the java command into a Windows cmd window, and the same error was raised.
The relevant lines of code:
cmd = ('java -cp %s DependencyParse -tokpath %s -parentpath %s -relpath %s %s < %s'
% (cp, tokpath, parentpath, relpath, tokenize_flag, filepath))
os.system(cmd)
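One hedged guess at the cause (not confirmed from the report): on Windows, java expects classpath entries separated by ';' rather than ':', so if cp is built by joining paths with ':' the class will not be found. A portable way to build it is with os.pathsep; the classpath entries below are hypothetical placeholders.

```python
import os

# Hypothetical classpath entries; substitute the directories/jars the script actually uses.
entries = ['lib', os.path.join('lib', 'stanford-parser.jar')]
cp = os.pathsep.join(entries)   # ';' on Windows, ':' on Linux/macOS
```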
Constituency trees showed slightly lower performance in the paper, but there are surely people, myself included, who want to use them in the belief that phrase structures are better suited to certain purposes :)
Why can't I access nlp.stanford.edu? Could you send me a copy? Thank you.
File ".../treelstm.pytorch/model.py", line 36, in node_forward
u = F.tanh(self.ux(inputs)+self.uh(child_h_sum))
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 54, in forward
return self._backend.Linear.apply(input, self.weight, self.bias)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 12, in forward
output.addmm_(0, 1, input, weight.t())
RuntimeError: matrix and matrix expected at
I think you may have missed an unsqueeze operation?
i = F.sigmoid(self.ix(inputs)+self.ih(child_h_sum.unsqueeze(0)))
o = F.sigmoid(self.ox(inputs)+self.oh(child_h_sum.unsqueeze(0)))
u = F.tanh(self.ux(inputs)+self.uh(child_h_sum.unsqueeze(0)))
When I run python main.py, I get the following error message:
Namespace(batchsize=25, cuda=True, data='data/sick/', epochs=15, expname='test', glove='data/glove/', hidden_dim=50, input_dim=150, lr=0.01, mem_dim=75, num_classes=5, optim='adagrad', save='checkpoints/', seed=123, sparse=False, wd=0.0001)
==> SICK vocabulary size : 2412
==> Size of train data : 4500
==> Size of dev data : 500
==> Size of test data : 4927
Traceback (most recent call last):
File "main.py", line 157, in
main()
File "main.py", line 126, in main
model.childsumtreelstm.emb.state_dict()['weight'].copy_(emb)
RuntimeError: sizes do not match at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THC/THCTensorCopy.cu:31
The platform is Arch Linux with CUDA 8.0.
I would appreciate any reply.
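A hedged debugging sketch (not a confirmed fix): copy_ requires both tensors to have exactly the same shape, so comparing the cached embedding tensor against the model's embedding weight usually narrows this down. The cache path below is a guess; a stale cache built for a different vocabulary is one plausible cause.

```python
import torch

# Hypothetical path to the cached embedding tensor produced during preprocessing.
emb = torch.load('data/sick/sick_embed.pth')
weight = model.childsumtreelstm.emb.state_dict()['weight']
print(emb.size(), weight.size())   # these must match for copy_ to succeed
```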
Hi, I was going through another implementation of Tree-LSTM from https://github.com/ttpro1995/TreeLSTMSentiment where, for sentiment classification, they use the hidden state values instead of the cell state values. The original paper also used the notation "h" rather than "c" for the vector multiplication and subtraction in section 4.2. My question is: why did you go for "c" instead of "h", and why not both?
When is the upgrade to pytorch 0.2 planned? (#9) Thanks @dasguptar for the amazing work done
I ran pip install -r requirements.txt, but I couldn't run it with python main.py --cuda; the traceback shows:
AssertionError: Torch not compiled with CUDA enabled
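A quick hedged check (generic PyTorch, not specific to this repo): if the snippet below prints False, the installed torch wheel is CPU-only and a CUDA-enabled build is needed before --cuda will work.

```python
import torch

# False means the current torch build has no CUDA support.
print(torch.cuda.is_available())
```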
Hi:
I need help!
IOError: [Errno 2] No such file or directory: 'data/sick/train/a.toks'
Running python main.py with default parameters throws this error.
Where does the file a.toks come from?
I noticed that the lr in the code is 0.01 while the paper uses 0.05 with Adagrad. I tried training the model with 0.05, but the training loss doesn't decrease at all; maybe that is due to the high lr?
Why did you set lr to 0.01? And since the lr is different, maybe there is a bug in the code?