queuq / cglb
License: Other
When the GCN hidden unit size is larger than the input size, TWP will throw the following error:
Traceback (most recent call last):
File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/train.py", line 141, in main
AP, AF, acc_matrix,cls_matrix = main(args, valid=True)
File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/pipeline.py", line 529, in pipeline_multi_class
train_func(train_loader, loss_criterion, tid, args)
File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/./GCGL/Baselines/twp_model.py", line 188, in observe_clsIL
eloss.backward()
File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
The error is caused by the following code snippet:
CGLB/GCGL/Backbones/graphconv.py
Lines 155 to 188 in 793a346
When the hidden unit size is bigger than the input size, the if statement evaluates to false and the else branch is executed. In the else branch, for the first GCN layer, the edge weight is computed without involving the trainable weight, so it cannot be optimized, hence the error.
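To illustrate the failure mode, here is a minimal, hypothetical reproduction (not the actual graphconv.py code): when the quantity that TWP backpropagates through is computed only from the raw input features, it is disconnected from every trainable parameter, so backward() raises exactly this RuntimeError.

import torch

x = torch.randn(4, 8)                              # raw node features, requires_grad=False
weight = torch.randn(8, 16, requires_grad=True)    # trainable layer weight

# If the weight is applied first, downstream quantities track the autograd graph.
h = x @ weight
eloss_ok = h.pow(2).sum()
eloss_ok.backward()                                 # works

# If the "edge weight" is computed from x alone, no trainable tensor is involved.
eloss_bad = x.pow(2).sum()
# eloss_bad.backward()  # RuntimeError: element 0 of tensors does not require grad
#                       # and does not have a grad_fn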
Hi,
Thanks for open-sourcing the code for CGLB. I was implementing LWF in the task-incremental setting and ran into a problem when trying to add more output units as new tasks arrive. For example, I have four classes in the first task and five classes in the second task, so at the first time step the model has four output units, and at the second time step it should have nine. I tried:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
        self.apply(kaiming_normal_init)  # kaiming_normal_init is defined elsewhere in my code

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        m = torch.nn.Sigmoid()
        return m(x)

    def add_new_outputs(self, num_new_classes):
        # Add new output units for each new class
        in_channels = self.conv2.out_channels
        out_channels = self.conv2.out_channels + num_new_classes
        new_conv2 = GCNConv(in_channels, out_channels)
        # copy the parameters trained for the old tasks into the newly defined layer
        print(self.conv2.weight)
        print(new_conv2.weight)
        new_conv2.weight[:in_channels] = self.conv2.weight
        new_conv2.bias[:in_channels] = self.conv2.bias
        self.conv2 = new_conv2
and got the error message "Object GCNConv has no attribute weight".
May I ask how you copied the weights from the previous model to the new model in LWF? Or do you assume the model knows in advance how many classes the dataset contains in total?
Thanks in advance!
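For reference, one possible way to grow the output layer while copying the old weights is sketched below, assuming a recent torch_geometric version in which GCNConv stores its trainable matrix in conv.lin.weight rather than conv.weight (consistent with the error above); the helper name is illustrative, not part of CGLB.

import torch
from torch_geometric.nn import GCNConv

def expand_output_layer(old_conv: GCNConv, num_new_classes: int) -> GCNConv:
    # Build a wider layer and copy the old rows of the weight and bias into it.
    in_channels, old_out = old_conv.in_channels, old_conv.out_channels
    new_conv = GCNConv(in_channels, old_out + num_new_classes)
    with torch.no_grad():
        new_conv.lin.weight[:old_out] = old_conv.lin.weight   # lin.weight has shape (out, in)
        if old_conv.bias is not None:
            new_conv.bias[:old_out] = old_conv.bias
    return new_conv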
Hi,
I tried using the code to reproduce the results in the paper. I did not change any line of code, yet I could not reproduce the results for joint training and Learning without Forgetting; the values are extremely low. I ran the following commands:
python /scratch1/mengxiwu/CLGL/GCGL/train.py \
    --dataset Aromaticity-CL \
    --method jointtrain \
    --backbone GCN \
    --gpu 0 \
    --clsIL False

python /scratch1/mengxiwu/CLGL/GCGL/train.py \
    --dataset Aromaticity-CL \
    --method lwf \
    --backbone GCN \
    --gpu 0 \
    --clsIL False
Many thanks!
Dear @QueuQ,
I find this work interesting and helpful for continual learning on graphs. According to the paper, the results of ERGNN are quite dominant. However, I cannot reproduce your results with the default args, so I would like to know the set of parameters you used.
Thanks.
My problem has been solved.
Hi @QueuQ,
When reproducing the results on the Reddit dataset, there seems to be an issue with the last task. For the last task, I receive the following error concerning the batch size:
ValueError: Expected input batch_size (1) to match target batch_size (0).
Have you come across this problem? What could be causing this size mismatch? Thanks in advance.
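For what it is worth, errors of this form often come from a trailing, nearly empty mini-batch; whether that is the cause here is only an assumption, but a common workaround is to drop the last incomplete batch, as in this small sketch:

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(10, 3), torch.randint(0, 2, (10,)))
# drop_last=True discards the trailing incomplete batch (here, the final 2 samples).
loader = DataLoader(ds, batch_size=4, drop_last=True)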
When edges between tasks are considered, why is the input during training the intersection of the training IDs (train_ids) of all previous tasks' nodes and the current task's nodes? The training subgraph is then the subgraph formed by all previous tasks rather than the subgraph of the current task alone.
Hi,
I used the latest code, but SIDER-tIL's accuracy is only 0.61, not 0.68 as reported. I did not change a single line of code. Could you check this?
Many thanks!
Hello,
I've come across another CGL benchmark, BeGin (https://arxiv.org/pdf/2211.14568.pdf). In its Table 5.c the performance on Aromaticity-CL is reported as 0.286, while in CGLB it is around 78%. I would expect the results to be more similar.
Kr
In the appendix, I found the following grid search for hyperparameters, which I think is also used for the GCGL tasks.
However, the GCGL datasets contain fewer than 1000 samples per task. This means that when n_memory is set to 1000 during the hyperparameter search (which I assume gives the best performance), GEM is effectively storing the entire history of data for regularization. That does not seem fair, as it defeats the purpose of continual learning; or perhaps you used different parameters for GCGL? I would like to know the exact hyperparameters used for the evaluation of GEM on GCGL, if possible.
Thank you
Wei
tmask = np.random.choice(self.mask, self.n_memories, replace = False)
tmask = np.array(tmask)
self.memory_data.append(tmask)
old_task_loss = loss[self.memory_data[old_task_i],old_task_i].mean()
However, the loss has shape (batch_size, 1). Why does indexing with self.memory_data[old_task_i] not go out of bounds?
At line 470 of CGLB/GCGL/pipeline.py, an error occurs due to the wrong form of the inputs:
for epoch in range(epochs):
    # Train
    if args['method'] == 'lwf':
        train_func(train_loader, loss_criterion, tid, args, prev_model)
    elif args['method'] == 'jointtrain':
        train_func(train_loader, loss_criterion, tid, args, train_loader_joint)
    else:
        train_func(train_loader, loss_criterion, tid, args)
Should the code be updated as below?
[1] The function observe_tskIL_multicls in GCGL/Baselines/jointtrain_model.py:
from: def observe_tskIL_multicls(self, data_loader, loss_criterion, task_i, args, train_loader_joint):
to: def observe_tskIL_multicls(self, train_loader_joint, loss_criterion, task_i, args):
And
[2] In pipeline.py:
from: train_func(train_loader, loss_criterion, tid, args, train_loader_joint)
to: train_func(train_loader_joint, loss_criterion, tid, args)
Hello!
Your code removes categories with fewer than 2 samples, which leaves non-contiguous integer labels. However, it seems the categories are not relabeled during the subsequent task partitioning. Could this cause any errors?
Thank you!
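For clarity, the relabeling being asked about would look roughly like the hypothetical snippet below, mapping the non-contiguous labels that remain after filtering back to contiguous integers:

import numpy as np

labels = np.array([0, 3, 3, 7, 0, 7])             # labels left after removing rare classes
classes = np.unique(labels)                        # [0, 3, 7]
remap = {c: i for i, c in enumerate(classes)}      # {0: 0, 3: 1, 7: 2}
relabeled = np.array([remap[c] for c in labels])   # [0, 1, 1, 2, 0, 2]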
I encountered the following error while implementing GEM on Tox21-tIL in the task-incremental setting:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
The parameters of the model were NaN, so the logits were also NaN. This error does not occur in other settings and datasets, only with Tox21.
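As a side note, a tiny helper like the following (hypothetical, not part of CGLB) can help localize where the NaNs first appear:

import torch

def report_nan_params(model: torch.nn.Module) -> None:
    # Print the name of every parameter that currently contains NaN values.
    for name, p in model.named_parameters():
        if torch.isnan(p).any():
            print(f"NaN detected in parameter: {name}")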
Line 10 in 3e0debf
The function that sets the random seed for GCGL is never executed, which hinders the reproducibility of the results.
Thanks for your effort to keep improving this benchmark!
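For reference, a typical seed-setting helper looks like the sketch below (names are illustrative, not the exact CGLB function); the point of the report is simply that the corresponding call is never reached.

import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Seed every source of randomness used by a typical PyTorch pipeline.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)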
I keep running into errors from GEM on Tox21 in GCGL, as below:
....
EarlyStopping counter: 41 out of 10
error constraints are inconsistent, no solution
Traceback (most recent call last):
File "train.py", line 118, in
AP, AF, acc_matrix = main(args,valid=True)
File "/home/python/CGLB/GCGL/pipeline_org.py", line 375, in pipeline_multi_label
life_model_ins.observe(train_loader, loss_criterion, tid, args)
File "/home/python/CGLB/GCGL/Baselines/gem_model.py", line 122, in observe
self.grads.index_select(1, indx), self.margin)
File "/home/python/CGLB/GCGL/Baselines/gem_utils.py", line 66, in project2cone2
v = quadprog.solve_qp(P, q, G, h)[0]
File "quadprog/quadprog.pyx", line 102, in quadprog.solve_qp
ValueError: constraints are inconsistent, no solution
Since the error occurred during quadratic programming, it might not be caused by the original code.
However, I have never obtained a result, only this error, when running GEM on Tox21.
Have you ever faced the same or a similar error?
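For context, the projection step in GEM follows the reference implementation of Lopez-Paz & Ranzato, sketched below (CGLB's project2cone2 may differ in detail). If the gradients passed in contain NaNs, or the Gram matrix P is ill-conditioned, quadprog can fail with exactly this "constraints are inconsistent" error; one commonly suggested workaround is to increase the small diagonal regularizer eps.

import numpy as np
import quadprog

def project2cone2(gradient, memories, margin=0.5, eps=1e-3):
    # gradient: (p,) current-task gradient; memories: (t, p) past-task gradients,
    # both float64 numpy arrays. Solves the small dual QP from the GEM paper and
    # returns the projected gradient.
    t = memories.shape[0]
    P = memories @ memories.T
    P = 0.5 * (P + P.T) + np.eye(t) * eps   # symmetrize; larger eps = better conditioned
    q = -(memories @ gradient)
    G = np.eye(t)
    h = np.zeros(t) + margin
    v = quadprog.solve_qp(P, q, G, h)[0]
    return memories.T @ v + gradient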
On lines 95-98, the cls_balance parameter should be set instead of the mask parameter. Once the mask parameter is set, the samples used for testing are selected.
As seen in the following code fragment, at line 212, optimizer.step() is outside the for loop that iterates over the current task.
CGLB/GCGL/Baselines/gem_model.py
Lines 178 to 212 in 3e0debf
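To make the concern concrete, here is a minimal, hypothetical illustration of the pattern (not the CGLB code): with zero_grad() and backward() inside the loop but step() after it, only the gradients from the final iteration are ever applied.

import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [torch.randn(8, 4) for _ in range(5)]

for x in batches:
    optimizer.zero_grad()
    loss = model(x).pow(2).mean()
    loss.backward()
optimizer.step()   # outside the loop: only the last batch's gradients update the model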
I am interested in this benchmark. When looking through the code, there seems to be a bug in GCGL/Baselines/mas_model.py:
CGLB/GCGL/Baselines/mas_model.py
Lines 196 to 197 in 966ba0e
In the following snippet, the model is first trained for a user-defined number of epochs and then overwritten by model = pickle.load(open(save_model_path,'rb')).cuda(args['gpu'])
when valid == False (which is the case in the test phase). Does the training function still serve a purpose, or is it redundant?
Lines 370 to 389 in 6d71034
Hi,
I ran the code on an A40 GPU, using the hyperparameters from the supplementary materials.
EWC on CoraFull gives 4.4±0.6, which is far from the 15.02 reported in the paper.
Thanks!
Issue: the performance matrix visualizations for the GCGL tasks are all black, despite the accuracy being high.
Reason: the accuracy matrix for GCGL is not multiplied by 100, which is done for the NCGL tasks.
Line 479 in 6d71034
Line 527 in 6d71034
acc_matrices.append(acc_matrix_test*100)
Line 153 in 6d71034
In the observe_class_IL_batch function of ergnn_model.py, while sampling the subgraph corresponding to task id t > 0, the code seems to sample the subgraph from the entire dataset. Therefore, this code seems to take an extra advantage in the class-incremental setting without inter-task edges in pipeline_class_IL_no_inter_edge_minibatch.
Can you please clarify our concern?