
cglb's Issues

Does the observe_class_IL_batch function consider inter-task edges in pipeline_class_IL_no_inter_edge_minibatch?

In the observe_class_IL_batch function of ergnn_model.py, when sampling the subgraph for a task id t > 0, the code seems to sample the subgraph from the entire dataset. However,

  1. The entire graph should not be available for any task ID during training.
  2. The sampled subgraph may include inter-task edges between different task IDs that already exist in the dataset.

Therefore, this code seems to gain an unfair advantage in the class-incremental setting without inter-task edges in pipeline_class_IL_no_inter_edge_minibatch.

Could you please clarify this concern?
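
To illustrate, here is a minimal sketch of the behaviour I would expect (my own illustration in DGL, not the repository's code): the sampling pool for task t should be restricted to the nodes of tasks 0..t, so that edges to later tasks are dropped.

import torch
import dgl

# Hypothetical task split of a toy graph; only tasks 0..t should be visible.
g = dgl.rand_graph(100, 400)
task_node_ids = [torch.arange(0, 40), torch.arange(40, 70), torch.arange(70, 100)]
t = 1
seen_ids = torch.cat(task_node_ids[:t + 1])   # nodes available up to task t
subg = dgl.node_subgraph(g, seen_ids)         # edges to unseen tasks are dropped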

Non-contiguous categories

Hello!

Your code removes categories with fewer than 2 samples, which results in non-contiguous integer labels. However, it seems the categories are not re-indexed during the subsequent task partitioning. Could this cause any errors?

Thank you!
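
For reference, a minimal sketch of the re-indexing I have in mind (using numpy; the label values are made up):

import numpy as np

# After removing rare classes the remaining labels are non-contiguous.
labels = np.array([0, 3, 3, 7, 0, 7])
# Map them to consecutive integers 0..K-1 before task partitioning.
classes, remapped = np.unique(labels, return_inverse=True)
print(classes)    # [0 3 7]
print(remapped)   # [0 1 1 2 0 2]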

Reddit Dataset Batch Size Issue

Hi @QueuQ,

When reproducing the results with the Reddit dataset, there seems to be an issue with the last task. For the last task, I receive the following error concerning the batch size.

ValueError: Expected input batch_size (1) to match target batch_size (0).

Have you come across this problem? What could be the cause of this size mismatch? Thanks in advance.

ValueError from GEM on Tox21

Thanks for your effort to keep improving this benchmark!

I keep running into the following error from GEM on Tox21 in GCGL:

....
EarlyStopping counter: 41 out of 10

error constraints are inconsistent, no solution
Traceback (most recent call last):
File "train.py", line 118, in
AP, AF, acc_matrix = main(args,valid=True)
File "/home/python/CGLB/GCGL/pipeline_org.py", line 375, in pipeline_multi_label
life_model_ins.observe(train_loader, loss_criterion, tid, args)
File "/home/python/CGLB/GCGL/Baselines/gem_model.py", line 122, in observe
self.grads.index_select(1, indx), self.margin)
File "/home/python/CGLB/GCGL/Baselines/gem_utils.py", line 66, in project2cone2
v = quadprog.solve_qp(P, q, G, h)[0]
File "quadprog/quadprog.pyx", line 102, in quadprog.solve_qp
ValueError: constraints are inconsistent, no solution

Since the error occurred during quadratic programming, it might not be caused by the original code.
However, I have never obtained a result, only this error, whenever I run GEM on Tox21.
Have you ever faced the same or a similar error?
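
For now I work around it with a small retry helper (a rough sketch of my own, not part of the repository; it adds diagonal regularization to P and, if the QP still fails, returns None so the caller can skip the projection and keep the raw gradient):

import numpy as np
import quadprog

def solve_qp_safe(P, q, G, h, jitter=1e-3, max_tries=3):
    # Retry the QP with increasing diagonal regularization when quadprog
    # reports "constraints are inconsistent, no solution".
    for i in range(max_tries):
        try:
            P_reg = P + np.eye(P.shape[0]) * jitter * (10 ** i)
            return quadprog.solve_qp(P_reg, q, G, h)[0]
        except ValueError:
            continue
    return None  # caller can fall back to the unprojected gradient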

add more units to the output layer in LWF

Hi,

Thanks for open sourcing the code for CGLB. I was implementing LWF in the task-incremental setting and ran into a problem when trying to add more units to the output layer when a new task arrives. For example, I have four classes in the first task and five classes in the second task, so at the first time step the model has four output units and at the second time step it should have nine. I tried:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
        self.apply(kaiming_normal_init)  # custom weight init helper (not shown)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        m = torch.nn.Sigmoid()
        return m(x)

    def add_new_outputs(self, num_new_classes):
        # Add new output units for each new class
        in_channels = self.conv2.out_channels
        out_channels = self.conv2.out_channels + num_new_classes
        new_conv2 = GCNConv(in_channels, out_channels)

        # copy the parameters trained on the old tasks into the newly defined layer
        print(self.conv2.weight)
        print(new_conv2.weight)
        new_conv2.weight[:in_channels] = self.conv2.weight
        new_conv2.bias[:in_channels] = self.conv2.bias

        self.conv2 = new_conv2

and got the error message "Object GCNConv has no attribute weight".
May I ask how you copied the weights from the previous model to the new model in LWF? Or do you assume the model knows in advance how many classes the dataset has in total?

Thanks in advance!
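
For reference, a rough sketch of what I am now trying (assuming a recent PyTorch Geometric version, where GCNConv stores its weight in a lin sub-module rather than exposing conv.weight directly; the helper name is my own):

import torch
from torch_geometric.nn import GCNConv

def expand_gcnconv_outputs(old_conv: GCNConv, num_new_classes: int) -> GCNConv:
    # Build a wider output layer and copy the old output rows into it.
    old_out = old_conv.out_channels
    new_conv = GCNConv(old_conv.in_channels, old_out + num_new_classes)
    with torch.no_grad():
        # lin.weight has shape (out_channels, in_channels), so the first
        # old_out rows correspond to the previously learned classes.
        new_conv.lin.weight[:old_out] = old_conv.lin.weight
        if old_conv.bias is not None and new_conv.bias is not None:
            new_conv.bias[:old_out] = old_conv.bias
    return new_conv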

Performance matrix visualization for GCGL is not working correctly

Issue: the performance matrix visualizations for GCGL tasks are all black, even though the accuracies are high.
Reason: the accuracy matrix for GCGL is not multiplied by 100, which is done for NCGL tasks.

AP = round(np.mean(score_matrix[-1, :]), 4)

acc_mean = round(np.mean(acc_mean)*100,2)

Solution:
Multiply the GCGL accuracies by 100 somewhere. I changed the line
acc_matrices.append(acc_matrix_test)
to
acc_matrices.append(acc_matrix_test*100)

Error occurred for jointtrain in GCGL

At line 470 in CGLB/GCGL/pipeline.py, an error occurs because the arguments are passed in the wrong form:

for epoch in range(epochs):
    # Train
    if args['method'] == 'lwf':
        train_func(train_loader, loss_criterion, tid, args, prev_model)
    elif args['method'] == 'jointtrain':
        train_func(train_loader, loss_criterion, tid, args, train_loader_joint)
    else:
        train_func(train_loader, loss_criterion, tid, args)

Should the code be updated as below?

[1]
In the function observe_tskIL_multicls in GCGL/Baselines/jointtrain_model.py, change
def observe_tskIL_multicls(self, data_loader, loss_criterion, task_i, args, train_loader_joint):
to
def observe_tskIL_multicls(self, train_loader_joint, loss_criterion, task_i, args):

And

[2]
In pipeline.py, change
train_func(train_loader, loss_criterion, tid, args, train_loader_joint)
to
train_func(train_loader_joint, loss_criterion, tid, args)

Reproducibility on Joint Training for Graph Classification

Hi,

I tried using the code to reproduce the results in the paper. I did not change a single line of code, but I could not reproduce the results for joint training and learning without forgetting; the values are extremely low. I ran the following commands:

python /scratch1/mengxiwu/CLGL/GCGL/train.py \
    --dataset Aromaticity-CL \
    --method jointtrain \
    --backbone GCN \
    --gpu 0 \
    --clsIL False

python /scratch1/mengxiwu/CLGL/GCGL/train.py \
    --dataset Aromaticity-CL \
    --method lwf \
    --backbone GCN \
    --gpu 0 \
    --clsIL False

Many thanks!

Redundant training in testing phase?

In the following snippet, the model is first trained for a user-defined number of epochs and then overwritten by model = pickle.load(open(save_model_path,'rb')).cuda(args['gpu']) when valid == False (which is the case in the test phase). Does the training loop still serve a purpose, or is it redundant?

CGLB/GCGL/pipeline.py

Lines 370 to 389 in 6d71034

for epoch in range(epochs):
    # Train
    if args['method'] == 'lwf':
        life_model_ins.observe(train_loader, loss_criterion, tid, args, prev_model)
    else:
        life_model_ins.observe(train_loader, loss_criterion, tid, args)
    # Validation and early stop
    val_score = val_func(args, model, val_loader, tid)
    early_stop = stopper.step(val_score, model)
    if early_stop and args['early_stop']:
        print(epoch)
        break
if not args['pre_trained'] and valid and args['early_stop']:
    stopper.load_checkpoint(model)
if not valid:
    model = pickle.load(open(save_model_path,'rb')).cuda(args['gpu'])
score_matrix[tid] = test_func(args, model, test_loader, tid)

error report when implementing GEM on Tox21 datasets

I encountered the following error while running GEM on Tox21-tIL in the task-incremental setting.

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

The parameters of the model were NaN, so the logits were consequently NaN as well. This error does not occur in other settings or datasets, only on Tox21.
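
A small diagnostic I used to confirm this (my own sketch, not part of the repository): check whether any model parameter already contains NaNs before the logits are computed.

import torch

def has_nan_params(model: torch.nn.Module) -> bool:
    # True if any parameter of the model contains at least one NaN entry.
    return any(torch.isnan(p).any().item() for p in model.parameters())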

Question about hyper-parameter search in GCGL

In the appendix, I found the following grid-search for hyper-parameters, which I think is used in GCGL tasks too.

[image: hyper-parameter grid-search ranges from the appendix]

However, the GCGL datasets contain fewer than 1000 samples per task. This means that when n_memory is set to 1000 during the hyper-parameter search, GEM is effectively storing the entire history of data for regularization, which I assume gives the best performance. That does not seem fair, since it defeats the purpose of continual learning. Perhaps you used different parameters for GCGL? If possible, I would like to know the exact hyper-parameters used for the evaluation of GEM on GCGL.


Thank you
Wei

[Reproduce] Hyperparameters

Dear @QueuQ ,
I find this work interesting and helpful for continual learning with graphs. According to the paper, the results of ERGNN are quite dominant. However, I can't reproduce your results with the default args, so I would like to know the set of parameters that you used.
Thanks.

GEM Memory_data

tmask = np.random.choice(self.mask, self.n_memories, replace = False)
tmask = np.array(tmask)
self.memory_data.append(tmask)

old_task_loss = loss[self.memory_data[old_task_i],old_task_i].mean()

However, loss has shape (batch_size, 1). Why doesn't indexing with self.memory_data[old_task_i] go out of range?

GCGL, TWP error when using higher GCN hidden units

When the GCN hidden unit size is larger than the input feature size, TWP throws the following error:

Traceback (most recent call last):
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/train.py", line 141, in main
    AP, AF, acc_matrix,cls_matrix = main(args, valid=True)
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/pipeline.py", line 529, in pipeline_multi_class
    train_func(train_loader, loss_criterion, tid, args)
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/./GCGL/Baselines/twp_model.py", line 188, in observe_clsIL
    eloss.backward()
  File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

because of the following code snippet:

if self._in_feats > self._out_feats:
    # mult W first to reduce the feature size for aggregation.
    if weight is not None:
        feat = th.matmul(feat, weight)
    graph.srcdata['h'] = feat
    #######
    graph.ndata['feat'] = feat
    graph.apply_edges(lambda edges: {'e': th.sum((th.mul(edges.src['h'], th.tanh(edges.dst['h']))), 1)})
    e = self.leaky_relu(graph.edata.pop('e'))
    e_soft = edge_softmax(graph, e)
    graph.ndata.pop('feat')
    #######
    graph.update_all(fn.copy_src(src='h', out='m'),
                     fn.sum(msg='m', out='h'))
    rst = graph.dstdata['h']
else:
    # aggregate first then mult W
    graph.srcdata['h'] = feat
    #######
    graph.ndata['feat'] = feat
    graph.apply_edges(lambda edges: {'e': th.sum((th.mul(edges.src['h'], th.tanh(edges.dst['h']))), 1)})
    e = self.leaky_relu(graph.edata.pop('e'))
    e_soft = edge_softmax(graph, e)
    graph.ndata.pop('feat')
    #######
    graph.update_all(fn.copy_src(src='h', out='m'),
                     fn.sum(msg='m', out='h'))
    rst = graph.dstdata['h']
    if weight is not None:
        rst = th.matmul(rst, weight)

When the hidden unit size is larger than the input size, the if condition is false and the else branch is executed. In the else branch, however, for the first GCN layer the edge attention scores are computed from features that have not been multiplied by the trainable weight, so they do not depend on any trainable parameter, hence the error.

SIDER-tIL Jointtrain Cannot be reproduced

Hi,

I used the latest code, but the accuracy on SIDER-tIL is only 0.61, not 0.68 as reported. I did not change a single line of code. Could you check it?

Many thanks!

Problem in utils.py

On lines 95-98, the cls_balance parameter should be set instead of the mask parameter. After the mask parameter is set, the samples for testing are selected.

Problem in function pipeline_task_IL_inter_edge_minibatch

When inter-task edges are considered, why is the training input the intersection of the training ids (train_ids) of all previous tasks' nodes and the current task's nodes? Shouldn't the training subgraph be the subgraph formed by all previous tasks rather than only the subgraph of the current task?

GEM baseline in class-IL for GCGL has wrong indentation for optimizer step?

As seen in the following code fragment, at line 212 optimizer.step() is outside the for loop that iterates over the current task's minibatches.

for batch_id, batch_data in enumerate(data_loader[task_i]):
    smiles, bg, labels, masks = batch_data
    bg = bg.to(f"cuda:{args['gpu']}")
    labels, masks = labels.cuda(), masks.cuda()
    logits = predict(args, self.net, bg)
    # class balance
    n_per_cls = [(labels == j).sum() for j in clss]
    loss_w_ = [1. / max(i, 1) for i in n_per_cls]
    loss_w_ = torch.tensor(loss_w_).to(device='cuda:{}'.format(args['gpu']))
    # labels= labels.long()
    for i, c in enumerate(clss):
        labels[labels == c] = i
    # Mask non-existing labels
    loss = loss_criterion(logits[:, clss], labels.long(), weight=loss_w_).float()
    self.optimizer.zero_grad()
    loss.backward()
    # check if gradient violates constraints
    if len(self.observed_tasks) > 1:
        # copy gradient
        store_grad(self.net.parameters, self.grads, self.grad_dims, task_i)
        indx = torch.cuda.LongTensor(self.observed_tasks[:-1])
        dotp = torch.mm(self.grads[:, task_i].unsqueeze(0),
                        self.grads.index_select(1, indx))
        if (dotp < 0).sum() != 0:
            project2cone2(self.grads[:, task_i].unsqueeze(1),
                          self.grads.index_select(1, indx), self.margin)
            # copy gradients back
            overwrite_grad(self.net.parameters, self.grads[:, task_i],
                           self.grad_dims)
self.optimizer.step()

while in the same file, at lines 296 and 127, this step is inside the for loop.

Find a bug.

I am interested in this benchmark. While looking through the code, there seems to be a bug in GCGL/Baselines/mas_model:

for f in range(len(new_fisher)):
    self.fisher[f] = (self.fisher[f]*self.n_observed_data + new_fisher[f]+n_new_data)/(self.n_observed_data+n_new_data)

If a moving average is being computed, the + between new_fisher[f] and n_new_data should be a multiplication.
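
A minimal sketch of the presumably intended update, written as a standalone helper (my own reading of the snippet above, not the authors' code):

def update_fisher(fisher, new_fisher, n_observed_data, n_new_data):
    # Weighted running average of the importance values over all data seen so far.
    for f in range(len(new_fisher)):
        fisher[f] = (fisher[f] * n_observed_data
                     + new_fisher[f] * n_new_data) / (n_observed_data + n_new_data)
    return fisher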
