
cglb's Issues

Does the observe_class_IL_batch function consider inter-task edges in pipeline_class_IL_no_inter_edge_minibatch?

In the observe_class_IL_batch function of ergnn_model.py, when sampling the subgraph for a task id t > 0, the code seems to sample the subgraph from the entire dataset. However,

  1. The entire graph should not be available for any task ID during training.
  2. The sampled subgraph may include inter-task edges between different task IDs that already exist in the dataset.

Therefore, this code seems to gain an unfair advantage in the class-incremental setting without inter-task edges in pipeline_class_IL_no_inter_edge_minibatch.

Could you please clarify this concern?
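
To illustrate, here is a minimal sketch of the behaviour I would expect (my own illustration in DGL, not the repository's code): the sampling pool for task t should be restricted to the nodes of tasks 0..t, so that edges to later tasks are dropped.

import torch
import dgl

# Hypothetical task split of a toy graph; only tasks 0..t should be visible.
g = dgl.rand_graph(100, 400)
task_node_ids = [torch.arange(0, 40), torch.arange(40, 70), torch.arange(70, 100)]
t = 1
seen_ids = torch.cat(task_node_ids[:t + 1])   # nodes available up to task t
subg = dgl.node_subgraph(g, seen_ids)         # edges to unseen tasks are dropped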

Non-contiguous categories

Hello!

Your code removes categories with fewer than 2 samples, which results in non-contiguous integer labels. However, it seems the categories are not re-indexed during the subsequent task partitioning. Could this cause any errors?

Thank you!
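
For reference, a minimal sketch of the re-indexing I have in mind (using numpy; the label values are made up):

import numpy as np

# After removing rare classes the remaining labels are non-contiguous.
labels = np.array([0, 3, 3, 7, 0, 7])
# Map them to consecutive integers 0..K-1 before task partitioning.
classes, remapped = np.unique(labels, return_inverse=True)
print(classes)    # [0 3 7]
print(remapped)   # [0 1 1 2 0 2]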

Reddit Dataset Batch Size Issue

Hi @QueuQ,

When reproducing the results with the Reddit dataset, there seems to be an issue with the last task. For the last task, I receive the following error concerning the batch size.

ValueError: Expected input batch_size (1) to match target batch_size (0).

Have you come across this problem? What could be the cause of this size mismatch? Thanks in advance.

ValueError from GEM on Tox21

Thanks for your effort to keep improving this benchmark!

I keep running into the following error from GEM on Tox21 in GCGL:

....
EarlyStopping counter: 41 out of 10

error constraints are inconsistent, no solution
Traceback (most recent call last):
File "train.py", line 118, in
AP, AF, acc_matrix = main(args,valid=True)
File "/home/python/CGLB/GCGL/pipeline_org.py", line 375, in pipeline_multi_label
life_model_ins.observe(train_loader, loss_criterion, tid, args)
File "/home/python/CGLB/GCGL/Baselines/gem_model.py", line 122, in observe
self.grads.index_select(1, indx), self.margin)
File "/home/python/CGLB/GCGL/Baselines/gem_utils.py", line 66, in project2cone2
v = quadprog.solve_qp(P, q, G, h)[0]
File "quadprog/quadprog.pyx", line 102, in quadprog.solve_qp
ValueError: constraints are inconsistent, no solution

Since the error occurred during quadratic programming, it might not be caused by the original code.
However, I have never obtained a result, only this error, whenever I run GEM on Tox21.
Have you ever faced the same or a similar error?
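
For now I work around it with a small retry helper (a rough sketch of my own, not part of the repository; it adds diagonal regularization to P and, if the QP still fails, returns None so the caller can skip the projection and keep the raw gradient):

import numpy as np
import quadprog

def solve_qp_safe(P, q, G, h, jitter=1e-3, max_tries=3):
    # Retry the QP with increasing diagonal regularization when quadprog
    # reports "constraints are inconsistent, no solution".
    for i in range(max_tries):
        try:
            P_reg = P + np.eye(P.shape[0]) * jitter * (10 ** i)
            return quadprog.solve_qp(P_reg, q, G, h)[0]
        except ValueError:
            continue
    return None  # caller can fall back to the unprojected gradient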

add more units to the output layer in LWF

Hi,

Thanks for open sourcing the code for CGLB. I was implementing LWF in the task-incremental setting and ran into a problem when trying to add more units to the output layer when a new task arrives. For example, I have four classes in the first task and five classes in the second task, so at the first time step the model has four output units and at the second time step it should have nine. I tried:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
        self.apply(kaiming_normal_init)  # custom weight init helper (not shown)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        m = torch.nn.Sigmoid()
        return m(x)

    def add_new_outputs(self, num_new_classes):
        # Add new output units for each new class
        in_channels = self.conv2.out_channels
        out_channels = self.conv2.out_channels + num_new_classes
        new_conv2 = GCNConv(in_channels, out_channels)

        # copy the parameters trained on the old tasks into the newly defined layer
        print(self.conv2.weight)
        print(new_conv2.weight)
        new_conv2.weight[:in_channels] = self.conv2.weight
        new_conv2.bias[:in_channels] = self.conv2.bias

        self.conv2 = new_conv2

and got the error message "Object GCNConv has no attribute weight".
May I ask how you copied the weights from the previous model to the new model in LWF? Or do you assume the model knows in advance how many classes the dataset has in total?

Thanks in advance!
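
For reference, a rough sketch of what I am now trying (assuming a recent PyTorch Geometric version, where GCNConv stores its weight in a lin sub-module rather than exposing conv.weight directly; the helper name is my own):

import torch
from torch_geometric.nn import GCNConv

def expand_gcnconv_outputs(old_conv: GCNConv, num_new_classes: int) -> GCNConv:
    # Build a wider output layer and copy the old output rows into it.
    old_out = old_conv.out_channels
    new_conv = GCNConv(old_conv.in_channels, old_out + num_new_classes)
    with torch.no_grad():
        # lin.weight has shape (out_channels, in_channels), so the first
        # old_out rows correspond to the previously learned classes.
        new_conv.lin.weight[:old_out] = old_conv.lin.weight
        if old_conv.bias is not None and new_conv.bias is not None:
            new_conv.bias[:old_out] = old_conv.bias
    return new_conv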

Performance matrix visualization for GCGL is not working correctly

Issue: the performance matrix visualizations for GCGL tasks are all black, even though the accuracies are high.
Reason: the accuracy matrix for GCGL is not multiplied by 100, which is done for NCGL tasks.

AP = round(np.mean(score_matrix[-1, :]), 4)

acc_mean = round(np.mean(acc_mean)*100,2)

Solution:
Multiply the GCGL accuracies by 100 somewhere. I changed the line
acc_matrices.append(acc_matrix_test)
to
acc_matrices.append(acc_matrix_test*100)

Error occurred for jointtrain in GCGL

At line 470 in CGLB/GCGL/pipeline.py, an error occurs because the arguments are passed in the wrong form:

for epoch in range(epochs):
    # Train
    if args['method'] == 'lwf':
        train_func(train_loader, loss_criterion, tid, args, prev_model)
    elif args['method'] == 'jointtrain':
        train_func(train_loader, loss_criterion, tid, args, train_loader_joint)
    else:
        train_func(train_loader, loss_criterion, tid, args)

Should the code be updated as below?

[1]
In the function observe_tskIL_multicls in GCGL/Baselines/jointtrain_model.py, change
def observe_tskIL_multicls(self, data_loader, loss_criterion, task_i, args, train_loader_joint):
to
def observe_tskIL_multicls(self, train_loader_joint, loss_criterion, task_i, args):

And

[2]
In pipeline.py, change
train_func(train_loader, loss_criterion, tid, args, train_loader_joint)
to
train_func(train_loader_joint, loss_criterion, tid, args)

Reproducibility on Joint Training for Graph Classification

Hi,

I tried using the code to reproduce the results in the paper. I did not change a single line of code, but I could not reproduce the results for joint training and learning without forgetting; the values are extremely low. I ran the following commands:

python /scratch1/mengxiwu/CLGL/GCGL/train.py \
    --dataset Aromaticity-CL \
    --method jointtrain \
    --backbone GCN \
    --gpu 0 \
    --clsIL False

python /scratch1/mengxiwu/CLGL/GCGL/train.py \
    --dataset Aromaticity-CL \
    --method lwf \
    --backbone GCN \
    --gpu 0 \
    --clsIL False

Many thanks!

Redundant training in testing phase?

In the following snippet, the model is first trained for a user-defined number of epochs and then overwritten by model = pickle.load(open(save_model_path,'rb')).cuda(args['gpu']) when valid == False (which is the case in the test phase). Does the training loop still serve a purpose, or is it redundant?

CGLB/GCGL/pipeline.py

Lines 370 to 389 in 6d71034

for epoch in range(epochs):
    # Train
    if args['method'] == 'lwf':
        life_model_ins.observe(train_loader, loss_criterion, tid, args, prev_model)
    else:
        life_model_ins.observe(train_loader, loss_criterion, tid, args)
    # Validation and early stop
    val_score = val_func(args, model, val_loader, tid)
    early_stop = stopper.step(val_score, model)
    if early_stop and args['early_stop']:
        print(epoch)
        break
if not args['pre_trained'] and valid and args['early_stop']:
    stopper.load_checkpoint(model)
if not valid:
    model = pickle.load(open(save_model_path,'rb')).cuda(args['gpu'])
score_matrix[tid] = test_func(args, model, test_loader, tid)

error report when implementing GEM on Tox21 datasets

I encountered the following error while running GEM on Tox21-tIL in the task-incremental setting.

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

The parameters of the model were NaN, so the logits were consequently NaN as well. This error does not occur in other settings or datasets, only on Tox21.
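
A small diagnostic I used to confirm this (my own sketch, not part of the repository): check whether any model parameter already contains NaNs before the logits are computed.

import torch

def has_nan_params(model: torch.nn.Module) -> bool:
    # True if any parameter of the model contains at least one NaN entry.
    return any(torch.isnan(p).any().item() for p in model.parameters())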

Question about hyper-parameter search in GCGL

In the appendix, I found the following grid-search for hyper-parameters, which I think is used in GCGL tasks too.

[image: hyper-parameter grid-search ranges from the appendix]

However, the GCGL datasets contain fewer than 1000 samples per task. This means that when n_memory is set to 1000 during the hyper-parameter search, GEM is effectively storing the entire history of data for regularization, which I assume gives the best performance. That does not seem fair, since it defeats the purpose of continual learning. Perhaps you used different parameters for GCGL? If possible, I would like to know the exact hyper-parameters used for the evaluation of GEM on GCGL.


Thank you
Wei

[Reproduce] Hyperparameters

Dear @QueuQ ,
I find this work interesting and helpful for continual learning with graphs. According to the paper, the results of ERGNN are quite dominant. However, I can't reproduce your results with the default args, so I would like to know the set of parameters that you used.
Thanks.

GEM Memory_data

tmask = np.random.choice(self.mask, self.n_memories, replace = False)
tmask = np.array(tmask)
self.memory_data.append(tmask)

old_task_loss = loss[self.memory_data[old_task_i],old_task_i].mean()

However, loss has shape (batch_size, 1). Why doesn't indexing with self.memory_data[old_task_i] go out of range?

GCGL, TWP error when using higher GCN hidden units

When the GCN hidden unit size is larger than the input feature size, TWP throws the following error:

Traceback (most recent call last):
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/train.py", line 141, in main
    AP, AF, acc_matrix,cls_matrix = main(args, valid=True)
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/pipeline.py", line 529, in pipeline_multi_class
    train_func(train_loader, loss_criterion, tid, args)
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/./GCGL/Baselines/twp_model.py", line 188, in observe_clsIL
    eloss.backward()
  File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

because of the following code snippet:

if self._in_feats > self._out_feats:
    # mult W first to reduce the feature size for aggregation.
    if weight is not None:
        feat = th.matmul(feat, weight)
    graph.srcdata['h'] = feat
    #######
    graph.ndata['feat'] = feat
    graph.apply_edges(lambda edges: {'e': th.sum((th.mul(edges.src['h'], th.tanh(edges.dst['h']))), 1)})
    e = self.leaky_relu(graph.edata.pop('e'))
    e_soft = edge_softmax(graph, e)
    graph.ndata.pop('feat')
    #######
    graph.update_all(fn.copy_src(src='h', out='m'),
                     fn.sum(msg='m', out='h'))
    rst = graph.dstdata['h']
else:
    # aggregate first then mult W
    graph.srcdata['h'] = feat
    #######
    graph.ndata['feat'] = feat
    graph.apply_edges(lambda edges: {'e': th.sum((th.mul(edges.src['h'], th.tanh(edges.dst['h']))), 1)})
    e = self.leaky_relu(graph.edata.pop('e'))
    e_soft = edge_softmax(graph, e)
    graph.ndata.pop('feat')
    #######
    graph.update_all(fn.copy_src(src='h', out='m'),
                     fn.sum(msg='m', out='h'))
    rst = graph.dstdata['h']
    if weight is not None:
        rst = th.matmul(rst, weight)

When the hidden unit size is larger than the input size, the if condition is false and the else branch is executed. In the else branch, however, for the first GCN layer the edge attention scores are computed from features that have not been multiplied by the trainable weight, so they do not depend on any trainable parameter, hence the error.

SIDER-tIL Jointtrain Cannot be reproduced

Hi,

I used the latest code, but the accuracy on SIDER-tIL is only 0.61, not 0.68 as reported. I did not change a single line of code. Could you check it?

Many thanks!

Problem in utils.py

On lines 95-98, the cls_balance parameter should be set instead of the mask parameter. After the mask parameter is set, the samples for testing are selected.

Problem in function pipeline_task_IL_inter_edge_minibatch

When inter-task edges are considered, why is the training input the intersection of the training ids (train_ids) of all previous tasks' nodes and the current task's nodes? Shouldn't the training subgraph be the subgraph formed by all previous tasks rather than only the subgraph of the current task?

GEM baseline in class-IL for GCGL has wrong indentation for optimizer step?

As seen in the following code fragment, at line 212 optimizer.step() is outside the for loop that iterates over the current task's minibatches.

for batch_id, batch_data in enumerate(data_loader[task_i]):
    smiles, bg, labels, masks = batch_data
    bg = bg.to(f"cuda:{args['gpu']}")
    labels, masks = labels.cuda(), masks.cuda()
    logits = predict(args, self.net, bg)
    # class balance
    n_per_cls = [(labels == j).sum() for j in clss]
    loss_w_ = [1. / max(i, 1) for i in n_per_cls]
    loss_w_ = torch.tensor(loss_w_).to(device='cuda:{}'.format(args['gpu']))
    # labels= labels.long()
    for i, c in enumerate(clss):
        labels[labels == c] = i
    # Mask non-existing labels
    loss = loss_criterion(logits[:, clss], labels.long(), weight=loss_w_).float()
    self.optimizer.zero_grad()
    loss.backward()
    # check if gradient violates constraints
    if len(self.observed_tasks) > 1:
        # copy gradient
        store_grad(self.net.parameters, self.grads, self.grad_dims, task_i)
        indx = torch.cuda.LongTensor(self.observed_tasks[:-1])
        dotp = torch.mm(self.grads[:, task_i].unsqueeze(0),
                        self.grads.index_select(1, indx))
        if (dotp < 0).sum() != 0:
            project2cone2(self.grads[:, task_i].unsqueeze(1),
                          self.grads.index_select(1, indx), self.margin)
            # copy gradients back
            overwrite_grad(self.net.parameters, self.grads[:, task_i],
                           self.grad_dims)
self.optimizer.step()

while in the same file, at lines 296 and 127, this step is inside the for loop.

Find a bug.

I am interested in this benchmark. While looking through the code, there seems to be a bug in GCGL/Baselines/mas_model:

for f in range(len(new_fisher)):
    self.fisher[f] = (self.fisher[f]*self.n_observed_data + new_fisher[f]+n_new_data)/(self.n_observed_data+n_new_data)

If a moving average is being computed, the + between new_fisher[f] and n_new_data should be a multiplication.
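
A minimal sketch of the presumably intended update, written as a standalone helper (my own reading of the snippet above, not the authors' code):

def update_fisher(fisher, new_fisher, n_observed_data, n_new_data):
    # Weighted running average of the importance values over all data seen so far.
    for f in range(len(new_fisher)):
        fisher[f] = (fisher[f] * n_observed_data
                     + new_fisher[f] * n_new_data) / (n_observed_data + n_new_data)
    return fisher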
