newbeeer / L_DMI

Code for NeurIPS 2019 Paper, "L_DMI: An Information-theoretic Noise-robust Loss Function"
Hi,
Your code looks good! But why did you make the confusion matrix so simple? I don't think your noisy data is really that noisy. Have you tried more sophisticated cases?
Hi! I tried to run the code for Clothing1M, but it reports an error: GPU memory is not enough. I am using an RTX 2080 Ti with about 10 GB of GPU memory, so I would like to know whether my GPU memory is simply too small.

Hi, regarding line 37 in 4376ede: since the loss applies torch.log to det(mat), we usually add 1e-10 to achieve numerical stability. Why does this loss need to add 1e-3?
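(Not the authors' answer, just a guess at the reasoning: for C classes the batch joint matrix U has entries summing to 1, so |det(U)| can be as small as (1/C)^C, e.g. 1e-10 for C = 10; an epsilon of 1e-10 is then the same order as the determinant itself and no longer stabilizes the log. A quick numpy check:)

```python
import numpy as np

# Illustration only (assumes the loss is -log(|det(U)| + eps) with U the
# batch-normalized joint matrix): for C classes, |det(U)| <= (1/C)^C.
C = 10
U = np.eye(C) / C                 # best-case diagonal joint matrix
det = abs(np.linalg.det(U))       # 1e-10
print(-np.log(det + 1e-10))       # ~22.3, and the gradient blows up as det -> 0
print(-np.log(det + 1e-3))        # ~6.9, the 1e-3 floor keeps the log well-behaved
```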
Hi,
thanks for sharing your implementation. I have two questions about it:
Thanks!
Could you provide the following files: noisy_train.txt, clean_val.txt, clean_test.txt?

```python
if train == True:
    flist = os.path.join(root, "annotations/noisy_train.txt")
if valid == True:
    flist = os.path.join(root, "annotations/clean_val.txt")
if test == True:
    flist = os.path.join(root, "annotations/clean_test.txt")
```
I have written the following code for a toy dataset (the banana dataset), and I am observing very strange issues:

```python
from __future__ import print_function, absolute_import
"""
The code uses the following versions:
Anaconda: 4.5.8 [64-bit]
python: 3.6.1
keras: 2.0.4
tensorflow-gpu: 1.2.0
numpy: 1.12.1 (/1.14.2)
matplotlib: 2.0.2
scipy: 1.1.0
"""
import numpy as np
from numpy.random import RandomState
import arff
import matplotlib
import matplotlib.pyplot as plt
import os
import pickle
import time
import argparse
from sklearn import model_selection
from numpy.testing import assert_array_almost_equal
import keras
import keras.backend as K
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Activation
from keras.callbacks import ModelCheckpoint
from keras import metrics
from keras.utils import plot_model
import tensorflow as tf
def L_DMI(y_true, y_pred):
    # U approximates the joint distribution over (predicted class, observed label):
    # a (num_classes x num_classes) matrix averaged over the batch.
    U = (1 / tf.dtypes.cast(tf.keras.backend.shape(y_true)[0], tf.float32)) * tf.keras.backend.dot(tf.transpose(y_pred), y_true)
    # DMI loss: negative log of |det(U)|; the 1e-3 term guards against det(U) ~ 0.
    return -1.0 * tf.math.log(tf.dtypes.cast(tf.math.abs(tf.linalg.det(U)), tf.float32) + 1e-3)
def build_uniform_P(size, noise):
    """ The noise matrix flips any class to any other with probability
    noise / (#class - 1).
    """
    assert (noise >= 0.) and (noise <= 1.)
    P = noise / (size - 1) * np.ones((size, size))
    np.fill_diagonal(P, (1 - noise) * np.ones(size))
    assert_array_almost_equal(P.sum(axis=1), 1, 1)
    return P
def multiclass_noisify(y, P, random_state=0):
    """ Flip classes according to transition probability matrix P.
    It expects a number between 0 and the number of classes - 1.
    """
    assert P.shape[0] == P.shape[1]
    assert np.max(y) < P.shape[0]
    # P must be a row-stochastic matrix
    assert_array_almost_equal(P.sum(axis=1), np.ones(P.shape[1]))
    assert (P >= 0.0).all()
    m = y.shape[0]
    new_y = y.copy()
    flipper = np.random.RandomState(random_state)
    for idx in np.arange(m):
        i = y[idx]
        # draw a one-hot vector indicating the flipped class
        flipped = flipper.multinomial(1, P[int(i), :], size=1)[0]
        new_y[idx] = np.where(flipped == 1)[0]
    return new_y
def noisify_with_P(labels, num_classes, noise, random_state=None):
    if noise > 0.0:
        P = build_uniform_P(num_classes, noise)
        # seed the random numbers with #run
        labels_noisy = multiclass_noisify(labels, P=P, random_state=random_state)
        actual_noise = (labels_noisy != labels).mean()
        assert actual_noise > 0.0
        print('Actual noise %.2f' % actual_noise)
        labels = labels_noisy
    else:
        P = np.eye(num_classes)
    return labels, P
label_card = 1 #binary classification
num_classes = 2
seed = 23423
loss_fn = 'L_DMI'
#loss_fn = 'categorical_crossentropy'
batch_size = 256
epochs = 200
noise_rate = [0, 0.2, 0.4, 0.7]
noise_type = ['sym']
"""
Data preparation
"""
data_file = arff.load(open('./banana.arff', 'r'))
data_raw = data_file['data']
data_arr = np.asarray(data_raw, 'float')
data, data_lab = data_arr[:,:-label_card], data_arr[:,-label_card:]
no_samples = data.shape[0]
''' labels: {1, 2} --> {0, 1} '''
index = np.where(data_lab == 1)
for i in index[0]:
    data_lab[i] = 0
index = np.where(data_lab == 2)
for i in index[0]:
    data_lab[i] = 1
for nt in noise_type:
    for nr in noise_rate:
        X_temp, X_test, y_temp, y_test = model_selection.train_test_split(
            data, data_lab, test_size=0.2, random_state=42)
        if nr > 0:
            ''' add noise '''
            y_temp_noisy, P = noisify_with_P(y_temp, num_classes=num_classes,
                                             noise=nr, random_state=seed)
            ''' random shuffle '''
            idx_perm = np.random.permutation(X_temp.shape[0])
            X_temp, y_temp_noisy = X_temp[idx_perm], y_temp_noisy[idx_perm]
        else:
            ''' random shuffle '''
            idx_perm = np.random.permutation(X_temp.shape[0])
            X_temp, y_temp_noisy = X_temp[idx_perm], y_temp[idx_perm]
        ''' train and val split '''
        X_train, X_val, y_train, y_val = model_selection.train_test_split(
            X_temp, y_temp_noisy, test_size=0.2, random_state=42)
        ''' normalize data '''
        # means = X_train.mean(axis=0)
        # std = np.std(X_train)
        # X_train = (X_train - means) / std
        # X_val = (X_val - means) / std
        # X_test = (X_test - means) / std
        ''' one-hot encoding '''
        y_train = keras.utils.to_categorical(y_train, num_classes=num_classes)
        y_val = keras.utils.to_categorical(y_val, num_classes=num_classes)
        y_test = keras.utils.to_categorical(y_test, num_classes=num_classes)

        """
        Model specifications
        """
        model = Sequential()
        model.add(Dense(64, kernel_initializer='glorot_uniform', activation='relu',
                        input_shape=(X_train.shape[1],)))
        model.add(Dropout(0.2))
        model.add(Dense(16, kernel_initializer='glorot_uniform', activation='relu'))
        model.add(Dropout(0.5))
        # model.add(Dense(1, activation='sigmoid'))
        model.add(Dense(num_classes, kernel_initializer='glorot_uniform', activation='softmax'))

        # opt = keras.optimizers.SGD(lr=1e-5, momentum=1, decay=1e-4)
        # opt = keras.optimizers.RMSprop(lr=1e-3)
        opt = keras.optimizers.Adam(lr=3e-4, beta_1=0.9, beta_2=0.999)

        if loss_fn == 'L_DMI':
            loss = L_DMI
        else:
            loss = 'categorical_crossentropy'
        model.compile(optimizer=opt, loss=loss, metrics=['accuracy', loss])
        model.summary()

        """
        Callbacks
        """
        callbacks = []
        chkpt_filename = "model/checkpoint_banana_%s_%s_%s.hd5" % (loss_fn, nt, str(nr))
        checkpoint_callback = ModelCheckpoint(chkpt_filename, monitor='val_loss', verbose=1,
                                              save_best_only=True, save_weights_only=True,
                                              period=epochs)
        callbacks.append(checkpoint_callback)
        early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.001,
                                                   patience=20, verbose=1, mode='auto')
        # callbacks.append(early_stop)
        reduce_lr_plat = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                                                           patience=10, verbose=1, mode='auto',
                                                           epsilon=0.0001, cooldown=0,
                                                           min_lr=0.00001)
        callbacks.append(reduce_lr_plat)

        """
        Training
        """
        history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                            validation_data=(X_val, y_val), shuffle=False, verbose=1,
                            callbacks=callbacks)
        mdl_filename = "model/model_banana_%s_%s_%s.hd5" % (loss_fn, nt, str(nr))
        model.save(mdl_filename)
        print('Saved trained model at %s ' % (mdl_filename))

        """
        Testing
        """
        if loss_fn == 'categorical_crossentropy':
            model_load = keras.models.load_model(mdl_filename)
        else:
            model_load = keras.models.load_model(mdl_filename,
                                                 custom_objects={loss_fn: loss})
        pred_prob = model_load.predict(X_test)
        pred_lab = pred_prob.copy()
        pred_lab[pred_prob >= 0.5] = 1
        pred_lab[pred_prob < 0.5] = 0
        score = model_load.evaluate(X_test, y_test, batch_size=batch_size)
        print("==========================================================")
        print("==========================================================")
        print(" ")
        print('Test loss:', score[0])
        print('Test accuracy:', score[1])
        print(" ")
        print("==========================================================")
        print("==========================================================")

        """
        Store results
        """
        res_filename = 'results/results_banana_%s_%s_%s.pkl' % (loss_fn, nt, str(nr))
        res_file = open(res_filename, 'wb')
        pickle.dump([pred_prob, score], res_file)
        res_file.close()
        print('Saved model results at %s ' % (res_filename))

        """
        Plot a graph of the model
        """
        # plot_model(model, to_file='model.png')

        """
        Plot results
        """
        # res_filename = 'results/results_banana_%s_%s_%s.pkl' % (loss_fn, nt, str(nr))
        # res_file = open(res_filename, 'rb')
        # tmp_dat = pickle.load(res_file)
        X_zero_test, Y_zero_test = [], []
        X_one_test, Y_one_test = [], []
        X_zero_pred, Y_zero_pred = [], []
        X_one_pred, Y_one_pred = [], []
        for i in range(len(y_test)):
            if y_test[i][1] != 0:
                X_one_test.append(X_test[i][0])
                Y_one_test.append(X_test[i][1])
            else:
                X_zero_test.append(X_test[i][0])
                Y_zero_test.append(X_test[i][1])
            if pred_lab[i][1] != 0:
                X_one_pred.append(X_test[i][0])
                Y_one_pred.append(X_test[i][1])
            else:
                X_zero_pred.append(X_test[i][0])
                Y_zero_pred.append(X_test[i][1])

        plt.figure(2)
        plt.scatter(X_zero_test, Y_zero_test, c='b', label='class-0')
        plt.scatter(X_one_test, Y_one_test, c='r', label='class-1')
        plt.title('clean data distribution')
        plt.show()

        plt.figure(3)
        plt.scatter(X_zero_pred, Y_zero_pred, c='b', label='class-0')
        plt.scatter(X_one_pred, Y_one_pred, c='r', label='class-1')
        plt.title('prediction data (noise rate %s)' % str(nr))
        plt.show()

        plt.figure(4)
        plt.plot(history.history['acc'])
        plt.plot(history.history['val_acc'])
        plt.title('Model accuracy')
        plt.ylabel('Accuracy')
        plt.xlabel('Epoch')
        plt.legend(['Train', 'Validation'], loc='lower right')
        plt.show()

        plt.figure(5)
        plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
        plt.title('Model loss')
        plt.ylabel('Loss')
        plt.xlabel('Epoch')
        plt.legend(['Train', 'Validation'], loc='lower right')
        plt.show()
```
Hello, I am trying to implement DMI_loss. Where can I get the dogcat and clothing datasets?
Thank you!
Hello!
Is it normal to have negative values in the L_DMI loss? If so, do you have any intuition about how this affects backpropagation when the loss values are around zero? I am encountering some instabilities in such cases (with 60% uniform random noise on CIFAR-10, where 60% means that 60% of the labels are actually incorrect, not merely resampled at random).
Thanks in advance.
Best,
Eric
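(Not an official answer, but a quick sanity check on when the sign flips, assuming the loss has the form -log(|det(U)| + eps) as in the Keras snippet earlier in this thread:)

```python
import numpy as np

# -log(|det(U)| + eps) is negative exactly when |det(U)| + eps > 1.
# If U is normalized by the batch size (entries sum to 1), |det(U)| <= (1/C)^C < 1
# and the loss stays positive; an unnormalized U (entries sum to the batch
# size N) can push |det(U)| past 1, which would yield losses around or below zero.
eps = 1e-3
C, N = 10, 128
U_normalized = np.eye(C) / C           # entries sum to 1
U_unnormalized = np.eye(C) * (N / C)   # entries sum to N
print(-np.log(abs(np.linalg.det(U_normalized)) + eps))    # ~6.9   (positive)
print(-np.log(abs(np.linalg.det(U_unnormalized)) + eps))  # ~-25.5 (negative)
```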
Hi!
I would like to run L_DMI on CIFAR-10 and CIFAR-100, and I would appreciate a bit of help to run it properly.
As far as I understood from the paper/code, you pre-train with cross-entropy and then switch to the DMI loss. Am I right? Also, how many epochs should I run with cross-entropy, and which learning rate should I use? In my experience, when training with cross-entropy a high learning rate is desirable to prevent (to some extent) fitting the label noise. Furthermore, when applying the DMI loss, I can train with a 0.00001 learning rate as suggested in the paper, right?
Many thanks in advance!
Best,
Diego.
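For what it's worth, here is a minimal Keras sketch of the two-stage schedule described in the question above. Everything here (epoch counts, learning rates, optimizer) is an assumption for illustration, not the authors' recipe; `model`, `X_train`, `y_train`, and `L_DMI` are taken from the banana-dataset snippet earlier in this thread:

```python
import keras

# Stage 1 (assumed): warm up the network with plain cross-entropy.
model.compile(optimizer=keras.optimizers.Adam(lr=1e-3),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=256, epochs=50)  # epoch count is a guess

# Stage 2 (assumed): keep the same weights, recompile with L_DMI at a tiny
# learning rate (0.00001, as the question says the paper suggests).
# Recompiling a Keras model does not reset its weights.
model.compile(optimizer=keras.optimizers.Adam(lr=1e-5),
              loss=L_DMI, metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=256, epochs=50)
```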
It seems that if we train the model with L_DMI directly, without cross-entropy pre-training, the resulting performance is extremely poor. Can someone explain that? Thanks!
When I run main.py, it throws an error in fashion.py while defining conf_matrix.
Please walk me through the steps needed to run your code.
Thanks very much for your interest. Please find the dataset at the following URL:
https://drive.google.com/folderview?id=0B67_d0rLRTQYU2E4aHNHaE1uMTg&usp=sharing
Detailed instructions and terms of use are listed in the README.md. Please do not redistribute the dataset (as requested by the original data collector). Thanks.
Originally posted by @Newbeeer in #8 (comment)
Hi, I am interested in using this loss for multi-label classification. However, that would mean I won't be applying the softmax function to the output. Do you think the loss function will still work?
Great work!
However, when the number of categories is much larger than the batch size, e.g., 700 categories vs. a batch size of 64, det() always equals 0. I use ResNet50 as the backbone, and the batch size is limited by memory.
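(A likely explanation, not confirmed by the authors: U is built from a single batch, so its rank is at most the batch size; a 700x700 matrix of rank at most 64 is singular, and its determinant is exactly zero. A numpy sketch of the effect:)

```python
import numpy as np

rng = np.random.default_rng(0)
B, C = 64, 700                                   # batch size << number of classes
y_pred = rng.random((B, C))
y_pred /= y_pred.sum(axis=1, keepdims=True)      # stand-in for softmax outputs
y_true = np.eye(C)[rng.integers(0, C, size=B)]   # one-hot labels

U = y_pred.T @ y_true / B                        # 700 x 700, but rank <= 64
print(np.linalg.matrix_rank(U))                  # at most 64
print(np.linalg.det(U))                          # 0.0: singular, so log(det) degenerates
```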
Wonderful work! It seems this new loss could also be used on clean data? Unlike the cross-entropy loss, which computes the loss for each example separately, this loss takes the class-distribution information across examples into account, so maybe it can achieve better performance. Have you tried it? @CSPSY @Newbeeer
Hi, I would like to try the Clothing1M dataset. Could you tell me where I can download it?