newbeeer / L_DMI

Code for NeurIPS 2019 Paper, "L_DMI: An Information-theoretic Noise-robust Loss Function"
Hi,
Your code looks good! But why did you make the confusion matrix so simple? I don't think your noisy data is really that noisy. Have you tried more sophisticated cases?
Hi! I tried to run the code for Clothing1M, but it reports an error: GPU memory is not enough. I am using an RTX 2080 Ti with about 10 GB of GPU memory, so I would like to know whether my GPU memory is simply too small.

Hi, regarding line 37 in 4376ede: since the loss applies torch.log to det(mat), we usually add 1e-10 to achieve numerical stability. Why does this loss need to add 1e-3?
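(Not the authors' answer, just a guess at the reasoning: for C classes the batch joint matrix U has entries summing to 1, so |det(U)| can be as small as (1/C)^C, e.g. 1e-10 for C = 10; an epsilon of 1e-10 is then the same order as the determinant itself and no longer stabilizes the log. A quick numpy check:)

```python
import numpy as np

# Illustration only (assumes the loss is -log(|det(U)| + eps) with U the
# batch-normalized joint matrix): for C classes, |det(U)| <= (1/C)^C.
C = 10
U = np.eye(C) / C                 # best-case diagonal joint matrix
det = abs(np.linalg.det(U))       # 1e-10
print(-np.log(det + 1e-10))       # ~22.3, and the gradient blows up as det -> 0
print(-np.log(det + 1e-3))        # ~6.9, the 1e-3 floor keeps the log well-behaved
```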
Hi,
thanks for sharing your implementation. I have two questions about it:
Thanks!
Could you provide the following files: noisy_train.txt, clean_val.txt, clean_test.txt?

```python
if train == True:
    flist = os.path.join(root, "annotations/noisy_train.txt")
if valid == True:
    flist = os.path.join(root, "annotations/clean_val.txt")
if test == True:
    flist = os.path.join(root, "annotations/clean_test.txt")
```
I have written the following code for a toy dataset (the banana dataset), and I am observing very strange issues:

```python
from __future__ import print_function, absolute_import
"""
The code uses the following versions:
Anaconda: 4.5.8 [64-bit]
python: 3.6.1
keras: 2.0.4
tensorflow-gpu: 1.2.0
numpy: 1.12.1 (/1.14.2)
matplotlib: 2.0.2
scipy: 1.1.0
"""
import numpy as np
from numpy.random import RandomState
import arff
import matplotlib
import matplotlib.pyplot as plt
import os
import pickle
import time
import argparse
from sklearn import model_selection
from numpy.testing import assert_array_almost_equal
import keras
import keras.backend as K
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Activation
from keras.callbacks import ModelCheckpoint
from keras import metrics
from keras.utils import plot_model
import tensorflow as tf
def L_DMI(y_true, y_pred):
    # U approximates the joint distribution over (predicted class, observed label):
    # a (num_classes x num_classes) matrix averaged over the batch.
    U = (1 / tf.dtypes.cast(tf.keras.backend.shape(y_true)[0], tf.float32)) * tf.keras.backend.dot(tf.transpose(y_pred), y_true)
    # DMI loss: negative log of |det(U)|; the 1e-3 term guards against det(U) ~ 0.
    return -1.0 * tf.math.log(tf.dtypes.cast(tf.math.abs(tf.linalg.det(U)), tf.float32) + 1e-3)
def build_uniform_P(size, noise):
    """ The noise matrix flips any class to any other with probability
    noise / (#class - 1).
    """
    assert (noise >= 0.) and (noise <= 1.)
    P = noise / (size - 1) * np.ones((size, size))
    np.fill_diagonal(P, (1 - noise) * np.ones(size))
    assert_array_almost_equal(P.sum(axis=1), 1, 1)
    return P
def multiclass_noisify(y, P, random_state=0):
    """ Flip classes according to transition probability matrix P.
    It expects a number between 0 and the number of classes - 1.
    """
    assert P.shape[0] == P.shape[1]
    assert np.max(y) < P.shape[0]
    # P must be a row-stochastic matrix
    assert_array_almost_equal(P.sum(axis=1), np.ones(P.shape[1]))
    assert (P >= 0.0).all()
    m = y.shape[0]
    new_y = y.copy()
    flipper = np.random.RandomState(random_state)
    for idx in np.arange(m):
        i = y[idx]
        # draw a one-hot vector indicating the flipped class
        flipped = flipper.multinomial(1, P[int(i), :], size=1)[0]
        new_y[idx] = np.where(flipped == 1)[0]
    return new_y
def noisify_with_P(labels, num_classes, noise, random_state=None):
    if noise > 0.0:
        P = build_uniform_P(num_classes, noise)
        # seed the random numbers with #run
        labels_noisy = multiclass_noisify(labels, P=P, random_state=random_state)
        actual_noise = (labels_noisy != labels).mean()
        assert actual_noise > 0.0
        print('Actual noise %.2f' % actual_noise)
        labels = labels_noisy
    else:
        P = np.eye(num_classes)
    return labels, P
label_card = 1 #binary classification
num_classes = 2
seed = 23423
loss_fn = 'L_DMI'
#loss_fn = 'categorical_crossentropy'
batch_size = 256
epochs = 200
noise_rate = [0, 0.2, 0.4, 0.7]
noise_type = ['sym']
"""
Data preparation
"""
data_file = arff.load(open('./banana.arff', 'r'))
data_raw = data_file['data']
data_arr = np.asarray(data_raw, 'float')
data, data_lab = data_arr[:,:-label_card], data_arr[:,-label_card:]
no_samples = data.shape[0]
''' labels: {1, 2} --> {0, 1} '''
index = np.where(data_lab == 1)
for i in index[0]:
    data_lab[i] = 0
index = np.where(data_lab == 2)
for i in index[0]:
    data_lab[i] = 1
for nt in noise_type:
    for nr in noise_rate:
        X_temp, X_test, y_temp, y_test = model_selection.train_test_split(
            data, data_lab, test_size=0.2, random_state=42)
        if nr > 0:
            ''' add noise '''
            y_temp_noisy, P = noisify_with_P(y_temp, num_classes=num_classes,
                                             noise=nr, random_state=seed)
            ''' random shuffle '''
            idx_perm = np.random.permutation(X_temp.shape[0])
            X_temp, y_temp_noisy = X_temp[idx_perm], y_temp_noisy[idx_perm]
        else:
            ''' random shuffle '''
            idx_perm = np.random.permutation(X_temp.shape[0])
            X_temp, y_temp_noisy = X_temp[idx_perm], y_temp[idx_perm]
        ''' train and val split '''
        X_train, X_val, y_train, y_val = model_selection.train_test_split(
            X_temp, y_temp_noisy, test_size=0.2, random_state=42)
        ''' normalize data '''
        # means = X_train.mean(axis=0)
        # std = np.std(X_train)
        # X_train = (X_train - means) / std
        # X_val = (X_val - means) / std
        # X_test = (X_test - means) / std
        ''' one-hot encoding '''
        y_train = keras.utils.to_categorical(y_train, num_classes=num_classes)
        y_val = keras.utils.to_categorical(y_val, num_classes=num_classes)
        y_test = keras.utils.to_categorical(y_test, num_classes=num_classes)

        """
        Model specifications
        """
        model = Sequential()
        model.add(Dense(64, kernel_initializer='glorot_uniform', activation='relu',
                        input_shape=(X_train.shape[1],)))
        model.add(Dropout(0.2))
        model.add(Dense(16, kernel_initializer='glorot_uniform', activation='relu'))
        model.add(Dropout(0.5))
        # model.add(Dense(1, activation='sigmoid'))
        model.add(Dense(num_classes, kernel_initializer='glorot_uniform', activation='softmax'))

        # opt = keras.optimizers.SGD(lr=1e-5, momentum=1, decay=1e-4)
        # opt = keras.optimizers.RMSprop(lr=1e-3)
        opt = keras.optimizers.Adam(lr=3e-4, beta_1=0.9, beta_2=0.999)

        if loss_fn == 'L_DMI':
            loss = L_DMI
        else:
            loss = 'categorical_crossentropy'
        model.compile(optimizer=opt, loss=loss, metrics=['accuracy', loss])
        model.summary()

        """
        Callbacks
        """
        callbacks = []
        chkpt_filename = "model/checkpoint_banana_%s_%s_%s.hd5" % (loss_fn, nt, str(nr))
        checkpoint_callback = ModelCheckpoint(chkpt_filename, monitor='val_loss', verbose=1,
                                              save_best_only=True, save_weights_only=True,
                                              period=epochs)
        callbacks.append(checkpoint_callback)
        early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.001,
                                                   patience=20, verbose=1, mode='auto')
        # callbacks.append(early_stop)
        reduce_lr_plat = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                                                           patience=10, verbose=1, mode='auto',
                                                           epsilon=0.0001, cooldown=0,
                                                           min_lr=0.00001)
        callbacks.append(reduce_lr_plat)

        """
        Training
        """
        history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                            validation_data=(X_val, y_val), shuffle=False, verbose=1,
                            callbacks=callbacks)
        mdl_filename = "model/model_banana_%s_%s_%s.hd5" % (loss_fn, nt, str(nr))
        model.save(mdl_filename)
        print('Saved trained model at %s ' % (mdl_filename))

        """
        Testing
        """
        if loss_fn == 'categorical_crossentropy':
            model_load = keras.models.load_model(mdl_filename)
        else:
            model_load = keras.models.load_model(mdl_filename,
                                                 custom_objects={loss_fn: loss})
        pred_prob = model_load.predict(X_test)
        pred_lab = pred_prob.copy()
        pred_lab[pred_prob >= 0.5] = 1
        pred_lab[pred_prob < 0.5] = 0
        score = model_load.evaluate(X_test, y_test, batch_size=batch_size)
        print("==========================================================")
        print("==========================================================")
        print(" ")
        print('Test loss:', score[0])
        print('Test accuracy:', score[1])
        print(" ")
        print("==========================================================")
        print("==========================================================")

        """
        Store results
        """
        res_filename = 'results/results_banana_%s_%s_%s.pkl' % (loss_fn, nt, str(nr))
        res_file = open(res_filename, 'wb')
        pickle.dump([pred_prob, score], res_file)
        res_file.close()
        print('Saved model results at %s ' % (res_filename))

        """
        Plot a graph of the model
        """
        # plot_model(model, to_file='model.png')

        """
        Plot results
        """
        # res_filename = 'results/results_banana_%s_%s_%s.pkl' % (loss_fn, nt, str(nr))
        # res_file = open(res_filename, 'rb')
        # tmp_dat = pickle.load(res_file)
        X_zero_test, Y_zero_test = [], []
        X_one_test, Y_one_test = [], []
        X_zero_pred, Y_zero_pred = [], []
        X_one_pred, Y_one_pred = [], []
        for i in range(len(y_test)):
            if y_test[i][1] != 0:
                X_one_test.append(X_test[i][0])
                Y_one_test.append(X_test[i][1])
            else:
                X_zero_test.append(X_test[i][0])
                Y_zero_test.append(X_test[i][1])
            if pred_lab[i][1] != 0:
                X_one_pred.append(X_test[i][0])
                Y_one_pred.append(X_test[i][1])
            else:
                X_zero_pred.append(X_test[i][0])
                Y_zero_pred.append(X_test[i][1])

        plt.figure(2)
        plt.scatter(X_zero_test, Y_zero_test, c='b', label='class-0')
        plt.scatter(X_one_test, Y_one_test, c='r', label='class-1')
        plt.title('clean data distribution')
        plt.show()

        plt.figure(3)
        plt.scatter(X_zero_pred, Y_zero_pred, c='b', label='class-0')
        plt.scatter(X_one_pred, Y_one_pred, c='r', label='class-1')
        plt.title('prediction data (noise rate %s)' % str(nr))
        plt.show()

        plt.figure(4)
        plt.plot(history.history['acc'])
        plt.plot(history.history['val_acc'])
        plt.title('Model accuracy')
        plt.ylabel('Accuracy')
        plt.xlabel('Epoch')
        plt.legend(['Train', 'Validation'], loc='lower right')
        plt.show()

        plt.figure(5)
        plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
        plt.title('Model loss')
        plt.ylabel('Loss')
        plt.xlabel('Epoch')
        plt.legend(['Train', 'Validation'], loc='lower right')
        plt.show()
```
Hello, I am trying to implement DMI_loss. Where can I get the dogcat and clothing datasets?
Thank you!
Hello!
Is it normal to have negative values in the L_DMI loss? If so, do you have any intuition about how this affects backpropagation when the loss values are around zero? I am encountering some instabilities in such cases (with 60% uniform random noise on CIFAR-10, where 60% means that 60% of the labels are actually incorrect, not merely resampled at random).
Thanks in advance.
Best,
Eric
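(Not an official answer, but a quick sanity check on when the sign flips, assuming the loss has the form -log(|det(U)| + eps) as in the Keras snippet earlier in this thread:)

```python
import numpy as np

# -log(|det(U)| + eps) is negative exactly when |det(U)| + eps > 1.
# If U is normalized by the batch size (entries sum to 1), |det(U)| <= (1/C)^C < 1
# and the loss stays positive; an unnormalized U (entries sum to the batch
# size N) can push |det(U)| past 1, which would yield losses around or below zero.
eps = 1e-3
C, N = 10, 128
U_normalized = np.eye(C) / C           # entries sum to 1
U_unnormalized = np.eye(C) * (N / C)   # entries sum to N
print(-np.log(abs(np.linalg.det(U_normalized)) + eps))    # ~6.9   (positive)
print(-np.log(abs(np.linalg.det(U_unnormalized)) + eps))  # ~-25.5 (negative)
```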
Hi!
I would like to run L_DMI on CIFAR-10 and CIFAR-100, and I would appreciate a bit of help to run it properly.
As far as I understood from the paper/code, you pre-train with cross-entropy and then switch to the DMI loss. Am I right? Also, how many epochs should I run with cross-entropy, and which learning rate should I use? In my experience, when training with cross-entropy a high learning rate is desirable to prevent (to some extent) fitting the label noise. Furthermore, when applying the DMI loss, I can train with a 0.00001 learning rate as suggested in the paper, right?
Many thanks in advance!
Best,
Diego.
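For what it's worth, here is a minimal Keras sketch of the two-stage schedule described in the question above. Everything here (epoch counts, learning rates, optimizer) is an assumption for illustration, not the authors' recipe; `model`, `X_train`, `y_train`, and `L_DMI` are taken from the banana-dataset snippet earlier in this thread:

```python
import keras

# Stage 1 (assumed): warm up the network with plain cross-entropy.
model.compile(optimizer=keras.optimizers.Adam(lr=1e-3),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=256, epochs=50)  # epoch count is a guess

# Stage 2 (assumed): keep the same weights, recompile with L_DMI at a tiny
# learning rate (0.00001, as the question says the paper suggests).
# Recompiling a Keras model does not reset its weights.
model.compile(optimizer=keras.optimizers.Adam(lr=1e-5),
              loss=L_DMI, metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=256, epochs=50)
```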
It seems that if we train the model with L_DMI directly, without cross-entropy pre-training, the resulting performance is extremely poor. Can someone explain that? Thanks!
When I run main.py, it throws an error in fashion.py while defining conf_matrix.
Please walk me through the steps needed to run your code.
Thanks very much for your interest. Please find the dataset at the following URL:
https://drive.google.com/folderview?id=0B67_d0rLRTQYU2E4aHNHaE1uMTg&usp=sharing
Detailed instructions and terms of use are listed in the README.md. Please do not redistribute the dataset (as requested by the original data collector). Thanks.
Originally posted by @Newbeeer in #8 (comment)
Hi, I am interested in using this loss for multi-label classification. However, that would mean I won't be applying the softmax function to the output. Do you think the loss function will still work?
Great work!
However, when the number of categories is much larger than the batch size, e.g., 700 categories vs. a batch size of 64, det() always equals 0. I use ResNet50 as the backbone, and the batch size is limited by memory.
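(A likely explanation, not confirmed by the authors: U is built from a single batch, so its rank is at most the batch size; a 700x700 matrix of rank at most 64 is singular, and its determinant is exactly zero. A numpy sketch of the effect:)

```python
import numpy as np

rng = np.random.default_rng(0)
B, C = 64, 700                                   # batch size << number of classes
y_pred = rng.random((B, C))
y_pred /= y_pred.sum(axis=1, keepdims=True)      # stand-in for softmax outputs
y_true = np.eye(C)[rng.integers(0, C, size=B)]   # one-hot labels

U = y_pred.T @ y_true / B                        # 700 x 700, but rank <= 64
print(np.linalg.matrix_rank(U))                  # at most 64
print(np.linalg.det(U))                          # 0.0: singular, so log(det) degenerates
```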
Wonderful work! It seems this new loss could also be used on clean data? Unlike the cross-entropy loss, which computes the loss for each example separately, this loss takes the class-distribution information across examples into account, so maybe it can achieve better performance. Have you tried it? @CSPSY @Newbeeer
Hi, I would like to try the Clothing1M dataset. Could you tell me where I can download it?