tf-levenberg-marquardt's Issues

Getting a shape error while trying to fit another dataset

Hello,
I am trying to implement this algorithm by following the code. After preparing the model and the dataset, whenever I try to fit a different dataset, the boston_housing dataset imported from Keras, I get a shape error: "Cannot convert a partially known tensor shape to a tensor (13, 1, None)." I am confused about how to shape a dataset so that it fits this algorithm, especially for a regression problem. Please help; I hope to get your response.
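
For reference, a minimal sketch of feeding boston_housing to the wrapper in the same shape as the repository's curve-fitting example (the layer sizes, batch size and optimizer settings below are illustrative assumptions, not from the issue):

import numpy as np
import tensorflow as tf
import levenberg_marquardt as lm

# Load the Keras boston_housing data: x has shape (404, 13), y has shape (404,).
(x_train, y_train), _ = tf.keras.datasets.boston_housing.load_data()
x_train = tf.cast(x_train, tf.float32)
y_train = tf.expand_dims(tf.cast(y_train, tf.float32), axis=-1)  # -> (404, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='tanh', input_shape=(13,)),
    tf.keras.layers.Dense(1, activation='linear')])

model_wrapper = lm.ModelWrapper(model)
model_wrapper.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),
    loss=lm.MeanSquaredError())

# Large batch: Levenberg-Marquardt is usually run on (near) full-batch data.
model_wrapper.fit(x_train, y_train, batch_size=404, epochs=50)

Casting the features to float32 and giving the labels an explicit trailing dimension keeps every tensor shape fully known, which avoids partially-known-shape errors.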

Error in residuals when labels given as int instead of float64

When trying to train a simple neural network with labels fed as integers, I encountered the following error:

TypeError: Exception encountered when calling layer "tf.math.subtract" (type TFOpLambda). Input 'y' of 'Sub' Op has type float32 that does not match type int64 of argument 'x'.

Originating in the following method:

def residuals(self, y_true, y_pred): return y_true - y_pred

I tried casting in the method, but it raised a further error. It started working once I simply converted the labels to floats before training.

Maybe we could add a check and a warning in order to avoid runtime exceptions.
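
A minimal sketch of what such a guard could look like inside the residuals method (the cast-and-warn behaviour is a suggestion, not existing library behaviour):

import tensorflow as tf

def residuals(self, y_true, y_pred):
    # Suggested guard: warn and cast integer labels to the prediction dtype
    # instead of letting tf.math.subtract raise a TypeError.
    if y_true.dtype != y_pred.dtype:
        tf.print('Warning: casting y_true from', y_true.dtype,
                 'to', y_pred.dtype, 'before computing residuals.')
        y_true = tf.cast(y_true, y_pred.dtype)
    return y_true - y_pred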

Issue with a model that returns the gradient of a sequence

Hi Fabio,

I am observing an issue with the LM optimizer.
The test case described in the attachment is about a model that returns the gradient of a sequence.
The model is defined as follows:

import tensorflow as tf
from tensorflow.keras.layers import Dense

class NeuralNet(tf.keras.Model):
    def __init__(self, dim):
        super().__init__()
        self.lay1 = Dense(dim[1], activation='softplus')
        self.lay2 = Dense(dim[2], activation='softplus')
        self.lay3 = Dense(dim[3])

    def model_charge(self, x_input, training=False):
        """ model of gate charge
        """
        x = self.lay1(x_input, training=training)
        x = self.lay2(x, training=training)
        x = self.lay3(x, training=training)
        return x

    def call(self, x_input, training=False):
        """ calc gradient from model_charge > dQ/dV = C
        """
        with tf.GradientTape(persistent=True) as tape:
            x_input = tf.convert_to_tensor(x_input)
            tape.watch(x_input)
            x = self.model_charge(x_input, training)
            o_x = tape.gradient(x, x_input)
        return o_x

This model is used to fit a capacitance curve (curve fitting) with a charge model (C = dQ/dV), so the gradient itself has to be minimized. Later, the weights/biases (with the layers expanded into equations) can be included in a bigger model (current + charge), for example written as a Verilog-A SPICE model file (which requires a charge model as input). The test case works with NADAM but not with LM, which is why I suspect a bug in LM. I admit that GradientTape is tricky and not a usual use case.

Python terminates with a ValueError without starting the optimization:

    File "...bug/levenberg_marquardt.py", line 400, in _compute_jacobian  *
        jacobians = [tf.reshape(j, (num_residuals, -1)) for j in jacobians]

    ValueError: Tried to convert 'tensor' to a tensor and failed. Error: None values not supported.

I would like to use LM because it finds a much better optimum than NADAM for this curve-fitting job; in general, the RMS error is at least two decades lower.
I have no clue how to solve this issue. Could you take a look?

Thank you very much for your support!
Raphael.

The test case is described in this file fit.py.txt.

Error when running the code test_curve_fitting.py

Hello, I would like to thank you for the code. I am trying to use the LM optimizer in my work, and I have encountered an error when I run the test_curve_fitting.py script. The error says the following:

AttributeError: 'ModelWrapper' object has no attribute '_validate_target_and_loss'

I am using TensorFlow 2.7.0. Could you please help me find the source of this error? It pops up after calling the fit method of the model_wrapper. Thank you very much.

Need help

How do I run predictions with the trained model?

Thank you.

Getting an error when trying to wrap a model with a tf.keras Normalization layer

Hello,
Thanks for building the capability to use LM algorithm with Tensorflow.
When I try to use my own dataset to build a shallow neural net with TensorFlow 2.9.0, for which I need to normalize the features using tf.keras.layers.Normalization(axis=-1), I get an error when wrapping the model as instructed for the LM algorithm. The error description is as follows:
All 'axis' values to be kept must have known shape. Got axis: (-1,), input shape: [None, None], with unknown axis at index: 1
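
For reference, a minimal sketch that avoids this error by giving the model a known input shape and adapting the Normalization layer on the data first (the data, layer sizes and optimizer settings below are illustrative assumptions, not from the issue):

import numpy as np
import tensorflow as tf
import levenberg_marquardt as lm

# Illustrative data with a known feature count (8 features).
x_train = np.random.rand(256, 8).astype(np.float32)
y_train = np.random.rand(256, 1).astype(np.float32)

norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(x_train)  # learn mean/variance from the training data

model = tf.keras.Sequential([
    tf.keras.Input(shape=(x_train.shape[1],)),  # known last dimension
    norm,
    tf.keras.layers.Dense(16, activation='tanh'),
    tf.keras.layers.Dense(1, activation='linear')])

model_wrapper = lm.ModelWrapper(model)
model_wrapper.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),
    loss=lm.MeanSquaredError())
model_wrapper.fit(x_train, y_train, batch_size=256, epochs=10)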

Many thanks for all your help in advance.

Best wishes,
Tanuj

Return value for ModelWrapper fit()

Calling fit() on a model typically returns an object that holds the training history, which can be plotted to visualize model performance. The ModelWrapper fit() method has no return value, though.

    def fit(self,
            x=None,
            y=None,
            batch_size=None,
            epochs=1,
            verbose=1,
            callbacks=None,
            **kwargs):
        if verbose > 0:
            if callbacks is None:
                callbacks = []

            callbacks.append(tf.keras.callbacks.ProgbarLogger(
                count_mode='steps',
                stateful_metrics=["damping_factor", "attempts"]))
        return super(ModelWrapper, self).fit( # return inserted here
            x=x,
            y=y,
            batch_size=batch_size,
            epochs=epochs,
            verbose=verbose,
            callbacks=callbacks,
            **kwargs)

Returning the call to the fit() method of super (as above) seems to solve this problem.
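
With that change in place, the returned History object can be used as with a standard Keras model, e.g. (a sketch, assuming model_wrapper and the training data are already defined):

import matplotlib.pyplot as plt

history = model_wrapper.fit(x_train, y_train, epochs=100)

plt.plot(history.history['loss'], label='training loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()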

Random results

Hello, I get a different result every time I fit the model with my data. Is there a way to specify a seed so that I get reproducible results, or does this algorithm just work differently?
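
Not an answer from the author, but the variation usually comes from random weight initialization and dataset shuffling; a sketch of fixing the seeds before building and fitting the model:

import random

import numpy as np
import tensorflow as tf

seed = 1234
random.seed(seed)         # Python-level randomness
np.random.seed(seed)      # NumPy-based shuffling / splits
tf.random.set_seed(seed)  # TensorFlow weight initialization and shuffling

Note that exact reproducibility may still depend on hardware and on non-deterministic GPU kernels.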

Hyperparameter tuning to avoid overfitting

Hello there! I hope you are doing well. I created an issue earlier about getting a shape error while trying to fit the Boston housing dataset; that problem was solved easily with your help. After fitting the data, however, the model overfits badly. I tried a dropout layer and a different kernel initializer but could not solve the problem. As this algorithm is part of my research project, what procedure should I follow to tune the hyperparameters so that the algorithm gives a well-generalized model for any regression dataset? As you are the only developer of this algorithm, your help is appreciated. Hope to get a response. Thank you.

Applying Levenberg-Marquardt to physically informed neural networks (PINNs)

We have been using your Levenberg-Marquardt code successfully for anomaly detection in time series data and it works well. Thank you for making this available! See https://arxiv.org/abs/2111.06060

I am now interested in applying your Levenberg-Marquardt code to PINNs, as it should provide better/faster results than the combination of Adam and BFGS. PINNs typically use a relatively small number of parameters. I have tried lm_wrapper on the PINN model https://github.com/okada39/pinn_burgers, but it generates an error because the PINN model is not a simple Keras model. It looks like the Levenberg-Marquardt optimiser needs to be connected in a different way, and I am hoping you can help:

Traceback (most recent call last):
  File "/home/599/jt9268/PINN_project/pinn_burgers/main.py", line 64, in <module>
    model_wrap = lm.ModelWrapper(pinn)
  File "/home/599/jt9268/PINN_project/pinn_burgers/levenberg_marquardt.py", line 673, in __init__
    super(ModelWrapper, self).__init__([model])
  File "/scratch/ue12/jt9268/tflow/lib/python3.10/site-packages/tensorflow/python/training/tracking/base.py", line 629, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/scratch/ue12/jt9268/tflow/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/scratch/ue12/jt9268/tflow/lib/python3.10/site-packages/keras/engine/functional.py", line 598, in _run_internal_graph
    assert x_id in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Exception encountered when calling layer "model_1" (type Functional).

Could not compute output KerasTensor(type_spec=TensorSpec(shape=(None, 1), dtype=tf.float32, name=None), name='model/dense_3/BiasAdd:0', description="created by layer 'model'")

Loss function returns 0 after first epoch for training set only when using validation data in training

Hello and thank you for releasing your code.

I'm having an issue where, if I introduce a validation set into the training process, the training loss value after the first epoch is replaced by 0 in the output as well as in the History object. However, the validation loss is calculated correctly and keeps going down, indicating that training is progressing despite the loss being reported as 0.

Example console output:

Epoch 1/760
8/8 [==============================] - 5s 187ms/step - damping_factor: 1.0000e-10 - attempts: 1.0000 - loss: 2.8319e-06 - r_square: -130.5820 - val_loss: 2.5453e-06 - val_r_square: -106.8921

Epoch 2/760
8/8 [==============================] - 1s 112ms/step - damping_factor: 1.0000e-10 - attempts: 1.0000 - loss: 0.0000e+00 - r_square: -106.5525 - val_loss: 2.0733e-06 - val_r_square: -86.5898

Epoch 3/760
8/8 [==============================] - 1s 134ms/step - damping_factor: 1.0000e-10 - attempts: 1.0000 - loss: 0.0000e+00 - r_square: -86.2367 - val_loss: 1.6919e-06 - val_r_square: -71.4224

Example code

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow_addons.metrics import RSquare
import levenberg_marquardt as lm

# X_train - DataFrame object
# y_train - DataFrame object
model = Sequential()
model.add(Dense(34, input_shape=(48, ), activation='tanh'))
model.add(Dense(24, activation='linear'))
model_wrapper = lm.ModelWrapper(model)
model_wrapper.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss=lm.MeanSquaredError(),
    metrics=RSquare())
early_stop = EarlyStopping(monitor='val_loss', patience=15)
history = model_wrapper.fit(X_train, y_train, epochs=760,
    validation_split=0.15,     # removing this parameter fixes the loss display
    callbacks=[early_stop])

TypeError when trying to train model

Hi.

First of all, thank you very much for the effort in developing this version of LM for Keras, which is the only implementation as far as I know. The main issue that I am facing is related to the following error.

TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.

This error arises when I try to execute the training of the model as follows:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping
import levenberg_marquardt as lm

def create_model_lm(neurons1=1, neurons2=0, n_features=None):
    # create model
    model = Sequential()
    model.add(Dense(neurons1, input_dim=n_features, kernel_initializer='uniform', activation='tanh'))
    model.add(Dropout(0.2))
    if neurons2 != 0:
        model.add(Dense(neurons2, kernel_initializer='uniform', activation='tanh'))
        model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    # Compile model
    model_wrapper = lm.ModelWrapper(model)
    model_wrapper.compile(
        optimizer=SGD(learning_rate=0.1),
        loss=lm.BinaryCrossentropy(from_logits=True),
        metrics=['accuracy'])

    return model_wrapper

model = create_model_lm(params['neurons1'], params['neurons2'], params['npcs'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=0, patience=20, restore_best_weights=True)

history = model.fit(train_data, train_target, batch_size=batch,
                        validation_data=(test_data, test_target),
                        shuffle=True, epochs=400, verbose=0,
                        callbacks=[es])

Can you give me a little help with any ideas for this specific error? I would be very grateful for any insight.

Damping method and matrix solver

Hi Fabio,

  1. Damping method

Based on two interesting documents:
[1] http://www2.imm.dtu.dk/pubdb/edoc/imm3215.pdf
[2] https://people.duke.edu/~hpgavin/ce281/lm.pdf

These propose that the LM damping term be normalized by max(diag(JJT)), which may provide a better and more stable minimization.

The current code in DampingAlgorithm is:

            damping = tf.eye(tf.shape(JJ)[0], dtype=JJ.dtype)

        damping = tf.scalar_mul(damping_factor, damping)
        return tf.add(JJ, damping)

Following the documents, I propose this modification:

            max_diag = tf.math.reduce_max(tf.linalg.diag_part(JJ))
            damping = max_diag * tf.eye(tf.shape(JJ)[0], dtype=JJ.dtype)

        damping = tf.scalar_mul(damping_factor, damping)
        return tf.add(JJ, damping)

With starting_value=1e-6 and tf.random.set_seed(1234), I get these results on your provided example.

[Attached figure: Figure_3]

We can clearly observe that the minimization is more stable; indeed, at the same number of steps, the loss is a decade smaller. Of course, more tests may be required to see whether this factor helps in all cases. Also, considering the value of the loss, I would suggest setting tf.keras.backend.set_floatx('float64') for the curve fitting.

In addition, document [1] (p. 8) shows that a benefit can be observed when the damping is monitored by a step-length criterion; in particular, the Nielsen strategy may help the general stability of the optimizer.

  2. Cholesky method

The Cholesky method is probably an efficient numerical solution for a square linear system. Unfortunately, by definition, Cholesky requires a positive definite matrix (due to the square root). However, there is a simple way to meet this requirement with LM, described in Appendix A of document [1].

For instance, this code, written with NumPy (not TensorFlow), shows that we can get a positive definite matrix simply by increasing the LM damping parameter:

        # LinAlgError is raised by numpy.linalg when the matrix is not
        # positive definite.
        while True:
            try:
                # if not positive definite, we increase the damping mu
                L = np.linalg.cholesky(self.A + self.mu * I)
                y = np.linalg.lstsq(L, self.g.T[0], rcond=None)[0]
                self.dX = np.linalg.lstsq(L.T, y, rcond=None)[0]
            except np.linalg.LinAlgError:
                self.mu = 10 * self.mu
            else:
                break

Of course, this piece of code needs to be converted to TensorFlow operations (see the sketch below).
There are certain advantages to using the Cholesky method: we keep the convergence of the optimizer (as with QR), and the speed can be improved noticeably (faster by a factor of ~3).
Moreover, when the matrix is not well formed (near-singular or rank-deficient), the main consequence is that the resulting LM step will not be very good; increasing the damping provides a better step. So it may also help to improve the stability of the optimizer.
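
A minimal TensorFlow sketch of the same retry-with-increased-damping idea (eager mode only; the function name and the error handling are illustrative assumptions, not part of the library):

import tensorflow as tf

def damped_cholesky_solve(JJ, g, damping_factor, max_attempts=10):
    # Solve (JJ + mu*I) dx = g with Cholesky, increasing mu until the damped
    # matrix is positive definite (Appendix A of [1]).
    # g is expected as a column vector of shape [n, 1].
    identity = tf.eye(tf.shape(JJ)[0], dtype=JJ.dtype)
    mu = damping_factor
    for _ in range(max_attempts):
        try:
            # On CPU, tf.linalg.cholesky raises InvalidArgumentError for a
            # non positive definite matrix; on GPU it may return NaNs instead,
            # which would need a separate check.
            chol = tf.linalg.cholesky(JJ + mu * identity)
            return tf.linalg.cholesky_solve(chol, g), mu
        except tf.errors.InvalidArgumentError:
            mu = 10.0 * mu  # damp harder and retry
    raise ValueError('No positive definite damped matrix found.')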

What do you think about this?

** I apologize for putting two different topics in the same issue thread. If you like, I can copy-paste into another thread.

Thanks for your support,
Raphael.

Input matrix is not invertible

Hi,

I am trying to build a regression network using the Levenberg-Marquardt optimizer. I tried using your package, but I get the error "Input matrix is not invertible". Could you please suggest what I can do to resolve this error?

Thanks,
Lipi

Combine fireTS library for NARX network with Levenberg Marquardt

Hi.
I want to create a NARX (Nonlinear Autoregressive with exogenous variables) model based on the LM (Levenberg-Marquardt) method.

Since these two methods are not implemented in Keras, I searched for the fireTS library https://pypi.org/project/fireTS/ (for NARX) and your implementation of LM, and I'm trying to combine them. This is the code:

import tensorflow as tf
import numpy as np
import levenberg_marquardt as lm
from fireTS.models import NARX

input_size = 20000
batch_size = 1000

x_train = np.linspace(-1, 1, input_size, dtype=np.float64)
y_train = np.sinc(10 * x_train)

x_train = tf.expand_dims(tf.cast(x_train, tf.float32), axis=-1)
y_train = tf.expand_dims(tf.cast(y_train, tf.float32), axis=-1)

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(input_size)
train_dataset = train_dataset.batch(batch_size).cache()
train_dataset = train_dataset.prefetch(tf.data.experimental.AUTOTUNE)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='tanh', input_shape=(1,)),
    tf.keras.layers.Dense(1, activation='linear')])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss=tf.keras.losses.MeanSquaredError())

model_wrapper = lm.ModelWrapper(model)

model_wrapper.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),
    loss=lm.MeanSquaredError())

mdl1 = NARX(
    model_wrapper,
    auto_order=2,
    exog_order=[2, 2],
    exog_delay=[1, 1])

mdl1.fit(train_dataset, epoch=10)
ypred1 = mdl1.predict(x=x_test, y=y_test)

ypred1

And I'm having this error:

AttributeError                            Traceback (most recent call last)

in <module>()
     37     auto_order=2,
     38     exog_order=[2, 2],
---> 39     exog_delay=[1, 1])
     40
     41 mdl1.fit(train_dataset, epoch=10)

2 frames

/usr/local/lib/python3.7/dist-packages/fireTS/core.py in __init__(self, base_estimator, **base_params)
     14
     15     def __init__(self, base_estimator, **base_params):
---> 16         self.base_estimator = base_estimator.set_params(**base_params)
     17
     18     def set_params(self, **params):

AttributeError: 'ModelWrapper' object has no attribute 'set_params'

Any solution?
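
Not a fix from the library author, but fireTS expects a scikit-learn-style estimator (hence the missing set_params). A hedged sketch of a thin adapter exposing set_params/get_params/fit/predict (all names and the epochs/batch handling are illustrative assumptions; fireTS also expects plain arrays rather than a tf.data.Dataset):

import numpy as np

class SKLearnStyleWrapper:
    # Thin scikit-learn-style adapter around lm.ModelWrapper (illustrative).

    def __init__(self, model_wrapper, epochs=10, batch_size=1000):
        self.model_wrapper = model_wrapper
        self.epochs = epochs
        self.batch_size = batch_size

    def get_params(self, deep=True):
        return {'model_wrapper': self.model_wrapper,
                'epochs': self.epochs,
                'batch_size': self.batch_size}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y, **kwargs):
        # fireTS passes plain numpy arrays, not a tf.data.Dataset.
        self.model_wrapper.fit(np.asarray(X, dtype=np.float32),
                               np.asarray(y, dtype=np.float32),
                               epochs=self.epochs,
                               batch_size=self.batch_size,
                               **kwargs)
        return self

    def predict(self, X):
        return self.model_wrapper.predict(
            np.asarray(X, dtype=np.float32)).ravel()

With such an adapter, NARX(SKLearnStyleWrapper(model_wrapper), auto_order=2, exog_order=[2, 2], exog_delay=[1, 1]) could be fitted on NumPy arrays (x_train, y_train) instead of the tf.data pipeline above.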

Applying the LM optimizer for PINNs

Hi, I would like to apply the LM optimizer to the training of physics-informed neural networks (PINNs). I have to write a custom loss function, so I followed an example mentioned in another issue and customized it to my problem; here I am trying to solve a very simple ODE. As far as I understood, I have to create a custom loss class and then call the custom loss function in the call method. Here is my code:

import sys
import os.path
import tensorflow as tf
import math as m
import numpy as np
import time
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Layer, Activation, Dense, BatchNormalization
from tensorflow.keras.optimizers import Adam
import levenberg_marquardt as lm
import matplotlib.pyplot as plt


pi = tf.constant(m.pi)

inputs = tf.keras.Input(shape=(1,),dtype='float32')
layer1 = Dense(units=50, activation='tanh',dtype='float32')(inputs)
layer2 = Dense(units=50, activation='tanh',dtype='float32')(layer1)
predictions = Dense(units=1, activation='linear',dtype='float32')(layer2)
model = tf.keras.Model(inputs=inputs, outputs=predictions)
model.summary()

t0 = tf.Variable([[0.0]], shape=[1,1])



# Define custom loss
def custom_loss():

    # Write here the loss function
    t = tf.random.uniform(shape=[1000,1],maxval = 1.5)
    uIC = model_wrapper.predict(t0)
    with tf.GradientTape() as tape1:
        u = model_wrapper.predict(t)
    u_t = tape1.gradient(u, t)[:,0]
    
    tf.print("\n IC Loss is : ", tf.reduce_mean(tf.square(uIC - 1)), output_stream=sys.stdout)
    tf.print("\n ODE Loss is : ", tf.reduce_mean(tf.square(u_t-4*pi*tf.math.cos(2*pi*t))), output_stream=sys.stdout)
    
    loss = tf.reduce_mean(tf.square(uIC - 1)) + \
    10.0 * tf.reduce_mean(tf.square(u_t-4*pi*tf.math.cos(2*pi*t)))

    # Return a function
    return loss


class CustomLoss(tf.keras.losses.Loss):
    def __init__(self,
                 reduction=tf.keras.losses.Reduction.AUTO,
                 name='custom_loss'):
        super(CustomLoss, self).__init__(
            reduction=reduction,
            name=name)

    def call(self, y_true, y_pred):
        
        loss = custom_loss()

        return loss

    def residuals(self, y_true, y_pred):

        loss = custom_loss()
        eps = tf.keras.backend.epsilon()
        residuals = tf.math.sqrt(eps+loss)
        
        return residuals
        # You have to write here the code according to how your custom loss is
        # defined, so that: loss = mean(residuals^2)

input_T = tf.linspace([0.0], [1.0], 1000) # We define a dummy input in order to use the fit method
output_T = tf.linspace([0.0], [1.0], 1000)  # We define a dummy output

model.summary()

# model.compile(
#     optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
#     loss=custom_loss())

model_wrapper = lm.ModelWrapper(
    tf.keras.models.clone_model(model))

model_wrapper.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),
    loss=CustomLoss())

model_wrapper.fit(input_T,output_T, epochs=100)

tTest = tf.Variable(tf.linspace([0.0], [1.5], 1000))
tInit = tf.Variable([[0.0]])


with tf.GradientTape(persistent=True) as tape1:
    uIC = model_wrapper.predict(tInit)
u = model_wrapper.predict(tTest)

u_t = tape1.gradient(u, tTest)

As you can see, the loss function contains two terms. I get the following error:

uIC = model_wrapper.predict(t0)
RuntimeError: Method requires being in cross-replica context, use get_replica_context().merge_call()
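
For what it's worth, a hedged sketch (an assumption about the cause, not the library's documented behaviour): model_wrapper.predict() runs a full, non-differentiable prediction loop and cannot be called from inside a loss during fit(); the usual PINN pattern is to call the model directly so the residuals stay differentiable:

import tensorflow as tf

def ode_residual_terms(model, t0, t, pi):
    # Illustrative helper: compute the IC and ODE residuals by calling the
    # underlying Keras model directly instead of model_wrapper.predict().
    with tf.GradientTape() as tape:
        tape.watch(t)
        u = model(t, training=True)
    u_t = tape.gradient(u, t)                             # du/dt
    u_ic = model(t0, training=True)                       # prediction at t = 0
    ic_res = u_ic - 1.0                                   # initial-condition residual
    ode_res = u_t - 4.0 * pi * tf.math.cos(2.0 * pi * t)  # ODE residual
    return ic_res, ode_res

The loss would then be, e.g., tf.reduce_mean(tf.square(ic_res)) + 10.0 * tf.reduce_mean(tf.square(ode_res)), matching the custom_loss above.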

Retracing warning on latest tensorflow version

Hello, thank you for releasing your code for public use. When I run model.fit() I encounter a retracing warning. Per this Stack Overflow link, I simply added "i = tf.cast(step, tf.int64)" after line 437 in levenberg_marquardt.py and the warning went away. I considered making a PR, but this is a relatively small change.
