I came up with this implementation of the gradients (I plan to create an autograd function for this function).
I am not sure how correct it is, since the gradient with respect to C1 should have the same dimensions as C1, which isn't the case here. @gpeyre, it would be a great help if you could point me to some equations for computing the gradients so that this loss function can be used.
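
In the meantime, a finite-difference comparison might help check both the shape and the values of a hand-derived gradient. This is only a minimal sketch, assuming NumPy: `numerical_gradient`, `toy_loss`, and `toy_grad` are hypothetical names, and the quadratic loss is a stand-in, not the actual loss discussed here. The real loss and its candidate gradient would be swapped in for the two toy functions.

```python
import numpy as np


def numerical_gradient(loss, C, eps=1e-6):
    """Central finite-difference gradient of a scalar loss w.r.t. matrix C."""
    C = np.asarray(C, dtype=float)
    grad = np.zeros_like(C)
    for idx in np.ndindex(C.shape):
        step = np.zeros_like(C)
        step[idx] = eps  # perturb a single entry of C
        grad[idx] = (loss(C + step) - loss(C - step)) / (2 * eps)
    return grad


def toy_loss(C):
    # Stand-in scalar loss (assumption): sum of squared entries.
    return float(np.sum(C ** 2))


def toy_grad(C):
    # Analytic gradient of the stand-in loss: same shape as C.
    return 2.0 * C


C1 = np.random.default_rng(0).random((4, 4))
num = numerical_gradient(toy_loss, C1)

# The gradient of a scalar loss w.r.t. C1 must have C1's shape.
assert num.shape == C1.shape
assert toy_grad(C1).shape == C1.shape
assert np.allclose(num, toy_grad(C1), atol=1e-5)
```

If the analytic gradient comes out with a different shape than `C1`, the first assertion on it fails immediately, which usually points to a missing sum or a transposed product in the derivation.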