keiserlab / keras-neural-graph-fingerprint Goto Github PK

Keras implementation of Neural Graph Fingerprints as proposed by Duvenaud et al., 2015

License: MIT License

Python 100.00%

tensor molecule fingerprint atom graph graph-algorithms keras neural-networks gpu differentiable-models morgan-fingerprints

keras-neural-graph-fingerprint's Introduction

Keras Neural Graph Fingerprint

This repository is an implementation of Convolutional Networks on Graphs for Learning Molecular Fingerprints in Keras.

It includes a preprocessing function to convert molecules in smiles representation into molecule tensors.

Next to this, it includes two custom layers for Neural Graphs in Keras, allowing flexible Keras fingerprint models. See examples.py for an examples

Related work

There are several implementations of this paper publicly available:

by HIPS using autograd
by debbiemarkslab using theano
by [GUR9000] 3 using keras
by ericmjl using autograd
by DeepChem using tensorflow

The closest implementation is the implementation by GUR9000 in Keras. However this repository represents moleculs in a fundamentally different way. The consequences are described in the sections below.

Molecule Representation

Atom, bond and edge tensors

This codebase uses tensor matrices to represent molecules. Each molecule is described by a combination of the following three tensors:

atom matrix, size: (max_atoms, num_atom_features) This matrix defines the atom features.

Each column in the atom matrix represents the feature vector for the atom at the index of that column.
edge matrix, size: (max_atoms, max_degree) This matrix defines the connectivity between atoms.

Each column in the edge matrix represent the neighbours of an atom. The neighbours are encoded by an integer representing the index of their feature vector in the atom matrix.

As atoms can have a variable number of neighbours, not all rows will have a neighbour index defined. These entries are filled with the masking value of -1. (This explicit edge matrix masking value is important for the layers to work)
bond tensor size: (max_atoms, max_degree, num_bond_features) This matrix defines the atom features.

The first two dimensions of this tensor represent the bonds defined in the edge tensor. The column in the bond tensor at the position of the bond index in the edge tensor defines the features of that bond.

Bonds that are unused are masked with 0 vectors.

Batch representations

This codes deals with molecules in batches. An extra dimension is added to all of the three tensors at the first index. Their respective sizes become:

atom matrix, size: (num_molecules, max_atoms, num_atom_features)
edge matrix, size: (num_molecules, max_atoms, max_degree)
bond tensor size: (num_molecules, max_atoms, max_degree, num_bond_features)

As molecules have different numbers of atoms, max_atoms needs to be defined for the entire dataset. Unused atom columns are masked by 0 vectors.

Strong and weak points

The obvious downside of this representation is that there is a lot of masking, resulting in a waste of computation power.

The alternative is to represent the entire dataset as a bag of atoms as in the authors original implementation. For larger datasets, this is infeasable. In [GUR9000's implementation] (https://github.com/GUR9000/KerasNeuralFingerprint) the same approach is used, but each batch is pre-calculated as a bag of atoms. The downside of this is that each epoch uses the exact same composition of batches, decreasing the stochasticity. Furthermore, Keras recognises the variability in batch- size and will not run. In his implementation GUR9000 included a modified version of Keras to correct for this.

The tensor representation used in this repository does not have these downsides, and allows for many modificiations of Duvenauds algorithm (there is a lot to explore).

Their representation may be optimised for the regular algorithm, but at a first glance, the tensor implementation seems to perform reasonably fast (check out the examples).

NeuralGraph layers

The two workhorses are defined in NGF/layers.py.

NeuralGraphHidden takes a set of molecules (represented by [atoms, bonds, edges]), and returns the convolved feature vectors of the higher layers. Only the feature vectors change at each iteration, so for higher layers only the atom tensor needs to be replaced by the convolved output of the previous NeuralGraphHidden.

NeuralGraphOutput takes a set of molecules (represented by [atoms, bonds, edges]), and returns the fingerprint output for that layer. According to the original paper, the fingerprints of all layers need to be summed. But these are neural nets, so feel free to play around with the architectures!

Initialisation

The NeuralGraph layers have an internal (Dense) layer of the output size (conv_width for NeuralGraphHidden or fp_length for NeuralGraphOutput). This inner layer accounts for the trainable parameters, activation function, etc.

There are three ways to initialise the inner layer and it's parameters:

Using an integer conv_width and possible kwags (Dense layer is used)

atoms1 = NeuralGraphHidden(conv_width, activation='relu', bias=False)([atoms0, bonds, edges])

Using an initialised Dense layer

atoms1 = NeuralGraphHidden(Dense(conv_width, activation='relu', bias=False))([atoms0, bonds, edges])

Using a function that returns an initialised Dense layer

atoms1 = NeuralGraphHidden(lambda: Dense(conv_width, activation='relu', bias=False))([atoms0, bonds, edges])

In the case of NeuralGraphOutput, all these three methods would be identical. For NeuralGraphHidden, these methods are equal, but can be slightly different. The reason is that a NeuralGraphHidden has a dense layer for each degree.

The following will not work for NeuralGraphHidden:

atoms1 = NeuralGraphHidden(conv_width, activation='relu', bias=False, W_regularizer=l2(0.01))([atoms0, bonds, edges])

The reason is that the same l2 object will be passed to each internal layer, wheras an l2 object can obly be assigned to one layer.

Method 2. will work, because a new layer is instanciated based on the configuration of the passed layer.

Method 3. will work if a function is provided that returns a new l2 object each time it is called (as would be the case for the given lambda function).

NeuralGraph models

For convienience, two builder functions are included that can build a variety of Neural Graph models by specifiying it's parameters. See NGF/models.py.

The examples in examples.py should help you along the way. NGF You can store and load the trained models. Make sure to specify the custom classes:

model = load_model('model.h5', custom_objects={'NeuralGraphHidden':NeuralGraphHidden, 'NeuralGraphOutput':NeuralGraphOutput})

Dependencies

RDKit This dependency is nescecairy to convert molecules into tensor representatins, once this step is conducted, the new data can be stored, and RDkit is no longer a dependency.
Keras Requires keras 1.x for building, training and evaluating the models.
NumPy

Acknowledgements

Implementation is based on Duvenaud et al., 2015.
Feature extraction scripts were copied from the original implementation
Data preprocessing scripts were copied from GRU2000
The usage of the Keras functional API was inspired by GRU2000
Graphpool layer adopted from Han, et al., 2016

keras-neural-graph-fingerprint's People

Contributors

Stargazers

Watchers

keras-neural-graph-fingerprint's Issues

Implement Layers as layer wrapper

Both NGF_Layers are essentially layer wrappers. It could be more natural to implement them as such, especially considering the way passing arguments to the dense layers is handled.

However, I don't think layer wrappers support multiple inner layers (as is the case for NeuralGraphHidden so for that layer it wouldn't work.

For now no need to rewrite the implementation, but this might be useful later on.

Bond representation

In the neural graph fingerprint paper, and implementation, the summed features for a node are a concatenation of the summed atom features of the neighbours, and the summed bond features of the respective bonds.

In Duvenauds implementation, the summed bond features are stored in a separate matrix, in order to speed up computation.

This is memory efficient, but from the graph perspective, inadequate.

There are only a few bond types, and I want to experiment with bond-type-dependent weights (rather than degree-dependent, as is currently suggested). In order to play around this this, the NeuralGraphLayers need to be able to access the bond-type information per bond (and the summed information is no longer sufficient).

There are two obvious ways to represent bond-type infomation in the current framework:

Bond-type neighbour matrix There are only a few discrete bond types, so a possible solution would be an atom x atom matrix for each molecule, with integer values that represent the bond type (0 for no bond). A downside of this is that it will be harder to extend the bond features (e.g. include distance).
Bond-feature neighbour tensor An extension of the bond-type neighbour matrix could be a atom x atom x feature tensor, that stores the bond features for each bond.

A downside of both approaches is that it is subobtimal for the regular Duvenaud algorihtm. For each hidden layer and batch, the neighbouring atom features will have to be summed. This computation is exactely the same for each layer. On the other side, perhaps theano otimisation will figure this out (?).

Both representations will also require more memory, especially the Bond-feature neighbour tensor. In these graphs the edges are undirected, so the bond feature information will take up two times as much space as required.
I will have to run benchmarks to see what is acceptable.

Implement sparse iterator

Benchmarking on bigger datasets shows that the 0-padding does have a big impact on performance.

Histograms of #bonds and #atoms per molecule confirm that max-padding can create a lot of overhead:

The overhead occurs at three levels:

data-preprocessing
data-storage and loading
in the network

Because preprocessing only has to be done once, I won't focus on its performance.

Sparse matrices will definitely speed up the data-storage and loading process.

The 3rd level (training the network) is the most important as it is accounts for the most runtime. Because of the many matrix operations that take place at the GPU, the tensors should be in full format when on the GPU.

However, sparse matrices can still lead to increased performance. The goal is to pad the tensors per batch. So for each batch and each dimension, the molecule with the max at that axis will determine the size of that dimension for that batch.

Given the histograms, this will already results in significant speedup. An even bigger increase can be achieved when the tensors are sorted on their length dimension, and grouped in batches of equal size (with the possible risk of losing stochasticity).

Luckily, keras.model.Model's provide _generator versions of the fit, train and predict functions that will take a generator and allow for different batch-sizes.

I may have to adjust the layers slightly as well, to support various #atoms.

As for the implementation, unfortunately scipy sparse matrices only support 2D matrices.

This means that I will have to implement my own class. Because of the listed prioritisation, I will focus on quick retrieval, and possibility to sort on the length of the dimension. I will do so, even if at the cost of slower preprocessing.

Add support for multiple internal layers.

Currently the NeuralGraphHidden and NeuralGraphOutput rely on an internal layer that can be specified by providing the class dense_layer_type at initialisation.

Internally, a TimeDistributed() is wrapped around this class. In order to apply batch_normalisation or dropout, it needs to be possible to provide more than one layer.

Dropout, for example will need to be called just before the Dense and redirect the input one to one.

The thus needs to be support for multiple layers, like dropout or batchnorm.

The easiest solution I can see now is providing a list of layers (initialised, but not build) that get called subsequently. An adavantage is this that the NeuralGraph-layers do not have to deal with the kwargs of the internal layers.

For the NeuralGraphOutput this would definitely work, but for the NeuralGraphHidden we would somehow have to copy the layers for each timestep (as they are already initialised).

Use SparseTensors in preprocessing

The SparseTensors implemented in NGF/sparse.py are very well suited for the molecule data structures/distributions used in these repo. An advantage of the SparseTensors tensors is that they do not require padding.

It would be easy to implement an append or an extend method to merge the tensors. This would significantly speed up the preprocessing for larger datasets, and obsolete the padaxis and concat_mol_tensors functions.

Improve neighbour lookup performance

Make sure NGF.layers.batch_lookup works faster and also works in TensorFlow by using reshape as in
the TimeDistributed layer

Make compatible with tensorflow backend

The only theano dependency is due to the
modification of temporal_padding in NGF/layers.py

See if the desired modification is supported in keras 2 (after fixing #10) and otherwise implement it in Tensorflow in parallel.

Make compatible with keras 2

As @8li pointed out, the current code is incompatible with keras 2 due to the layer_from_config in NGF/layers.py.

Investigate what other issues arise when using keras 2 and see if we can fix them in a backwards-compatible manner.

Make compatible with Keras 2

There are quite a few issues when using your code with Keras 2. For example, the customised Keras layers does not work and gives the following issue:
File "C:\Users\User\Desktop\Python\SNN_cont_only\own_package\NGF\layers.py", line 397, in init
super(NeuralGraphOutput, self).init(**kwargs)
File "C:\Program Files (x86)\Python36-64\lib\site-packages\keras\engine\base_layer.py", line 128, in init
raise TypeError('Keyword argument not understood:', kwarg)
TypeError: ('Keyword argument not understood:', 'activation')