For certain graphs, the ordering of the output embeddings may not be consistent with the node labels. It seems that this can be fixed by adding the following near the bottom of the `transform_and_save_embedding` function:

    self.real_and_imaginary.index = self.index
    self.real_and_imaginary.index = self.real_and_imaginary.index.astype(self.settings.index_type)
    self.real_and_imaginary = self.real_and_imaginary.sort_index()

where `self.settings.index_type` is set by the user to a simple Python type (e.g. `int`, `str`, or `float`), and `self.index = G.nodes()` is set in the `__init__` function of the class.
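As a rough illustration of the reindexing described above, here is a minimal standalone sketch using plain pandas. The names `node_labels`, `index_type`, and the column names are hypothetical stand-ins for `self.index`, `self.settings.index_type`, and the real embedding columns; the data is made up.

```python
import pandas as pd

# Hypothetical stand-ins for self.index and self.settings.index_type.
node_labels = ["10", "2", "33", "1"]  # e.g. list(G.nodes()), read in as strings
index_type = int                       # the user-chosen Python type

# Made-up embedding rows, in the order the nodes were encountered.
real_and_imaginary = pd.DataFrame(
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    columns=["x_0", "x_1"],
)

# Attach the node labels, cast them to the chosen type, then sort by label.
real_and_imaginary.index = node_labels
real_and_imaginary.index = real_and_imaginary.index.astype(index_type)
real_and_imaginary = real_and_imaginary.sort_index()

print(real_and_imaginary)
```

The cast matters because string labels sort lexicographically ("1", "10", "2", "33"), whereas after converting to `int` the rows come out in numeric order.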
Additionally, adding the index column to the generated CSV file would allow embeddings to be linked to named, rather than numbered, nodes.
This could be done by changing the last line of the `transform_and_save_embedding` function to the following: `self.real_and_imaginary.to_csv(self.settings.output)`
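To show what that change does to the output, here is a small sketch comparing `to_csv` with and without the index. It assumes the existing call suppresses the index (which I have not checked against the current code); the node names, column names, and the `"id"` index name are made up for illustration.

```python
import io

import pandas as pd

# Made-up embedding indexed by node name.
embedding = pd.DataFrame(
    {"x_0": [0.1, 0.3], "x_1": [0.2, 0.4]},
    index=["alice", "bob"],
)
embedding.index.name = "id"  # hypothetical header for the index column

# Default behaviour: the index is written as the first column.
with_index = io.StringIO()
embedding.to_csv(with_index)

# With index=False the node names are dropped from the file.
without_index = io.StringIO()
embedding.to_csv(without_index, index=False)

print(with_index.getvalue())
print(without_index.getvalue())
```

With the index written out, each row can be matched back to its node by name instead of by row position.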
Note: this was tested using a slightly modified `read_graph()` function, as the CSV version produced a very different graph from the `nx.read_edgelist` version:

    import networkx as nx

    def read_graph(path):
        # Build the graph directly from a whitespace-separated edge list file.
        graph = nx.read_edgelist(path)
        return graph

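One detail that may be relevant to the index type: by default networkx's edge list readers keep node labels as strings unless `nodetype` is given. The sketch below uses `nx.parse_edgelist` on made-up in-memory lines rather than a file, purely to illustrate the difference; it is not taken from the project's code.

```python
import networkx as nx

# Made-up edge list lines, in the same format nx.read_edgelist expects in a file.
lines = ["1 2", "2 3"]

# Default: node labels stay as strings.
default_graph = nx.parse_edgelist(lines)

# With nodetype=int the labels are converted while parsing.
typed_graph = nx.parse_edgelist(lines, nodetype=int)

print(sorted(default_graph.nodes()))
print(sorted(typed_graph.nodes()))
```

`nx.read_edgelist` accepts the same `nodetype` argument, so it might be an alternative to casting the index afterwards, at least for graphs whose labels all share one type.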
Also, are there any plans to move towards the `nx.read_edgelist` implementation? It seems like a more elegant solution.
I don't currently have the time to create a pull request with a fully tested solution to these issues, but the changes suggested should solve most of it (except the user-defined index type).