
graph-machine-learning's Introduction

Graph Machine Learning

This is the code repository for Graph Machine Learning, published by Packt.

Take graph data to the next level by applying machine learning techniques and algorithms

What is this book about?

Graph Machine Learning provides a new set of tools for processing network data and leveraging the power of the relations between entities, which can be used for predictive modeling and analytics tasks.

This book covers the following exciting features:

  • Write Python scripts to extract features from graphs
  • Distinguish between the main graph representation learning techniques
  • Become well-versed with extracting data from social networks, financial transaction systems, and more
  • Implement the main unsupervised and supervised graph embedding techniques
  • Get to grips with shallow embedding methods, graph neural networks, graph regularization methods, and more

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Errata

Page 16

The expression nt.to_numpy_matrix(G) should be nx.to_numpy_matrix(G)
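The corrected call can be sketched as follows. Note that this is a minimal illustration, not code from the book: `nx.to_numpy_matrix` exists in the networkx 2.x versions the book pins, but it was removed in networkx 3.0, where `nx.to_numpy_array` is the replacement.

```python
import networkx as nx

# Build a small graph and extract its adjacency matrix.
# The book's networkx 2.x uses nx.to_numpy_matrix(G); in networkx >= 3.0
# that function was removed and nx.to_numpy_array is the replacement.
G = nx.barbell_graph(m1=3, m2=0)
A = nx.to_numpy_array(G)
print(A.shape)  # one row and one column per node
```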

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

from stellargraph.mapper import HinSAGENodeGenerator
batch_size = 50
num_samples = [10, 5]
generator = HinSAGENodeGenerator(
    subgraph, batch_size, num_samples,
    head_node_type="document"
)

Following is what you need for this book: This book is for data analysts, graph developers, graph analysts, and graph professionals who want to leverage the information embedded in the connections and relations between data points to boost their analysis and model performance. The book will also be useful for data scientists and machine learning developers who want to build ML-driven graph databases. A beginner-level understanding of graph databases and graph data is required. Intermediate-level working knowledge of Python programming and machine learning is also expected to make the most out of this book.

With the following software and hardware list you can run all code files present in the book (Chapters 1-10).

Software and Hardware List

Chapter   Software required                  OS required
1-10      Python                             Windows, Mac OS X, and Linux (any)
1-10      Neo4j                              Windows, Mac OS X, and Linux (any)
1-10      Gephi                              Windows, Mac OS X, and Linux (any)
1-10      Google Colab or Jupyter Notebook   Windows, Mac OS X, and Linux (any)

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Authors

Claudio Stamile received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2013 and, in September 2017, he received his joint Ph.D. from KU Leuven (Leuven, Belgium) and Université Claude Bernard Lyon 1 (Lyon, France). During his career, he has developed a solid background in artificial intelligence, graph theory, and machine learning, with a focus on the biomedical field. He is currently a senior data scientist at CGnal, a consulting firm fully committed to helping its top-tier clients implement data-driven strategies and build AI-powered solutions to promote efficiency and support new business models.

Aldo Marzullo received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2016. During his studies, he developed a solid background in several areas, including algorithm design, graph theory, and machine learning. In January 2020, he received his joint Ph.D. from the University of Calabria and Université Claude Bernard Lyon 1 (Lyon, France), with a thesis entitled Deep Learning and Graph Theory for Brain Connectivity Analysis in Multiple Sclerosis. He is currently a postdoctoral researcher at the University of Calabria and collaborates with several international institutions.

Enrico Deusebio is currently the chief operating officer at CGnal, a consulting firm that helps its top-tier clients implement data-driven strategies and build AI-powered solutions. He has been working with data and large-scale simulations using high-performance facilities and large-scale computing centers for over 10 years, both in an academic and industrial context. He has collaborated and worked with top-tier universities, such as the University of Cambridge, the University of Turin, and the Royal Institute of Technology (KTH) in Stockholm, where he obtained a Ph.D. in 2014. He also holds B.Sc. and M.Sc. degrees in aerospace engineering from Politecnico di Torino.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781800204492

graph-machine-learning's People

Contributors

amarzullo24, deusebio, packt-itservice, packtutkarshr, roshank10, sonam-packt


graph-machine-learning's Issues

GEM gf not found

I followed this code:

from gem.embedding.gf import GraphFactorization

G = nx.barbell_graph(m1=10, m2=4)
draw_graph(G)

gf = GraphFactorization(d=2, data_set=None, max_iter=10000, eta=1*10**-4, regu=1.0)
gf.learn_embedding(G)

and got this reply:

[Errno 13] Permission denied: 'gem/c_exe/gf'
./gf not found. Reverting to Python implementation. Please compile gf, place node2vec in the path and grant executable permission
Iter id: 0, Objective: 95.0047, f1: 95.001, f2: 0.00377086

It seems there's something I missed?
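That message is a warning rather than a fatal error: when the compiled gf binary is missing or not executable, GEM falls back to a pure-Python implementation, which is why the iteration log still appears. A hedged sketch of one possible fix, assuming a compiled binary actually exists under gem/c_exe/ in your site-packages (the helper below is my own, not part of GEM):

```python
import os
import stat

def make_executable(path):
    """Grant the executable bit on a file, as the GEM warning asks.

    Hypothetical helper: point it at gem/c_exe/gf inside your
    site-packages if the compiled binary exists for your platform.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

If no binary ships for your OS, the Python fallback still produces an embedding, just more slowly.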

Code cannot be run in Chapter 4?

Dear authors,

I like your book very much and it helped me get a lot of useful insights, but I have a problem when I do the implementation.

I ran the Graph Classification using GCNs code in Chapter04/04_Graph_Neural_Networks.ipynb, but when I try to train the model using the command

history = model.fit(
    train_gen, epochs=epochs, verbose=1, validation_data=test_gen, shuffle=True,
)

I got an error:
UnimplementedError: Cast string to float is not supported
[[node binary_crossentropy/Cast (defined at :2) ]] [Op:__inference_train_function_2594]

Function call stack:
train_function

It seems like the model used a string, but I'm not very good at Python and I'm not able to fix the bug. Could you please help me with this? Thank you very much!
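One hedged guess, without seeing the notebook state: binary_crossentropy needs numeric targets, and the error suggests the graph labels were left as strings. Mapping them to 0/1 floats before building the data generators usually resolves this error. The label values below are made up for illustration; substitute the dataset's actual classes:

```python
import numpy as np

# Hypothetical string labels; replace with the dataset's actual classes.
labels = np.array(["mutagenic", "nonmutagenic", "mutagenic"])

# Encode the positive class as 1.0 and everything else as 0.0 so that
# binary_crossentropy receives floats instead of strings.
targets = (labels == "mutagenic").astype("float32")
```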

Chapter 3 numpy issues with gensim

I had an issue with numpy and gensim for Chapter 3 (the first chapter whose examples I ran). This was the error, specifically:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject "gensim"

My solution was to use a conda environment. I set up an environment.yml file as below:

name: packt_graphml_env
channels:
   - conda-forge
   - defaults
dependencies:
   - python==3.8.*
   - pip
   - ipykernel
   - pandas
   - gensim==3.8.3
   - numpy==1.19.5
   - networkx==2.5.*
   - matplotlib==3.2.*
   - node2vec==0.3.*
   - karateclub==1.0.*
   - scipy==1.6.*
   - tensorflow==2.4.1
   - scikit-learn==0.24.*
   - stellargraph::stellargraph
   - pip:
     #- "--editable=git+https://github.com/stellargraph/stellargraph.git#egg=stellargraph"
     - "--editable=git+https://github.com/palash1992/GEM.git#egg=GEM"

and install/update with:
mamba env update -f environment.yml

Of course, replace mamba with conda if you don't have mamba. And I added ipykernel for a Jupyter kernel, which was already installed in my base conda environment.

Hope this helps someone.

NLTK reuters corpus is not loaded even after download

Hello,
I am trying to execute the examples from Graph-Machine-Learning/Chapter07/01_nlp_graph_creation.ipynb in Google Colab.

At cell 5, corpus = pd.DataFrame([..]), I am getting this error:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/nltk/corpus/util.py in __load(self)
     79             except LookupError as e:
---> 80                 try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
     81                 except LookupError: raise e

5 frames
LookupError: 
**********************************************************************
  Resource reuters not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('reuters')
  
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************


During handling of the above exception, another exception occurred:

LookupError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/nltk/data.py in find(resource_name, paths)
    671     sep = '*' * 70
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674 
    675 

LookupError: 
**********************************************************************
  Resource reuters not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('reuters')
  
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************

Even after following the instructions to run nltk.download('reuters'), I am still getting the same error. Reuters is downloaded in /root/:

~/nltk_data/corpora# ls
reuters.zip

Could you please help me?

Thanks,
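A hedged workaround that has worked for this LookupError: some environments fail to read the corpus straight out of reuters.zip, so extracting the archive in place makes the reuters/ directory visible on the searched paths. The helper below is my own, written with the standard library only, and is not part of NLTK:

```python
import os
import zipfile

def extract_corpus(corpora_dir, name):
    """Unzip <name>.zip inside an nltk_data/corpora directory so the
    corpus folder exists alongside the archive."""
    archive = os.path.join(corpora_dir, name + ".zip")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(corpora_dir)
    return os.path.join(corpora_dir, name)

# e.g. extract_corpus(os.path.expanduser("~/nltk_data/corpora"), "reuters")
```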

Chapter 3 Source Code Issue

I tried to run "01_Shallow_Embeddings.ipynb", but unfortunately when I try to run this excerpt:
from gem.embedding.gf import GraphFactorization

G = nx.barbell_graph(m1=10, m2=4)
draw_graph(G)

gf = GraphFactorization(d=2, data_set=None, max_iter=10000, eta=1*10**-4, regu=1.0)
gf.learn_embedding(G)

I get the following error:
./gf not found. Reverting to Python implementation. Please compile gf, place node2vec in the path and grant executable permission
Iter id: 0, Objective: 95.0097, f1: 95.0035, f2: 0.00623775

The error is located specifically with this instruction: gf.learn_embedding(G)

gf (GraphFactorization) not found

Hello,

I am using conda. I created a new environment called "new_env" with python=3.7.16 (I had to use an older version of Python because stellargraph was having difficulties with Python > 3.8) by running:

conda create -n new_env python=3.7.16
conda activate new_env

then, in the terminal with new_env active, I typed:

pip install -r requirements.txt

where the requirements.txt file includes the following (from Chapter 3 in the book):

Jupyter==1.0.0
networkx==2.5
matplotlib==3.2.2
karateclub==1.0.19
node2vec==0.3.3
tensorflow==2.4.0
scikit-learn==0.24.0
git+https://github.com/palash1992/GEM.git
git+https://github.com/stellargraph/stellargraph.git

I have two questions.
Question 1: When I run the "01_Shallow_Embeddings.ipynb" notebook from Chapter 3, for the cell given below, I get the following error in the GraphFactorization section:

from gem.embedding.gf import GraphFactorization
G = nx.barbell_graph(m1=10, m2=4)
draw_graph(G)
gf = GraphFactorization(d=2, data_set=None, max_iter=10000, eta=1*10**-4, regu=1.0)
gf.learn_embedding(G)

Error: ./gf not found. Reverting to Python implementation. Please compile gf, place node2vec in the path and grant executable permission.

Do I go to site-packages in the active conda environment and compile/run a setup.py file in the GEM library? Or do I need to clone the GEM library (as well as stellargraph) separately, rather than putting them in the requirements.txt file? I thought it was the same procedure.

Question 2: I also get an error when I run the DeepWalk example in the same notebook.
The code:

import networkx as nx
from karateclub.node_embedding.neighbourhood.deepwalk import DeepWalk
G = nx.barbell_graph(m1=10, m2=4)
draw_graph(G)
dw = DeepWalk(dimensions=2)
dw.fit(G)

The error I get:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_22368\195704241.py in <module>
      6 
      7 dw = DeepWalk(dimensions=2)
----> 8 dw.fit(G)

~\anaconda3\envs\gml_book_ch3\lib\site-packages\karateclub\node_embedding\neighbourhood\deepwalk.py in fit(self, graph)
     57                          min_count=self.min_count,
     58                          workers=self.workers,
---> 59                          seed=self.seed)
     60 
     61         num_of_nodes = graph.number_of_nodes()

TypeError: __init__() got an unexpected keyword argument 'iter'

Did anyone have similar issues or know the solution?

Thanks!
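Both errors in this thread appear to trace back to gensim 4, which renamed several Word2Vec keyword arguments that karateclub 1.0.x still passes under their old names ('iter' became 'epochs', 'size' became 'vector_size'). Adding gensim==3.8.3 to the requirements avoids it. Purely as an illustration of what changed, the rename can be sketched as below; this wrapper is hypothetical and not part of karateclub or gensim:

```python
# Map the gensim 3.x Word2Vec keyword names to their gensim 4 equivalents.
# Illustration only: neither karateclub nor gensim ships this helper.
GENSIM4_RENAMES = {"iter": "epochs", "size": "vector_size"}

def translate_w2v_kwargs(kwargs):
    """Return kwargs with gensim 3.x names replaced by gensim 4 names."""
    return {GENSIM4_RENAMES.get(k, k): v for k, v in kwargs.items()}
```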

Chapter 2 - embeddings

import networkx as nx
from node2vec import Node2Vec

G = nx.barbell_graph(m1=7, m2=4)
draw_graph(G, nx.spring_layout(G))

node2vec = Node2Vec(G, dimensions=2)
model = node2vec.fit(window=10)

Computing transition probabilities: 100%
18/18 [00:00<00:00, 261.56it/s]
Generating walks (CPU: 1): 100%|██████████| 10/10 [00:00<00:00, 16.48it/s]

TypeError                                 Traceback (most recent call last)
in <module>()
      6
      7 node2vec = Node2Vec(G, dimensions=2)
----> 8 model = node2vec.fit(window=10)

/usr/local/lib/python3.7/dist-packages/node2vec/node2vec.py in fit(self, **skip_gram_params)
    186         skip_gram_params['sg'] = 1
    187
--> 188         return gensim.models.Word2Vec(self.walks, **skip_gram_params)

TypeError: __init__() got an unexpected keyword argument 'size'
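This TypeError comes from gensim 4, which renamed Word2Vec's 'size' argument to 'vector_size'; node2vec 0.3.x still passes the old name. A hedged fix, assuming the book's pinned environment, is to keep gensim below 4 in the requirements (or upgrade node2vec to a release that supports gensim 4):

```
gensim==3.8.3
node2vec==0.3.3
```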
