
graph-machine-learning's Introduction

Graph Machine Learning

This is the code repository for Graph Machine Learning, published by Packt.

Take graph data to the next level by applying machine learning techniques and algorithms

What is this book about?

Graph Machine Learning provides a new set of tools for processing network data and leveraging the power of the relations between entities, which can be used for predictive modeling and analytics tasks.

This book covers the following exciting features:

  • Write Python scripts to extract features from graphs
  • Distinguish between the main graph representation learning techniques
  • Become well-versed with extracting data from social networks, financial transaction systems, and more
  • Implement the main unsupervised and supervised graph embedding techniques
  • Get to grips with shallow embedding methods, graph neural networks, graph regularization methods, and more

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Errata

Page 16

The expression nt.to_numpy_matrix(G) should be nx.to_numpy_matrix(G)
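The corrected call can be sketched as follows. Note that this is a minimal illustration, not code from the book: `nx.to_numpy_matrix` exists in the networkx 2.x versions the book pins, but it was removed in networkx 3.0, where `nx.to_numpy_array` is the replacement.

```python
import networkx as nx

# Build a small graph and extract its adjacency matrix.
# The book's networkx 2.x uses nx.to_numpy_matrix(G); in networkx >= 3.0
# that function was removed and nx.to_numpy_array is the replacement.
G = nx.barbell_graph(m1=3, m2=0)
A = nx.to_numpy_array(G)
print(A.shape)  # one row and one column per node
```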

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

from stellargraph.mapper import HinSAGENodeGenerator
batch_size = 50
num_samples = [10, 5]
generator = HinSAGENodeGenerator(
    subgraph, batch_size, num_samples,
    head_node_type="document"
)

Following is what you need for this book: This book is for data analysts, graph developers, graph analysts, and graph professionals who want to leverage the information embedded in the connections and relations between data points to boost their analysis and model performance. The book will also be useful for data scientists and machine learning developers who want to build ML-driven graph databases. A beginner-level understanding of graph databases and graph data is required. Intermediate-level working knowledge of Python programming and machine learning is also expected to make the most out of this book.

With the following software and hardware list you can run all code files present in the book (Chapters 1-10).

Software and Hardware List

Chapter   Software required                  OS required
1-10      Python                             Windows, Mac OS X, and Linux (any)
1-10      Neo4j                              Windows, Mac OS X, and Linux (any)
1-10      Gephi                              Windows, Mac OS X, and Linux (any)
1-10      Google Colab or Jupyter Notebook   Windows, Mac OS X, and Linux (any)

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Authors

Claudio Stamile received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2013 and, in September 2017, he received his joint Ph.D. from KU Leuven (Leuven, Belgium) and Université Claude Bernard Lyon 1 (Lyon, France). During his career, he has developed a solid background in artificial intelligence, graph theory, and machine learning, with a focus on the biomedical field. He is currently a senior data scientist at CGnal, a consulting firm fully committed to helping its top-tier clients implement data-driven strategies and build AI-powered solutions to promote efficiency and support new business models.

Aldo Marzullo received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2016. During his studies, he developed a solid background in several areas, including algorithm design, graph theory, and machine learning. In January 2020, he received his joint Ph.D. from the University of Calabria and Université Claude Bernard Lyon 1 (Lyon, France), with a thesis entitled Deep Learning and Graph Theory for Brain Connectivity Analysis in Multiple Sclerosis. He is currently a postdoctoral researcher at the University of Calabria and collaborates with several international institutions.

Enrico Deusebio is currently the chief operating officer at CGnal, a consulting firm that helps its top-tier clients implement data-driven strategies and build AI-powered solutions. He has been working with data and large-scale simulations using high-performance facilities and large-scale computing centers for over 10 years, both in an academic and industrial context. He has collaborated and worked with top-tier universities, such as the University of Cambridge, the University of Turin, and the Royal Institute of Technology (KTH) in Stockholm, where he obtained a Ph.D. in 2014. He also holds B.Sc. and M.Sc. degrees in aerospace engineering from Politecnico di Torino.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781800204492

graph-machine-learning's People

Contributors

amarzullo24, deusebio, packt-itservice, packtutkarshr, roshank10, sonam-packt


graph-machine-learning's Issues

GEM gf not found

I followed this code:

from gem.embedding.gf import GraphFactorization

G = nx.barbell_graph(m1=10, m2=4)
draw_graph(G)

gf = GraphFactorization(d=2, data_set=None, max_iter=10000, eta=1*10**-4, regu=1.0)
gf.learn_embedding(G)

and got this reply:

[Errno 13] Permission denied: 'gem/c_exe/gf'
./gf not found. Reverting to Python implementation. Please compile gf, place node2vec in the path and grant executable permission
Iter id: 0, Objective: 95.0047, f1: 95.001, f2: 0.00377086

It seems there's something I missed?
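That message is a warning rather than a fatal error: when the compiled gf binary is missing or not executable, GEM falls back to a pure-Python implementation, which is why the iteration log still appears. A hedged sketch of one possible fix, assuming a compiled binary actually exists under gem/c_exe/ in your site-packages (the helper below is my own, not part of GEM):

```python
import os
import stat

def make_executable(path):
    """Grant the executable bit on a file, as the GEM warning asks.

    Hypothetical helper: point it at gem/c_exe/gf inside your
    site-packages if the compiled binary exists for your platform.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

If no binary ships for your OS, the Python fallback still produces an embedding, just more slowly.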

Code cannot be run in Chapter 4?

Dear authors,

I like your book very much and it helped me get a lot of useful insights, but I have a problem when I do the implementation.

I ran the Graph Classification using GCNs code in Chapter04/04_Graph_Neural_Networks.ipynb, but when I try to train the model using the command

history = model.fit(
    train_gen, epochs=epochs, verbose=1, validation_data=test_gen, shuffle=True,
)

I got an error:
UnimplementedError: Cast string to float is not supported
[[node binary_crossentropy/Cast (defined at :2) ]] [Op:__inference_train_function_2594]

Function call stack:
train_function

It seems like the model used a string, but I'm not very good at Python and I'm not able to fix the bug. Could you please help me with this? Thank you very much!
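One hedged guess, without seeing the notebook state: binary_crossentropy needs numeric targets, and the error suggests the graph labels were left as strings. Mapping them to 0/1 floats before building the data generators usually resolves this error. The label values below are made up for illustration; substitute the dataset's actual classes:

```python
import numpy as np

# Hypothetical string labels; replace with the dataset's actual classes.
labels = np.array(["mutagenic", "nonmutagenic", "mutagenic"])

# Encode the positive class as 1.0 and everything else as 0.0 so that
# binary_crossentropy receives floats instead of strings.
targets = (labels == "mutagenic").astype("float32")
```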

Chapter 3 numpy issues with gensim

I had an issue with numpy and gensim for Chapter 3 (the first chapter whose examples I ran). This was the error, specifically:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject "gensim"

My solution was to use a conda environment. I set up an environment.yml file as below:

name: packt_graphml_env
channels:
   - conda-forge
   - defaults
dependencies:
   - python==3.8.*
   - pip
   - ipykernel
   - pandas
   - gensim==3.8.3
   - numpy==1.19.5
   - networkx==2.5.*
   - matplotlib==3.2.*
   - node2vec==0.3.*
   - karateclub==1.0.*
   - scipy==1.6.*
   - tensorflow==2.4.1
   - scikit-learn==0.24.*
   - stellargraph::stellargraph
   - pip:
     #- "--editable=git+https://github.com/stellargraph/stellargraph.git#egg=stellargraph"
     - "--editable=git+https://github.com/palash1992/GEM.git#egg=GEM"

and install/update with:
mamba env update -f environment.yml

Of course, replace mamba with conda if you don't have mamba. And I added ipykernel for a Jupyter kernel, which was already installed in my base conda environment.

Hope this helps someone.

NLTK reuters corpus is not loaded even after download

Hello,
I am trying to execute the examples from Graph-Machine-Learning/Chapter07/01_nlp_graph_creation.ipynb in Google Colab.

At cell 5, corpus = pd.DataFrame([..]), I am getting this error:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/nltk/corpus/util.py in __load(self)
     79             except LookupError as e:
---> 80                 try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
     81                 except LookupError: raise e

5 frames
LookupError: 
**********************************************************************
  Resource reuters not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('reuters')
  
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************


During handling of the above exception, another exception occurred:

LookupError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/nltk/data.py in find(resource_name, paths)
    671     sep = '*' * 70
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674 
    675 

LookupError: 
**********************************************************************
  Resource reuters not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('reuters')
  
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************

Even after following the instructions to run nltk.download('reuters'), I am still getting the same error. Reuters is downloaded in /root/:

~/nltk_data/corpora# ls
reuters.zip

Could you please help me?

Thanks,
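A hedged workaround that has worked for this LookupError: some environments fail to read the corpus straight out of reuters.zip, so extracting the archive in place makes the reuters/ directory visible on the searched paths. The helper below is my own, written with the standard library only, and is not part of NLTK:

```python
import os
import zipfile

def extract_corpus(corpora_dir, name):
    """Unzip <name>.zip inside an nltk_data/corpora directory so the
    corpus folder exists alongside the archive."""
    archive = os.path.join(corpora_dir, name + ".zip")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(corpora_dir)
    return os.path.join(corpora_dir, name)

# e.g. extract_corpus(os.path.expanduser("~/nltk_data/corpora"), "reuters")
```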

Chapter 3 Source Code Issue

I tried to run "01_Shallow_Embeddings.ipynb", but unfortunately when I try to run this excerpt:
from gem.embedding.gf import GraphFactorization

G = nx.barbell_graph(m1=10, m2=4)
draw_graph(G)

gf = GraphFactorization(d=2, data_set=None, max_iter=10000, eta=1*10**-4, regu=1.0)
gf.learn_embedding(G)

I get the following error:
./gf not found. Reverting to Python implementation. Please compile gf, place node2vec in the path and grant executable permission
Iter id: 0, Objective: 95.0097, f1: 95.0035, f2: 0.00623775

The error is located specifically with this instruction: gf.learn_embedding(G)

gf (GraphFactorization) not found

Hello,

I am using conda. I created a new environment called "new_env" with python=3.7.16 (I had to use an older version of Python because stellargraph was having difficulties with Python > 3.8) by running:

conda create -n new_env python=3.7.16
conda activate new_env

then, in the terminal with new_env active, I typed:

pip install -r requirements.txt

where the requirements.txt file includes the following (from Chapter 3 in the book):

Jupyter==1.0.0
networkx==2.5
matplotlib==3.2.2
karateclub==1.0.19
node2vec==0.3.3
tensorflow==2.4.0
scikit-learn==0.24.0
git+https://github.com/palash1992/GEM.git
git+https://github.com/stellargraph/stellargraph.git

I have two questions.
Question 1: When I run the "01_Shallow_Embeddings.ipynb" notebook from Chapter 3, for the cell given below, I get the following error in the GraphFactorization section:

from gem.embedding.gf import GraphFactorization
G = nx.barbell_graph(m1=10, m2=4)
draw_graph(G)
gf = GraphFactorization(d=2, data_set=None, max_iter=10000, eta=1*10**-4, regu=1.0)
gf.learn_embedding(G)

Error: ./gf not found. Reverting to Python implementation. Please compile gf, place node2vec in the path and grant executable permission.

Do I go to site-packages in the active conda environment and compile/run a setup.py file in the GEM library? Or do I need to clone the GEM library (as well as stellargraph) separately, rather than putting them in the requirements.txt file? I thought it was the same procedure.

Question 2: I also get an error when I run the DeepWalk example in the same notebook.
The code:

import networkx as nx
from karateclub.node_embedding.neighbourhood.deepwalk import DeepWalk
G = nx.barbell_graph(m1=10, m2=4)
draw_graph(G)
dw = DeepWalk(dimensions=2)
dw.fit(G)

The error I get:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_22368\195704241.py in <module>
      6 
      7 dw = DeepWalk(dimensions=2)
----> 8 dw.fit(G)

~\anaconda3\envs\gml_book_ch3\lib\site-packages\karateclub\node_embedding\neighbourhood\deepwalk.py in fit(self, graph)
     57                          min_count=self.min_count,
     58                          workers=self.workers,
---> 59                          seed=self.seed)
     60 
     61         num_of_nodes = graph.number_of_nodes()

TypeError: __init__() got an unexpected keyword argument 'iter'

Did anyone have similar issues or know the solution?

Thanks!
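Both errors in this thread appear to trace back to gensim 4, which renamed several Word2Vec keyword arguments that karateclub 1.0.x still passes under their old names ('iter' became 'epochs', 'size' became 'vector_size'). Adding gensim==3.8.3 to the requirements avoids it. Purely as an illustration of what changed, the rename can be sketched as below; this wrapper is hypothetical and not part of karateclub or gensim:

```python
# Map the gensim 3.x Word2Vec keyword names to their gensim 4 equivalents.
# Illustration only: neither karateclub nor gensim ships this helper.
GENSIM4_RENAMES = {"iter": "epochs", "size": "vector_size"}

def translate_w2v_kwargs(kwargs):
    """Return kwargs with gensim 3.x names replaced by gensim 4 names."""
    return {GENSIM4_RENAMES.get(k, k): v for k, v in kwargs.items()}
```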

Chapter 2 - embeddings

import networkx as nx
from node2vec import Node2Vec

G = nx.barbell_graph(m1=7, m2=4)
draw_graph(G, nx.spring_layout(G))

node2vec = Node2Vec(G, dimensions=2)
model = node2vec.fit(window=10)

Computing transition probabilities: 100%
18/18 [00:00<00:00, 261.56it/s]
Generating walks (CPU: 1): 100%|██████████| 10/10 [00:00<00:00, 16.48it/s]

TypeError                                 Traceback (most recent call last)
in <module>()
      6
      7 node2vec = Node2Vec(G, dimensions=2)
----> 8 model = node2vec.fit(window=10)

/usr/local/lib/python3.7/dist-packages/node2vec/node2vec.py in fit(self, **skip_gram_params)
    186         skip_gram_params['sg'] = 1
    187
--> 188         return gensim.models.Word2Vec(self.walks, **skip_gram_params)

TypeError: __init__() got an unexpected keyword argument 'size'
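This TypeError comes from gensim 4, which renamed Word2Vec's 'size' argument to 'vector_size'; node2vec 0.3.x still passes the old name. A hedged fix, assuming the book's pinned environment, is to keep gensim below 4 in the requirements (or upgrade node2vec to a release that supports gensim 4):

```
gensim==3.8.3
node2vec==0.3.3
```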
