Personalized knowledge graph summarization based on historical queries

GLIMPSE: Personalized Knowledge Graph Summarization

This is a reference implementation for our IEEE ICDM 2019 paper:

Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket. Tara Safavi, Caleb Belth, Lukas Faber, Davide Mottin, Emmanuel Müller, Danai Koutra. IEEE International Conference on Data Mining (ICDM), 2019.

Link: https://gemslab.github.io/papers/safavi-2019-glimpse.pdf

If you use it, please cite the following:

@inproceedings{safavi2019personalized,
  title={Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket},
  author={Safavi, Tara and Belth, Caleb and Faber, Lukas and Mottin, Davide and M{\"u}ller, Emmanuel and Koutra, Danai},
  booktitle={2019 IEEE International Conference on Data Mining (ICDM)},
  pages={528--537},
  year={2019},
  organization={IEEE}
}

Requirements

  • Python 3.4 or above
  • numpy
  • scipy
  • pandas

Data

In our experiments we used the following datasets:

  • DBPedia 3.5.1, specifically the "Ontology Infobox Properties" file, which is called mappingbased_properties_en.nt.
  • YAGO 3, specifically the yagoFacts.tsv, yagoLiteralFacts.tsv, and yagoDateFacts.tsv files.
  • Freebase, specifically the latest GZ file from the Freebase data dump. In our paper we used a parsed, cleaned version of the raw dump, produced with the triple-shrinking scripts from FreebaseTools.

Note that the code to read in each knowledge graph expects .gz files, so you should gzip the raw data dumps as necessary.
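
If a dump is only available uncompressed, it can be gzipped with the Python standard library. The following is a minimal sketch; the helper name gzip_file is hypothetical and is not part of this repository.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None):
    """Compress src into a .gz file and return the output path.

    Hypothetical helper for illustration; not part of the GLIMPSE code.
    """
    src = Path(src)
    # Default output name: append ".gz" to the original file name.
    dst = Path(dst) if dst is not None else src.with_name(src.name + ".gz")
    # Stream the copy so that large dumps are never held fully in memory.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

For example, gzip_file("yagoFacts.tsv") writes yagoFacts.tsv.gz next to the original file and leaves the original in place.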

In lines 13-15 of base.py, set the path for each dataset to your local data directory. Each subclass of KnowledgeGraph also takes several keyword arguments, which you may need to change to match your directory structure and file naming conventions:

  • rdf_gz: The filename of the data dump in gzip format.
  • query_dir: The subdirectory where generated queries are saved and retrieved in json format (see below).
  • by_topic: The subdirectory that stores files listing queries by topic (see below).
  • by_mid: The subdirectory that stores files listing queries by topic entity MID (see below).

Here's an example of how queries might be stored according to this subdirectory structure:

<kg_data_dir>/
  <by_topic>/
      art.list
      music.list
      geography.list
  <by_mid>/
      m934sk.list
      g104n1.list
      m10394.list
  <query_dir>/
      q1.json
      q2.json
      q3.json
      q4.json
      q5.json

Now, assuming that queries q1 and q3 are about "art", the art.list file should look like this:

q1
q3

Similarly, assuming that queries q1, q4, and q5 have topic entity MID m934sk, the m934sk.list file should look like this:

q1
q4
q5

In essence, each of the .list files points to queries that fall under its topic/topic entity.
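
The layout above can be read back with a few lines of Python. This is only a sketch of the directory convention, not the repository's own loader: the function names read_query_ids and load_queries are hypothetical, and it assumes that each line of a .list file is a query id that maps to <query id>.json in the query directory.

```python
import json
from pathlib import Path


def read_query_ids(list_path):
    """Return the query ids named in a .list file, one id per line."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def load_queries(kg_data_dir, list_subdir, list_name, query_dir):
    """Load the JSON queries that one .list file points to.

    Hypothetical helper: assumes ids in the .list file match the
    base names of the .json files under query_dir.
    """
    root = Path(kg_data_dir)
    ids = read_query_ids(root / list_subdir / (list_name + ".list"))
    queries = {}
    for qid in ids:
        with open(root / query_dir / (qid + ".json")) as f:
            queries[qid] = json.load(f)
    return queries
```

With the example files above, load_queries(kg_data_dir, by_topic, "art", query_dir) would return the parsed contents of q1.json and q3.json, keyed by query id.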

Command-line arguments

usage: main.py [-h] [--kg {YAGO,Freebase,DBPedia}] [--n-queries N_QUERIES]
               [--n-topic-mids N_TOPIC_MIDS] [--n-topics N_TOPICS]
               [--n-mids-per-topic N_MIDS_PER_TOPIC] [--n_users N_USERS]
               [--test-size TEST_SIZE] [--percent-triples PERCENT_TRIPLES]
               [--random-query-prob RANDOM_QUERY_PROB] [--shuffle]
               [--method {glimpse,glimpse-2} [{glimpse,glimpse-2} ...]]

optional arguments:
  -h, --help            show this help message and exit
  --kg {YAGO,Freebase,DBPedia}
                        KG to summarize
  --n-queries N_QUERIES
                        Number of queries to simulate per user. Default is
                        200.
  --n-topic-mids N_TOPIC_MIDS
                        Number of topic mids of interest per user. Default is
                        50.
  --n-topics N_TOPICS   Number of topics to simulate per user log. For
                        Freebase only. Default is 3.
  --n-mids-per-topic N_MIDS_PER_TOPIC
                        Number of unique MIDs per topic. For Freebase only.
                        Default is 20.
  --n_users N_USERS     Number of users to simulate. Default is 5.
  --test-size TEST_SIZE
                        Percentage of queries per user to hold out for
                        testing, in [0, 1]. Default is 0.5.
  --percent-triples PERCENT_TRIPLES
                        Ratio of number of triples of KG to use as K (summary
                        constraint). Default is 0.001.
  --random-query-prob RANDOM_QUERY_PROB
                        Probability of users asking random queries rather than
                        topic-specific ones. Default is 0.1.
  --shuffle             Set this flag to true to shuffle all generated logs.
                        Default False.
  --method {glimpse,glimpse-2} [{glimpse,glimpse-2} ...]
                        Summarization methods to call. Default is [glimpse].
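
As an illustration of how these flags combine, the sketch below assembles an argument list for main.py. The helper build_command is hypothetical; the flag names and default values are taken from the usage text above. Note that --n_users is spelled with an underscore, unlike the other flags.

```python
import sys


def build_command(kg, n_users=5, n_queries=200, percent_triples=0.001,
                  methods=("glimpse",), shuffle=False):
    """Assemble an argv list for main.py (hypothetical helper).

    Defaults mirror the defaults listed in the usage text.
    """
    cmd = [
        sys.executable, "main.py",
        "--kg", kg,
        "--n_users", str(n_users),  # underscore, as in the usage text
        "--n-queries", str(n_queries),
        "--percent-triples", str(percent_triples),
        "--method", *methods,  # accepts one or more method names
    ]
    if shuffle:
        cmd.append("--shuffle")
    return cmd
```

Run from the repository root, subprocess.run(build_command("YAGO", methods=("glimpse", "glimpse-2")), check=True) would launch both methods on YAGO with the remaining defaults, assuming the dataset paths in base.py have been configured.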
