Giter Site home page Giter Site logo

kbredundancies's Introduction

KBRedundancies

The Goal of this project is to identify how many semantic redundancies are in modern KBs, and try to semantically compress the facts in the KBs.

1. Definition of "Redundancies"

This project focuses on semantic redundancies in the factual knowledge. Let be the set of facts in a knowledge base , and be a partition of , be logic entailment, be a first-order Horn rule. is redundant if , or written as . Similarly, if a set of Horn rules, denoted by , is applied on a KB, is redundant if .

The selection of the Horn rules is restricted by the size of the rules and number of counterexamples (denoted by ) that comes along with the entailment. That is, the total size, denoted by (, is the size of rule ), should be as small as possible. At least, it cannot be larger than .

2. Target KBs

The following types of KBs will be observed in this project:

  1. Community Contributed Knowledge Sources:
    • KBs in this type are constructed by contributors in certain community, such as Wikipedia, through tools that are dedicated for the community and converted to some structured form
    • Targets: FreeBase, WikiData, MusicBrainz
  2. Open-domain Extraction from Community Contributions:
    • KBs in this type are built by automatic extraction and integration of various community databases, such as WikiData, and are linked to the Linked Open Data (LOD) cloud.
    • Targets: Yago, DBpedia, CNDBpedia
  3. Information Extraction from Raw Text:
    • KBs in this type are extracted via NLP techniques from raw text, such as the online websites.
    • Targets: KnowItAll, NELL, ConceptNet
  4. Lexical Thesaurus
    • KBs in this type are constructed, by experts of by knowledge integration techniques, for managing lexical knowledge.
    • Targets: WordNet, BabelNet, UMLS
  5. (*) Logic Based
    • KBs in the type are usually represented in formal logic and are complicated. So they are optional in this project.
    • Targets: Cyc, OpenCyc

2.1 Release Versions

All KB versions should be the latest release before 2022/05/01. The following releases should also be taken into account:

  1. Yago: latest version of 1/2s/3

2.2 Language Versions

All of the target KBs should contain the English language version. The following language versions should also be taken into account:

  1. WikiData: Chinese, French, Germany, Russian
  2. WordNet: Chinese, French, Germany, Russian
  3. BabelNet: Chinese, French, Germany, Russian

3. Methodology

The target KBs are very large even if only the facts are considered. Therefore, we try certain patterns by hand first and filter out the entailed facts. If there is no more handcrafted rules that are useful for semantic compression, we try SInC, the semantic compressor, on the remaining part of a KB.

The code for SInC is here

4. Repo Organization

This repos is organized as follows:

┳ README.md
┣ .gitignore
┣ common                    [Common utilities for KB operations]
┃ ┣ NumeratedKBFormat.md   [KB format specification]
┃ ┣ numeratedkb.py         [KB Operation Tools]
┃ ┗ kbstate.py             [Script for KB statistics]
┣ 1-CommunityContributed    [Results for type-1 KBs]
┣ 2-Open-domainExtraction   [Results for type-2 KBs]
┣ 3-InformationExtraction   [Results for type-3 KBs]
┣ 4-LexicalThesaurus        [Results for type-4 KBs]
┗ 5-LogicBased              [Results for type-5 KBs]

kbredundancies's People

Contributors

sjtu-ren avatar tramswang avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.