Giter Site home page Giter Site logo

table2vec-lideng's Introduction

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval.

This repository contains resources developed within the master thesis:

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval.

The programming language is Python. Table embeddings are built using TensorFlow.

Table2Vec

There are hundreds of millions of tables in web pages. These tables are much richer sources of structured knowledge than free-format text.

Table2Vec is a novel approach that employs neural language modeling to embed different table elements into semantic vector spaces, which can benefit table-related retrieval tasks.

Dataset

  • The table corpus Wikipedia Tables, which consists of 1.6M high-quality relational tables in total. The statistics is as follows:
Core column Tables in total Tables* in total
existing entities 726,913 212,923
60% entities 556,644 139,572
80% entities 483,665 119,166
100% entities 425,236 78,611
100% unique entities 376,213 53,354

Table* represents the tables that have more than 5 rows and 3 columns. Core column refers to the left most column.

  • The data/queries.txt file contains the search queries.

Table embeddings

Different table embeddings and their training parameters

Embedding Total terms Unique terms Negative samples Window size
Table2VecW 200,157,990 1,829,874 25 5
Table2VecH 7,962,443 339,433 25 20
Table2VecE 24,863,683 2,159,467 25 50
Table2VecE* 5,367,837 1,285,708 25 50

Functionality

Table2Vec currently supports three table-related tasks:

  • Table retrieval
  • Row population
  • Column population

Methods and results:

The evaluation is undertaken by trec_eval. For row population, the run files are too big to added on Github.

  1. Row population :
Methods 1 2 3 4 5
BL1 0.4360 0.4706 0.4788 0.4786 0.4711
BL2 0.2612 0.2778 0.2845 0.2846 0.2817
BL3 0.2912 0.3024 0.3028 0.2987 0.2910
Table2VecE* 0.4982 0.5522 0.5598 0.5543 0.5476
BL1 + Table2VecE* 0.5581 0.6147 0.6400 0.6524 0.6533
BL2 + Table2VecE* 0.5461 0.6027 0.6187 0.6217 0.6223
BL3 + Table2VecE* 0.5487 0.6049 0.6218 0.6249 0.6251
  1. Column population ( the runfile/CP/ folder contains the run files by various methods ):
Methods Runfile 1 2 3
Baseline baseline_R1(2,3).txt 0.2507 0.2845 0.2852
Baseline + Table2VecH combined_R1(2,3).txt 0.2551 0.3322 0.4000
  1. Table retrieval ( the runfile/TR/ folder contains the run files by various methods):
Method Runfile NDCG@5 NDCG@10 NDCG@15 NDCG@20
Baseline gt.txt 0.5527 0.5456 0.5738 0.6031
Baseline + Word2Vec w2v.txt 0.5954 0.6006 0.6315 0.6588
Baseline + Graph2Vec g2v.txt 0.5844 0.5764 0.6128 0.6340
Baseline + Table2VecW t2vW.txt 0.5974 0.6096 0.6312 0.6505
Baseline + Table2VecE t2vE.txt 0.5602 0.5569 0.5760 0.6161

table2vec-lideng's People

Contributors

ninalx avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.