
D3NE (Domain Dependency Decomposition Network Embedding) method (internally dividing model)


Topic
  • The D3NE method, internally dividing version.
  • This is the implementation of the paper "Geospatial Area Embedding Based on the Movement Purpose Hypothesis Using Large-Scale Mobility Data from Smart Card."
Introduction
  • This method uses three network graphs.

  • First, the word-word co-occurrence network, built from word co-occurrence probabilities over all text windows.

  • Second, the document-word co-occurrence network, built from the occurrence probability of each word in each document.

  • Third, the label-word co-occurrence network, built from the word distribution probability within each label.

  • The method assumes that the word-word network is generated from the word-label and word-document networks.

  • The method embeds every node into a vector space of a chosen dimension.

  • Nodes are words, documents, or labels, and all of them are embedded in the same vector space.

  • The method is semi-supervised, so not every word needs a label: if only some of the words are labeled, the method can still learn the embedding.

  • I wrote this code based on the LINE (Large-scale Information Network Embedding) code.
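A minimal sketch of how the first two networks could be built from a tokenized corpus, assuming a symmetric co-occurrence window and raw counts that are afterwards normalized into probabilities. The names `EdgeList`, `buildWordWord`, `buildWordDoc`, `normalizeGlobal`, `normalizeByTarget` and the `doc0`, `doc1`, ... document naming are hypothetical and not taken from this repository. The label-word network is left out because this README does not specify how labels are assigned.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical edge-list type: (source, target) -> weight.
using EdgeList = std::map<std::pair<std::string, std::string>, double>;

// Word-word network: count every pair of words that appear within
// `window` positions of each other, over all documents.
// Each pair is stored in both directions (undirected graph).
EdgeList buildWordWord(const std::vector<std::vector<std::string>>& docs,
                       std::size_t window) {
    assert(window > 0);
    EdgeList edges;
    for (const auto& doc : docs) {
        for (std::size_t i = 0; i < doc.size(); ++i) {
            for (std::size_t j = i + 1; j < doc.size() && j <= i + window; ++j) {
                edges[{doc[i], doc[j]}] += 1.0;
                edges[{doc[j], doc[i]}] += 1.0;
            }
        }
    }
    return edges;
}

// Word-document network: weight = number of times the word occurs in
// the document. Documents are named "doc0", "doc1", ... (hypothetical).
EdgeList buildWordDoc(const std::vector<std::vector<std::string>>& docs) {
    EdgeList edges;
    for (std::size_t d = 0; d < docs.size(); ++d) {
        const std::string docName = "doc" + std::to_string(d);
        for (const auto& w : docs[d]) edges[{w, docName}] += 1.0;
    }
    return edges;
}

// Divide every weight by the total weight of the network
// (co-occurrence probability over all windows).
void normalizeGlobal(EdgeList& edges) {
    double total = 0.0;
    for (const auto& e : edges) total += e.second;
    if (total == 0.0) return;
    for (auto& e : edges) e.second /= total;
}

// Divide each weight by the total weight of its target node,
// e.g. the probability of a word within one document.
void normalizeByTarget(EdgeList& edges) {
    std::map<std::string, double> totals;
    for (const auto& e : edges) totals[e.first.second] += e.second;
    for (auto& e : edges) e.second /= totals[e.first.second];
}
```

For example, `normalizeGlobal(ww)` turns the window counts into co-occurrence probabilities over all windows, and `normalizeByTarget(wd)` gives the occurrence probability of each word within each document.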

Directory structure
  • dataminimal/: Minimal sample data for a test run.
  • src/: The source code and a shell script that compiles it and runs a test.
  • tools/: Visualization code (t-SNE) and other small utilities.
How to compile and run the sample data
  • This code uses CMake.
cd src/
./run.sh
  • The build produces the executable train_pte, which is the main program for running the D3NE method.
Features
  • This code uses C++11 features.
  • Fast sampling with the alias table method.
  • Asynchronous stochastic gradient descent (ASGD) based on the Hogwild! paper.
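The alias table method draws each sample in O(1) time after O(n) setup, which matters when edges are sampled millions of times. Below is a sketch of Vose's variant of the alias method, assuming the edge weights are given as a `std::vector<double>`; the class name `AliasTable` is hypothetical and this is not the repository's own implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Alias table for O(1) sampling from a discrete distribution
// (Vose's variant of Walker's alias method).
class AliasTable {
public:
    explicit AliasTable(const std::vector<double>& weights)
        : prob_(weights.size()), alias_(weights.size(), 0) {
        const std::size_t n = weights.size();
        assert(n > 0);
        double sum = 0.0;
        for (double w : weights) sum += w;
        assert(sum > 0.0);

        // Scale weights so that their average is 1, then split the
        // buckets into under-full (< 1) and over-full (>= 1) ones.
        std::vector<double> scaled(n);
        std::vector<std::size_t> under, over;
        for (std::size_t i = 0; i < n; ++i) {
            scaled[i] = weights[i] * n / sum;
            (scaled[i] < 1.0 ? under : over).push_back(i);
        }
        // Fill each under-full bucket with mass from an over-full one.
        while (!under.empty() && !over.empty()) {
            const std::size_t s = under.back();
            under.pop_back();
            const std::size_t l = over.back();
            over.pop_back();
            prob_[s] = scaled[s];
            alias_[s] = l;
            scaled[l] -= 1.0 - scaled[s];
            (scaled[l] < 1.0 ? under : over).push_back(l);
        }
        // Remaining buckets are full (up to rounding error).
        for (std::size_t i : under) prob_[i] = 1.0;
        for (std::size_t i : over) prob_[i] = 1.0;
    }

    // Draw one index: pick a bucket uniformly, then flip a biased coin
    // to choose between the bucket itself and its alias.
    template <typename Rng>
    std::size_t sample(Rng& rng) const {
        std::uniform_int_distribution<std::size_t> bucket(0, prob_.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        const std::size_t i = bucket(rng);
        return coin(rng) < prob_[i] ? i : alias_[i];
    }

    // Accessors for inspecting the built table.
    double prob(std::size_t i) const { return prob_[i]; }
    std::size_t alias(std::size_t i) const { return alias_[i]; }

private:
    std::vector<double> prob_;
    std::vector<std::size_t> alias_;
};
```

An edge index would then be drawn with `table.sample(rng)` using, for example, a `std::mt19937` engine.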
train_pte options
  • --train_ww=./data/st-st.csv: specify the word-word network file
  • --train_wd=./data/st-corp.csv: specify the word-document network file
  • --train_wl=./data/st-imp.csv: specify the word-label network file
  • --output=vec_2nd_wo_norm.txt: specify the output vector file
  • --binary=0: binary output format (not yet debugged, so do not use it)
  • --size=128: specify the vector dimension size
  • --order=2: specify 1 or 2, selecting first-order or second-order proximity; details are in the original paper
  • --negative=5: specify the number of negative samples drawn for each sampled edge
  • --samples=30: specify the total number of samples, in millions (1 means 1×10^6 samples)
  • --threads=30: specify the number of threads (CPUs) to use
  • --rho=0.05: specify the initial learning rate (it decreases gradually during training)

./train_d3ne --help shows detailed information on all options.
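In the LINE code this repository is based on, `--order=2`, `--negative`, and `--rho` control the second-order training step. The sketch below shows one such step, assuming D3NE keeps LINE's update rule (sigmoid loss with negative sampling) and LINE's linear learning-rate decay with a floor of `rho * 1e-4`. `updateEdge` and `learningRate` are hypothetical names, and the negative samples are assumed to be drawn beforehand from a noise distribution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One LINE-style second-order SGD step for the edge (u, v).
// `vertex` holds node embeddings, `context` holds context embeddings.
// `negatives` are node indices assumed to come from a noise distribution.
void updateEdge(std::vector<Vec>& vertex, std::vector<Vec>& context,
                std::size_t u, std::size_t v,
                const std::vector<std::size_t>& negatives, double rho) {
    const std::size_t dim = vertex[u].size();
    assert(context[v].size() == dim);
    Vec err(dim, 0.0);  // accumulated gradient for vertex[u]
    auto step = [&](std::size_t target, double label) {
        Vec& ctx = context[target];
        double dot = 0.0;
        for (std::size_t k = 0; k < dim; ++k) dot += vertex[u][k] * ctx[k];
        const double g = (label - sigmoid(dot)) * rho;
        // Accumulate the gradient for u before changing the context vector.
        for (std::size_t k = 0; k < dim; ++k) err[k] += g * ctx[k];
        for (std::size_t k = 0; k < dim; ++k) ctx[k] += g * vertex[u][k];
    };
    step(v, 1.0);                                  // positive edge
    for (std::size_t n : negatives) step(n, 0.0);  // negative samples
    for (std::size_t k = 0; k < dim; ++k) vertex[u][k] += err[k];
}

// Learning rate that decays linearly with progress, with a small floor.
double learningRate(double rhoInit, long long done, long long total) {
    const double rho =
        rhoInit * (1.0 - static_cast<double>(done) / (total + 1));
    return rho < rhoInit * 1e-4 ? rhoInit * 1e-4 : rho;
}
```

With `--threads` greater than 1, Hogwild!-style ASGD would call a step like `updateEdge` from several threads on the same shared vectors without locking.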

Output files
  • vec_2nd_wo_norm.txt_all: The internally dividing model output; each node's vector is the sum of its vectors in the *_wd and *_wl files.
  • vec_2nd_wo_norm.txt_wd: The word-document vector file.
  • vec_2nd_wo_norm.txt_wl: The word-label vector file.
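A sketch of how the *_all vectors relate to the other two files, assuming both files list the same vertex names with the same dimension. Nodes missing from the *_wl table are skipped here, which is an assumption rather than documented behavior. `Embedding` and `sumVectors` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one output file: vertex name -> vector.
using Embedding = std::map<std::string, std::vector<double>>;

// Element-wise sum of each node's *_wd and *_wl vectors.
Embedding sumVectors(const Embedding& wd, const Embedding& wl) {
    Embedding all;
    for (const auto& entry : wd) {
        const auto it = wl.find(entry.first);
        if (it == wl.end()) continue;  // assumption: skip nodes missing from *_wl
        const auto& a = entry.second;
        const auto& b = it->second;
        assert(a.size() == b.size());
        std::vector<double> sum(a.size());
        for (std::size_t k = 0; k < a.size(); ++k) sum[k] = a[k] + b[k];
        all[entry.first] = sum;
    }
    return all;
}
```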
Output file format
  • First row: the total number of vertices and the vector dimension size.
  • From the second row on: the vertex name followed by each dimension value of its learned vector.
  • All fields are separated by spaces.
Developmental status
  • First version released (03/2016).
  • This version implements only the joint training style:
    • all three graphs are updated sequentially in each training loop.

Contributors

m-ochi
