cs-project-ml

A collection of several distance-based learning algorithms, including TransD, the focus of our study.

Motivation

Pre-specified features often limit the performance of learning algorithms. Distance-based learning offers an alternative, especially when similarity relations are easier to obtain or analyze than explicit features, as in computer vision, bioinformatics, and natural language processing.

Goal

Transform the data into a “neat” distribution by pulling each pair of points closer together or pushing it farther apart.
Then use a simple distance-based algorithm to obtain the final prediction!

TransD

Semi-supervised: trains on the unlabeled data together with the labeled data.

for each round:
  determine labels for the unlabeled data
  for each pair of data points:
    if the pair passes the conditions:
      adjust their distance
  if the data are neat enough: stop

rounds: at most 20
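A minimal Python sketch of the loop above, under loudly stated assumptions: the gate uses one constant `xi` instead of the per-pair $\xi_{ij}$ computed from $c_i, c_j$; the adjustment just multiplies a distance by hypothetical `shrink`/`grow` factors; and "neat enough" is replaced by a stand-in test (every point's nearest neighbour shares its label) rather than the 1-NN/1-mi consensus. All function and parameter names are hypothetical.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def nearest(D, i, candidates):
    """Index among `candidates` (excluding i itself) closest to point i."""
    return min((j for j in candidates if j != i), key=lambda j: D[i][j])


def transd_sketch(points, labels, max_rounds=20, xi=0.2,
                  shrink=0.9, grow=1.1, seed=0):
    """Hypothetical TransD-style loop; `labels` uses None for unlabeled points."""
    rng = random.Random(seed)
    D = pairwise_distances(points)
    n = len(points)
    labeled = [i for i in range(n) if labels[i] is not None]
    current = list(labels)
    for _ in range(max_rounds):
        # Determine labels for the unlabeled data (1-NN over labeled points).
        for i in range(n):
            if labels[i] is None:
                current[i] = labels[nearest(D, i, labeled)]
        # Assumed adjustment: for pairs passing a simplified random gate,
        # pull same-label pairs together and push different-label pairs apart.
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() >= xi:
                    factor = shrink if current[i] == current[j] else grow
                    D[i][j] = D[j][i] = D[i][j] * factor
        # Stand-in "neat enough" test: each point's nearest neighbour
        # already carries the same label.
        if all(current[nearest(D, i, range(n))] == current[i] for i in range(n)):
            break
    return current, D
```

For example, `transd_sketch([(0, 0), (0, 1), (10, 10), (10, 11), (0.5, 0.5), (10.5, 10.5)], ['a', 'a', 'b', 'b', None, None])` labels the two unlabeled points `'a'` and `'b'` and stops after the first round, because the two groups are already well separated.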

conditions:

$c_i$ and $c_j$ are computed by the Bayesian KNN.
If a random number $r \ge \xi_{ij}$, transform the pair to its new distance; otherwise, keep the current distance.
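The random condition can be isolated as a small helper. This sketch assumes $r$ is drawn uniformly from [0, 1) and takes $\xi_{ij}$ as an input, since its formula is not reproduced here; `gated_distance` is a hypothetical name.

```python
import random


def gated_distance(old_dist, new_dist, xi_ij, rng):
    """Take new_dist when a random r satisfies r >= xi_ij, else keep old_dist."""
    r = rng.random()  # assumed uniform on [0, 1)
    return new_dist if r >= xi_ij else old_dist
```

Under that assumption a pair is transformed with probability $1 - \xi_{ij}$, so a larger $\xi_{ij}$ makes the pair more likely to keep its current distance.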

neat enough:

The 1-NN and 1-mi algorithms reach a consensus.
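A sketch of the stopping test, assuming "consensus" means both predictors assign the same label to every point. Only 1-NN is implemented here (leave-one-out, over a precomputed distance matrix `D`); the 1-mi predictions are passed in as a list because that algorithm is not described in this README. Function names are hypothetical, and at least two labeled points are required.

```python
def one_nn_predict(D, labels):
    """Label each point with its nearest *other* labeled point's label."""
    labeled = [j for j, y in enumerate(labels) if y is not None]
    preds = []
    for i in range(len(D)):
        j = min((k for k in labeled if k != i), key=lambda k: D[i][k])
        preds.append(labels[j])
    return preds


def neat_enough(preds_a, preds_b):
    """Consensus: both predictors give the same label to every point."""
    return len(preds_a) == len(preds_b) and all(
        a == b for a, b in zip(preds_a, preds_b))
```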

adjust:

Bayesian KNN

We consider K hypotheses: 1-NN, 2-NN, …, K-NN.
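One way to combine the K hypotheses is Bayesian model averaging: weight each k-NN hypothesis by a posterior, mix their class votes, and take the winning class's mixed vote as its confidence. The sketch below assumes a uniform prior and uses leave-one-out accuracy on the labeled points as a stand-in for the likelihood; this is an illustrative assumption, not necessarily how TransD computes $c_i$. All names are hypothetical, and at least two labeled points are needed.

```python
from collections import Counter


def knn_votes(D, labels, i, k):
    """Vote fractions among the k nearest labeled neighbours of point i."""
    labeled = sorted((j for j, y in enumerate(labels) if y is not None and j != i),
                     key=lambda j: D[i][j])
    top = labeled[:k]
    counts = Counter(labels[j] for j in top)
    return {c: n / len(top) for c, n in counts.items()}


def hypothesis_weights(D, labels, K):
    """Assumed posterior over k = 1..K: uniform prior times leave-one-out accuracy."""
    labeled = [j for j, y in enumerate(labels) if y is not None]
    scores = []
    for k in range(1, K + 1):
        correct = 0
        for i in labeled:
            votes = knn_votes(D, labels, i, k)
            correct += max(votes, key=votes.get) == labels[i]
        scores.append(correct / len(labeled))
    total = sum(scores)
    if total == 0:
        return [1.0 / K] * K  # fall back to the uniform prior
    return [s / total for s in scores]


def bayesian_knn(D, labels, i, K):
    """Return (predicted class, confidence) for point i by averaging hypotheses."""
    weights = hypothesis_weights(D, labels, K)
    mixed = Counter()
    for k, w in zip(range(1, K + 1), weights):
        for c, p in knn_votes(D, labels, i, k).items():
            mixed[c] += w * p
    return max(mixed.items(), key=lambda kv: kv[1])
```

For labeled points the query point itself is excluded from its own neighbourhood, so the confidence reflects only the other points.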

Linear Transform Approximation

With a linear approximation of the distance transformation, our model reduces to a single transformation matrix $T$.
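A sketch of fitting such a matrix by least squares, assuming we already have target coordinates `Y` for the transformed points (one row per point, matching the rows of `X`) and want the $T$ minimizing $\lVert XT - Y\rVert_F$. It solves the normal equations $(X^\top X)T = X^\top Y$ with plain Gauss-Jordan elimination so that it needs only the standard library; a numerical library's least-squares routine would be more robust in practice. Function names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular system")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fit_linear_transform(X, Y):
    """Least-squares T with X T ~= Y, via (X^T X) T = X^T Y."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))
```

Once $T$ is fitted, a new point $x$ is mapped as the row vector $xT$ and classified with a simple distance-based rule in the transformed space.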

Improvements and Experiments

Other transformation approximations:

We use a feature-space extension method. Results for the quadratic transformation:
[Figure: results, with panels “Linear” and “Quadratic”]

The quadratic transformation is significantly better on some datasets and significantly worse on others. We can therefore treat the choice of transformation as a hyperparameter and tune it for each specific dataset.
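The feature-space extension can be sketched as an explicit quadratic feature map: append every degree-2 monomial $x_i x_j$ ($i \le j$) to the original features, then apply a linear transform to the extended vector, which acts as a quadratic transformation of the original space. The helper names are hypothetical, and this monomial set is one common choice rather than a confirmed detail of this project.

```python
from itertools import combinations_with_replacement


def quadratic_features(x):
    """Extend (x1, ..., xd) with all degree-2 monomials xi * xj, i <= j."""
    x = list(x)
    quad = [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
    return x + quad


def transform_point(x, T):
    """Row vector phi(x) @ T, where T has len(phi(x)) rows."""
    z = quadratic_features(x)
    return [sum(zi * T[i][c] for i, zi in enumerate(z)) for c in range(len(T[0]))]
```

For instance, `quadratic_features((2, 3))` gives `[2, 3, 4, 6, 9]`; a `T` that picks out the $x_1^2$ and $x_2^2$ entries maps $(2, 3)$ to $13$, a value no linear map of the original two features could produce for all inputs.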

Clustering Preprocessing:

Improving accuracy:
Result: Clustering preprocessing increases accuracy on some datasets, but the gain is not worth the extra running time.
Compressing data:
Result: The unlabeled data can be compressed to 1/5 or even 1/10 of their original size with no significant loss in accuracy, which saves a lot of time when running TransD.
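The compression idea can be sketched by replacing the unlabeled points with cluster centroids. This assumes k-means (Lloyd's algorithm) as the clustering method and about `len(unlabeled) // ratio` clusters, so `ratio=5` or `ratio=10` mirror the 1/5 and 1/10 settings above; the README does not name the clustering algorithm, and all names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            best = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[best].append(p)
        new = []
        for c, members in enumerate(clusters):
            if members:
                new.append([sum(v) / len(members) for v in zip(*members)])
            else:
                new.append(centroids[c])  # keep an empty cluster's old centroid
        if new == centroids:
            break
        centroids = new
    return centroids


def compress_unlabeled(unlabeled, ratio=5, seed=0):
    """Replace the unlabeled set by roughly len(unlabeled) / ratio centroids."""
    k = max(1, len(unlabeled) // ratio)
    return kmeans(unlabeled, k, seed=seed)
```

TransD would then run on the labeled data plus these centroids instead of the full unlabeled set, which is where the time savings come from.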

Randomness Adjustment:

Return a class at random, with probabilities based on the class weights. The randomness decreases after every iteration.
Result: no significant difference; more experiments are needed!
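A sketch of the randomized class choice, assuming a temperature schedule: class $c$ is drawn with probability proportional to $w_c^{1/\tau}$, where $\tau = \tau_0 \cdot \mathrm{decay}^t$ shrinks every iteration $t$. At $\tau = 1$ this samples in proportion to the weights; as $\tau$ approaches 0 it approaches always returning the highest-weighted class. The schedule, the parameter names, and the small floor on $\tau$ are assumptions.

```python
import random


def sample_class(weights, iteration, tau0=1.0, decay=0.5, rng=None):
    """Draw a class with probability proportional to w ** (1 / tau)."""
    rng = rng or random.Random()
    # Temperature shrinks every iteration, so the choice becomes less random.
    tau = max(tau0 * decay ** iteration, 1e-3)
    classes = list(weights)
    w_max = max(weights.values())
    # Dividing by the largest weight keeps the top class at 1.0, so the
    # powered weights never all underflow to zero.
    powered = [(weights[c] / w_max) ** (1.0 / tau) for c in classes]
    return rng.choices(classes, weights=powered, k=1)[0]
```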

Further Issues

Run more experiments on large datasets.
Further improve time and space complexity.
Implement the algorithm in CUDA (to run on GPUs).
Explore other ways to compress data: fewer data points but higher dimension?

Reference

Yuh-Jyh Hu, Min-Che Yu, Hsiang-An Wang, and Zih-Yun Ting, “A Similarity-Based Learning Algorithm Using Distance Transformation,” IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 27, no. 6, June 2015.


cs-project-ml's Introduction

cs-project-ml

Motivation

Goal

TransD

Bayesian KNN

Linear Transform Approximation

Improve and Experiments

Other transformation approximation:

Clustering Preprocessing:

Randomness Adjustment:

Further Issue

Reference

cs-project-ml's People

Contributors

Stargazers

Watchers

Forkers

cs-project-ml's Issues

Recommend Projects

Recommend Topics

Recommend Org