songronglee / cs-project-ml


cs-project-ml

Several distance-based learning algorithms, including our study topic TransD.

Motivation

Pre-specified features often restrict the performance of learning algorithms. Distance-based features provide an alternative way to learn, especially in situations where similarity relations are easier to obtain or analyze, such as computer vision, bioinformatics, and natural language processing.

Goal

  1. Transform data into a "neat" distribution by pulling or pushing each pair of points.
  2. Use a simple distance-based algorithm to get the final prediction.

(Figure: data pulling)
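As a minimal illustration of the second goal, the sketch below uses 1-NN as the "simple distance-based algorithm". The function name and the toy distances are assumptions made for this example, not code from the repository.

```python
def one_nn_predict(dist_to_labeled, labels):
    """Return the label of the closest labeled point.

    dist_to_labeled[i] is the distance from the query to labeled point i.
    """
    nearest = min(range(len(labels)), key=lambda i: dist_to_labeled[i])
    return labels[nearest]


# Toy query: the second labeled point is the closest one.
print(one_nn_predict([0.9, 0.2, 0.7], ["a", "b", "a"]))  # prints b
```

Once the data has been pulled into a neat distribution, a rule this simple is expected to be enough for the final prediction.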

TransD

Semi-supervised: train on unlabeled data together with labeled data.

for each round:
  determine label for unlabeled data
  for each pair of data:
    if(pass the conditions):
      adjust their distance
    if(data is neat enough): end

round: maximum of 20 rounds
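The round loop can be sketched in Python as follows. This is only an outline under stated assumptions: the four callables (`assign_labels`, `passes_conditions`, `adjust`, `is_neat`) are hypothetical placeholders for the steps named in the pseudocode, the neatness check is assumed to run once per round, and the toy `adjust` rule (halve same-class distances, stretch different-class distances) is purely illustrative and is not the project's adjust formula.

```python
from itertools import combinations

MAX_ROUNDS = 20  # from the text: at most 20 rounds


def transd_rounds(dist, assign_labels, passes_conditions, adjust, is_neat,
                  max_rounds=MAX_ROUNDS):
    """Run the outer loop on a symmetric distance matrix `dist` (list of lists).

    All four callables are hypothetical placeholders for the steps in the text.
    Returns the labels from the last round and the number of rounds run.
    """
    labels = None
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        labels = assign_labels(dist)
        for i, j in combinations(range(len(dist)), 2):
            if passes_conditions(i, j, labels):
                # Keep the matrix symmetric when a pair is adjusted.
                dist[i][j] = dist[j][i] = adjust(dist[i][j], labels[i] == labels[j])
        if is_neat(dist, labels):
            break
    return labels, rounds


def toy_is_neat(d, labels):
    """Toy criterion: every same-class pair is closer than every other pair."""
    pairs = list(combinations(range(len(d)), 2))
    same = [d[i][j] for i, j in pairs if labels[i] == labels[j]]
    diff = [d[i][j] for i, j in pairs if labels[i] != labels[j]]
    return max(same) < min(diff)


dist = [[0.0, 2.0, 1.0],
        [2.0, 0.0, 1.5],
        [1.0, 1.5, 0.0]]
labels, rounds = transd_rounds(
    dist,
    assign_labels=lambda d: [0, 0, 1],          # fixed toy labels
    passes_conditions=lambda i, j, lab: True,   # accept every pair
    adjust=lambda dij, same: dij * 0.5 if same else dij * 1.5,
    is_neat=toy_is_neat,
)
print(rounds, dist[0][1], dist[0][2])
```

Passing the steps in as callables keeps the loop itself independent of how labels, conditions, and adjustments are actually computed.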

conditions:

  • 𝑐𝑖, 𝑐𝑗 are calculated in the Bayesian KNN.
  • If random 𝑟 >= 𝜉𝑖𝑗, transform to the new distance; else keep it.
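The random acceptance rule can be sketched as below, assuming that r is drawn uniformly from [0, 1) and that the threshold lies in [0, 1]; neither assumption is stated in the text, and `should_transform` is a hypothetical name. How the threshold is derived from the two class values is not shown here.

```python
import random


def should_transform(xi_ij, rng=random):
    """Return True when a uniform random draw r in [0, 1) satisfies r >= xi_ij."""
    r = rng.random()
    return r >= xi_ij


# With xi_ij = 0 the pair is always transformed; with xi_ij = 1 it never is.
print(should_transform(0.0), should_transform(1.0))  # prints True False
```

Under these assumptions a pair is transformed with probability 1 - xi_ij, so a larger threshold makes the transformation less likely.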

neat enough:

  • Consensus of the 1-NN and 1-mi algorithms.

adjust:
(Figure: adjust formula)

Bayesian KNN

We have K hypotheses: 1-NN, 2-NN, …, K-NN.
(Figure: Bayesian)
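One way to combine the K hypotheses is to average their votes, sketched below. This is an assumption-heavy illustration: the per-hypothesis `weights` are hypothetical inputs (the formula that produces them is not reproduced in this text), and each k-NN hypothesis is assumed to score a class by its vote fraction among the k nearest neighbours.

```python
from collections import Counter


def knn_vote_fractions(sorted_labels, k):
    """Class fractions among the k nearest neighbours (labels sorted by distance)."""
    counts = Counter(sorted_labels[:k])
    return {c: n / k for c, n in counts.items()}


def averaged_class_scores(sorted_labels, weights):
    """Weighted average of the 1-NN..K-NN vote fractions.

    weights[k - 1] is the (assumed) weight of the k-NN hypothesis;
    sorted_labels must hold at least len(weights) labels.
    """
    total = sum(weights)
    scores = Counter()
    for k, w in enumerate(weights, start=1):
        for c, frac in knn_vote_fractions(sorted_labels, k).items():
            scores[c] += (w / total) * frac
    return dict(scores)


# Neighbour labels ordered from nearest to farthest, with equal weights for K = 3.
scores = averaged_class_scores(["a", "b", "b"], weights=[1.0, 1.0, 1.0])
print(scores)
```

In this toy case class "a" wins because it is the single nearest neighbour, even though "b" has more votes at k = 3.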

Linear Transform Approximation

(Figure: Inverse)

  • Our model becomes a single linear transform matrix 𝑇!
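Once a matrix 𝑇 is available, prediction reduces to one matrix product followed by a nearest-neighbour lookup. The sketch below assumes row vectors, Euclidean distance in the transformed space, and a hand-picked hypothetical `T`; it does not show how 𝑇 is fitted.

```python
import math


def apply_transform(x, T):
    """Map a row vector x (length d) through a d x m matrix T."""
    return [sum(x[k] * T[k][j] for k in range(len(x))) for j in range(len(T[0]))]


def predict_1nn(x, T, train_points, train_labels):
    """Transform the query and the training points with T, then run 1-NN."""
    z = apply_transform(x, T)
    transformed = [apply_transform(p, T) for p in train_points]
    nearest = min(range(len(transformed)), key=lambda i: math.dist(z, transformed[i]))
    return train_labels[nearest]


# Hypothetical T that stretches the first axis and shrinks the second.
T = [[2.0, 0.0],
     [0.0, 0.1]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]
train_points = [[0.0, 0.0], [1.0, 5.0]]
train_labels = ["a", "b"]

print(predict_1nn([0.6, 0.0], identity, train_points, train_labels))  # prints a
print(predict_1nn([0.6, 0.0], T, train_points, train_labels))         # prints b
```

The same query gets a different label under `T` than under the identity, which is the effect a learned transform is meant to have on the neighbourhood structure.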

Improve and Experiments

Other transformation approximation:

Use the feature space extension method. Results for the quadratic transformation, compared with the linear one:
Linear
(Figure: Linear Result)
Quadratic
(Figure: Quadratic Result)
Some datasets improve significantly and some get significantly worse. We can treat the choice of transformation as a learning parameter and tune it for a specific dataset.
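A feature space extension for the quadratic case might look like the sketch below: the original features are kept and all degree-2 monomials are appended, so a linear transform on the extended vector acts as a quadratic transform on the original one. Whether the project includes the cross terms is an assumption, and `quadratic_features` is a hypothetical name.

```python
from itertools import combinations_with_replacement


def quadratic_features(x):
    """Append all degree-2 monomials x_i * x_j (i <= j) to the original features."""
    return list(x) + [a * b for a, b in combinations_with_replacement(x, 2)]


print(quadratic_features([2.0, 3.0]))  # prints [2.0, 3.0, 4.0, 6.0, 9.0]
```

A d-dimensional point grows to d + d(d + 1)/2 features, which is the cost to weigh when tuning the transformation per dataset.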

Clustering Preprocessing:

  • Improving accuracy:
    Result: Clustering preprocessing increases accuracy on some datasets, but the time overhead is not worth it.
  • Compressing data:
    Result: Unlabeled data can be compressed to 1/5 or even 1/10 of its size with no significant loss of accuracy, which saves a lot of time when performing TransD.
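The compression idea can be sketched with plain k-means: the unlabeled points are replaced by cluster centroids, here at a 1/5 ratio. The clustering algorithm used in the project is not named in this text, so k-means, the function name, and the toy data are all assumptions.

```python
import math
import random


def kmeans_compress(points, n_clusters, n_iters=20, seed=0):
    """Replace `points` with `n_clusters` centroids (plain Lloyd iterations)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


unlabeled = [[float(i), float(i % 3)] for i in range(50)]
compressed = kmeans_compress(unlabeled, n_clusters=len(unlabeled) // 5)
print(len(unlabeled), "->", len(compressed))  # prints 50 -> 10
```

Because the pairwise step of TransD grows with the square of the number of points, a 5x reduction in unlabeled points cuts the number of pairs among them by roughly 25x.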

Randomness Adjustment:

Randomly return a class based on its weight. The randomness decreases after every iteration.
Result: not significant; more experiments are needed.
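One possible reading of this adjustment is sketched below: with a probability that shrinks geometrically per iteration, the class is sampled in proportion to its weight; otherwise the highest-weight class is returned. The schedule (`eps0`, `decay`) and the function name are hypothetical choices for illustration, not the project's settings.

```python
import random


def pick_class(weights, iteration, eps0=1.0, decay=0.5, rng=random):
    """Pick a class from `weights` (a dict mapping class -> non-negative weight).

    With probability eps0 * decay**iteration the class is sampled in proportion
    to its weight; otherwise the highest-weight class is returned.
    """
    eps = eps0 * decay ** iteration
    if rng.random() < eps:
        classes = list(weights)
        return rng.choices(classes, weights=[weights[c] for c in classes], k=1)[0]
    return max(weights, key=weights.get)


weights = {"a": 0.2, "b": 0.8}
early = pick_class(weights, iteration=0, rng=random.Random(1))  # always sampled
late = pick_class(weights, iteration=3, decay=0.0)              # never sampled
print(early, late)
```

In early iterations the choice is fully weight-driven, and it converges to the deterministic highest-weight class as the iteration count grows.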

Further Issues

  1. We need more experiments on big data.
  2. Further improve time and space complexity.
  3. Implement the algorithm on CUDA (run on GPU).
  4. Other ways to compress data: fewer data points but higher dimension?

Reference

Yuh-Jyh Hu, Min-Che Yu, Hsiang-An Wang, and Zih-Yun Ting, "A Similarity-Based Learning Algorithm Using Distance Transformation," IEEE TKDE, vol. 27, no. 6, June 2015.

Contributors

angusky, steven95421
