ai-se / e-dom

Epsilon domination

Languages: Python 17.79%, Shell 0.26%, OpenEdge ABL 29.90%, HTML 0.34%, CSS 0.44%, Scilab 50.27%, Common Lisp 1.00%

Topics: hyperparameter-optimization, hyperparameter-tuning, optimization, tuning, defect-prediction, classification, sbse, software-engineering, fft, genetic-algorithm

e-dom's Introduction

e-dom

epsilon domination
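For readers new to the idea: under epsilon domination, one candidate dominates another only when it is better by more than some tolerance ε, so differences smaller than ε are treated as noise. The README does not spell out the definition, so the additive form below is an assumption. A minimal sketch, for objectives that are minimized (e.g. d2h):

```python
def eps_dominates(a, b, eps=0.05):
    """True if objective vector `a` epsilon-dominates `b`.

    Additive form, all objectives minimized: `a` must be at least as
    good as `b` within an eps band on every objective, and strictly
    better than `b` by more than eps on at least one.
    """
    return (all(ai <= bi + eps for ai, bi in zip(a, b))
            and any(ai + eps < bi for ai, bi in zip(a, b)))
```

With eps=0.05, a score of 0.30 does not dominate 0.32: the 0.02 gap is inside the noise band.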

e-dom's People

Contributors: amritbhanu

e-dom's Issues

parameter settings

  • 20 repeats, only for the d2h measure
  • The quantile transformation is the only preprocessor picked a notably large number of times.

Learner

(figure: learner)

Preprocess

(figure: preprocess)

Results: Popt (higher is better)

Summary

  • Y-axis: the maximum Popt value achieved at each iteration
  • X-axis: 1000 of the possible subtrees, chosen by taking the lowest counter value each time; there were 85,000 possible subtrees in total.

Conclusion:

  • Epsilon domination exists: the curves flatline sooner for ε = 0.025 and 0.05 than for 0.1 and 0.2.
  • This means we do not get much further improvement from any remaining combination (any further subtrees).

Per-dataset result files: camel, jedit, poi, log4j, synapse, velocity, xalan, xerces, ivy, lucene.

Samples Needed

  • Based on the formula, these are the sample counts needed across different confidence and epsilon values.

(figure: baseline)
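The formula itself is not reproduced here. A standard choice in this line of work (an assumption on my part, not confirmed by the README) is the probabilistic bound: to land at least one random sample in the best ε fraction of the space with confidence c, you need n ≥ log(1−c) / log(1−ε) samples.

```python
import math

def samples_needed(confidence, eps):
    """Smallest n with 1 - (1 - eps)**n >= confidence, i.e. the number
    of random samples needed so that, with the given confidence, at
    least one lands in the best `eps` fraction of the search space."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - eps))

for c in (0.95, 0.99):
    for e in (0.025, 0.05, 0.1, 0.2):
        print(f"confidence={c} eps={e}: {samples_needed(c, e)} samples")
```

For example, confidence 0.95 with ε = 0.05 needs 59 samples; tightening ε to 0.025 roughly doubles that.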

What we are achieving:

  • The size of epsilon is larger for popt20 than for d2h.

Popt20

  • When does popt20 start to plateau at the maximum score achieved?

(figure: evals_popt20)

d2h

  • When does d2h start to plateau at the minimum score achieved?

(figure: evals_d2h)
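A plateau point like the ones asked about above can be located mechanically: find the last evaluation at which the running best improved by more than ε. A sketch for a maximized score such as popt20 (flip the comparisons for a minimized one such as d2h); the function name is mine, not the repo's:

```python
def plateau_start(scores, eps=0.05):
    """Index of the last evaluation whose improvement over the running
    best exceeded `eps`; after this index the curve has flatlined
    (maximization). `scores` is the score observed at each evaluation."""
    best = scores[0]
    last_jump = 0
    for i, s in enumerate(scores[1:], start=1):
        if s > best + eps:        # improvement larger than the eps band
            best, last_jump = s, i
        elif s > best:            # tiny improvement, stays inside the band
            best = s
    return last_jump
```

On [0.1, 0.3, 0.31, 0.32] with ε = 0.05 the plateau starts at evaluation 1: the later gains of 0.01 each fall inside the band.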

Results

Summary

  • Y-axis: the maximum AUC value achieved at each iteration
  • X-axis: 1000 of the possible subtrees, chosen by taking the lowest counter value each time; there were 85,000 possible subtrees in total.

Conclusion:

  • Epsilon domination exists: the curves flatline sooner for ε = 0.025 and 0.05 than for 0.1 and 0.2.
  • This means we do not get much further improvement from any remaining combination (any further subtrees).

Per-dataset result files: ivy, log4j, synapse, velocity.

Options being explored

Transformations

  • StandardScaler
  • MinMaxScaler
  • MaxAbsScaler
  • RobustScaler(quantile_range=(a, b))
    • a,b=_randint(0,50),_randint(51,100)
  • KernelCenterer
  • QuantileTransformer(n_quantiles=a, output_distribution=c, subsample=b)
    • a, b = _randint(100, 1000), _randint(1000, 1e5)
    • c=_randchoice(['normal','uniform'])
  • Normalizer(norm=a)
    • a = _randchoice(['l1', 'l2','max'])
  • Binarizer(threshold=a)
    • a=_randuniform(0,100)

Learners

  • DecisionTreeClassifier(criterion=b, splitter=c, min_samples_split=a)
    • a=_randuniform(0.0,1.0)
    • b=_randchoice(['gini','entropy'])
    • c=_randchoice(['best','random'])
  • RandomForestClassifier(n_estimators=a,criterion=b,min_samples_split=c)
    • a = _randint(50, 150)
    • b = _randchoice(['gini', 'entropy'])
    • c = _randuniform(0.0, 1.0)
  • LogisticRegression(penalty=a, tol=b, C=float(c), solver='liblinear')
    • a=_randchoice(['l1','l2'])
    • b=_randuniform(0.0,0.1)
    • c=_randint(1,500)
  • MultinomialNB(alpha=a)
    • a=_randuniform(0.0,0.1)
  • KNeighborsClassifier(n_neighbors=a, weights=b, p=d, metric=c)
    • a = _randint(2, 25)
    • b = _randchoice(['uniform', 'distance'])
    • c = _randchoice(['minkowski','chebyshev'])
    • d = _randint(1, 15) if c == 'minkowski' else 2
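The option lists above amount to a random configuration sampler. A sketch of that sampling, showing a subset of the preprocessors and only the KNeighborsClassifier learner (whose p parameter is conditional on the chosen metric); configs are returned as (name, kwargs) pairs that would be passed to the corresponding sklearn constructors, rather than constructing the objects here:

```python
import random

def _randint(lo, hi):     return random.randint(lo, hi)
def _randuniform(lo, hi): return random.uniform(lo, hi)
def _randchoice(opts):    return random.choice(opts)

def sample_config():
    """Draw one random (preprocessor, learner) configuration pair."""
    pre = _randchoice(['QuantileTransformer', 'Normalizer', 'Binarizer'])
    if pre == 'QuantileTransformer':
        pre_kw = dict(n_quantiles=_randint(100, 1000),
                      output_distribution=_randchoice(['normal', 'uniform']),
                      subsample=_randint(1000, 100000))
    elif pre == 'Normalizer':
        pre_kw = dict(norm=_randchoice(['l1', 'l2', 'max']))
    else:
        pre_kw = dict(threshold=_randuniform(0, 100))

    metric = _randchoice(['minkowski', 'chebyshev'])
    knn_kw = dict(n_neighbors=_randint(2, 25),
                  weights=_randchoice(['uniform', 'distance']),
                  metric=metric,
                  # p only matters under minkowski; otherwise fixed at 2
                  p=_randint(1, 15) if metric == 'minkowski' else 2)
    return (pre, pre_kw), ('KNeighborsClassifier', knn_kw)
```

The other learners and transformations follow the same pattern: each hyperparameter is drawn independently from the range listed for it.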

FFT against Tabu

  • Tabu search's median over 20 repeats achieves the maximum popt20
  • Tabu search's median over 20 repeats achieves the minimum d2h

Max Popt

(figure: median of max_popt)

Min d2h

(figure: median of min_d2h)

Multi-Goal

3 goals, c-dom:
Defect prediction: IFA, or David Lo's measures
Text mining: runtimes

Flash Defect Prediction D2h Results

  • Ran with decision-tree tuning only: initial population of 12, budget of 30, total population size of 10,000.
  • Cannot run with the search space that DODGE was exploring, because the attributes fed to the CART regressor change depending on which learner or preprocessor is in use.
  • Results:
    • Flash never wins against DODGE, but performs as well as DODGE on 5 out of 10 datasets.

(figure: flash_d2h)
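The FLASH loop described above (evaluate a small random initial population, then repeatedly fit a CART surrogate and evaluate whichever pool member it predicts to be best) can be sketched as follows. This is an assumed reconstruction, not the repo's code: the helper names are mine, and the objective is taken to be minimized (as d2h is).

```python
import random
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def flash(pool, evaluate, init=12, budget=30):
    """FLASH-style sequential model-based search (a sketch).

    pool     -- candidate configs, each a numeric feature vector
    evaluate -- the true (expensive) objective, lower is better
    Spends `init` evaluations on random configs, then the remaining
    budget on configs a CART surrogate predicts to be best.
    """
    pool = list(pool)
    random.shuffle(pool)
    seen, rest = pool[:init], pool[init:]
    scores = [evaluate(x) for x in seen]
    for _ in range(budget - init):
        cart = DecisionTreeRegressor().fit(np.array(seen), scores)
        preds = cart.predict(np.array(rest))
        i = int(np.argmin(preds))       # most promising unevaluated config
        seen.append(rest.pop(i))
        scores.append(evaluate(seen[-1]))
    best = int(np.argmin(scores))
    return seen[best], scores[best]
```

The point of the design is evaluation thrift: the surrogate is cheap to refit, so the expensive objective is called exactly `budget` times no matter how large the pool is.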

Bad Smell

  • Baselines: Random Forest, FFT

Goal: d2h
