Whenever I run ML-Plan, it always returns a DummyClassifier or a GaussianNB when I call the getSelectedClassifier() method. I have tried two different budgets (1 and 10 min) and different seeds, but I always get the same result.
I have followed the installation instructions, although I had to modify the pom.xml file:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>ai.libs</groupId>
        <artifactId>hasco-core</artifactId>
        <version>0.2.4</version>
    </dependency>
    <dependency>
        <groupId>ai.libs</groupId>
        <artifactId>mlplan-sklearn</artifactId>
        <version>0.2.4</version>
    </dependency>
</dependencies>
I work with Eclipse and have created a simple project containing a single class:
package mlplan;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.concurrent.TimeUnit;

import org.api4.java.ai.ml.classification.singlelabel.evaluation.ISingleLabelClassification;
import org.api4.java.ai.ml.core.dataset.supervised.ILabeledDataset;
import org.api4.java.ai.ml.core.dataset.supervised.ILabeledInstance;
import org.api4.java.ai.ml.core.evaluation.IPrediction;
import org.api4.java.ai.ml.core.evaluation.IPredictionBatch;
import org.api4.java.ai.ml.core.evaluation.execution.ILearnerRunReport;
import org.api4.java.algorithm.Timeout;

import ai.libs.jaicore.ml.classification.loss.dataset.EClassificationPerformanceMeasure;
import ai.libs.jaicore.ml.core.dataset.serialization.ArffDatasetAdapter;
import ai.libs.jaicore.ml.core.evaluation.evaluator.SupervisedLearnerExecutor;
import ai.libs.jaicore.ml.scikitwrapper.ScikitLearnWrapper;
import ai.libs.mlplan.core.MLPlan;
import ai.libs.mlplan.sklearn.builder.MLPlanScikitLearnBuilder;

public class LaunchMLPlan
{
    public static void main(String[] args) throws Exception
    {
        String dataset = args[0];
        int seed = Integer.parseInt(args[1]);
        int budget = Integer.parseInt(args[2]);

        ILabeledDataset<ILabeledInstance> dTrain = ArffDatasetAdapter.readDataset(new File("datasets" + File.separator + dataset + File.separator + "train.arff"));
        ILabeledDataset<ILabeledInstance> dTest = ArffDatasetAdapter.readDataset(new File("datasets" + File.separator + dataset + File.separator + "test.arff"));

        // get the list of labels
        String labels = dTrain.getLabelAttribute().getStringDescriptionOfDomain();
        labels = labels.replace("[", "").replace("]", "").replace(" ", "");
        String[] labelList = labels.split(",");

        long start = System.currentTimeMillis();
        MLPlanScikitLearnBuilder builder = MLPlanScikitLearnBuilder.forClassification();
        // set the number of cores
        builder.withNumCpus(1);
        // set the seed
        builder.withSeed(seed);
        // set the global timeout of ML-Plan
        builder.withTimeOut(new Timeout(budget, TimeUnit.SECONDS));
        // set the timeout of a single solution candidate
        builder.withNodeEvaluationTimeOut(new Timeout(budget / 10, TimeUnit.SECONDS));
        builder.withCandidateEvaluationTimeOut(new Timeout(budget / 10, TimeUnit.SECONDS));
        builder.withSearchSpaceConfigFile(new File("./automl/searchmodels/sklearn/sklearn-classification.json"));
        System.out.println(builder.getSearchSpaceConfigFile());
        System.out.println(builder.getAlgorithmConfig());
        // ??
        builder.withPortionOfDataReservedForSelection(.0);
        //builder.withMCCVBasedCandidateEvaluationInSearchPhase(3, .8);

        // start the optimization process
        MLPlan<ScikitLearnWrapper<IPrediction, IPredictionBatch>> mlplan = builder.withDataset(dTrain).build();
        ScikitLearnWrapper<IPrediction, IPredictionBatch> classifier = mlplan.call();
        long end = System.currentTimeMillis();
        float sec = (end - start) / 1000F;

        BufferedWriter bw = new BufferedWriter(new FileWriter("runtime" + File.separator + dataset + "_" + seed + ".txt"));
        bw.write(sec + " seconds\n");
        bw.close();

        // show the resulting model and its performance
        SupervisedLearnerExecutor executor = new SupervisedLearnerExecutor();
        ILearnerRunReport report = executor.execute(classifier, dTest);
        System.out.println("Chosen model is: " + mlplan.getSelectedClassifier());
        System.out.println("Error Rate of the solution produced by ML-Plan: " +
            EClassificationPerformanceMeasure.ERRORRATE.loss(report.getPredictionDiffList().getCastedView(Integer.class, ISingleLabelClassification.class)));

        // use the resulting model for prediction and store it in a file
        bw = new BufferedWriter(new FileWriter("predictions" + File.separator + dataset + "_" + seed + ".csv"));
        bw.write("y_pred\n");
        for (IPrediction prediction : classifier.predict(dTest).getPredictions()) {
            bw.write(labelList[(int) prediction.getPrediction()] + "\n");
        }
        bw.close();
    }
}
It is worth noting that I have copied the .json files configuring the search space into my project directory, which is why I point to them explicitly: builder.withSearchSpaceConfigFile(new File("./automl/searchmodels/sklearn/sklearn-classification.json"));
I have also tried restricting the list of applicable classifiers to only the DecisionTreeClassifier, in which case the getSelectedClassifier() method does return such a classifier. However, the results it achieves are very bad (i.e. an error rate close to 0.8). I have written a simple Python script that trains a DecisionTree with the configuration returned by ML-Plan (DecisionTreeClassifier(criterion="gini", max_depth=6, min_samples_split=11, min_samples_leaf=11)) over the same data partition, and it obtains much better results.
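For reference, my comparison script is essentially the following sketch (shown here on scikit-learn's built-in iris data as a stand-in for the attached car ARFF partitions, which I load separately; the random_state is only added here for reproducibility and was not part of the configuration ML-Plan reported):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Exact hyperparameter configuration that ML-Plan reported
clf = DecisionTreeClassifier(criterion="gini", max_depth=6,
                             min_samples_split=11, min_samples_leaf=11,
                             random_state=0)

# Stand-in data; in my actual script I use the same train/test
# partition (train.arff / test.arff) that I pass to ML-Plan.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf.fit(X_tr, y_tr)
error_rate = 1.0 - clf.score(X_te, y_te)
print(f"error rate: {error_rate:.3f}")
```

On the car partitions this plain scikit-learn run ends up far below the ~0.8 error rate that the same configuration achieves inside ML-Plan.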
Am I misunderstanding something about the use of ML-Plan? Thank you in advance.
Train and test files:
car.zip