
starlibs / ailibs

A collection of Java libraries for basic AI algorithms (JAICore) and automated software composition, in particular AutoML (softwareconfiguration)

License: GNU Affero General Public License v3.0

Java 97.12% CSS 0.82% Python 0.51% TeX 0.05% HTML 1.31% Shell 0.04% Dockerfile 0.01% Stan 0.15% R 0.01%

Contributors

alexandertornede, aminfa, berberer, fmohr, helegraf, jkoepe, jonashanselle, julilien, mirkojuergens, mwever, spunkly, tornede


ailibs's Issues

ML-Plan always produces bad results

Whenever I run ML-Plan, it returns a DummyClassifier or a GaussianNB when I call the getSelectedClassifier() method. I have tried two different budgets (1 and 10 minutes) and different seeds, but I always get the same result.

I have followed the installation instructions, although I had to modify the pom.xml file:

  <repositories>
    <repository>
      <id>jitpack.io</id>
      <url>https://jitpack.io</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>ai.libs</groupId>
      <artifactId>hasco-core</artifactId>
      <version>0.2.4</version>
    </dependency>
    <dependency>
      <groupId>ai.libs</groupId>
      <artifactId>mlplan-sklearn</artifactId>
      <version>0.2.4</version>
    </dependency>
  </dependencies>

I work with Eclipse and have created a simple project containing a single class:

package mlplan;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.concurrent.TimeUnit;

import org.api4.java.ai.ml.classification.singlelabel.evaluation.ISingleLabelClassification;
import org.api4.java.ai.ml.core.dataset.supervised.ILabeledDataset;
import org.api4.java.ai.ml.core.dataset.supervised.ILabeledInstance;
import org.api4.java.ai.ml.core.evaluation.IPrediction;
import org.api4.java.ai.ml.core.evaluation.IPredictionBatch;
import org.api4.java.ai.ml.core.evaluation.execution.ILearnerRunReport;
import org.api4.java.algorithm.Timeout;

import ai.libs.jaicore.ml.classification.loss.dataset.EClassificationPerformanceMeasure;
import ai.libs.jaicore.ml.core.dataset.serialization.ArffDatasetAdapter;
import ai.libs.jaicore.ml.core.evaluation.evaluator.SupervisedLearnerExecutor;
import ai.libs.jaicore.ml.scikitwrapper.ScikitLearnWrapper;
import ai.libs.mlplan.core.MLPlan;
import ai.libs.mlplan.sklearn.builder.MLPlanScikitLearnBuilder;

public class LaunchMLPlan 
{	
	public static void main(String[] args) throws Exception
	{
		String dataset = args[0];
		int seed = Integer.parseInt(args[1]);
		int budget = Integer.parseInt(args[2]);
		
		ILabeledDataset <ILabeledInstance> dTrain = ArffDatasetAdapter.readDataset(new File("datasets" + File.separator + dataset + File.separator + "train.arff"));
		ILabeledDataset <ILabeledInstance> dTest = ArffDatasetAdapter.readDataset(new File("datasets" + File.separator + dataset + File.separator + "test.arff"));
		
		// get the list of labels
		String labels = dTrain.getLabelAttribute().getStringDescriptionOfDomain();
		labels = labels.replace("[", "").replace("]", "").replace(" ", "");
		String[] labelList = labels.split(",");
		
		long start = System.currentTimeMillis();
		MLPlanScikitLearnBuilder builder = MLPlanScikitLearnBuilder.forClassification();
		
		// set the number of cores
		builder.withNumCpus(1);
		// set the seed
		builder.withSeed(seed);
		// set the global timeout of ML-Plan
		builder.withTimeOut(new Timeout(budget, TimeUnit.SECONDS));
		// set the timeout of a single solution candidate
		builder.withNodeEvaluationTimeOut(new Timeout(budget/10, TimeUnit.SECONDS));
		builder.withCandidateEvaluationTimeOut(new Timeout(budget/10, TimeUnit.SECONDS));
		builder.withSearchSpaceConfigFile(new File("./automl/searchmodels/sklearn/sklearn-classification.json"));
		System.out.println(builder.getSearchSpaceConfigFile());
		System.out.println(builder.getAlgorithmConfig());
		
		// ??
		builder.withPortionOfDataReservedForSelection(.0);
		//builder.withMCCVBasedCandidateEvaluationInSearchPhase(3, .8);
		
		// start the optimization process
		MLPlan<ScikitLearnWrapper<IPrediction, IPredictionBatch>> mlplan = builder.withDataset(dTrain).build();
		ScikitLearnWrapper<IPrediction, IPredictionBatch> classifier = mlplan.call();				

		long end = System.currentTimeMillis();
		float sec = (end - start) / 1000F;
		BufferedWriter bw = new BufferedWriter(new FileWriter("runtime" + File.separator + dataset + "_" + seed + ".txt"));
		bw.write(sec + " seconds\n");
		bw.close();
		
		// show the resulting model and its performance
		SupervisedLearnerExecutor executor = new SupervisedLearnerExecutor();
		ILearnerRunReport report = executor.execute(classifier, dTest);
		
		System.out.println("Chosen model is: " + mlplan.getSelectedClassifier());
		System.out.println("Error Rate of the solution produced by ML-Plan: " +
				EClassificationPerformanceMeasure.ERRORRATE.loss(report.getPredictionDiffList().getCastedView(Integer.class, ISingleLabelClassification.class)));
		
		// use the resulting model for prediction and store it in a file
		bw = new BufferedWriter(new FileWriter("predictions" + File.separator + dataset + "_" + seed + ".csv"));
		bw.write("y_pred\n");
		for(IPrediction prediction: classifier.predict(dTest).getPredictions())
			bw.write(labelList[(int) prediction.getPrediction()] + "\n");
		bw.close();
	}
}

It is worth noting that I have copied the .json files configuring the search space into my project directory: builder.withSearchSpaceConfigFile(new File("./automl/searchmodels/sklearn/sklearn-classification.json"));

I have also tried restricting the list of applicable classifiers to only the DecisionTreeClassifier, which makes the getSelectedClassifier() method return such a classifier. However, the results it achieves are very bad (i.e., an error rate close to 0.8). I have written a simple Python script that trains a DecisionTree with the configuration returned by ML-Plan (DecisionTreeClassifier(criterion="gini", max_depth=6, min_samples_split=11, min_samples_leaf=11)) on the same data partition, and it returns much better results.

Am I misunderstanding something about the use of ML-Plan? Thank you in advance.

Train and test files:
car.zip
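
As an additional sanity check, one could also evaluate the selected pipeline on the training split, reusing the same executor calls as in the class above. This is only a sketch, assuming the variables classifier, executor, and dTrain from LaunchMLPlan are in scope:

// Sketch: lines that could be appended to the main method of LaunchMLPlan above.
// If the training error is also around 0.8, the selected pipeline itself is degenerate;
// if it is low, the gap points to the evaluation or selection setup rather than the learner.
ILearnerRunReport trainReport = executor.execute(classifier, dTrain);
System.out.println("Error rate on the training split: " +
		EClassificationPerformanceMeasure.ERRORRATE.loss(
				trainReport.getPredictionDiffList().getCastedView(Integer.class, ISingleLabelClassification.class)));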

Receiving TimerAlreadyCanceledException in TwoPhaseHASCO when running MLPlan

Observing this error when running MLPlan in cluster experiments:

	Error message: Timer already cancelled.
	Error trace:
		java.util.Timer.sched(Timer.java:397)
		java.util.Timer.scheduleAtFixedRate(Timer.java:328)
		ai.libs.jaicore.concurrent.TrackableTimer.scheduleAtFixedRate(TrackableTimer.java:135)
		ai.libs.hasco.twophase.TwoPhaseHASCO.nextWithException(TwoPhaseHASCO.java:195)
		ai.libs.jaicore.basic.algorithm.AOptimizer.call(AOptimizer.java:134)
		ai.libs.jaicore.components.optimizingfactory.OptimizingFactory.nextWithException(OptimizingFactory.java:63)
		ai.libs.jaicore.components.optimizingfactory.OptimizingFactory.call(OptimizingFactory.java:80)
		ai.libs.mlplan.core.MLPlan.nextWithException(MLPlan.java:258)
		ai.libs.mlplan.core.MLPlan.call(MLPlan.java:291)
		naiveautoml.experiments.NaiveAutoMLExperimentRunner.evaluate(NaiveAutoMLExperimentRunner.java:217)
		ai.libs.jaicore.experiments.ExperimentRunner.conductExperiment(ExperimentRunner.java:217)
		ai.libs.jaicore.experiments.ExperimentRunner.lambda$randomlyConductExperiments$0(ExperimentRunner.java:104)
		java.lang.Thread.run(Thread.java:748)

Logs show that this stack trace is immediately followed by an indication of memory overflow:

java.lang.OutOfMemoryError: Java heap space

One dataset where this occurred was the DNA dataset (https://www.openml.org/d/40670), using 24 GB of memory.

The following log output, directly preceding the exception, suggests that the error occurred while training a BayesNet:

2021-06-01 17:22:03.846 [ORGraphSearch-worker-1] INFO executor - Fitting the learner (class: ai.libs.mlplan.core.TimeTrackingLearnerWrapper) ai.libs.mlplan.core.TimeTrackingLearnerWrapper -
2021-06-01 17:23:03.691 [Global Timer] INFO InterruptionTimerTask - Executing interruption task 1293092700 with descriptor "Timeout for timed computation with thread Thread[ORGraphSearch-wo
2021-06-01 17:23:03.693 [Global Timer] INFO Interrupter - Interrupting Thread[ORGraphSearch-worker-1,5,main] on behalf of Thread[Global Timer,10,main] with reason InterruptionTimerTask [thr
2021-06-01 17:23:03.694 [Global Timer] INFO Interrupter - Interrupt accomplished. Interrupt flag of Thread[ORGraphSearch-worker-1,5,main]: true
2021-06-01 17:23:03.833 [Global Timer] INFO InterruptionTimerTask - Executing interruption task 1024325039 with descriptor "Timeout for timed computation with thread Thread[ORGraphSearch-wo
2021-06-01 17:23:03.834 [Global Timer] INFO Interrupter - Interrupting Thread[ORGraphSearch-worker-1,5,main] on behalf of Thread[Global Timer,10,main] with reason InterruptionTimerTask [thr
2021-06-01 17:23:03.835 [Global Timer] INFO Interrupter - Interrupt accomplished. Interrupt flag of Thread[ORGraphSearch-worker-1,5,main]: true

The question is really whether this can be avoided without spawning external processes.
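
For context, the external-process isolation mentioned above could look roughly like the following sketch. The class name, program arguments, and heap size are illustrative assumptions; the launched main class is the LaunchMLPlan example from the first issue, assumed to be on the classpath.

import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: run each ML-Plan experiment in a fresh JVM so that an
// OutOfMemoryError or a cancelled global timer only affects that single run.
public class IsolatedRunLauncher {
	public static void main(String[] args) throws Exception {
		List<String> cmd = new ArrayList<>();
		cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
		cmd.add("-Xmx24g"); // per-run heap limit, matching the 24 GB cluster setup mentioned above
		cmd.add("-cp");
		cmd.add(System.getProperty("java.class.path"));
		cmd.add("mlplan.LaunchMLPlan"); // main class from the first issue (assumed to be on the classpath)
		cmd.add("dna");  // dataset (placeholder)
		cmd.add("42");   // seed (placeholder)
		cmd.add("3600"); // budget (placeholder)

		Process p = new ProcessBuilder(cmd).inheritIO().start();
		int exitCode = p.waitFor();
		System.out.println("Run finished with exit code " + exitCode);
	}
}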

documentation

Hello,

After reading the published paper "ML-Plan: Automated machine learning via hierarchical planning", I find this project very interesting. It is always good to have other approaches to AutoML, since TPOT and auto-ML are currently the only options.

Is there any documentation, in particular on how to build it?
I cannot find any documentation in this repository or the former one.

Thanks in advance
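
For readers who mainly want to use the library rather than build it from source, a minimal usage sketch can be distilled from the builder API that appears in the first issue above; the dataset path, timeout, and CPU count below are placeholders.

// Minimal usage sketch (imports as in the LaunchMLPlan example in the first issue).
ILabeledDataset<ILabeledInstance> data = ArffDatasetAdapter.readDataset(new File("data/train.arff"));

MLPlanScikitLearnBuilder builder = MLPlanScikitLearnBuilder.forClassification();
builder.withNumCpus(4);
builder.withTimeOut(new Timeout(300, TimeUnit.SECONDS));

MLPlan<ScikitLearnWrapper<IPrediction, IPredictionBatch>> mlplan = builder.withDataset(data).build();
ScikitLearnWrapper<IPrediction, IPredictionBatch> model = mlplan.call();
System.out.println("Selected pipeline: " + mlplan.getSelectedClassifier());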

Help needed with getting MLPlan running

I created a simple Java Maven project in IntelliJ IDEA and added the following dependency to the pom.xml:

<dependency>
  <groupId>ai.libs</groupId>
  <artifactId>mlplan-full</artifactId>
  <version>0.2.5</version>
</dependency>

However, when trying to run the project, the following error arises:

[screenshot of the error message]

Could you give any guidance on how to proceed further?

I'm using Java 17.0.6 and Maven 3.8.8.
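
Since the error itself is only shown as a screenshot, one way to narrow it down is to check whether the artifact resolves at all, independent of ML-Plan's runtime requirements. The following is a hypothetical smoke test; the class name and setup are assumptions and not part of the original report.

// Hypothetical smoke test: if this compiles and prints the class name, the
// mlplan-full dependency resolved correctly and the failure happens later at
// runtime (e.g., a Java-version or configuration problem).
public class DependencyCheck {
	public static void main(String[] args) {
		System.out.println(ai.libs.mlplan.core.MLPlan.class.getName());
	}
}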
