Whenever I run ML-Plan, it always returns a DummyClassifier or a GaussianNB when I call the getSelectedClassifier() method. I have tried two different budgets (1 and 10 min) and different seeds, but I always get the same result.
I have followed the installation instructions, although I had to modify the pom.xml file:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>ai.libs</groupId>
        <artifactId>hasco-core</artifactId>
        <version>0.2.4</version>
    </dependency>
    <dependency>
        <groupId>ai.libs</groupId>
        <artifactId>mlplan-sklearn</artifactId>
        <version>0.2.4</version>
    </dependency>
</dependencies>
I work with Eclipse and have created a simple project containing a single class:
package mlplan;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.concurrent.TimeUnit;

import org.api4.java.ai.ml.classification.singlelabel.evaluation.ISingleLabelClassification;
import org.api4.java.ai.ml.core.dataset.supervised.ILabeledDataset;
import org.api4.java.ai.ml.core.dataset.supervised.ILabeledInstance;
import org.api4.java.ai.ml.core.evaluation.IPrediction;
import org.api4.java.ai.ml.core.evaluation.IPredictionBatch;
import org.api4.java.ai.ml.core.evaluation.execution.ILearnerRunReport;
import org.api4.java.algorithm.Timeout;

import ai.libs.jaicore.ml.classification.loss.dataset.EClassificationPerformanceMeasure;
import ai.libs.jaicore.ml.core.dataset.serialization.ArffDatasetAdapter;
import ai.libs.jaicore.ml.core.evaluation.evaluator.SupervisedLearnerExecutor;
import ai.libs.jaicore.ml.scikitwrapper.ScikitLearnWrapper;
import ai.libs.mlplan.core.MLPlan;
import ai.libs.mlplan.sklearn.builder.MLPlanScikitLearnBuilder;

public class LaunchMLPlan
{
    public static void main(String[] args) throws Exception
    {
        String dataset = args[0];
        int seed = Integer.parseInt(args[1]);
        int budget = Integer.parseInt(args[2]);

        ILabeledDataset<ILabeledInstance> dTrain = ArffDatasetAdapter.readDataset(new File("datasets" + File.separator + dataset + File.separator + "train.arff"));
        ILabeledDataset<ILabeledInstance> dTest = ArffDatasetAdapter.readDataset(new File("datasets" + File.separator + dataset + File.separator + "test.arff"));

        // get the list of labels
        String labels = dTrain.getLabelAttribute().getStringDescriptionOfDomain();
        labels = labels.replace("[", "").replace("]", "").replace(" ", "");
        String[] labelList = labels.split(",");

        long start = System.currentTimeMillis();
        MLPlanScikitLearnBuilder builder = MLPlanScikitLearnBuilder.forClassification();
        // set the number of cores
        builder.withNumCpus(1);
        // set the seed
        builder.withSeed(seed);
        // set the global timeout of ML-Plan
        builder.withTimeOut(new Timeout(budget, TimeUnit.SECONDS));
        // set the timeout of a single solution candidate
        builder.withNodeEvaluationTimeOut(new Timeout(budget / 10, TimeUnit.SECONDS));
        builder.withCandidateEvaluationTimeOut(new Timeout(budget / 10, TimeUnit.SECONDS));
        builder.withSearchSpaceConfigFile(new File("./automl/searchmodels/sklearn/sklearn-classification.json"));
        System.out.println(builder.getSearchSpaceConfigFile());
        System.out.println(builder.getAlgorithmConfig());
        // ??
        builder.withPortionOfDataReservedForSelection(.0);
        //builder.withMCCVBasedCandidateEvaluationInSearchPhase(3, .8);

        // start the optimization process
        MLPlan<ScikitLearnWrapper<IPrediction, IPredictionBatch>> mlplan = builder.withDataset(dTrain).build();
        ScikitLearnWrapper<IPrediction, IPredictionBatch> classifier = mlplan.call();
        long end = System.currentTimeMillis();
        float sec = (end - start) / 1000F;

        BufferedWriter bw = new BufferedWriter(new FileWriter("runtime" + File.separator + dataset + "_" + seed + ".txt"));
        bw.write(sec + " seconds\n");
        bw.close();

        // show the resulting model and its performance
        SupervisedLearnerExecutor executor = new SupervisedLearnerExecutor();
        ILearnerRunReport report = executor.execute(classifier, dTest);
        System.out.println("Chosen model is: " + mlplan.getSelectedClassifier());
        System.out.println("Error Rate of the solution produced by ML-Plan: " +
            EClassificationPerformanceMeasure.ERRORRATE.loss(report.getPredictionDiffList().getCastedView(Integer.class, ISingleLabelClassification.class)));

        // use the resulting model for prediction and store it in a file
        bw = new BufferedWriter(new FileWriter("predictions" + File.separator + dataset + "_" + seed + ".csv"));
        bw.write("y_pred\n");
        for (IPrediction prediction : classifier.predict(dTest).getPredictions()) {
            bw.write(labelList[(int) prediction.getPrediction()] + "\n");
        }
        bw.close();
    }
}
It is worth noting that I have copied the .json files configuring the search space into my project directory, which is why I point to them explicitly: builder.withSearchSpaceConfigFile(new File("./automl/searchmodels/sklearn/sklearn-classification.json"));
I have also tried restricting the list of applicable classifiers to only the DecisionTreeClassifier, in which case the getSelectedClassifier() method does return such a classifier. However, the results it achieves are very bad (i.e. an error rate close to 0.8). I have written a simple Python script that trains a DecisionTree with the configuration returned by ML-Plan (DecisionTreeClassifier(criterion="gini", max_depth=6, min_samples_split=11, min_samples_leaf=11)) over the same data partition, and it obtains much better results.
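For reference, my comparison script is essentially the following sketch (shown here on scikit-learn's built-in iris data as a stand-in for the attached car ARFF partitions, which I load separately; the random_state is only added here for reproducibility and was not part of the configuration ML-Plan reported):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Exact hyperparameter configuration that ML-Plan reported
clf = DecisionTreeClassifier(criterion="gini", max_depth=6,
                             min_samples_split=11, min_samples_leaf=11,
                             random_state=0)

# Stand-in data; in my actual script I use the same train/test
# partition (train.arff / test.arff) that I pass to ML-Plan.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf.fit(X_tr, y_tr)
error_rate = 1.0 - clf.score(X_te, y_te)
print(f"error rate: {error_rate:.3f}")
```

On the car partitions this plain scikit-learn run ends up far below the ~0.8 error rate that the same configuration achieves inside ML-Plan.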
Am I misunderstanding something about the use of ML-Plan? Thank you in advance.
Train and test files:
car.zip