⚡ SRLBoost: BoostSRL but ¼ the size and twice the speed. Refactor for srlearn's core.

License: GNU General Public License v3.0

Java 100.00%
statistical-relational-learning relational-reasoning relational-dependency-network markov-logic-network mln


SRLBoost

A package for learning Statistical Relational Models with Gradient Boosting, forked for use as srlearn's core.

It's basically BoostSRL but half the size and significantly faster.

Graph comparing the number of lines of code in each fork: BoostSRL, BoostSRL-Lite, and SRLBoost. SRLBoost is about half the size of BoostSRL.

Graphs at commit cb952a4, measured with cloc-1.84.

How much faster?

Box plots comparing the RDN learning time with SRLBoost, BoostSRL-Lite, and BoostSRL 1.1.1

(Smaller numbers are better.)

This box plot compares the learning time (in seconds) for three data sets and three implementations of learning relational dependency networks. BoostSRL-Lite was built from the repository on GitHub, and BoostSRL_v1.1.1 is the latest official release.

Each data set included 4–5 cross-validation folds, and results were averaged over 10 runs. These results suggest that SRLBoost is at least twice as fast as the other implementations.

With some parameter tuning we have sped this up even further.

Box plots comparing learning time on the cora data set

The tiny bar on the left shows the average SRLBoost time for Cora is around 17 seconds, compared to around 4.5 minutes for BoostSRL-Lite and BoostSRL (that's more like 15x faster).

However, on Cora this does lead to slightly degraded performance in AUC ROC, AUC PR, and conditional log likelihood (CLL); shown in the table below.

| Implementation  | mean AUC ROC | mean AUC PR | mean CLL | mean F1 |
|-----------------|--------------|-------------|----------|---------|
| SRLBoost        | 0.61         | 0.93        | -0.27    | 0.96    |
| BoostSRL-Lite   | 0.65         | 0.94        | -0.29    | 0.96    |
| BoostSRL v1.1.1 | 0.65         | 0.94        | -0.29    | 0.78    |

[Measurements used to produce this table are available online (three_jar_comparison.csv)]

A main aim for this project is to be a faster library. We have made the faster parameters the defaults, and intend to expose them as tunable options for cases where slower, more effective learning is critical.


Getting Started

SRLBoost's project structure still closely mirrors the other implementations.

We're using Gradle to help with building and testing, targeting Java 8.

Windows Quickstart

  1. Open Windows Terminal in Administrator mode, and use Chocolatey (or your preferred package manager) to install a Java Development Kit.
choco install openjdk
  2. Clone and build the package.
git clone https://github.com/srlearn/SRLBoost.git
cd .\SRLBoost\
.\gradlew build
  3. Learn with a basic data set (switching the X.Y.Z):
java -jar .\build\libs\srlboost-X.Y.Z.jar -l -train .\data\Toy-Cancer\train\ -target cancer
  4. Query the model on the test set (again, switching the X.Y.Z):
java -jar .\build\libs\srlboost-X.Y.Z.jar -i -model .\data\Toy-Cancer\train\models\ -test .\data\Toy-Cancer\test\ -target cancer

MacOS / Linux

  1. Open your terminal (macOS: ⌘ + Space, then type "Terminal"), and use Homebrew to install a Java Development Kit. (On Linux: use apt, dnf, or yum depending on your Linux flavor.)
brew install openjdk
  2. Clone and build the package.
git clone https://github.com/srlearn/SRLBoost.git
cd SRLBoost/
./gradlew build
  3. Run a basic example (switching the X.Y.Z):
java -jar build/libs/srlboost-X.Y.Z.jar -l -train data/Toy-Cancer/train/ -target cancer
  4. Query the model on the test set (again, switching the X.Y.Z):
java -jar build/libs/srlboost-X.Y.Z.jar -i -model data/Toy-Cancer/train/models/ -test data/Toy-Cancer/test/ -target cancer

srlboost's People

Contributors: dependabot[bot], hayesall, lgtm-migrator

srlboost's Issues

Removing File-IO

Almost all file input-output traces through a few classes:

  • will.Utils.condor.CondorFile
  • will.Utils.NamedReader
  • will.Utils.NamedInputStream
  • BufferedReader
  • BufferedWriter

Removing the dependence on file I/O means we no longer have to simulate an operating system underneath the code, and will make this much easier to test.
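One way to approach this is to have parsing code accept any java.io.Reader rather than opening files directly, so tests can hand in a StringReader and never touch the filesystem. A minimal sketch (InMemorySource and readLines are illustrative names, not existing SRLBoost classes):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accept any Reader instead of a file path, so callers
// can supply a StringReader (in memory) or a FileReader (on disk) as needed.
public class InMemorySource {

    // Reads logical lines from any Reader source.
    public static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory "facts file" -- no CondorFile, no temp directory.
        List<String> facts = readLines(
                new StringReader("friends(alice,bob).\nsmokes(alice)."));
        System.out.println(facts.size()); // 2
    }
}
```

Code written against Reader is trivially testable; only a thin adapter at the edge of the system needs to know about actual files.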

Deprecate precomputes

On the road to: #28

Precomputes are really neat, but they are basically a data preprocessing step that creates new facts using background knowledge about the domain and a bit of Prolog.

Deprecate `.gz` files

Part of removing file-io #28

The main time the .gz compression step occurs is when there are a large number of precomputes. It should be pretty straightforward to trace out uses of them with something like git grep '.gz'

In-memory Modes

Good first part to #28 : Extract a small API to set the modes directly in Java instead of loading them from a file every time.


Running something like this:

package edu.wisc.cs.will;

import edu.wisc.cs.will.Boosting.Common.RunBoostedModels;

public class RunToyCancer
{
    public void runToyCancerLearnInfer() {
        // Learn a boosted model for the `cancer` target...
        String[] trainArgs = {"-l", "-train", "/toy_cancer/train/", "-target", "cancer", "-trees", "10"};
        RunBoostedModels.main(trainArgs);

        // ...then run inference with the learned model on the test fold.
        String[] testArgs = {"-i", "-model", "/toy_cancer/train/models/", "-test", "/toy_cancer/test/", "-target", "cancer"};
        RunBoostedModels.main(testArgs);
    }
    }
}

This passes a list of _pos, _neg, _facts, and _bk files between objects; these eventually end up as buffered readers in ILPOuterLoop and LearnOneClause.

Start with something like this:

public void runTC()
{
  String newline = System.getProperty("line.separator");
  String localModes = String.join(
          newline,
          "usePrologVariables: true.",
          "mode: friends(+person,-person).",
          "mode: friends(-person,+person).",
          "mode: smokes(+person).",
          "mode: cancer(+person)."
  );

  // Fill
  // String[] trainArgs = {""};

  RunBoostedModels.newMain(trainArgs, localModes);
}
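A rough sketch of what the mode-handling half of the proposed newMain could look like. All names here are hypothetical (the newMain API does not exist yet); the point is only that "mode:" lines can be parsed from an in-memory string exactly as they would be from a file:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parse modes out of an in-memory string instead of a
// modes file. InMemoryModes and parseModes are illustrative names only.
public class InMemoryModes {

    // Extracts "mode: ..." declarations, skipping settings lines like
    // "usePrologVariables: true." and stripping the trailing period.
    public static List<String> parseModes(String modesText) {
        List<String> modes = new ArrayList<>();
        for (String line : modesText.split("\\R")) {
            line = line.trim();
            if (line.startsWith("mode:")) {
                modes.add(line.substring("mode:".length())
                              .replaceAll("\\.$", "")
                              .trim());
            }
        }
        return modes;
    }

    public static void main(String[] args) {
        String localModes = String.join(System.lineSeparator(),
                "usePrologVariables: true.",
                "mode: friends(+person,-person).",
                "mode: smokes(+person).");
        System.out.println(parseModes(localModes));
    }
}
```

With something like this in place, a newMain(trainArgs, localModes) overload could forward the parsed modes to the setup code and bypass the file-reading path entirely.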

Incorrect steps saved with successive model checkpointing

Running from a checkpoint doesn't properly save the stepsize array to the resulting model file.

For example, learning and checkpointing three times:

-trees 3 // Checkpoint the model
-trees 4 // Checkpoint the model
-trees 5 // Checkpoint the model

Results in:

5
cancer
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
-1.8
cancer

i.e., a merge occurs instead of an override:

# Length 3 + Length 4 + Length 5
[1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]
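The difference between the buggy merge and the intended override can be sketched as follows (a hypothetical helper, not actual SRLBoost code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrates the checkpointing bug: successive stepsize arrays should
// replace each other in the model file, not be concatenated.
public class CheckpointStepsizes {

    // Buggy behavior: arrays from each checkpoint are concatenated,
    // so 3 + 4 + 5 trees yields a 12-entry stepsize array.
    public static List<Double> merge(List<List<Double>> checkpoints) {
        List<Double> out = new ArrayList<>();
        for (List<Double> checkpoint : checkpoints) {
            out.addAll(checkpoint);
        }
        return out;
    }

    // Intended behavior: only the latest checkpoint's array survives,
    // so a 5-tree model stores exactly 5 stepsizes.
    public static List<Double> override(List<List<Double>> checkpoints) {
        return new ArrayList<>(checkpoints.get(checkpoints.size() - 1));
    }

    public static void main(String[] args) {
        List<List<Double>> checkpoints = Arrays.asList(
                Arrays.asList(1.0, 1.0, 1.0),
                Arrays.asList(1.0, 1.0, 1.0, 1.0),
                Arrays.asList(1.0, 1.0, 1.0, 1.0, 1.0));
        System.out.println(merge(checkpoints).size());    // 12 (the bug)
        System.out.println(override(checkpoints).size()); // 5 (expected)
    }
}
```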

Regression setting raises an exception

See discussion in srlearn/srlearn#106

This should be reproducible with the Boston Housing dataset.

Exception in thread "main" edu.wisc.cs.will.Utils.WILLthrownError: 
 Probability greater than 1!!: 34.15294117647059
	at edu.wisc.cs.will.Utils.Utils.error(Utils.java:263)
	at edu.wisc.cs.will.Utils.ProbDistribution.setProbOfBeingTrue(ProbDistribution.java:110)
	at edu.wisc.cs.will.Utils.ProbDistribution.<init>(ProbDistribution.java:20)
	at edu.wisc.cs.will.Boosting.Regression.RegressionTreeInference.getExampleProbability(RegressionTreeInference.java:34)
	at edu.wisc.cs.will.Boosting.Common.SRLInference.setExampleProbability(SRLInference.java:38)
	at edu.wisc.cs.will.Boosting.Common.SRLInference.getProbabilities(SRLInference.java:47)
	at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.buildDataSet(LearnBoostedRDN.java:411)
	at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.learnRDN(LearnBoostedRDN.java:168)
	at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.learnNextModel(LearnBoostedRDN.java:74)
	at edu.wisc.cs.will.Boosting.Regression.RunBoostedRegressionTrees.learn(RunBoostedRegressionTrees.java:40)
	at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.learnModel(RunBoostedModels.java:64)
	at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.runJob(RunBoostedModels.java:45)
	at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.main(RunBoostedModels.java:196)

Remove AUCCalculator

AUCCalculator was distributed without a license, so shipping and invoking it, as below, probably isn't valid under any interpretation of free/open-source software that I'm aware of:

Utils.println("\n% Running command: " + command); // See http://mark.goadrich.com/programs/AUC/
Process p = Runtime.getRuntime().exec(command);
InputStream is = p.getInputStream();

There's been a comment to rewrite it for some time:

* TODO Write our own code OR get the source code for the JAR to compute AUC
* @author Tushar Khot

Additionally, removing it would fix the LGTM warning about external code execution.
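For reference, a license-clean AUC ROC fits in a few dozen lines using the rank-sum (Mann-Whitney U) formulation, with ties handled by average ranks. The sketch below is one possible starting point for a replacement, not the project's actual code:

```java
import java.util.Arrays;

// Minimal, dependency-free AUC-ROC via the rank-sum (Mann-Whitney U)
// statistic. Illustrative sketch only -- not SRLBoost's implementation.
public class Auc {

    public static double rocAuc(double[] scores, boolean[] labels) {
        // Sort example indices by ascending score.
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[a], scores[b]));

        // Assign 1-based ranks, averaging over tied scores.
        double[] ranks = new double[scores.length];
        int i = 0;
        while (i < order.length) {
            int j = i;
            while (j + 1 < order.length
                    && scores[order[j + 1]] == scores[order[i]]) j++;
            double avgRank = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) ranks[order[k]] = avgRank;
            i = j + 1;
        }

        // U statistic normalized by the number of (positive, negative) pairs.
        double posRankSum = 0.0;
        int pos = 0;
        for (int k = 0; k < labels.length; k++) {
            if (labels[k]) { posRankSum += ranks[k]; pos++; }
        }
        int neg = labels.length - pos;
        return (posRankSum - pos * (pos + 1) / 2.0) / ((double) pos * neg);
    }

    public static void main(String[] args) {
        // Perfectly separated scores give AUC 1.0.
        System.out.println(rocAuc(new double[]{0.9, 0.8, 0.3, 0.1},
                                  new boolean[]{true, true, false, false}));
    }
}
```

AUC PR needs a separate interpolation-aware routine, but the same "sort once, scan once" structure applies.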
