wolfe-pack / wolfe

Wolfe Language and Engine

Home Page: https://wolfe-pack.github.io/wolfe

License: Apache License 2.0

Scala 66.74% Shell 0.04% HTML 5.38% JavaScript 14.68% CSS 13.16%

wolfe's Introduction

Please note that Wolfe is in a very early alpha stage, so use it at your own risk.

Installation

The easiest way to start a wolfe project is via g8:

g8 wolfe-pack/wolfe

If you want to incorporate wolfe into an existing sbt project, add to your build file:

resolvers ++= Seq(
  "Wolfe Release" at "http://homeniscient.cs.ucl.ac.uk:8081/nexus/content/repositories/releases",
  "Wolfe Snapshots" at "http://homeniscient.cs.ucl.ac.uk:8081/nexus/content/repositories/snapshots")

libraryDependencies ++= Seq(
  "ml.wolfe" %% "wolfe-core" % "0.5.0",
  "ml.wolfe" %% "wolfe-examples" % "0.5.0"
)

Extending Wolfe

To extend wolfe first clone this repository

git clone git@github.com:wolfe-pack/wolfe.git

Since sbt support has been integrated into IntelliJ IDEA as of version 13, simply importing wolfe as a new project in IntelliJ IDEA resolves all dependencies. IntelliJ IDEA also refreshes the project automatically when Build.scala changes.

wolfe's Issues

MLN transformation

Experiment Checkpointing

Messagepassing with ILP

Implement an ILP solving technique based on the MP graph

Decide on package, group and artifact name

Assuming that we can get either "wolfe.cc" or "wolfe-pack.org" as domain...
Candidates:
group: org.wolfe_pack; artifact: wolfe; package: org.wolfe_pack.wolfe (or something "inconsistent" like org.wolfepack.wolfe)
group: cc.wolfe; artifact: wolfe; package: cc.wolfe.wolfe (or something "inconsistent" like cc.wolfe)
More?

Inject benchmarking code in generated code

We could generally benchmark all training and inference calls. Clients could opt-in to automatically uploading benchmark data to our benchmark gathering store.
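
As a rough illustration of what injected benchmarking code might do, the sketch below wraps a call, measures wall-clock time, and keeps the record in memory. The names `BenchmarkSketch`, `Timing`, and `timed` are hypothetical and not part of Wolfe; uploading to a benchmark store is left out entirely.

```scala
object BenchmarkSketch {
  // One timing record: a label and the elapsed wall-clock time in nanoseconds.
  final case class Timing(label: String, nanos: Long)

  // In-memory stand-in for the benchmark gathering store (an assumption).
  private val records = scala.collection.mutable.ArrayBuffer.empty[Timing]

  // Runs `body`, records how long it took, and returns its result unchanged.
  def timed[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    try body
    finally records += Timing(label, System.nanoTime() - start)
  }

  // Snapshot of everything recorded so far.
  def timings: List[Timing] = records.toList
}
```

Generated code could then wrap each training or inference call, for example `BenchmarkSketch.timed("inference") { ... }`.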

Refactoring and moving wolfe

• Everyone commits his/her changes and comments to this post when done
• Then Sameer moves wolfe from uclmr to wolfe-pack
• Then everyone cleans up wolfe
  • Move classes that we don't need but could be useful to look at later to a designated package called: legacy
  • Delete obsolete code
• After that refactor package structure to cc.wolfe

Implement Gibbs sampler with MPGraph

After renaming MPGraph to FactorGraph, as it now serves several inference mechanisms (nodes have an integer setting variable)

Coming up with a name

Candidate:

Sameer

Logging support

There are two settings where wolfe needs to report to the user. The first is at compilation time #51 and the second is at runtime. For example, inference and training code should report progress. One option is to use a logging framework and inject logging calls into the generated code.

Matching should happen on fully typed trees

Currently pattern matching on ASTs happens partly on untyped trees (those we get through inlining of other trees in the enclosed class). Only the arguments to the macro are typed. This means that we are often checking for method name matches, which can be risky. Ideally all trees we match on should be typed.

Connected to #40: ideally ASTs should incrementally enter the macro code, and always in a typed fashion. There should be a class responsible for this functionality.

LogZ implementation

Currently c419e27 implements a simple brute force logZ term.

Need the following enhancements:

Option to inject meaningful node and factor names into MPGraph

Useful for debugging. This could happen when building the structured graph.

make sure that factorie vector dimensions are big enough

When creating factorie dense vectors in the trainer I am using 10000 or so as default size. How can this be done dynamically (and hidden from user)?
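
One possible direction, sketched here in plain Scala rather than with factorie's own vector classes, is a weight vector that grows on demand so the user never picks a size. `GrowableWeights` is a hypothetical name, and the doubling strategy is an assumption, not what the trainer currently does.

```scala
object GrowableWeightsSketch {
  // Plain-Scala stand-in for a dense weight vector; this is not factorie's API.
  final class GrowableWeights(initialSize: Int = 16) {
    private var values = new Array[Double](math.max(1, initialSize))

    // Doubles the backing array until `index` fits.
    private def ensure(index: Int): Unit =
      if (index >= values.length) {
        var n = values.length
        while (n <= index) n *= 2
        values = java.util.Arrays.copyOf(values, n)
      }

    // Reading beyond the current capacity yields the default weight 0.0.
    def apply(index: Int): Double =
      if (index < values.length) values(index) else 0.0

    // Writing beyond the current capacity grows the vector first.
    def update(index: Int, v: Double): Unit = {
      ensure(index)
      values(index) = v
    }

    def capacity: Int = values.length
  }
}
```

With this, `w(10) = 2.5` on a vector created with size 4 silently grows the capacity to 16.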

annotation information on sub-objectives get lost when inlining

Maybe incrementally inline during pattern matching, so that we still have a chance to find symbol annotation

fields that are not used in features should not get own nodes

If they are observed we can give the observed values in the structure.value method, otherwise a default element.

Serialization of data (usually case classes) and weights (maps)

Maybe scala pickling

Gibbs Sampling

Interactive PPL

Working practical and competitive linear chain application

Untyped Term Pattern Matching and Construction

Cutting Plane Inference

g8 template that helps users to start a wolfe project

CNF Normalizer

Implemented in ILP Branch

TermDebugger

We need better debug messages when something goes wrong while evaluating a model.

Exception in BruteForce Max for the MLN example

The following exception occurs if byMessagePassing() is replaced by byBruteForce:

java.util.NoSuchElementException: None.get
    scala.None$.get(Option.scala:313)
    scala.None$.get(Option.scala:311)
    scalapplcodefest.term.State$class.apply(State.scala:27)
    scalapplcodefest.term.State$$anon$3.apply(State.scala:135)
    scalapplcodefest.term.Max$ByBruteForce$$anonfun$1.apply(MultiVariate.scala:58)
    scalapplcodefest.term.Max$ByBruteForce$$anonfun$1.apply(MultiVariate.scala:57)
    scalapplcodefest.WithStateDo.get(Util.scala:140)
    scalapplcodefest.term.Max$ByBruteForce$$anonfun$argmaxState$1.apply(MultiVariate.scala:66)
    scalapplcodefest.term.Max$ByBruteForce$$anonfun$argmaxState$1.apply(MultiVariate.scala:66)
    scalapplcodefest.term.StateTerm$$anon$1.eval(StateTerm.scala:16)
    scalapplcodefest.term.StateTerm$$anon$1.eval(StateTerm.scala:15)
    scalapplcodefest.term.Term$class.value(Term.scala:69)
    scalapplcodefest.term.StateTerm$$anon$1.value(StateTerm.scala:15)

Don't use an extra integer domain for quantified sums that are already defined over integers

Progress Monitor

Trainer (and possibly other longer tasks) report progress to some progress monitor. This monitor could be connected to wolfenstein to provide a visual progress bar.
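
A minimal sketch of such a monitor, assuming a simple callback interface: `ProgressMonitor`, `RecordingMonitor`, and `train` are hypothetical names, and the training loop is a placeholder. A wolfenstein-backed implementation would replace the in-memory recorder.

```scala
object ProgressSketch {
  // Hypothetical callback interface; not part of Wolfe's actual API.
  trait ProgressMonitor {
    def start(task: String, totalSteps: Int): Unit
    def progress(step: Int): Unit
    def done(): Unit
  }

  // Monitor that records events in memory, e.g. for tests or a later UI hookup.
  final class RecordingMonitor extends ProgressMonitor {
    private val buffer = scala.collection.mutable.ArrayBuffer.empty[String]
    def start(task: String, totalSteps: Int): Unit = { buffer += s"start $task/$totalSteps"; () }
    def progress(step: Int): Unit = { buffer += s"progress $step"; () }
    def done(): Unit = { buffer += "done"; () }
    def events: List[String] = buffer.toList
  }

  // Placeholder trainer that reports once per epoch.
  def train(epochs: Int, monitor: ProgressMonitor): Unit = {
    monitor.start("train", epochs)
    for (epoch <- 1 to epochs) {
      // real training work would go here
      monitor.progress(epoch)
    }
    monitor.done()
  }
}
```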

Allow definition of data structures, objectives etc. in other compilation units

Currently all information needs to be in the same compilation unit to be useful for macro expansion. This should be more general.

One step could be to use the type information on case classes instead of the definition of the case class in the enclosing unit.

State Persistence

Import statements in model definitions lead to errors in MPGraph generation

This happens because we replace occurrences of the data variable/argument with occurrences of the value for the top root structure, and this value is not a stable identifier (so it can't be imported). Possible solution: create a temp val and then import it. There is still a problem, because the matching fails when imports are used.
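
The temp-val workaround can be illustrated in isolation. Scala only allows imports from stable identifiers, so the result of a method call has to be bound to a `val` first. `Model` and `build` below are hypothetical stand-ins for the generated root structure value.

```scala
object StableImportSketch {
  final class Model(val weight: Double) {
    def score(x: Double): Double = weight * x
  }

  // Stand-in for an expression that produces the root structure value.
  def build(): Model = new Model(2.0)

  // `import build()._` does not compile: an import prefix must be a stable identifier.
  // Binding the value to a val first makes its members importable.
  def scoreWithImport(x: Double): Double = {
    val model = build()
    import model._
    score(x)
  }
}
```

This only addresses the import itself; it does not help with the pattern matching problem mentioned above.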

Better Message Passing

Reuse definitions of node domains

Currently the generated MPGraph code creates the same domains for nodes several times (e.g. bools.toArray). Write a tree processor that takes the generated code so far and introduces sharing of domains
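
The intended effect of that tree processor can be shown with hand-written code, assuming `bools` is a two-element sequence: the unshared version allocates one array per node, while the shared version hoists the domain into a single val. This is only an illustration of the output shape, not the tree transformation itself.

```scala
object DomainSharingSketch {
  val bools: Seq[Boolean] = Seq(false, true)

  // Before: every node builds its own copy of the same domain.
  def unshared(): (Array[Boolean], Array[Boolean]) =
    (bools.toArray, bools.toArray)

  // After: the domain is built once and both nodes reference it.
  def shared(): (Array[Boolean], Array[Boolean]) = {
    val boolDomain = bools.toArray
    (boolDomain, boolDomain)
  }
}
```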

Term2LaTeX and Term2PrettyString

Improve BP schedule / edge order for local classifier

For the IRIS classifier the factor graph currently has two nodes, one for the hidden variable and one structured variable corresponding to the observation. That's fine, but the schedule is wrong because it starts with sending messages to the observation from the hidden variable, and it should be the other way around.
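
A simplified sketch of the desired ordering, ignoring factors and treating the schedule as a list of directed node-to-node messages: messages that leave observed nodes are sent first. `Node`, `Edge`, and `schedule` are hypothetical names and do not reflect the MPGraph data structures.

```scala
object ScheduleSketch {
  final case class Node(name: String, observed: Boolean)
  final case class Edge(from: Node, to: Node)

  // Stable sort: messages leaving observed nodes come first (false sorts before true),
  // and the original order is kept within each group.
  def schedule(edges: Seq[Edge]): Seq[Edge] =
    edges.sortBy(e => !e.from.observed)
}
```

For the two-node IRIS graph this puts the observation-to-hidden message ahead of the hidden-to-observation one.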

Matrix Factorization

continuous benchmarking

Define structured potentials

MLN Structure Learning

Combinatorial Factors

Introduce an "Atomic" annotation that ensures a function is used as a single potential

Should be easy once we have #40, as it only means we should stop breaking up the function when we encounter an atomic annotation.

Optimize nested indexOf(valueOf(index)) calls in generated code

Working practical and competitive local classifier application

Which application should this be? Would be nice to have one w/o too many features to make the example compact.

Crazy Math Implicits

I want to have aliases to be able to write things like λx.x+1, Σ, ⇒, ×
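
Scala accepts many Unicode characters in identifiers, so some of these aliases can be defined directly. The sketch below covers only Σ and ×, with hypothetical signatures; a literal `λx.x+1` is not valid Scala syntax, and ⇒ is already a built-in alias for `=>` in Scala 2.

```scala
object MathAliasesSketch {
  // Σ as an alias for summing a function over a collection.
  def Σ[A](xs: Iterable[A])(f: A => Double): Double = xs.map(f).sum

  // × as an infix Cartesian product on sequences.
  implicit class CrossOps[A](val xs: Seq[A]) extends AnyVal {
    def ×[B](ys: Seq[B]): Seq[(A, B)] = for (x <- xs; y <- ys) yield (x, y)
  }
}
```

After `import MathAliasesSketch._`, expressions such as `Σ(1 to 3)(x => x * 2.0)` and `Seq(1, 2) × Seq("a")` compile as ordinary method calls.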

use macro to ignore gurobi related classes when gurobi is not available

Use macro annotation

@IfSwitchReplaceMeWith("DefaultImplementation.scala","gurobi")
class MyGurobiBasedImplementationOfGenericILPSolverInterface {
   import gurobi._
   ...
}

This macro would replace the annotated class with the code in DefaultImplementation.scala if the "gurobi-switch" was passed to the compiler. The macro would need to be in its own library (or a separate sub-module?).

Easier definition of search space through predicates in Chunking example

Option (a la Vivek) if you want to specify what is observed

def observed(d:Sentence) = d.tokens.map(t => (t.word,t.tag))
argmax(S)(observed(_) == observed(instance))(obj)

Or if you want to specify what is hidden

def observed(s:Sentence) = s.tokens.map(_.copy(chunk = wildcard))
argmax(S)(observed(_) == observed(instance))(obj)

where wildcard can be any (constant) object. This is convenient when your data structure is very large and inference only concerns a few (or one) attributes.
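
To make the wildcard variant concrete, here is a self-contained sketch with a brute-force `argmax` over an explicit search space. The `Token` and `Sentence` case classes, the string-valued `wildcard`, and this `argmax` signature are assumptions for illustration; Wolfe's real operator is a macro and works differently.

```scala
object WildcardSketch {
  final case class Token(word: String, tag: String, chunk: String)
  final case class Sentence(tokens: Seq[Token])

  // Any constant works; it only has to be the same value on both sides of ==.
  val wildcard = "*"

  // Blank out the hidden attribute so only observed attributes are compared.
  def observed(s: Sentence): Seq[Token] =
    s.tokens.map(_.copy(chunk = wildcard))

  // Brute-force stand-in: filter the space by the predicate, then maximize the objective.
  def argmax(space: Seq[Sentence])(where: Sentence => Boolean)(obj: Sentence => Double): Sentence =
    space.filter(where).maxBy(obj)
}
```

A call then mirrors the snippet above: `argmax(space)(s => observed(s) == observed(instance))(obj)` only considers sentences that agree with `instance` on words and tags.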