Giter Site home page Giter Site logo

blocking's Introduction

Some code to test out ideas about searching for blocking rules.

write/output describes the local search algorithm that I'm using.

block/output contains results -- that includes the final rule found for each simulated dataset along with numbers of true matches covered by the rule and the total number of pairs generated. There you can also find timestamps from each iteration, to get an idea of performance.

The goal is to find a disjunction of conjunctions that covers as many matching pairs as possible, constrained by a maximum budget of total pairs generated.

Each conjunction gives rise to an associated subproblem, that is, the original problem minus the pairs covered by the conjunction. The disjunction search is then just an iterative conjunction search -- we find a conjunction, we generate the subproblem, then find a conjunction for the subproblem, add it to our growing disjunction, and continue until we run out of budget or we've covered all matching pairs in the training data.

Conjunction search proceeds via local moves. A local move is either adding a single conjunct or removing a single conjunct from a conjunction. Starting from some initial position (such as the empty conjunction, the "full" conjunction of all atomic rules, a random conjunction, etc.), we make local moves based on some heuristic until we enounter a stopping criterion, and then emit the conjunction, and start a new conjunction search on the subproblem.

I've implemented two local search heuristics (block/src/search.jl):

  • add_conjunct starts with the empty conjunction and, at each step, selects the conjunct whose addition would result in the highest ratio of value/cost. A conjuntion's value is the number of true pairs it covers, and the cost is the total number of pairs it would generate. We stop when we can no longer improve the ratio.
  • Similarly, drop_conjunct starts with the full conjunction (every available rule), and greedily drops conjuncts.

Both of these heuristics involve evaluating every possible move at each step. Another possibility is to propose local moves one at a time, and select a move with probability proportional to the value/cost ratio . . ..

blocking's People

Contributors

tarakc02 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.