ConfigV

A tool for learning rules about configuration languages.

ConfigV uses a generalization of Association Rule Learning to learn user-defined predicates over datasets of configuration files. You can think of this as data science meets programming languages meets systems.

Basic Setup and Testing

To install ConfigV (assuming you have Haskell on your system), follow these steps. The first build takes a while because it downloads packages; later builds are quicker.

git clone https://github.com/ConfigV/ConfigV
cd ConfigV
cabal build

This assumes GHC 8.4.3 or 8.6.3 and cabal 3.0.0. On Ubuntu, these can be installed from the hvr PPA: https://launchpad.net/~hvr/+archive/ubuntu/ghc. If you want to run the test suites, you also need to install the ghc-$VERS-prof package from the hvr repo.

Basic Usage

To run ConfigV on your own dataset, use the command line tool as in the example below. The exact location of the executable may vary on your system.

cabal run ConfigVtool -- learn --learntarget "Datasets/benchmarks/CSVTest/" --enableorder 

You can also use ConfigV as an API from a Haskell program. For an example of this usage, see how the command line tool is built in the Executables directory, or inspect some of the tests in the Tests directory.

CloudFormation Templates

To build a dataset of Amazon CloudFormation (CFN) templates, we first need to preprocess the templates into a form ConfigV can handle (key-value pairs). First, collect a set of JSON CFN templates into a directory. Note that YAML is not supported at the moment, though I was able to use the tool yq at one point for fairly good conversion. Then, to convert the JSON to CSV for ConfigV, use the preprocessCFN.py file: change the constants in the .py code, then run python preprocessCFN.py. The conversion might not work for every file (some templates are malformed), so those templates might need to be discarded.
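As a rough illustration of the kind of key-value flattening described above, here is a minimal Python sketch. It is not the actual preprocessCFN.py: the function names flatten and template_to_csv, the dotted key format, and the two-column CSV layout are all assumptions made for this example, and the format ConfigV expects may differ.

```python
import csv
import io
import json


def flatten(node, prefix=""):
    """Yield (key_path, value) pairs for every leaf in a nested JSON value."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        # Leaf value (string, number, bool, or null).
        yield prefix, node


def template_to_csv(template_text):
    """Return CSV text with one key,value row per leaf, or None if the JSON is malformed."""
    try:
        template = json.loads(template_text)
    except json.JSONDecodeError:
        return None  # malformed template: the caller can discard it
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for key, value in flatten(template):
        writer.writerow([key, value])
    return out.getvalue()
```

Returning None for templates that fail to parse makes it easy to skip malformed files when looping over a directory.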

To run the learning process on the Amazon CFN templates, check the settings in Executables/AmazonCFN.hs, then run:

cabal run AmazonCFN

Datasets

Please feel free to add or modify datasets. All datasets should remain in the Datasets directory.

Helpful Tips

Inspecting Rules

The default location of the learned rules is cachedRules.json. This file can be manually inspected as a sanity check. To pretty-print it, use python -m json.tool cachedRules.json
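If you would rather do the same sanity check from a script, the sketch below uses only Python's standard json module. It assumes nothing about the schema of cachedRules.json beyond it being valid JSON; the helper names pretty_rules and summarize_rules are made up for this example.

```python
import json


def pretty_rules(path="cachedRules.json"):
    """Return the JSON file at `path` re-serialized with indentation."""
    with open(path, encoding="utf-8") as handle:
        rules = json.load(handle)
    return json.dumps(rules, indent=4, sort_keys=True)


def summarize_rules(path="cachedRules.json"):
    """Return the top-level JSON type name and its number of entries."""
    with open(path, encoding="utf-8") as handle:
        rules = json.load(handle)
    # Scalars have no entries to count, so report them as a single item.
    size = len(rules) if isinstance(rules, (dict, list)) else 1
    return type(rules).__name__, size
```

For example, print(pretty_rules()) shows the indented file, and summarize_rules() gives a quick check that the file is not empty.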

To see the files in a benchmark set use tail -n +1 Datasets/benchmarks/MissingCSV/*

Support and Confidence

Thresholds cannot be set using the command line tool. If you want to change the thresholds for the command line tool, you will need to change its code directly (Executables/Main.hs). All you need to do is pass in the Thresholds object that you prefer to use.

When using the API version of ConfigV, you can pass in thresholds either as PercentageThresholds (the traditional support and confidence way), or as RawThresholds (which is specialized to the size of your training set). RawThresholds is a good way to build benchmark programs, but for real use cases, where the size of the training set changes, PercentageThresholds is likely to be the more practical choice.
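To make the difference between percentage and raw thresholds concrete, here is a small Python sketch using the textbook association rule definitions of support and confidence. It does not use ConfigV's API or its Thresholds types, and ConfigV's generalized rules may be scored differently; the function names and the toy data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    wanted = set(items)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, relative to the antecedent alone."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / base


def raw_threshold(percent, training_set_size):
    """Smallest file count that meets an integer percentage threshold (rounded up)."""
    return (percent * training_set_size + 99) // 100


# Toy training set: each set holds the keys present in one configuration file.
files = [{"port", "host"}, {"port", "host"}, {"port"}, {"user"}]
port_support = support(files, {"port"})                   # 3 of 4 files
rule_confidence = confidence(files, {"port"}, {"host"})   # 2 of the 3 files with "port"
```

A raw threshold such as raw_threshold(50, 4) only makes sense for a training set of that exact size, which is why it suits fixed benchmarks, while a percentage carries over when the dataset grows or shrinks.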

Publications

For more information on ConfigV and the theory behind how/why it works see:

Maintainers

  • Mark Santolucito

Feel free to reach out if you have any questions about the tool or how to use it - happy to help!
