c-cube / frog-utils

[frozen] Scheduling and running jobs on a shared computer, then analysing their output

License: BSD 2-Clause "Simplified" License

OCaml 99.92% Makefile 0.08%
benchmark prover run batch-job ocaml

frog-utils's Introduction

Frog-utils

Scheduling and running jobs on a shared computer, then analysing their output.

License: BSD.

Usage

Frog-utils provides several tools and one library, frogutils. The tools are:

  • froglock: execute commands sequentially, no overlap.
  • frogmap: run a command on many inputs (files?), with parallelization.
  • frogiter: run a command on every result obtained with frogmap.
  • frogtptp: run a TPTP prover (more specific!) and analyse its output.
  • frogplot: plot the cumulative times of TPTP provers.

Froglock

The basic usage is froglock -- <cmd>. This behaves like <cmd>, except for one point: at any given time, on a given computer, at most one command launched via froglock -- <cmd> runs. Other commands are queued until it terminates; then one waiting command is started and takes ownership of the "lock". This works by connecting to a daemon on a given port, starting it if required. The daemon stops when its queue is empty.

froglock status can be used to display the current queue of tasks (including the one that owns the lock). It prints nothing if no task is launched.

There is one "lock" per TCP port on the machine (although only ports > 1024 should be used, otherwise only root will be able to use froglock). The port can be changed with --port <n>.
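
The "one lock per port" arrangement rests on the fact that only one process can listen on a given TCP port at a time, which is also what makes "start the daemon if required" safe. A minimal Python sketch of that mechanism (illustrative only, not frog-utils code):

```python
import socket

# Only one process (the daemon) can listen on a given TCP port: a second
# bind to the same port fails with EADDRINUSE. This sketch simulates the
# two cases in one process, with two sockets.
daemon = socket.socket()
daemon.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
port = daemon.getsockname()[1]
daemon.listen(5)                   # this process now "owns" the lock's port

contender = socket.socket()
try:
    contender.bind(("127.0.0.1", port))
    owner = False                  # would mean two daemons on one port
except OSError:
    owner = True                   # port taken: connect to the daemon instead
print(owner)  # → True
```

A later froglock invocation hitting EADDRINUSE knows a daemon is already running and simply connects to it instead of starting its own.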

Frogmap

frogmap -- <cmd> <args...> applies a shell command <cmd> to a list of elements, and stores the result of <cmd> <arg> for every such element <arg>.

Parallelism (on a single computer) can be achieved with -j <n>, where <n> is the number of parallel invocations of <cmd>.

If frogmap is killed or stopped for any other reason before it has processed every argument, the computation can be resumed from where it left off: frogmap resume <state_file.json>.

Results stored in a file <file.json> can be analysed either using the module FrogMapState (in the library frogutils, see src/frogMapState.mli) or with frogiter (see below).

Frogiter

frogiter goes hand in hand with frogmap. It is a simple way to apply a command to all the results stored in some <file.json> produced by frogmap. Example:

    frogmap -o foo.json -j 20 'sleep 3; echo ' `seq 1 1000`

    # later...
    frogiter -c foo.json 'grep 11'

will print all numbers from 1 to 1000 whose decimal representation contains the string 11, in a slightly inefficient way. The command is passed the following environment:

  • FROG_OUT: the command's stdout
  • FROG_ERR: the command's stderr
  • FROG_ARG: the argument given to the command
  • FROG_TIME: number of seconds the command took to complete
  • FROG_ERRCODE: exit code of the command

Example:

    frogiter shell foo.json \
    'echo on $FROG_ARG, time $FROG_TIME, `wc -l <<< "$FROG_OUT"` lines'

Frogtptp

Frogtptp provides several commands to deal with TPTP provers (although it would probably work almost as-is with SMT solvers too).

The commands are:

  • frogtptp run <prover_name> file.p runs a theorem prover on the TPTP file. <prover_name> is the name of a prover as listed in the config file $HOME/.frogtptp.toml (or with the -config flag). Other options specify the memory limit and timeout for the prover.
  • frogtptp list lists the known provers (those detailed in the config file).
  • frogtptp analyse <prover>=<file.json> analyses a single output file as obtained from frogmap 'frogtptp -run <prover>' file1 file2 .... frogtptp analyse <prover1>=<file1.json>,<prover2>=<file2.json> ... does the same but also compares the performance of the different provers. No other option may follow analyse. This is still work in progress.
  • frogtptp plot -o out <prover>=<file.json> prints the cumulative time of the provers on the unsat results.

Example:

    # I have 10 cores, let's prove stuff with E
    frogmap -j 10 -o bench.json \
      'frogtptp run eprover -t 5' \
      $TPTP/Problems/*/*.p

    # gosh, I have to reboot!
    sleep 500; reboot

    # resume where I left. But now I have 30 cores!
    frogmap resume -j 30 bench.json

    # then: basic statistics on the results
    frogtptp analyse eprover=bench.json

    # print the cumulative times
    frogtptp plot -o plot.png eprover=bench.json

runs the E prover (named eprover) on all files in a TPTP archive, resumes the computation after a reboot, prints some basic statistics about the results, and finally plots the cumulative times in a file plot.png.

A sample config file for frogtptp can be found in data/frogtptp.toml. Its format is TOML, a simple textual configuration format close to INI.
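
For illustration only, a hypothetical sketch of such a file — the key names and the prover entry below are assumptions, not the actual schema; see data/frogtptp.toml in the repository for the real format. The $timeout and $memory variables are the ones mentioned in the issues below:

```toml
# Hypothetical frogtptp config sketch; the real keys are defined by
# data/frogtptp.toml in the repository.
timeout = 30      # default timeout, in seconds
memory  = 2000    # default memory limit

[eprover]
cmd = "eprover --cpu-limit=$timeout --memory-limit=$memory $file"
```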

Install

With opam:

$ opam pin add frogutils https://github.com/c-cube/frog-utils.git

frog-utils's People

Contributors

c-cube, gbury


frog-utils's Issues

remove DB

just rely on self-contained JSON files, and a list of directories to search for those files. They should contain:

  • the raw results
  • a UUID (use uuidm with V4)
  • a timestamp for the start time (and maybe finish time)
  • maybe some other metadata ($USER, etc?)

dynamic loading of "plugins"

plugins:

  • irc-client
  • mail

Plugins can be optionally compiled (with opam), and loaded only if required. Also, they can have their own section in the config file.

System timeout and memory management

In frogtptp, the $timeout and $memory variables are currently handled by the provers themselves, but not all of them provide options for timeout and memory management, and the ones that do may behave differently (for example: are these limits applied only to proof search itself, or also to problem preprocessing?).

For fair comparison of tools, it would be better to wrap the prover call with the system commands (timeout and ulimit).
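
A sketch of such a wrapper in Python (the actual frog-utils implementation would be in OCaml; the function name here is illustrative):

```python
import resource
import subprocess

def run_limited(cmd, timeout_s, mem_bytes):
    """Run cmd under an OS-enforced wall-clock timeout and address-space
    limit, so every prover is constrained the same way regardless of its
    own options. Returns the exit code, or None on timeout."""
    def set_limits():
        # applied in the child process, like `ulimit -v` in a shell wrapper
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    try:
        proc = subprocess.run(cmd, capture_output=True,
                              timeout=timeout_s, preexec_fn=set_limits)
        return proc.returncode
    except subprocess.TimeoutExpired:
        return None

print(run_limited(["true"], 5, 1 << 30))        # → 0
print(run_limited(["sleep", "3"], 1, 1 << 30))  # → None
```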

Man page for config files

It would be nice to have a man page explaining how config files for frogtptp work (and also for the other binaries that use config files), and/or giving some examples of standard config files.

forward subprocess' input to stdout

When a long-running command runs under join-locke and prints its progress step by step, it would be nice to display its output live rather than only at the end.

We still need to keep the whole output for other purposes (for instance to send it in a mail). Use Lwt_io.read_lines?

split library

Make a small library for client/server/message that can be used from other executables.

Missing opam dependency: cairo2

Frogutils needs cairo2 as a dependency; otherwise archimedes is built without it, which results in
ocamlfind: Package `archimedes.cairo' not found

use gzip to store results

Big JSON files can be quite inefficient.

Choose one of:

  • use the gzip command-line tool + configure-time detection, simpler than an OCaml library.
  • use camlzip

use config file(s)

required for:

  • SMTP configuration
  • IRC configuration
  • default timeout, priority, etc.

Probably use toml (with /etc/join-locke.conf and ~/.join-locke.conf if they are available, the second being more important).

add `extra-args` to file comments

In a given benchmark file, something like # extra-args(zipperposition): --arith would be needed to customize the command line of a prover on this file.
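
The proposed comment could be parsed along these lines (Python sketch; the exact syntax is whatever this issue settles on, so the format below is an assumption taken from the example):

```python
import re

# Parse the proposed "# extra-args(<prover>): <args>" benchmark comment.
# The syntax is taken from the issue's example; nothing is implemented yet.
def extra_args(line):
    m = re.match(r"#\s*extra-args\((?P<prover>[^)]+)\):\s*(?P<args>.+)", line)
    return (m.group("prover"), m.group("args").strip()) if m else None

print(extra_args("# extra-args(zipperposition): --arith"))
# → ('zipperposition', '--arith')
print(extra_args("cnf(axiom, ...)."))  # → None
```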

irc client

use the irc-client lib to notify a user when her job is done (requires a config file...)

frogmap

make the frogmap tool+library:

Usage should be something like

frogmap -j 15 [options] -- cmd f1 f2 f3.... f1000

or

frogmap -j 15 -F <file listing problems> cmd

or

frogmap --resume <state file>

The state is stored in one file, which can be used to --resume a task that was killed or stopped.

  • file format to store the task state
    • one json object at the top, with
      • list of files/arguments to deal with
      • the command
      • the user, parameters, parallelism level
    • one json object per line, with the result of the command on one file/arg
    • remaining tasks = all tasks to deal with \ those whose result is already stored
  • tool that runs a given amount of tasks in parallel, using Lwt
  • provide a nice library to parse such files (to be used by frogfold)
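
The "remaining tasks" rule above can be sketched as follows (Python; the field names "args" and "arg" are illustrative assumptions, not frogmap's real schema):

```python
import json

# First line: one JSON header object; each later line: one result object.
state = """\
{"cmd": "wc -l", "args": ["a.p", "b.p", "c.p"], "j": 2}
{"arg": "a.p", "errcode": 0}
{"arg": "c.p", "errcode": 0}
"""
lines = state.splitlines()
header = json.loads(lines[0])
done = {json.loads(line)["arg"] for line in lines[1:]}
# remaining = all tasks to deal with \ those whose result is already stored
remaining = [a for a in header["args"] if a not in done]
print(remaining)  # → ['b.p']
```

Because results are appended one JSON object per line, a killed run leaves a valid prefix of the file, which is what makes --resume cheap.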

transition to sqlite for all storage

Should be much more efficient for all sorts of queries and storage.

Proposition for schema (pseudo sql):

create table benchmarks (uid uuid primary key, timestamp int64, metadata blob);
create table text_blobs (key sha1 primary key, value blob);
create table provers (id sha1 primary key, name blob, version blob, cmd blob, sat blob, unsat blob);
create table problems (abspath blob primary key, content sha1);
create table results (prover sha1, problem sha1, stdout_key sha1, stderr_key sha1, bench_id uuid);
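
The proposed schema loads into SQLite as-is, since SQLite treats unfamiliar column type names (uuid, sha1, int64) as mere type affinities; a quick sanity check:

```python
import sqlite3

# Create the proposed tables in an in-memory database to sanity-check
# the schema. SQLite accepts "uuid", "sha1", etc. as free-form type names.
schema = """
create table benchmarks (uid uuid primary key, timestamp int64, metadata blob);
create table text_blobs (key sha1 primary key, value blob);
create table provers (id sha1 primary key, name blob, version blob,
                      cmd blob, sat blob, unsat blob);
create table problems (abspath blob primary key, content sha1);
create table results (prover sha1, problem sha1, stdout_key sha1,
                      stderr_key sha1, bench_id uuid);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(schema)
tables = sorted(r[0] for r in conn.execute(
    "select name from sqlite_master where type = 'table'"))
print(tables)
# → ['benchmarks', 'problems', 'provers', 'results', 'text_blobs']
```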

Froglock/frogmap seems to freeze at about 1018 tasks

While running some benchmarks lately, I have seen that when doing a lot of tests (about 20000), frogmap seems to freeze after completing 1018 tasks when using internal locking (the --lock true option, which is the default). (1018 is suspiciously close to the common default limit of 1024 open file descriptors, which may point to a descriptor leak.)
Additionally, when trying to check what happens with froglock status, froglock seemed to be frozen too.

This might be a very problematic issue.

frogmap: equivalent of find's {} argument

In frogmap, it would be nice if the command applied to every argument had a way to name that argument. Unix's find command provides a {} placeholder for this.
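
A sketch of the substitution rule (Python; the find-style {} behavior is proposed here, not implemented, and falls back to appending the argument as frogmap does today):

```python
# Proposed find-style behavior: if the command template contains "{}",
# substitute the argument there; otherwise append it at the end.
def instantiate(template, arg):
    if "{}" in template:
        return template.replace("{}", arg)
    return f"{template} {arg}"

print(instantiate("gzip -c {} > {}.gz", "a.p"))  # → gzip -c a.p > a.p.gz
print(instantiate("wc -l", "a.p"))               # → wc -l a.p
```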

more info on jobs

In jobs, store more info:

  • user (capture $USER in the executable when creating job, send it to daemon)
  • directory (capture $PWD...)
  • time at which the job was submitted
  • user-provided optional tag?
    • with cli option -tag <tell my life>
    • useful to explain why a task is so long, or that it mustn't be killed...
  • expected degree of parallelism
  • maybe an option -j <num> that sets the expected degree of parallelism and
    adds -j <num> to the command. Example:

      join-locke -j 15 -- make benchmark
    • use it to run several jobs in parallel if there are enough cores for both?

    • see issue #13

second log file

Parsable log file (json) to be used for monitoring and statistics.

cgroup management

Try to use cgroups to reserve some number of cores/amount of memory to the process that owns the lock.

use cmdliner

Much nicer, more powerful, and should give better documentation.

Missing dependency on lwt-zmq ?

When trying to install the latest dev version, it fails with the following error message:

#=== ERROR while installing frogutils.dev =====================================#
# opam-version 1.2.2
# os           linux
# command      ./configure --bindir /home/guigui/.opam/4.04.0+flambda/bin --sharedstatedir /home/guigui/.opam/4.04.0+flambda/share --enable-web --disable-irc --mandir /home/guigui/.opam/4.04.0+flambda/man
# path         /home/guigui/.opam/4.04.0+flambda/build/frogutils.dev
# compiler     4.04.0+flambda
# exit-code    1
# env-file     /home/guigui/.opam/4.04.0+flambda/build/frogutils.dev/frogutils-3692-edb7c0.env
# stdout-file  /home/guigui/.opam/4.04.0+flambda/build/frogutils.dev/frogutils-3692-edb7c0.out
# stderr-file  /home/guigui/.opam/4.04.0+flambda/build/frogutils.dev/frogutils-3692-edb7c0.err
### stderr ###
# ocamlfind: Package `lwt-zmq' not found
# W: Field 'pkg_lwt_zmq' is not set: Command ''/home/guigui/.opam/4.04.0+flambda/bin/ocamlfind' query -format %d lwt-zmq > '/tmp/oasis-62c9e6.txt'' terminated with error code 2
# E: Cannot find findlib package lwt-zmq
# E: Failure("1 configuration error")

Check proofs

Most, or at least some, theorem provers include backends that output proofs in some format. It would be nice if frogtptp and/or frogmap were able to produce these proofs and check them with the appropriate tool when doing benchmarks. This raises some interesting questions.

The first is more of a philosophical question: producing a proof can be time- and memory-consuming compared to simply searching for one, so asking the prover to produce a proof may significantly alter the result of a benchmark (time and/or memory used, etc.). A simple solution would be to call the prover twice: once for benchmark purposes, and a second time to produce the proof. However, this might make benchmarks a lot longer. We could also consider that outputting the proof is part of the benchmark, but in that case, for benchmarks of different provers to be comparable, they would all have to produce proofs in the same format, which seems highly unlikely right now.

The second question is one of design and implementation: since checking a proof is obviously not part of the benchmark of a theorem prover, the proof checker's process statistics (such as time used) should be recorded separately from those of the theorem prover, which is not currently possible with a single use of frogmap. A possible solution would be to extend frogmap to map not just a single command but a chain of commands, which would allow recording their execution times separately. That would require a serious modification of the existing frogmap code and would most likely introduce non-backward-compatible changes.

It could also be possible to add another tool (such as frogbench) which would run one frogmap to benchmark and produce proofs, and then another frogmap to check these proofs. However, this would separate the results of the benchmark from the results of the proof checking, which is not very nice.

Any comments?

block new tasks

Be able to send the daemon (if started) a message that prevents it from accepting new tasks (clients would fail), and to send a notification when all remaining tasks are done.

Typically useful before updating the code, restarting the computer, running an IO-bound program, etc.

use nanomsg to communicate

More interesting possibilities of communication.

Pattern: each tool has one Bus socket on a common Ipc "frog" address. Interesting events (mainly "job is finished") are broadcast.

Opens up the possibility of splitting tools into microservices; e.g., a mail or IRC notifier would be a separate tool (with its own dependencies), and it would listen on the Bus socket for interesting news.

Absolute path in systemd file

As I'm learning, it seems systemd requires an absolute path in the ExecStart command of a service file, which means the service file currently in data cannot be used directly.

Continuous integration ?

As shown in #34, missing dependencies in the opam file, or misconfiguration of the opam file, may lead to the package no longer being installable. These bugs are not easily detected on work machines (which often have the dependencies installed manually), but are very easily caught using continuous integration.

It would be interesting to set up even a minimal test that verifies that the opam package builds as intended. If we want to test different versions of OCaml, we can use the files in https://github.com/Gbury/mSAT as examples.

file locking in frogmap

Currently the FrogMapState module doesn't ensure the files are not corrupted. A Unix lock would be useful...
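
The kind of advisory lock suggested here, sketched with Python's fcntl (the OCaml equivalent would be Unix.lockf). Note that POSIX advisory locks are per-process, so the mutual exclusion is only visible across processes, not within one:

```python
import fcntl
import tempfile

# Hold an exclusive advisory lock while appending a result line, so that
# concurrent writers cannot interleave partial JSON records.
with tempfile.NamedTemporaryFile() as f:
    fcntl.lockf(f, fcntl.LOCK_EX)    # block until we own the lock
    f.write(b'{"arg": "a.p", "errcode": 0}\n')
    f.flush()
    fcntl.lockf(f, fcntl.LOCK_UN)    # release for the next writer
    f.seek(0)
    data = f.read()
print(data.decode().strip())
```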

New tool: froglimit

It could be nice to have a new tool for imposing limits on a program.
Using ulimit as is done currently works, but has a few flaws:

  • it is a console tool, so not as reliable as a system call would be
  • it seems to limit the process and each child separately, rather than limiting the sum of their resources

Additionally, froglimit could behave in a way similar to the CASC competition (e.g. sending SIGALRM and/or SIGXCPU when a process times out), which could be nice.

better scheduling in froglock

  • use a priority queue (to have different priorities)
  • be fair w.r.t users?
  • add a -j <num cores> option and run several jobs at once if they won't overlap (advanced)

frogiter: pass time,errcode,arg,etc. through ENV

frogiter <file> <cmd> iterates through the results in <file> and pipes the result's output into <cmd>. However <cmd> has no way to obtain other information about the result, such as errcode, time, stderr, and which arg was used.

A good solution would be to pass $TIME, $MAPARG, etc. to <cmd> through its environment.

Wrong log permissions

When the log file has already been created by another instance of join_locke (executed by a different user), permissions on the log file may differ, and the lack of permission to open it makes join_locke crash.
Two things to do, then:

  • do not crash if log file cannot be opened
  • fix permissions for the log file (though the problem seems to be Lwt_log rather than join_locke)

AUR package

Would be nice to have, so we can reinstall the latest version easily. Some AUR packages are git versions, so it's definitely doable.
