ericaltendorf / plotman
Chia plotting manager
License: Apache License 2.0
alfonsopereze on keybase found the root cause:
It seems that if you have running jobs with the dest dirs that are not part of the config file, they can still be picked up here and added to the dst list. IMO, this is NOT good.
Line 37 in c1665b2
Potential Fix
He suggests adding a filter by the current list of dest dirs as per the config file.
My problem is that while I can write python enough to write a script that filters a logfile, i don't know how to fix this issue in this code yet.
My current workaround: after editing config.yaml to reflect my new dst folders, I create a symlink from CHIA1 to the new destination CHIA3. That kind of breaks plotman's ability to balance the load between destinations and is a total hack, but it works for now.
Thanks in advance.
-g
I had to make the following two changes to get archiving working:
$ git diff archive.py
diff --git a/archive.py b/archive.py
index 4f69ea5..274229d 100644
--- a/archive.py
+++ b/archive.py
@@ -7,6 +7,7 @@ import psutil
import re
import random
import sys
+import contextlib
import texttable as tt
@@ -49,7 +50,7 @@ def compute_priority(phase, gb_free, n_plots):
def get_archdir_freebytes(arch_cfg):
archdir_freebytes = { }
- df_cmd = ('ssh %s@%s df -aBK | grep " %s/"' %
+ df_cmd = ('ssh %s@%s df -aBK | grep " %s"' %
(arch_cfg['rsyncd_user'], arch_cfg['rsyncd_host'], arch_cfg['rsyncd_path']) )
with subprocess.Popen(df_cmd, shell=True, stdout=subprocess.PIPE) as proc:
for line in proc.stdout.readlines():
Currently plotman grabs the first tmp dir that is ready.
This means that if the global stagger setting is the limitation (rather than disk phase readiness), jobs will bunch up on the earliest disks. We should schedule jobs to the disks that are in the best state to take a job (similar to how we prioritize archiving from dst dirs).
It seems likely that scheduling would work with a lower global delay during initial ramp-up of jobs, e.g., starting with a 5m delay until most tmp drives are occupied, then backing off to a more normal delay, e.g. 20m.
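A minimal sketch of scheduling to the most lightly loaded tmp dir rather than the first ready one. The function and parameter names here are illustrative, not plotman's actual API:

```python
def best_tmpdir(tmpdirs, dir2jobs, max_jobs):
    """Pick the tmp dir in the best state to take a new job.

    dir2jobs maps each tmp dir to its currently running jobs; a dir is
    eligible when it is under max_jobs. Prefer the most lightly loaded
    eligible dir, with ties broken by config order.
    """
    eligible = [d for d in tmpdirs if len(dir2jobs.get(d, [])) < max_jobs]
    if not eligible:
        return None  # no tmp dir can take a job right now
    return min(eligible, key=lambda d: len(dir2jobs.get(d, [])))
```

A dir's phase state could be folded into the sort key the same way archiving prioritizes dst dirs.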
Currently, archive jobs are scheduled one at a time.
If there are multiple dst and multiple archive dirs, and a fast network (significantly faster than individual drive I/O), we could go faster by running multiple archive jobs at once.
This would require making archive scheduling aware of other jobs, however, so it doesn't hit the same drives.
Here: https://github.com/ericaltendorf/plotman/blob/main/LICENSE#L189
It should be filled in with your details, @ericaltendorf.
When archiving is inactive, the status message never gets updated.
This means an old archiving job's PID may continue to be displayed as running long after it's gone.
SSDs and HDDs require different plotting parameters for optimal speed. When there is a mixture of drive types in a system, there should be a way to tailor the plotting parameters for each drive type, rather than choosing just one.
This could be achieved by specifying a plotting configuration for each tmp drive, and then listing multiple plotting configurations, each with a name. Then, when initiating a plot, select the named plotting configuration for the tmp drive.
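One possible shape for this, sketched as a hypothetical config fragment (the plotting_configs key and per-tmp-dir plotting_config references do not exist in plotman today):

```yaml
plotting_configs:
        ssd_fast:
                n_threads: 4
                n_buckets: 128
                job_buffer: 3400
        hdd_gentle:
                n_threads: 2
                n_buckets: 64
                job_buffer: 4500

directories:
        tmp:
                - path: /mnt/ssd0
                  plotting_config: ssd_fast
                - path: /mnt/hdd0
                  plotting_config: hdd_gentle
```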
ValueError: time data 'Fri Feb 12 16:02:56 2021' does not match format '%a %b %d %H:%M:%S %Y'
User reported some confusing issues with paths. Turns out they had tmp: /my/path instead of tmp: [/my/path] (or a multiline YAML list). The config file should be checked against a schema before we go using it. The following code expects a list but is just fine taking a string '/my/path' and turning it into a bunch of temporary directories like ['/', 'm', 'y', '/', 'p', 'a', 't', 'h']. This bug (we guess) also resulted in temporary files getting dumped in /, which strikes me as extra bad.
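The failure mode is easy to reproduce, and a minimal guard (a real schema validator would be better; as_list is a hypothetical helper) could normalize the value before use:

```python
# Demonstrates the bug: iterating a YAML scalar as if it were a list.
tmp = "/my/path"       # what the user wrote: tmp: /my/path
chars = list(tmp)      # one "directory" per character: ['/', 'm', 'y', ...]

# A minimal guard before the config is used:
def as_list(value):
    """Accept either a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)
```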
tmp: /root/venvChia/plots/temp1
Lines 79 to 80 in 31ccb8f
# Where to plot and log.
directories:
        # One directory in which to store all plot job logs (the STDOUT/
        # STDERR of all plot jobs).  In order to monitor progress, plotman
        # reads these logs on a regular basis, so using a fast drive is
        # recommended.
        log: /root/venvChia/plotman/logs

        # One or more directories to use as tmp dirs for plotting.  The
        # scheduler will use all of them and distribute jobs among them.
        # It assumes that IO is independent for each one (i.e., that each
        # one is on a different physical device).
        #
        # If multiple directories share a common prefix, reports will
        # abbreviate and show just the uniquely identifying suffix.
        tmp: /root/venvChia/plots/temp1

        # Optional: tmp2 directory.  If specified, will be passed to
        # chia plots create as -2.  Only one tmp2 directory is supported.
        # tmp2:

        # One or more directories; the scheduler will use all of them.
        # These again are presumed to be on independent physical devices,
        # so writes (plot jobs) and reads (archivals) can be scheduled
        # to minimize IO contention.
        dst: /root/venvChia/plots/plot4

        # Archival configuration.  Optional; if you do not wish to run the
        # archiving operation, comment this section out.
        #
        # Currently archival depends on an rsync daemon running on the remote
        # host, and that the module is configured to match the local path.
        # See code for details.
        # archive:
        #         rsyncd_module: plots
        #         rsyncd_path: /plots
        #         rsyncd_bwlimit: 80000  # Bandwidth limit in KB/s
        #         rsyncd_host: myfarmer
        #         rsyncd_user: chia

# Plotting scheduling parameters
scheduling:
        # Don't run a job on a particular temp dir until all existing jobs
        # have progressed at least this far.  Phase major corresponds to the
        # plot phase, phase minor corresponds to the table or table pair
        # in sequence.
        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 1

        # Don't run more than this many jobs at a time on a single temp dir.
        tmpdir_max_jobs: 1

        # Don't run any jobs (across all temp dirs) more often than this.
        global_stagger_m: 30

        # How often the daemon wakes to consider starting a new plot job
        polling_time_s: 1000

# Plotting parameters.  These are pass-through parameters to chia plots create.
# See documentation at
# https://github.com/Chia-Network/chia-blockchain/wiki/CLI-Commands-Reference#create
plotting:
        k: 32
        e: True             # Use -e plotting option
        n_threads: 4        # Threads per job
        n_buckets: 128      # Number of buckets to split data into
        job_buffer: 3300    # Per job memory
$ python plotman.py plot
...starting plot loop
Traceback (most recent call last):
File "plotman.py", line 92, in <module>
wait_reason = manager.maybe_start_new_plot(dir_cfg, sched_cfg, plotting_cfg)
File "/home/billy/Desktop/plotman/manager.py", line 120, in maybe_start_new_plot
p = subprocess.Popen(plot_args,
File "/home/billy/anaconda3/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/billy/anaconda3/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'chia'
Here is what my config file looks like:
# Where to plot and log.
directories:
        # One directory in which to store all plot job logs (the STDOUT/
        # STDERR of all plot jobs).  In order to monitor progress, plotman
        # reads these logs on a regular basis, so using a fast drive is
        # recommended.
        log: /home/billy/.chia/mainnet/plotman

        # One or more directories to use as tmp dirs for plotting.  The
        # scheduler will use all of them and distribute jobs among them.
        # It assumes that IO is independent for each one (i.e., that each
        # one is on a different physical device).
        #
        # If multiple directories share a common prefix, reports will
        # abbreviate and show just the uniquely identifying suffix.
        tmp:
                - /media/billy/Data/chiatmp
                - /media/billy/Windows/chiatmp

        # Optional: tmp2 directory.  If specified, will be passed to
        # chia plots create as -2.  Only one tmp2 directory is supported.
        # tmp2: /mnt/tmp/a

        # One or more directories; the scheduler will use all of them.
        # These again are presumed to be on independent physical devices,
        # so writes (plot jobs) and reads (archivals) can be scheduled
        # to minimize IO contention.
        dst:
                - /media/billy/chia-1-16TB/chiaplots
                - /media/billy/chia-2-16TB/chiaplots
                - /media/billy/chia-3-16TB/chiaplots
                - /media/billy/chia-4-16TB/chiaplots
                - /media/billy/chia-5-16TB/chiaplots

        # Archival configuration.  Optional; if you do not wish to run the
        # archiving operation, comment this section out.
        #
        # Currently archival depends on an rsync daemon running on the remote
        # host, and that the module is configured to match the local path.
        # See code for details.

# Plotting scheduling parameters
scheduling:
        # Don't run a job on a particular temp dir until all existing jobs
        # have progressed at least this far.  Phase major corresponds to the
        # plot phase, phase minor corresponds to the table or table pair
        # in sequence.
        tmpdir_stagger_phase_major: 3
        tmpdir_stagger_phase_minor: 4

        # Don't run more than this many jobs at a time on a single temp dir.
        tmpdir_max_jobs: 4

        # Don't run any jobs (across all temp dirs) more often than this.
        global_stagger_m: 15

        # How often the daemon wakes to consider starting a new plot job
        polling_time_s: 60

# Plotting parameters.  These are pass-through parameters to chia plots create.
# See documentation at
# https://github.com/Chia-Network/chia-blockchain/wiki/CLI-Commands-Reference#create
plotting:
        k: 32
        e: True             # Use -e plotting option
        n_threads: 4        # Threads per job
        n_buckets: 128      # Number of buckets to split data into
        job_buffer: 6750    # Per job memory
I have no idea where or why this is looking for a directory named 'chia' when nothing I've specified has a directory name of 'chia'.
Add an option to analyze to emit CSV instead of a rendered ASCII table. This would allow people to more easily pull stats into spreadsheets.
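A hedged sketch of what the CSV mode could do, using the stdlib csv module in place of the texttable rendering. Function and parameter names here are illustrative, not plotman's actual API:

```python
import csv
import io

def emit_csv(headers, rows, out):
    """Write the same analyze stats as CSV instead of an ASCII table."""
    writer = csv.writer(out)
    writer.writerow(headers)
    writer.writerows(rows)

# Example usage with hypothetical stats:
buf = io.StringIO()
emit_csv(["phase", "mean", "stdev"], [["1", "4520", "310"]], buf)
```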
With main as the default branch, PRs are automatically made against it. If this is kept, then a PR template should likely be added to remind people, as they create the PR, that it should be developed and submitted against development. If I understand the branch usage intention correctly...
Alternatively, the branch which should be developed against could be made the default. I'm guessing the interest in main being the default is so that people who aren't developing can clone and jump straight to running a 'stable' version. Following #61, where plotman becomes installable, we can start publishing releases to PyPI, and people who aren't developing can skip the git clone and the source entirely.
In theory, plotman status could be used to generate a text file summarizing the current system plotting (e.g. './plotman.py status > /chia/chialogs/currentplots.txt'). In theory, this command can be scripted and added as a cronjob and that text file can then be served with apache/lighttpd for a simple web view of what's going on.
ENHANCEMENT: it would be nice if plotman status added the following to this simple text output
Now, I'm currently running into an issue where the plotmanstatus.sh script I created works perfectly to generate the text file when run from the command line, but generates a 0-byte file when run from cron. I suspect it's related to environment variable differences between the non-interactive shell started by crontab and the 'real' user shell I use to start the command manually, but it could also be tied to some Python packages that exist for my real user yet are somehow not accessible from the crontab-started shell. I'm still working through this.
Once I solve this issue, though, it would be nice to have these two enhancements.
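For the cron symptom specifically, a common fix is to make the crontab entry self-contained rather than relying on the login shell's environment: set PATH in the crontab and invoke the venv's interpreter by absolute path. A sketch (all paths here are examples, not from the issue):

```
PATH=/usr/local/bin:/usr/bin:/bin
*/10 * * * * cd /home/chia/plotman && ./venv/bin/python plotman.py status > /chia/chialogs/currentplots.txt 2>&1
```

Redirecting stderr into the file also captures the error that is currently producing the silent 0-byte output.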
psutil offers some functions which are platform dependent. We could be more robust and simply not report the ones that aren't available.
E.g., fix this stacktrace:
File "/home/chia/plotman-main/interactive.py", line 207, in curses_main
jobs_win.addstr(0, 0, reporting.status_report(jobs, n_cols, jobs_height,
File "/home/chia/plotman-main/reporting.py", line 107, in status_report
plot_util.time_format(j.get_time_iowait())
File "/home/chia/plotman-main/job.py", line 282, in get_time_iowait
return int(self.proc.cpu_times().iowait)
AttributeError: 'pcputimes' object has no attribute 'iowait'
Should check the other calls to psutil as well.
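A minimal sketch of the guard for the iowait case above, using getattr so platforms whose cpu_times() result lacks the field (e.g. macOS) return None instead of raising. The namedtuples below simulate psutil's pcputimes results so the sketch is self-contained; the reporting layer would skip None values:

```python
from collections import namedtuple

def iowait_seconds(cpu_times):
    """Return iowait seconds from a psutil cpu_times() result,
    or None on platforms where the field doesn't exist."""
    value = getattr(cpu_times, "iowait", None)
    return None if value is None else int(value)

# Simulated psutil results for a platform with and without iowait:
LinuxTimes = namedtuple("LinuxTimes", ["user", "system", "iowait"])
MacTimes = namedtuple("MacTimes", ["user", "system"])
```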
Right now plotman fails if it sees an existing plot job but can't find its log (typically because the job was started manually outside of plotman). It should be robust to this; the simplest thing would be to issue a warning and ignore the job.
Need to fork my local config.yaml so I can maintain both my own version for use and a canonical default for the GitHub repository.
see: https://gist.github.com/eFishCent/dcce711d6babb123d8ab8d0ba3dc0532
@ericaltendorf I found two more issues I wanted to see if you or someone else could confirm, as my second two HDDs were filled today. I was busy getting my farmer 100% ready for mainnet (including evicting some small HDDs back into USB cases and into a milkcrate), but:
My use model is that as I fill a USB HDD, I remove it from plotman's config.yaml, add the new one, and then restart plotman interactive. The idea is that plotman will just pick up and send new plots to the new drive(s) instead of the old ones.
I can live with removing the drive when it's full and temporarily creating a symlink to the new drive so the old plots complete, but then I ran into problem #2.
Yeah, this is not sustainable. I can keep things going continuously by just creating symlinks from the old targets to the new target, but it's not clean. I guess on my next reboot (when I take down ALL my //plots), I'll configure plotman to target 1-2 symlinks called output1 and output2. I will then symlink those to my destination drives until plotman can detect that dst drives are full.
I will then avoid messing with plotman's folders entirely, but it's kind of a hack.
manager.py currently implements tmpdir selection logic. Meanwhile, separate parallel code in reporting.py computes when a tmpdir is eligible for plotting. We should create a single library for tmpdir prioritization and share it between both locations.
Create a web dashboard that shows basically what the interactive curses-based dashboard shows.
plotman status currently just shows current jobs. We should make it more like what's currently shown on the plotman interactive dashboard. Maybe make the dashboard obsolete...
Currently, unconfigured or misconfigured archive settings probably cause plotman to not work at all. Also, when starting plotman interactive, it always begins with archiving active.
Make plotman support the use case of not running archiving (i.e., just leaving plots in the configured dst dirs).
I killed server jobs with the command ./plotman.py kill xxxx; plotman correctly identifies the job and plot and asks if I want to kill it. When I said yes, it errored out saying that it cannot find the log file. The plot process is killed, but I have to manually go in and delete all the temp files from the temp directory.
Currently, the plotting loop runs infinitely, continuing to plot whenever conditions permit.
Some users would like to be able to request the plotting of exactly n plots. Implement this.
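A hedged sketch of how the loop could be bounded, where start_job and conditions_permit stand in for plotman's internals (they are not real plotman functions):

```python
def plot_loop(start_job, conditions_permit, n=None):
    """Run the plotting loop. n=None preserves today's run-forever
    behavior; an integer stops after exactly n plots have started."""
    started = 0
    while n is None or started < n:
        if not conditions_permit():
            break  # real plotman would sleep and re-check instead
        start_job()
        started += 1
    return started
```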
On the current development branch (haven't tried anywhere else), this command fails:
plotman analyze # anything else, doesn't matter
because the analyzer module is shadowed by a variable of the same name.
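The failure mode can be reproduced with any module; here with the stdlib string module standing in for plotman's analyzer:

```python
# Rebinding an imported module's name to a plain value breaks any later
# attribute access on what used to be the module.
import string

words = string.capwords("hello world")   # fine: 'string' is still a module
string = words                           # now 'string' names a str...
try:
    string.capwords("again")             # ...so this raises AttributeError
    failure = ""
except AttributeError as e:
    failure = str(e)
```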
Actual time of day doesn't matter, we just want a timer.
https://docs.python.org/3/library/os.path.html#os.path.expanduser
https://docs.python.org/3/library/os.path.html#os.path.expandvars
At least a couple of people have run into exceptions over ~ not being supported in paths. I don't know if there are good arguments for not supporting this or not. If using more structured deserialization (possibly a result of working on #77), then this could be integrated there to readily cover all paths.
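The two stdlib functions linked above compose into a one-line helper the config loader could apply to every path it reads. A sketch (the helper name is an assumption, not plotman's API):

```python
import os

def expand(path):
    """Expand ~ and environment variables in a configured path."""
    return os.path.expandvars(os.path.expanduser(path))
```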
Currently plotman distributes plotting, destination, and archival across arrays of disks. However, it assumes a single -2 tmp dir.
Personal experience suggests the load on the -2 tmp dir is low, and a single drive can support the plotting operations of about a dozen tmp drives. However, for robustness and scalability, we should support distributing -2 usage across multiple configured drives/directories.
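A hypothetical config shape for this (the list form of tmp2 is not supported today; the scheduler would rotate -2 usage across the entries):

```yaml
directories:
        tmp2:
                - /mnt/tmp2a
                - /mnt/tmp2b
```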
This line:
Line 203 in 31ccb8f
is shown unconditionally, which means it's also shown when archiving is disabled, in which case it seems to suggest that the dst prefix is remote, which it isn't.
It should be conditional on archiving being active.
Sure it's available in other monitoring systems, but it's pretty relevant to tuning plotting.
plotman interactive, on startup, doesn't show useful information in the first-line status message. Only after the first refresh does it show status. It should show this immediately on startup.
When things go wrong, old files get left in the tmp dirs. Implement a cleanup operation to remove orphaned files that do not appear to be owned by any active chia processes.
analyze reports means and stdev but not sample size. Sample size can vary across the columns, but we could still report something about it.
Installs fine.
However, when I run "python3 plotman.py", nothing happens. It simply returns to the prompt without an error or anything.
Does configuration happen in a YAML file? I can't see any documentation. Am I blind? :)
It would be useful to see the actual delta in wall time for a job compared to its previous job, i.e., to see how regularly the jobs are starting.
I am occasionally seeing the limit on how many plotters per temp drive being ignored.
My suspicion is that phases like 2:? and 4:? are to blame.
My config only allows 1 plotter per temp drive. In the screenshot below, notice how drive "temp7" shows as OK even though there is a job present. This happened while in phase 2:?. Once it moved on to phase 2:1, it was no longer showing as ready.
If the dst dirs are SSDs, people may wish to not use a separate -2 dir and instead use the final dst dirs as the -2 tmp dir.
The configs don't currently allow this to be set up -- we should support it.
Thanks for plotman!
I noticed a bug:
I started plotman (interactive) with two dst folders, say dst01 and dst02. Plotting went fine, plots were created etc.
Then I removed dst01 folder from the config, leaving only dst02.
I restarted plotman (interactive).
Then I noticed that a plot command with "-d dst01" was still being generated, though only one dst is (correctly) shown in plotman (interactive).
I suspect this is due to the method "dstdirs_to_youngest_phase" in manager.py. This method takes the running jobs (including ones with dst01 from before the config update) and their dst directories for selection. However, these dst directories should be filtered to exclude any not in the config.
How to fix:
In manager.py, variable dir2ph should not contain dst entries which are not in the config.
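A sketch of the suggested fix: build the dir-to-phase map only for dst dirs still present in config.yaml. To keep the example self-contained, jobs is a list of (dstdir, phase) pairs, a simplification of plotman's Job objects:

```python
def dstdirs_to_youngest_phase(jobs, configured_dsts):
    """Map each configured dst dir to the youngest phase of any job
    writing to it, ignoring dirs removed from the config."""
    dir2ph = {}
    for dstdir, phase in jobs:
        if dstdir not in configured_dsts:
            continue  # skip jobs targeting dirs no longer in config.yaml
        if dstdir not in dir2ph or phase < dir2ph[dstdir]:
            dir2ph[dstdir] = phase
    return dir2ph
```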
E.g., each time a plot is created. This would let people do various alerts or monitoring, e.g. a curl to a service like https://healthchecks.io.
A nonzero priority is harmless but is slightly confusing to the user and makes it harder to quickly spot which dst dirs have any plots at all.
E.g.:
$ ./plotman.py interactive
Warning: unrecognized args: -k32 -n1
Warning: unrecognized args: -t/tmp -2/tmp2
Warning: unrecognized args: -d/dst -b6000
Warning: unrecognized args: -u128 -r3
...
Should probably use a standard library for parsing the args. :)
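A sketch with stdlib argparse handling the concatenated single-dash flags from the warnings above. The flag set shown is only the subset appearing in those warnings; a real implementation would mirror chia's full option list:

```python
import argparse

# Parse chia plots create style flags instead of ad hoc string handling.
parser = argparse.ArgumentParser(prog="chia plots create", add_help=False)
parser.add_argument("-k", type=int)
parser.add_argument("-n", type=int)
parser.add_argument("-r", type=int)
parser.add_argument("-u", type=int)
parser.add_argument("-b", type=int)
parser.add_argument("-t", dest="tmpdir")
parser.add_argument("-2", dest="tmp2dir")
parser.add_argument("-d", dest="dstdir")

# argparse accepts values attached to single-char flags, e.g. -k32:
args = parser.parse_args("-k32 -n1 -t/tmp -2/tmp2 -d/dst -b6000 -u128 -r3".split())
```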
I'm not really sure what the priority should be in the case where there's a chia on the $PATH and also one in the same env as plotman, but at least in the case where chia is not on the $PATH and is available in the env, it seems like it should be used? Or maybe it must be configured? At a high level this relates to being able to run plotman without activating the environment.
sysconfig.get_path() can get us the scripts (bin) path:
$ venv/bin/python -c 'import sysconfig; print(sysconfig.get_path("scripts"))'
/farm/venv/bin
Or, maybe we can get the executable path from psutil when there are existing plots, and if they all match then we can presume that's the chia to use?
Here are some issues that are at least in part related to chia-finding.
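One possible priority, sketched as a helper (the name and the exact fallback order are assumptions, not a decided design): prefer a chia in the same scripts (bin) dir as the running interpreter, then fall back to $PATH.

```python
import os
import shutil
import sysconfig

def find_chia():
    """Return a path to a chia executable, or None if none is found."""
    # Same env as plotman: the interpreter's scripts (bin) directory.
    candidate = os.path.join(sysconfig.get_path("scripts"), "chia")
    if os.access(candidate, os.X_OK):
        return candidate
    # Fall back to whatever is on $PATH, if anything.
    return shutil.which("chia")
```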
addwstr() is used to display text in curses and throws an error if there's not enough space for the text.
Aside from trying not to add too much text, we could at least wrap the writes in a try, catch the error, and print out something like "try increasing your terminal size".
current workaround: try increasing your terminal size...
We scan the process table then inspect each process. During that time, the process can disappear. We should ensure we're robust to that possibility. The following stacktrace suggests we're not:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/psutil/_pslinux.py", line 1517, in wrapper
return fun(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/psutil/_pslinux.py", line 1637, in cmdline
with open_text("%s/%s/cmdline" % (self._procfs_path, self.pid)) as f:
File "/usr/local/lib/python3.8/dist-packages/psutil/_common.py", line 724, in open_text
return open(fname, "rt", **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/2755564/cmdline'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./plotman.py", line 123, in <module>
interactive.run_interactive()
File ".../chia/plotman/interactive.py", line 293, in run_interactive
curses.wrapper(curses_main)
File "/usr/lib/python3.8/curses/__init__.py", line 105, in wrapper
return func(stdscr, *args, **kwds)
File ".../chia/plotman/interactive.py", line 134, in curses_main
(started, msg) = manager.maybe_start_new_plot(dir_cfg, sched_cfg, plotting_cfg)
File ".../chia/plotman/manager.py", line 66, in maybe_start_new_plot
jobs = job.Job.get_running_jobs(dir_cfg['log'])
File ".../chia/plotman/job.py", line 53, in get_running_jobs
if proc.name() == 'chia':
File "/usr/local/lib/python3.8/dist-packages/psutil/__init__.py", line 622, in name
cmdline = self.cmdline()
File "/usr/local/lib/python3.8/dist-packages/psutil/__init__.py", line 675, in cmdline
return self._proc.cmdline()
File "/usr/local/lib/python3.8/dist-packages/psutil/_pslinux.py", line 1524, in wrapper
raise NoSuchProcess(self.pid, self._name)
psutil.NoSuchProcess: psutil.NoSuchProcess process no longer exists (pid=2755564, name='/usr/sbin/munin')
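The standard remedy is to guard every per-process call, since a process can exit between process_iter() listing it and our inspecting it. A hedged sketch of what the scan in job.py could look like (function name is illustrative):

```python
import psutil

def running_chia_jobs():
    """Scan the process table for chia processes, tolerating races."""
    jobs = []
    for proc in psutil.process_iter():
        try:
            if proc.name() == "chia":
                jobs.append(proc)
        except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
            continue  # process disappeared or is unreadable; skip it
    return jobs
```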
Overall, and broken down by temp dir
There should be a central location for config.yaml (see this discussion).
I'm making this issue mostly because @altendky suggested I split this feature off of #61.
Currently, the archiving process assumes archiving to a remote host using rsync.
We should allow multiple transports; at least, rsync between local dirs.
The main complication here is that we not only need to transfer files, but also to check free disk space on the target directories. Currently this is implemented by remotely executing the df command on the remote host. So we need a clean and robust way to check free space that is coordinated with the file transport mechanism.
There's an additional complication: when transferring via rsync to a remote rsync daemon, we see virtual paths as exported by the rsync daemon's configured modules, but when we execute df, we see paths as they exist natively on the remote host. We currently have a fairly crufty mechanism for mapping between the two views of paths. This mechanism needs to be robust and to generalize across whatever transport mechanisms we implement.