A web app for managing and visualizing deep learning experiments. It was built for internal use and needs further work before it is ready for general use.
- Configure projects and nodes in `meowcat.catlog.config.ClusterConfig`; they are currently hard-coded.
- Make sure your records are stored in the following layout:
```
[some project folder]/records
|-210901-145959
|-...
|-210901-210112
   |-checkpoints
   |-custom-pages
   |  |-some-page
   |     |-attachments/
   |     |-document.json
   |     |-manifest.json
   |-models (outputs of Keras plot_model)
   |  |-models.json
   |  |-xxx.png (referenced in models.json)
   |  |-...
   |-config.py (a copy of the config being used)
   |-log.txt (log output of the experiment program; loguru recommended)
   |-progress.json (progress records and heartbeat timestamp)
   |-results.json (training history and final test performance)
   |-summary.json (run summary, including name, start time, and description)
```
- Build with your IDE and run it as a JAR, or run `mvn package` to produce a WAR and deploy it to a servlet container. Make sure the host can reach every node in use via SFTP.
- Avoid hard-coding node/server usernames and passwords, as we currently do.
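The record layout above can be produced by a small helper in the experiment program. The sketch below is hypothetical, not part of this project: the function name `create_run_dir` is illustrative, the `%y%m%d-%H%M%S` folder naming is only inferred from the example folder names (`210901-145959`), and the `summary.json` keys (`name`, `start_time`, `description`) are assumptions to check against what the web app actually parses.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path


def create_run_dir(project_dir, name, description, config_path=None, now=None):
    """Create records/<YYMMDD-HHMMSS>/ with the skeleton of the record layout.

    Hypothetical helper; the summary.json field names are assumptions.
    """
    if now is None:
        now = datetime.now()
    # Folder names like 210901-145959 appear to be %y%m%d-%H%M%S timestamps (assumption).
    run_dir = Path(project_dir) / "records" / now.strftime("%y%m%d-%H%M%S")

    # Sub-folders from the layout; they may stay empty for a given run.
    for sub in ("checkpoints", "custom-pages", "models"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)

    # Summary with name, start time, and description (key names assumed).
    summary = {
        "name": name,
        "start_time": now.isoformat(timespec="seconds"),
        "description": description,
    }
    (run_dir / "summary.json").write_text(json.dumps(summary, indent=2))

    # Keep a copy of the config being used, saved as config.py.
    if config_path is not None:
        shutil.copyfile(config_path, run_dir / "config.py")
    return run_dir
```

For example, `create_run_dir("my-project", "baseline", "lr=1e-3", config_path="config.py")` would create `my-project/records/<timestamp>/` with the three sub-folders, `summary.json`, and a copy of the config. Logs (`log.txt`) and model plots would then be written into the returned directory by the rest of the program.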
An example experiment program that saves records in the required layout is provided at `attachments/run_experiment.py`, although it cannot be run on its own.
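Because that example is not self-contained, here is a minimal, hypothetical sketch of how a training loop might keep `progress.json` (with a heartbeat timestamp) and `results.json` up to date. The function names, the JSON keys, the heartbeat being Unix seconds, and the 300-second liveness timeout in `is_alive` are all assumptions for illustration; the web app may expect a different format.

```python
import json
import os
import time
from pathlib import Path


def _write_json_atomic(path, data):
    # Write to a temporary file, then rename it over the target, so a reader
    # polling the file never sees a half-written JSON document.
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2))
    os.replace(tmp, path)


def update_progress(run_dir, epoch, total_epochs):
    """Record progress plus a heartbeat timestamp (hypothetical field names)."""
    _write_json_atomic(Path(run_dir) / "progress.json", {
        "epoch": epoch,
        "total_epochs": total_epochs,
        "heartbeat": time.time(),  # Unix seconds (assumed format)
    })


def save_results(run_dir, history, test_metrics):
    """Store training history and final test performance (hypothetical keys)."""
    _write_json_atomic(Path(run_dir) / "results.json", {
        "history": history,      # e.g. {"loss": [...], "val_loss": [...]}
        "test": test_metrics,    # e.g. {"accuracy": 0.91}
    })


def is_alive(run_dir, timeout_s=300.0, now=None):
    """Treat a run as alive if its last heartbeat is newer than timeout_s seconds."""
    path = Path(run_dir) / "progress.json"
    if not path.exists():
        return False
    heartbeat = json.loads(path.read_text())["heartbeat"]
    now = time.time() if now is None else now
    return now - heartbeat < timeout_s
```

Calling `update_progress` once per epoch (or per N steps) refreshes the heartbeat, so a reader can flag a run whose heartbeat has gone stale as crashed or stopped. The temporary-file-and-rename step is a design choice meant to keep a reader, such as the web app fetching files over SFTP, from parsing a truncated file.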