tensorboard_logger

Note: consider using https://pytorch.org/docs/stable/tensorboard.html instead, which has the same goal and is part of PyTorch.

Log TensorBoard events without TensorFlow

TensorBoard is a visualization tool (not this project; it is part of the TensorFlow framework) that makes it easy to check training progress, compare metrics between different runs, and has lots of other cool features.

The tensorboard_logger library lets you write TensorBoard events without TensorFlow:

from tensorboard_logger import configure, log_value

configure("runs/run-1234")

for step in range(1000):
    v1, v2 = do_stuff()
    log_value('v1', v1, step)
    log_value('v2', v2, step)

Note: if you are already using TensorFlow in your project, you probably don't need this library.

Installation

TensorFlow is required only for viewing the logged events: please check the installation guide on the official site (you probably want the CPU-only version).

tensorboard_logger can be installed with pip:

pip install tensorboard_logger

Usage

You can either use the default logger via the tensorboard_logger.configure and tensorboard_logger.log_value functions, or use the tensorboard_logger.Logger class.

This library logs numerical values of variables in TensorBoard format, so you can use TensorBoard to visualize how they change and compare the same variables across runs. The log file is written into a directory, so you need a separate directory for each run (you can place other logs or output files in the same directory). Directories for runs you wish to compare should share the same parent (other files or directories can live under that parent too; TensorBoard will figure out which directories contain logs).

Apart from variable names and their values, another important thing is the step: this must be an integer that represents some increasing step - it can be a step in training or some other number. The values are ordered by step in TensorBoard, although you can view them ordered by time or relative step too.

A simple usage example:

from tensorboard_logger import configure, log_value

configure("runs/run-1234", flush_secs=5)

for step in range(1000):
    v1, v2 = do_stuff()
    log_value('v1', v1, step)
    log_value('v2', v2, step)

You can start TensorBoard right away:

tensorboard --logdir runs

Then check the metrics in the TensorBoard UI at http://localhost:6006 (note that it binds to 0.0.0.0 by default). Metrics are refreshed when you switch to the browser tab, and there is also a refresh button at the top right.

Runtime overhead is rather large: about 0.1-0.2 ms per logged value (so about 5,000-10,000 operations per second).

API

tensorboard_logger.configure(logdir, flush_secs=2)

Configure logging: a file will be written to logdir and flushed every flush_secs. NOTE: right now the file is flushed after each event is written.

tensorboard_logger.log_value(name, value, step=None)

Log a new value for the given name at the given step. value should be a real number (it will be converted to float), and name should be a string (it will be converted to a valid TensorFlow summary name). step should be a non-negative integer and is used for visualization: you can log several different variables at one step, but you should not log different values of the same variable at the same step (this is not checked). You can also omit step entirely.
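
For example, a minimal sketch illustrating these rules (the variable names and run directory are arbitrary):

from tensorboard_logger import configure, log_value

configure("runs/example")

log_value("train_loss", 0.25, step=1)      # several different variables
log_value("train_accuracy", 0.91, step=1)  # on the same step is fine
log_value("learning_rate", 0.001)          # step can be omitted entirely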

tensorboard_logger.unconfigure()

Unconfigure the default logger by setting the global variable _default_logger to None.
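
A minimal sketch of reconfiguring the default logger for a second run (the run directories are arbitrary):

from tensorboard_logger import configure, unconfigure, log_value

configure("runs/run-1")
log_value("loss", 0.5, step=1)

unconfigure()              # reset the global default logger
configure("runs/run-2")    # safe to configure again now
log_value("loss", 0.4, step=1)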

tensorboard_logger.Logger

A class for writing logs into a directory. Use it if the default logger used by the two functions above is not enough for you (e.g. you want to log into several different directories, or dislike global variables). The constructor has the same signature as tensorboard_logger.configure, and the class has a single log_value method with the same signature as tensorboard_logger.log_value.
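
For example, a sketch of logging into two directories at once with Logger (the directory names and the do_stuff placeholder are arbitrary, as in the examples above):

from tensorboard_logger import Logger

train_logger = Logger("runs/exp-1/train", flush_secs=5)
valid_logger = Logger("runs/exp-1/valid", flush_secs=5)

for step in range(1000):
    train_loss, valid_loss = do_stuff()
    train_logger.log_value("loss", train_loss, step)
    valid_logger.log_value("loss", valid_loss, step)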

Development

Compiling python protobuf files:

protoc --python_out . tensorboard_logger/tf_protobuf/summary.proto
protoc --python_out . tensorboard_logger/tf_protobuf/event.proto

License

MIT license


tensorboard_logger's Issues

Get rid of tensorflow runtime dependency

It's possible to write log events directly. Technically it could be done with struct - the summary format is trivial - but the summary writer also adds timestamps and some other info, so it's probably better to do it with the protobuf library.

The goal is to speed up writing logs and get rid of a big dependency.
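
For illustration, a minimal sketch of the record framing that TensorBoard event files use (not code from this project; it assumes an already-serialized tensorflow.Event protobuf as input and a CRC-32C implementation such as the third-party crc32c package, whose function name may differ between versions):

import struct

from crc32c import crc32c  # assumed third-party CRC-32C dependency

_MASK_DELTA = 0xA282EAD8  # TensorFlow's CRC masking constant

def masked_crc32c(data):
    crc = crc32c(data) & 0xFFFFFFFF
    return (((crc >> 15) | (crc << 17)) + _MASK_DELTA) & 0xFFFFFFFF

def write_record(fileobj, event_bytes):
    # Each record: length, masked CRC of the length, payload, masked CRC of the payload.
    header = struct.pack("<Q", len(event_bytes))
    fileobj.write(header)
    fileobj.write(struct.pack("<I", masked_crc32c(header)))
    fileobj.write(event_bytes)
    fileobj.write(struct.pack("<I", masked_crc32c(event_bytes)))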

log_images fails with new versions of scipy

log_images relies on scipy.misc.toimage which was deprecated in scipy v1.0.0 and removed in v1.2.0. (cf. https://docs.scipy.org/doc/scipy-1.1.0/reference/generated/scipy.misc.toimage.html#scipy.misc.toimage)

Scipy documentation suggests using Pillow's Image.fromarray directly. The following changes to tensorboard_logger.py resolve the issue for me:

line 157 change:

            # Change the following:
            scipy.misc.toimage(img).save(s, format="png")
            # to:
            Image.fromarray(img).save(s, format="png")

line 12 change:

# Remove this import:
import scipy.misc
# Replace it with:
from PIL import Image

I would be happy to issue a PR with this fix.
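
For reference, a self-contained sketch of the replacement suggested above: encoding a NumPy array to PNG bytes with Pillow (the dummy array is just for illustration):

from io import BytesIO

import numpy as np
from PIL import Image

img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)  # dummy uint8 image

s = BytesIO()
Image.fromarray(img).save(s, format="png")
png_bytes = s.getvalue()  # bytes that go into the image summary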

ValueError: default logger already configured

I am using tensorboard_logger with PyTorch. I get this error when I write:

configure("runs/run-1234", flush_secs=5)

for step in range(1000):
    log_value('v1', epoch, step)
    log_value('v2', D_cost.data, step)

What could have gone wrong?

Plotting Precision Recall Curve

Hi,
I'm training an object detection framework, and at the end of each epoch I want to plot a precision-recall curve in TensorBoard. As far as I can tell, the log_value function takes one value (float/int) at a time, and log_histogram takes a list/tuple, neither of which can be used to plot the PR curve.
Is there any way I can plot a PR curve using tensorboard_logger?

TensorFlow import causes an error

If the TensorFlow package is imported before or after importing tensorboard_logger, this error is raised:

TypeError: Couldn't build proto file into descriptor pool!

How to interpret such graphs, isn't something wrong?

Hi,
I recorded values of a loss function and various components in PyTorch. When displaying the graph using tensorboard_logger, where the x-axis is steps, I get graphs going backward along the x-axis (i.e. at a particular step I have multiple values of the loss function).

How is this even possible? What am I interpreting wrong? Please correct me if I am missing something.

tensorboard_logger import issue even without tensorflow import

I did notice a very similar issue but the following is happening even when I haven't imported tensorflow anywhere.

tensorboard_logger version is 0.1.0.

Traceback (most recent call last):
  File "train.py", line 16, in <module>
    import tensorboard_logger as tb_logger
  File "/home/sxr8618/.local/lib/python2.7/site-packages/tensorboard_logger/__init__.py", line 3, in <module>
    from .tensorboard_logger import *
  File "/home/sxr8618/.local/lib/python2.7/site-packages/tensorboard_logger/tensorboard_logger.py", line 22, in <module>
    from .tf_protobuf import summary_pb2, event_pb2
  File "/home/sxr8618/.local/lib/python2.7/site-packages/tensorboard_logger/tf_protobuf/summary_pb2.py", line 22, in <module>
serialized_pb=_b('\n,tensorboard_logger/tf_protobuf/summary.proto\x12\ntensorflow"\x87\x01\n\x0eHistogramProto\x12\x0b\n\x03min\x18\x01 \x01(\x01\x12\x0b\n\x03max\x18\x02 \x01(\x01\x12\x0b\n\x03num\x18\x03 \x01(\x01\x12\x0b\n\x03sum\x18\x04 \x01(\x01\x12\x13\n\x0bsum_squares\x18\x05 \x01(\x01\x12\x18\n\x0c\x62ucket_limit\x18\x06 \x03(\x01\x42\x02\x10\x01\x12\x12\n\x06\x62ucket\x18\x07 \x03(\x01\x42\x02\x10\x01"\x84\x04\n\x07Summary\x12(\n\x05value\x18\x01 \x03(\x0b\x32\x19.tensorflow.Summary.Value\x1aX\n\x05Image\x12\x0e\n\x06height\x18\x01 \x01(\x05\x12\r\n\x05width\x18\x02 \x01(\x05\x12\x12\n\ncolorspace\x18\x03 \x01(\x05\x12\x1c\n\x14\x65ncoded_image_string\x18\x04 \x01(\x0c\x1a}\n\x05\x41udio\x12\x13\n\x0bsample_rate\x18\x01 \x01(\x02\x12\x14\n\x0cnum_channels\x18\x02 \x01(\x03\x12\x15\n\rlength_frames\x18\x03 \x01(\x03\x12\x1c\n\x14\x65ncoded_audio_string\x18\x04 \x01(\x0c\x12\x14\n\x0c\x63ontent_type\x18\x05 \x01(\t\x1a\xf5\x01\n\x05Value\x12\x11\n\tnode_name\x18\x07 \x01(\t\x12\x0b\n\x03tag\x18\x01 \x01(\t\x12\x16\n\x0csimple_value\x18\x02 \x01(\x02H\x00\x12&\n\x1cobsolete_old_style_histogram\x18\x03 \x01(\x0cH\x00\x12*\n\x05image\x18\x04 \x01(\x0b\x32\x19.tensorflow.Summary.ImageH\x00\x12+\n\x05histo\x18\x05 \x01(\x0b\x32\x1a.tensorflow.HistogramProtoH\x00\x12*\n\x05\x61udio\x18\x06 \x01(\x0b\x32\x19.tensorflow.Summary.AudioH\x00\x42\x07\n\x05valueB\x03\xf8\x01\x01\x62\x06proto3')
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 878, in new
return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "tensorboard_logger/tf_protobuf/summary.proto":
tensorflow.HistogramProto.min: "tensorflow.HistogramProto.min" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.HistogramProto.max: "tensorflow.HistogramProto.max" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.HistogramProto.num: "tensorflow.HistogramProto.num" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.HistogramProto.sum: "tensorflow.HistogramProto.sum" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.HistogramProto.sum_squares: "tensorflow.HistogramProto.sum_squares" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.HistogramProto.bucket_limit: "tensorflow.HistogramProto.bucket_limit" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.HistogramProto.bucket: "tensorflow.HistogramProto.bucket" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.HistogramProto: "tensorflow.HistogramProto" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.value: "tensorflow.Summary.value" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Image.height: "tensorflow.Summary.Image.height" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Image.width: "tensorflow.Summary.Image.width" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Image.colorspace: "tensorflow.Summary.Image.colorspace" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Image.encoded_image_string: "tensorflow.Summary.Image.encoded_image_string" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Image: "tensorflow.Summary.Image" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Audio.sample_rate: "tensorflow.Summary.Audio.sample_rate" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Audio.num_channels: "tensorflow.Summary.Audio.num_channels" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Audio.length_frames: "tensorflow.Summary.Audio.length_frames" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Audio.encoded_audio_string: "tensorflow.Summary.Audio.encoded_audio_string" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Audio.content_type: "tensorflow.Summary.Audio.content_type" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Audio: "tensorflow.Summary.Audio" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.value: "tensorflow.Summary.Value.value" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.node_name: "tensorflow.Summary.Value.node_name" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.tag: "tensorflow.Summary.Value.tag" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.simple_value: "tensorflow.Summary.Value.simple_value" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.obsolete_old_style_histogram: "tensorflow.Summary.Value.obsolete_old_style_histogram" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.image: "tensorflow.Summary.Value.image" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.histo: "tensorflow.Summary.Value.histo" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.audio: "tensorflow.Summary.Value.audio" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value: "tensorflow.Summary.Value" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary: "tensorflow.Summary" is already defined in file "tensorflow/core/framework/summary.proto".
tensorflow.Summary.Value.image: "tensorflow.Summary.Image" seems to be defined in "tensorflow/core/framework/summary.proto", which is not imported by "tensorboard_logger/tf_protobuf/summary.proto". To use it here, please add the necessary import.
tensorflow.Summary.Value.histo: "tensorflow.HistogramProto" seems to be defined in "tensorflow/core/framework/summary.proto", which is not imported by "tensorboard_logger/tf_protobuf/summary.proto". To use it here, please add the necessary import.
tensorflow.Summary.Value.audio: "tensorflow.Summary.Audio" seems to be defined in "tensorflow/core/framework/summary.proto", which is not imported by "tensorboard_logger/tf_protobuf/summary.proto". To use it here, please add the necessary import.
tensorflow.Summary.value: "tensorflow.Summary.Value" seems to be defined in "tensorflow/core/framework/summary.proto", which is not imported by "tensorboard_logger/tf_protobuf/summary.proto". To use it here, please add the necessary import.

Update pip package

Can you please update the pip package? It doesn't include log_histogram and other things you've implemented.

Scalar names not propagating properly

My code has a list of classes, and at the end of every epoch it's supposed to go through the list and log scalars, tagging them "recall_classname", "precision_classname" and so forth.

What's showing up in TensorBoard isn't those names though, it's "recall_", "recall_/1", "recall_/2", and so forth, then finally "recall_.", etc.

What am I doing wrong?

The code is:

classestouse = [2,3,4,9,12,13,16,17]
classes = ["none", "comma", "period", "colon", "questionmark", "semicolon", "and", "exclamationpoint"]

def on_end_epoch(state):
    logger.log_value("loss_mean", avgmeter.value()[1], state['epoch'])
    logger.log_value("loss_sd", avgmeter.value()[1], state['epoch'])
    avgacc = sum(classmeter.value()) / len(classmeter.value())
    logger.log_value("loss_avgacc", avgacc, state['epoch'])
    state['iterator'].set_postfix(accuracy = avgacc, loss = avgmeter.value()[0], losssd = avgmeter.value()[1])
    confusion = confusionmeter.value()
    if state['train']:
        for i, v in zip(classestouse, classes):
            correct = confusion[i][i]
            groundtruth = confusion[i].sum()
            predicted = confusion[:, i].sum()
            recall = correct / groundtruth
            precision = correct / predicted
            v = punctypes.vocab.itos[i]
            logger.log_value("recall_%s" % (v), recall, state['epoch'])
            logger.log_value("precision_%s" % (v), precision, state['epoch'])
            logger.log_value('accuracy_%s' % (v), classmeter.value(i), state['epoch'])

(BTW - yes, I know those aren't the correct formula for precision and recall.)

Python2 crashes if tensorboard_logger is imported before torch

Thanks for the great package, it really brings a lot of value for me. But I've recently come across a Python crash.

*** Error in `python': malloc(): memory corruption: 0x000000007842e4c0 ***
Aborted (core dumped)

Steps to reproduce. Running the script below causes the crash on the last line (forward pass of the network).

from tensorboard_logger import configure

import torch
from torch.autograd import Variable

mymodel = torch.nn.Sequential(torch.nn.Conv2d(3, 10, kernel_size=3, bias=True))
imgs = Variable(torch.zeros((1,3,64,64), dtype=torch.float32)).cuda()
mymodel.cuda()
mymodel(imgs)

I also found that switching the order of the imports solves the problem. The following works fine.

import torch
from torch.autograd import Variable

from tensorboard_logger import configure

mymodel = torch.nn.Sequential(torch.nn.Conv2d(3, 10, kernel_size=3, bias=True))
imgs = Variable(torch.zeros((1,3,64,64), dtype=torch.float32)).cuda()
mymodel.cuda()
mymodel(imgs)

If I am not using .cuda() in the code, any order works fine.

System:

Ubuntu 14.04.5 LTS
Cuda  8.0, V8.0.61

Packages:

python                    2.7.15               h33da82c_4    conda-forge
pytorch                   0.4.1                py27__9.0.176_7.1.2_2
tensorboard-logger        0.1.0

I installed them with

conda install pytorch torchvision -c pytorch
pip install tensorboard_logger

I assume the order of imports was tested before, so my only guess is that conda and pip don't work well together and load different versions of some package.

Tensorboard logger with XGBoost or scikit-learn

I'm trying to view training runs on tensorboard using the tensorboard logger, but I'm having a hard time following the instructions.

from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(x, y)

or

bst = dxgb.train(client, params, df_train, labels_train)

I want to see the errors/AUC curves logged so I can view them in TensorBoard.

How do I configure the logger to get the curve?

Thanks,
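
One possible approach (a sketch, not an official recipe from this project; the dataset and split are placeholders): compute the metric per boosting stage yourself, then log it with log_value so TensorBoard draws it as a curve over boosting rounds:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

from tensorboard_logger import Logger

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

logger = Logger("runs/adaboost")
# staged_predict_proba yields test-set probabilities after each boosting round,
# so AUC can be logged with the round number as the step.
for step, proba in enumerate(clf.staged_predict_proba(X_test)):
    logger.log_value("auc", roc_auc_score(y_test, proba[:, 1]), step)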
