neptune-ai / neptune-tensorboard

Neptune - TensorBoard integration 🧩 Experiment tracking with advanced UI, collaborative features, and user access management.

Home Page: https://docs.neptune.ai/integrations/tensorboard/

License: Apache License 2.0

Python 100.00%
python tensorflow collaboration tensorflow2 comparison dashboard monitor sharing team training versioning visualization

neptune-tensorboard's Issues

Keep getting missing API token in Google Colab

Hello,
I am running PyTorch code with TensorBoard in Google Colab and I want to track experiments.
I have created a project and supplied the API token in two ways, but neither works.

!env = ' '
import neptune
neptune.init(
    api_token=" ",
    project_qualified_name=" "
)

But when I run this
!neptune tensorboard '/logdir' --project name

I keep getting this error:
neptune.exceptions.MissingApiToken: Missing API token. Use "NEPTUNE_API_TOKEN" environment variable or pass it as an argument to neptune.init. Open this link to get your API token https://ui.neptune.ai/get_my_api_token

Any guidance or help??
Thank you
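
One possible explanation, offered as an assumption rather than a confirmed diagnosis: a command started with `!` runs in a separate process, so a token passed to neptune.init inside Python is not visible to it, whereas an environment variable is inherited. The error message itself names NEPTUNE_API_TOKEN as the variable to set. The sketch below only demonstrates that inheritance mechanism with the standard library; `run_with_token` is a hypothetical helper and the token value is a placeholder.

```python
import os
import subprocess
import sys


def run_with_token(token, command):
    """Run `command` in a child process that sees NEPTUNE_API_TOKEN.

    Hypothetical helper for illustration; it is not part of Neptune.
    """
    # Copy the current environment and add the token for the child only.
    env = dict(os.environ, NEPTUNE_API_TOKEN=token)
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


# Demonstration: the child process can read the variable.
output = run_with_token(
    "placeholder-token",
    [sys.executable, "-c", "import os; print(os.environ['NEPTUNE_API_TOKEN'])"],
)
```

In a notebook, assigning os.environ["NEPTUNE_API_TOKEN"] in a cell before the `!neptune tensorboard` cell should have the same effect, since shell commands started from the notebook inherit the kernel's environment.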

Wrong channel names in generated tensorboard experiments

When I compare my runs with tensorboard I am getting expected behaviour with different runs compared on the epoch_acc and epoch_loss channels.

However, when I sync it with neptune the channels are no longer named epoch_acc, epoch_loss.
For example check this experiment.

Because of that, one cannot compare experiment runs right now.

Non-intuitive experiment name and tag

I have a subdirectory for each experiment run in my logdir, but when I convert it to Neptune, the name of the experiment is not the name of the run, as I would expect. Instead it is the path of the event file:

run_2770230158384141070/events.out.tfevents.1553086648.pascal01.intra.codilime.com

The tag is also a bit confusing to me since it is the same as the experiment name.
I would expect to have the run directory, event file, and computer name as separate entities, preferably as an experiment property even.
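
A minimal sketch of the split suggested above, assuming the event file name has the form `events.out.tfevents.<timestamp>.<hostname>` as in the example path. Newer TensorFlow versions may append more fields after the hostname, in which case they would end up in `hostname` here. `EventFileInfo` and `split_event_path` are hypothetical names, not part of neptune-tensorboard.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple


class EventFileInfo(NamedTuple):
    run_dir: str
    event_file: str
    timestamp: int
    hostname: str


# Assumed naming scheme: events.out.tfevents.<unix timestamp>.<hostname>
_EVENT_NAME = re.compile(r"^events\.out\.tfevents\.(\d+)\.(.+)$")


def split_event_path(path):
    """Split a relative event-file path into separate properties."""
    p = PurePosixPath(path)
    match = _EVENT_NAME.match(p.name)
    if match is None:
        raise ValueError(f"not a recognised event file name: {p.name}")
    return EventFileInfo(
        run_dir=str(p.parent),
        event_file=p.name,
        timestamp=int(match.group(1)),
        hostname=match.group(2),
    )


info = split_event_path(
    "run_2770230158384141070/"
    "events.out.tfevents.1553086648.pascal01.intra.codilime.com"
)
```

With the path from this issue, `info.run_dir` is the run directory name, which could serve as the experiment name, while the timestamp and hostname could be stored as properties.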

Connection lost error

When running the sync command:

neptune tensorboard logs --project jakub-czakon/tensorboard-intergation

I am getting a ConnectionLost error:

TypeError: '>=' not supported between instances of 'ConnectionLost' and 'int'

I checked that I can run an experiment from the very same terminal.

NPT-14408: Unable to see data when importing tensorboard logs.

I am not able to view metrics from tensorboard generated logs.

1. To reproduce

Train a TensorFlow model with Keras using the TensorBoard callback.

import keras
import numpy

inp = keras.layers.Input( (16, ) )
op = keras.layers.Dense(4)(inp)

mdl = keras.models.Model( inputs=[inp], outputs = [op] )

x = numpy.random.random((128, 16))
y = (numpy.random.random((128, 4))>0.75) * 1.0

mdl.compile()

metrics = [ keras.metrics.BinaryAccuracy(name = "ba")]
mdl.compile( optimizer = keras.optimizers.Adam(0.00001), loss="mean_squared_error", metrics = metrics)

callbacks = [ keras.callbacks.TensorBoard(log_dir='./junk-logs'), keras.callbacks.EarlyStopping(patience=5, monitor="loss", mode='min')]

mdl.fit( x, y, callbacks=callbacks, epochs=100)

Then I try to sync the logs with Neptune.

export NEPTUNE_PROJECT="project/name"
export NEPTUNE_API_TOKEN="xxx"
./neptune-env/bin/neptune tensorboard junk-logs

The result is a new entry in my project, but I cannot find any of the values that were tracked.

2. Expected behavior

I can see graphs for the metrics in TensorBoard, so I expect to see the same graphs in Neptune, but I cannot.

3. Environment

Default environment that was installed when I used ./neptune-env/bin/pip install neptune-tensorboard

./neptune-env/bin/python --version

> Python 3.10.13

./neptune-env/bin/pip list

Package Version


absl-py 2.0.0
arrow 1.3.0
astunparse 1.6.3
attrs 23.2.0
boto3 1.34.14
botocore 1.34.14
bravado 11.0.3
bravado-core 6.1.1
cachetools 5.3.2
certifi 2023.11.17
charset-normalizer 3.3.2
click 8.1.7
contourpy 1.2.0
cycler 0.12.1
exceptiongroup 1.2.0
execnet 2.0.2
filelock 3.13.1
flatbuffers 23.5.26
fonttools 4.47.0
fqdn 1.5.1
fsspec 2023.12.2
future 0.18.3
gast 0.5.4
gitdb 4.0.11
GitPython 3.1.40
google-auth 2.26.1
google-auth-oauthlib 1.2.0
google-pasta 0.2.0
grpcio 1.60.0
h5py 3.10.0
idna 3.6
iniconfig 2.0.0
isoduration 20.11.0
Jinja2 3.1.2
jmespath 1.0.1
jsonpointer 2.4
jsonref 1.1.0
jsonschema 4.20.0
jsonschema-specifications 2023.12.1
keras 2.15.0
kiwisolver 1.4.5
libclang 16.0.6
Markdown 3.5.1
MarkupSafe 2.1.3
matplotlib 3.8.2
ml-dtypes 0.2.0
monotonic 1.6
mpmath 1.3.0
msgpack 1.0.7
neptune 1.8.6
neptune-tensorboard 1.0.2
networkx 3.2.1
numpy 1.26.3
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.2
opt-einsum 3.3.0
packaging 23.2
pandas 2.1.4
pillow 10.2.0
pip 23.0.1
pluggy 1.3.0
protobuf 4.23.4
psutil 5.9.7
pyasn1 0.5.1
pyasn1-modules 0.3.0
PyJWT 2.8.0
pyparsing 3.1.1
pytest 7.4.4
pytest-xdist 3.5.0
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
referencing 0.32.1
requests 2.31.0
requests-oauthlib 1.3.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.16.2
rsa 4.9
s3transfer 0.10.0
setuptools 65.5.0
simplejson 3.19.2
six 1.16.0
smmap 5.0.1
swagger-spec-validator 3.0.3
sympy 1.12
tbparse 0.0.8
tensorboard 2.15.1
tensorboard-data-server 0.7.2
tensorboardX 2.6.2.2
tensorflow 2.15.0.post1
tensorflow-estimator 2.15.0
tensorflow-io-gcs-filesystem 0.35.0
termcolor 2.4.0
tomli 2.0.1
torch 2.1.2
triton 2.1.0
types-python-dateutil 2.8.19.14
typing_extensions 4.9.0
tzdata 2023.4
uri-template 1.3.0
urllib3 2.0.7
webcolors 1.13
websocket-client 1.7.0
Werkzeug 3.0.1
wheel 0.41.2
wrapt 1.14.1

This is 64-bit Linux, I think CentOS with a 4.18 kernel.

More tensorboard-like behavior

Currently, neptune-tensorboard traverses the log directory recursively and creates an experiment for every single file (not just .*tfevents.* file as stated in the docstring). This is unexpected and it's a problem, because my directory structure contains lots of non-event files (config files, checkpoints, outputs, …).

To mimic Tensorboard as much as possible:

  1. Only .*tfevents.* files should be included.
  2. This is more tricky – if there are multiple .*tfevents.* files in one directory, they should be considered parts of a single run. I think Tensorboard basically just reads all of them (in the order of their timestamps) and concatenates all the events.
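
The two points above can be sketched roughly as follows. This only illustrates the file selection and grouping, not how TensorBoard or neptune-tensorboard actually implement it. `group_event_files` is a hypothetical helper, and it assumes the number after `tfevents.` in the file name is the timestamp to sort by.

```python
import os
import re
from collections import defaultdict

# Assumption: event file names contain "tfevents.<unix timestamp>".
_TFEVENTS = re.compile(r"tfevents\.(\d+)")


def group_event_files(logdir):
    """Map each directory under `logdir` to its event files, oldest first.

    Files without "tfevents.<digits>" in their name are ignored, and
    directories without any event file do not appear in the result.
    """
    runs = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(logdir):
        for name in filenames:
            match = _TFEVENTS.search(name)
            if match:
                runs[dirpath].append((int(match.group(1)), name))
    # Sort by timestamp so the events can be concatenated in order.
    return {d: [name for _, name in sorted(files)] for d, files in runs.items()}
```

Each key of the returned dictionary would then correspond to one experiment, with its event files read in the listed order.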

Error on image dimension

Hi Neptune team,

I am getting an error when logging an image to tensorboard after running neptune_tb.integrate_with_tensorflow().

The problem is that an image has to be of dimension 4 (k, h, w, c) when logged to tensorboard (see https://www.tensorflow.org/api_docs/python/tf/summary/image).

However, when tf.summary.image is called with an image of dimension 4 (shape=(1, 1200, 1200, 3) in my case), the following error is raised:

  File ".../python3.8/site-packages/neptune_tensorboard/integration/tensorflow_integration.py", line 207, in image
    experiment_getter().log_image(get_channel_name(name), x=step, y=data, description=description)
  File ".../python3.8/site-packages/neptune/experiments.py", line 539, in log_image
    image_content = get_image_content(y)
  File ".../python3.8/site-packages/neptune/internal/utils/image.py", line 43, in get_image_content
    raise ValueError("Incorrect size of numpy.ndarray. Should be 2-dimensional or"
ValueError: Incorrect size of numpy.ndarray. Should be 2-dimensional or3-dimensional with 3rd dimension of size 1, 3 or 4.

I think the limitation here is that neptune.experiment.Experiment.log_image only accepts one image.
As a workaround, I replaced line 207 of neptune_tensorboard/integration/tensorflow_integration.py by:

for image in data: 
    experiment_getter().log_image(   
        get_channel_name(name), x=step, y=image, description=description
    )  

I can create a PR if you want.

Versions:

  • python: 3.8
  • tensorflow: 2.3.1
  • tensorboard: 2.3.0
  • neptune-tensorboard: 0.5.0

No module named 'cli'

File "C:\Users\afaq.ahmad.conda\envs\tf_gpu\Scripts\neptune-script.py", line 5, in <module>
    from cli.main import main
ModuleNotFoundError: No module named 'cli'

Unreadable error

When running the sync command

neptune tensorboard logs --project jakub-czakon/tensorboard-intergation

I am getting a really weird error:

TypeError: '>=' not supported between instances of 'ConnectionLost' and 'int'

When I read the stack trace, it seems to be about a lost connection.

I think it could be more user-friendly, say:

Connection was lost, check xyz and run your command again
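
The TypeError suggests that an exception object reached a numeric comparison, perhaps a status-code check, though that is a guess from the message alone. Below is a minimal sketch of the friendlier behavior requested above. `ConnectionLost` here is a local stand-in class and `describe_failure` is a hypothetical helper; neither is taken from the Neptune client.

```python
from typing import Optional, Union


class ConnectionLost(Exception):
    """Local stand-in for the client's connection error (assumption)."""


def describe_failure(result: Union[int, Exception]) -> Optional[str]:
    """Return a readable message for a failed request, or None on success.

    `result` is either an HTTP status code or an exception raised while
    making the request.
    """
    # Check for exceptions first, so they are never compared with an int.
    if isinstance(result, Exception):
        return (
            f"Connection was lost ({type(result).__name__}); "
            "check your network and run your command again."
        )
    if result >= 400:
        return f"Server returned HTTP {result}."
    return None


message = describe_failure(ConnectionLost())
```

Checking the type before the comparison avoids the TypeError and leaves room for a message that tells the user what to do next.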
