neptune-ai / neptune-tensorboard

Neptune - TensorBoard integration 🧩 Experiment tracking with advanced UI, collaborative features, and user access management.

Home Page: https://docs.neptune.ai/integrations/tensorboard/

License: Apache License 2.0

Python 100.00%
python tensorflow collaboration tensorflow2 comparison dashboard monitor sharing team training

neptune-tensorboard's Introduction

Neptune-TensorBoard integration

Log TensorBoard-tracked metadata to neptune.ai.

What will you get with this integration?

  • Log, organize, visualize, and compare ML experiments in a single place
  • Monitor model training live
  • Version and query production-ready models and associated metadata (e.g. datasets)
  • Collaborate with the team and across the organization

What will be logged to Neptune?

  • Model summary and predictions
  • Training code and Git information
  • System metrics and hardware consumption

Example

Install Neptune and the integration:

pip install -U "neptune[tensorboard]"

Enable Neptune logging:

import neptune
from neptune_tensorboard import enable_tensorboard_logging

neptune_run = neptune.init_run(
    project="workspace-name/project-name",  # replace with your own
    tags=["tensorboard", "test"],  # optional
    dependencies="infer",  # optional
)

enable_tensorboard_logging(neptune_run)

Export existing TensorBoard logs:

neptune tensorboard --api_token YourNeptuneApiToken --project YourNeptuneProjectName logs

Support

If you got stuck or simply want to talk to us, here are your options:

  • Check our FAQ page.
  • You can submit bug reports, feature requests, or contributions directly to the repository.
  • Chat! In the Neptune app, click the blue message icon in the bottom-right corner and send a message. A real person will talk to you ASAP (typically very ASAP).
  • You can just shoot us an email at [email protected].

neptune-tensorboard's People

Contributors

aleksanderwww, aniezurawski, dependabot[bot], hubertjaworski, inquisitive-me, jakubczakon, kshitij12345, normandy7, patrycja-j, piotrjander, raalsky, shnela, siddhantsadangi

neptune-tensorboard's Issues

Non-intuitive experiment name and tag

I have subdirectories for each experiment run in my logdir but when I convert it to Neptune, the name of the experiment is not the name of the run (which I would expect):

run_2770230158384141070/events.out.tfevents.1553086648.pascal01.intra.codilime.com

The tag is also a bit confusing to me since it is the same as the experiment name.
I would expect to have the run directory, event file, and computer name as separate entities, preferably as an experiment property even.
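The separation requested here can be sketched as a small parser. This is only an illustration, not the integration's behavior: the `EventFileInfo` class and `parse_event_path` function are hypothetical names, and the file name layout `events.out.tfevents.<timestamp>.<hostname>` is assumed from the single example path above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class EventFileInfo:
    """Hypothetical container for the parts of an event file path."""
    run_dir: str     # directory that holds the event file
    event_file: str  # full event file name
    timestamp: int   # Unix time embedded in the file name
    hostname: str    # machine that wrote the file


def parse_event_path(path: str) -> EventFileInfo:
    p = PurePosixPath(path)
    prefix = "events.out.tfevents."
    if not p.name.startswith(prefix):
        raise ValueError(f"not a TensorBoard event file: {p.name}")
    # Assumed remainder: "<timestamp>.<hostname>"; the hostname may contain dots.
    timestamp, _, hostname = p.name[len(prefix):].partition(".")
    return EventFileInfo(str(p.parent), p.name, int(timestamp), hostname)


info = parse_event_path(
    "run_2770230158384141070/events.out.tfevents.1553086648.pascal01.intra.codilime.com"
)
print(info.run_dir)   # run_2770230158384141070
print(info.hostname)  # pascal01.intra.codilime.com
```

With the parts separated like this, the run directory could serve as the experiment name and the rest could be stored as properties.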

Wrong channel names in generated tensorboard experiments

When I compare my runs with tensorboard I am getting expected behaviour with different runs compared on the epoch_acc and epoch_loss channels.

However, when I sync it with neptune the channels are no longer named epoch_acc, epoch_loss.
For example check this experiment.

Because of that, one cannot compare experiment runs right now.

Connection lost error

When running the sync command:

neptune tensorboard logs --project jakub-czakon/tensorboard-intergation

I am getting ConnectionLost error:

TypeError: '>=' not supported between instances of 'ConnectionLost' and 'int'

I checked that I can run an experiment from the very same terminal.

Error on image dimension

Hi Neptune team,

I am getting an error when logging an image to tensorboard after running neptune_tb.integrate_with_tensorflow().

The problem is that an image has to be of dimension 4 (k, h, w, c) when logged to tensorboard (see https://www.tensorflow.org/api_docs/python/tf/summary/image).

However, when tf.summary.image is called with an image of dimension 4 (shape=(1, 1200, 1200, 3) in my case), the following error is raised:

  File ".../python3.8/site-packages/neptune_tensorboard/integration/tensorflow_integration.py", line 207, in image
    experiment_getter().log_image(get_channel_name(name), x=step, y=data, description=description)
  File ".../python3.8/site-packages/neptune/experiments.py", line 539, in log_image
    image_content = get_image_content(y)
  File ".../python3.8/site-packages/neptune/internal/utils/image.py", line 43, in get_image_content
    raise ValueError("Incorrect size of numpy.ndarray. Should be 2-dimensional or"
ValueError: Incorrect size of numpy.ndarray. Should be 2-dimensional or3-dimensional with 3rd dimension of size 1, 3 or 4.

I think the limitation here is that neptune.experiment.Experiment.log_image only accepts one image.
As a workaround, I replaced line 207 of neptune_tensorboard/integration/tensorflow_integration.py by:

for image in data:
    experiment_getter().log_image(
        get_channel_name(name), x=step, y=image, description=description
    )

I can create a PR if you want.

Versions:

  • python: 3.8
  • tensorflow: 2.3.1
  • tensorboard: 2.3.0
  • neptune-tensorboard: 0.5.0
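
The workaround described in this issue can be tried in isolation with the sketch below. It assumes NumPy is installed; `split_image_batch` and `log_image_batch` are hypothetical helper names, and the `log_image` argument is a stand-in callable that only mirrors the keyword arguments seen in the traceback (`x`, `y`, `description`), not the real Neptune method.

```python
import numpy as np


def split_image_batch(data):
    """Return a list of single images from a (k, h, w, c) batch or one image."""
    arr = np.asarray(data)
    if arr.ndim == 4:
        # One entry per image along the leading batch axis.
        return [arr[i] for i in range(arr.shape[0])]
    if arr.ndim in (2, 3):
        return [arr]
    raise ValueError(f"expected 2, 3, or 4 dimensions, got {arr.ndim}")


def log_image_batch(log_image, name, step, data, description=None):
    """Call `log_image` once per image; `log_image` is a hypothetical stand-in."""
    for image in split_image_batch(data):
        log_image(name, x=step, y=image, description=description)


# Record the calls instead of sending anything to Neptune.
calls = []
log_image_batch(
    lambda name, x, y, description: calls.append((name, x, y.shape)),
    "predictions",
    0,
    np.zeros((2, 8, 8, 3)),
)
print(calls)  # one entry per image in the batch
```

Each logged item is then 3-dimensional, which is the shape the error message asks for.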

Unreadable error

When running the sync command

neptune tensorboard logs --project jakub-czakon/tensorboard-intergation

I am getting a really weird error:

TypeError: '>=' not supported between instances of 'ConnectionLost' and 'int'

When I read the stack trace, it seems to be about the lost connection.

I think it could be more user-friendly, say:

Connection was lost, check xyz and run your command again
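
The shape of this TypeError suggests that an exception object reached a numeric comparison. The sketch below reproduces that message and shows one way to surface a readable error instead. It is an assumption-laden illustration: the `ConnectionLost` class here is a local stand-in, `is_error_status` is a hypothetical helper, and where the real comparison happens in the client is not known from this report.

```python
class ConnectionLost(Exception):
    """Hypothetical stand-in for the client's connection error."""


def is_error_status(status):
    # If an exception object ended up where a status code was expected,
    # report it directly instead of letting `>=` raise a TypeError.
    if isinstance(status, Exception):
        raise RuntimeError(
            "Connection was lost. Check your network connection and run the command again."
        ) from status
    return status >= 400


# Comparing an exception instance with an int produces the confusing message.
try:
    ConnectionLost() >= 400
except TypeError as err:
    raw_message = str(err)
    print(raw_message)

# The guarded helper reports the underlying problem instead.
try:
    is_error_status(ConnectionLost())
except RuntimeError as err:
    friendly_message = str(err)
    print(friendly_message)
```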

No module named 'cli'

  File "C:\Users\afaq.ahmad.conda\envs\tf_gpu\Scripts\neptune-script.py", line 5, in <module>
    from cli.main import main
ModuleNotFoundError: No module named 'cli'

Keep getting missing API token in Google Colab

Hello,
I am running PyTorch code with TensorBoard in Google Colab and want to track my experiments.
I created a project and provided the API token in two ways (shown below), but neither works.

!env = ' '
import neptune
neptune.init(
api_token=" ",
    project_qualified_name=" "
)

But when I run this:
!neptune tensorboard '/logdir' --project name

I keep getting this error:
neptune.exceptions.MissingApiToken: Missing API token. Use "NEPTUNE_API_TOKEN" environment variable or pass it as an argument to neptune.init. Open this link to get your API token https://ui.neptune.ai/get_my_api_token

Any guidance or help?
Thank you
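
One possible explanation, not confirmed in this thread: the command-line tool runs as a separate process, so a token passed to neptune.init inside Python is not visible to it, while the NEPTUNE_API_TOKEN environment variable named in the error message is. The sketch below only demonstrates that a variable set through `os.environ` is inherited by a child process; the token value is a placeholder, and whether this resolves the Colab case is an assumption.

```python
import os
import subprocess
import sys

# Placeholder value; replace with your own token.
os.environ["NEPTUNE_API_TOKEN"] = "YourNeptuneApiToken"

# Child processes inherit the parent's environment, so a command started
# from this process can read the variable.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('NEPTUNE_API_TOKEN'))"],
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = result.stdout.strip()
print(seen_by_child)
```

In a notebook, a shell command started with `!` after this cell should see the same variable.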

More tensorboard-like behavior

Currently, neptune-tensorboard traverses the log directory recursively and creates an experiment for every single file (not just .*tfevents.* file as stated in the docstring). This is unexpected and it's a problem, because my directory structure contains lots of non-event files (config files, checkpoints, outputs, …).

To mimic Tensorboard as much as possible:

  1. Only .*tfevents.* files should be included.
  2. This is more tricky – if there are multiple .*tfevents.* files in one directory, they should be considered parts of a single run. I think Tensorboard basically just reads all of them (in the order of their timestamps) and concatenates all the events.
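
The two points above can be sketched as a directory scan. This is a minimal illustration, not the integration's code: `group_event_files` is a hypothetical name, and the file name layout `events.out.tfevents.<timestamp>.<suffix>` is an assumption used to order files within a run.

```python
import os
import re
import tempfile
from collections import defaultdict

# Assumed file name layout: events.out.tfevents.<timestamp>.<anything>
EVENT_FILE = re.compile(r"tfevents\.(\d+)")


def group_event_files(logdir):
    """Map each directory under `logdir` to its event files, oldest first."""
    runs = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(logdir):
        for filename in filenames:
            match = EVENT_FILE.search(filename)
            if match:  # skip configs, checkpoints, and other non-event files
                run = os.path.relpath(dirpath, logdir)
                runs[run].append((int(match.group(1)), filename))
    # Sort by the embedded timestamp so events can be read in order.
    return {run: [name for _ts, name in sorted(files)] for run, files in runs.items()}


with tempfile.TemporaryDirectory() as logdir:
    os.makedirs(os.path.join(logdir, "run_a"))
    for name in ("events.out.tfevents.200.host", "events.out.tfevents.100.host", "config.yaml"):
        open(os.path.join(logdir, "run_a", name), "w").close()
    grouped = group_event_files(logdir)

print(grouped)
```

Each key would then correspond to one experiment, with its event files read in timestamp order.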
