iterative / dvc-render

Library for rendering DVC plots

Home Page: http://docs.iterative.ai/dvc-render/

License: Apache License 2.0

Python 100.00%

dvc package plots rendering visualization

dvc-render's Introduction

dvc-render

dvc-render is a library for rendering data stored in DVC plots format into different output formats, like Vega. It can also generate HTML and Markdown reports containing multiple plots.

It is used internally by DVC, DVCLive, and Studio.

Features

Renderers

Take data stored in DVC plots format alongside plot properties in order to render a plot in different formats.

Reports

Take multiple renderers and build an HTML or Markdown report.

Templates

Support for rendering Vega plots using custom or pre-defined templates.

Requirements

The basic usage of rendering Vega Plots doesn't have any dependencies outside Python>=3.8.

Additional features are specified as optional requirements:

dvc-render/setup.cfg

Lines 27 to 32 in 49b8f8a

    
[options.extras_require]
table =
    tabulate>=0.8.7
markdown =
    %(table)s
    matplotlib

Installation

You can install DVC render via pip from PyPI:

$ pip install dvc-render

Usage

Renderer & Templates

from dvc_render import VegaRenderer

properties = {"template": "confusion", "x": "predicted", "y": "actual"}
datapoints = [
    {"predicted": "B", "actual": "A"},
    {"predicted": "A", "actual": "A"},
]

renderer = VegaRenderer(datapoints, "foo", **properties)
plot_content = renderer.get_filled_template()

plot_content contains a valid Vega plot using the confusion matrix template.

Report

from dvc_render import render_html
render_html([renderer], "report.html")

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the Apache 2.0 license, DVC render is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

dvc-render's People

pared daniel-falk hv10 yaellevy eliyahou alexk101

dvc-render's Issues

Plots are not rendered properly when there is a single value

Screen.Recording.2022-12-16.at.3.43.28.PM.mov

VS Code:

Limit size of rendered images.

@daavoo yep, I think we should also handle gracefully this on dvc plots show - images could be large after all, we should be normalizing their sizes

Originally posted by @shcheklein in iterative/example-repos-dev#114 (comment)

vega: Escape special characters (`.[]`) in `field`

If a field contains special characters, the plot will be rendered empty. These characters need to be escaped with \\ in order for vega-lite to render the plot properly.
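A minimal sketch of the escaping described above, assuming the field name arrives as a plain string. `escape_vega_field` is a hypothetical helper for illustration, not part of dvc-render.

```python
import re


def escape_vega_field(field: str) -> str:
    """Prefix each '.', '[' and ']' in a field name with one backslash."""
    return re.sub(r"([.\[\]])", r"\\\1", field)


# The Python string holds a single backslash before each special character;
# serializing it with json.dumps turns that into the "\\" form seen in a spec.
print(escape_vega_field("metrics.json[0]"))
```

Names without special characters pass through unchanged, so the helper could be applied to every field unconditionally.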

render: image: have option to diff images with a slider instead of side by side

See how tensorboard shows images over multiple steps for an example: https://stackoverflow.com/questions/43763858/change-images-slider-step-in-tensorboard.

It would be great to have an option to view like this instead of loading all image revisions side by side.

template filling: allow filling template on no datapoints

Currently, we return an empty string if VegaRenderer.datapoints is empty. Studio has to provide dummy datapoints to get a pre-rendered template. Modifying this behavior would also be more in line with our "do the most we can" plotting policy.

`html`: Improve aesthetics

Current look:

Here are some ideas:

Include some DVC or Iterative branding (logo, background colors, style)
Improve plots layout, similar to Studio? (i.e. use card style container for each plot)
Add sidebar tree view to enable/disable plots
#6
#22

`plots`: Update `smooth` template to match `linear`

The linear template has different sizes and additional features (i.e. on hover labels).

Invisible lines :bug:

Which operating system and Python version are you using?

Operating System: Arch Linux
Kernel: Linux 6.6.10-arch1-1
Architecture: x86-64
Python: 3.9.18

Which version of this project are you using?

dvc-render 1.0.0

What did you do?

"""Minimal working example to reproduce bug."""

from dvc_render import VegaRenderer, render_html

if __name__ == "__main__":
    data = [
        {"x": "-1", "y": "-1", "rev": "a"},
        {"x": "1", "y": "1", "rev": "a"},
        {"x": "-1", "y": "1", "rev": "b"},
        {"x": "1", "y": "-1", "rev": "b"},
    ]

    properties = {
        "x": "x",
        "y": "y",
        "title": "mwe",
        "template": "simple",
    }

    render_html([VegaRenderer(data, "foo", **properties)], "report.html")

What did you expect to see?

Two lines colored according to revs a and b.

What did you see instead?

Nothing, lines were invisible.

HTML that was produced using render_html.

<!doctype html>
<html>
  <head>
    <title>DVC Plot</title>

    <script src="https://cdn.jsdelivr.net/npm/[email protected]"></script>
    <script src="https://cdn.jsdelivr.net/npm/[email protected]"></script>
    <script src="https://cdn.jsdelivr.net/npm/[email protected]"></script>

    <style>
      table {
        border-spacing: 15px;
      }
    </style>
  </head>
  <body>
    <div id="foo">
      <script type="text/javascript">
        var spec = {
          $schema: "https://vega.github.io/schema/vega-lite/v5.json",
          data: {
            values: [
              { x: "-1", y: "-1", rev: "a" },
              { x: "1", y: "1", rev: "a" },
              { x: "-1", y: "1", rev: "b" },
              { x: "1", y: "-1", rev: "b" },
            ],
          },
          title: "mwe",
          params: [{ name: "grid", select: "interval", bind: "scales" }],
          width: 300,
          height: 300,
          mark: { type: "line", tooltip: { content: "data" } },
          encoding: {
            x: { field: "x", type: "quantitative", title: "x" },
            y: {
              field: "y",
              type: "quantitative",
              title: "y",
              scale: { zero: false },
            },
            color: { field: "rev", scale: { domain: [], range: [] } },
            strokeDash: {},
            tooltip: [{ field: "rev" }, { field: "x" }, { field: "y" }],
          },
        };
        vegaEmbed("#foo", spec);
      </script>
    </div>
  </body>
</html>

Problem Source

I managed to boil it down to this.

color: { field: "rev", scale: { domain: [], range: [] } }

After removing the empty scale field the HTML plot was working as intended.
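The workaround above can be sketched as a post-processing step on the spec, assuming the spec is available as a Python dict. `drop_empty_color_scale` is a hypothetical name, and this is not necessarily how the library should fix the issue.

```python
import copy


def drop_empty_color_scale(spec: dict) -> dict:
    """Return a copy of the spec without an empty color scale."""
    cleaned = copy.deepcopy(spec)
    color = cleaned.get("encoding", {}).get("color", {})
    scale = color.get("scale", {})
    # Only drop the scale when both domain and range are empty lists.
    if scale.get("domain") == [] and scale.get("range") == []:
        color.pop("scale")
    return cleaned


spec = {
    "encoding": {
        "color": {"field": "rev", "scale": {"domain": [], "range": []}},
    }
}
print(drop_empty_color_scale(spec)["encoding"]["color"])
```

A scale with a populated domain and range is left untouched, so specs that already render correctly are not affected.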

anchors: don't auto-fill labels on complex config

We should not set x_label if x is not a string.
The same goes for y.
Studio cannot fill in the data and generate flat datapoints before passing the data to frontend. Only when we have flat datapoints we can be sure that we can infer the {}_label.
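The rule proposed here could look roughly like the following. `infer_label` is a hypothetical helper, and the dict-shaped value is only an assumed example of a "complex config".

```python
from typing import Any, Optional


def infer_label(field: Any) -> Optional[str]:
    """Use the field as the axis label only when it is a plain string."""
    return field if isinstance(field, str) else None


print(infer_label("step"))              # flat field name, usable as a label
print(infer_label({"a.json": "loss"}))  # complex config, no label inferred
```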

Research JavaScript wrapper

I now wonder if this use case is not another point for dvc-render to have some support for javascript. If there is some pythonic wrapper for js maybe we could leverage vega to produce images in non-linear template use cases. But I guess that would require some research to determine this idea's feasibility.

Originally posted by @pared in #69 (review)

plots: require templates to be JSON and treat them as such

In our current code, we assume that the template might be in any string format and so we substitute the anchors as strings. We should really start requiring the templates to be JSON, so we could treat them as dicts internally and have a better control over the special vega options. E.g. smoothing from iterative/dvc#3906 , could be handled not with a separate template but a special plot property (iterative/dvc#3906 (comment)).

Require templates to be in JSON format
handle template contents (Template.content currently might be str or dict) as dicts
add plot smoothing option (might be extracted into as separate issue later, depending on complexity)
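To illustrate what treating templates as dicts might enable, here is a minimal sketch that fills anchors by walking parsed JSON instead of substituting strings. `fill_anchors` is hypothetical, and the `<DVC_METRIC_DATA>` anchor name is an assumption made for this example.

```python
import json
from typing import Any


def fill_anchors(node: Any, values: dict) -> Any:
    """Recursively replace strings that are exactly an anchor."""
    if isinstance(node, dict):
        return {key: fill_anchors(val, values) for key, val in node.items()}
    if isinstance(node, list):
        return [fill_anchors(item, values) for item in node]
    if isinstance(node, str) and node in values:
        return values[node]
    return node


template = json.loads(
    '{"data": {"values": "<DVC_METRIC_DATA>"},'
    ' "encoding": {"x": {"field": "<DVC_METRIC_X>"}}}'
)
filled = fill_anchors(
    template,
    {"<DVC_METRIC_DATA>": [{"step": 0}], "<DVC_METRIC_X>": "step"},
)
print(json.dumps(filled))
```

Because values are inserted as Python objects, a list of datapoints and a plain field name are handled the same way, with no quoting concerns.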

Story : Plots , Metrics for LLM support [2]

Submission Type

Discussion

Context

Offer ML support for other use cases beyond Translation, which would mostly involve text generation (Q/A, summarization, etc.)
Separated from #137 in order to focus on solving plots for one use case and offer possible support for others.

Impact

Offer support

Issue creator Goal

Offer support

Leaving a placeholder here to possible questions.
Will prob submit some q and a coming from discussion in the meantime
Thanks!

Tasks

Define use case Q&A , metrics and plots for each one
Define use case Summarization , metrics and plots for each one

Smoothing doesn't work as expected

Smoothing in linear plots uses loess regression, which isn't suitable for linear plots of step-based trends. See iterative/vscode-dvc#3837.

tests: integration

We need some end-to-end tests verifying that the provided renderers get converted to the expected HTML. So far we only have those for the parallel coordinates plot.

markdown: Support confusion matrix.

Markdown report only supports Linear plots.

`plots`: Add histogram template

`plots`: Add bar chart template

https://vega.github.io/vega-lite/examples/bar.html

plots: test created html

We might not be aware of our rendering limitations (#6894).
Even if we fix it (#6900) we need to manually test it in order to be sure the problem has been fixed.

It seems to me that we could limit such problems if we were able to somehow test our HTML.
In the cases mentioned above, if we were able to parse js code, execute it and get stderr output we could determine whether our visualization was successful.

Another thing to consider would be: how to check if the support for images works as intended?

This issue needs some research on that topic. Ideally, we would like to test produced HTML as any other tests.

Related: iterative/dvc#6944

`smooth` plots template broken

Followup iterative/vscode-dvc#3130

Screen.Recording.2023-01-21.at.3.59.55.PM.mov

Fill anchors instead of quoted anchors

In my custom vega-lite template, I tried to do something like this to compute the sum of squared residuals:

           "transform": [
                {
                    "joinaggregate": [
                        {
                            "op": "mean",
                            "field": "lab",
                            "as": "mean_y"
                        }
                    ]
                },
                {
                    "calculate": "pow(datum.<DVC_METRIC_Y> - datum.<DVC_METRIC_X>,2)",
                    "as": "SR"
                },
                {
                    "joinaggregate": [
                        {
                            "op": "sum",
                            "field": "SR",
                            "as": "SSR"
                        }
                    ]
                }
            ]

However, it seems impossible to do this because it looks like dvc-render replaces only a quoted anchor like f'"{cls.anchor(name)}"' instead of just the anchor f'{cls.anchor(name)}'

dvc-render/src/dvc_render/vega_templates.py

Line 78 in 422bfe6

return f'"{cls.anchor(name)}"'

I didn't dig a lot into this, but I was just wondering if this could just be this way so we could use anchors on datum?

Thank you!
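The difference between the two replacement strategies can be shown with plain string operations. This is only a sketch: `fill_quoted` and `fill_bare` are hypothetical stand-ins, not the library's functions.

```python
TEMPLATE = (
    '{"field": "<DVC_METRIC_X>", '
    '"calculate": "pow(datum.<DVC_METRIC_Y> - datum.<DVC_METRIC_X>, 2)"}'
)


def fill_quoted(text: str, anchor: str, value: str) -> str:
    """Replace only anchors that are wrapped in double quotes."""
    return text.replace(f'"{anchor}"', f'"{value}"')


def fill_bare(text: str, anchor: str, value: str) -> str:
    """Replace every occurrence of the anchor, quoted or not."""
    return text.replace(anchor, value)


# The quoted strategy leaves the anchor inside the expression untouched.
print(fill_quoted(TEMPLATE, "<DVC_METRIC_X>", "x"))
# The bare strategy also fills it inside "datum.<...>".
print(fill_bare(TEMPLATE, "<DVC_METRIC_X>", "x"))
```

Bare replacement only makes sense for anchors whose values are plain strings. An anchor that stands for structured data would presumably still need the quoted form so that the quotes can be replaced together with it.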

README

It's currently incomplete. Not informative. Noting this repo is public

The docs at https://docs.iterative.ai/dvc-render/ also don't provide any guidance, just an API ref.

Render inline in IPython

See iterative/dvclive#309

Show params in report

See iterative/dvclive#315

html: improve table

The table looks like this:

Ideas to clean it up:

Nest rows instead of showing raw json.
Make table tall instead of wide?
Clean up file extension. Not sure why .json becomes _json. Let's either keep it .json or strip the extension completely.
Consider a JS library like https://datatables.net/
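One possible reading of "nest rows instead of showing raw json" is to flatten nested values into dotted column names before building the table. A minimal sketch, with `flatten` as a hypothetical helper:

```python
def flatten(data: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into a single level with joined key names."""
    flat = {}
    for key, value in data.items():
        name = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


print(flatten({"train": {"loss": 0.1, "acc": 0.9}, "step": 3}))
```

Each flattened key could then become either a column (wide table) or a row (tall table), which keeps the layout question above open.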

Add matplotlib as a backend

Using https://mpld3.github.io/, it might be possible to have dvc plot matplotlib figures in HTML. This would be useful for both users who are accustomed to making plots in matplotlib but also for other backend libraries that use it for plotting, like shap.

CONTRIBUTING.md: add "Adding template" section

We should include "Adding template" section where we would guide potential contributors how to add template.
Main point there should be asking contributors to show the results of newly added template via vega editor

Support tooltips in plots

The fiftyone tool has a great precision-recall curve which includes tooltips with the threshold for a given precision/recall. It would be awesome to support tooltips and precision-recall curves are a common use case.

plot: implement more default templates

There are some plots that need to be implemented:

plots: expand legend character limit

We need to address this in the template:

                            "encoding": {
                                "color": {
                                    "type": "nominal",
                                    "field": "rev",
                                    "legend": {
                                        "labelLimit": 300
                                    }
                                }

Originally posted by @daavoo in iterative/dvc#7477 (comment)

`plots`: Add support for plotly as backend for render plots

plotly is a set of Open Source Graphing Libraries for building "Interactive charts and maps for Python, R, Julia, ggplot2, .NET, and MATLAB®".

The "high level" concept is very similar to vega-lite (the current DVC plots backend): Both are javascript libraries based on d3.js using JSON to describe the plot "schema" and provide "bindings" to generate plots in different languages (altair would be the vega-lite Python equivalent). See a more detailed comparison

It would be nice to extend DVC plots to support plotly as an alternative backend. The following is a non-exhaustive list of what I consider advantages (in DVC context) of adding support to plotly:

Wider adoption

As a non exhaustive example, see differences between python bindings stats plotly / altair

Better default interactivity

Try plotly line chart / vega-lite line chart

This is especially relevant for some complex plots like iterative/dvc#4455 , where plotly provides many relevant interactions by default (i.e. reordering columns, selecting subsets) that seem quite complicated to add (if even possible) in vega-lite:

plotly parallel coordinates / vega-lite parallel coordinates

It seems fairly easy to implement 👼

After reviewing the internal dvc.render module and discussing it with @pared, it looks like it won't require too many changes on DVC to add support for plotly.

Edit by @dberenbaum to start a tasklist here of possible future plotly enhancements:

Tasks

Better smoothing (#135)
Log-linear plots (#136)
Zooming and panning (iterative/vscode-dvc#4530)
Responsive sizing (iterative/vscode-dvc#3757)
Better / TB-like tooltips iterative/vscode-dvc#4532

Set xlim/ylim in plot

See slack thread. It would be useful to have a way to set or adjust the min and max values of the axes.
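In Vega-Lite, axis limits are expressed as a `scale.domain` on the encoding channel. A minimal sketch of applying them to an existing spec dict follows. `set_axis_limits` is a hypothetical helper and not an existing dvc-render option.

```python
import copy
from typing import Optional, Sequence


def set_axis_limits(
    spec: dict,
    x: Optional[Sequence[float]] = None,
    y: Optional[Sequence[float]] = None,
) -> dict:
    """Return a copy of the spec with scale domains set on x and/or y."""
    limited = copy.deepcopy(spec)
    for channel, limits in (("x", x), ("y", y)):
        if limits is None:
            continue
        encoding = limited.setdefault("encoding", {})
        scale = encoding.setdefault(channel, {}).setdefault("scale", {})
        scale["domain"] = list(limits)
    return limited


spec = {"encoding": {"y": {"field": "acc", "scale": {"zero": False}}}}
print(set_axis_limits(spec, y=(0.0, 1.0)))
```

Existing scale options such as `zero` are preserved. Data outside the domain may still be drawn unless the mark is clipped.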

`plots`: show path in html

Plots should show the path of the file or some other identifier for every plot. After iterative/dvc#7086, in which plots may not have a single path, this could be whatever key identifier is used (for example, train_vs_val instead of the path).

Support `markdown` as output format

Add the ability to generate a markdown report with the same content as the current HTML format.

Per iterative/cml#1036

Add new TableRenderer for `metrics`.

metrics are currently handled separately from the existing renderers and using a different input format:

dvc-render/src/dvc_render/html.py

Lines 53 to 66 in 3196929

    
def with_metrics(self, metrics: Dict[str, Dict]) -> "HTML":
    "Adds metrics element."
    header: List[str] = []
    rows: List[List[str]] = []

    for _, rev_data in metrics.items():
        for _, data in rev_data.items():
            if not header:
                header.extend(sorted(data.keys()))

            rows.append([data[key] for key in header])

    self.elements.append(tabulate.tabulate(rows, header, tablefmt="html"))
    return self

We could implement something like a TableRenderer and use it to render metrics. Making it use datapoints as input like the other renderers.
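As a rough sketch of the first step such a TableRenderer would need, the nested metrics mapping can be flattened into datapoints like the ones other renderers take. `metrics_to_datapoints` is hypothetical, and the meaning of the two key levels (revision, then file name) is an assumption based on the loop variable names above.

```python
def metrics_to_datapoints(metrics: dict) -> list:
    """Flatten {rev: {filename: {metric: value}}} into one dict per row."""
    datapoints = []
    for rev, rev_data in metrics.items():
        for filename, data in rev_data.items():
            # Assumed column names; metric keys are copied as-is.
            datapoints.append({"rev": rev, "filename": filename, **data})
    return datapoints


metrics = {
    "workspace": {"metrics.json": {"acc": 0.9, "loss": 0.1}},
    "main": {"metrics.json": {"acc": 0.8, "loss": 0.2}},
}
for row in metrics_to_datapoints(metrics):
    print(row)
```

Unlike `with_metrics`, this keeps the revision and file name in each row, so a renderer could show them as columns.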

Story : Plots , UX for LLM support in Translation [1]

Submission Type

Feature Request
Discussion

Context

Neural Machine Translation Experiment Tracking scenario. Repo
ML Area : Transfer learning with LLMs. Fine-tuning of t5-small with opus100 dataset from HF
Use DVC VS Code extension with dvclive experiment tracking scenario

tensorflow-macos==2.9.0
tensorflow-metal==0.5.0
transformers==4.32.0.dev0
dvc==3.x.x
dvclive==TBD

Impact

Explainability of Transformer models with visualizations ( Attention Heads )
Plot text and translated text (data in string format) to evaluate empirically how well LLMs perform (scalable to other use cases)
Teach results at PyCon Spain Keynote 23
Comparison of two ML experiment tracking frameworks: mlflow VS dvc from DS perspective

Issue creator Goal

Discuss implementation
Offer support in other LLM use cases
Get help / possible template / contrib opportunity

1. Description

Attention Heads

What to plot and why ?

Attention Heads. It allows us to see how the words are mapping with respect to two different languages.
At a high level, from the machine learning perspective, it allows us to analyze how well verbs are translated, how well the model understands prepositions in one language with respect to another, etc. This visualization might be shown after fine-tuning.

Captura de pantalla 2023-08-07 a las 16 02 18

Code snippets useful for visualization

Main plot function . Previous Script

import matplotlib.pyplot as plt

def plot_attention_head(in_tokens, translated_tokens, attention):
  # The model didn't generate `<START>` in the output. Skip it.
  translated_tokens = translated_tokens[1:]

  ax = plt.gca()
  ax.matshow(attention)
  ax.set_xticks(range(len(in_tokens)))
  ax.set_yticks(range(len(translated_tokens)))

  labels = [label.decode('utf-8') for label in in_tokens.numpy()]
  ax.set_xticklabels(
      labels, rotation=90)

  labels = [label.decode('utf-8') for label in translated_tokens.numpy()]
  ax.set_yticklabels(labels)

Questions:
Would it be possible to plot this easily in the plots section of the extension?
From the DS standpoint, Vega is not the standard. If I develop it myself, do you have an estimate of how long it would take?
Do you know if something more interactive, like bertviz, is possible?

2. Description

Text

What to plot and why ?

The goal is to plot the sentence and the translated sentence in the plots section in VS Code with the extension, but I'm assuming that this would entail another template. In translation, we compare the sentence in one language (input) with the sentence in another language (output, once the model is fine-tuned) to evaluate empirically how well the model translates a set of sentences that might be relevant.

Questions :
Can I log it via dvclive as well?

Thanks in advance!

Revert legend explicit positioning since it breaks VS Code

I suggest we revert #113. Plots have become unmanageable and I can't find a quick fix for this:

As more revisions are selected, the plot becomes smaller and smaller.

Here is the Vega config with the VS Code customizations (including an explicit instruction to disable the legend, but it's not working in this case):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      { "step": 0, "accuracy": "0.41710779070854187", "dvc_data_version_info": { "filename": "evaluation/plots/metrics/train/accuracy.tsv", "field": "accuracy" }, "filename": "evaluation/plots/metrics/train/accuracy.tsv", "field": "accuracy", "rev": "workspace" },
      { "step": 1, "accuracy": "0.46896055340766907", "dvc_data_version_info": { "filename": "evaluation/plots/metrics/train/accuracy.tsv", "field": "accuracy" }, "filename": "evaluation/plots/metrics/train/accuracy.tsv", "field": "accuracy", "rev": "workspace" },
      { "step": 0, "accuracy": "0.47241726517677307", "dvc_data_version_info": { "filename": "evaluation/plots/metrics/eval/accuracy.tsv", "field": "accuracy" }, "filename": "evaluation/plots/metrics/eval/accuracy.tsv", "field": "accuracy", "rev": "workspace" },
      { "step": 1, "accuracy": "0.508525550365448", "dvc_data_version_info": { "filename": "evaluation/plots/metrics/eval/accuracy.tsv", "field": "accuracy" }, "filename": "evaluation/plots/metrics/eval/accuracy.tsv", "field": "accuracy", "rev": "workspace" }
    ]
  },
  "title": "dvc.yaml::accuracy",
  "width": 300,
  "height": 300,
  "params": [
    { "name": "smooth", "value": 0.001, "bind": { "input": "range", "min": 0.001, "max": 1, "step": 0.001 } }
  ],
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": { "field": "step", "type": "quantitative", "title": "step" },
        "y": { "field": "accuracy", "type": "quantitative", "title": "accuracy", "scale": { "zero": false } },
        "color": { "field": "rev", "type": "nominal", "legend": { "orient": "top", "direction": "vertical" } }
      },
      "transform": [
        { "loess": "accuracy", "on": "step", "groupby": [ "rev", "filename", "field", "filename::field" ], "bandwidth": { "signal": "smooth" } }
      ]
    },
    {
      "mark": { "type": "point", "tooltip": { "content": "data" } },
      "encoding": {
        "x": { "field": "step", "type": "quantitative", "title": "step" },
        "y": { "field": "accuracy", "type": "quantitative", "title": "accuracy", "scale": { "zero": false } },
        "color": { "field": "rev", "type": "nominal" }
      }
    }
  ],
  "encoding": {
    "color": { "legend": { "disable": true }, "scale": { "domain": [ "workspace" ], "range": [ "#945dd6" ] } },
    "strokeDash": {
      "field": "filename",
      "scale": {
        "domain": [ "evaluation/plots/metrics/eval/accuracy.tsv", "evaluation/plots/metrics/train/accuracy.tsv" ],
        "range": [ [ 1, 0 ], [ 8, 8 ] ]
      },
      "legend": { "disable": true }
    }
  }
}
```

`plots`: Add "group by" templates

However, I think that the user in iterative/studio-support#23 (comment) showcases a new scenario that we haven't discussed so far in #5980 (reply in thread) . It is basically using the values of a third column (z) to group the values of a single column (y) and the template looks very clean.

I think this could even be a separate issue to add grouping by a categorical column. We could add a template like linear_categories and add the DVC_METRIC_COLOR_LABEL. We could also add a similar template for the scatter plot (and maybe others like bar plots in the future).

Originally posted by @dberenbaum in iterative/dvc#6316 (comment)

markdown: Use https://github.com/vega/vl-convert

Official vega project with Python bindings to convert plots to images

Using CSS in header breaks html templating

When using a custom html template for rendering plots I wanted to add some layout-directives by virtue of adding some CSS directives in the header of my template similar to this:

<header>
  /* other stuff */
  <style>
    .grid {/*settings*/}
  </style>
</header>

This breaks the templating approach used by dvc_render, the offending function is HTML.embed() line 86 @ref

The usage of .format on the raw string of the template assumes that everything in curly braces will be filled by the given kwargs, which seems unnecessarily restricted and should - if actually wanted - be documented either here or in the corresponding documentation of dvc.

The line in question could be replaced by an iteration over the valid kwargs where each of them is used for a replace operation.
Slightly less efficient, as there will be multiple passes over the content, but way more resilient to bad usage.

# replace 
return self.template.format(**kwargs)
# with something along the lines of
for placeholder, value in kwargs.items():
    self.template = self.template.replace("{"+placeholder+"}", value)
return self.template

I would be willing to quickly write the specific changes and make a PR, but I am unsure about how to proceed with updating the corresponding unit-test, which uses the same .format and therefore fails on the same issue.
Should I just adjust the test to expect an explicit version of the string?
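For reference, a standalone sketch of the replacement-based approach that does not mutate the stored template and coerces values to strings. `fill_placeholders` is a hypothetical function, and the `{plot_divs}` placeholder name is assumed for illustration.

```python
def fill_placeholders(template: str, **kwargs) -> str:
    """Replace only the known {name} placeholders; leave other braces alone."""
    filled = template
    for name, value in kwargs.items():
        filled = filled.replace("{" + name + "}", str(value))
    return filled


TEMPLATE = "<style>.grid {display: grid}</style><body>{plot_divs}</body>"

# str.format treats the CSS braces as a replacement field and fails.
try:
    TEMPLATE.format(plot_divs="<div id='foo'></div>")
except KeyError as exc:
    print("format failed on key:", exc)

# The replace-based version leaves the CSS block intact.
print(fill_placeholders(TEMPLATE, plot_divs="<div id='foo'></div>"))
```

A unit test for this behavior could compare against an explicit expected string, which avoids relying on `.format` in the test itself.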
