dvc-render's Introduction

dvc-render

dvc-render is a library for rendering data stored in DVC plots format into different output formats, like Vega. It can also generate HTML and MarkDown reports containing multiple plots.

It is used internally by DVC, DVCLive, and Studio.

Features

  • Renderers

Take data stored in DVC plots format alongside plot properties in order to render a plot in different formats.

  • Reports

Take multiple renderers and build an HTML or MarkDown report.

  • Templates

Support for rendering Vega plots using custom or pre-defined templates.

Requirements

The basic usage of rendering Vega Plots doesn't have any dependencies outside Python>=3.8.

Additional features are specified as optional requirements:

dvc-render/setup.cfg

Lines 27 to 32 in 49b8f8a

[options.extras_require]
table =
    tabulate>=0.8.7
markdown =
    %(table)s
    matplotlib

Installation

You can install DVC render via pip from PyPI:

$ pip install dvc-render

Usage

  • Renderer & Templates

from dvc_render import VegaRenderer

properties = {"template": "confusion", "x": "predicted", "y": "actual"}
datapoints = [
    {"predicted": "B", "actual": "A"},
    {"predicted": "A", "actual": "A"},
]

renderer = VegaRenderer(datapoints, "foo", **properties)
plot_content = renderer.get_filled_template()

plot_content contains a valid Vega plot using the confusion matrix template.

  • Report

from dvc_render import render_html

render_html([renderer], "report.html")

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the Apache 2.0 license, DVC render is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

dvc-render's People

Contributors

alexk101, daavoo, daniel-falk, dberenbaum, dependabot[bot], github-actions[bot], hv10, karajan1001, mattseddon, pared, pre-commit-ci[bot], shcheklein, sisp, skshetry, tibor-mach

dvc-render's Issues

template filling: allow filling template on no datapoints

Currently, we return an empty string if VegaRenderer.datapoints is empty, so Studio has to provide dummy datapoints to get a pre-rendered template. Changing this behavior would also be more in line with our "do the most we can" plotting policy.

`html`: Improve aesthetics

Current look:

Screenshot 2022-03-31 at 17-03-41 DVC Plot

Here are some ideas:

  • Include some DVC or Iterative branding (logo, background colors, style)
  • Improve plots layout, similar to Studio? (i.e. use card style container for each plot)
  • Add sidebar tree view to enable/disable plots
  • #6
  • #22

Invisible lines :bug:

Which operating system and Python version are you using?

  • Operating System: Arch Linux
  • Kernel: Linux 6.6.10-arch1-1
  • Architecture: x86-64
  • Python: 3.9.18

Which version of this project are you using?

dvc-render 1.0.0

What did you do?

"""Minimal working example to reproduce bug."""

from dvc_render import VegaRenderer, render_html

if __name__ == "__main__":
    data = [
        {"x": "-1", "y": "-1", "rev": "a"},
        {"x": "1", "y": "1", "rev": "a"},
        {"x": "-1", "y": "1", "rev": "b"},
        {"x": "1", "y": "-1", "rev": "b"},
    ]

    properties = {
        "x": "x",
        "y": "y",
        "title": "mwe",
        "template": "simple",
    }

    render_html([VegaRenderer(data, "foo", **properties)], "report.html")

What did you expect to see?

Two lines colored according to revs a and b.

lines_visible

What did you see instead?

Nothing, lines were invisible.

lines_invisible

HTML that was produced using render_html.

<!doctype html>
<html>
  <head>
    <title>DVC Plot</title>

    <script src="https://cdn.jsdelivr.net/npm/[email protected]"></script>
    <script src="https://cdn.jsdelivr.net/npm/[email protected]"></script>
    <script src="https://cdn.jsdelivr.net/npm/[email protected]"></script>

    <style>
      table {
        border-spacing: 15px;
      }
    </style>
  </head>
  <body>
    <div id="foo">
      <script type="text/javascript">
        var spec = {
          $schema: "https://vega.github.io/schema/vega-lite/v5.json",
          data: {
            values: [
              { x: "-1", y: "-1", rev: "a" },
              { x: "1", y: "1", rev: "a" },
              { x: "-1", y: "1", rev: "b" },
              { x: "1", y: "-1", rev: "b" },
            ],
          },
          title: "mwe",
          params: [{ name: "grid", select: "interval", bind: "scales" }],
          width: 300,
          height: 300,
          mark: { type: "line", tooltip: { content: "data" } },
          encoding: {
            x: { field: "x", type: "quantitative", title: "x" },
            y: {
              field: "y",
              type: "quantitative",
              title: "y",
              scale: { zero: false },
            },
            color: { field: "rev", scale: { domain: [], range: [] } },
            strokeDash: {},
            tooltip: [{ field: "rev" }, { field: "x" }, { field: "y" }],
          },
        };
        vegaEmbed("#foo", spec);
      </script>
    </div>
  </body>
</html>

Problem Source

I managed to boil it down to this.

color: { field: "rev", scale: { domain: [], range: [] } }

After removing the empty scale field the HTML plot was working as intended.
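
The workaround described above can be sketched as a post-processing step on the spec. This is only an illustration under stated assumptions: `drop_empty_scales` is a hypothetical helper, not part of dvc-render, and it treats the spec as a plain Python dict before it is serialized into the HTML.

```python
import copy


def drop_empty_scales(spec: dict) -> dict:
    """Return a copy of a Vega-Lite spec without empty scale overrides.

    Hypothetical helper (not a dvc-render API): removes `scale` from any
    encoding channel whose `domain` and `range` are both empty lists.
    """
    cleaned = copy.deepcopy(spec)
    for channel in cleaned.get("encoding", {}).values():
        # Some channels (e.g. tooltip) are lists, not dicts; skip those.
        scale = channel.get("scale") if isinstance(channel, dict) else None
        if (
            isinstance(scale, dict)
            and scale.get("domain") == []
            and scale.get("range") == []
        ):
            del channel["scale"]
    return cleaned


spec = {
    "mark": {"type": "line"},
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative", "scale": {"zero": False}},
        "color": {"field": "rev", "scale": {"domain": [], "range": []}},
        "strokeDash": {},
        "tooltip": [{"field": "rev"}, {"field": "x"}, {"field": "y"}],
    },
}
fixed = drop_empty_scales(spec)
print(fixed["encoding"]["color"])
```

Non-empty scales, such as `zero: false` on the y axis, are left untouched, and the input spec is not modified.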

anchors: don't auto-fill labels on complex config

We should not set x_label if x is not a string.
The same goes for y.
Studio cannot fill in the data and generate flat datapoints before passing the data to frontend. Only when we have flat datapoints we can be sure that we can infer the {}_label.
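
A minimal sketch of the proposed rule, assuming plot properties arrive as a plain dict. The function name `infer_labels` is hypothetical and does not mirror dvc-render internals; it only shows the "infer a label only from a string field" check.

```python
def infer_labels(properties: dict) -> dict:
    """Fill `x_label` / `y_label` only when the axis field is a plain string.

    Hypothetical helper: complex configs (dicts, lists) are left without
    an inferred label, and explicitly provided labels are never overwritten.
    """
    result = dict(properties)
    for axis in ("x", "y"):
        field = result.get(axis)
        label_key = f"{axis}_label"
        if isinstance(field, str) and label_key not in result:
            result[label_key] = field
    return result


simple = infer_labels({"x": "step", "y": "loss"})
nested = infer_labels({"x": "step", "y": {"train.csv": "loss", "val.csv": "loss"}})
print(simple)
print(nested)
```

In the second call, `y` is a dict, so no `y_label` is inferred while `x_label` is still set from the string field.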

Research JavaScript wrapper

I now wonder if this use case is not another point for dvc-render to have some support for javascript. If there is some pythonic wrapper for js maybe we could leverage vega to produce images in non-linear template use cases. But I guess that would require some research to determine this idea's feasibility.

Originally posted by @pared in #69 (review)

plots: require templates to be JSON and treat them as such

In our current code, we assume that the template might be in any string format, so we substitute the anchors as strings. We should really start requiring the templates to be JSON, so we could treat them as dicts internally and have better control over the special Vega options. For example, smoothing from iterative/dvc#3906 could be handled not with a separate template but with a special plot property (iterative/dvc#3906 (comment)).

  • Require templates to be in JSON format
  • handle template contents (Template.content currently might be str or dict) as dicts
  • add plot smoothing option (might be extracted into a separate issue later, depending on complexity)
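
To illustrate what treating templates as dicts could look like, here is a rough sketch that parses a JSON template and fills anchors by walking the structure instead of substituting strings. The helper `fill_anchors` and the `<DVC_METRIC_DATA>` anchor name are assumptions made for this example; only `<DVC_METRIC_X>` and `<DVC_METRIC_Y>` appear elsewhere on this page.

```python
import json
from typing import Any


def fill_anchors(node: Any, values: dict) -> Any:
    """Recursively replace string nodes that equal an anchor with its value.

    Hypothetical helper: builds a new structure and leaves the input intact.
    """
    if isinstance(node, dict):
        return {key: fill_anchors(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [fill_anchors(child, values) for child in node]
    if isinstance(node, str) and node in values:
        return values[node]
    return node


template = json.loads(
    """
    {
      "data": {"values": "<DVC_METRIC_DATA>"},
      "mark": "line",
      "encoding": {
        "x": {"field": "<DVC_METRIC_X>", "type": "quantitative"},
        "y": {"field": "<DVC_METRIC_Y>", "type": "quantitative"}
      }
    }
    """
)
filled = fill_anchors(
    template,
    {
        "<DVC_METRIC_DATA>": [{"step": 0, "loss": 1.0}],
        "<DVC_METRIC_X>": "step",
        "<DVC_METRIC_Y>": "loss",
    },
)
print(json.dumps(filled, indent=2))
```

Because the values are inserted as Python objects, the data anchor becomes a real list rather than a string that has to be spliced into JSON text.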

Story: Plots, Metrics for LLM support [2]

Submission Type

  • Discussion

Context

Offer ML support for other use cases beyond translation, which would mostly involve text generation (Q&A, summarization, etc.).
Separated from #137 in order to focus on solving plots and one use case, and to offer possible support for others.

Impact

  • Offer support

Issue creator Goal

  • Offer support

Leaving a placeholder here for possible questions.
I will probably submit some Q&A coming from the discussion in the meantime.
Thanks!

Tasks

tests: integration

We need some end-to-end tests verifying that the provided renderers get converted to the expected HTML. So far we only have those for the parallel coordinates plot.

plots: test created html

We might not be aware of what our rendering limitations are (#6894).
Even if we fix a problem (#6900), we need to test it manually to be sure it has been fixed.

It seems to me that we could limit such problems if we were able to somehow test our HTML.
In the cases mentioned above, if we were able to parse js code, execute it and get stderr output we could determine whether our visualization was successful.

Another thing to consider would be: how to check if the support for images works as intended?

This issue needs some research on that topic. Ideally, we would like to test produced HTML as any other tests.
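
As a starting point that needs no browser, a static check on the produced HTML could look like the sketch below. It is weaker than the idea above (it does not execute any JavaScript): it only verifies that every plot `div` has a matching `vegaEmbed` call. The names `PlotCollector` and `unembedded_plots` are made up for this example.

```python
import re
from html.parser import HTMLParser


class PlotCollector(HTMLParser):
    """Collect ids of <div> elements and the bodies of inline <script> tags."""

    def __init__(self):
        super().__init__()
        self.div_ids = []
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and "id" in attrs:
            self.div_ids.append(attrs["id"])
        elif tag == "script" and "src" not in attrs:
            # Inline script: start accumulating its text content.
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def unembedded_plots(html: str) -> list:
    """Return ids of divs that no inline script passes to vegaEmbed."""
    collector = PlotCollector()
    collector.feed(html)
    embedded = set()
    for script in collector.scripts:
        embedded.update(re.findall(r'vegaEmbed\(\s*[\'"]#([^\'"]+)[\'"]', script))
    return [div_id for div_id in collector.div_ids if div_id not in embedded]


sample = """
<html><body>
<div id="foo"><script type="text/javascript">
var spec = {"mark": "line"};
vegaEmbed("#foo", spec);
</script></div>
<div id="bar"></div>
</body></html>
"""
print(unembedded_plots(sample))
```

Catching runtime failures such as the ones in the referenced issues would still require running the page in a headless browser and inspecting its console output.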

Related: iterative/dvc#6944

Fill anchors instead of quoted anchors

In my custom vega-lite template, I tried to do something like this to compute the sum of squared residuals:

           "transform": [
                {
                    "joinaggregate": [
                        {
                            "op": "mean",
                            "field": "lab",
                            "as": "mean_y"
                        }
                    ]
                },
                {
                    "calculate": "pow(datum.<DVC_METRIC_Y> - datum.<DVC_METRIC_X>,2)",
                    "as": "SR"
                },
                {
                    "joinaggregate": [
                        {
                            "op": "sum",
                            "field": "SR",
                            "as": "SSR"
                        }
                    ]
                }
            ]

However, this seems impossible because dvc-render appears to replace only a quoted anchor, f'"{cls.anchor(name)}"', rather than the bare anchor, f'{cls.anchor(name)}':

return f'"{cls.anchor(name)}"'

I didn't dig a lot into this, but I was just wondering if this could just be this way so we could use anchors on datum?

Thank you!
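
The difference between the two replacement strategies can be shown with plain string operations. This is a simplified illustration, not dvc-render's actual filling code; the template string and the field name `loss` are invented for the example.

```python
import json

ANCHOR = "<DVC_METRIC_Y>"
template = (
    '{"field": "<DVC_METRIC_Y>", '
    '"calculate": "pow(datum.<DVC_METRIC_Y>, 2)"}'
)

# Replacing only the quoted anchor misses occurrences inside expressions.
quoted_only = template.replace(f'"{ANCHOR}"', '"loss"')

# Replacing the bare anchor also fills it inside the expression string.
bare = template.replace(ANCHOR, "loss")

print(quoted_only)
print(json.loads(bare)["calculate"])
```

Bare replacement only makes sense for anchors whose values are plain strings, such as field names. An anchor that expands to a list or dict still has to replace the whole quoted token, otherwise the result is no longer valid JSON.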

html: improve table

The table looks like this:

Screenshot 2023-02-07 at 11 06 14 AM

Ideas to clean it up:

  1. Nest rows instead of showing raw json.
  2. Make table tall instead of wide?
  3. Clean up file extension. Not sure why .json becomes _json. Let's either keep it .json or strip the extension completely.
  4. Consider a JS library like https://datatables.net/

Add matplotlib as a backend

Using https://mpld3.github.io/, it might be possible to have dvc plot matplotlib figures in HTML. This would be useful for both users who are accustomed to making plots in matplotlib but also for other backend libraries that use it for plotting, like shap.

Support tooltips in plots

The fiftyone tool has a great precision-recall curve which includes tooltips with the threshold for a given precision/recall. It would be awesome to support tooltips, since precision-recall curves are a common use case.

pr_curve

`plots`: Add support for plotly as backend for render plots

plotly is a set of Open Source Graphing Libraries for building "Interactive charts and maps for Python, R, Julia, ggplot2, .NET, and MATLAB®".

The "high level" concept is very similar to vega-lite (the current DVC plots backend): Both are javascript libraries based on d3.js using JSON to describe the plot "schema" and provide "bindings" to generate plots in different languages (altair would be the vega-lite Python equivalent). See a more detailed comparison

It would be nice to extend DVC plots to support plotly as an alternative backend. The following is a non-exhaustive list of what I consider advantages (in DVC context) of adding support to plotly:

  • Wider adoption

As a non exhaustive example, see differences between python bindings stats plotly / altair

  • Better default interactivity

Try plotly line chart / vega-lite line chart

This is especially relevant for some complex plots like iterative/dvc#4455 , where plotly provides many relevant interactions by default (i.e. reordering columns, selecting subsets) that seem quite complicated to add (if even possible) in vega-lite:

plotly parallel coordinates / vega-lite parallel coordinates

  • It seems fairly easy to implement 👼

After reviewing the internal dvc.render module and discussing it with @pared, it looks like it won't require too many changes in DVC to add support for plotly.

Edit by @dberenbaum to start a tasklist here of possible future plotly enhancements:

Tasks

`plots`: show path in html

Plots should show the path of the file or some other identifier for every plot. After iterative/dvc#7086, in which plots may not have a single path, this could be whatever key identifier is used (for example, train_vs_val instead of the path).

Add new TableRenderer for `metrics`.

metrics are currently handled separately from the existing renderers and using a different input format:

def with_metrics(self, metrics: Dict[str, Dict]) -> "HTML":
    "Adds metrics element."
    header: List[str] = []
    rows: List[List[str]] = []
    for _, rev_data in metrics.items():
        for _, data in rev_data.items():
            if not header:
                header.extend(sorted(data.keys()))
            rows.append([data[key] for key in header])
    self.elements.append(tabulate.tabulate(rows, header, tablefmt="html"))
    return self

We could implement something like a TableRenderer and use it to render metrics. Making it use datapoints as input like the other renderers.
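
One possible shape for such a renderer, sketched with the standard library only. The class name `TableSketch` and its methods are hypothetical and do not reflect an existing dvc-render API. The assumption is that metrics arrive as the same flat list of datapoint dicts that the other renderers take.

```python
from html import escape
from typing import Dict, List


class TableSketch:
    """Hypothetical table renderer that takes flat datapoints as input."""

    def __init__(self, datapoints: List[Dict], name: str):
        self.datapoints = datapoints
        self.name = name

    def header(self) -> List[str]:
        # Union of keys across all datapoints, in first-seen order.
        keys: List[str] = []
        for point in self.datapoints:
            for key in point:
                if key not in keys:
                    keys.append(key)
        return keys

    def rows(self) -> List[List[str]]:
        header = self.header()
        # Missing values become empty cells instead of raising KeyError.
        return [
            [str(point.get(key, "")) for key in header]
            for point in self.datapoints
        ]

    def to_html(self) -> str:
        head = "".join(f"<th>{escape(key)}</th>" for key in self.header())
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
            for row in self.rows()
        )
        return f"<table><tr>{head}</tr>{body}</table>"


table = TableSketch(
    [
        {"rev": "workspace", "accuracy": 0.9},
        {"rev": "main", "accuracy": 0.8, "loss": 0.3},
    ],
    "metrics",
)
print(table.to_html())
```

Unlike `with_metrics`, this version tolerates revisions with different sets of metric keys, because the header is built from all datapoints rather than from the first one only.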

Story: Plots, UX for LLM support in Translation [1]

Submission Type

  • Feature Request
  • Discussion

Context

Neural machine translation experiment-tracking scenario. Repo
ML area: transfer learning with LLMs. Fine-tuning of t5-small with the opus100 dataset from HF.
Uses the DVC VS Code extension with a dvclive experiment-tracking scenario.

tensorflow-macos==2.9.0
tensorflow-metal==0.5.0
transformers==4.32.0.dev0
dvc==3.x.x
dvclive==TBD

Impact

  • Explainability of Transformer models with visualizations (attention heads)
  • Plot text and translated text (string data) to evaluate empirically how well LLMs perform (scalable to other use cases)
  • Teach results at PyCon Spain Keynote 23
  • Comparison of two ML experiment tracking frameworks: mlflow vs. dvc from a DS perspective

Issue creator Goal

  • Discuss implementation
  • Offer support in other LLM use cases
  • Get help / possible template / contrib opportunity

1. Description

Attention Heads

What to plot and why ?

Attention heads. They let us see how words map between two different languages.
At a high level, from the machine learning perspective, they let us analyze how well verbs are translated, how well the model understands prepositions in one language with respect to another, and so on. This visualization might be shown after fine-tuning.

Screenshot 2023-08-07 at 16 02 18

Code snippets useful for visualization

Main plot function . Previous Script

import matplotlib.pyplot as plt


def plot_attention_head(in_tokens, translated_tokens, attention):
  # The model didn't generate `<START>` in the output. Skip it.
  translated_tokens = translated_tokens[1:]

  # Draw the attention weights as a matrix on the current axes.
  ax = plt.gca()
  ax.matshow(attention)
  ax.set_xticks(range(len(in_tokens)))
  ax.set_yticks(range(len(translated_tokens)))

  # Input tokens label the x axis; translated tokens label the y axis.
  labels = [label.decode('utf-8') for label in in_tokens.numpy()]
  ax.set_xticklabels(
      labels, rotation=90)

  labels = [label.decode('utf-8') for label in translated_tokens.numpy()]
  ax.set_yticklabels(labels)

Questions:
Would it be possible to plot this easily in the plots section of the extension?
From the DS standpoint, Vega is not the standard. If I were to develop it myself, how long do you estimate it would take?
Do you know if something more interactive, like bertviz, is possible?

2. Description

Text

What to plot and why ?

The goal is to plot the sentence and the translated sentence in the plots section in VS Code with the extension, but I'm assuming that this would entail another template. In translation, we compare the sentence in one language (input) with the sentence in another language (output, once the model is fine-tuned) to evaluate empirically how well the model translates a set of sentences that might be relevant.

Questions:
Can I log it via dvclive as well?

Thanks in advance!

Revert legend explicit positioning since it breaks VS Code

I suggest we revert #113. Plots have become unmanageable and I can't find a quick fix for this:

Screenshot 2023-02-04 at 4 43 26 PM

Screenshot 2023-02-04 at 3 50 07 PM

The more revs are selected, the smaller the plot becomes.

Here is the Vega config with the VS Code customizations (including an explicit instruction to disable the legend, but it's not working in this case):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {
        "step": 0,
        "accuracy": "0.41710779070854187",
        "dvc_data_version_info": { "filename": "evaluation/plots/metrics/train/accuracy.tsv", "field": "accuracy" },
        "filename": "evaluation/plots/metrics/train/accuracy.tsv",
        "field": "accuracy",
        "rev": "workspace"
      },
      {
        "step": 1,
        "accuracy": "0.46896055340766907",
        "dvc_data_version_info": { "filename": "evaluation/plots/metrics/train/accuracy.tsv", "field": "accuracy" },
        "filename": "evaluation/plots/metrics/train/accuracy.tsv",
        "field": "accuracy",
        "rev": "workspace"
      },
      {
        "step": 0,
        "accuracy": "0.47241726517677307",
        "dvc_data_version_info": { "filename": "evaluation/plots/metrics/eval/accuracy.tsv", "field": "accuracy" },
        "filename": "evaluation/plots/metrics/eval/accuracy.tsv",
        "field": "accuracy",
        "rev": "workspace"
      },
      {
        "step": 1,
        "accuracy": "0.508525550365448",
        "dvc_data_version_info": { "filename": "evaluation/plots/metrics/eval/accuracy.tsv", "field": "accuracy" },
        "filename": "evaluation/plots/metrics/eval/accuracy.tsv",
        "field": "accuracy",
        "rev": "workspace"
      }
    ]
  },
  "title": "dvc.yaml::accuracy",
  "width": 300,
  "height": 300,
  "params": [
    { "name": "smooth", "value": 0.001, "bind": { "input": "range", "min": 0.001, "max": 1, "step": 0.001 } }
  ],
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": { "field": "step", "type": "quantitative", "title": "step" },
        "y": { "field": "accuracy", "type": "quantitative", "title": "accuracy", "scale": { "zero": false } },
        "color": { "field": "rev", "type": "nominal", "legend": { "orient": "top", "direction": "vertical" } }
      },
      "transform": [
        {
          "loess": "accuracy",
          "on": "step",
          "groupby": [ "rev", "filename", "field", "filename::field" ],
          "bandwidth": { "signal": "smooth" }
        }
      ]
    },
    {
      "mark": { "type": "point", "tooltip": { "content": "data" } },
      "encoding": {
        "x": { "field": "step", "type": "quantitative", "title": "step" },
        "y": { "field": "accuracy", "type": "quantitative", "title": "accuracy", "scale": { "zero": false } },
        "color": { "field": "rev", "type": "nominal" }
      }
    }
  ],
  "encoding": {
    "color": {
      "legend": { "disable": true },
      "scale": { "domain": [ "workspace" ], "range": [ "#945dd6" ] }
    },
    "strokeDash": {
      "field": "filename",
      "scale": {
        "domain": [
          "evaluation/plots/metrics/eval/accuracy.tsv",
          "evaluation/plots/metrics/train/accuracy.tsv"
        ],
        "range": [ [ 1, 0 ], [ 8, 8 ] ]
      },
      "legend": { "disable": true }
    }
  }
}
```

`plots`: Add "group by" templates

However, I think that the user in iterative/studio-support#23 (comment) showcases a new scenario that we haven't discussed so far in #5980 (reply in thread) . It is basically using the values of a third column (z) to group the values of a single column (y) and the template looks very clean.

I think this could even be a separate issue to add grouping by a categorical column. We could add a template like linear_categories and add the DVC_METRIC_COLOR_LABEL. We could also add a similar template for the scatter plot (and maybe others like bar plots in the future).

Originally posted by @dberenbaum in iterative/dvc#6316 (comment)

Using CSS in header breaks html templating

When using a custom html template for rendering plots I wanted to add some layout-directives by virtue of adding some CSS directives in the header of my template similar to this:

<header>
  /* other stuff */
  <style>
    .grid {/*settings*/}
  </style>
</header>

This breaks the templating approach used by dvc_render, the offending function is HTML.embed() line 86 @ref

The usage of .format on the raw string of the template assumes that everything in curly braces will be filled by the given kwargs, which seems unnecessarily restricted and should - if actually wanted - be documented either here or in the corresponding documentation of dvc.

The line in question could be replaced by an iteration over the valid kwargs where each of them is used for a replace operation.
Slightly less efficient, as there will be multiple passes over the content, but way more resilient to bad usage.

# replace 
return self.template.format(**kwargs)
# with something along the lines of
for placeholder, value in kwargs.items():
    self.template = self.template.replace("{"+placeholder+"}", value)
return self.template

I would be willing to quickly write the specific changes and make a PR, but I am unsure about how to proceed with updating the corresponding unit-test, which uses the same .format and therefore fails on the same issue.
Should I just adjust the test to expect an explicit version of the string?
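
The failure and the proposed alternative can be reproduced without dvc-render. The sketch below uses a made-up template and two hypothetical helper functions; the `{plot_divs}` placeholder name is an assumption for illustration.

```python
TEMPLATE = """<html>
<head><style>.grid { display: grid; }</style></head>
<body>{plot_divs}</body>
</html>"""


def embed_with_format(template: str, **kwargs: str) -> str:
    # str.format treats every {...} as a replacement field, including CSS rules.
    return template.format(**kwargs)


def embed_with_replace(template: str, **kwargs: str) -> str:
    # Only the known placeholders are substituted; other braces are untouched.
    result = template
    for placeholder, value in kwargs.items():
        result = result.replace("{" + placeholder + "}", value)
    return result


try:
    embed_with_format(TEMPLATE, plot_divs="<div id='foo'></div>")
    format_failed = False
except (KeyError, ValueError, IndexError):
    format_failed = True

page = embed_with_replace(TEMPLATE, plot_divs="<div id='foo'></div>")
print(format_failed)
print(page)
```

Working on a local copy rather than reassigning `self.template` keeps the template reusable across calls. A test written against the replace-based version could compare against an explicit expected string instead of calling `.format` itself.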
