rmadar / tree-gonalyzer

Tool written in Go to produce publication-quality plots from ROOT TTrees

License: BSD 3-Clause "New" or "Revised" License

Go 100.00%

tree-gonalyzer's Introduction

TTree GOnalyzer

This is a tool written in Go to produce publication-quality plots from ROOT TTrees in a flexible and easy way. It is built on top of go-hep.org. The main supported features are:

histogramming variables over many samples and selections,
displaying one or several signals (overlaid or stacked),
sample normalisation using cross-section and/or luminosity and/or number of generated events,
computing new variables of arbitrary complexity,
joining trees to the main one, as with TTreeFriend,
dumping TTrees with float64 and []float64 branches,
concurrent sample processing.

In a nutshell

// Define samples
samples := []*ana.Sample{
	ana.CreateSample("data", "data", `Data`, "data.root", "mytree"),
	ana.CreateSample("bkg1", "bkg", `Proc 1`, "proc1.root", "mytree"),
	ana.CreateSample("bkg2", "bkg", `Proc 2`, "proc2.root", "mytree"),
	ana.CreateSample("bkg3", "bkg", `Proc 3`, "proc3.root", "mytree"),
}

// Define variables
variables := []*ana.Variable{
	ana.NewVariable("plot1", ana.TreeVarBool("branchBool"), 2, 0, 2),
	ana.NewVariable("plot2", ana.TreeVarF32("branchF32"), 25, 0, 1000),
	ana.NewVariable("plot3", ana.TreeVarF64("branchF64"), 50, 0, 1000),
}

// Create analyzer object with some options
analyzer := ana.New(samples, variables, ana.WithHistoNorm(true))

// Produce plots and dump trees
analyzer.Run()

Gallery



Data/Background [code]	Unstacked signals [code]	Stacked signals [code]

Shape distortion [code]	Shape comparison [code]	Systematic variation [code]

Performance

For 2M events and 60 variables, a comparison with similar ROOT-based code (using t->Draw()) gives:

ROOT -> 6 ms/kEvts
GOHEP -> 2 ms/kEvts

For 2M events and one variable (avoiding t->Draw() repetition):

ROOT -> 0.4 ms/kEvts
GOHEP -> 0.1 ms/kEvts

tree-gonalyzer's Issues

Make the list of implemented `TreeFunc` functions visible in the doc

Interface user-defined Formula with TreeFunc

The goal is to have a user API that is the same whether or not the function is 'user-defined' (with respect to groot) in the ana package.

We could also add a warning if a function is actually not user-defined, like in go-hep/hep#740 (comment).

Improve user-defined rfunc management

store them in a different file at least, maybe in another sub-package?
enable automatic generation of the code (using the same tools as go-hep)
use a map to select the proper rfunc instead of an endless switch, like in go-hep, e.g. here
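
The map-based selection could look roughly like the sketch below. It is only an illustration: the `evaluator` type, the `builders` table and the `lookup` helper are hypothetical names, not the go-hep rfunc API. The idea is to key the table on the Go type of the user function, so that supporting a new signature means adding one entry rather than one more case.

```go
package main

import (
	"fmt"
	"reflect"
)

// evaluator is a hypothetical stand-in for a compiled formula:
// it receives the branch values of one event and returns the result.
type evaluator func(args ...interface{}) interface{}

// builders maps the Go type of a user function to the code wrapping it.
var builders = map[reflect.Type]func(fct interface{}) evaluator{
	reflect.TypeOf((func(float64) float64)(nil)): func(fct interface{}) evaluator {
		f := fct.(func(float64) float64)
		return func(args ...interface{}) interface{} {
			return f(args[0].(float64))
		}
	},
	reflect.TypeOf((func(float64, float64) float64)(nil)): func(fct interface{}) evaluator {
		f := fct.(func(float64, float64) float64)
		return func(args ...interface{}) interface{} {
			return f(args[0].(float64), args[1].(float64))
		}
	},
}

// lookup returns the evaluator for fct, or false when the signature
// is not registered (the caller can then fall back to a slower path).
func lookup(fct interface{}) (evaluator, bool) {
	build, ok := builders[reflect.TypeOf(fct)]
	if !ok {
		return nil, false
	}
	return build(fct), true
}

func main() {
	sum, ok := lookup(func(pt, eta float64) float64 { return pt + eta })
	fmt.Println(ok, sum(1.5, 2.0))

	// An unregistered signature is reported instead of panicking.
	_, ok = lookup(func(n int) int { return n })
	fmt.Println(ok)
}
```

The boolean returned by `lookup` is also a natural place to emit a warning for functions that are not optimized.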

Consider having an interface for basic types of TreeFunc

Problem

Currently, we have:

// This will compile
f1 := ana.NewCutBool("branchBool") // f1.Fct returns a bool
v1OK := f1.GetFuncBool(r)          // Returns a func() bool
v1NotOK := f1.GetFuncF64(r)        // Will crash, since asserted as func() float64

// This will also compile
f2 := ana.NewVarBool("branchBool") // f2.Fct returns a float64 (needed for plotting)
v2NotOK := f2.GetFuncBool(r)       // Will crash, since asserted as func() bool
v2OK := f2.GetFuncF64(r)           // Returns a func() float64

This is quite confusing and not very safe from the user's point of view.

Proposed solution

In the TreeFunc type, one could have instead something like:

type TreeFunc struct {
   VarsName []string
   Fct      interface{}
}

// Internal interface
type tfunc interface {
   GetFunc(r reader)  // BUT: which return type here? We might need a type
                      // assertion later in any case ...
}

// Internal Bool type
type outputBool TreeFunc
func (*outputBool) GetFunc(r reader) {
  return cutBool.GetFuncBool(r)  // i.e. formula.Func() type-assert as func() bool
}

// Internal F64 type
type outputF64 TreeFunc
func (*outputF64) GetFunc(r reader) {
  return cutBool.GetFuncF64(r) // i.e. formula.Func() type-assert as func() float64
}

// User New function for boolean output
// BUT: will the output be recognized as a TreeFunc then?
func NewCutBool(v string) outputBool {
          return TreeFunc{ 
                VarsName: []string{v},
                Fct:      func(x bool) bool { return x },
        }
}

// User New function for float64 output
// BUT: will the output be recognized as a TreeFunc then?
func NewVarBool(v string) outputF64 {
          return TreeFunc{ 
                VarsName: []string{v},
                Fct: func(x bool) float64 { 
                          return map[bool]float64{false: 0, true: 1}[x]
                },
        }
}

Then, from the user's point of view, it would look like:

f1 := ana.NewCutBool("branchBool") // f1.Fct returns a bool
v1 := f1.GetFunc(r)                // Returns a func() bool

f2 := ana.NewVarBool("branchBool") // f2.Fct returns a float64 for plotting
v2 := f2.GetFunc(r)                // Returns a func() float64

Questions

Which return type should GetFunc() have in the interface? And would a type assertion still be needed later anyway (which brings back the original problem)?
Will outputF64 (or outputBool) be recognized as a TreeFunc in user code? (I want to keep only this type exported.)

(ping to @sbinet for well-educated advice ... one more time! Nothing urgent though: this is "just" to make the code nicer.)

Add concurrent processing of samples

Add support for slices as Variable w/o rtree.FormulaFunc

Try making the Value field a []float64 (created with make), and do the type assertions in the event loop (or better, outside it) to decide how to fill the histogram.
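
A minimal sketch of the "decide outside the event loop" idea follows. The `hist` type is a toy replacement for hbook.H1D and `filler` is a hypothetical helper; the only point is that the type switch runs once, and the closure it returns is what the event loop calls.

```go
package main

import "fmt"

// hist is a toy fixed-binning 1D histogram (hypothetical stand-in for hbook.H1D).
type hist struct {
	xmin, xmax float64
	bins       []float64
}

func newHist(n int, xmin, xmax float64) *hist {
	return &hist{xmin: xmin, xmax: xmax, bins: make([]float64, n)}
}

// fill adds weight w to the bin containing x; out-of-range values are dropped.
func (h *hist) fill(x, w float64) {
	if x < h.xmin || x >= h.xmax {
		return
	}
	i := int(float64(len(h.bins)) * (x - h.xmin) / (h.xmax - h.xmin))
	if i >= len(h.bins) {
		i = len(h.bins) - 1 // guard against floating-point rounding at the upper edge
	}
	h.bins[i] += w
}

// filler inspects the variable type once and returns the function
// to call for every event. value points to the memory the reader updates.
func filler(h *hist, value interface{}) (func(w float64), error) {
	switch v := value.(type) {
	case *float64:
		return func(w float64) { h.fill(*v, w) }, nil
	case *[]float64:
		return func(w float64) {
			for _, x := range *v {
				h.fill(x, w)
			}
		}, nil
	default:
		return nil, fmt.Errorf("unsupported variable type %T", value)
	}
}

func main() {
	h := newHist(2, 0, 10)
	var pts []float64 // slice branch, refreshed for each event
	fillPts, err := filler(h, &pts)
	if err != nil {
		panic(err)
	}
	for _, evt := range [][]float64{{1, 7}, {2}, {}} {
		pts = evt
		fillPts(1)
	}
	fmt.Println(h.bins)
}
```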

Add a fill style for the error band

This can be done once gonum/plot#564 is implemented.

Define signal sample behaviour

Add a new 'composition' plot style

Typical wanted plot

Strategy

From the user interface side, this could go with an option (name to be chosen) such as:

ana.WithFraction(true)
ana.WithBinwiseNorm(true)
ana.WithBinCompo(true)
ana.WithComposition(true)

If this option is enabled, then each bin of the distribution will be normalized to 100% and stack mode will be forced to true.

@sbinet, I am not sure which approach should be used from the hbook/hplot point of view. In ROOT, I would use TH1, grab all values for a given bin using h.GetBinContent(ib), compute my fractions and "fill" (or rather set the bin content of) my new histograms with the h.SetBinContent(ib) method. As far as I understand, there is no SetBinContent() method in hbook (since the link between error and value cannot be broken).
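
Independently of how the result is put back into a histogram, the bin-wise normalisation itself could be sketched on plain slices as below. The `fractions` helper is hypothetical and assumes that all samples share the same binning; bins with a non-positive total are left at zero.

```go
package main

import "fmt"

// fractions converts per-sample bin contents into bin-wise fractions:
// for every bin with a positive total, the returned values sum to 1
// across samples. contents[is][ib] is the content of bin ib for sample is.
func fractions(contents [][]float64) [][]float64 {
	out := make([][]float64, len(contents))
	for is := range contents {
		out[is] = make([]float64, len(contents[is]))
	}
	if len(contents) == 0 {
		return out
	}
	for ib := range contents[0] {
		tot := 0.0
		for is := range contents {
			tot += contents[is][ib]
		}
		if tot <= 0 {
			continue // empty (or negative) bin: keep all fractions at 0
		}
		for is := range contents {
			out[is][ib] = contents[is][ib] / tot
		}
	}
	return out
}

func main() {
	bkg1 := []float64{1, 3, 0}
	bkg2 := []float64{3, 1, 0}
	fmt.Println(fractions([][]float64{bkg1, bkg2}))
}
```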

Manage subsamples

Transform the type Sample to be a slice of trees/files/cuts/weights, with a default length of 1 but with all cosmetic features in common.

Need to add a function to create a new Sample from sub-sample (maybe via an unexported type subSample?)

Fix the behaviour of the ratio plot

Add an ana.Maker option to set y-axis range.
For now:
- stacked: ok with data but not OK without data (we would want only the total ratio to itself, not the dots at 0 - might be ok with 1. though)
- not stacked: not ok with data, but ok without data (not clear what ratios are in that case)

This might well be related to this comment in Maker.go (in the if ana.RatioPlot part):

default:
   // [FIX-ME 0 (rmadar)] Ratio wrt data (or 1 bkg if data is empty) -> to be specified as an option?
   // [FIX-ME 1 (rmadar)] loop is over bhBkgs_postnorm while 'ana.Samples[is]' runs also over data.
   for is, h := range bhBkgs_postnorm {
      ...
   }

Possible option to dump a tree

Re-organize type definitions

Put analyzer type on one side and the others (sample, variable, selection) in the same module.

Add a plot gallery to the README

Add a normalisation type

Containing xsection, ngen and lumi
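
One possible shape for such a type is sketched below. The `Norm` name, its fields and the "zero means not set" convention are assumptions made for illustration; the weight is the usual cross-section times luminosity divided by the number of generated events.

```go
package main

import "fmt"

// Norm groups the normalisation inputs of a sample.
// A field left at zero is treated as "not set" and is skipped.
type Norm struct {
	XSec float64 // cross-section, e.g. in pb
	Lumi float64 // integrated luminosity, e.g. in pb^-1
	NGen float64 // number of generated events
}

// Weight returns xsec * lumi / ngen, using only the fields that are set.
func (n Norm) Weight() float64 {
	w := 1.0
	if n.XSec != 0 {
		w *= n.XSec
	}
	if n.Lumi != 0 {
		w *= n.Lumi
	}
	if n.NGen != 0 {
		w /= n.NGen
	}
	return w
}

func main() {
	n := Norm{XSec: 2, Lumi: 1000, NGen: 4000}
	fmt.Println(n.Weight()) // 2 * 1000 / 4000
}
```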

Add a selection type and loop over cuts inside the event loop

This would allow producing plots for many selections within one event loop (needs go-hep/hep#634 as well)
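
The structure could be sketched as follows, with a toy event model instead of a tree reader. The `event` and `selection` types are hypothetical, and the sketch only counts passing events where the real code would fill one histogram per selection.

```go
package main

import "fmt"

// event is a toy event model (hypothetical branches).
type event struct {
	pt, eta float64
}

// selection pairs a name with a cut evaluated on each event.
type selection struct {
	name string
	cut  func(e event) bool
}

// countPassing loops once over the events and evaluates every cut
// for each of them, instead of running one event loop per selection.
func countPassing(events []event, sels []selection) map[string]int {
	counts := make(map[string]int, len(sels))
	for _, e := range events {
		for _, s := range sels {
			if s.cut(e) {
				counts[s.name]++
			}
		}
	}
	return counts
}

func main() {
	events := []event{{pt: 30, eta: 0.5}, {pt: 10, eta: 1.2}, {pt: 50, eta: 2.8}}
	sels := []selection{
		{name: "all", cut: func(e event) bool { return true }},
		{name: "highPt", cut: func(e event) bool { return e.pt > 25 }},
		{name: "central", cut: func(e event) bool { return e.eta > -2.5 && e.eta < 2.5 }},
	}
	fmt.Println(countPassing(events, sels))
}
```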

sample: cosmetic options on top of autoStyle

One idea could be to store the config of the sample as an internal field of sample, and apply it at the maker.PlotHisto() level after applyAutoStyle()

Move the perf section into the main README

Switch to new formula by default

Re-factorize / improve `PlotVariables()` function

write one function which plots a single variable for a given cut.
break down the different steps: normalization, stack, ratio plot etc ...
use concurrency to plot several variables at the same time.
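
For the concurrency step, a minimal sketch with sync.WaitGroup is given below. `plotAll` and the per-variable callback are hypothetical; the callback stands in for the single-variable plotting function mentioned in the first item.

```go
package main

import (
	"fmt"
	"sync"
)

// plotAll runs plot for every variable in its own goroutine and
// returns the results in the same order as the input.
func plotAll(vars []string, plot func(name string) string) []string {
	out := make([]string, len(vars))
	var wg sync.WaitGroup
	for i, v := range vars {
		wg.Add(1)
		go func(i int, v string) {
			defer wg.Done()
			out[i] = plot(v) // each goroutine writes to its own slot, so no lock is needed
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	files := plotAll([]string{"pt", "eta", "phi"}, func(name string) string {
		return name + ".png" // stand-in for the real plotting step
	})
	fmt.Println(files)
}
```

With many variables, the number of goroutines running at once would probably need a cap (for instance a buffered channel used as a semaphore), which is left out here.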

Add a log scale as variable option

Improve error band plotting in log scale

Discussed in go-hep/hep#751

Check needed field for each object

For example, the Type field of a sample is needed. Panic if it is not set.

Improve benchmarking

group variable per type
write a ROOT code based on TTreeReader, with one event loop (not a t->Draw())

TreeFunc: update protection & doc

Add protection in maker.FillHisto() on type assertion for func. (Re-design the code maybe, removing getFuncT() ... Or think how to put a nice protection and print the variable that fails)
update doc with allowed return types
add a withSlice() option for a variable

Add New function for sample and maker

This will allow a proper management of default values, as well as make sure that needed fields are filled (replace #8). One way could be to use the functional argument pattern:

func NewSample(fname, tname, stype, leg string, opts ...Options) Sample {

	// Required fields ('type' is a Go keyword, hence 'stype')
	s := Sample{
		FileName: fname,
		TreeName: tname,
		Type:     stype,
		LegLabel: leg,
	}

	// Configuration with default values for all optional fields
	cfg := newCfg()

	// Update the configuration by looping over the functional options
	for _, opt := range opts {
		cfg = opt(cfg)
	}

	// Go through all fields and fill them from the cfg object
	s.CircleSize = cfg.CircleSize
	s.CircleMarkers = cfg.CircleMarkers
	...

	return s
}

The functional options might look like:

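// Each option returns a function that updates and returns the configuration.
func CircleSize(s float64) Options {
	return func(cfg Config) Config {
		cfg.CircleSize = s
		return cfg
	}
}

func CutFunc(f TreeFunc) Options {
	return func(cfg Config) Config {
		cfg.CutFunc = f
		return cfg
	}
}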

And the declaration of the sample would look like:

s := ana.NewSample("file.root", "tree", "data", `Data`,
                   ana.CircleSize(2),
                   ana.CircleMarkers(true),
                   ana.CutFunc(ana.TreeFunc{
                        VarsName: []string{"pt", "eta"},
                        Fct: func(pt, eta float64) float64 { return pt + eta },
                   }),
)

Option to plot total bkg

Add reading of configuration files

Using JSON format to define samples, variables, selections to - at the end - run a single executable. Something like:

./run_ana --samples spl.JSON --variables var.JSON --selection sels.JSON
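
As an illustration of what reading the samples file might involve, here is a sketch based on encoding/json. The JSON keys and the `sampleCfg` type are hypothetical; they loosely follow the arguments passed to ana.CreateSample in the README example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sampleCfg describes one sample in the configuration file.
// The JSON keys are hypothetical and not an existing format.
type sampleCfg struct {
	Name   string `json:"name"`
	Type   string `json:"type"`
	Legend string `json:"legend"`
	File   string `json:"file"`
	Tree   string `json:"tree"`
}

// parseSamples decodes a JSON array of samples.
func parseSamples(data []byte) ([]sampleCfg, error) {
	var samples []sampleCfg
	if err := json.Unmarshal(data, &samples); err != nil {
		return nil, fmt.Errorf("parsing samples: %w", err)
	}
	return samples, nil
}

func main() {
	data := []byte(`[
		{"name": "data", "type": "data", "legend": "Data", "file": "data.root", "tree": "mytree"},
		{"name": "bkg1", "type": "bkg", "legend": "Proc 1", "file": "proc1.root", "tree": "mytree"}
	]`)
	samples, err := parseSamples(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range samples {
		fmt.Printf("%s (%s): %s:%s\n", s.Name, s.Type, s.File, s.Tree)
	}
}
```

Variables and selections are harder to express this way, since they carry functions; a string formula evaluated at run time would be one option.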

Print a list of slow TreeFunc functions in the report

These are the rtree.Formula instances that are coded neither in the rfunc package of go-hep nor as private TreeFunc types.

Fix slice reading

This is being discussed in go-hep/hep#736

Add total histogram with error band

It would be plotted on top of everything (but before data), and the error band can probably be drawn manually with a band, but it might be good to incorporate it in go-hep/plot (?)

Re-organize PlotHistos() function

The current logic is a bit convoluted between hbook.H1D and hplot.H1D when computing the total background histogram and the ratio to data. This could be better organized, I think.

Change the way the data style is set

Currently, one cannot pass linewidth or color ... Everything should be done in the s.CreateHisto() function, with default values for markersize, color and caps (but withCircle at false).

Add the possibility to stack samples

Add an option to stack backgrounds together or plot them separately. In the latter case, force a normalization of the histograms to 1.0

Consider supporting TTreeFriend?

This would depend mostly on whether it is possible/easy to add in go-hep/hep (@sbinet?). I know it might not be an everyday use case, but it is quite a nice feature to play around with.

Consider changing TreeFunc API names

Maybe trying to change

v := ana.NewVarF64("branch")

into one of those:

v := ana.TreeVarF64("branch")
v := ana.NewTvarF64("branch")

to make the difference between the ana.Variable type and the ana.TreeFunc type clearer.

Try to add a DumpTree-only mode

Speed to be compared

Implement a ratio plot

Based on this example: https://github.com/go-hep/hep/tree/master/hplot#diff-plots
Possibly add an error band using band: https://github.com/go-hep/hep/tree/master/hplot#band-between-lines

Add latex examples

With automatic png conversion for the README

Add sample type

Make the tool able to distinguish between data, background and signal

Test hstack existence for its band (crashing when no background samples)

tree-gonalyzer/ana/varplotter.go

Line 119 in 6ab6c52

hBand.Band = stack.Band

TreeFunc: perform type assertion once

Add an internal field of rtree.Formula to store the type assertion result (and do it once).
print a warning for non-optimized rtree.Formula
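
A sketch of asserting once, before the event loop, is shown below. `asFloat64` is a hypothetical helper; it takes the `interface{}` that a formula's Func() returns, performs the type switch a single time, and hands back a plain func() float64 (or an error naming the offending type rather than a panic).

```go
package main

import "fmt"

// asFloat64 inspects the compiled function once and returns a
// func() float64, so the event loop never repeats the type assertion.
// Boolean functions are converted to 0/1, as needed for plotting.
func asFloat64(fn interface{}) (func() float64, error) {
	switch f := fn.(type) {
	case func() float64:
		return f, nil
	case func() bool:
		return func() float64 {
			if f() {
				return 1
			}
			return 0
		}, nil
	default:
		return nil, fmt.Errorf("unsupported formula type %T", fn)
	}
}

func main() {
	// Stand-in for the value returned by formula.Func().
	var compiled interface{} = func() bool { return true }

	get, err := asFloat64(compiled)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i := 0; i < 3; i++ { // event loop: no assertion here
		fmt.Println(get())
	}
}
```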

Kind of infinite loop when a histogram is empty

The program keeps running when no event passes the selection for a given sample. The problem might be in the ratio computation, but I am not even sure.

Option to change color of error band per sample

Add a cutflow subpackage

First clean-up:

re-organize the ana part in a single directory (put ana-demo - if kept? - and ana-perf into the ana directory itself).
create a new directory cutflow

Structure of the code, based on this:

input: file, tree, event model, definition of a cut sequence, weights definition (?)
output: ASCII table showing yields and/or absolute and/or relative efficiencies
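
To make the input/output description concrete, here is a small unweighted sketch. The `event` and `cut` types and the `yields` and `ratio` helpers are hypothetical; cuts are applied in sequence and the table is printed with text/tabwriter.

```go
package main

import (
	"fmt"
	"os"
	"text/tabwriter"
)

// event is a toy event model (hypothetical branches).
type event struct{ pt, eta float64 }

// cut is one step of the cut sequence.
type cut struct {
	name string
	pass func(e event) bool
}

// yields returns the number of events surviving each cut,
// the cuts being applied one after the other.
func yields(events []event, cuts []cut) []int {
	n := make([]int, len(cuts))
	for _, e := range events {
		for i, c := range cuts {
			if !c.pass(e) {
				break
			}
			n[i]++
		}
	}
	return n
}

// ratio returns num/den, or 0 when den is 0.
func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

func main() {
	events := []event{{30, 0.5}, {10, 1.2}, {50, 2.8}, {40, 1.0}}
	cuts := []cut{
		{"pt > 25", func(e event) bool { return e.pt > 25 }},
		{"|eta| < 2.5", func(e event) bool { return e.eta > -2.5 && e.eta < 2.5 }},
	}
	n := yields(events, cuts)

	w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
	fmt.Fprintln(w, "cut\tyield\tabs. eff.\trel. eff.")
	prev := len(events)
	for i, c := range cuts {
		fmt.Fprintf(w, "%s\t%d\t%.2f\t%.2f\n", c.name, n[i], ratio(n[i], len(events)), ratio(n[i], prev))
		prev = n[i]
	}
	w.Flush()
}
```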

Improve TickFormat() option

Split x-axis and y-axis into different options
add the number of ticks as argument of the option too
add some example featuring few typical cases

For these examples, see the discussion in go-hep/hep#629

Add a weight for each sample

Currently, an overall weight can be added, but to add one from a branch (or a combination of branches), one would need go-hep/#634 to be implemented.

Protection in log scale for negative bin content

Problem

This can happen when the binning is very fine and/or the statistics are very low, and when events have negative weights (typical in NLO calculations). This crashes when the y-axis is in log scale, so a protection should be added. For example, such a histogram leads to the following error message in log scale:

panic: Values must be greater than 0 for a log scale.

goroutine 1202 [running]:
gonum.org/v1/plot.LogTicks.Ticks(0xc03082e31d099e4f, 0x40ac1368b65e223a, 0x10, 0x1, 0x1)
	/users/divers/atlas/madar/go/pkg/mod/gonum.org/v1/[email protected]/axis.go:523 +0x5c0
gonum.org/v1/plot.verticalAxis.size(0xc03082e31d099e4f, 0x40ac1368b65e223a, 0xaa32f8, 0x10, 0x4014000000000000, 0xf377a0, 0xc0004e64f8, 0x402c000000000000, 0xa9f106, 0x9, ...)
	/users/divers/atlas/madar/go/pkg/mod/gonum.org/v1/[email protected]/axis.go:332 +0x85
gonum.org/v1/plot.(*Plot).DataCanvas(0xc0001ded80, 0xf59820, 0xc000914cf0, 0x4018000000000000, 0x4064000000000000, 0x407a000000000000, 0x4071a78000000000, 0x0, 0x0, 0x0, ...)

[...]

Proposed solutions

1. Set the bin content of histograms to a minimum (positive) value for bins with negative content. Not sure it's possible with the current hbook.H1D or hplot.H1D. So, maybe option 2 is better ...
2. Ignore these bins when plotting in log scale - but this should be done at the level of https://github.com/go-hep/hep
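
For the first option, the clamping itself is simple when expressed on plain bin contents, as in the sketch below. `clampForLog` is a hypothetical helper and says nothing about whether hbook.H1D or hplot.H1D allow writing the values back.

```go
package main

import "fmt"

// clampForLog returns a copy of the bin contents in which every value
// below floor (in particular zero and negative values) is raised to floor.
// floor must be strictly positive for a log-scale axis.
func clampForLog(bins []float64, floor float64) []float64 {
	out := make([]float64, len(bins))
	for i, v := range bins {
		if v < floor {
			v = floor
		}
		out[i] = v
	}
	return out
}

func main() {
	bins := []float64{5, -0.3, 0, 2}
	fmt.Println(clampForLog(bins, 1e-3))
}
```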

Add a function to plot a given selection/sample/variable

It could be convenient to access individual hbook and/or hplot histograms programmatically after the ana.FillHistos() step. This can be done with a few functions:

// Get a single hplot histogram
func (ana *Maker) GethplotH1D(sampleName, varName, selName string) *hplot.H1D {
}

// Get a single hbook histogram
func (ana *Maker) GethbookH1D(sampleName, varName, selName string) *hbook.H1D {
}

// Get the full plot directly
func (ana *Maker) GetPlot(varName, selName string) *hplot.Plot {
}

This would first require mapping every name to an index, using a pre-defined function getIdxMap(obj interface{}) that fills the fields cutIdx, samIdx and varIdx.
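
A simplified version of that mapping, working on a slice of names rather than on an arbitrary object, might look like this. `indexByName` is a hypothetical helper; rejecting duplicates keeps the later lookup by name unambiguous.

```go
package main

import "fmt"

// indexByName maps each name to its position in the slice and
// reports duplicates, which would make a lookup by name ambiguous.
func indexByName(names []string) (map[string]int, error) {
	idx := make(map[string]int, len(names))
	for i, n := range names {
		if _, dup := idx[n]; dup {
			return nil, fmt.Errorf("duplicate name %q", n)
		}
		idx[n] = i
	}
	return idx, nil
}

func main() {
	samIdx, err := indexByName([]string{"data", "bkg1", "bkg2"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if i, ok := samIdx["bkg1"]; ok {
		fmt.Println("bkg1 is at index", i)
	}
}
```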
