
tree-gonalyzer's Introduction

TTree GOnalyzer

Documentation

This is a tool written in Go to produce publication-quality plots from ROOT TTrees in a flexible and easy way. It is built on top of go-hep.org. The main supported features are:

  • histogramming variables over many samples and selections,
  • displaying one or several signals (overlaid or stacked),
  • sample normalisation using cross-section and/or luminosity and/or number of generated events,
  • computing new variables of arbitrary complexity,
  • joining trees to the main one, as in TTreeFriend,
  • dumping TTrees with float64 and []float64 branches,
  • concurrent sample processing.
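The sample normalisation listed above can be illustrated with the conventional per-event weight, cross-section times luminosity divided by the number of generated events. This is a minimal sketch: the function name `normWeight` is hypothetical and the formula is the usual convention, not necessarily the exact one implemented in the tool.

```go
package main

import "fmt"

// normWeight returns the per-event weight that scales a simulated sample
// to its expected yield: xsec * lumi / nGen.
// Units are assumed to be consistent (e.g. xsec in pb and lumi in pb^-1).
// Hypothetical helper, not part of the ana package.
func normWeight(xsec, lumi, nGen float64) float64 {
	if nGen <= 0 {
		return 0 // assumed convention: a sample without generated events gets zero weight
	}
	return xsec * lumi / nGen
}

func main() {
	// 2 pb cross-section, 1000 pb^-1 of luminosity, 4000 generated events.
	fmt.Printf("weight = %.2f\n", normWeight(2.0, 1000.0, 4000.0))
}
```

Omitting one of the ingredients (for instance luminosity) amounts to setting it to 1 in this expression.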

In a nutshell

// Define samples
samples := []*ana.Sample{
	ana.CreateSample("data", "data", `Data`, "data.root", "mytree"),
	ana.CreateSample("bkg1", "bkg", `Proc 1`, "proc1.root", "mytree"),
	ana.CreateSample("bkg2", "bkg", `Proc 2`, "proc2.root", "mytree"),
	ana.CreateSample("bkg3", "bkg", `Proc 3`, "proc3.root", "mytree"),
}

// Define variables
variables := []*ana.Variable{
	ana.NewVariable("plot1", ana.TreeVarBool("branchBool"), 2, 0, 2),
	ana.NewVariable("plot2", ana.TreeVarF32("branchF32"), 25, 0, 1000),
	ana.NewVariable("plot3", ana.TreeVarF64("branchF64"), 50, 0, 1000),
}

// Create analyzer object with some options
analyzer := ana.New(samples, variables, ana.WithHistoNorm(true))

// Produce plots and dump trees
analyzer.Run()

Gallery

Data/Background [code] Unstacked signals [code] Stacked signals [code]
Shape distortion [code] Shape comparison [code] Systematic variation [code]

Performance

For 2M events and 60 variables, a comparison with similar ROOT-based code (using t->Draw()) gives:

  • ROOT -> 6 ms/kEvts
  • GOHEP -> 2 ms/kEvts

For 2M events and one variable (avoiding t->Draw() repetition):

  • ROOT -> 0.4 ms/kEvts
  • GOHEP -> 0.1 ms/kEvts

tree-gonalyzer's People

Contributors

rmadar

tree-gonalyzer's Issues

Improve user-defined rfunc management

  • store them in a different file at least, maybe another sub-package?
  • enable automatic generation of the code (using the same tools as go-hep)
  • use a map to select the proper rfunc instead of an endless switch, like in go-hep, e.g. here
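The map-based selection suggested in the last item could look like the sketch below. It keys a registry on the function signature via `reflect.Type`. The `builder` type, the `registry` variable and `lookup` are hypothetical names used for illustration only; they are not the go-hep or ana API, and the uniform `[]float64` evaluator is a simplification.

```go
package main

import (
	"fmt"
	"reflect"
)

// builder wraps a user function into a uniform evaluator (hypothetical type).
type builder func(fct interface{}) func(args []float64) float64

// registry maps a function signature to its builder, replacing a long type switch.
var registry = map[reflect.Type]builder{
	reflect.TypeOf((func(float64) float64)(nil)): func(fct interface{}) func([]float64) float64 {
		f := fct.(func(float64) float64)
		return func(a []float64) float64 { return f(a[0]) }
	},
	reflect.TypeOf((func(float64, float64) float64)(nil)): func(fct interface{}) func([]float64) float64 {
		f := fct.(func(float64, float64) float64)
		return func(a []float64) float64 { return f(a[0], a[1]) }
	},
}

// lookup returns the evaluator for fct, or an error for unsupported signatures.
func lookup(fct interface{}) (func([]float64) float64, error) {
	b, ok := registry[reflect.TypeOf(fct)]
	if !ok {
		return nil, fmt.Errorf("unsupported function type %T", fct)
	}
	return b(fct), nil
}

func main() {
	eval, err := lookup(func(pt, eta float64) float64 { return pt + eta })
	if err != nil {
		panic(err)
	}
	fmt.Println(eval([]float64{1.5, 2.0}))
}
```

Adding support for a new signature then means adding one map entry, which is also a convenient target for code generation.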

Consider having an interface for basic types of TreeFunc

Problem

Currently, we have:

// This will compile
f1 := ana.NewCutBool("branchBool") // f1.Fct returns a bool
v1OK := f1.GetFuncBool(r)          // Returns a func() bool
v1NotOK := f1.GetFuncF64(r)        // Will crash, since asserted as func() float64

// This will also compile
f2 := ana.NewVarBool("branchBool") // f2.Fct returns a float64 (needed for plotting)
v2NotOK := f2.GetFuncBool(r)       // Will crash, since asserted as func() bool
v2OK := f2.GetFuncF64(r)           // Returns a func() float64

This is quite confusing and not very safe from the user's point of view.

Proposed solution

In the TreeFunc type, one could have instead something like:

type TreeFunc struct {
	VarsName []string
	Fct      interface{}
}

// Internal interface
type tfunc interface {
	// BUT: which return type here? interface{} is only a placeholder, and
	// we might need to do a type assertion later in any case ...
	GetFunc(r reader) interface{}
}

// Internal bool type
type outputBool TreeFunc

func (f *outputBool) GetFunc(r reader) interface{} {
	return (*TreeFunc)(f).GetFuncBool(r) // i.e. formula.Func() type-asserted as func() bool
}

// Internal float64 type
type outputF64 TreeFunc

func (f *outputF64) GetFunc(r reader) interface{} {
	return (*TreeFunc)(f).GetFuncF64(r) // i.e. formula.Func() type-asserted as func() float64
}

// User New function for boolean output
// BUT: will the output be recognized as a TreeFunc then?
func NewCutBool(v string) outputBool {
	return outputBool{
		VarsName: []string{v},
		Fct:      func(x bool) bool { return x },
	}
}

// User New function for float64 output
// BUT: will the output be recognized as a TreeFunc then?
func NewVarBool(v string) outputF64 {
	return outputF64{
		VarsName: []string{v},
		Fct: func(x bool) float64 {
			return map[bool]float64{false: 0, true: 1}[x]
		},
	}
}

Then, from the user's point of view, it would look like:

f1 := ana.NewCutBool("branchBool") // f1.Fct returns a bool
v1 := f1.GetFunc(r)                // Returns a func() bool

f2 := ana.NewVarBool("branchBool") // f2.Fct returns a float64 for plotting
v2 := f2.GetFunc(r)                // Returns a func() float64

Questions

  1. Which return type should GetFunc() have in the interface? And would a type assertion still be needed later anyway (bringing us back to the original problem)?
  2. Will outputF64 (or outputBool) be recognized as a TreeFunc in the user code? (I want to keep only this type exported.)

(Ping to @sbinet for well-educated advice ... one more time! Nothing urgent though: this is "just" to make the code nicer.)
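On the two questions: an interface method in Go has a single fixed signature, so one GetFunc() cannot return either func() bool or func() float64 without going through interface{} and a later type assertion. A defined type such as outputBool is also distinct from TreeFunc and needs an explicit conversion. One possible alternative, not proposed in the issue itself, is to use type parameters (Go 1.18 or later). The sketch below is an assumption-laden illustration: the generic `TreeFunc[T]` and the `read func() bool` stand-in for the tree reader are hypothetical simplifications.

```go
package main

import "fmt"

// TreeFunc is a hypothetical generic variant of the type discussed above:
// T is the type returned by the user function.
type TreeFunc[T any] struct {
	VarsName []string
	Fct      func(bool) T
}

// GetFunc binds Fct to a branch reader and returns a typed closure,
// so no type assertion is needed at the call site.
func (f TreeFunc[T]) GetFunc(read func() bool) func() T {
	return func() T { return f.Fct(read()) }
}

// NewCutBool returns a function usable as a cut (bool output).
func NewCutBool(v string) TreeFunc[bool] {
	return TreeFunc[bool]{
		VarsName: []string{v},
		Fct:      func(x bool) bool { return x },
	}
}

// NewVarBool returns a function usable for plotting (float64 output).
func NewVarBool(v string) TreeFunc[float64] {
	return TreeFunc[float64]{
		VarsName: []string{v},
		Fct: func(x bool) float64 {
			if x {
				return 1
			}
			return 0
		},
	}
}

func main() {
	branch := true // stands in for the value read from the tree
	read := func() bool { return branch }

	cut := NewCutBool("branchBool").GetFunc(read)  // func() bool
	plot := NewVarBool("branchBool").GetFunc(read) // func() float64
	fmt.Println(cut(), plot())
}
```

With this design, mixing up the two kinds of functions becomes a compile-time error instead of a runtime crash, at the cost of only one generic type being exported.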

Add a new 'composition' plot style

Typical wanted plot

image

Strategy

From the user interface side, this could be enabled with an option (name to be chosen) such as:

  • ana.WithFraction(true)
  • ana.WithBinwiseNorm(true)
  • ana.WithBinCompo(true)
  • ana.WithComposition(true)

If this option is enabled, then each bin of the distribution will be normalized to 100% and stack mode will be forced to true.

@sbinet, I am not sure which approach should be used from the hbook/hplot point of view. In ROOT, I would use TH1, grab all values for a given bin using h.GetBinContent(ib), compute my fractions and "fill" (or rather set the bin content of) my new histograms with the h.SetBinContent(ib, value) method. As far as I understand, there is no SetBinContent() method in hbook (since the link between error and value cannot be broken).
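Independently of how the result is fed back into hbook/hplot (which is the open question above), the bin-wise normalisation itself can be sketched on plain slices. The function name `binFractions` and the slice layout are assumptions made for this illustration.

```go
package main

import "fmt"

// binFractions converts per-sample bin contents into per-bin fractions.
// contents[s][b] is the content of bin b for sample s; all samples are
// assumed to share the same binning. Bins with a non-positive total are
// left at zero for every sample.
func binFractions(contents [][]float64) [][]float64 {
	out := make([][]float64, len(contents))
	for s := range contents {
		out[s] = make([]float64, len(contents[s]))
	}
	if len(contents) == 0 {
		return out
	}
	for b := range contents[0] {
		total := 0.0
		for s := range contents {
			total += contents[s][b]
		}
		if total <= 0 {
			continue
		}
		for s := range contents {
			out[s][b] = contents[s][b] / total
		}
	}
	return out
}

func main() {
	// Two samples, three bins.
	fr := binFractions([][]float64{{1, 0, 6}, {3, 0, 2}})
	fmt.Println(fr)
}
```

Stacking the resulting fractions gives a total of 1 (i.e. 100%) in every non-empty bin.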

Manage subsamples

Transform the type Sample to be a slice of trees/files/cuts/weights, with a default length of 1 but with all cosmetic features in common.

Need to add a function to create a new Sample from sub-sample (maybe via an unexported type subSample?)

Fix the behaviour of the ratio plot

  1. Add an ana.Maker option to set y-axis range.
  2. For now:
    • stacked: ok with data but not OK without data (we would want only the total ratio to itself, not the dots at 0 - might be ok with 1. though)
    • not stacked: not ok with data, but ok without data (not clear what ratios are in that case)

This might well be related to this comment in Maker.go (in the if ana.RatioPlot part):

default:
	// [FIX-ME 0 (rmadar)] Ratio wrt data (or 1 bkg if data is empty) -> to be specified as an option?
	// [FIX-ME 1 (rmadar)] loop is over bhBkgs_postnorm while 'ana.Samples[is]' runs also over data.
	for is, h := range bhBkgs_postnorm {
		...
	}

Refactor / improve `PlotVariables()` function

  • write one function which plots a single variable for a given cut.
  • break down the different steps: normalization, stack, ratio plot etc ...
  • use concurrency to plot several variables at the same time.

Improve benchmarking

  • group variables by type
  • write a ROOT code based on TTreeReader, with one event loop (not a t->Draw())

TreeFunc: update protection & doc

  • Add protection in maker.FillHisto() on type assertion for func. (Re-design the code maybe, removing getFuncT() ... Or think how to put a nice protection and print the variable that fails)
  • update doc with allowed returned types
  • add a withSlice() option for a variable
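The protection on the type assertion could rely on the two-value ("comma ok") form, which never panics, and report the name of the failing variable. This is a minimal sketch; `asFuncF64` is a hypothetical helper and not an existing function of the package.

```go
package main

import "fmt"

// asFuncF64 checks that fct has the expected type and returns a descriptive
// error naming the variable otherwise, instead of panicking.
func asFuncF64(varName string, fct interface{}) (func() float64, error) {
	f, ok := fct.(func() float64)
	if !ok {
		return nil, fmt.Errorf("variable %q: expected func() float64, got %T", varName, fct)
	}
	return f, nil
}

func main() {
	good := func() float64 { return 42 }
	bad := func() bool { return true }

	if f, err := asFuncF64("plot3", good); err == nil {
		fmt.Println(f())
	}
	if _, err := asFuncF64("plot1", bad); err != nil {
		fmt.Println(err)
	}
}
```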

Add New function for sample and maker

This will allow proper management of default values, and make sure that the required fields are filled (replaces #8). One way could be to use the functional options pattern:

// Option sets an optional field of the configuration.
type Option func(cfg *Config)

func NewSample(fname, tname, typ, leg string, opts ...Option) Sample {

	// Required fields ('typ', since 'type' is a Go keyword)
	s := Sample{
		FileName: fname,
		TreeName: tname,
		Type:     typ,
		LegLabel: leg,
	}

	// Configuration with default values for all optional fields
	cfg := newCfg()

	// Update the configuration looping over functional options
	for _, opt := range opts {
		opt(cfg)
	}

	// Go through all fields and fill them with the cfg object
	s.CircleSize = cfg.CircleSize
	s.CircleMarkers = cfg.CircleMarkers
	...

	return s
}

The functional options might look like:

func CircleSize(s float64) Option {
	return func(cfg *Config) { cfg.CircleSize = s }
}

func CutFunc(f TreeFunc) Option {
	return func(cfg *Config) { cfg.CutFunc = f }
}

And the declaration of the sample would look like:

s := ana.NewSample("file.root", "tree", "data", "Data",
	ana.CircleSize(2),
	ana.CircleMarkers(true),
	ana.CutFunc(ana.TreeFunc{
		VarsName: []string{"pt", "eta"},
		Fct:      func(pt, eta float64) float64 { return pt + eta },
	}),
)

Add reading of configuration files

Use the JSON format to define samples, variables and selections, so that in the end a single executable is run. Something like:

./run_ana --samples spl.JSON --variables var.JSON --selection sels.JSON

Add total histogram with error band

It would be plotted on top of everything (but before data). The error band can probably be done manually with a band, but it might be good to incorporate it in go-hep/plot (?)
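The content of such a total histogram and its band could be computed as sketched below, on plain slices. The sketch assumes that the per-sample errors are independent and can therefore be added in quadrature; the name `totalBand` and the slice layout are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// totalBand sums bin contents over samples and combines the per-bin errors
// in quadrature (assuming uncorrelated samples). vals[s][b] and errs[s][b]
// are the content and error of bin b for sample s, with a common binning.
// It returns the total and the lower/upper edges of the error band.
func totalBand(vals, errs [][]float64) (tot, lo, hi []float64) {
	if len(vals) == 0 {
		return nil, nil, nil
	}
	n := len(vals[0])
	tot, lo, hi = make([]float64, n), make([]float64, n), make([]float64, n)
	for b := 0; b < n; b++ {
		var sum, err2 float64
		for s := range vals {
			sum += vals[s][b]
			err2 += errs[s][b] * errs[s][b]
		}
		e := math.Sqrt(err2)
		tot[b], lo[b], hi[b] = sum, sum-e, sum+e
	}
	return tot, lo, hi
}

func main() {
	vals := [][]float64{{10, 5}, {20, 1}}
	errs := [][]float64{{3, 0}, {4, 1}}
	tot, lo, hi := totalBand(vals, errs)
	fmt.Println(tot, lo, hi)
}
```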

Re-organize PlotHistos() function

The current logic is a bit convoluted between hbook.H1D and hplot.H1D when computing the total background histogram and the ratio to data. This could be better organized, I think.

Change the way the data style is set

Currently, one cannot pass linewidth or color ... Everything should be done in the s.CreateHisto() function, with default values for markersize, color and caps (but withCircle at false).

Consider supporting TTreeFriend?

This would depend mostly on whether it is possible/easy to add in go-hep/hep (@sbinet?). I know it might not be an everyday use case, but it is quite a nice feature to play around with.

Consider changing TreeFunc API names

Maybe try to change

v := ana.NewVarF64("branch") 

into one of those:

v := ana.TreeVarF64("branch")
v := ana.NewTvarF64("branch") 

to make the difference between the ana.Variable type and the ana.TreeFunc type clearer.

Add sample type

Make the tool able to distinguish between data, background and signal.

Add a cutflow subpackage

First clean-up:

  • re-organize the ana part in a single directory (put ana-demo - if kept? - and ana-perf into the ana directory itself).
  • create a new directory cutflow

Structure of the code, based on this:

  • input: file, tree, event model, definition of a cut sequence, weights definition (?)
  • output: ASCII table showing yields and/or absolute and/or relative efficiencies
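The output side could be sketched as follows, taking the yields after each cut as input. The absolute efficiency is defined here with respect to the first cut and the relative efficiency with respect to the previous one; these definitions, as well as the names `cutRow`, `ratio` and `cutflow`, are assumptions made for the illustration.

```go
package main

import (
	"fmt"
	"os"
	"text/tabwriter"
)

// cutRow holds the numbers printed for one cut of the sequence.
type cutRow struct {
	Name   string
	Yield  float64
	AbsEff float64 // yield / yield after the first cut
	RelEff float64 // yield / yield after the previous cut
}

// ratio returns num/den, or 0 when den is zero.
func ratio(num, den float64) float64 {
	if den == 0 {
		return 0
	}
	return num / den
}

// cutflow computes efficiencies from the yields; names and yields are
// assumed to have the same length.
func cutflow(names []string, yields []float64) []cutRow {
	rows := make([]cutRow, len(yields))
	for i, y := range yields {
		rows[i] = cutRow{Name: names[i], Yield: y, AbsEff: 1, RelEff: 1}
		if i > 0 {
			rows[i].AbsEff = ratio(y, yields[0])
			rows[i].RelEff = ratio(y, yields[i-1])
		}
	}
	return rows
}

func main() {
	rows := cutflow([]string{"all", "pt > 25", "|eta| < 2.5"}, []float64{1000, 500, 125})

	// Align the columns of the ASCII table.
	w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
	fmt.Fprintln(w, "cut\tyield\tabs. eff.\trel. eff.")
	for _, r := range rows {
		fmt.Fprintf(w, "%s\t%.1f\t%.3f\t%.3f\n", r.Name, r.Yield, r.AbsEff, r.RelEff)
	}
	w.Flush()
}
```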

Improve TickFormat() option

  • Split x-axis and y-axis into different options
  • add the number of ticks as argument of the option too
  • add some examples featuring a few typical cases

For these examples, see the discussion in go-hep/hep#629

Protection in log scale for negative bin content

Problem

This can happen when the binning is very fine and/or the statistics are very low, and when events can have negative weights (typical in NLO calculations). This crashes when the y-axis is in log scale, so a protection should be added. For example:

image

leads to the following error message in log scale:

panic: Values must be greater than 0 for a log scale.

goroutine 1202 [running]:
gonum.org/v1/plot.LogTicks.Ticks(0xc03082e31d099e4f, 0x40ac1368b65e223a, 0x10, 0x1, 0x1)
	/users/divers/atlas/madar/go/pkg/mod/gonum.org/v1/[email protected]/axis.go:523 +0x5c0
gonum.org/v1/plot.verticalAxis.size(0xc03082e31d099e4f, 0x40ac1368b65e223a, 0xaa32f8, 0x10, 0x4014000000000000, 0xf377a0, 0xc0004e64f8, 0x402c000000000000, 0xa9f106, 0x9, ...)
	/users/divers/atlas/madar/go/pkg/mod/gonum.org/v1/[email protected]/axis.go:332 +0x85
gonum.org/v1/plot.(*Plot).DataCanvas(0xc0001ded80, 0xf59820, 0xc000914cf0, 0x4018000000000000, 0x4064000000000000, 0x407a000000000000, 0x4071a78000000000, 0x0, 0x0, 0x0, ...)

[...]

Proposed solutions

  1. Set the bin content of histograms to a minimum (positive) value for bins with negative content. Not sure it's possible with the current hbook.H1D or hplot.H1D. So, maybe option 2 is better ...
  2. Ignore these bins when plotting in log scale - but this should be done at the level of https://github.com/go-hep/hep
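The first option can be illustrated on a plain slice of bin contents. Whether the clamped values can be written back into an hbook.H1D or hplot.H1D is precisely the open point above, so this sketch deliberately stays outside of those types; `clampForLog` is a hypothetical name.

```go
package main

import "fmt"

// clampForLog returns a copy of bins where every content below floor
// (in particular zero or negative contents) is replaced by floor.
// floor is assumed to be strictly positive so that a log axis is valid.
func clampForLog(bins []float64, floor float64) []float64 {
	out := make([]float64, len(bins))
	for i, v := range bins {
		if v < floor {
			out[i] = floor
		} else {
			out[i] = v
		}
	}
	return out
}

func main() {
	bins := []float64{-0.3, 0, 12}
	fmt.Println(clampForLog(bins, 1e-3))
}
```

The second option would instead drop the offending bins at drawing time, leaving the stored contents untouched.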

Add a function to plot a given selection/sample/variable

It could be convenient to access individual hbook and/or hplot histograms programmatically after the ana.FillHistos() step. This can be done with a few functions:

// Get the hplot histogram only
func (ana *Maker) GethplotH1D(sampleName, varName, selName string) *hplot.H1D {
}

// Get the hbook histogram only
func (ana *Maker) GethbookH1D(sampleName, varName, selName string) *hbook.H1D {
}

// Get the plot directly
func (ana *Maker) GetPlot(varName, selName string) *hplot.Plot {
}

This would require first mapping every name to an index, using a pre-defined function getIdxMap(obj interface{}) filling the fields cutIdx, samIdx and varIdx.
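The name-to-index mapping could be as simple as the sketch below. It works on a slice of names rather than on the `interface{}` argument of getIdxMap mentioned above, and `indexByName` is a hypothetical helper; rejecting duplicates is an added assumption, since two objects with the same name would make the lookup ambiguous.

```go
package main

import "fmt"

// indexByName maps each name to its position in the slice and reports
// duplicated names as an error.
func indexByName(names []string) (map[string]int, error) {
	idx := make(map[string]int, len(names))
	for i, n := range names {
		if _, dup := idx[n]; dup {
			return nil, fmt.Errorf("duplicate name %q", n)
		}
		idx[n] = i
	}
	return idx, nil
}

func main() {
	samIdx, err := indexByName([]string{"data", "bkg1", "bkg2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(samIdx["bkg2"])
}
```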
