
fast-plotter's People

Contributors

benkrikler, dbanthony, eshwen, linacre, snwebb


fast-plotter's Issues

Signal line colours are inconsistent

A colour specified for a dataset in the dataset_colours block of a plotting config is not respected. Signal lines always seem to be ordered by total yield and to follow matplotlib's tab10 colourmap.

Bugs in v0.1.4

Imported from gitlab issue 10

Recent changes have broken things for Data / MC plots:

  • Return code not set properly if there are crashes
  • Crashes caused by treating single line plots as multiple line plots

Streamline re-ordering/merging/dropping columns (Rob)

Streamline the functions so that a user can merge columns together, then re-order or drop columns as necessary for input to fast-plotter, ensuring the index is set for the new merged column (see the working config below).

stages:
    - {rename_region: ReBin}
    - {rebin_met: ReBin}
    - {combine_region_met: CombineColumns}
    - {drop_region_met: AssignDim}
    - {save: WriteOut}

rename_region:
    axis: region
    drop_others: true
    mapping:
        0: SR
        6: SB0
        7: SB1
        8: SB2
        9: SB3
        10: SB4

rebin_met:
    axis: met
    drop_others: true
    mapping:
        "[200.0, 300.0)":   "[200.0, 300.0)"
        "[300.0, 400.0)":   "[300.0, 400.0)"
        "[400.0, 500.0)":   "[400.0, 600.0)"
        "[500.0, 600.0)":   "[400.0, 600.0)"
        "[600.0, 700.0)":   "[600.0, 2000.0)"
        "[700.0, 800.0)":   "[600.0, 2000.0)"
        "[800.0, 900.0)":   "[600.0, 2000.0)"
        "[900.0, 1000.0)":   "[600.0, 2000.0)"
        "[1000.0, inf)":   "[600.0, 2000.0)"

combine_region_met:
    format_strings: {"region_met":"{region}_{met}"}
    as_index: [region_met]

drop_region_met:
    drop_cols: [region, category, met]

save: #{}
    filename: "tbl_dataset.region.category.met--sig_regions_fit.csv"
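
The intent of the stages above can be illustrated without fast-plotter. The sketch below is plain Python and is not the library's implementation: the helper names `rebin`, `combine_columns` and `drop_and_sum`, the `ttbar` dataset and the yields are invented for illustration, and it assumes that bins mapped to the same label are summed and that `drop_others: true` discards unmapped labels. Only the mappings and the `{region}_{met}` format string are taken from the config.

```python
from collections import defaultdict

# Subsets of the mappings in the config above.
REGION_MAP = {0: "SR", 6: "SB0"}
MET_MAP = {
    "[400.0, 500.0)": "[400.0, 600.0)",
    "[500.0, 600.0)": "[400.0, 600.0)",
}


def rebin(rows, axis, mapping):
    """Relabel `axis` via `mapping`; rows with unmapped labels are dropped
    (the assumed meaning of `drop_others: true`)."""
    return [dict(row, **{axis: mapping[row[axis]]})
            for row in rows if row[axis] in mapping]


def combine_columns(rows, name, fmt):
    """Add a column `name` built from a format string over existing columns."""
    return [dict(row, **{name: fmt.format(**row)}) for row in rows]


def drop_and_sum(rows, drop_cols, value="n"):
    """Drop `drop_cols`, then sum `value` over rows that became identical."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items()
                           if k not in drop_cols and k != value))
        totals[key] += row[value]
    return [dict(key, **{value: total}) for key, total in totals.items()]


# Invented input rows: one yield per (dataset, region, met) bin.
rows = [
    {"dataset": "ttbar", "region": 0, "met": "[400.0, 500.0)", "n": 3.0},
    {"dataset": "ttbar", "region": 0, "met": "[500.0, 600.0)", "n": 2.0},
    {"dataset": "ttbar", "region": 1, "met": "[400.0, 500.0)", "n": 9.0},
]
rows = rebin(rows, "region", REGION_MAP)   # region 1 is not mapped, so it goes
rows = rebin(rows, "met", MET_MAP)         # two met bins collapse into one
rows = combine_columns(rows, "region_met", "{region}_{met}")
result = drop_and_sum(rows, drop_cols={"region", "met"})
print(result)
```

The merged `region_met` column is what would then serve as the index in the written table.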

Versatility in signal vs background plots

Imported from gitlab issue 8

Include a "summary" panel for signal/sqrt(background) or the Asimov formula when plotting signal vs background. Ideally use colour-coding so that in the summary plot, for each bin, there's a f(S, B) value for each signal where the colour of the point matches the top plot.
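
As a rough illustration of the two f(S, B) choices mentioned, the sketch below computes S/sqrt(B) and the Asimov significance, Z = sqrt(2((S + B) ln(1 + S/B) - S)), for each bin. The function names and the example yields are made up, and returning 0.0 for empty bins is an assumption about how such bins might be handled, not a description of fast-plotter.

```python
import math


def simple_significance(s, b):
    """S / sqrt(B); returns 0.0 when the background is not positive."""
    return s / math.sqrt(b) if b > 0 else 0.0


def asimov_significance(s, b):
    """Asimov significance sqrt(2 * ((S + B) * ln(1 + S / B) - S)).

    Returns 0.0 when either yield is not positive (an assumed convention).
    """
    if s <= 0 or b <= 0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


# Invented per-bin yields for one signal and the summed background.
signal = [10.0, 4.0, 0.0]
background = [100.0, 16.0, 5.0]

per_bin_simple = [simple_significance(s, b) for s, b in zip(signal, background)]
per_bin_asimov = [asimov_significance(s, b) for s, b in zip(signal, background)]
print(per_bin_simple)
print(per_bin_asimov)
```

With one such list per signal, each could be drawn in the summary panel using the colour of the matching line in the top plot.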

Dataset order not applied to signal

When specifying certain datasets as signal in a config, i.e.,

signal: '.*SVJ*.'

and a dataset order is specified for them, i.e.,

dataset_order:
    - SVJ_3000_20_0.9_peak (Pythia)
    - 'SVJ_3000_20_0.9_peak (MadGraph)'
    - SVJ_3000_20_0.1_peak (Pythia)
    - 'SVJ_3000_20_0.1_peak (MadGraph)'
    - SVJ_1000_20_0.3_peak (Pythia)
    - 'SVJ_1000_20_0.3_peak (MadGraph)'

the datasets are not plotted in that order. Presumably they are simply ordered by yield, as the following plots show; they were produced with the above config fragments applied.

[Plot: plot_dataset dijet_mt--dijet_mt_df--weight_nominal--project_dijet_mt-yscale_log]
[Plot: plot_dataset ht--HT_df--weight_nominal--project_ht-yscale_log]

ValueError: scatter requires x column to be numeric

Imported from gitlab issue 2

csv at: /afs/cern.ch/user/d/danthony/public/combined_signal/

(chip_env) [zw18769@soolin updated_combined_signals]$ fast_plotter /users/zw18769/CHIP/analysis/output/MC_signals/combined_signal/updated_combined_signals/tbl_dataset.njet.nbjet--weight_nominal.csv

 
fast_plotter - INFO - Processing: /users/zw18769/CHIP/analysis/output/MC_signals/combined_signal/updated_combined_signals/tbl_dataset.njet.nbjet--weight_nominal.csv
Traceback (most recent call last):
  File "/users/zw18769/.local/bin/fast_plotter", line 11, in <module>
    load_entry_point('fast-plotter', 'console_scripts', 'fast_plotter')()

  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/__main__.py", line 45, in main
    process_one_file(infile, args)

  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/__main__.py", line 67, in process_one_file
    scale_sims=args.lumi, yscale=args.yscale)

  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 21, in plot_all
    plot = plot_1d_many(projected, data=data, dataset_col=dataset_col, yscale=yscale, scale_sims=scale_sims)

  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 63, in plot_1d_many
    _actually_plot(df_data, kind=kind_data, label="Data", ax=main_ax)

  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 52, in _actually_plot
    df.reset_index().plot.scatter(x=x_axis, y="sumw", yerr="err", color="k", label=label, ax=ax)

  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 3461, in scatter
    return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)

  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 2941, in __call__
    sort_columns=sort_columns, **kwds)

  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 1977, in plot_frame
    **kwds)

  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 1743, in _plot
    kind=kind, **kwds)

  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 845, in __init__
    super(ScatterPlot, self).__init__(data, x, y, s=s, **kwargs)

  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 820, in __init__
    raise ValueError(self._kind + ' requires x column to be numeric')

ValueError: scatter requires x column to be numeric
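
The traceback shows pandas rejecting a non-numeric x column in a scatter plot. One generic way round this kind of error, sketched below, is to plot against numbers and keep the original labels for the axis ticks. This is an assumption about the cause (for example, bin labels stored as strings), and `to_numeric_x` is a hypothetical helper, not part of fast-plotter or pandas.

```python
def to_numeric_x(labels):
    """Return numeric x values for a sequence of bin labels.

    Labels that all parse as floats are converted directly; otherwise the
    labels are replaced by their positions 0, 1, 2, ... so that they can be
    plotted and the original strings used as tick labels.
    """
    try:
        return [float(label) for label in labels]
    except (TypeError, ValueError):
        return [float(position) for position in range(len(labels))]


print(to_numeric_x(["1", "2", "3"]))
print(to_numeric_x(["[200.0, 300.0)", "[300.0, 400.0)"]))
```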

Feature requests

A few suggestions:

  • Add 'merge_df' functionality into Plotter
  • Allow datasets to be normalised to unity (grouped by the datasets in 'value replacements')
  • Add optional 'ignore' as command-line argument for columns you don't want to plot
  • Add optional 'vars' as command-line argument for variables you do want to plot (to avoid requiring such a specific naming scheme)
  • Add an option to classify datasets undefined in the config as 'other'; this is an easy way to plot the datasets you're interested in and combine together the others you don't care about
  • Add optional 'style' to how variables should be plotted. By this, I just mean using the 'style' option to effectively treat all datasets as signal or all datasets as background for the purposes of plotting

ModuleNotFoundError `cycler` issue

I was trying to use fast_plotter recently, but it could not find cycler even though it is installed.

$ fast_plotter --help
Traceback (most recent call last):
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/bin/fast_plotter", line 5, in <module>
    from fast_plotter.__main__ import main
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/fast_plotter/__main__.py", line 7, in <module>
    import matplotlib
  File "/home/anaylor/.local/lib/python3.6/site-packages/matplotlib/__init__.py", line 139, in <module>
    from . import cbook, rcsetup
  File "/home/anaylor/.local/lib/python3.6/site-packages/matplotlib/rcsetup.py", line 31, in <module>
    from cycler import Cycler, cycler as ccycler
ModuleNotFoundError: No module named 'cycler'
$ pip freeze | grep cycler
cycler==0.10.0
$ pip freeze | grep fast-plotter
fast-plotter==0.8.1
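
In the traceback above, fast_plotter is loaded from the miniconda site-packages directory while matplotlib comes from ~/.local, so it may be that `pip` and the `fast_plotter` script are not using the same interpreter or search path. A small diagnostic like the one below, run with the interpreter that `fast_plotter` uses, would show where each module would be imported from. `locate_module` is a hypothetical helper name; `importlib.util.find_spec` is standard library.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file this interpreter would import `name` from, or None
    if the module cannot be found on its search path."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for module_name in ("matplotlib", "cycler", "fast_plotter"):
        print(module_name, "->", locate_module(module_name))
```

Running `python -m pip freeze` with that same interpreter, rather than a bare `pip freeze`, lists the packages that interpreter actually sees.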

If dataset_order is a list, fast-plotter fails

If a user specifies dataset_order as a list in their plotting config file, fast-plotter fails at this line:

if dataset_order.startswith("sum"):

because lists have no startswith() method. A check like this should come first in the function:

if isinstance(dataset_order, list):

Also, when specifying dataset_order as a list, the datasets are actually plotted in reverse order.
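
A minimal sketch of the suggested check, handling both forms of dataset_order, could look like the following. It is not fast-plotter's code: the function `order_datasets`, its arguments, and the "sum-descending" spelling are assumptions made for illustration.

```python
def order_datasets(names, totals, dataset_order=None):
    """Return the dataset names in plotting order.

    names:         list of dataset names present in the table
    totals:        dict mapping each name to its total yield
    dataset_order: None, a list of names, or a string starting with "sum"
    """
    if dataset_order is None:
        return list(names)
    # Check for a list first, so that startswith() is only called on strings.
    if isinstance(dataset_order, (list, tuple)):
        listed = [name for name in dataset_order if name in names]
        unlisted = [name for name in names if name not in dataset_order]
        return listed + unlisted
    if isinstance(dataset_order, str) and dataset_order.startswith("sum"):
        descending = dataset_order.endswith("descending")
        return sorted(names, key=lambda name: totals[name], reverse=descending)
    raise ValueError("unsupported dataset_order: {!r}".format(dataset_order))


names = ["a", "b", "c"]
totals = {"a": 1.0, "b": 3.0, "c": 2.0}
print(order_datasets(names, totals, ["c", "a"]))
print(order_datasets(names, totals, "sum"))
```

If the plotting code stacks or draws datasets last-to-first, the returned list may need reversing before use, which could be related to the reversed order reported above.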

Return of 'ValueError: scatter requires x column to be numeric'

Imported from gitlab issue 3

CSVs are attached; this seems to be an issue with all of them.

(chip_env) [zw18769@soolin updated_combined_signals]$ fast_plotter -s ".*" -o ~/CHIP/analysis/output/MC_signals/combined_signal_5-12-18/ -l 41800 -w weight_nominal ~/CHIP/analysis/output/MC_signals/combined_signal/tbl_dataset.leadJet_pt.leadJet_eta--weight_nominal.csv 

fast_plotter - INFO - Processing: /users/zw18769/CHIP/analysis/output/MC_signals/combined_signal/tbl_dataset.leadJet_pt.leadJet_eta--weight_nominal.csv

Traceback (most recent call last):
  File "/users/zw18769/.local/bin/fast_plotter", line 11, in <module>
    load_entry_point('fast-plotter', 'console_scripts', 'fast_plotter')()
  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/__main__.py", line 49, in main
    process_one_file(infile, args)
  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/__main__.py", line 71, in process_one_file
    data=args.data, signal=args.signal, scale_sims=args.lumi, yscale=args.yscale)
  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 21, in plot_all
    plot = plot_1d_many(projected, data=data, signal=signal, dataset_col=dataset_col, yscale=yscale, scale_sims=scale_sims)
  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 99, in plot_1d_many
    plot_ratio(summed_data, summed_sims, x=x_axis, y=y, yvar=yvar, ax=summary_ax)
  File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 135, in plot_ratio
    ratio.reset_index().plot.scatter(x=x, y="Data / MC", yerr="err", ax=ax)
  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 3461, in scatter
    return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 2941, in __call__
    sort_columns=sort_columns, **kwds)
  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 1977, in plot_frame
    **kwds)
  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 1743, in _plot
    kind=kind, **kwds)
  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 845, in __init__
    super(ScatterPlot, self).__init__(data, x, y, s=s, **kwargs)
  File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 820, in __init__
    raise ValueError(self._kind + ' requires x column to be numeric')

ValueError: scatter requires x column to be numeric

tbl_dataset.leadJet_eta--weight_nominal.csv

cuts_signal_selection-.csv

tbl_dataset.met--weight_nominal.csv

tbl_dataset.leadJet_pt.leadJet_eta--weight_nominal.csv

tbl_dataset.njet.nbjet--weight_nominal.csv

tbl_dataset.sublJet_eta--weight_nominal.csv

tbl_dataset.sublJet_pt.sublJet_eta--weight_nominal.csv

Regex comparison issue when using Latex notation

If signal contributions are labelled using LaTeX, i.e. with names containing backslashes, the plot cannot be generated: this regex comparison line (https://github.com/FAST-HEP/fast-plotter/blob/master/fast_plotter/utils.py#L77) raises re.error: bad escape \X at position Y.

A workaround is to manually append the LaTeX-formatted string to the first_values array, with doubled backslashes. This is due to the known re.match behaviour when the pattern contains "\".
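
The behaviour can be reproduced with the standard `re` module alone. The sketch below shows a LaTeX-style label failing to compile as a pattern and then matching once it is passed through `re.escape`, which is one possible alternative to doubling the backslashes by hand. The label and the helper names are invented, and this is not the code at the linked line.

```python
import re


def compiles_as_regex(pattern):
    """Return True if `pattern` is a valid regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def matches_literally(label, candidate):
    """Match `label` against the start of `candidate` as plain text,
    escaping any regex metacharacters (including backslashes) first."""
    return re.match(re.escape(label), candidate) is not None


latex_label = r"\mathrm{SVJ}"  # invented example label

print(compiles_as_regex(latex_label))               # "\m" is a bad escape
print(matches_literally(latex_label, latex_label))  # escaped pattern matches
```

Escaping is only appropriate where the string is meant as a literal name; a value intended as a real regular expression would lose its meaning if escaped.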

Additional customisation options

Some small, additional customisation options could be useful (or used as default):

  • Option for hatched error bar on background MC instead of filled (in main plot)
  • Alter the range of the data/MC sub-plot (e.g., 0.5 - 1.5)

Fast plotter fails to extract weights from file name

Imported from gitlab issue 1

Fast plotter fails to open my dataframe output from fast-carpenter, with this backtrace:

  File "testplot.py", line 6, in <module>
    dataframe = read_binned_df("output/tbl_genJetPt.deltaPt.csv")
  File "/users/sb17498/.local/lib/python2.7/site-packages/fast_plotter/utils.py", line 25, in read_binned_df
    read_opts = get_read_options(filename)
  File "/users/sb17498/.local/lib/python2.7/site-packages/fast_plotter/utils.py", line 17, in get_read_options
    index_cols, _ = decipher_filename(filename)
  File "/users/sb17498/.local/lib/python2.7/site-packages/fast_plotter/utils.py", line 13, in decipher_filename
    weights = groups.group("weights").split(".")

when run on tbl_genJetPt.deltaPt.csv.
The example in the fast_cms_public_tutorial repo does not fail, as it does not perform the groups.group("weights") operation.

I presume the error is due to not using weights in the sequence, see sequence_cfg.yaml.
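
The backtrace as quoted stops at the `.split(".")` call, so the exact error is not shown; if the "weights" group did not match, it would be `None` and the split would fail. A sketch of a parser that treats the weights part as optional is given below. The regular expression and the name `decipher_filename_sketch` are assumptions based on the file names in these issues, not the pattern fast-plotter actually uses.

```python
import os
import re

# Assumed naming scheme: tbl_<dim1>.<dim2>...[--<weight1>.<weight2>...].csv
FILENAME_RE = re.compile(r"^tbl_(?P<binning>.+?)(?:--(?P<weights>.+))?\.csv$")


def decipher_filename_sketch(filename):
    """Split a table file name into binning dimensions and weight names.

    The weights part is optional; an empty list is returned when it is absent.
    """
    match = FILENAME_RE.match(os.path.basename(filename))
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    binning = match.group("binning").split(".")
    weights = match.group("weights")  # None when there is no "--..." part
    return binning, (weights.split(".") if weights else [])


print(decipher_filename_sketch("output/tbl_genJetPt.deltaPt.csv"))
print(decipher_filename_sketch("tbl_dataset.njet.nbjet--weight_nominal.csv"))
```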
