fast-hep / fast-plotter
Manipulate binned pandas dataframes into plots
Home Page: https://fast-hep.web.cern.ch
Imported from gitlab issue 5
The current error calculation assumes the ratio is for an efficiency plot, which will not always be the case.
https://gitlab.cern.ch/fast-hep/public/fast-plotter/blob/master/fast_plotter/plotting.py#L134
It would be good to add the error calculation type as an option. The main options I can think of are
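As a minimal sketch of how such an option might look, the function below switches between a binomial efficiency error and plain error propagation for a general ratio of independent quantities. The function name, the mode strings and the use of numpy arrays are assumptions for illustration, not fast-plotter's API.

```python
import numpy as np

def ratio_with_errors(num, num_err, den, den_err, mode="propagate"):
    """Return (ratio, error) per bin. Hypothetical helper, modes are illustrative.

    "efficiency": binomial error; assumes unweighted counts with num a subset of den
    "propagate":  numerator and denominator independent; errors added in quadrature
    """
    num, den = np.asarray(num, dtype=float), np.asarray(den, dtype=float)
    ratio = num / den
    if mode == "efficiency":
        # Binomial uncertainty sqrt(eff * (1 - eff) / N), with N taken from den
        err = np.sqrt(ratio * (1.0 - ratio) / den)
    elif mode == "propagate":
        num_err = np.asarray(num_err, dtype=float)
        den_err = np.asarray(den_err, dtype=float)
        # d(n/d) = sqrt((dn/d)^2 + (n*dd/d^2)^2); avoids dividing by n when n == 0
        err = np.sqrt((num_err / den) ** 2 + (num * den_err / den ** 2) ** 2)
    else:
        raise ValueError(f"unknown error mode: {mode!r}")
    return ratio, err
```

The binomial form shrinks to zero as the ratio approaches 0 or 1, which is only meaningful for efficiencies; a Data / MC ratio would normally use the propagated form.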
Imported from gitlab issue 4
There are still a few functions in __init__.py left over from the initial commit.
We should review these, move anything useful to utils.py, and remove anything obsolete.
Even when a colour is specified for a dataset in the dataset_colours
block of a plotting config, it is not respected. Signal lines always seem to be ordered by total yield and follow matplotlib's tab10
colourmap.
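A minimal sketch of the expected behaviour, assuming dataset_colours is read into a plain dict of dataset name to colour: explicit entries win, and only the remaining datasets cycle through tab10. The function name is hypothetical, and the hex values are matplotlib's tab10 palette (also its default colour cycle), hard-coded so the sketch has no dependencies.

```python
# matplotlib's "tab10" qualitative palette, in order
TAB10 = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
         "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]

def resolve_colours(datasets, dataset_colours=None):
    """Map each dataset to a colour, honouring explicit dataset_colours first (hypothetical helper)."""
    dataset_colours = dataset_colours or {}
    # Skip palette entries that were already claimed explicitly, to avoid clashes
    taken = set(dataset_colours.values())
    palette = [c for c in TAB10 if c not in taken] or TAB10
    colours, i = {}, 0
    for name in datasets:
        if name in dataset_colours:
            colours[name] = dataset_colours[name]
        else:
            colours[name] = palette[i % len(palette)]
            i += 1
    return colours
```

Because the lookup is by name, the colour a dataset gets no longer depends on where its yield places it in the ordering.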
Imported from gitlab issue 10
Recent changes have broken things for Data / MC plots:
Streamline the functions so that the user can merge columns together and then re-order or drop columns as necessary for input to fast-plotter, ensuring the index is set for the new merged column (see the working config below).
stages:
  - {rename_region: ReBin}
  - {rebin_met: ReBin}
  - {combine_region_met: CombineColumns}
  - {drop_region_met: AssignDim}
  - {save: WriteOut}
rename_region:
  axis: region
  drop_others: true
  mapping:
    0: SR
    6: SB0
    7: SB1
    8: SB2
    9: SB3
    10: SB4
rebin_met:
  axis: met
  drop_others: true
  mapping:
    "[200.0, 300.0)": "[200.0, 300.0)"
    "[300.0, 400.0)": "[300.0, 400.0)"
    "[400.0, 500.0)": "[400.0, 600.0)"
    "[500.0, 600.0)": "[400.0, 600.0)"
    "[600.0, 700.0)": "[600.0, 2000.0)"
    "[700.0, 800.0)": "[600.0, 2000.0)"
    "[800.0, 900.0)": "[600.0, 2000.0)"
    "[900.0, 1000.0)": "[600.0, 2000.0)"
    "[1000.0, inf)": "[600.0, 2000.0)"
combine_region_met:
  format_strings: {"region_met":"{region}_{met}"}
  as_index: [region_met]
drop_region_met:
  drop_cols: [region, category, met]
save: #{}
  filename: "tbl_dataset.region.category.met--sig_regions_fit.csv"
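For readers unfamiliar with the stages, a rough pandas equivalent of what the ReBin and CombineColumns steps in the config above aim to produce is sketched here. It is not fast-plotter's implementation: the function name and the column names dataset, region, met, sumw and sumw2 are assumptions, and any other index level (such as category) is simply summed over.

```python
import pandas as pd

def rebin_and_combine(df, region_map, met_map):
    """Hypothetical sketch: relabel/merge bins, then fold region and met into one index level."""
    out = df.reset_index()
    # Relabel bins; unmapped values become NaN and are dropped (cf. drop_others: true)
    out["region"] = out["region"].map(region_map)
    out["met"] = out["met"].map(met_map)
    out = out.dropna(subset=["region", "met"])
    # Bins mapped onto the same label are summed; remaining levels are summed over
    out = out.groupby(["dataset", "region", "met"], as_index=False)[["sumw", "sumw2"]].sum()
    # Build the merged "{region}_{met}" label and make it the new index level
    out["region_met"] = out["region"] + "_" + out["met"]
    return out.drop(columns=["region", "met"]).set_index(["dataset", "region_met"])
```

Setting the merged column as part of the index at the end is the step the issue asks to be easier, since fast-plotter reads the binning from the index.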
Imported from gitlab issue 8
Add a "summary" panel showing signal/sqrt(background), or the Asimov significance, when plotting signal vs. background. Ideally, use colour-coding so that the summary plot shows, for each bin, one f(S, B)
value per signal, with each point's colour matching that signal in the top plot.
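The two figures of merit named above can be computed per bin as sketched below; the function name and the kind strings are hypothetical. The Asimov form used is the common discovery-significance approximation Z = sqrt(2((s + b) ln(1 + s/b) - s)), which tends to s/sqrt(b) when s is much smaller than b.

```python
import numpy as np

def significance(s, b, kind="asimov"):
    """Per-bin f(S, B) for signal yields s and background yields b (b > 0). Hypothetical helper."""
    s, b = np.asarray(s, dtype=float), np.asarray(b, dtype=float)
    if kind == "s_over_sqrt_b":
        return s / np.sqrt(b)
    if kind == "asimov":
        # log1p keeps ln(1 + s/b) accurate when s/b is tiny
        return np.sqrt(2.0 * ((s + b) * np.log1p(s / b) - s))
    raise ValueError(f"unknown kind: {kind!r}")
```

For the colour-coded panel, one would call this once per signal with the same summed background and draw each result with that signal's colour from the main panel.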
When certain datasets are specified as signal in a config, e.g.,
signal: '.*SVJ*.'
and a dataset order is specified for them, e.g.,
dataset_order:
- SVJ_3000_20_0.9_peak (Pythia)
- 'SVJ_3000_20_0.9_peak (MadGraph)'
- SVJ_3000_20_0.1_peak (Pythia)
- 'SVJ_3000_20_0.1_peak (MadGraph)'
- SVJ_1000_20_0.3_peak (Pythia)
- 'SVJ_1000_20_0.3_peak (MadGraph)'
the datasets are not plotted in that order. Presumably they are simply ordered by yield, as the following plots, produced with the above config fragments applied, demonstrate.
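A minimal sketch of the ordering one would expect, assuming the per-dataset yields are available as a dict: names listed in dataset_order come first, in the given order, and anything not listed falls back to descending yield. The function name is hypothetical.

```python
def order_datasets(yields, dataset_order=None):
    """Explicit dataset_order first, remaining datasets by descending yield (hypothetical helper)."""
    dataset_order = list(dataset_order or [])
    # Keep only requested names that actually exist, preserving the requested order
    explicit = [name for name in dataset_order if name in yields]
    # Anything not mentioned in dataset_order keeps the yield-based ordering
    rest = sorted((n for n in yields if n not in explicit), key=lambda n: yields[n], reverse=True)
    return explicit + rest
```

Whether "first" should mean top of the legend or bottom of a stack is a separate choice that the plotting code would still need to make consistently.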
Imported from gitlab issue 6
Currently it is not possible to plot only signal (as lines): a background row is required in order to plot, i.e. stacks must always be plotted.
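The behaviour requested here amounts to treating an empty background set as "no stack" rather than as an error. A minimal sketch of that decision, assuming signal datasets are selected with a regex via re.match; both function names and the layer labels are hypothetical.

```python
import re

def split_signal_background(datasets, signal_regex):
    """Partition dataset names into (signal, background) by regex match (hypothetical helper)."""
    pattern = re.compile(signal_regex)
    signal = [d for d in datasets if pattern.match(d)]
    background = [d for d in datasets if not pattern.match(d)]
    return signal, background

def layers_to_draw(signal, background):
    """Decide what to draw; an empty background simply means no stacked histogram."""
    layers = []
    if background:
        layers.append("stack")         # filled, stacked backgrounds
    if signal:
        layers.append("signal_lines")  # unstacked step lines
    return layers
```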
Imported from gitlab issue 7
Imported from gitlab issue 2
csv at: /afs/cern.ch/user/d/danthony/public/combined_signal/
(chip_env) [zw18769@soolin updated_combined_signals]$ fast_plotter /users/zw18769/CHIP/analysis/output/MC_signals/combined_signal/updated_combined_signals/tbl_dataset.njet.nbjet--weight_nominal.csv
fast_plotter - INFO - Processing: /users/zw18769/CHIP/analysis/output/MC_signals/combined_signal/updated_combined_signals/tbl_dataset.njet.nbjet--weight_nominal.csv
Traceback (most recent call last):
File "/users/zw18769/.local/bin/fast_plotter", line 11, in <module>
load_entry_point('fast-plotter', 'console_scripts', 'fast_plotter')()
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/__main__.py", line 45, in main
process_one_file(infile, args)
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/__main__.py", line 67, in process_one_file
scale_sims=args.lumi, yscale=args.yscale)
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 21, in plot_all
plot = plot_1d_many(projected, data=data, dataset_col=dataset_col, yscale=yscale, scale_sims=scale_sims)
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 63, in plot_1d_many
_actually_plot(df_data, kind=kind_data, label="Data", ax=main_ax)
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 52, in _actually_plot
df.reset_index().plot.scatter(x=x_axis, y="sumw", yerr="err", color="k", label=label, ax=ax)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 3461, in scatter
return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 2941, in __call__
sort_columns=sort_columns, **kwds)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 1977, in plot_frame
**kwds)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 1743, in _plot
kind=kind, **kwds)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 845, in __init__
super(ScatterPlot, self).__init__(data, x, y, s=s, **kwargs)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 820, in __init__
raise ValueError(self._kind + ' requires x column to be numeric')
ValueError: scatter requires x column to be numeric
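The error says the x column passed to DataFrame.plot.scatter holds interval labels rather than numbers. One possible workaround, sketched under the assumption that bins look like "[200.0, 300.0)" (which is also what str() gives for a left-closed pandas Interval), is to map each label to a numeric x before plotting. The helper name is hypothetical; open-ended bins are placed at their finite edge.

```python
import math
import re

# Matches interval labels such as "[200.0, 300.0)" or "[1000.0, inf)"
_INTERVAL = re.compile(r"^\s*[\[(]\s*([^,]+?)\s*,\s*([^\])]+?)\s*[\])]\s*$")

def bin_label_to_x(label):
    """Convert an interval label to a numeric x: bin centre, or the finite edge if open-ended."""
    m = _INTERVAL.match(str(label))
    if m is None:
        return float(label)  # already numeric, e.g. a discrete njet value
    low, high = float(m.group(1)), float(m.group(2))
    if math.isinf(high):
        return low
    if math.isinf(low):
        return high
    return 0.5 * (low + high)
```

Usage would be along the lines of df[x_axis] = df[x_axis].map(bin_label_to_x) before the scatter call.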
If, in fast-carpenter, variables are binned discretely (e.g., njet), they are plotted incorrectly: the first two bins share the same value, and so do the last two. For example,
plot_dataset.njet--n_jets--weight_nominal--project_njet-yscale_log.pdf
A few suggestions:
This involves editing [https://github.com/FAST-HEP/fast-plotter/blob/master/fast_plotter/plotting.py#L47]
I was trying to use fast_plotter recently, but it couldn't find cycler, even though it is installed.
$ fast_plotter --help
Traceback (most recent call last):
File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/bin/fast_plotter", line 5, in <module>
from fast_plotter.__main__ import main
File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/fast_plotter/__main__.py", line 7, in <module>
import matplotlib
File "/home/anaylor/.local/lib/python3.6/site-packages/matplotlib/__init__.py", line 139, in <module>
from . import cbook, rcsetup
File "/home/anaylor/.local/lib/python3.6/site-packages/matplotlib/rcsetup.py", line 31, in <module>
from cycler import Cycler, cycler as ccycler
ModuleNotFoundError: No module named 'cycler'
$ pip freeze | grep cycler
cycler==0.10.0
$ pip freeze | grep fast-plotter
fast-plotter==0.8.1
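The traceback shows fast_plotter running from the pyenv miniconda3-4.3.30 tree while matplotlib is imported from ~/.local/lib/python3.6, so one possible (unconfirmed) explanation is that pip freeze reports a different environment from the one fast_plotter actually runs in. A small check, run with the same interpreter that fast_plotter uses, could confirm where each module resolves; the helper name is hypothetical.

```python
import importlib.util
import sys

def where_is(*modules):
    """Report the running interpreter and where each top-level module would be imported from."""
    report = {"python": sys.executable}
    for name in modules:
        spec = importlib.util.find_spec(name)  # None if the module cannot be found
        report[name] = spec.origin if spec is not None else None
    return report

if __name__ == "__main__":
    for key, value in where_is("cycler", "matplotlib", "fast_plotter").items():
        print(f"{key}: {value}")
```

Comparing this with the output of python -m pip show cycler, run by that same interpreter, would show whether cycler is installed where fast_plotter looks for it.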
Imported from gitlab issue 9
If a user specifies dataset_order as a list in their plotting config file, fast-plotter complains at this line (fast-plotter/fast_plotter/utils.py, line 132 in 814f0f7), because a list has no startswith() attribute. This line should be at the top of the function: fast-plotter/fast_plotter/utils.py, line 140 in 814f0f7.
Also, when dataset_order is specified as a list, the datasets are actually plotted in reverse order.
Imported from gitlab issue 3
CSVs are attached; the issue seems to affect all of them.
(chip_env) [zw18769@soolin updated_combined_signals]$ fast_plotter -s ".*" -o ~/CHIP/analysis/output/MC_signals/combined_signal_5-12-18/ -l 41800 -w weight_nominal ~/CHIP/analysis/output/MC_signals/combined_signal/tbl_dataset.leadJet_pt.leadJet_eta--weight_nominal.csv
fast_plotter - INFO - Processing: /users/zw18769/CHIP/analysis/output/MC_signals/combined_signal/tbl_dataset.leadJet_pt.leadJet_eta--weight_nominal.csv
Traceback (most recent call last):
File "/users/zw18769/.local/bin/fast_plotter", line 11, in <module>
load_entry_point('fast-plotter', 'console_scripts', 'fast_plotter')()
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/__main__.py", line 49, in main
process_one_file(infile, args)
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/__main__.py", line 71, in process_one_file
data=args.data, signal=args.signal, scale_sims=args.lumi, yscale=args.yscale)
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 21, in plot_all
plot = plot_1d_many(projected, data=data, signal=signal, dataset_col=dataset_col, yscale=yscale, scale_sims=scale_sims)
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 99, in plot_1d_many
plot_ratio(summed_data, summed_sims, x=x_axis, y=y, yvar=yvar, ax=summary_ax)
File "/users/zw18769/CHIP/fast-plotter/fast_plotter/plotting.py", line 135, in plot_ratio
ratio.reset_index().plot.scatter(x=x, y="Data / MC", yerr="err", ax=ax)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 3461, in scatter
return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 2941, in __call__
sort_columns=sort_columns, **kwds)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 1977, in plot_frame
**kwds)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 1743, in _plot
kind=kind, **kwds)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 845, in __init__
super(ScatterPlot, self).__init__(data, x, y, s=s, **kwargs)
File "/users/zw18769/.local/lib/python2.7/site-packages/pandas/plotting/_core.py", line 820, in __init__
raise ValueError(self._kind + ' requires x column to be numeric')
ValueError: scatter requires x column to be numeric
tbl_dataset.leadJet_eta--weight_nominal.csv
tbl_dataset.met--weight_nominal.csv
tbl_dataset.leadJet_pt.leadJet_eta--weight_nominal.csv
tbl_dataset.njet.nbjet--weight_nominal.csv
If signal contributions are labelled using LaTeX (e.g. including backslashes), the plot cannot be generated, because this regex comparison line (https://github.com/FAST-HEP/fast-plotter/blob/master/fast_plotter/utils.py#L77) raises an error: re.error: bad escape \X at position Y.
The workaround is to manually append the LaTeX-formatted string, with double backslashes, to the first_values array. This is due to the known re.match issue when a string includes backslashes.
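A minimal sketch of a comparison that tolerates LaTeX labels: try an exact string match first, and only then treat the configured value as a regex, catching re.error instead of letting it abort the plot. The function name is hypothetical; if labels are always meant literally, re.escape(pattern) is the standard alternative.

```python
import re

def label_matches(name, pattern):
    """True if name equals pattern literally, or matches it as a regex (hypothetical helper)."""
    if name == pattern:
        return True  # literal hit: no regex compilation needed
    try:
        return re.match(pattern, name) is not None
    except re.error:
        # e.g. r"$\mathrm{SVJ}$" contains "\m", which re rejects as a bad escape
        return False
```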
Some small, additional customisation options could be useful (or used as default):
Imported from gitlab issue 1
fast-plotter fails to open my dataframe output from fast-carpenter, with this traceback:
File "testplot.py", line 6, in <module>
dataframe = read_binned_df("output/tbl_genJetPt.deltaPt.csv")
File "/users/sb17498/.local/lib/python2.7/site-packages/fast_plotter/utils.py", line 25, in read_binned_df
read_opts = get_read_options(filename)
File "/users/sb17498/.local/lib/python2.7/site-packages/fast_plotter/utils.py", line 17, in get_read_options
index_cols, _ = decipher_filename(filename)
File "/users/sb17498/.local/lib/python2.7/site-packages/fast_plotter/utils.py", line 13, in decipher_filename
weights = groups.group("weights").split(".")
when run on tbl_genJetPt.deltaPt.csv.
The example in the fast_cms_public_tutorial repo does not fail, as it does not perform the groups.group("weights") operation.
I presume the error is due to not using weights in the sequence; see sequence_cfg.yaml.
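The traceback stops at groups.group("weights"), which suggests the filename parsing expects a --weights suffix that tbl_genJetPt.deltaPt.csv does not have. A minimal sketch of parsing that treats the weights part as optional, assuming names of the form tbl_<dim>.<dim>...[--<weight>...].csv as in the files quoted in these issues; the function name is hypothetical and this is not fast-plotter's decipher_filename.

```python
import os
import re

# tbl_<dim1>.<dim2>...[--<weight1>.<weight2>...].csv ; the weights part is optional
_TBL_NAME = re.compile(r"^tbl_(?P<dims>[^-]+?)(?:--(?P<weights>.+?))?\.csv$")

def parse_tbl_filename(filename):
    """Return (dimensions, weights) encoded in a binned-dataframe filename (hypothetical helper)."""
    m = _TBL_NAME.match(os.path.basename(filename))
    if m is None:
        raise ValueError(f"unrecognised binned-dataframe filename: {filename!r}")
    dims = m.group("dims").split(".")
    # No "--" section means an unweighted table: return an empty list instead of failing
    weights = m.group("weights").split(".") if m.group("weights") else []
    return dims, weights
```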