lhcb / starterkit-lessons

Lessons taught at the Starterkit workshops.

Home Page: https://lhcb.github.io/starterkit-lessons/

License: Other

Python 86.43% Jupyter Notebook 13.51% CSS 0.06%

lhcb starterkit

starterkit-lessons's People

chrisburr alexpearce vlisovsk roelaaij goi42 mesmith75 vbellee lcapriot saschastahl andriiusachov arnaubrossa dgerstel tmaltsev fdesse zsn0000000 dionusoscn casaisa manuelfs rmatev dominikmuller laurenyeomans phillipmarshall marthaisabelhilton seadanda gsarpis willpietrak bixel qh428 dajiaonao laurentdufour jakelane137 lakshan-ram ben-westhenry felipegarciar mfcicala jlammering lukascalle santvde martinapili pgironel nxc413 shantam-taneja jlombaca alexandre-brea martinoborsato mick-mulder srishtibhasin olantwin dylanjaide ram1123 admorris wkrzemien shanzhenchen vvolkl pikacic chaen pkoppenb adamdddave ryuwd biljanamitreska andylhcb dcervenkov cagapopo kmiec96 alescarab lidelagr mwuvandijk xuelihua shahip2016 davidfriday cookenaomi8 niallmchugh hosseinafsharnia mvieitesdiaz aidanwiederhold nsahoo vlukashenko xlr91 tiangeerr heistera miguel-fernandez jonas-eschle cpviolation dillfitz caosq

starterkit-lessons's Issues

Replace indices in CHILD functor with decay selector

As part of the LoKi functors lesson, the CHILD functor is taught with indices to select the child. As this depends on the ordering of the decay descriptor, it is discouraged. Perhaps the indices could be replaced with decay selectors?

This brings its own complications, as (some minimal version of) the grammar would have to be explained. At the same time, this is already part of the DTT lesson. Perhaps the lessons can be reordered to have the DTT lesson first, up to the HybridTupleTool part, then the LoKi functors lesson, and then return to the DTT with the newly gained insights.

This might have the advantage that some "magic" particular to GaudiPython, e.g. explicit creation of the LoKi::DistanceCalculator can be avoided.

ImpactKit 2018 - Project ideas

This issue is for keeping track of project ideas for the next ImpactKit. There don't need to be any additional details, just enough words to make sure we don't forget them.

(Though information/links/contact people/etc. are also welcome!)

My ideas:

Correcting the magnetic field map
Adding pdfs to RooFit (Hypatia, Triple Gaussian, ...)
Write some self guided lessons (scikit-learn, fitting, limit setting with CLs)

Tell DaVinci that the data is simulated to get rid of some errors

The interactive-dst lesson is run on simulated data, but DaVinci is not told about this so you see errors like:
RunStampCheck ERROR Database not up-to-date. No valid data for run 6251347 at 1970-01-01 00:00:00.0 UTC

Update lesson on ganga

The ganga lesson currently uses SetupProject ganga and should be updated now that this is no longer functional.

Add lesson about the use of database tags

Finding and using the right database tags is often not very intuitive. The recommended approach also depends on the use case: in MC you should always use the tags used in the generation, while in data the latest ones are often, but not always, preferred.
There are some twiki pages containing information but they look a bit outdated.
E.g. this one:
https://twiki.cern.ch/twiki/bin/view/LHCb/RecommendedTags

Ethernet cable in prerequisites

In the prerequisites you urge us to bring an ethernet cable with us.
But after following the starterkit it was clearly not necessary to have it.
Maybe remove this?

Wrong DST in the Downloading files from grid lesson

At the very end of the lesson, the callout box 'such a clever script' points to the file
MC_2012_27163003_Beam4000GeV2012MagDownNu2.5Pythia8_Sim08e_Digi13_Trig0x409f0045_Reco14a_Stripping20NoPrescalingFlagged_ALLSTREAMS.DST.py

It is a remnant from the times when we used D0->K+pi- decay in our examples.
This path needs to be changed to
MC_2016_27163002_Beam6500GeV2016MagDownNu1.625nsPythia8_Sim09b_Trig0x6138160F_Reco16_Turbo03_Stripping28NoPrescalingFlagged_ALLSTREAMS.DST.py already mentioned above on the same page.
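The option-file names quoted here look like the bookkeeping path with the dashes dropped and the slashes turned into underscores. The Python sketch below encodes that pattern so the expected name can be checked before editing the lesson. The rule is inferred only from the names on this page, not from documented bookkeeping behaviour, and `bookkeeping_path_to_filename` is a hypothetical helper name.

```python
def bookkeeping_path_to_filename(path):
    """Guess the option-file name for a bookkeeping path.

    Assumption (inferred from the file names in this issue):
    dashes are removed, slashes become underscores, '.py' is appended.
    """
    return path.strip("/").replace("-", "").replace("/", "_") + ".py"


# Bookkeeping path of the 2016 sample quoted elsewhere on this page
bk_path = (
    "MC/2016/27163002/Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8/"
    "Sim09b/Trig0x6138160F/Reco16/Turbo03/"
    "Stripping28NoPrescalingFlagged/ALLSTREAMS.DST"
)
filename = bookkeeping_path_to_filename(bk_path)
print(filename)
```

If the printed name differs from the file that the bookkeeping actually provides, the inferred rule is wrong and the real name should be used instead.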

Update Loki lesson

In the new MC we use, the first candidate is a D0+ instead of D0-.
The example with CHILD should be updated as well.

"Exceeded Maximum Dataset Limit (100)" in Running DaVinci on the grid

The "Running DaVinci on the grid" lesson currently submits a single job to process an entire bookkeeping path, which results in the error "Exceeded Maximum Dataset Limit (100)".

We should instead limit the number of files processed until students have been taught about splitting.

Also:

"will download most files below a size of XX MB" should have the XX replaced.
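Until splitting has been taught, one way to stay below the limit is to hand the job only the first few files. A minimal Python sketch of that idea on a plain list of LFNs follows; `limit_files` and the LFN strings are made up for illustration, and how the shortened list is attached to a Ganga job is deliberately left out.

```python
def limit_files(lfns, max_files):
    """Return at most max_files entries from the start of lfns."""
    if max_files < 0:
        raise ValueError("max_files must be non-negative")
    return list(lfns[:max_files])


# Hypothetical LFNs standing in for the contents of a bookkeeping path
all_lfns = ["/lhcb/example/file_{:04d}.dst".format(i) for i in range(250)]

# Keep only a handful of files for a quick test job
test_lfns = limit_files(all_lfns, 5)
print(len(test_lfns))
```

A fixed small number keeps the test job quick and well away from the 100-dataset limit quoted in the error.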

Add bash + basic python lesson

lhcb-proxy-init?

Should we tell people to run the lhcb-proxy-init command before launching ganga?
Or is it enough that ganga requests the password itself at some point?

Link in the LHCb data flow lesson points to a wrong stripping line

At the end of the lesson, we mention the stripping line D2hhPromptDst2D2KKLine, which is used in the following lessons; however, the link afterwards points to the wrong line.
The proper link should be this one.

Add a lesson on 'how do we measure variables'

While we teach the data flow in LHCb, we do not explain how we measure the momentum, perform a vertex fit or define the primary vertex. I had to spend a significant amount of time explaining this during the LoKi functors lesson (e.g. to explain the difference between M and MM), followed by questions from students.
It would be convenient to have a little explanation on this before we move to exploring the actual physics variables.

Turbo Data Flow

In the "Changes to Data Flow in Run II" lesson, we show the Turbo as a way to bypass the stripping step. While updating the "An Introduction to LHCb Software" lesson, I noticed that MC/2016/27163002/Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8/Sim09b/Trig0x6138160F/Reco16/Turbo03/Stripping28NoPrescalingFlagged/ALLSTREAMS.DST in the bookkeeping lists Stripping28 under Turbo.

I want to make sure I teach this right. Does this mean the Turbo stream was "resurrected" with Tesla and then included in the Stripping28 campaign? If so, should we update the data flow diagram in the lesson?

Fix the url to the Gaudi doxygen

In the intro to LHCb software lesson, the link to the Gaudi doxygen does not work.

Math not rendered properly?

In https://lhcb.github.io/starterkit-lessons/second-analysis-steps/building-decays.html, I don't see any math, and actually the callout box is cut.

See screenshot below:

Is it only me?

Import history from other repositories

If we move multiple current repositories into a single gitbook it would be nice to keep all the existing history. It's possible for a git repo to have multiple initial commits[1] and we could import the other repositories by doing something like (untested):

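git remote add new_repo XXXXX
# Fetch from the new remote explicitly; a bare "git fetch" only contacts the default remote
git fetch new_repo
git checkout -b add-new-repo new_repo/master
# The target directory must exist before several files can be moved into it
mkdir new_location_for_files
# -k skips moves that would fail, such as moving the new directory into itself
git mv -k * new_location_for_files
git commit -m "Import new_repo"
# Make a PR as normal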

[1] https://www.destroyallsoftware.com/blog/2017/the-biggest-and-weirdest-commits-in-linux-kernel-git-history

Summarise the different turbo types

In the Changes to the data flow in Run 2 lesson turbo is mentioned but it should be updated to summarise the new types of turbo that are now available.

Add editor config

With lots of contributors having strong opinions on their text configuration, we should include a style guide and/or editor config file that defines how we format the lessons. This is partly to circumvent the need for style debates, and also to reduce the number of re-formatting changes made in PRs, which clutter the diff.

Disambiguate between two LFN downloading methods

The "downloading data" lesson gives two ways to download LFNs that are in a Python file downloaded from the bookkeeping: using a custom Python script that needs the LFNs pasted into it, or using dirac-dms-get-file.

This isn't in the spirit of things; we should generally show The One True Way of doing things, or at the very least strongly recommend one technique over another. IMO we should just have the dirac-dms command.

Basics of analysis preservation

We would like to teach people from the beginning good practices that can make an analysis reproducible.
What would be the minimum set of skills / tricks people would need to know?
Here are some ideas:

basic snakemake intro
usage of the containerization template https://gitlab.cern.ch/lhcb-analysis-preservation/containerization-cookie (this template will receive updates before the next starterkit)
gitlab WG groups and eos WG space, usage of xrootd for file access

Remove installation of anaconda

Anaconda installation as described seems to take O(3 GB). This is an obscene amount of space in general, but especially on a home directory on lxplus (max of 10 GB). Is this installation really necessary? Unless I installed anaconda a long time ago and forgot about it, it doesn't seem necessary for coding python on lxplus, which can already run ipython...

Discuss DTF_FUN functors

In #144 we realised that we don't mention the DTF_FUN functor. This is a powerful way to use DecayTreeFitter, and is more explicit than TupleToolDecayTreeFitter. One can be more fine-grained with what's saved as well, which is nice.

We should mention it somewhere.

Answering good questions

(Migrated from first-analysis-steps #95)

As a follow-up to the lesson on asking good questions, some instructions should be given on how to answer questions well (and on helping other people in general).

As an easy start, it may be just an extra callout box at the end of the lesson.

Remove(/redirect) old versions of the lessons

Make use of the LHCb glossary

Hi all,

It would be great if the starterkit material started making use of the recently prepared glossary, see https://github.com/lhcb/glossary. This way, relevant definitions can all be collected in one place and referred to in the various sets of training and documentation material.

Many thanks.

Rework the interactive DST lesson

As written, with the copying and pasting of functions and restarting of the ipython session, the lesson is very clunky and hard to follow in class. It might be good to rework the lesson to actually use Bender, as this is how people access the TES in real life.

Add lesson on creating/expanding AFS home/work areas

Some people during the Starterkit didn't have an AFS work area, and most hadn't upgraded their home and work areas to the maximum available space (10GB and 100GB respectively).

We should add instructions for doing this somewhere, for example here.

(Migrated from first-analysis-steps #156)

Wrong DaVinci version in the Intro to LHCb software lesson

The Introduction to LHCb software has several DaVinci versions in use in different places. We should stick to (the most recent) one - v42r6p1.

Move to using a Run 2 uDST MC sample

Improve the EOS lesson

The first section of the lesson says:

To retrieve a job outputfile, one can use three types of files:

But only two are listed right after.

The third should be a MassStorageFile, which is considered to be almost deprecated, and some explanation should be added of why people should avoid it.
Other options e.g. SharedFile exist and may be worth a brief mention here.

(Migrated from first-analysis-steps #221)

Exercise in "The simulation framework" not working out of the box

When following the instructions for "Setting up a new Decay" in "The simulation framework" lesson of the second analysis steps, the ./run command gives the following error:
./run: line 21: ./build_env.sh: No such file or directory

When following these instructions for testing decay files, the environment is set up properly and the execution of the example provided there works:
https://gitlab.cern.ch/lhcb-datapkg/Gen/DecFiles/blob/master/CONTRIBUTING.md

The problem seems to come from the difference between cmt (SetupProject) and lb-dev. Since I'm no expert, I'm not sure how to fix the tutorial.

I tried running the tutorial by setting everything up following the SetupProject steps (v49r9), but when creating the xgen file, one more option is needed: $GAUSSOPTS/Gauss-2016.py, as in the example on https://gitlab.cern.ch/lhcb-datapkg/Gen/DecFiles/blob/master/CONTRIBUTING.md; otherwise the following error appears:
ToolSvc.EvtGenDecay ERROR EvtGen Error from EvtGen
Unknown particle name:N(1440)+ on line 10787

This is the list of commands that worked for me:

SetupProject -c x86_64-slc6-gcc49-opt Gauss v49r9 --build-env
cd ~/cmtuser/Gauss_v49r9
SetupProject -c x86_64-slc6-gcc49-opt Gauss v49r9

git lb-clone-pkg Gen/DecFiles

cd ~/cmtuser/Gauss_v49r9/Gen/DecFiles/
make

SetupProject Gauss v49r9
cd ~/cmtuser/Gauss_v49r9
gaudirun.py Gauss-Job.py $GAUSSOPTS/Gauss-2016.py $GAUSSOPTS/GenStandAlone.py $DECFILESROOT/options/11164001.py $LBPYTHIA8ROOT/options/Pythia8.py
(here, Gauss-Job.py is the one from the tutorial saved in the directory ~/cmtuser/Gauss_v49r9)

SetupProject DaVinci v41r0
gaudirun.py DaVinciOptions.py (changing the .xgen file name to the one I created)

IOHelper().inputFiles vs DaVinci().Input

In the Running a minimal DaVinci job locally lesson, the proposed way of specifying the input DST is the IOHelper().inputFiles command. At the same time, a lot of people use DaVinci().Input instead.
Is there a deep reason to prefer the IOHelper way of doing things?

Discuss copyright notice and licensing

We should discuss the new policy in the Developing LHCb Software lesson.

https://indico.cern.ch/event/766055/contributions/3179968/

Include code blocks directly from example files

We can use the neat gitbook-plugin-codesnippet plugin to extract code directly from the example files for display in the lessons. This has the benefit of keeping the examples on the web pages in sync with those in the files.

Bug in setDescriptorTemplate

I noticed a bug in the setDescriptorTemplate function, which automatically adds branches for a given decay. Basically, the function assumes the primary decay will always be added as a branch and thus removes the ^ from the first particle in the template. If the user does not specify a branch for the primary decay, this will cause the first branch for secondary particles to not be created.

Also, for the primary decay, a ^ is still introduced in the addBranches, which is at least not needed, but this does not seem to break anything as far as I could tell.

See lesson here: https://lhcb.github.io/starterkit-lessons/first-analysis-steps/add-tupletools.html
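The behaviour reported above can be reproduced with a small stand-alone Python sketch. It is a simplified re-creation based only on the description in this issue, not the actual DecayTreeTuple code; `expand_descriptor_template` and the branch names in the templates are hypothetical.

```python
import re
from string import Template


def expand_descriptor_template(template):
    """Mimic the reported behaviour: build a decay descriptor and branches.

    Assumption: the first named particle is treated as the head of the
    decay, so its '^' marker is dropped from the descriptor.
    """
    names = re.findall(r"\$\{(\w+)\}", template)
    tmpl = Template(template)

    # Descriptor: mark every named particle except the first one
    marks = {name: "^" for name in names}
    if names:
        marks[names[0]] = ""
    decay = tmpl.substitute(marks)

    # Branches: one descriptor per name, with only that particle marked
    branches = {
        name: tmpl.substitute({other: "^" if other == name else "" for other in names})
        for name in names
    }
    return decay, branches


# Head of the decay is named: both daughters keep their marker
with_head, branches = expand_descriptor_template("${D0}[D0 -> ${Kminus}K- ${Kplus}K+]CC")
print(with_head)   # [D0 -> ^K- ^K+]CC

# Head is not named: the first daughter loses its marker
without_head, _ = expand_descriptor_template("[D0 -> ${Kminus}K- ${Kplus}K+]CC")
print(without_head)  # [D0 -> K- ^K+]CC
```

In the second call the K- is no longer marked in the descriptor, which matches the missing first branch described above. The head branch in the first call also carries a leading ^, matching the second observation.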

LaTeX not working

This 💩 is so 🔥. For real folks! It's really impressive how you continue to do such awesome work and make the Starterkit better every day!

That being said, LaTeX integration (e.g. here, bottom of the page (screenshot below)) does not seem to work.

Clarify the command to run Bender

The Exploring the DST lesson mentions the function

seekStripDecision (which replaces our advance)

It is worth mentioning that, to reproduce the behavior of advance(line), the following syntax should be used: seekStripDecision('Stripping'+line+'Decision').
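The string concatenation can be wrapped in a small helper so that students only type the line name. This is a sketch in plain Python; `stripping_decision_name` is a hypothetical name, and the call to seekStripDecision is shown only as a comment because it exists only inside a Bender session.

```python
def stripping_decision_name(line):
    """Build the decision name expected by seekStripDecision."""
    return "Stripping" + line + "Decision"


# Stripping line used in the lessons
decision = stripping_decision_name("D2hhPromptDst2D2KKLine")
print(decision)

# Inside Bender one would then call (not executed here):
#   seekStripDecision(decision)
```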

Broken links to code

Some of the links to the example code are broken.

Example: https://lhcb.github.io/starterkit-lessons/first-analysis-steps/code/add-tupletools/code/add-tupletools/ntuple_options.py
The right one being: https://lhcb.github.io/starterkit-lessons/first-analysis-steps/code/add-tupletools/ntuple_options.py
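The broken link differs from the right one by a repeated run of path segments, so other affected links could be found with a simple check. The Python sketch below collapses any immediately repeated run of segments; it is a heuristic (a path that legitimately repeats a segment would be altered too), and `collapse_repeated_segments` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_repeated_segments(url):
    """Remove immediately repeated runs of path segments from a URL."""
    parts = urlsplit(url)
    segs = parts.path.split("/")
    # Try the longest possible runs first, then shorter ones
    n = len(segs) // 2
    while n >= 1:
        i = 0
        while i + 2 * n <= len(segs):
            if segs[i:i + n] == segs[i + n:i + 2 * n]:
                # Drop the second copy and re-check at the same position
                del segs[i + n:i + 2 * n]
            else:
                i += 1
        n -= 1
    return urlunsplit(parts._replace(path="/".join(segs)))


broken = (
    "https://lhcb.github.io/starterkit-lessons/first-analysis-steps/"
    "code/add-tupletools/code/add-tupletools/ntuple_options.py"
)
fixed = collapse_repeated_segments(broken)
print(fixed)
```

A link is a candidate for fixing whenever the function returns something different from its input.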

Reported by Giampi

DTF substitutions are wrong

Currently the lesson replaces D0 -> K- K+ with D0 -> pi- pi+ instead of D0 -> K- pi+ as is said in the text.

Add git lesson

Create a few issues to solve during the Friday lesson?

On Friday, one of the goals of the lesson is to teach students to create their own issues here and to solve those already existing. As an easy start, it would be nice to have some simple issues prepared so that they can be solved during the lesson.

If you have any ideas for that, it would be great to prepare them in advance and mark them with the 'Friday lesson' label, so that we don't touch these issues until Friday.

Add python for hep lesson

Add reference to accessURL

When we discuss downloading files from the Grid, we should mention that this is not necessary at all thanks to the root protocol, and we can show how to access the files remotely:

apuignav@lxplus062:~/Starterkit$ lb-run LHCbDirac/latest $SHELL
apuignav@lxplus062:~/Starterkit$  dirac-dms-lfn-accessURL /lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst --Protocol=ROOT

Using the following list of SEs: ['CNAF_MC-DST', 'IN2P3-ARCHIVE', 'LAL_MC-DST']
Failed :
    IN2P3-ARCHIVE : File not at SE
Successful :
    CNAF_MC-DST :
        /lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst : root://xrootd-lhcb.cr.cnaf.infn.it//storage/gpfs_lhcb/lhcb/disk/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst
    LAL_MC-DST :
        /lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst : root://grid05.lal.in2p3.fr:1094//dpm/lal.in2p3.fr/home/lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst

I will explain this in my lesson and update the text later (I don't have time now), but this can serve as a reminder.
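For the lesson text, the root:// URLs can be pulled out of the command output with a few lines of Python. This is a minimal sketch that assumes the output layout shown above; `extract_root_urls` is a hypothetical helper, and the sample text is copied from this issue.

```python
import re


def extract_root_urls(output):
    """Return every root:// URL found in the accessURL output, in order."""
    return re.findall(r"root://\S+", output)


sample_output = """
Successful :
    CNAF_MC-DST :
        /lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst : root://xrootd-lhcb.cr.cnaf.infn.it//storage/gpfs_lhcb/lhcb/disk/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst
    LAL_MC-DST :
        /lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst : root://grid05.lal.in2p3.fr:1094//dpm/lal.in2p3.fr/home/lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst
"""

urls = extract_root_urls(sample_output)
for url in urls:
    print(url)
```

Any of the returned URLs can then be used in place of a local file path, for example with ROOT's TFile.Open, as long as a valid grid proxy is available.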

Add EventPreFilters to DaVinci file running on grid

I think it would be good to teach students as quickly as possible that they can use EventPreFilters to make their jobs faster (and use fewer resources). It could be mentioned in a callout box at the end of the minimal DaVinci script lesson.

Explain what an NTuple is

The lesson "Running a minimal DaVinci job locally" says that we're going to create an NTuple, but doesn't explain what it is:

it's useful to store the information on the selected particles inside an ntuple

Would be nice to have a box defining it.

(Migrated from first-analysis-steps #195)

Explain how to find real data in the bookkeeping

In the bookkeeping lesson, we talk about MC only.
It is worth adding a few words on how to find real data.

Improve the LoKi functors lesson

From Vanya:

In LoKi functors lesson, one probably can avoid a long paragraph with discussions on VFASPF, there are functors CHI2VX and CHI2VXNDOF that get particle as an argument and evaluate corresponding chi2_VX ( or chi2_VX/ndf).
I think it could simplify a bit the material.

Though, we would like to keep the VFASPF discussion since it's widely used in the stripping lines. Just a brief update on the CHI2VX and CHI2VXNDOF should be useful.

(Migrated from first-analysis-steps #231)

Simplify LoKi functors lessons

After teaching it, I found that some of the changes in #26 added material that is too complicated and doesn't add much to the discussion (the IPCHI2). I think we should roll these back when we update the lesson as mentioned in #58.

Update the DTF lesson for D0 -> K- K+

In the DTF lesson, there are some remnants from the time when we used the D0->K+pi- decay as an example.

For example,

As an example let's say we want to examine the Cabibbo-suppressed decay of the D^0 into pi- pi+ instead of K- pi+.

Some clean-up is needed to update these instructions.