lhcb / starterkit-lessons
Lessons taught at the Starterkit workshops.
Home Page: https://lhcb.github.io/starterkit-lessons/
License: Other
As part of the LoKi functors lesson, the CHILD functor is taught with indices to select the child. As this depends on the ordering of the decay descriptor, it is discouraged. Perhaps the indices could be replaced with decay selectors?
This brings its own complications, as (some minimal version of) the grammar would have to be explained. At the same time, this is already part of the DTT lesson. Perhaps the lessons can be reordered to have the DTT lesson first up to the HybridTupleTool part, then the LoKi functors lesson, and then return to the DTT with the newly gained insights.
This might have the advantage that some "magic" particular to GaudiPython, e.g. explicit creation of the LoKi::DistanceCalculator, can be avoided.
This issue is for keeping track of project ideas for the next ImpactKit. There don't need to be any additional details, just enough words to make sure we don't forget them.
(Though information/links/contact people/etc. are also welcome!)
My ideas:
The interactive-dst lesson is run on simulated data, but DaVinci is not told about this so you see errors like:
RunStampCheck ERROR Database not up-to-date. No valid data for run 6251347 at 1970-01-01 00:00:00.0 UTC
The ganga lesson currently uses SetupProject ganga and should be updated now that this is no longer functional.
Using and finding the right database tags is often not very intuitive, and the recommended way depends on the use case. E.g. in MC you should always use the one used in the generation, while in data the latest one is often, but not always, preferred.
There are some twiki pages containing information but they look a bit outdated.
E.g. this one:
https://twiki.cern.ch/twiki/bin/view/LHCb/RecommendedTags
In the prerequisites you urge us to bring an ethernet cable with us.
But after following the starterkit it was clearly not necessary to have it.
Maybe remove this?
In the very end of the lesson, the callout box 'such a clever script' points to the file
MC_2012_27163003_Beam4000GeV2012MagDownNu2.5Pythia8_Sim08e_Digi13_Trig0x409f0045_Reco14a_Stripping20NoPrescalingFlagged_ALLSTREAMS.DST.py
It is a remnant from the times when we used D0->K+pi- decay in our examples.
This path needs to be changed to
MC_2016_27163002_Beam6500GeV2016MagDownNu1.625nsPythia8_Sim09b_Trig0x6138160F_Reco16_Turbo03_Stripping28NoPrescalingFlagged_ALLSTREAMS.DST.py
already mentioned above on the same page.
In the new MC we use, the first candidate is a D0+ instead of D0-
Update as well the example with CHILD
In the "Running DaVinci on the grid" lesson it currently submits a single job to process an entire bookkeeping path which results in the error "Exceeded Maximum Dataset Limit (100)".
We should instead limit the number of files processed until students have been taught about splitting.
Also:
Should we tell people to run the lhcb-proxy-init command before launching ganga?
Or is it enough that ganga requests the password itself at some point?
While we teach the data flow in LHCb, we do not explain how we measure the momentum, perform a vertex fit or define the primary vertex. I had to spend a significant amount of time explaining this during the LoKi functors lesson (e.g. to explain the difference between M and MM), following questions from students.
It would be convenient to have a little explanation on this before we move to exploring the actual physics variables.
In the "Changes to Data Flow in Run II" lesson, we show the Turbo as a way to bypass the stripping step. While updating the "An Introduction to LHCb Software" lesson, I noticed that MC/2016/27163002/Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8/Sim09b/Trig0x6138160F/Reco16/Turbo03/Stripping28NoPrescalingFlagged/ALLSTREAMS.DST in the bookkeeping lists Stripping28 under Turbo.
I want to make sure I teach this right. Does this mean the Turbo stream was "resurrected" with Tesla and then included in the Stripping28 campaign? If so, should we update the data flow diagram in the lesson?
In the intro to LHCb software lesson, the link to the Gaudi doxygen does not work.
In https://lhcb.github.io/starterkit-lessons/second-analysis-steps/building-decays.html, I don't see any math, and actually the callout box is cut.
Is it only me?
If we move multiple current repositories into a single gitbook it would be nice to keep all the existing history. It's possible for a git repo to have multiple initial commits[1] and we could import the other repositories by doing something like (untested):
git remote add new_repo XXXXX
git fetch new_repo
git checkout new_repo/master
git checkout -b add-new-repo
git mv * new_location_for_files
git commit -m "Import new_repo"
# Make a PR as normal
In the Changes to the data flow in Run 2 lesson, Turbo is mentioned, but it should be updated to summarise the new types of Turbo that are now available.
With lots of contributors having strong opinions on their text configuration, we should include a style guide and/or editor config file that defines how we format the lessons. This is partly to circumvent the need for style debates, and also to reduce the number of re-formatting changes made in PRs, which clutter the diff.
The "downloading data" lesson gives two ways to download LFNs that are in a Python file downloaded from the bookkeeping: using a custom Python script that needs the LFNs pasted into it, or using dirac-dms-get-file.
This isn't in the spirit of things: we should generally show The One True Way of doing things, or at the very least strongly recommend one technique over another. IMO we should just have the dirac-dms command.
We would like to teach people from the beginning good practices that can make an analysis reproducible.
What would be the minimum set of skills / tricks people would need to know?
Here are some ideas:
Anaconda installation as described seems to take O(3 GB). This is an obscene amount of space in general, but especially on a home directory on lxplus (max of 10 GB). Is this installation really necessary? Unless I installed anaconda a long time ago and forgot about it, it doesn't seem necessary for coding python on lxplus, which can already run ipython...
In #144 we realised that we don't mention the DTF_FUN functor. This is a powerful way to use DecayTreeFitter, and is more explicit than TupleToolDecayTreeFitter. One can be more fine-grained with what's saved as well, which is nice.
We should mention it somewhere.
(Migrated from first-analysis-steps #95)
As a follow-up to a lesson of asking good questions, some instructions should be given on how to answer good questions (and helping other people in general).
As an easy start, it may be just an extra callout box in the end of the lesson.
Hi all,
It would be great if the starterkit material started making use of the recently prepared glossary, see https://github.com/lhcb/glossary. This way, relevant definitions can all be collected in one place and referred to in the various training and documentation material sets.
Many thanks.
As written, with the copying and pasting of functions and restarting the ipython session, it's very clunky and hard-to-follow in class. It might be good to rework the lesson actually using Bender, as this is how people access the TES in real life.
Some people during the Starterkit didn't have an AFS work area, and most hadn't upgraded their home and work areas to the maximum available space (10GB and 100GB respectively).
We should add instructions for doing this somewhere, for example here.
(Migrated from first-analysis-steps #156)
The Introduction to LHCb software has several DaVinci versions in use in different places. We should stick to (the most recent) one - v42r6p1.
The first section of the lesson says:
To retrieve a job outputfile, one can use three types of files:
But only two are listed right after.
The third should be a MassStorageFile, which is considered to be almost deprecated, so some explanation should be added of why people should avoid it.
Other options, e.g. SharedFile, exist and may be worth a brief mention here.
(Migrated from first-analysis-steps #221)
When following the instructions for "Setting up a new Decay" in "The simulation framework" lesson of the second analysis steps, the ./run command gives the following error:
./run: line 21: ./build_env.sh: No such file or directory
When following these instructions for testing decay files, the environment is set up properly and the execution of the example provided there works:
https://gitlab.cern.ch/lhcb-datapkg/Gen/DecFiles/blob/master/CONTRIBUTING.md
The problem seems to come from the difference between cmt (SetupProject) and lb-dev. Since I'm no expert, I'm not sure how to fix the tutorial.
I tried running the tutorial by setting up everything following the SetupProject steps (v49r9), but when creating the xgen file, one more option is needed: $GAUSSOPTS/Gauss-2016.py, as in the example on https://gitlab.cern.ch/lhcb-datapkg/Gen/DecFiles/blob/master/CONTRIBUTING.md. Otherwise the following error appears:
ToolSvc.EvtGenDecay ERROR EvtGen Error from EvtGen
Unknown particle name:N(1440)+ on line 10787
This is the list of commands that worked for me:
SetupProject -c x86_64-slc6-gcc49-opt Gauss v49r9 --build-env
cd ~/cmtuser/Gauss_v49r9
SetupProject -c x86_64-slc6-gcc49-opt Gauss v49r9
git lb-clone-pkg Gen/DecFiles
cd ~/cmtuser/Gauss_v49r9/Gen/DecFiles/
make
SetupProject Gauss v49r9
cd ~/cmtuser/Gauss_v49r9
gaudirun.py Gauss-Job.py $GAUSSOPTS/Gauss-2016.py $GAUSSOPTS/GenStandAlone.py $DECFILESROOT/options/11164001.py $LBPYTHIA8ROOT/options/Pythia8.py
(here, Gauss-Job.py is the one from the tutorial saved in the directory ~/cmtuser/Gauss_v49r9)
SetupProject DaVinci v41r0
gaudirun.py DaVinciOptions.py (changing the .xgen file name to the one I created)
In the Running a minimal DaVinci job locally lesson, the proposed way of specifying the input DST is using IOHelper().inputFiles command. At the same time, a lot of people use DaVinci.Input instead.
Is there a deep reason to prefer the IOHelper way of doing things?
We should discuss the new policy in the Developing LHCb Software lesson.
We can use the neat gitbook-plugin-codesnippet plugin to extract code directly from the example files for display in the lessons. This has the benefit of keeping the examples on the web pages in sync with those in the files.
I noticed a bug in the setDescriptorTemplate function, which automatically adds branches for a given decay. Basically, the function assumes the primary decay will always be added as a branch and thus removes the ^ from the first particle in the template. If the user does not specify a branch for the primary decay, the first branch for secondary particles is not created.
Also, for the primary decay, a ^ is still introduced in the addBranches, which is at least not needed, but this does not seem to break anything as far as I could tell.
See lesson here: https://lhcb.github.io/starterkit-lessons/first-analysis-steps/add-tupletools.html
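To make the expected behaviour concrete, here is a rough pure-Python sketch of what I would expect the template expansion to produce. This is not the actual DecayTreeTuple code: the helper name expand_template and the regular expression are made up for illustration, and the only assumption about the real function is the ${name} marker syntax used in the lesson.

```python
import re

# Matches the ${name} markers used in descriptor templates.
MARKER = re.compile(r'\$\{(\w+)\}')


def expand_template(template):
    """Return (decay, branches) for a descriptor template (illustrative only)."""
    names = MARKER.findall(template)

    def mark(selected):
        # Put a caret where a selected marker was; drop all other markers.
        text = MARKER.sub(
            lambda m: '^' if m.group(1) in selected else '', template)
        # A caret in front of the whole descriptor would mark the head
        # particle, which is not needed, so it is dropped. Carets on
        # secondary particles are never touched.
        return text[1:] if text.startswith('^') else text

    decay = mark(set(names))
    branches = {name: mark({name}) for name in names}
    return decay, branches


# Template without a marker for the primary decay: the first secondary
# branch (D0) should still keep its caret.
decay, branches = expand_template(
    '[D*(2010)+ -> ${D0}(D0 -> ${Kminus}K- ${Kplus}K+) ${pisoft}pi+]CC')
print(decay)
print(branches['D0'])
```

The point of the sketch is that the caret is only stripped when it sits in front of the whole descriptor, so leaving out the primary-decay marker does not affect the secondary branches.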
This 💩 is so 🔥. For real, folks! It's really impressive how you continue to do such awesome work and make the Starterkit better every day!
That being said, LaTeX integration (e.g. here, bottom of the page (screenshot below)) does not seem to work.
In the Exploring the DST lesson, the function seekStripDecision (which replaces our advance) is mentioned.
It is worth mentioning that, to reproduce the same behavior as advance(line), the following syntax should be used: seekStripDecision('Stripping'+line+'Decision').
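A small helper could hide the string concatenation. This is only a sketch: strip_decision is a made-up name, and the line name used below is just an example value, not something the lesson prescribes.

```python
def strip_decision(line):
    """Build the full decision name from a bare stripping line name."""
    return 'Stripping' + line + 'Decision'


# Example line name, chosen only for illustration.
line = 'D2hhPromptDst2D2KKLine'
decision = strip_decision(line)
print(decision)
```

In the interactive session one would then presumably call seekStripDecision(decision) where advance(line) was used before.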
Some of the links to the example code are broken.
Example: https://lhcb.github.io/starterkit-lessons/first-analysis-steps/code/add-tupletools/code/add-tupletools/ntuple_options.py
The right one being: https://lhcb.github.io/starterkit-lessons/first-analysis-steps/code/add-tupletools/ntuple_options.py
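The broken link seems to repeat one path segment twice. Assuming the other broken links follow the same pattern (which I have not checked), a quick sketch like this could be used to repair them; collapse_repeated_path is a made-up helper name.

```python
def collapse_repeated_path(url, segment):
    """Collapse consecutive repeats of a path segment into a single one."""
    doubled = segment + '/' + segment
    # Loop in case the segment is repeated more than twice.
    while doubled in url:
        url = url.replace(doubled, segment)
    return url


broken = ('https://lhcb.github.io/starterkit-lessons/first-analysis-steps/'
          'code/add-tupletools/code/add-tupletools/ntuple_options.py')
fixed = collapse_repeated_path(broken, 'code/add-tupletools')
print(fixed)
```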
Reported by Giampi
Currently the lesson replaces D0 -> K- K+ with D0 -> pi- pi+ instead of D0 -> K- pi+ as is said in the text.
On Friday, one of the goals of the lesson is to teach students to create their own issues here and to solve those already existing. As an easy start, it would be nice to have some simple issues prepared so that they can be solved during the lesson.
If you have any ideas for that, it would be great to prepare them in advance and mark them with the 'Friday lesson' label, so that we don't touch these issues until Friday.
When we discuss downloading files from the Grid, we should mention this is not necessary at all thanks to the root protocol and we can show how to do that
apuignav@lxplus062:~/Starterkit$ lb-run LHCbDirac/latest $SHELL
apuignav@lxplus062:~/Starterkit$ dirac-dms-lfn-accessURL /lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst --Protocol=ROOT
Using the following list of SEs: ['CNAF_MC-DST', 'IN2P3-ARCHIVE', 'LAL_MC-DST']
Failed :
IN2P3-ARCHIVE : File not at SE
Successful :
CNAF_MC-DST :
/lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst : root://xrootd-lhcb.cr.cnaf.infn.it//storage/gpfs_lhcb/lhcb/disk/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst
LAL_MC-DST :
/lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst : root://grid05.lal.in2p3.fr:1094//dpm/lal.in2p3.fr/home/lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst
I will explain this in my lesson and update the text later (I don't have time now), but this can serve as a reminder.
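In case it helps when updating the text: a rough sketch of pulling the root:// URLs out of output like the above. It assumes the output keeps the layout shown here ("Failed :" / "Successful :" sections, one storage element name per line, followed by an "LFN : URL" line); parse_access_urls is a made-up helper, not part of LHCbDirac.

```python
import re

# A line holding only a storage element name, e.g. "CNAF_MC-DST :".
SE_LINE = re.compile(r'^(\S+)\s*:$')


def parse_access_urls(output):
    """Map storage element name -> access URL for the successful replicas."""
    urls = {}
    current_se = None
    in_successful = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith('Successful'):
            in_successful = True
            continue
        if line.startswith('Failed'):
            in_successful = False
            continue
        if not in_successful:
            continue
        match = SE_LINE.match(line)
        if match:
            current_se = match.group(1)
        elif current_se and ' : root://' in line:
            # "LFN : URL" line belonging to the current storage element.
            urls[current_se] = line.split(' : ', 1)[1].strip()
    return urls


lfn = '/lhcb/MC/2016/ALLSTREAMS.DST/00062514/0000/00062514_00000002_7.AllStreams.dst'
sample = '\n'.join([
    "Using the following list of SEs: ['CNAF_MC-DST', 'IN2P3-ARCHIVE', 'LAL_MC-DST']",
    'Failed :',
    'IN2P3-ARCHIVE : File not at SE',
    'Successful :',
    'CNAF_MC-DST :',
    lfn + ' : root://xrootd-lhcb.cr.cnaf.infn.it//storage/gpfs_lhcb/lhcb/disk' + lfn[5:],
    'LAL_MC-DST :',
    lfn + ' : root://grid05.lal.in2p3.fr:1094//dpm/lal.in2p3.fr/home' + lfn,
])
urls = parse_access_urls(sample)
for se, url in sorted(urls.items()):
    print(se, url)
```

The resulting URLs could then presumably be given as input wherever the lesson currently uses the path of a downloaded file.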
I think it would be good to teach students as quickly as possible that they can use EventPreFilters to make their jobs faster (and use fewer resources). It could be mentioned in a callout box at the end of the minimal DaVinci script lesson.
The lesson "Running a minimal DaVinci job locally" says that we're going to create an NTuple, but doesn't explain what it is:
it's useful to store the information on the selected particles inside an ntuple
Would be nice to have a box defining it.
(Migrated from first-analysis-steps #195)
In the bookkeeping lesson, we talk about MC only.
It is worth adding a few words on how to find real data.
From Vanya:
In the LoKi functors lesson, one can probably avoid the long paragraph discussing VFASPF: there are functors CHI2VX and CHI2VXNDOF that take a particle as an argument and evaluate the corresponding chi2_VX (or chi2_VX/ndf).
I think it could simplify a bit the material.
Though, we would like to keep the VFASPF discussion since it's widely used in the stripping lines. Just a brief update on the CHI2VX and CHI2VXNDOF should be useful.
(Migrated from first-analysis-steps #231)
In the DTF lesson, there are some remnants from the time when we used the D0->K+pi- decay as an example.
For example,
As an example let's say we want to examine the Cabibbo-suppressed decay of the D^0 into pi- pi+ instead of K- pi+.
A certain clean-up is needed to update these instructions.