Analysis essentials
Contents:
An introduction to Python
Running Python
Scripting
argparse
1: Basics
Jupyter
Basic types and operations
strong typing
Container types
Mutability
dynamic typing
Assignment and variables
Python variable assignment
Sugar: comprehensions
Sugar: using Markdown
Objects and operators
Objects
Numbers
Strings
Formatting
Lists and looping
Looping
List comprehension
Tuples
Dictionaries
Dictionary keys
Conditions
Truthiness
Conditions in loops
Functions
Inline methods
Scripting
argparse
Modules
Using modules in your code: import
The standard library
Modules from PyPi
Modules inside a virtual environment
Write your first Python module
Write a structured module
Run a module
Classes
Welcome to classes
Inheritance: a glance
How to fix
Learning more
Exploring Python
Conventional coding
Making your first histogram
Pandas
Plotting histograms
Applying cuts
More advanced topics in Python
Nice standard libraries
Nice libraries for data analysis
Python and ROOT
Advanced Python Tutorial
1: Basics
Basics
Markdown
Jupyter
Importing modules
Advanced Python Concepts
Packing and unpacking of values
Context manager
Using yield
Using a class
Decorators and factories
Decorator
Exceptions
Custom Exception
Catching exceptions
pitfall “guaranteed execution”
Exceptions as control-flow
Advanced Classes
Dunder
len
str
Callable
Indexing (iterating)
self
Danger zone
2: First look at data
Two plotting libraries?
Recap: Importing modules
5. The toy dataset
Loading data
6. Plotting a simple histogram
Adding variables
Using rectangular cuts
Comparing distributions
3: Multivariate Analysis
Using a classifier
TODO Add a diagram of a decision tree for the above plot
4: Extension on Classification
Alternative implementations
Feature engineering
\(k\)-folding
Turn this into a script using argparse
5: Boosting to Uniformity
Loading data
Distributions in the Dalitz features for signal and background
Preparation of train/test datasets
Setting up classifiers, training
Let’s look at the results of training
ROC curves after training
Model tuning setup
Cross-validation
\(k\)-folding & early stopping
Hyperparameter optimisation
6: Histograms
Axes
Regular
Variable
Axis Name
Compatibility with mplhep
Plotting with hist
Multiple dimensions
Access Bins
Getting Density
Projecting axes
Accessing everything relevant
Multi dimensional
Arithmetics
Weights
7: Demonstration of distribution reweighting
Downloading data
prepare train and test samples
Original distributions
train part of original distribution
test part for target distribution
Bins-based reweighting in n dimensions
Gradient Boosted Reweighter
Comparing some simple expressions:
GB-discrimination
Great!
What did just happen?
How to tune
Folding reweighter
GB discrimination for reweighting rule
8: Likelihood inference
Scope of this tutorial
Getting started
Difference of the two spaces
Plotting
Loss
Fixing parameters
9: sPlot
Simple sPlot example
Observed distributions
Applying sWeights
Compare
More complex case
sPlot
Alternative: Known probabilities
Building sPlot over mass
Of course we don’t have labels which events are signal and which are background beforehand
We have no information about real labels
Fitting doesn’t give us information about real labels
Applying sPlot
Using sWeights to reconstruct initial distribution
An important requirement of sPlot
Derivation of sWeights (optional)
Under assumption of linearity:
Minimization of variation
Uncorrelatedness
Conclusion
10: Scikit-HEP
formulate - converting expressions
Particle
hepunits
Vector
Vector properties
Introducing the Shell
Background
The Command-Line Interface
The Shell
Why bother?
Nelle’s Pipeline: Starting Point
Navigating Files and Directories
Nelle’s Pipeline: Organizing Files
Working With Files and Directories
Pipes and Filters
Nelle’s Pipeline: Checking Files
Loops
Shell Scripts
Finding Things
UNIX shell
Using screen to keep things running
Advanced screen topics
Finding lost screens
Using tabs in screen
Persistent screen or tmux session on lxplus
lxplus7
Setting up password-less kerberos token on lxplus7
Making use of the keytab on lxplus7
Using k5reauth to automatically refresh your kerberos token on lxplus7
lxplus8
lxplus9
More about the UNIX shell
Types of shell
Manual pages
Environment variables
Variables
Differences among files
Looping over files
Conditionals
Linking commands
Pipes and redirection
Bash security
Complexity
Text viewers
Text editors
Disk space
Over the Wire and Bandit wargame
Analysis automation with Snakemake
Documentation and environments
Workflow preservation
Basic Tutorial
What is a workflow?
Why use a workflow management system?
Introducing Snakemake
Re-running rules
Chaining rules
The limits of wildcards
Advanced Tutorial
Running scripts
Log files
Config files
Includes
Git
Automated Version Control
Setting Up Git
Creating a Repository
Tracking Changes
Exploring History
Ignoring Things
Remotes in CERN GitLab
Sharing a repository with others
Collaborating with Pull Requests
What is a Pull (or Merge) Request
Fork the original project repository
Clone a remote project and its fork
Sync your local repository with remote changes
Implement your new feature
Push changes
Create a Pull (or Merge) Request
Discussing, amending, retiring a Merge Request
Accepting a Pull Request
The social side of coding
Automatic testing
Conflicts
GitLab CI
Open Science
Licensing
Citation
Contributing
Contributor Agreement
How to Contribute
What to Contribute
Using GitHub
Other Resources
Contributor Code of Conduct
Instructional Material
Software
Index
Hello world