Analysis essentials
This is the source material for the analysis essentials website, a series of lessons for helping high-energy physics analysts become more comfortable working with the shell, version control, and programming.
The lessons introduce the basics of the bash shell, the git version control system, and the Python programming language. They are developed for and taught during the Starterkit, and aim to teach students enough to be able to follow the experiment-specific lessons that are taught afterwards.
Contributions to the lessons are highly encouraged. Please see the contributing guide for details on how to participate.
Prerequisites
There are two options for running these lessons. Running locally is preferred on Linux and macOS, as it is faster and makes it easier to save your work. On Windows it is likely easier to use Binder; however, care is needed to prevent notebooks from being lost when the server is shut down.
Local
This tutorial uses Python 3.7 and requires some packages.
It is recommended to use Conda to install the correct packages.
To install Conda you will need to do the following:
Install Conda according to the instructions here.
You can add
source /my/path/for/miniconda/etc/profile.d/conda.sh
to your .bashrc
Add the channel:
conda config --add channels conda-forge
Now to use your first Conda environment:
Create an environment with some packages already installed:
conda create -n my-analysis-env python=3.7 jupyterlab ipython matplotlib uproot numpy pandas scikit-learn scipy tensorflow xgboost hep_ml wget
Activate your environment by doing:
conda activate my-analysis-env
You can install additional packages by doing:
conda install package_name
For the lessons to work fully you will also need to install a special helper package with pip:
pip install git+https://github.com/hsf-training/python-lesson.git
You will also need Jupyter to run the examples in this tutorial. Jupyter was already installed by the conda create command above and can be run by following the instructions here. Note: You will need Python.
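Once the environment is activated, a quick check like the one below can confirm that the packages from the conda create command are visible to Python. This is only a minimal sketch: the function name find_missing and the list EXPECTED_MODULES are illustrative, and the import names are assumptions based on the usual conventions (for example, scikit-learn is imported as sklearn). The wget entry is left out because it is assumed to provide a command-line tool rather than a Python module.

```python
import importlib.util
import sys

# Import names assumed for the packages listed in the `conda create` command.
# Note that scikit-learn is imported as `sklearn` and ipython as `IPython`.
EXPECTED_MODULES = [
    "jupyterlab", "IPython", "matplotlib", "uproot", "numpy", "pandas",
    "sklearn", "scipy", "tensorflow", "xgboost", "hep_ml",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            # find_spec returns None when a top-level module is not installed.
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except (ImportError, ValueError):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = find_missing(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

If any module is reported missing, it can usually be added with conda install as described above. The check only locates the modules and does not import them, so it stays fast even for large packages such as tensorflow.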
Binder
Usage
You should now be able to use the tutorial.
First clone with git:
git clone https://github.com/hsf-training/analysis-essentials.git
For more information on getting started with git, please see the Analysis Essentials course.
Then move into the cloned repository and start JupyterLab:
cd analysis-essentials
jupyter lab
This should open a Jupyter webpage with the current directory displayed. Navigate to one of the lessons to start the tutorial.
If you have any problems or questions, you can open an issue on this repository.
Contents:
- An introduction to Python
- Advanced Python Tutorial
- 1: Basics
- Advanced Python Concepts
- Advanced Classes
- Danger zone
- 2: First look at data
- 3: Multivariate Analysis
- 4: Extension on Classification
- 5: Boosting to Uniformity
- 6: Histograms
- 7: Demonstration of distribution reweighting
- 8: Likelihood inference
- 9: sPlot
- Simple sPlot example
- Observed distributions
- Applying sWeights
- More complex case
- sPlot
- Alternative: Known probabilities
- Fitting doesn’t give us information about real labels
- Applying sPlot
- Using sWeights to reconstruct initial distribution
- An important requirement of sPlot
- Derivation of sWeights (optional)
- Conclusion
- 10: Scikit-HEP
- Introducing the Shell
- UNIX shell
- Analysis automation with Snakemake
- Git
- Automated Version Control
- Setting Up Git
- Creating a Repository
- Tracking Changes
- Exploring History
- Ignoring Things
- Remotes in CERN GitLab
- Sharing a repository with others
- Collaborating with Pull Requests
- What is a Pull (or Merge) Request
- Fork the original project repository
- Clone a remote project and its fork
- Sync your local repository with remote changes
- Implement your new feature
- Push changes
- Create a Pull (or Merge) Request
- Discussing, amending, retiring a Merge Request
- Accepting a Pull Request
- The social side of coding
- Automatic testing
- Conflicts
- GitLab CI
- Open Science
- Licensing
- Citation
- Contributing