1: Basics
In this Advanced Python Tutorial we will cover some useful Python skills and tips. The lessons are as follows:
Basics
Loading data and plotting with matplotlib
Cut-based selection
Multivariate Analysis with Scikit Learn
uBoost with hep_ml
Neural Network Demo
Multivariate kinematic reweighting
The sPlot technique
The first lesson will be on some Python basics:
Mutable and immutable objects in Python
Lists, dictionaries and comprehensions
Writing code in markdown
Jupyter notebook basics
Importing modules
Basics
Mutability
Let’s start by comparing some mutable and immutable objects.
In Python, lists and dictionaries are mutable, while strings and tuples are immutable.
What happens when you run the code below?
[1]:
a = ['a', 'b', 'c']
b = a
b[1] = 'hello'
print(a)
print(b)
['a', 'hello', 'c']
['a', 'hello', 'c']
[2]:
a = {'a': '0', 'b': '1', 'c': '2'}
b = a
b['b'] = 'hello'
print(a)
print(b)
{'a': '0', 'b': 'hello', 'c': '2'}
{'a': '0', 'b': 'hello', 'c': '2'}
[3]:
a = 'foo'
b = 'bar'
for c in [a, b]:
    c += '!'
print(a)
print(b)
foo
bar
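The behaviour above comes down to names and objects: b = a binds a second name to the same object rather than copying it. A minimal sketch, using only the standard library and illustrative variable names, of how to check this with is and how to get an independent copy:

```python
import copy

a = ['a', 'b', 'c']
alias = a             # a second name for the same list object
shallow = a.copy()    # a new list holding the same elements
alias[1] = 'hello'

print(alias is a)     # True: both names refer to one object
print(shallow is a)   # False: an independent list
print(a)              # ['a', 'hello', 'c']
print(shallow)        # ['a', 'b', 'c']

# Strings are immutable: += builds a new string and rebinds the name,
# so the object that s refers to is left untouched.
s = 'foo'
t = s
t += '!'
print(s, t)           # foo foo!

# Nested containers need a deep copy to be fully independent.
nested = {'cuts': [1, 2]}
deep = copy.deepcopy(nested)
deep['cuts'].append(3)
print(nested)         # {'cuts': [1, 2]}
```

This is also why the loop over [a, b] leaves both strings unchanged: c += '!' only rebinds the loop variable c to a new string.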
List comprehensions
[4]:
N = 10
list_of_squares = [i**2 for i in range(N)]
sum_of_squares = sum(list_of_squares)
print('Sum of squares for', N, 'is', sum_of_squares)
Sum of squares for 10 is 285
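A comprehension is shorthand for a loop that appends to a list. As an illustrative sketch (the if filter and the generator expression are additions not used in the cell above), the following compares the forms:

```python
N = 10

# Explicit loop that builds the list step by step
squares_loop = []
for i in range(N):
    squares_loop.append(i**2)

# Equivalent list comprehension
squares_comp = [i**2 for i in range(N)]
print(squares_loop == squares_comp)  # True

# An optional condition keeps only the even values of i
even_squares = [i**2 for i in range(N) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# A generator expression avoids building the intermediate list
total = sum(i**2 for i in range(N))
print(total)  # 285
```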
Dictionary comprehensions
[5]:
squares = {i: i**2 for i in range(10)}
print(squares)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
[6]:
N = 5
print('The square of', N, 'is', squares[N])
The square of 5 is 25
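Dictionary comprehensions can also pair up two sequences or filter existing entries. A small sketch, where the variable names and cut values are made up purely for illustration:

```python
# Hypothetical variable names and thresholds, for illustration only
names = ['PT', 'ETA', 'IPCHI2']
minimums = [500.0, 2.0, 9.0]

# Pair up two sequences into a dictionary
cuts = {name: value for name, value in zip(names, minimums)}
print(cuts)  # {'PT': 500.0, 'ETA': 2.0, 'IPCHI2': 9.0}

# Filter entries while building a new dictionary
loose = {name: value for name, value in cuts.items() if value < 10}
print(loose)  # {'ETA': 2.0, 'IPCHI2': 9.0}

# Swap keys and values
by_value = {value: name for name, value in cuts.items()}
print(by_value[2.0])  # ETA
```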
Markdown
Write comments inline about your code:
Use LaTeX:
\(A = \frac{1}{B+C}\)
Show lists:
A wonderful
List
This is
Show code with syntax highlighting:
Python: (but in a sad grey world)
print('Hello world')
Python:
print('Hello world')
C++:
#include <iostream>
std::cout << "Hello world" << std::endl;
Bash:
echo "Hello world"
f-strings
[7]:
pt_cut = 1789.234567890987654
eta_low = 2
eta_high = 5
cut_string = f'(PT > {pt_cut:.2f}) & ({eta_low} < ETA < {eta_high})'
print(cut_string)
(PT > 1789.23) & (2 < ETA < 5)
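The part after the colon inside the braces is a format specification. A few other common specifications, sketched with made-up values for n_events and efficiency:

```python
pt_cut = 1789.234567890987654
n_events = 1234567       # hypothetical value
efficiency = 0.87654     # hypothetical value

fixed = f'{pt_cut:.2f}'          # two decimal places
padded = f'{pt_cut:10.1f}|'      # right-aligned in a field of width 10
scientific = f'{pt_cut:.3e}'     # scientific notation
grouped = f'{n_events:,}'        # thousands separator
percent = f'{efficiency:.1%}'    # multiply by 100 and append %
debug = f'{pt_cut=:.2f}'         # name and value (Python 3.8+)

print(fixed)       # 1789.23
print(padded)      #     1789.2|
print(scientific)  # 1.789e+03
print(grouped)     # 1,234,567
print(percent)     # 87.7%
print(debug)       # pt_cut=1789.23
```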
Jupyter
Jupyter has some very useful features included that can help make trying things out faster…
Cells have a return value, which is shown after they finish running if it is not None:
[8]:
"Hello starterkitters"
[8]:
'Hello starterkitters'
[9]:
None
Run a shell command:
[10]:
!ls
10Basics.ipynb 33ModelTuning.ipynbBKP
11AdvancedPython.ipynb 40Histograms.ipynb
12AdvancedClasses.ipynb 45DemoReweighting.ipynb
20DataAndPlotting.ipynb 50LikelihoodInference.ipynb
30Classification.ipynb 60sPlot.ipynb
31ClassificationExtension.ipynb 70ScikitHEPUniverse.ipynb
32BoostingToUniformity.ipynb README.md
[11]:
!wget https://example.com/index.html
--2025-02-10 17:00:53-- https://example.com/index.html
Resolving example.com (example.com)... 23.192.228.80, 23.192.228.84, 23.215.0.136, ...
Connecting to example.com (example.com)|23.192.228.80|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: ‘index.html’
index.html 100%[===================>] 1.23K --.-KB/s in 0s
2025-02-10 17:00:54 (892 MB/s) - ‘index.html’ saved [1256/1256]
Time how long something takes for one line:
[12]:
%time sum([i**2 for i in range(10000)])
CPU times: user 629 μs, sys: 0 ns, total: 629 μs
Wall time: 634 μs
[12]:
333283335000
Time how long an entire cell takes:
[13]:
%%time
a = sum([i**2 for i in range(10000)])
b = sum([i**2 for i in range(10000)])
c = sum([i**2 for i in range(10000)])
CPU times: user 1.37 ms, sys: 0 ns, total: 1.37 ms
Wall time: 1.37 ms
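The %time and %%time magics only work inside IPython or Jupyter. In a plain Python script, similar measurements can be made with the standard library. A minimal sketch, where sum_of_squares is a helper defined here for illustration:

```python
import time
import timeit


def sum_of_squares(n):
    """Return the sum of i**2 for i in range(n)."""
    return sum([i**2 for i in range(n)])


# Time a single call with a high-resolution clock
start = time.perf_counter()
result = sum_of_squares(10000)
elapsed = time.perf_counter() - start
print(f'Result {result} took {elapsed * 1e6:.0f} us')

# timeit repeats the call many times to reduce noise
total = timeit.timeit(lambda: sum_of_squares(10000), number=100)
print(f'Average over 100 calls: {total / 100 * 1e6:.0f} us')
```

The timings themselves will vary from machine to machine. Inside a notebook, the %timeit magic offers the same repeated measurement.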
If something takes longer than you expect, you can profile it to find out where it spends its time:
%%prun -s cumtime
# Maybe skip this
import numpy as np
a = sum([np.sqrt(i) for i in range(100000)])
Jupyter also makes it easy to look at documentation: just add a question mark to the end of the line.
[14]:
def my_print(my_string):
    print(my_string)
[15]:
my_print?
Two question marks allow you to see the source code of the function:
[16]:
my_print??
[17]:
range?
Note that this is done without running the actual line of code, so sometimes you need to use a junk variable to make it work:
[18]:
get?
Object `get` not found.
[19]:
{'a': 'b'}.get
[19]:
<function dict.get(key, default=None, /)>
[20]:
junk = {'a': 'b'}.get
junk?
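The question-mark syntax is specific to IPython and Jupyter. In a plain Python session, roughly the same information is available from the standard inspect module. A minimal sketch, with a docstring added to my_print for illustration:

```python
import inspect


def my_print(my_string):
    """Print my_string to standard output."""
    print(my_string)


# Signature and docstring, similar to what my_print? displays
signature = inspect.signature(my_print)
doc = inspect.getdoc(my_print)
print(signature)  # (my_string)
print(doc)        # Print my_string to standard output.

# Works on attributes without needing a junk variable
print(inspect.getdoc(dict.get))

# The built-in help(dict.get) prints a longer, formatted version.
```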
Importing modules
It is good practice to import all modules at the beginning of your Python script or notebook.
Avoid using wildcard imports, as they make it unclear where things come from, for example:
from math import *
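Below we end up with two max functions: once max is imported from numpy it shadows the built-in, and calling max(10, 15) raises an error because 15 is interpreted as the axis argument.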
[21]:
max(10, 15)
[21]:
15
[22]:
from numpy import max
max(10, 15)
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
Cell In[22], line 3
1 from numpy import max
----> 3 max(10, 15)
File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/_core/fromnumeric.py:2899, in max(a, axis, out, keepdims, initial, where)
2781 @array_function_dispatch(_max_dispatcher)
2782 @set_module('numpy')
2783 def max(a, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue,
2784 where=np._NoValue):
2785 """
2786 Return the maximum of an array or maximum along an axis.
2787
(...)
2897 5
2898 """
-> 2899 return _wrapreduction(a, np.maximum, 'max', axis, None, out,
2900 keepdims=keepdims, initial=initial, where=where)
File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/_core/fromnumeric.py:86, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
83 else:
84 return reduction(axis=axis, out=out, **passkwargs)
---> 86 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
AxisError: axis 15 is out of bounds for array of dimension 0
To avoid this, import numpy as np and then use np.max.
[23]:
import numpy as np
np.max([0, 1, 2])
[23]:
np.int64(2)
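To get a feel for how easily a wildcard import can replace a built-in, the sketch below lists the public names in a module that are also built-in names. The helper shadowed_builtins is hypothetical and written only for this illustration, and the standard math module is used so that nothing extra needs to be installed:

```python
import builtins
import math


def shadowed_builtins(module):
    """Return the public names in module that are also built-in names."""
    # A wildcard import uses __all__ if it exists, otherwise every
    # name that does not start with an underscore.
    public = getattr(module, '__all__', None)
    if public is None:
        public = [name for name in dir(module) if not name.startswith('_')]
    return sorted(name for name in public if hasattr(builtins, name))


clashes = shadowed_builtins(math)
print(clashes)  # includes 'pow'

# The two functions are not interchangeable
print(builtins.pow(2, 3))  # 8
print(math.pow(2, 3))      # 8.0
```

After from math import *, the name pow would refer to math.pow, which always returns a float.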
Some common abbreviations for packages are:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import ROOT as R
Typically the nicest code uses a mixture of import X and from X import Y, like we have above.
If you’re interested in following best practices for style, there is an official style guide called PEP8. The document itself is quite long, but you can also get automated style checkers called ‘linters’. Look into flake8, either as a command line application or as a plugin for your favourite text editor. Take care though, it’s occasionally better to break style rules to make code easier to read!
Restart the kernel to restore the built-in max. Until then, max still refers to the numpy function:
[24]:
max(10, 15)
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
Cell In[24], line 1
----> 1 max(10, 15)
File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/_core/fromnumeric.py:2899, in max(a, axis, out, keepdims, initial, where)
2781 @array_function_dispatch(_max_dispatcher)
2782 @set_module('numpy')
2783 def max(a, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue,
2784 where=np._NoValue):
2785 """
2786 Return the maximum of an array or maximum along an axis.
2787
(...)
2897 5
2898 """
-> 2899 return _wrapreduction(a, np.maximum, 'max', axis, None, out,
2900 keepdims=keepdims, initial=initial, where=where)
File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/_core/fromnumeric.py:86, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
83 else:
84 return reduction(axis=axis, out=out, **passkwargs)
---> 86 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
AxisError: axis 15 is out of bounds for array of dimension 0
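If restarting is inconvenient, the built-in function is still reachable through the standard builtins module. The sketch below uses a hypothetical stand-in function rather than numpy to mimic the shadowing, so it runs without any extra packages:

```python
import builtins


def max(a, axis=None):
    """Hypothetical stand-in for a shadowing function with an axis argument."""
    raise ValueError(f'axis {axis} is out of bounds')


# The built-in is always available via the builtins module
print(builtins.max(10, 15))  # 15

# Rebinding the name restores the usual behaviour
max = builtins.max
print(max(10, 15))  # 15
```

In a notebook, del max has the same effect, since it removes the shadowing name from the global namespace and lookups fall back to the built-in.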