1: Basics
In this Advanced Python Tutorial we will cover some useful Python skills and tips. The lessons are as follows:
Basics
Loading data and plotting with matplotlib
Cut-based selection
Multivariate Analysis with scikit-learn
uBoost with hep_ml
Neural Network Demo
Multivariate kinematic reweighting
The sPlot technique
The first lesson will cover some Python basics:
Mutable and immutable objects in Python
Lists, dictionaries and comprehensions
Writing code in Markdown
Jupyter notebook basics
Importing modules
Basics
Mutability
Let’s start by comparing some mutable and immutable objects.
In Python, lists and dictionaries are mutable, while strings and tuples are immutable.
What happens when you run the code below?
[1]:
a = ['a', 'b', 'c']
b = a
b[1] = 'hello'
print(a)
print(b)
['a', 'hello', 'c']
['a', 'hello', 'c']
[2]:
a = {'a': '0', 'b': '1', 'c': '2'}
b = a
b['b'] = 'hello'
print(a)
print(b)
{'a': '0', 'b': 'hello', 'c': '2'}
{'a': '0', 'b': 'hello', 'c': '2'}
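In both cells above, b = a does not copy anything: b is just a second name for the same object, so changing b changes a. If you want an independent copy, you have to make one explicitly. The sketch below uses the standard list.copy() method (dict.copy() works the same way for dictionaries) and copy.deepcopy from the standard library; the variable names are only for illustration.

```python
import copy

a = ['a', 'b', 'c']
b = a            # b is a second name for the same list object
c = a.copy()     # shallow copy: a new list holding the same elements
c[1] = 'hello'

print(a is b)  # True: same object
print(a is c)  # False: different objects
print(a)       # ['a', 'b', 'c'], unchanged by the edit to c

# A shallow copy still shares any nested mutable objects;
# copy.deepcopy copies those as well.
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)
nested[0].append(99)
print(shallow[0])  # [1, 2, 99]: the inner list is shared
print(deep[0])     # [1, 2]: the inner list was copied
```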
[3]:
a = 'foo'
b = 'bar'
for c in [a, b]:
    c += '!'
print(a)
print(b)
foo
bar
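The strings are unchanged because += on an immutable str builds a new string and rebinds the loop variable c, leaving a and b alone. As a minimal sketch of the contrast, here is the same loop with lists, where += extends the existing object in place, followed by an attempt to modify a tuple, which Python refuses.

```python
a = ['foo']
b = ['bar']
for c in [a, b]:
    c += ['!']  # for a list, += modifies the object that a/b also refer to
print(a)  # ['foo', '!']
print(b)  # ['bar', '!']

# Tuples are immutable: item assignment raises a TypeError
t = (1, 2, 3)
try:
    t[0] = 10
except TypeError as err:
    print('TypeError:', err)
```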
List comprehensions
[4]:
N = 10
list_of_squares = [i**2 for i in range(N)]
sum_of_squares = sum(list_of_squares)
print('Sum of squares for', N, 'is', sum_of_squares)
Sum of squares for 10 is 285
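A list comprehension is a compact way of writing a loop that builds a list. The sketch below shows the equivalent explicit loop, plus the optional if clause, which filters the items (here keeping only even i).

```python
N = 10

# Equivalent to: list_of_squares = [i**2 for i in range(N)]
list_of_squares = []
for i in range(N):
    list_of_squares.append(i**2)

# An `if` clause keeps only the items for which the condition is true
even_squares = [i**2 for i in range(N) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]
```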
Dictionary comprehensions
[5]:
squares = {i: i**2 for i in range(10)}
print(squares)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
[6]:
N = 5
print('The square of', N, 'is', squares[N])
The square of 5 is 25
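Dictionary comprehensions are also handy for pairing up two lists with zip. In the sketch below the particle names and approximate masses (in MeV) are just illustrative values; it also shows dict.get, which returns a default instead of raising a KeyError when a key is missing.

```python
names = ['pion', 'kaon', 'proton']
masses = [139.6, 493.7, 938.3]  # approximate masses in MeV, for illustration

mass_of = {name: mass for name, mass in zip(names, masses)}
print(mass_of['kaon'])  # 493.7

# mass_of['muon'] would raise a KeyError; .get returns the default instead
print(mass_of.get('muon', 'unknown'))  # unknown
```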
Markdown
Write comments inline about your code:
Use LaTeX:
\(A = \frac{1}{B+C}\)
Show lists:
A wonderful
List
This is
Show code with syntax highlighting:
Python: (but in a sad grey world)
print('Hello world')
Python:
print('Hello world')
C++:
#include <iostream>
std::cout << "Hello world" << std::endl;
Bash:
echo "Hello world"
f-strings
[7]:
pt_cut = 1789.234567890987654
eta_low = 2
eta_high = 5
cut_string = f'(PT > {pt_cut:.2f}) & ({eta_low} < ETA < {eta_high})'
print(cut_string)
(PT > 1789.23) & (2 < ETA < 5)
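The part after the colon in {pt_cut:.2f} is a format specification. A few other common ones are sketched below; note that the {name=} form, which prints the variable name along with its value, needs Python 3.8 or newer.

```python
pt_cut = 1789.234567890987654
eta_low = 2

print(f'{pt_cut:.3e}')     # 1.789e+03 (scientific notation, 3 decimals)
print(f'{pt_cut:10.1f}|')  # right-aligned in a 10-character field
print(f'{eta_low=}')       # eta_low=2 (handy for quick debugging)
```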
Jupyter
Jupyter has some very useful features that can help make trying things out faster.
Each cell returns the value of its last expression, which is shown after the cell finishes running if it is not None:
[8]:
"Hello starterkitters"
[8]:
'Hello starterkitters'
[9]:
None
Run a shell command:
[10]:
!ls
10Basics.ipynb 33ModelTuning.ipynb
11AdvancedPython.ipynb 40Histograms.ipynb
12AdvancedClasses.ipynb 45DemoReweighting.ipynb
20DataAndPlotting.ipynb 50LikelihoodInference.ipynb
30Classification.ipynb 60sPlot.ipynb
31ClassificationExtension.ipynb 70ScikitHEPUniverse.ipynb
32BoostingToUniformity.ipynb README.md
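The ! prefix only works inside Jupyter/IPython. In a plain Python script, one option is the standard-library subprocess module. In the sketch below we run the current Python interpreter, so that the example works on any platform; on Linux or macOS you could pass ['ls'] instead.

```python
import subprocess
import sys

# Run a command and capture its output as text
result = subprocess.run(
    [sys.executable, '-c', 'print("Hello starterkitters")'],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode the output to str
    check=True,           # raise CalledProcessError if the command fails
)
print(result.stdout)
```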
[11]:
!wget https://example.com/index.html
--2024-04-15 14:36:24-- https://example.com/index.html
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: ‘index.html’
index.html 0%[ ] 0 --.-KB/s
index.html 100%[===================>] 1.23K --.-KB/s in 0s
2024-04-15 14:36:24 (94.1 MB/s) - ‘index.html’ saved [1256/1256]
Time how long something takes for one line:
[12]:
%time sum([i**2 for i in range(10000)])
CPU times: user 459 µs, sys: 142 µs, total: 601 µs
Wall time: 606 µs
[12]:
333283335000
Time how long an entire cell takes:
[13]:
%%time
a = sum([i**2 for i in range(10000)])
b = sum([i**2 for i in range(10000)])
c = sum([i**2 for i in range(10000)])
CPU times: user 954 µs, sys: 290 µs, total: 1.24 ms
Wall time: 1.25 ms
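Outside Jupyter, the %time and %%time magics are not available. A rough plain-Python equivalent is sketched below: time.perf_counter for a single measurement, and the standard-library timeit module, which repeats the statement many times for a more stable estimate (Jupyter's %timeit magic works in a similar way).

```python
import time
import timeit

# Single run: roughly what %time reports as the wall time
start = time.perf_counter()
total = sum([i**2 for i in range(10000)])
elapsed = time.perf_counter() - start
print(f'Wall time: {elapsed * 1e6:.0f} µs')

# Many runs: average time per run over 100 repetitions
per_run = timeit.timeit('sum([i**2 for i in range(10000)])', number=100) / 100
print(f'{per_run * 1e6:.0f} µs per run')
```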
If something takes longer than you expect, you can profile it to find out where it spends its time:
%%prun -s cumtime
import numpy as np
a = sum([np.sqrt(i) for i in range(100000)])
Jupyter also makes it easy to look at documentation: just add a question mark to the end of the line.
[14]:
def my_print(my_string):
    print(my_string)
[15]:
my_print?
Two question marks show the source code of the function:
[16]:
my_print??
[17]:
range?
Note that this lookup is done without running the line of code, so for something like a method of a literal you sometimes need to assign it to a junk variable first:
[18]:
{'a': 'b'}.get?
Object `get` not found.
[19]:
{'a': 'b'}.get
[19]:
<function dict.get(key, default=None, /)>
[20]:
junk = {'a': 'b'}.get
junk?
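In a plain Python session, help(my_print) prints similar documentation, and the standard-library inspect module gives programmatic access to it (inspect.getsource plays the role of ?? for functions defined in a file). The sketch below adds a docstring to my_print purely for illustration; the original function has none.

```python
import inspect

def my_print(my_string):
    """Print my_string."""  # docstring added for illustration
    print(my_string)

# Roughly what `my_print?` shows: the signature and the docstring
print(inspect.signature(my_print))  # (my_string)
print(inspect.getdoc(my_print))     # Print my_string.

# Works for built-in methods too, with no junk variable needed
print(inspect.signature({'a': 'b'}.get))
```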
Importing modules
It is good practice to import all modules at the beginning of your Python script or notebook.
Avoid wildcard imports such as the one below, as they make it unclear where names come from and can silently replace names you already have:
from math import *
Below, importing max from numpy replaces the built-in max, so trying to use max(10, 15) raises an error (numpy.max treats the second argument as an axis):
[21]:
max(10, 15)
[21]:
15
[22]:
from numpy import max
max(10, 15)
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
Cell In[22], line 3
1 from numpy import max
----> 3 max(10, 15)
File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/core/fromnumeric.py:2810, in max(a, axis, out, keepdims, initial, where)
2692 @array_function_dispatch(_max_dispatcher)
2693 @set_module('numpy')
2694 def max(a, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue,
2695 where=np._NoValue):
2696 """
2697 Return the maximum of an array or maximum along an axis.
2698
(...)
2808 5
2809 """
-> 2810 return _wrapreduction(a, np.maximum, 'max', axis, None, out,
2811 keepdims=keepdims, initial=initial, where=where)
File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/core/fromnumeric.py:88, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
85 else:
86 return reduction(axis=axis, out=out, **passkwargs)
---> 88 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
AxisError: axis 15 is out of bounds for array of dimension 0
To avoid this, use import numpy as np and then call np.max:
[23]:
import numpy as np
np.max([0, 1, 2])
[23]:
2
Some common abbreviations for packages are:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import ROOT as R
Typically, the nicest code uses a mixture of import X and from X import Y, as we have above.
If you’re interested in following best practices for style, there is an official style guide called PEP 8. The document itself is quite long, but there are also automated style checkers called ‘linters’. Look into flake8, either as a command-line application or as a plugin for your favourite text editor. Take care though: it’s occasionally better to break style rules to make code easier to read!
Restart the kernel to restore the built-in max; until you do, max still refers to numpy's version:
[24]:
max(10, 15)
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
Cell In[24], line 1
----> 1 max(10, 15)
File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/core/fromnumeric.py:2810, in max(a, axis, out, keepdims, initial, where)
2692 @array_function_dispatch(_max_dispatcher)
2693 @set_module('numpy')
2694 def max(a, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue,
2695 where=np._NoValue):
2696 """
2697 Return the maximum of an array or maximum along an axis.
2698
(...)
2808 5
2809 """
-> 2810 return _wrapreduction(a, np.maximum, 'max', axis, None, out,
2811 keepdims=keepdims, initial=initial, where=where)
File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/core/fromnumeric.py:88, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
85 else:
86 return reduction(axis=axis, out=out, **passkwargs)
---> 88 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
AxisError: axis 15 is out of bounds for array of dimension 0
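Restarting the kernel works, but a shadowed built-in can also be recovered without a restart. The built-ins always remain reachable through the standard-library builtins module, and deleting the global name makes lookup fall back to the built-in. In the sketch below, a hypothetical stand-in function plays the role of the from numpy import max above, so the example does not depend on numpy.

```python
import builtins

# Hypothetical stand-in for the shadowing above
# (in the notebook it was `from numpy import max`)
def max(*args):
    raise TypeError('shadowed max')

# The original built-in is always reachable via the builtins module
print(builtins.max(10, 15))  # 15

# Deleting the global name un-shadows the built-in: when no global
# called max exists, name lookup falls back to builtins
del max
print(max(10, 15))  # 15
```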