1: Basics

In this Advanced Python Tutorial we will cover some useful Python skills and tips. The lessons are as follows:

  • Basics

  • Loading data and plotting with matplotlib

  • Cut-based selection

  • Multivariate Analysis with Scikit Learn

  • uBoost with hep_ml

  • Neural Network Demo

  • Multivariate kinematic reweighting

  • The sPlot technique

The first lesson covers some Python basics:

  • Mutable and immutable objects in Python

  • Lists, dictionaries and comprehensions

  • Writing code in markdown

  • Jupyter notebook basics

  • Importing modules

Basics

Mutability

  • Let’s start by comparing some mutable and immutable objects.

  • In Python, lists and dictionaries are mutable, while strings and tuples are immutable.

  • What happens when you run the code below?

[1]:
a = ['a', 'b', 'c']
b = a
b[1] = 'hello'

print(a)
print(b)
['a', 'hello', 'c']
['a', 'hello', 'c']
[2]:
a = {'a': '0', 'b': '1', 'c': '2'}
b = a
b['b'] = 'hello'

print(a)
print(b)
{'a': '0', 'b': 'hello', 'c': '2'}
{'a': '0', 'b': 'hello', 'c': '2'}
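
The aliasing in the two cells above happens because `b = a` binds a second name to the same object rather than creating a new one. The following is a minimal sketch of how to get an independent object instead. It assumes a shallow copy is enough for flat containers; the `nested` dictionary and its `'cuts'` key are made up for illustration and show where `copy.deepcopy` becomes necessary.

```python
import copy

a = ['a', 'b', 'c']
b = a.copy()                # a new list object holding the same elements
b[1] = 'hello'

print(a)                    # ['a', 'b', 'c']
print(b)                    # ['a', 'hello', 'c']
print(a is b)               # False

# A shallow copy still shares any nested mutable objects
nested = {'cuts': [1, 2, 3]}
shallow = nested.copy()
deep = copy.deepcopy(nested)

shallow['cuts'].append(4)   # also changes the list seen through `nested`
print(nested)               # {'cuts': [1, 2, 3, 4]}
print(deep)                 # {'cuts': [1, 2, 3]}
```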
[3]:
a = 'foo'
b = 'bar'
for c in [a, b]:
    c += '!'

print(a)
print(b)
foo
bar
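
In the loop above, `c += '!'` cannot modify the string in place, so Python builds a new string and rebinds only the loop variable `c`; `a` and `b` still refer to the original strings. The sketch below illustrates this, along with one possible way to update the names and a check that tuples reject item assignment.

```python
a = 'foo'
c = a
c += '!'                    # builds a new string and rebinds c only
print(a, c)                 # foo foo!
print(a is c)               # False

# To update the values, build new strings and rebind the names explicitly
a, b = [s + '!' for s in ('foo', 'bar')]
print(a, b)                 # foo! bar!

# Tuples are immutable too: item assignment raises a TypeError
t = (1, 2, 3)
try:
    t[0] = 10
except TypeError as err:
    print('TypeError:', err)
```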

List comprehensions

[4]:
N = 10

list_of_squares = [i**2 for i in range(N)]
sum_of_squares = sum(list_of_squares)

print('Sum of squares for', N, 'is', sum_of_squares)
Sum of squares for 10 is 285
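
Comprehensions can also filter with a trailing `if`, and when only the sum is needed a generator expression avoids building the intermediate list. A short sketch, reusing `N` from the cell above and checking the result against the closed-form expression for the sum of squares:

```python
N = 10

# Keep only the squares of even numbers
even_squares = [i**2 for i in range(N) if i % 2 == 0]
print(even_squares)         # [0, 4, 16, 36, 64]

# Generator expression: no intermediate list is created
sum_of_squares = sum(i**2 for i in range(N))
print(sum_of_squares)       # 285

# Cross-check: 0^2 + 1^2 + ... + (N-1)^2 = (N-1) N (2N-1) / 6
print(sum_of_squares == (N - 1) * N * (2 * N - 1) // 6)   # True
```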

Dictionary comprehensions

[5]:
squares = {i: i**2 for i in range(10)}
print(squares)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
[6]:
N = 5
print('The square of', N, 'is', squares[N])
The square of 5 is 25
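
Dictionary comprehensions also work with `zip` and with an `if` filter. The sketch below is illustrative only: the variable names and cut values in `names` and `lower_cuts` are hypothetical and not taken from any dataset used in this tutorial.

```python
# Hypothetical variable names and lower cut values, for illustration only
names = ['PT', 'ETA', 'PHI']
lower_cuts = [500, 2, -3.14]

# Pair two lists into a dictionary
cuts = {name: cut for name, cut in zip(names, lower_cuts)}
print(cuts)                 # {'PT': 500, 'ETA': 2, 'PHI': -3.14}

# Filter an existing dictionary
squares = {i: i**2 for i in range(10)}
odd_squares = {k: v for k, v in squares.items() if k % 2 == 1}
print(odd_squares)          # {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}

# .get returns a default instead of raising KeyError for a missing key
print(squares.get(12, 'not stored'))   # not stored
```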

Markdown

Write comments inline about your code:

Use LaTeX:

\(A = \frac{1}{B+C}\)

Show lists:

  • A wonderful

  • List

  • This is

Show code with syntax highlighting:

Python: (but in a sad grey world)

print('Hello world')

Python:

print('Hello world')

C++:

#include <iostream>

std::cout << "Hello world" << std::endl;

Bash:

echo "Hello world"

f-strings

[7]:
pt_cut = 1789.234567890987654
eta_low = 2
eta_high = 5

cut_string = f'(PT > {pt_cut:.2f}) & ({eta_low} < ETA < {eta_high})'
print(cut_string)
(PT > 1789.23) & (2 < ETA < 5)
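
The `:.2f` in the cell above is a format specifier; the same mini-language covers scientific notation, thousands separators, percentages and alignment. A few examples, where `n_events`, `efficiency` and `label` are made-up values for illustration:

```python
pt_cut = 1789.234567890987654
n_events = 1234567          # illustrative value
efficiency = 0.87654        # illustrative value
label = 'PT'

print(f'{pt_cut:.3e}')      # 1.789e+03
print(f'{n_events:,}')      # 1,234,567
print(f'{efficiency:.1%}')  # 87.7%
print(f'[{label:>6}]')      # [    PT]
```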

Jupyter

Jupyter has some very useful features included that can help make trying things out faster…

Cells have a return value, which is shown after they finish running if it’s not None:

[8]:
"Hello starterkitters"
[8]:
'Hello starterkitters'
[9]:
None

Run a shell command:

[10]:
!ls
10Basics.ipynb                   33ModelTuning.ipynb
11AdvancedPython.ipynb           40Histograms.ipynb
12AdvancedClasses.ipynb          45DemoReweighting.ipynb
20DataAndPlotting.ipynb          50LikelihoodInference.ipynb
30Classification.ipynb           60sPlot.ipynb
31ClassificationExtension.ipynb  70ScikitHEPUniverse.ipynb
32BoostingToUniformity.ipynb     README.md
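
The `!` syntax only works inside Jupyter/IPython. If the same listing is needed in a plain Python script, one possible equivalent uses `pathlib` from the standard library. This is a sketch that assumes the notebooks of interest sit in the current working directory:

```python
from pathlib import Path

# Collect the names of all notebooks in the current directory, sorted
notebooks = sorted(p.name for p in Path('.').glob('*.ipynb'))

for name in notebooks:
    print(name)
```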
[11]:
!wget https://example.com/index.html
--2024-04-15 14:36:24--  https://example.com/index.html
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: ‘index.html’

index.html          100%[===================>]   1.23K  --.-KB/s    in 0s

2024-04-15 14:36:24 (94.1 MB/s) - ‘index.html’ saved [1256/1256]

Time how long something takes for one line:

[12]:
%time sum([i**2 for i in range(10000)])
CPU times: user 459 µs, sys: 142 µs, total: 601 µs
Wall time: 606 µs
[12]:
333283335000

Time how long an entire cell takes:

[13]:
%%time
a = sum([i**2 for i in range(10000)])
b = sum([i**2 for i in range(10000)])
c = sum([i**2 for i in range(10000)])
CPU times: user 954 µs, sys: 290 µs, total: 1.24 ms
Wall time: 1.25 ms
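
A single `%time` measurement can fluctuate quite a bit for code this fast. Outside a notebook, or when an average over many runs is wanted, the standard-library `timeit` module is one option (Jupyter also offers a `%timeit` magic built on it). A minimal sketch, where the helper `sum_of_squares` and the choice of 100 repetitions are arbitrary assumptions:

```python
import timeit


def sum_of_squares(n=10000):
    """Same computation as in the timed cells above."""
    return sum([i**2 for i in range(n)])


n_calls = 100   # arbitrary number of repetitions
total = timeit.timeit(sum_of_squares, number=n_calls)

print(f'{total / n_calls * 1e6:.1f} µs per call (average over {n_calls} calls)')
```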

If something takes longer than you expect, you can profile it to find out where it spends its time:

%%prun -s cumtime
# Maybe skip this
import numpy as np
a = sum([np.sqrt(i) for i in range(100000)])
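
The `%%prun` magic wraps Python's built-in profiler. For scripts run outside Jupyter, roughly the same report can be produced with `cProfile` and `pstats` directly. The sketch below is an approximation of what the magic does, not its actual implementation; it uses `math.sqrt` instead of `np.sqrt` to stay within the standard library, and the function name `slow_sum` is made up.

```python
import cProfile
import io
import math
import pstats


def slow_sum(n=100000):
    """Stand-in workload: sum of square roots."""
    return sum([math.sqrt(i) for i in range(n)])


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum()
profiler.disable()

# Sort by cumulative time, like `-s cumtime`, and show the top 5 entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumtime').print_stats(5)
report = stream.getvalue()
print(report)
```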

Jupyter also makes it easy to look at documentation: just add a question mark to the end of the line.

[14]:
def my_print(my_string):
    print(my_string)
[15]:
my_print?

Two question marks allow you to see the source code of the function:

[16]:
my_print??
[17]:
range?

Note that this is done without running the actual line of code, so an expression such as a method looked up on a literal cannot be inspected directly. Assign it to a junk variable first to make it work:

[18]:
{'a': 'b'}.get?
Object `get` not found.
[19]:
{'a': 'b'}.get
[19]:
<function dict.get(key, default=None, /)>
[20]:
junk = {'a': 'b'}.get
junk?
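
The `?` and `??` syntax is specific to Jupyter/IPython. In a plain Python session, similar information is available through the built-in `help` function and the `inspect` module. A small sketch; the docstring added to `my_print` here is an assumption, since the version above has none:

```python
import inspect


def my_print(my_string):
    """Print my_string to standard output."""
    print(my_string)


print(inspect.signature(my_print))   # (my_string)
print(inspect.getdoc(my_print))      # Print my_string to standard output.

# help() evaluates its argument, so no junk variable is needed here
help(dict.get)
```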

Importing modules

  • It is good practice to import all modules at the beginning of your Python script or notebook.

  • Avoid wildcard imports such as from math import *, as they make it unclear where names come from.

  • Below, the first call uses the built-in max. After from numpy import max, the name max refers to numpy’s function instead, and max(10, 15) raises an error because numpy treats 15 as the axis argument.

[21]:
max(10, 15)
[21]:
15
[22]:
from numpy import max

max(10, 15)
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
Cell In[22], line 3
      1 from numpy import max
----> 3 max(10, 15)

File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/core/fromnumeric.py:2810, in max(a, axis, out, keepdims, initial, where)
   2692 @array_function_dispatch(_max_dispatcher)
   2693 @set_module('numpy')
   2694 def max(a, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue,
   2695          where=np._NoValue):
   2696     """
   2697     Return the maximum of an array or maximum along an axis.
   2698
   (...)
   2808     5
   2809     """
-> 2810     return _wrapreduction(a, np.maximum, 'max', axis, None, out,
   2811                           keepdims=keepdims, initial=initial, where=where)

File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/core/fromnumeric.py:88, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     85         else:
     86             return reduction(axis=axis, out=out, **passkwargs)
---> 88 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

AxisError: axis 15 is out of bounds for array of dimension 0

To avoid this, import numpy as np and then use np.max.

[23]:
import numpy as np

np.max([0, 1, 2])
[23]:
2
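
The same principle applies to any module that reuses a built-in name. As a standard-library illustration (not part of the original notebook), `math.pow` and the built-in `pow` behave differently, and keeping the module prefix lets both coexist without ambiguity:

```python
import math

print(pow(2, 3))            # 8    built-in: integer result for integer inputs
print(math.pow(2, 3))       # 8.0  math version: always returns a float
print(pow(2, 3, 5))         # 3    built-in also accepts a modulus argument
```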

Some common abbreviations for packages are:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import ROOT as R

Typically the nicest code uses a mixture of import X and from X import Y, like we have above.

If you’re interested in following best practices for style, there is an official style guide called PEP8. The document itself is quite long, but you can also get automated style checkers called ‘linters’. Look into flake8, either as a command line application or as a plugin for your favourite text editor. Take care though: it’s occasionally better to break style rules to make code easier to read!

  • Restart the kernel to fix max: as the cell below shows, max still refers to numpy’s version until you do.

[24]:
max(10, 15)
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 max(10, 15)

File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/core/fromnumeric.py:2810, in max(a, axis, out, keepdims, initial, where)
   2692 @array_function_dispatch(_max_dispatcher)
   2693 @set_module('numpy')
   2694 def max(a, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue,
   2695          where=np._NoValue):
   2696     """
   2697     Return the maximum of an array or maximum along an axis.
   2698
   (...)
   2808     5
   2809     """
-> 2810     return _wrapreduction(a, np.maximum, 'max', axis, None, out,
   2811                           keepdims=keepdims, initial=initial, where=where)

File /usr/share/miniconda/envs/analysis-essentials/lib/python3.11/site-packages/numpy/core/fromnumeric.py:88, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     85         else:
     86             return reduction(axis=axis, out=out, **passkwargs)
---> 88 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

AxisError: axis 15 is out of bounds for array of dimension 0
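
Restarting the kernel is the simplest fix, but it also discards every other variable in the session. As an alternative sketch, a shadowed built-in can be recovered through the `builtins` module. To stay within the standard library, the example uses `from math import pow` as a stand-in for `from numpy import max`; the same pattern would apply to `max`.

```python
import builtins

from math import pow        # stand-in for `from numpy import max`

shadowed_result = pow(2, 3)
print(shadowed_result)      # 8.0  (math.pow is now bound to the name pow)

pow = builtins.pow          # rebind the name to the original built-in
restored_result = pow(2, 3)
print(restored_result)      # 8
```

At the top level of a notebook, `del max` has the same effect: it removes the imported name so that lookups fall through to the built-in again.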