Scikit-HEP Tutorial: Glossary

Key Points

Introduction: Python background
  • Be familiar with the syntax of Python dicts, NumPy arrays, slicing rules, and bitwise logic operators.

  • Large-scale computations in Python tend to be performed one array at a time, rather than one scalar operation at a time.

  • You, as a user, will likely be gluing together many packages in each data analysis.
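
  As a minimal sketch of the points above, the snippet below applies a selection to whole arrays at once instead of looping over scalars. The values and the names `pt`, `eta`, and `columns` are hypothetical toy data, not taken from the tutorial.

```python
import numpy as np

# Hypothetical toy data: transverse momenta (GeV) and pseudorapidities.
pt = np.array([12.0, 45.5, 8.3, 60.1, 33.7])
eta = np.array([0.5, -2.8, 1.1, 2.0, -0.3])

# One array-level expression replaces an explicit Python loop.
# Bitwise & (not the keyword `and`) combines boolean arrays element by
# element; the parentheses are required because & binds more tightly
# than the comparison operators.
mask = (pt > 20.0) & (np.abs(eta) < 2.4)

selected_pt = pt[mask]   # boolean-mask slicing keeps [60.1, 33.7]
every_other = pt[::2]    # ordinary slicing rules apply as well

# A dict is a common way to carry named arrays from one package to another.
columns = {"pt": pt, "eta": eta}
selected = {name: values[mask] for name, values in columns.items()}
```

  The same mask is reused for every column, which is the usual pattern when several packages each expect plain arrays.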

Basic file I/O with Uproot
  • Uproot TDirectories and TTrees have a dict-like interface.

  • Uproot reading methods are primarily intended to get data into a more specialized library.

  • Uproot writing is more limited, but it can write histograms and TTrees.

TTree details
  • ROOT files have a structure that enables partial reading. This is essential for large datasets.

  • Be aware of how much data you’re reading and when.

  • The Python + Jupyter + Uproot interface provides a gradual path from interactive tinkering to scaled-up workflows.

Jagged, ragged, Awkward Arrays
  • NumPy, like almost all array libraries, handles only rectilinear collections of numbers: arrays, tables, and tensors.

  • Awkward Array extends NumPy’s slicing and array-manipulation to jagged arrays and more general data types (such as nested records).

  • These extensions are useful for physics, where each event can contain a different number of particles.

  • There’s usually more than one way to get what you want.

Histogram manipulations and fitting
  • High-energy physicists approach histogramming in a different way from NumPy, Matplotlib, SciPy, etc.

  • Scikit-HEP tools make histogramming and fitting Pythonic.

Lorentz vectors, particle PDG IDs, jet-clustering, oh my!
  • Instead of building vector methods into multiple packages, a standalone package (Vector) provides them in one place.

  • The value of these small packages is amplified when they are used together.

Tools for scaling up
  • See Coffea for more about scaling up your analysis.

  • Pythonic high-energy physics is a broad and growing ecosystem.

Glossary