This lesson is being piloted (Beta version)

Trees, Branches, and Events

Overview

Teaching: 20 min
Exercises: 5 min
Questions
  • How do I access a TTree?

  • How can I tell what branches are in a TTree?

  • How do I read the data from a TTree?

Objectives
  • List the branches in a tree.

  • Access the branches in a tree.

  • Create a table from tree branches.

  • Access data for a particular event.

Trees

A tree in ROOT (a TTree) is essentially a table of data. Its columns are called branches, and its rows usually represent events (individual bunch crossings).
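To make the table picture concrete, here is a minimal plain-Python sketch with no ROOT file involved. The name `toy_tree`, the helper `get_event`, and the choice of a dict of lists are assumptions made purely for illustration; they are not how Uproot stores a TTree. The values are a few rounded numbers taken from outputs shown later in this lesson.

```python
# Plain-Python model of a tree: each key is a branch (column),
# and each position in the lists is one event (row).
# Illustrative stand-in only; nothing here is read from a ROOT file.
toy_tree = {
    "nMuon": [2, 2, 1],
    "Muon_pt": [[10.8, 15.7], [10.5, 16.3], [3.28]],
}


def get_event(tree, index):
    """Collect the entry at position `index` from every branch."""
    return {name: column[index] for name, column in tree.items()}


branch_names = list(toy_tree.keys())  # the columns of the table
third_event = get_event(toy_tree, 2)  # one row of the table

print(branch_names)
print(third_event)
```

Reading down a column gives one quantity for every event, while reading across a row gives every quantity for one event. The rest of this lesson does both with real data.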

First we assign the tree to a variable (named tree here):

tree = file['Events']

In order to find out what information is in the tree, we need to know what the branches (columns) are. The term key is used (again) here to refer to the names of the branches.

tree.keys()
['nMuon', 'Muon_pt', 'Muon_eta', 'Muon_phi', 'Muon_mass', 'Muon_charge']

The above output is a list of the branch names. So we can see that for each event, we will have the number of muons in the event (nMuon) and the pT, eta, phi, mass, and charge of each muon.

But how do we get the actual data from the table? Uproot offers several ways, but the simplest is the arrays() method:

tree.arrays()
<Array [{nMuon: 2, Muon_pt: [10.8, ... -1, 1]}] type='100000 * {"nMuon": uint32,...'>

The numbers visible in this output are data read from the tree.

Branches

Now we assign this object (which contains both the names and contents of the branches) to another variable (branches):

branches = tree.arrays()

Next let’s just look at each branch individually. You can access a single branch from branches in a similar way to getting an item from a ROOT file object (array-like notation):

branches['nMuon']
<Array [2, 2, 1, 4, 4, 3, ... 0, 3, 2, 3, 2, 3] type='100000 * uint32'>

You can see the partial list of numbers in the output that represents the number of muons in each event. It’s abbreviated with an ellipsis (...) so that it doesn’t take up the whole page.

Error?

If you get something like:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'nMuon'

then you are almost certainly using an older version of Uproot that is not compatible with the rest of this tutorial. If this is the case, install the latest version of Uproot, restart the notebook’s kernel, and try again.

These Array objects are a special type provided by the Awkward Array package. The type='100000 * uint32' means that there are 100,000 entries and that each entry is a 32-bit unsigned integer. Each entry corresponds to one event.

Let’s look at another branch:

branches['Muon_pt']
<Array [[10.8, 15.7], ... 11.4, 3.08, 4.97]] type='100000 * var * float32'>

This is a jagged array: the number of entries differs from event to event, because each event can have a different number of muons. Note the square brackets [] surrounding the list of entries for each event. The type='100000 * var * float32' means that there are 100,000 rows, each containing a variable number of 32-bit floating point numbers. This is basically an array of arrays. Unlike a regular 2D array, its rows do not all have the same length.
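The jagged structure can be sketched with ordinary nested Python lists. The snippet below is only a stand-in for the real branch: it uses the rounded pT values of the first three events as they appear in this lesson's outputs, and the variable names are illustrative. For real branches, Awkward Array provides ak.num and ak.flatten for the counting and flattening steps shown here.

```python
# First three events of Muon_pt, rounded, as a plain list of lists.
muon_pt = [[10.8, 15.7], [10.5, 16.3], [3.28]]

# One length per event; for this data it matches the nMuon branch.
muons_per_event = [len(event) for event in muon_pt]

# A rectangular (2D) table would need every row to have the same length.
is_rectangular = len(set(muons_per_event)) == 1

# Dropping the per-event grouping leaves one flat list of all muon pT values.
all_pt = [pt for event in muon_pt for pt in event]

print(muons_per_event)
print(is_rectangular)
print(all_pt)
```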

Events

If we want to focus on a particular event, we can index it just like a normal array:

branches['Muon_pt'][0]
<Array [10.8, 15.7] type='2 * float32'>

From the above output, the first event has two muons, and the two numbers in the list are the muons’ pT. It’s not specified anywhere in the file, but the units are GeV. Let’s look at the third event:

branches['Muon_pt'][2]
<Array [3.28] type='1 * float32'>

It only has one muon.

Exercise

Print out the pT of all muons that are in only the first 10 events. (There are many possible ways to do this.)

Solution

Here’s one way to do it. All that matters is that you get the same numbers (and number of numbers) in the output.

for i in range(10):
    print(branches['Muon_pt'][i])
[10.8, 15.7]
[10.5, 16.3]
[3.28]
[11.4, 17.6, 9.62, 3.5]
[3.28, 3.64, 32.9, 23.7]
[3.57, 4.57, 4.37]
[57.6, 53]
[11.3, 23.9]
[10.2, 14.2]
[11.5, 3.47]
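Slicing is another route. Awkward Arrays accept the same slice syntax as Python lists, so branches['Muon_pt'][:10] selects the first ten events in one step. The sketch below uses a plain list as a stand-in for the branch so that it runs without the ROOT file. It holds the ten events printed above plus one made-up 11th event, and it also counts the muons as a check on the "number of numbers".

```python
# Stand-in for branches['Muon_pt']: the ten events printed above,
# plus one made-up 11th event so the slice has something to exclude.
muon_pt = [
    [10.8, 15.7],
    [10.5, 16.3],
    [3.28],
    [11.4, 17.6, 9.62, 3.5],
    [3.28, 3.64, 32.9, 23.7],
    [3.57, 4.57, 4.37],
    [57.6, 53],
    [11.3, 23.9],
    [10.2, 14.2],
    [11.5, 3.47],
    [99.0],  # hypothetical, not from the data
]

first_ten = muon_pt[:10]  # events 0 through 9
for event in first_ten:
    print(event)

# Total number of muons in the first ten events.
n_muons = sum(len(event) for event in first_ten)
print(n_muons)
```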

What if we want to get all of the information about a single event? So far we’ve accessed data in branches by providing a branch name, but we can also just use an event index:

branches[0]
<Record ... 0.106], Muon_charge: [-1, -1]} type='{"nMuon": uint32, "Muon_pt": va...'>

This is a Record object, which is another special type provided by Awkward Array. It can be indexed by field name in much the same way as a standard Python dictionary (dict). Unfortunately, most of the interesting information is still hidden in the above output to save space. A little trick we can use to force printing all the data is adding .tolist():

branches[0].tolist()
{'nMuon': 2,
 'Muon_pt': [10.763696670532227, 15.736522674560547],
 'Muon_eta': [1.0668272972106934, -0.563786506652832],
 'Muon_phi': [-0.03427272289991379, 2.5426154136657715],
 'Muon_mass': [0.10565836727619171, 0.10565836727619171],
 'Muon_charge': [-1, -1]}

There we go. Now we can see the whole picture for an individual event.
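Because the result of .tolist() is made of ordinary Python objects, standard dictionary and list operations apply to it. As a rough sketch, the snippet below pastes in the dictionary printed above and regroups it into one small dictionary per muon. The variable names and the shortened field names (pt, eta, and so on) are illustrative choices, not part of Uproot or Awkward Array.

```python
# The first event, copied from the .tolist() output above.
event = {
    "nMuon": 2,
    "Muon_pt": [10.763696670532227, 15.736522674560547],
    "Muon_eta": [1.0668272972106934, -0.563786506652832],
    "Muon_phi": [-0.03427272289991379, 2.5426154136657715],
    "Muon_mass": [0.10565836727619171, 0.10565836727619171],
    "Muon_charge": [-1, -1],
}

# Every key starting with "Muon_" holds one value per muon.
prefix = "Muon_"
per_muon_fields = [name for name in event if name.startswith(prefix)]

# Regroup: one dict per muon, with the "Muon_" prefix stripped from the keys.
muons = [
    {name[len(prefix):]: event[name][i] for name in per_muon_fields}
    for i in range(event["nMuon"])
]

for muon in muons:
    print(muon)
```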

.tolist()

.tolist() is a method of NumPy arrays that Awkward Array objects also provide. On a NumPy array, as the name suggests, it returns nested Python lists. On Awkward data with named fields it returns dictionaries: a single Record (one event) becomes a dictionary whose values are lists, as shown above. It can be very useful when you want to see exactly what is in an Array or Record. Be careful when using it, though: converting an entire branch or tree creates a Python object for every value, which can be very slow or exhaust memory if the data is large. It is safest to call .tolist() on only one or a few events at a time.

Key Points

  • TTrees are tables of data.

  • Trees are made of branches, which are columns in the table.

  • Each row represents an event.