Modules

Python comes with lots of useful stuff, which is provided through modules (and submodules, see later). We have already met the math module, but did not talk about how we started using it.

>>> import math
>>> math
<module 'math' from '/usr/lib/python2.7/lib-dynload/math.so'>
>>> math.pi
3.141592653589793
>>> math.sin(1)
0.8414709848078965

The path after from might look different on your computer.

So, math is a module, and it behaves a lot like other objects we have met: it is a container with attributes and functions attached, which we can access with the dot operator (.). Actually, that is pretty much all there is to modules.
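To make the "a module is just an object" idea concrete, here is a small sketch that inspects math with ordinary tools. It only uses the standard types module plus the built-ins getattr() and dir(); the variable name public_names is ours, chosen for illustration.

```python
import math
import types

# A module is an ordinary object whose type is `module`.
print(type(math))
print(isinstance(math, types.ModuleType))

# The dot operator is plain attribute access, so getattr() does the same job.
print(getattr(math, 'pi') == math.pi)

# dir() lists the names attached to the module; we skip the underscore ones.
public_names = [name for name in dir(math) if not name.startswith('_')]
print(public_names[:5])
```

The exact list printed at the end depends on your Python version, since new functions are added to math over time.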

Using modules in your code: import

The keyword import, usually specified at the beginning of your source code, is used to tell Python what modules you want to make available to your current code.

There are different ways of specifying an import. The one we have seen already simply makes the module available to you:

>>> import random
>>> random.uniform(0, 1)
0.5877109428927353

The module random contains functions useful for random number generation: with the import above, we have made the random module accessible, and everything within that module is accessible via the syntax random.<name>. For the record, the uniform(x, y) function returns a pseudo-random number within the range [x, y].

Sometimes you only want to make a few names from a given module accessible. Python gives you the ability to import just those:

>>> from random import uniform, choice
>>> uniform(0, 1)
0.4059007502204043
>>> choice([ 33, 56, 42, -1 ])
42

In this case the uniform and choice names are available directly, i.e. without using the random prefix. No other name from the random module is available in this case, and neither is the name random itself. For the record, the choice function returns a random element from a given collection.
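The following sketch illustrates the point above in a fresh session: the two imported names work directly, while going through the random prefix fails because that name was never bound. The variable names value, picked and message are ours.

```python
from random import uniform, choice

# The imported names can be used without any prefix.
value = uniform(0, 1)
picked = choice([33, 56, 42, -1])
print(value, picked)

# The module name itself was not imported, so using it raises NameError
# (assuming nothing else has imported `random` in this session).
message = None
try:
    random.gauss(0, 1)
except NameError as err:
    message = str(err)
print(message)
```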

Another option is to import all functions of a certain module and make them available without a prefix:

>>> from random import *
>>> gauss(0, 1)
-1.639334770284028

This is not recommended: you generally do not know everything you are importing, and all imported names end up in the same namespace as your own code (that is, directly available with no need for the module.<name> syntax), so an imported name may clash with one of yours and silently replace it.
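As an illustration of such a clash, the sketch below lists which names a star import of math would bring in that already exist as built-ins. It does not perform the star import itself; it relies on the fact that math defines no __all__, so a star import binds every name not starting with an underscore. The names star_names and clashes are ours.

```python
import builtins
import math

# Every public name that `from math import *` would bind.
star_names = {name for name in dir(math) if not name.startswith('_')}

# Names that also exist as built-ins would be silently replaced.
clashes = sorted(star_names & set(dir(builtins)))
print(clashes)

# The two versions of pow do not behave identically.
print(pow(2, 3))        # built-in: integer result
print(math.pow(2, 3))   # math version: always a float
```

After a star import of math, every later call to pow in your code would use the float-only version, which is an easy bug to miss.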

Lastly, it is possible to import modules, or specific names from a module, under an alias.

>>> from random import uniform as uni
>>> uni(0, 1)
0.7288973406605329
>>> import numpy as np
>>> np.arccos(1)
0.0

This option is useful when you need to assign shorter aliases to names you will use frequently. In particular, the alias np for the numpy module will be encountered a lot.

Note that modules can have submodules, specified with extra dots .:

>>> from os.path import abspath
>>> abspath('..')
'/afs/cern.ch/user/d'

When importing a module, its submodules are not available by default and you must import them explicitly:

>>> import os
>>> os.getcwd()
'/afs/cern.ch/user/d/dberzano'
>>> import os.path
>>> os.path.basename(os.getcwd())
'dberzano'

Note that, due to the current Python implementation of the os module, os.path functions are actually available even if you only import os, without importing os.path. You should not rely on this behaviour: it is an exception and might change in the future. Always import submodules explicitly!
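The same rule applies to any other package. As a small sketch, here is an explicit submodule import from the standard xml package (chosen by us purely as an illustration; the XML snippet is made up):

```python
import xml.etree.ElementTree  # explicit import of the submodule we need

# Parse a tiny, made-up XML document and inspect it.
root = xml.etree.ElementTree.fromstring('<run number="42"><event/><event/></run>')
print(root.tag)            # name of the top-level element
print(root.get('number'))  # value of its "number" attribute
print(len(root))           # number of child elements
```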

It is also possible to import several modules with a single import command:

>>> import os, sys, math

but this is not recommended by the Python style guide, which suggests using several import statements, one per module, as it improves readability:

>>> import os
>>> import sys
>>> import math

If you need to import several names from a single module, you can split an import statement over multiple lines:

>>> from math import (
...     exp,
...     log,
...     e,
...     floor
... )
>>> floor(exp(log(e)))
2.0

The standard library

The set of things that Python comes with, from all of the types of objects to all of the different modules, is called the standard library. It is recommended to browse through the standard library documentation to see what is available: Python is rich in standard modules, and you should reuse them as much as possible instead of rewriting code on your own.

Some of the categories for which standard modules are available are:

  • processing paths

  • date and time manipulation

  • mathematical functions

  • parsing of certain file formats

  • support for multiple processes and threads

Use standard Python library modules with confidence: since they are part of any standard Python distribution, code that relies on them will be easily portable.
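To give a taste of the categories listed above, here is a short sketch touching four of them. The modules are all part of the standard library; the file path, the date and the JSON string are invented for the example.

```python
import datetime
import json
import math
import os.path

# Processing paths: split a (made-up) path into file name and extension.
name, extension = os.path.splitext(os.path.basename('/data/run42/events.root'))
print(name, extension)

# Date and time manipulation: add 30 days to a date.
later = datetime.date(2024, 1, 1) + datetime.timedelta(days=30)
print(later)

# Mathematical functions: length of the hypotenuse of a 3-4 triangle.
print(math.hypot(3, 4))

# Parsing of certain file formats: JSON in this case.
config = json.loads('{"bins": 50, "range": [0, 10]}')
print(config['bins'], config['range'])
```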

Modules from PyPI

Many external modules can be found on PyPI, the Python Package Index repository. Some of those modules are already part of some Python distributions (such as Anaconda, which comes with more than a thousand science-oriented modules preinstalled).

If a certain module you need is not available in your distribution, you can easily install it with the pip shell command. Since you typically do not have write access to the standard Python installation’s directories, pip allows you (with its --user option) to install modules only for yourself, under your current user’s home directory. It is recommended to add the following two lines to your shell startup script (such as ~/.bashrc), telling once and for all where to install and search for Python user modules:

export PYTHONUSERBASE=$HOME/.local
export PATH=$PYTHONUSERBASE/bin:$PATH

Once you have done that, close your current terminal window and open a new one, and you will be ready to use pip. We will see in a later lesson how to install the root_pandas module with:

pip install --user root_pandas
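If you want to check where user-level installations end up, the standard site module can tell you. This is only a sketch: the paths printed depend entirely on your system and on the value of PYTHONUSERBASE, and some older virtual environment tools replace the site module with a version that lacks these functions.

```python
import site
import sys

# Base directory for user installs (honours PYTHONUSERBASE if it is set).
user_base = site.getuserbase()
print(user_base)

# Directory where `pip install --user` places importable modules.
user_site = site.getusersitepackages()
print(user_site)

# Whether Python is currently searching that directory for modules.
print(user_site in sys.path)
```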

Modules inside a virtual environment

It is however usually preferable and safer to do everything inside a virtual environment. The latter is like a copy of your current environment. Thus you can modify your virtual environment (including installing, deleting and updating modules) without affecting your default environment. If at some point you realize you have broken everything, you can always exit the virtual environment and go back to the default lxplus one.

To build a virtual environment based on LCG views, you can use LCG_virtualenv:

git clone https://gitlab.cern.ch/cburr/lcg_virtualenv.git
./lcg_virtualenv/create_lcg_virtualenv myVenv

To activate the virtual environment do:

source myVenv/bin/activate

You can then install packages with pip, for instance root_pandas:

pip install --upgrade root_pandas matplotlib
python -c 'import pandas; print(f"Got pandas from {pandas.__file__}")'
python -c 'import root_pandas; print(f"Got root_pandas from {root_pandas.__file__}")'
python -c 'import matplotlib; print(f"Got matplotlib from {matplotlib.__file__}")'

You can go back to the default environment using the deactivate command.
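From within Python you can check whether a virtual environment is currently active. The helper below is a sketch of a common heuristic, not something provided by LCG_virtualenv; the function name in_virtual_environment is ours. It covers both the sys.base_prefix convention used by recent tools and the sys.real_prefix attribute set by older virtualenv versions.

```python
import sys

def in_virtual_environment():
    """Guess whether the interpreter runs inside a virtual environment."""
    # Recent tools keep the original installation in sys.base_prefix.
    base_prefix = getattr(sys, 'base_prefix', sys.prefix)
    # Older virtualenv versions set sys.real_prefix instead.
    return hasattr(sys, 'real_prefix') or sys.prefix != base_prefix

print(sys.prefix)
print(in_virtual_environment())
```

Running it before and after source myVenv/bin/activate should show the prefix change.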

Write your first Python module

The simplest Python module you can write is just a .py file with some functions inside:

# myfirstmodule.py

def one():
    print('this is my first function')

def two():
    print('this is my second function')

You can now fire up an IPython shell and use those functions right away:

>>> import myfirstmodule
>>> myfirstmodule.one()
this is my first function
>>> myfirstmodule.two()
this is my second function

By simply naming the file myfirstmodule.py we have made it available as a module named myfirstmodule, provided that the file is in the directory from which we launched the Python interpreter.
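The "same directory" condition comes from how Python looks for modules: it searches the directories listed in sys.path, and in an interactive session the first entry stands for the current directory. The sketch below shows the mechanism by writing a throwaway module to a temporary directory and adding that directory to sys.path; the module name demo_greeting and its hello() function are made up for the example.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module into a temporary directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'demo_greeting.py'), 'w') as handle:
    handle.write("def hello():\n    return 'hello from demo_greeting'\n")

# Tell Python to search that directory too, then import by name.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
demo_greeting = importlib.import_module('demo_greeting')

print(demo_greeting.hello())
print(demo_greeting.__file__)  # shows where the module was found
```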

Module name restrictions

Note that you cannot pick any name you want for a module! From the Python style guide, we gather that we should use “short, all-lowercase names”. As a matter of fact, if we used dashes in the file name, we would have ended up with a syntax error while trying to load it:

>>> import my-first-module
  File "<ipython-input-1-ef292d9e19fe>", line 1
    import my-first-module
             ^
SyntaxError: invalid syntax

Python treats - as a minus and does not understand your intentions.
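More generally, a module name has to be a valid Python identifier and must not be a reserved keyword. A quick way to test a candidate name is sketched below; the helper is_importable_name is ours, built on str.isidentifier() and the standard keyword module.

```python
import keyword

def is_importable_name(name):
    """Return True if `name` can appear in a plain `import name` statement."""
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ['myfirstmodule', 'my_first_module', 'my-first-module', '1module', 'class']
results = {name: is_importable_name(name) for name in candidates}
for name, ok in results.items():
    print(name, ok)
```

Only the first two candidates pass: the third contains dashes, the fourth starts with a digit, and the last one is a keyword.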

Write a structured module

Let’s now create a more structured module, with submodules and different files. We can start from the myfirstmodule.py file and create a directory structure:

$ mkdir yabba
$ cp myfirstmodule.py yabba/__init__.py

We have reused the same file created before, copied it into a directory called yabba and renamed it to __init__.py. The double underscore should ring a bell: this is a Python special name, and it represents the “main file” within a module, whereas the directory name now represents the module name.

This means that our module is called yabba, and if we import it, functions from __init__.py will be available:

>>> import yabba
>>> yabba.one()
this is my first function
>>> yabba.two()
this is my second function

We can create an additional file inside the yabba directory, say yabba/extra.py and have more functions there:

# yabba/extra.py

def three():
    print('this function will return the number three')
    return 3

We have effectively made extra a submodule of yabba. Let’s try:

>>> import yabba
>>> list(filter(lambda x: not x.startswith('__'), dir(yabba)))
['one', 'two']
>>> import yabba.extra
>>> yabba.extra.three()
this function will return the number three
3
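If you would like to replay the whole experiment without touching your working directory, the sketch below rebuilds a simplified stand-in of the yabba structure in a temporary directory (the functions return values instead of printing, to keep it short) and checks when the extra submodule becomes visible. The variable names root, package_dir, loaded_before and loaded_after are ours.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway copy of the package in a temporary directory.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, 'yabba')
os.mkdir(package_dir)
with open(os.path.join(package_dir, '__init__.py'), 'w') as handle:
    handle.write("def one():\n    return 'one'\n")
with open(os.path.join(package_dir, 'extra.py'), 'w') as handle:
    handle.write("def three():\n    return 3\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import yabba
loaded_before = hasattr(yabba, 'extra')  # submodule not loaded yet

import yabba.extra
loaded_after = hasattr(yabba, 'extra')   # now attached to the package

print(loaded_before, loaded_after, yabba.extra.three())
```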

What have I done with the filter function?

We have used the filter function above to list the functions we have defined in our module. Can you describe in detail what the commands above do? {% solution “Solution” %}

The dir(module) function lists all names (not necessarily functions, not necessarily defined by us) contained in a given imported module. We have used the filter() function to discard all names starting with two underscores: every item returned by dir() is passed as x to the lambda function, which returns True or False, telling filter() whether to keep or discard that element. In Python 3, filter() returns a lazy iterator rather than a list, so it has to be wrapped in list() to display the result.

{% endsolution %}
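For comparison, the same filtering can be written as a list comprehension, which many people find easier to read. The sketch below applies both forms to the math module (used here simply because it is always available) and checks that they agree; the names with_filter and with_comprehension are ours.

```python
import math

# filter() with a lambda, wrapped in list() to get an actual list.
with_filter = list(filter(lambda name: not name.startswith('__'), dir(math)))

# The equivalent list comprehension.
with_comprehension = [name for name in dir(math) if not name.startswith('__')]

print(with_filter == with_comprehension)
print(with_comprehension[:5])
```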

Run a module

A Python module is usually meant to be imported by other Python programs, but we can also write it in such a way that it can be run directly as a Python script.

Let’s write this special module and call it runnable.py:

#!/usr/bin/env python

# Module-level switch selecting the label style
long_format = False

def print_label(label, msg):
    # Prefix the message with the full label, or with its initial only
    if long_format:
        out = '{0}: {1}'.format(label.upper(), str(msg))
    else:
        out = '{0}-{1}'.format(label[0].upper(), str(msg))
    print(out)

def debug(msg):
    print_label('debug', msg)

def warning(msg):
    print_label('warning', msg)

if __name__ == '__main__':
    # Executed only when the file is run as a script
    print('*** Testing print functions ***')
    debug('This is a debug message')
    long_format = True
    warning('This is a warning message with a long label')
else:
    # Executed only when the file is imported as a module
    print('Module {0} is being imported'.format(__name__))

Now let’s make it executable:

$ chmod +x runnable.py

It can now be run as a normal executable from your shell:

$ ./runnable.py
*** Testing print functions ***
D-This is a debug message
WARNING: This is a warning message with a long label

There are two things worth noting here. First off, the first line is a “shebang”: it really has to be the first line in a file (it cannot be the second, or “one of the first”, or the first non-empty) and it tells the operating system that your executable text file has to be interpreted by the python interpreter found first in your PATH. Just use this line as it is.

Secondly, we notice we have a peculiar if condition with a block that gets executed when we run the file. __name__ is a special internal Python variable which is set to the module name when the module is imported. When the module is run, it is set to the special value "__main__".

The else: condition we have added is just to show what happens when you import the module instead:

>>> import runnable
Module runnable is being imported
>>> runnable.warning('hey I can use it from here too')
W-hey I can use it from here too

Note that the if condition is not needed to make the module runnable: without it, the lines in the if block would be executed anyway. Its purpose is to prevent those lines from being executed when you import the file as a module.

Please also note that module imports are typically silent, so the else: condition with a printout would not exist in real life.
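In practice, the script-only code is often gathered in a function and called from the if block, which keeps the module silent on import. The sketch below shows this common pattern; the function name main is a convention rather than a requirement, and json is used only as an example of an imported module.

```python
import json

def main():
    """Code that should run only when the file is executed directly."""
    return 'running as a script'

# An imported module reports its own name.
print(json.__name__)

# The file being executed sees '__main__'; an importing program would see
# the module name here instead.
print(__name__)

if __name__ == '__main__':
    print(main())
```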