Resources

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • Where should I go if I want to get better at Python?

  • What are the machine learning libraries in Python?

Objectives
  • Provide links to tutorials and textbooks that will help you get better at Python.

  • Provide links to machine learning library documentation.

Proficiency in Python

If you are unfamiliar with Python, the following tutorials will be useful:

For non-trivial machine learning tasks that occur in research, one needs to be proficient in the programming libraries discussed in the tutorial here. There are two main python libraries for scientific computing:

  1. NumPy: the go-to numerical library in Python. See the documentation. NumPy’s main purpose is the manipulation of multi-dimensional arrays: this includes both

    • slicing: taking “chunks” out of arrays. Slicing in Python means taking elements from one given index to another given index. For 1 dimensional arrays this reduces to selecting intervals, but these operations can become quite advanced for multidimesional arrays.
    • functional operations: applying a function to an entire array. This code is highly optimized: there is often a myth that Python is slower than languages like C++; while this may be true for things like for-loops, it is not true if you use NumPy properly.
  2. pandas: pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. See the documentation. The most important datatype in the pandas library is the DataFrame: a “spreadsheet-type object” with row and column names. It is preferred to use pandas DataFrames rather than NumPy arrays for managing data sets.

If you are unfamiliar with these packages, I would recommend reading the introduction to the documentation pages for NumPy slicing, NumPy operations and pandas DataFrames or sitting down with this textbook if you have it available and reading/coding along with chapters 4 and 5. In a few hours, you should have a good idea of how these packages work.

Machine Learning Libraries in Python

There are many machine libraries in Python, but the two discussed in this tutorial are scikit-learn and PyTorch.

  1. scikit-learn: features various classification, regression and clustering algorithms and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. See the documentation.
  2. PyTorch: PyTorch is an end-to-end open-source platform for machine learning. It is used for building and deploying machine learning models. See the documentation.

What about GPUs?

scikit-learn doesn’t have GPU support, therefore should only be used for training simple neural networks.

PyTorch does have GPU support, therefore can be used to train complicated neural network models that require a lot of GPU power.

Take a look at our tutorial “Machine Learning on GPUs” if you’re interested.

Note that the four Python programming packages discussed so far are interoperable: in particular, datatypes from NumPy and pandas are often used in packages like scikit-learn and PyTorch.

Your feedback is very welcome! Most helpful for us is if you “Improve this page on GitHub”. If you prefer anonymous feedback, please fill this form.

Key Points

  • NumPy and pandas are the main libraries for scientific computing.

  • scikit-learn and PyTorch are two good options for machine learning in Python.