Introduction to Machine Learning

This tutorial explores Machine Learning using scikit-learn and PyTorch for applications in high energy physics.

Extended from a version developed by Luke Polson for the 2020 USATLAS Computing Bootcamp.

This lesson leads directly into the lesson “Machine Learning on GPU”, originally developed by Anna Scaife.



Machine learning is everywhere in modern “big-data” science. As physicists and big-data scientists, we benefit from knowing a bit about machine learning.

This lesson aims to build four skills:

  1. Understanding a bit about machine learning
  2. Preparing data for machine learning
  3. Training some machine learning models
  4. Comparing some machine learning models
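As a rough preview of how the last three skills fit together, here is a minimal sketch using scikit-learn. It assumes a synthetic dataset from `make_classification` as a stand-in for signal-versus-background events (it is not the dataset used in this lesson), and the function name `compare_models`, the two classifiers, and the accuracy metric are illustrative choices rather than the lesson's own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def compare_models(seed=0):
    """Prepare synthetic data, train two classifiers, return test accuracies."""
    # Assumption: synthetic two-class data standing in for a real physics dataset.
    X, y = make_classification(
        n_samples=1000, n_features=5, n_informative=3, random_state=seed
    )

    # Prepare: hold out a test set, then scale using training statistics only.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # Train: two candidate models (illustrative choices).
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }

    # Compare: accuracy on the held-out test set.
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, score in compare_models().items():
        print(f"{name}: test accuracy = {score:.3f}")
```

Fitting the scaler on the training split only keeps information from the test set out of the preprocessing step, so the comparison reflects performance on unseen data.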

The HSF Training Curriculum

This training module is part of the HSF Training Curriculum, a series of training modules that teaches HEP newcomers the software skills they need as they enter the field and, in parallel, instills best practices for writing software.

Videos are provided at the top of each page to help guide you. For the sections without coding (Introduction, Mathematical Foundations, Neural Networks) the videos essentially take you through the text, so choose whichever way you learn best: video or reading. For the remaining sections, the videos take you through the coding live.


Schedule

        Setup
            Setup ready for the lesson
00:00   1. Introduction
            What is machine learning?
            What role does machine learning have in high energy physics?
            What should I do if I want to get good at machine learning?
00:15   2. Mathematical Foundations
            What is the common terminology in machine learning?
            How does machine learning actually optimize?
            Is there anything I should be careful of?
00:35   3. Neural Networks
            What is a neural network?
            How can I visualize a neural network?
00:55   4. Comfort break!
            Get up, stretch out, take a short break.
01:10   5. Resources
            Where should I go if I want to get better at Python?
            What are the machine learning libraries in Python?
01:20   6. Data Discussion
            What dataset is being used?
            How do some of the variables look?
01:35   7. Data Preprocessing
            How must we organize our data such that it can be used in the machine learning libraries?
            Are we ready for machine learning yet?!
01:50   8. Comfort break
            Water? Juice? Coffee? Tea?
02:05   9. Model Training
            How does one train machine learning models in Python?
            What machine learning models might be appropriate?
02:35   10. Overfitting Check
            How do I check whether my model has overfitted?
02:45   11. Model Comparison
            How do you use the scikit-learn and PyTorch packages for machine learning?
            How do I see whether my machine learning model is doing alright?
03:05   12. Applying To Experimental Data
            What about real, experimental data?
            Are we there yet?
03:25   13. OPTIONAL: TensorFlow
            What other machine learning libraries can I use?
            How do classifiers built with different libraries compare?
03:45   14. OPTIONAL: different dataset
            What other datasets can I use?
            How do classifiers perform on different datasets?
04:25   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.