Introduction to Machine Learning

This tutorial explores machine learning using scikit-learn and PyTorch for applications in high energy physics.

Extended from a version developed by Luke Polson for the 2020 USATLAS Computing Bootcamp.

This lesson leads directly into a lesson “Machine Learning on GPU” originally developed by Anna Scaife.

Prerequisites

A Kaggle account, created as described in the Setup page

Basic Python knowledge, e.g. through the Software Carpentry Programming with Python lesson

Introduction

Machine learning is everywhere in modern "big-data" science. As physicists and big-data scientists, we benefit from knowing a bit about machine learning.

The aim of this lesson is to:

explore what it means to build a machine learning model
expand on concepts in machine learning that are essential to anyone working in big-data science

The skills we’ll focus on:

Understanding a bit about machine learning

Preparing data for machine learning

Training some machine learning models

Comparing some machine learning models

The HSF Training Curriculum

This training module is part of the Training Curriculum, a series of training modules that teaches HEP newcomers the software skills they need as they enter the field and, in parallel, instills best practices for writing software.

Videos are provided at the top of each page to help guide you. For the sections without coding (Introduction, Mathematical Foundations, Neural Networks) the videos essentially take you through the text, so choose whichever way you learn best: video or reading. For the remaining sections, the videos take you through the coding live.

Schedule

	Setup	Setup ready for the lesson
00:00	1. Introduction	What is machine learning? What role does machine learning have in high energy physics? What should I do if I want to get good at machine learning?
00:15	2. Mathematical Foundations	What is the common terminology in machine learning? How does machine learning actually optimize? Is there anything I should be careful of?
00:35	3. Neural Networks	What is a neural network? How can I visualize a neural network?
00:55	4. Comfort break!	Get up, stretch out, take a short break.
01:10	5. Resources	Where should I go if I want to get better at Python? What are the machine learning libraries in Python?
01:20	6. Data Discussion	What dataset is being used? How do some of the variables look?
01:35	7. Data Preprocessing	How must we organize our data such that it can be used in the machine learning libraries? Are we ready for machine learning yet?!
01:50	8. Comfort break	Water? Juice? Coffee? Tea?
02:05	9. Model Training	How does one train machine learning models in Python? What machine learning models might be appropriate?
02:35	10. Overfitting Check	How do I check whether my model has overfitted?
02:45	11. Model Comparison	How do you use the scikit-learn and PyTorch packages for machine learning? How do I see whether my machine learning model is doing alright?
03:05	12. Applying To Experimental Data	What about real, experimental data? Are we there yet?
03:25	13. OPTIONAL: TensorFlow	What other machine learning libraries can I use? How do classifiers built with different libraries compare?
03:45	14. OPTIONAL: different dataset	What other datasets can I use? How do classifiers perform on different datasets?
04:25	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.