Introduction to Machine Learning: Glossary

Key Points

  • The 3 main tasks of Machine Learning are regression, classification and generation.

  • Machine learning has many applications in high energy physics.

  • If you want to become proficient in machine learning, you need to practice.

Mathematical Foundations
  • In a particular machine learning problem, one needs an adequate dataset, a reasonable model, and a corresponding loss function. The choice of model and loss function needs to depend on the dataset.

  • Gradient descent is a procedure used to optimize a loss function corresponding to a specific model and dataset.

  • Beware of overfitting!

Neural Networks
  • Neural networks consist of an input layer, hidden layers and an output layer.

  • TensorFlow Playground is a cool place to visualize neural networks!

Comfort break!
  • You’ll be back.

  • They’re the jedi of the sea.

  • NumPy and pandas are the main libraries for scientific computing.

  • scikit-learn and PyTorch are two good options for machine learning in Python.

Data Discussion
  • One must know a bit about your data before any machine learning takes place.

  • It’s a good idea to visualise your data before any machine learning takes place.

Data Preprocessing
  • One must properly format data before any machine learning takes place.

  • Data can be formatted using scikit-learn functionality; using it effectively may take time to master.

Comfort break
  • Breaks are helpful in the service of learning

Model Training
  • Random forests and neural networks are two viable machine learning models.

Overfitting Check
  • It’s a good idea to check your models for overfitting.

Model Comparison
  • Many metrics exist to assess classifier performance.

  • Making plots is useful to assess classifier performance.

Applying To Experimental Data
  • It’s a good idea to check whether our machine learning models behave well with real experimental data.

  • That’s it!

OPTIONAL: TensorFlow
  • TensorFlow is another good option for machine learning in Python.

OPTIONAL: different dataset
  • The same algorithm can do fairly well across different datasets.

  • Different optimisation is needed for different datasets.