The goal of this lecture is to understand the core principles and architectures of deep learning models. We will learn how deep learning extends traditional machine learning with multi-layer neural networks, explore the mathematical foundations of neural computation, and see how training and optimization methods such as gradient descent enable models to learn complex patterns from data.
Objectives
Describe how Deep Learning (DL) fits within the broader field of Machine Learning (ML) and Artificial Intelligence (AI).
Explain the structure and mathematical operations of a neural network, including layers, weights, biases, and activations.
Distinguish between linear and non-linear models, and understand how non-linearity enables complex decision boundaries.
Apply concepts of activation functions such as sigmoid, tanh, and ReLU, and interpret their mathematical properties.
Understand the training process of deep neural networks using gradient descent and backpropagation.
Discuss common problems such as overfitting, vanishing gradients, and how regularization and normalization techniques mitigate them.
Identify the structure and purpose of key architectures including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers.
Reading materials
CS231n: Convolutional Neural Networks for Visual Recognition – Stanford University
Goodfellow, Bengio & Courville (2016) – Deep Learning, MIT Press, Chapters 6–8
Hung-Yi Lee – Deep Learning Tutorial
Ismini Lourentzou – Introduction to Deep Learning
Sebastian Ruder (2016) – An Overview of Gradient Descent Optimization Algorithms
4.1 Machine Learning Basics
Machine learning enables computers to learn patterns from data without being explicitly programmed. The goal of an ML model is to learn a function \(f_\theta\) that maps inputs \(x \in X\) to outputs \(y \in Y\):
\[
f_\theta : X \rightarrow Y,
\]
where \(\theta\) represents model parameters (weights and biases).
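To make this concrete, here is a minimal sketch of a parametric model \(f_\theta\) in NumPy. The linear form \(f_\theta(x) = Wx + b\), the input/output dimensions, and the variable names are illustrative assumptions for this sketch, not part of the lecture's definition:

```python
import numpy as np

# A minimal parametric model f_theta(x) = W x + b.
# theta = (W, b) are the parameters that learning adjusts;
# the shapes (3 inputs -> 2 outputs) are arbitrary, chosen for illustration.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # weights
b = np.zeros(2)               # biases

def f_theta(x: np.ndarray) -> np.ndarray:
    """Map an input x in R^3 to an output y in R^2."""
    return W @ x + b

x = np.array([1.0, -0.5, 2.0])
y = f_theta(x)
print(y.shape)  # (2,)
```

Training then amounts to adjusting \(W\) and \(b\) so that the outputs of `f_theta` match the desired targets, which motivates the loss-minimization view below.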
The learning objective is to minimize the expected loss: