Mathematics for Machine Learning
Exercise 1 : Basic vector operations
Let
Compute:
- \( \langle \mathbf{u}, \mathbf{v} \rangle \)
- \( \| \mathbf{u} \|_2 \) and \( \| \mathbf{v} \|_2 \)
- The projection of \( \mathbf{u} \) onto \( \mathbf{v} \).
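A minimal NumPy sketch of these three computations, using placeholder vectors since \( \mathbf{u} \) and \( \mathbf{v} \) are not reproduced above:

```python
import numpy as np

# Placeholder vectors; the exercise's u and v are not reproduced above.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 0.0, -1.0])

inner = np.dot(u, v)                      # <u, v>
norm_u = np.linalg.norm(u)                # ||u||_2
norm_v = np.linalg.norm(v)                # ||v||_2
proj_u_on_v = (inner / np.dot(v, v)) * v  # projection of u onto v

print(inner, norm_u, norm_v, proj_u_on_v)
```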
Exercise 2 : Linear system
Solve:
$$ A = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 1 & 1 \\ 0 & 2 & -1 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 1 \\ 3 \\ -1 \end{pmatrix}, \quad A\mathbf{x} = \mathbf{b}. $$
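A quick NumPy check of the solution, using the \( A \) and \( \mathbf{b} \) given above; useful for verifying hand elimination:

```python
import numpy as np

# Matrix and right-hand side from the exercise statement.
A = np.array([[2.0, -1.0, 0.0],
              [1.0,  1.0, 1.0],
              [0.0,  2.0, -1.0]])
b = np.array([1.0, 3.0, -1.0])

x = np.linalg.solve(A, b)     # solve A x = b
print(x)
print(np.allclose(A @ x, b))  # sanity check of the solution
```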
Exercise 3 : Data interpretation
Given points (1,2), (2,3), (3,5), form the design matrix \( X \) for
and write the normal equations for least squares.
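A sketch of the design matrix and normal equations, assuming the intended model is a straight line \( y \approx w_0 + w_1 x \) (the model formula is not reproduced above):

```python
import numpy as np

# Points from the exercise statement; the model y = w0 + w1*x is assumed.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 5.0])

X = np.column_stack([np.ones_like(x), x])  # design matrix with columns [1, x]
# Normal equations: (X^T X) w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
print(X)
print(w)  # least-squares intercept and slope
```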
Exercise 4 : Partial derivatives
For
compute \( \frac{\partial f}{\partial x} \) and \( \frac{\partial f}{\partial y} \).
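A symbolic sketch with SymPy, using a placeholder \( f(x, y) \) since the exercise's function is not reproduced above:

```python
import sympy as sp

# Placeholder function chosen only to illustrate partial differentiation.
x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(x * y)

df_dx = sp.diff(f, x)  # 2*x*y + y*cos(x*y)
df_dy = sp.diff(f, y)  # x**2 + x*cos(x*y)
print(df_dx, df_dy)
```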
Exercise 5 : Gradient and Hessian
Let
compute \( \nabla f(\mathbf{x}) \), \( H(f) \), and state when \( f \) is convex.
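A sketch assuming a placeholder quadratic \( f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top Q \mathbf{x} + \mathbf{b}^\top \mathbf{x} \) (the exercise's \( f \) is not reproduced above); checking convexity via the eigenvalues of the Hessian is the general technique:

```python
import numpy as np

# For f(x) = 0.5 * x^T Q x + b^T x with symmetric Q:
# grad f(x) = Q x + b, Hessian H = Q, and f is convex iff Q is PSD.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return Q @ x + b

H = Q                               # Hessian is constant for a quadratic
eigvals = np.linalg.eigvalsh(H)
print(grad(np.zeros(2)))
print(eigvals, "convex" if np.all(eigvals >= 0) else "not convex")
```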
Exercise 6 : Chain rule
Let
compute \( \nabla g(\mathbf{x}) \).
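A symbolic sketch of the chain rule with SymPy, using placeholder inner and outer functions since neither is reproduced above:

```python
import sympy as sp

# Placeholder composition g = f(h) with f(t) = t^2 and h(x1, x2) = x1*x2 + x1,
# used only to illustrate grad g = f'(h) * grad h.
x1, x2 = sp.symbols('x1 x2')
h = x1 * x2 + x1
g = h**2

grad_g = [sp.diff(g, v) for v in (x1, x2)]
print(grad_g)  # each component equals 2*h * dh/dxi
```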
Exercise 7 : One-dimensional GD
For
starting from \( x_0 = 0 \), apply gradient descent with \( \eta = 0.1 \) for three iterations.
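A sketch of the three updates with the stated \( x_0 = 0 \) and \( \eta = 0.1 \), using a placeholder objective \( f(x) = (x - 2)^2 \) since \( f \) is not reproduced above:

```python
# Placeholder objective f(x) = (x - 2)^2, so f'(x) = 2*(x - 2).
def f_prime(x):
    return 2.0 * (x - 2.0)

x = 0.0      # x0 = 0, as in the statement
eta = 0.1    # step size from the statement
for k in range(3):             # three iterations of x <- x - eta * f'(x)
    x = x - eta * f_prime(x)
    print(k + 1, x)
```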
Exercise 8 : Quadratic form
For
find the largest \( \eta \) guaranteeing convergence.
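A reminder of the standard criterion, assuming the objective is a quadratic \( f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top A \mathbf{x} \) with \( A \) symmetric positive definite (the specific quadratic is not reproduced above). The gradient descent iterates satisfy

$$ \mathbf{x}_{k+1} = (I - \eta A)\,\mathbf{x}_k, $$

so the iteration converges for every starting point exactly when \( |1 - \eta\lambda| < 1 \) for all eigenvalues \( \lambda \) of \( A \), i.e. when \( \eta < 2 / \lambda_{\max}(A) \).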
Exercise 9 : Least squares
For
derive \( \nabla_w J(w) \) and write one gradient descent update.
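Assuming the usual least-squares objective \( J(w) = \tfrac{1}{2}\|Xw - \mathbf{y}\|_2^2 \) (the exact \( J \) is not reproduced above), a NumPy sketch of the gradient and one update on placeholder data:

```python
import numpy as np

# For J(w) = 0.5 * ||X w - y||^2, the gradient is X^T (X w - y).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # placeholder data
y = rng.normal(size=20)

w = np.zeros(3)
eta = 0.01
grad = X.T @ (X @ w - y)              # gradient of J at w
w = w - eta * grad                    # one gradient descent update
print(w)
```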
Exercise 10 : Ridge closed form
Show that
Why does \( \lambda I \) improve conditioning?
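A numerical illustration of the standard ridge closed form \( w = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y} \) and of how adding \( \lambda I \) lowers the condition number of the matrix being inverted, on placeholder data with a nearly collinear column:

```python
import numpy as np

# Placeholder data with two almost identical columns, so X^T X is
# ill-conditioned; adding lambda*I lifts its smallest eigenvalue.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X[:, 3] = X[:, 0] + 1e-6 * rng.normal(size=50)
y = rng.normal(size=50)
lam = 0.1

A = X.T @ X
w_ridge = np.linalg.solve(A + lam * np.eye(4), X.T @ y)
print(np.linalg.cond(A), np.linalg.cond(A + lam * np.eye(4)))
print(w_ridge)
```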
Exercise 11 : L1 vs L2 regularization
Explain why L1 regularization produces sparse weight vectors while L2 regularization does not.
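A one-dimensional illustration of the sparsity argument: minimizing \( \tfrac{1}{2}(w - a)^2 + \lambda |w| \) gives the soft-threshold of \( a \) (exactly zero when \( |a| \le \lambda \)), while minimizing \( \tfrac{1}{2}(w - a)^2 + \tfrac{\lambda}{2} w^2 \) gives \( a/(1 + \lambda) \), which shrinks but never reaches zero. The values of \( a \) and \( \lambda \) below are arbitrary:

```python
import numpy as np

lam = 1.0
for a in [0.3, 0.8, 2.0]:
    w_l1 = np.sign(a) * max(abs(a) - lam, 0.0)  # L1: soft-thresholding
    w_l2 = a / (1.0 + lam)                      # L2: plain shrinkage
    print(a, w_l1, w_l2)
```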
Exercise 12 : Regularized gradient
For
compute \( \nabla_w J_\lambda(w) \).
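Assuming the usual ridge-regularized objective \( J_\lambda(w) = \tfrac{1}{2}\|Xw - \mathbf{y}\|_2^2 + \tfrac{\lambda}{2}\|w\|_2^2 \) (the exact \( J_\lambda \) is not reproduced above), the gradient takes the form

$$ \nabla_w J_\lambda(w) = X^\top (Xw - \mathbf{y}) + \lambda w. $$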
Exercise 13 : Eigenpairs & PCA intuition
Given
explain why the eigenvectors with the largest eigenvalues are the principal components, and find the projection of a point \( \mathbf{x} \) onto them.
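A NumPy sketch of the mechanics, using a placeholder covariance matrix \( \Sigma \) since the matrix in the statement is not reproduced above: the eigenvectors with the largest eigenvalues are the directions of greatest variance, and projecting a point amounts to dot products with those directions.

```python
import numpy as np

# Placeholder symmetric covariance matrix.
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # sort descending by variance
components = eigvecs[:, order]             # principal directions as columns

x = np.array([1.0, -1.0])                  # placeholder point
scores = components.T @ x                  # coordinates along each component
print(eigvals[order], scores)
```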
Exercise 14 : Gradient descent rate
For
show that gradient descent with
converges linearly with rate involving
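A standard form of this result, assuming a quadratic objective whose Hessian \( A \) has eigenvalues in \( [\mu, L] \) with \( 0 < \mu \le L \) (the exact objective and step size are not reproduced above): the gradient descent error contracts as

$$ \|\mathbf{x}_{k+1} - \mathbf{x}^*\|_2 \le \max\{|1 - \eta\mu|,\ |1 - \eta L|\}\,\|\mathbf{x}_k - \mathbf{x}^*\|_2, $$

and the choice \( \eta = \tfrac{2}{\mu + L} \) gives the linear rate \( \tfrac{L - \mu}{L + \mu} = \tfrac{\kappa - 1}{\kappa + 1} \), where \( \kappa = L/\mu \) is the condition number.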