Mathematics for Machine Learning

Exercise 1 : Basic vector operations

Let

$$ \mathbf{u} = (1,2,-1)^\top, \quad \mathbf{v} = (0,1,3)^\top. $$


Compute:

  1. the inner product \( \langle \mathbf{u}, \mathbf{v} \rangle \),
  2. the norms \( \| \mathbf{u} \|_2 \) and \( \| \mathbf{v} \|_2 \),
  3. the projection of \( \mathbf{u} \) onto \( \mathbf{v} \).
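A minimal NumPy sketch to check the three quantities numerically (the library use and variable names are our own, not part of the exercise):

```python
import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([0.0, 1.0, 3.0])

dot = u @ v                          # inner product <u, v>
norm_u = np.linalg.norm(u)           # ||u||_2
norm_v = np.linalg.norm(v)           # ||v||_2
proj_u_on_v = (dot / (v @ v)) * v    # projection of u onto v

print(dot, norm_u, norm_v, proj_u_on_v)
```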

Exercise 2 : Linear system

Solve:

$$ A = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 1 & 1 \\ 0 & 2 & -1 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 1 \\ 3 \\ -1 \end{pmatrix}, \quad A\mathbf{x} = \mathbf{b}. $$
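A quick numerical cross-check of the solution, assuming NumPy is available (here `np.linalg.solve` stands in for the hand computation):

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [1.0,  1.0, 1.0],
              [0.0,  2.0, -1.0]])
b = np.array([1.0, 3.0, -1.0])

x = np.linalg.solve(A, b)            # exact solve for a square, invertible A
print(x, np.allclose(A @ x, b))      # residual check
```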

Exercise 3 : Design matrix and normal equations

Given the points \((1,2)\), \((2,3)\), \((3,5)\), form the design matrix \( X \) for the model

$$y = w_0 + w_1x$$


and write the normal equations for least squares.
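A short sketch of what the result should look like, assuming NumPy; the design matrix has rows \([1, x_i]\) and the normal equations are \( X^\top X w = X^\top y \):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 5.0])

X = np.column_stack([np.ones_like(x), x])   # rows [1, x_i] for y = w0 + w1 x
w = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations X^T X w = X^T y
print(X)
print(w)                                    # [w0, w1]
```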

Exercise 4 : Partial derivatives

For

$$ f(x,y) = 3x^2y - 4xy + \sin y, $$


compute \( \frac{\partial f}{\partial x} \) and \( \frac{\partial f}{\partial y} \).
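A symbolic check is possible with SymPy (assuming it is installed; the expected expressions are noted in the comments):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 3*x**2*y - 4*x*y + sp.sin(y)

print(sp.diff(f, x))   # expect 6*x*y - 4*y
print(sp.diff(f, y))   # expect 3*x**2 - 4*x + cos(y)
```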

Exercise 5 : Gradient and Hessian

Let

$$ f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top A\mathbf{x} - \mathbf{b}^\top\mathbf{x}, $$


where \( A \) is symmetric. Compute \( \nabla f(\mathbf{x}) \) and the Hessian \( H_f(\mathbf{x}) \), and state the condition on \( A \) under which \( f \) is convex.
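A numerical sanity check of the expected answer \( \nabla f(\mathbf{x}) = A\mathbf{x} - \mathbf{b} \), on randomly chosen symmetric \( A \) (the random data and tolerances are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
A = M + M.T                          # symmetric (not necessarily PSD)
b = rng.normal(size=3)
x = rng.normal(size=3)

f = lambda z: 0.5 * z @ A @ z - b @ z
grad = A @ x - b                     # expected gradient; the Hessian is A

# compare against a central finite-difference approximation
eps = 1e-6
fd = np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(3)])
print(np.allclose(grad, fd, atol=1e-4))

# f is convex exactly when A is positive semidefinite
print(np.all(np.linalg.eigvalsh(A) >= -1e-10))
```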

Exercise 6 : Chain rule

Let

$$ g(\mathbf{x}) = \log(1 + e^{\mathbf{a}^\top \mathbf{x}}), $$


where \( \mathbf{a} \) is a fixed vector. Compute \( \nabla g(\mathbf{x}) \).
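A finite-difference check of the expected answer \( \nabla g(\mathbf{x}) = \sigma(\mathbf{a}^\top\mathbf{x})\,\mathbf{a} \), where \( \sigma \) is the logistic sigmoid (random test data chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=4)
x = rng.normal(size=4)

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
g = lambda z: np.log1p(np.exp(a @ z))
grad = sigmoid(a @ x) * a            # expected gradient from the chain rule

eps = 1e-6
fd = np.array([(g(x + eps*e) - g(x - eps*e)) / (2*eps) for e in np.eye(4)])
print(np.allclose(grad, fd, atol=1e-4))
```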

Exercise 7 : One-dimensional gradient descent

For

$$ f(x) = x^2 - 4x + 5, $$


starting from \( x_0 = 0 \), apply gradient descent with \( \eta = 0.1 \) for three iterations.
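A minimal sketch of the iteration; the expected iterates in the comment follow from \( f'(x) = 2x - 4 \):

```python
def grad(x):
    return 2 * x - 4          # f'(x)

x, eta = 0.0, 0.1
for k in range(3):
    x = x - eta * grad(x)
    print(k + 1, x)           # expected iterates: 0.4, 0.72, 0.976
```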

Exercise 8 : Quadratic form

For

$$ f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}\mathbf{x}, $$


find the largest step size \( \eta \) for which gradient descent is guaranteed to converge.
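An empirical illustration of the threshold \( 2/\lambda_{\max} = 2/3 \): the iteration is \( \mathbf{x} \leftarrow (I - \eta A)\mathbf{x} \), which converges iff \( |1 - \eta\lambda| < 1 \) for every eigenvalue (the starting point and step counts below are arbitrary choices):

```python
import numpy as np

A = np.diag([3.0, 1.0])

def run(eta, steps=200):
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x = x - eta * (A @ x)
    return np.linalg.norm(x)

print(run(0.6))   # below 2/3: the norm shrinks toward 0
print(run(0.7))   # above 2/3: the norm blows up
```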

Exercise 9 : Least squares

For

$$ J(w) = \frac{1}{2m}\| Xw - y \|_2^2, $$


derive \( \nabla_w J(w) \) and write one gradient descent update.
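A sketch of the expected result \( \nabla_w J(w) = \tfrac{1}{m} X^\top (Xw - y) \) and one update step, on random data chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
w = np.zeros(n)
eta = 0.1

grad = X.T @ (X @ w - y) / m      # expected gradient: (1/m) X^T (X w - y)
w_new = w - eta * grad            # one gradient descent update
print(grad, w_new)
```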

Exercise 10 : Ridge closed form

Show that the ridge objective \( \tfrac{1}{2}\|Xw - y\|_2^2 + \tfrac{\lambda}{2}\|w\|_2^2 \) is minimized at

$$ w^\star = (X^\top X + \lambda I)^{-1} X^\top y. $$


Why does adding \( \lambda I \) improve the conditioning of \( X^\top X \)?
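A small numerical illustration of the conditioning effect, using deliberately near-collinear columns (the data and \( \lambda \) are arbitrary choices): \( \lambda I \) shifts every eigenvalue of \( X^\top X \) up by \( \lambda \), bounding the smallest one away from zero.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
X[:, 4] = X[:, 3] + 1e-6 * rng.normal(size=50)   # nearly collinear columns -> ill-conditioned
y = rng.normal(size=50)
lam = 1.0

G = X.T @ X
print(np.linalg.cond(G), np.linalg.cond(G + lam * np.eye(5)))   # condition number drops sharply

w_ridge = np.linalg.solve(G + lam * np.eye(5), X.T @ y)         # closed-form ridge solution
print(w_ridge)
```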

Exercise 11 : L1 vs L2 regularization

Explain why L1 produces sparse models while L2 does not.
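One way to see the contrast numerically is through the proximal (shrinkage) steps of the two penalties; this sketch is an illustration, not part of the exercise:

```python
import numpy as np

w = np.array([-1.5, -0.3, 0.05, 0.8, 2.0])
lam = 0.5

prox_l1 = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # soft-thresholding: small entries become exactly zero
prox_l2 = w / (1.0 + lam)                                  # L2 step: every entry shrinks, none hits zero
print(prox_l1)
print(prox_l2)
```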

Exercise 12 : Regularized gradient

For

$$ J_\lambda(w) = \frac{1}{2m}\|Xw - y\|_2^2 + \frac{\lambda}{2}\|w\|_2^2, $$


compute \( \nabla_w J_\lambda(w) \).
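A finite-difference check of the expected gradient \( \nabla_w J_\lambda(w) = \tfrac{1}{m} X^\top (Xw - y) + \lambda w \) (random test data chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)
lam, m = 0.1, X.shape[0]

J = lambda w: 0.5 / m * np.sum((X @ w - y) ** 2) + 0.5 * lam * np.sum(w ** 2)
grad = X.T @ (X @ w - y) / m + lam * w

eps = 1e-6
fd = np.array([(J(w + eps*e) - J(w - eps*e)) / (2*eps) for e in np.eye(3)])
print(np.allclose(grad, fd, atol=1e-5))
```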

Exercise 13 : Eigenpairs & PCA intuition

Given

$$ C = \frac{1}{m} X^\top X, $$


where the columns of \( X \) are assumed centered, explain why the eigenvectors of \( C \) with the largest eigenvalues are the principal components, and give the projection of a point \( \mathbf{x} \) onto the top principal directions.
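A small PCA sketch on synthetic, centered data (the anisotropic cloud and the choice of two components are our own assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.2])   # anisotropic point cloud
X = X - X.mean(axis=0)                                     # center the columns
m = X.shape[0]

C = X.T @ X / m
eigvals, eigvecs = np.linalg.eigh(C)          # ascending eigenvalues, orthonormal eigenvectors
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]                     # top-2 principal directions

x = X[0]
z = W.T @ x                                   # coordinates of x in the principal subspace
print(eigvals[order], z)
```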

Exercise 14 : Gradient descent rate

For

$$ f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top A\mathbf{x} - \mathbf{b}^\top\mathbf{x}, $$


where \( A \) is symmetric positive definite, show that gradient descent with step size

$$ 0 < \eta < \frac{2}{\lambda_{\max}} $$


converges linearly, at a rate governed by the condition number

$$ \kappa = \frac{\lambda_{\max}}{\lambda_{\min}}. $$
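A numerical illustration of the linear rate, using the classical step size \( \eta = 2/(\lambda_{\max}+\lambda_{\min}) \) on a small diagonal example (the matrix and step count are arbitrary choices); the per-step error ratio should approach \( (\kappa - 1)/(\kappa + 1) \):

```python
import numpy as np

A = np.diag([10.0, 1.0])                 # lambda_max = 10, lambda_min = 1, kappa = 10
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

eta = 2.0 / (10.0 + 1.0)                 # optimal constant step for this quadratic
x = np.zeros(2)
errs = []
for _ in range(30):
    x = x - eta * (A @ x - b)
    errs.append(np.linalg.norm(x - x_star))

ratios = np.array(errs[1:]) / np.array(errs[:-1])
print(ratios[-1], (10.0 - 1.0) / (10.0 + 1.0))   # empirical rate vs (kappa-1)/(kappa+1)
```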