
2.8 Orthogonal Projections: Projection onto Lines and General Subspaces


Projections are an important class of linear transformations (besides rotations and reflections) and play an important role in graphics, coding theory, statistics, and machine learning. In machine learning, we often deal with data that is high-dimensional. High-dimensional data is often hard to analyze or visualize. However, high-dimensional data quite often possesses the property that only a few dimensions contain most of the information, while most other dimensions are not essential for describing key properties of the data. When we compress or visualize high-dimensional data, we lose information. To minimize this compression loss, we ideally find the most informative dimensions in the data. More specifically, we can project the original high-dimensional data onto a lower-dimensional feature space and work in this lower-dimensional space to learn more about the dataset and extract relevant patterns.

Machine learning algorithms, such as principal component analysis (PCA) by Pearson (1901) and Hotelling (1933) and deep neural networks (e.g., deep auto-encoders (Deng et al., 2010)), heavily exploit the idea of dimensionality reduction. For a given lower-dimensional subspace, orthogonal projections of high-dimensional data retain as much information as possible and minimize the difference/error between the original data and the corresponding projection.

Definition: Let $V$ be a vector space and $U \subseteq V$ a subspace of $V$. A linear mapping $\pi:V \to U$ is called a projection if $\pi^2 = \pi \circ \pi = \pi$.

Since linear mappings can be expressed by transformation matrices, the definition applies equally to a special kind of transformation matrix, the projection matrices $P_\pi$, which exhibit the property that $P_\pi^2 = P_\pi$.
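As a quick illustration (a minimal sketch; the matrix below is simply the projection onto the $x$-axis in $\mathbb{R}^2$, chosen for illustration), idempotence is easy to check numerically:

import numpy as np
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])        # projects (x1, x2) onto (x1, 0)
print(np.allclose(P.dot(P), P))   # True: applying the projection twice changes nothing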
Projection onto One-Dimensional Subspaces (Lines)
Assume we are given a line (one-dimensional subspace) through the origin with basis vector $b \in \mathbb{R}^n$. The line is a one-dimensional subspace $U \subseteq \mathbb{R}^n$ spanned by $b$. When we project  $x \in \mathbb{R}^n$ onto $U$, we seek the vector $\pi_U(x) \in U$ that is closest to $x$. 

Using geometric arguments:
  • The projection $\pi_U(x)$ being closest to $x$ implies that $\left \| x -\pi_U(x) \right \|$ is minimal. It follows that $\pi_U(x)-x$ is orthogonal to $b$. This means that their inner product is zero: $<\pi_U(x)-x,b>=0$.
  • The projection $\pi_U(x)$ of $x$ onto $U$ must be an element of $U$ and, therefore, a multiple of the basis vector $b$ that spans $U$. Hence $\pi_U(x)=\lambda b$ for some $\lambda \in \mathbb{R}$.
The following three steps determine the projection coordinate $\lambda$, the projection $\pi_U(x) \in U$, and the projection matrix $P_\pi$.

1. Let us find the coordinate $\lambda$.

$<x-\pi_U(x),b> = 0 \Leftrightarrow <x-\lambda b,b> = 0$
$<x,b>=\lambda <b,b> $
$\lambda= \frac{<x,b>}{<b,b>} $
$\lambda= \frac{<b,x>}{<b,b>}$
$\lambda= \frac{b^Tx}{b^Tb} $
$\lambda= \frac{b^Tx}{\left \| b \right \|^2}$
If $\left \| b \right \|=1$, then the coordinate $\lambda$ of the projection is given by $b^Tx$.
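As a quick numerical check (a minimal sketch; the vectors $x=[2,3]$ and $b=[1,1]$ are the same ones used in the example further below), the coordinate $\lambda$ can be computed directly:

import numpy as np
x = np.array([2.0, 3.0])
b = np.array([1.0, 1.0])
lam = b.dot(x) / b.dot(b)   # lambda = b^T x / ||b||^2
print(lam)                  # 2.5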

2. Finding the projection point $\pi_U(x) \in U$. Since $\pi_U(x)=\lambda b$, we immediately obtain that
$\pi_U(x)=\lambda b = \frac{<x,b>}{\left \| b \right\|^2}b = \frac{b^T x}{\left \| b \right \|^2}b$
If we use the dot product as the inner product, the norm of the projection is
$\left \| \pi_U(x) \right \|=\frac{| b^Tx|}{\left \| b \right \|^2}\left \| b \right \|=|\cos \omega|\, \left \| x \right \| \left \| b \right \|\frac{\left \| b \right \|}{\left \| b \right \|^2}=|\cos \omega|\,\left \| x\right \|$

Here $\omega$ is the angle between $x$ and $b$.
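The relation $\left \| \pi_U(x) \right \| = |\cos \omega|\,\left \| x \right \|$ can also be verified numerically (a minimal sketch, reusing $x=[2,3]$ and $b=[1,1]$ from the sketch above):

import numpy as np
x = np.array([2.0, 3.0])
b = np.array([1.0, 1.0])
proj = (b.dot(x) / b.dot(b)) * b                              # pi_U(x) = lambda * b
cos_w = x.dot(b) / (np.linalg.norm(x) * np.linalg.norm(b))    # cos of the angle between x and b
print(np.linalg.norm(proj))                                   # 3.5355...
print(abs(cos_w) * np.linalg.norm(x))                         # 3.5355...  (same value)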

3. There exists a projection matrix $P_\pi$ such that $\pi_U(x)=P_\pi x$. Since
$\pi_U(x)=\lambda b = b \lambda = b\frac{b^Tx}{\left \| b \right\|^2}=\frac{bb^T}{\left \| b \right\|^2}x$,
we immediately see that
$P_\pi=\frac{bb^T}{\left \| b \right\|^2}$

Note that $bb^T$ (and, consequently, $P_\pi$) is a symmetric matrix of rank 1, and $\left \| b \right\|^2$ is a scalar. The projection matrix $P_\pi$ projects any vector $x \in \mathbb{R}^n$ onto the line through the origin with direction $b$ (equivalently, the subspace $U$ spanned by $b$).
Remark. The projection $\pi_U(x) \in \mathbb{R}^n$ is still an $n$-dimensional vector and not a scalar. However, we no longer require $n$ coordinates to represent the projection; a single coordinate $\lambda$ suffices if we express it with respect to the basis vector $b$ that spans the subspace $U$.

Remark: Provided it is nonzero, $\pi_U(x)$ is an eigenvector of the projection matrix $P_\pi$ with eigenvalue 1, since $P_\pi \pi_U(x) = P_\pi^2 x = P_\pi x = \pi_U(x)$.
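A quick numerical check of this remark (a minimal sketch, using the same data as the first example below):

import numpy as np
x = np.array([2.0, 3.0])
b = np.array([1.0, 1.0])
P = np.outer(b, b) / b.dot(b)       # projection matrix onto span(b)
xp = P.dot(x)                       # pi_U(x)
print(np.allclose(P.dot(xp), xp))   # True: P maps pi_U(x) to itself, i.e. eigenvalue 1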

Example

Projecting the point [2, 3] onto the line through the origin spanned by [1, 1]:

import numpy as np

x = np.array([2, 3])          # point to project
b = np.array([1, 1])          # basis vector spanning the line U

print("Projection Matrix b.bT/bT.b")
P = np.outer(b, b) / np.dot(b, b)   # P = b b^T / (b^T b)
print(P)

print("Projection Point")
xp = P.dot(x)                       # pi_U(x) = P x
print(xp)

Output

Projection Matrix b.bT/bT.b
 [[0.5 0.5]
 [0.5 0.5]]
 Projection Point 
[2.5 2.5]

Projection of the point [1, 1, 1] onto the line through the origin spanned by [1, 2, 2]:

import numpy as np

x = np.array([1, 1, 1])       # point to project
b = np.array([1, 2, 2])       # basis vector spanning the line U

print("Projection Matrix b.bT/bT.b")
P = np.outer(b, b) / np.dot(b, b)   # P = b b^T / (b^T b)
print(P)

print("Projection Point")
xp = P.dot(x)                       # pi_U(x) = P x
print(xp)

print("applying projection again will not change anything")
xp = P.dot(xp)                      # projecting the projection leaves it unchanged
print(xp)

print("Squaring the projection matrix P will give P")
print(P.dot(P))                     # P^2 = P (idempotence)

Output

Projection Matrix b.bT/bT.b 
[[0.11111111 0.22222222 0.22222222] 
 [0.22222222 0.44444444 0.44444444] 
 [0.22222222 0.44444444 0.44444444]] 
Projection Point 
[0.55555556 1.11111111 1.11111111] 
applying projection again will not change anything 
[0.55555556 1.11111111 1.11111111] 
Squaring the projection matrix P will give P 
[[0.11111111 0.22222222 0.22222222]
 [0.22222222 0.44444444 0.44444444] 
 [0.22222222 0.44444444 0.44444444]]

Projection onto General Subspaces
Let us consider orthogonal projections of vectors $x \in \mathbb{R}^n$ onto lower-dimensional subspaces $U \subseteq \mathbb{R}^n$ with $\dim(U) = m \ge 1$.



Assume that $(b_1,\ldots,b_m)$ is an ordered basis of $U$. Any projection $\pi_U(x)$ onto $U$ is necessarily an element of $U$. Therefore, it can be represented as a linear combination of the basis vectors $b_1,\ldots,b_m$ of $U$, such that $\pi_U(x)=\sum_{i=1}^{m}\lambda_ib_i$.
1. Let us find the coordinates $\lambda_1,\ldots,\lambda_m$ of the projection with respect to the basis of $U$:
$\pi_U(x)=\sum_{i=1}^{m}\lambda_ib_i=B\lambda$
where $B=[b_1,\ldots,b_m] \in \mathbb{R}^{n \times m}$ and $\lambda=[\lambda_1,\ldots,\lambda_m]^T \in \mathbb{R}^m$.
Note that the vector connecting $x \in \mathbb{R}^n$ and $\pi_U(x) \in U$ must be orthogonal to all basis vectors of $U$ (this is the condition for $\pi_U(x)$ to be closest to $x$). Therefore, we obtain $m$ simultaneous conditions:
$<b_1,x-\pi_U(x)> = b_1^T(x-\pi_U(x))=0$
$\vdots$
$<b_m,x-\pi_U(x)> = b_m^T(x-\pi_U(x))=0$
Since $\pi_U(x)=B\lambda$, this becomes
$b_1^T(x-B\lambda)=0$
$\vdots$
$b_m^T(x-B\lambda)=0$

So we obtain $m$ homogeneous equations, which can be written compactly as
$\begin{bmatrix}
b_1^T\\
\vdots\\
b_m^T
\end{bmatrix}\left( x-B\lambda \right)=0$

$B^T(x-B\lambda)=0$
$B^Tx=B^TB\lambda$

The last expression is called the normal equation. Since $b_1,\ldots,b_m$ form a basis of $U$ and are therefore linearly independent, the matrix $B^TB \in \mathbb{R}^{m \times m}$ is regular and can be inverted. This allows us to solve for the coefficients/coordinates:
$\lambda=(B^TB)^{-1}B^Tx$
The matrix $(B^TB)^{-1}B^T$ is also called the pseudo-inverse of $B$; it can be computed for non-square matrices $B$. It only requires that $B^TB$ is positive definite, which is the case if $B$ has full column rank.
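In numpy, the pseudo-inverse is available as np.linalg.pinv. A minimal sketch (using as $B$ the full-rank basis matrix from the example further below) confirms that it agrees with $(B^TB)^{-1}B^T$; in practice, the coordinates are usually obtained by solving the normal equations $B^TB\lambda = B^Tx$ rather than by forming the inverse explicitly:

import numpy as np
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                            # full column rank
pinv_formula = np.linalg.inv(B.T.dot(B)).dot(B.T)     # (B^T B)^{-1} B^T
print(np.allclose(np.linalg.pinv(B), pinv_formula))   # True for a full-column-rank B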

2. Find the projection $\pi_U(x) \in U$. We know that $\pi_U(x)=B\lambda$, so
$\pi_U(x)=B(B^TB)^{-1}B^Tx$

3. Find the projection matrix $P_\pi$:

$P_\pi=B(B^TB)^{-1}B^T$

Note: The projection onto a one-dimensional subspace is a special case: if $\dim(U)=1$, then $B^TB \in \mathbb{R}$ is a scalar and we can rewrite the projection matrix $P_\pi=B(B^TB)^{-1}B^T$ as $P_\pi=\frac{BB^T}{B^TB}$, which is exactly the projection matrix for lines derived earlier.

Example
Projecting the point [6, 0, 0] onto the subspace spanned by [1, 1, 1] and [0, 1, 2]:

import numpy as np

x = np.array([6, 0, 0])                    # point to project
B = np.array([[1, 0], [1, 1], [1, 2]])     # basis vectors of U as columns of B

print("Projection Matrix B.(BT.B)^-1.BT")
P = np.dot(B.T, B)          # B^T B
P = np.linalg.inv(P)        # (B^T B)^{-1}
P = P.dot(B.T)              # (B^T B)^{-1} B^T  (the pseudo-inverse of B)
P = B.dot(P)                # B (B^T B)^{-1} B^T
print(P)

print("Projection Point")
xp = P.dot(x)               # pi_U(x) = P x
print(xp)

print("applying projection will not change anything")
xp = P.dot(xp)              # projecting the projection leaves it unchanged
print(xp)

print("Squaring the projection matrix P will give P")
print(P.dot(P))             # P^2 = P (idempotence)
Output
Projection Matrix B.(BT.B)^-1.BT 
[[ 0.83333333 0.33333333 -0.16666667] 
 [ 0.33333333 0.33333333 0.33333333] 
 [-0.16666667 0.33333333 0.83333333]] 
Projection Point
 [ 5. 2. -1.] 
applying projection will not change anything 
[ 5. 2. -1.] 
Squaring the projection matrix P will give P 
[[ 0.83333333 0.33333333 -0.16666667] 
 [ 0.33333333 0.33333333 0.33333333]
 [-0.16666667 0.33333333 0.83333333]]

Remarks: The projections $\pi_U(x)$ are still vectors in $\mathbb{R}^n$, although they lie in an $m$-dimensional subspace $U \subseteq \mathbb{R}^n$. However, to represent a projected vector we only need the $m$ coordinates $\lambda_1,\ldots,\lambda_m$ with respect to the basis vectors $b_1,\ldots,b_m$ of $U$.
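As a short sketch, the two coordinates for the example above ($x=[6,0,0]$ projected onto the span of $[1,1,1]$ and $[0,1,2]$) can be obtained by solving the normal equations $B^TB\lambda = B^Tx$:

import numpy as np
x = np.array([6.0, 0.0, 0.0])
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
lam = np.linalg.solve(B.T.dot(B), B.T.dot(x))   # coordinates of the projection w.r.t. the basis of U
print(lam)          # [ 5. -3.]
print(B.dot(lam))   # [ 5.  2. -1.]  -- the projection pi_U(x), as computed earlier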

Projections allow us to handle situations where a linear system $Ax = b$ has no solution. Recall that this means that $b$ does not lie in the span of $A$, i.e., the vector $b$ does not lie in the subspace spanned by the columns of $A$. Given that the linear equation cannot be solved exactly, we can find an approximate solution. The idea is to find the vector in the subspace spanned by the columns of $A$ that is closest to $b$, i.e., we compute the orthogonal projection of $b$ onto the subspace spanned by the columns of $A$. This problem arises often in practice, and the solution is called the least-squares solution.

The projection error is the norm of the difference vector between the original vector and its projection onto $U$, i.e., $\left \| x- \pi_U(x) \right \|$. This displacement vector is orthogonal to all basis vectors of $U$.

If the basis $\{b_1,\ldots,b_m\}$ is orthonormal, then the projection equation simplifies to
$\pi_U(x)=BB^Tx$ since $B^TB=I$
The coordinates $\lambda$ become
$\lambda=B^Tx$
This means that we no longer have to compute the inverse $(B^TB)^{-1}$, which saves computation time.
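A minimal sketch of this simplification: here an orthonormal basis of $U$ is obtained from $B$ via a QR decomposition (one convenient way to orthonormalize, not the only one), and the simplified formula with the orthonormal basis reproduces the projection computed earlier:

import numpy as np
x = np.array([6.0, 0.0, 0.0])
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
Q, _ = np.linalg.qr(B)                      # columns of Q: orthonormal basis of U = span(B)
print(np.allclose(Q.T.dot(Q), np.eye(2)))   # True: Q^T Q = I, i.e. the columns are orthonormal
print(Q.dot(Q.T.dot(x)))                    # [ 5.  2. -1.]  -- same projection as before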



Example- University Question
Find the projection of the vector $x=\begin{bmatrix} 6\\0\\0\\ \end{bmatrix}$ in $\mathbb{R}^3$ onto the span of $\begin{bmatrix} 1\\1\\1\\ \end{bmatrix}$ and $\begin{bmatrix} 0\\1\\2\\ \end{bmatrix}$.

The projection matrix is

$P_\pi=B(B^TB)^{-1}B^T$
$=\begin{bmatrix} 1 & 0 \\ 1 & 1\\1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 3\\ 3 & 5 \\ \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 & 1 \\0 & 1 & 2 \end{bmatrix}$

$=\begin{bmatrix} 1 & 0 \\ 1 & 1\\1 & 2 \end{bmatrix} \begin{bmatrix} 5/6 & -1/2\\ -1/2 & 1/2 \\ \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\0 & 1 & 2 \end{bmatrix}$

$=\begin{bmatrix} 1 & 0 \\ 1 & 1\\1 & 2 \end{bmatrix} \begin{bmatrix} 5/6 & 1/3 & -1/6\\ -1/2 & 0 & 1/2 \\ \end{bmatrix} $
$= \begin{bmatrix} 5/6 & 1/3 & -1/6\\ 1/3 & 1/3 & 1/3\\ -1/6 & 1/3 & 5/6 \end{bmatrix} $
The projection of the vector $x$ is
$\pi_U(x)=P_\pi x$
$= \begin{bmatrix} 5/6 & 1/3 & -1/6\\ 1/3 & 1/3 & 1/3\\ -1/6 & 1/3 & 5/6 \end{bmatrix}\begin{bmatrix} 6 \\ 0 \\ 0  \end{bmatrix} $

$=\begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}$






