
2.8 Orthogonal Projections: Projection onto Lines and General Subspaces


Projections are an important class of linear transformations (besides rotations and reflections) and play an important role in graphics, coding theory, statistics and machine learning. In machine learning, we often deal with data that is high-dimensional. High-dimensional data is often hard to analyze or visualize. However, high-dimensional data quite often possesses the property that only a few dimensions contain most of the information, and most other dimensions are not essential to describe key properties of the data. When we compress or visualize high-dimensional data, we lose information. To minimize this compression loss, we ideally find the most informative dimensions in the data. More specifically, we can project the original high-dimensional data onto a lower-dimensional feature space and work in this lower-dimensional space to learn more about the dataset and extract relevant patterns.

Machine learning algorithms, such as principal component analysis (PCA) by Pearson (1901) and Hotelling (1933) and deep neural networks (e.g., deep auto-encoders (Deng et al., 2010)), heavily exploit the idea of dimensionality reduction. For a given lower-dimensional subspace, orthogonal projections of high-dimensional data retain as much information as possible and minimize the difference/error between the original data and the corresponding projection.

Definition: Let $V$ be a vector space and $U \subseteq V$ a subspace of $V$. A linear mapping $\pi:V \to U$ is called a projection if $\pi^2= \pi \circ \pi = \pi$.

Since linear mappings can be expressed by transformation matrices, the definition applies equally to a special kind of transformation matrix, the projection matrix $P_\pi$, which exhibits the property that $P^2_\pi=P_\pi$.
Projection onto One-Dimensional Subspaces (Lines)
Assume we are given a line (one-dimensional subspace) through the origin with basis vector $b \in \mathbb{R}^n$. The line is a one-dimensional subspace $U \subseteq \mathbb{R}^n$ spanned by $b$. When we project  $x \in \mathbb{R}^n$ onto $U$, we seek the vector $\pi_U(x) \in U$ that is closest to $x$. 

Using geometric arguments:
  • The projection $\pi_U(x)$ is closest to $x$, which implies that $\left \| x -\pi_U(x) \right \|$ is minimal. It follows that $\pi_U(x)-x$ is orthogonal to $b$, which means that their dot product is zero: $<\pi_U(x)-x,b>=0$
  • The projection $\pi_U(x)$ of $x$ onto $U$ must be an element of $U$ and, therefore, a multiple of the basis vector $b$ that spans $U$. Hence $\pi_U(x)=\lambda b$ for some $\lambda \in \mathbb{R}$
The following three steps determine the coordinate $\lambda$, the projection $\pi_U(x) \in U$, and the projection matrix $P_\pi$.

1. Find the coordinate $\lambda$.

$<x-\pi_U(x),b> = 0 \Leftrightarrow <x-\lambda b,b> = 0$
$<x,b>=\lambda <b,b> $
$\lambda= \frac{<x,b>}{<b,b>} $
$\lambda= \frac{<b,x>}{<b,b>}$
$\lambda= \frac{b^Tx}{b^Tb} $
$\lambda= \frac{b^Tx}{\left \| b \right \|^2}$
If $\left \| b \right \|=1$, then the coordinate $\lambda$ of the projection is given by $b^Tx$.
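For instance, with the dot product as the inner product and the numbers used in the first code example below, $x=[2,3]^T$ and $b=[1,1]^T$, the coordinate is
$\lambda = \frac{b^Tx}{\left \| b \right \|^2} = \frac{1\cdot 2 + 1\cdot 3}{1^2+1^2} = \frac{5}{2}$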

2. Find the projection point $\pi_U(x) \in U$. Since $\pi_U(x)=\lambda b$, we immediately obtain
$\pi_U(x)=\lambda b = \frac{<x,b>}{\left \| b \right\|^2}b = \frac{b^T x}{\left \| b \right \|^2}b$
If we use the dot product as the inner product, the length of the projection is
$\left \| \pi_U(x) \right \|=\frac{| b^Tx|}{\left \| b \right \|^2}\left \| b \right \|=|\cos \omega| \left \| x \right \| \left \| b \right \|\frac{\left \| b \right \|}{\left \| b \right \|^2}=|\cos \omega|\left \| x\right \|$

Here $\omega$ is the angle between $x$ and $b$.
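As a quick numerical check of this identity, here is a minimal NumPy sketch reusing $x=[2,3]^T$ and $b=[1,1]^T$ from the example below:
import numpy as np
x = np.array([2.0, 3.0])
b = np.array([1.0, 1.0])
proj = (b.dot(x) / b.dot(b)) * b                 # pi_U(x) = (b^T x / ||b||^2) b
cos_w = b.dot(x) / (np.linalg.norm(b) * np.linalg.norm(x))
print(np.linalg.norm(proj))                      # length of the projection
print(abs(cos_w) * np.linalg.norm(x))            # |cos w| * ||x||, prints the same value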

3. There exists a projection matrix $P_\pi$ such that $\pi_U(x)=P_\pi x$. Since
$\pi_U(x)=\lambda b=b \lambda = b\frac{b^Tx}{\left \| b \right\|^2}=\frac{bb^T}{\left \| b \right\|^2}x$,
the projection matrix is
$P_\pi=\frac{bb^T}{\left \| b \right\|^2}$

Note that $bb^T$ (and, consequently, $P_\pi$) is a symmetric matrix of rank 1, and $\left \| b \right\|^2$ is a scalar. The projection matrix $P_\pi$ projects any vector $x \in \mathbb{R}^n$ onto the line through the origin with direction $b$ (equivalently, the subspace $U$ spanned by $b$).
Remark: The projection $\pi_U(x) \in  \mathbb{R}^n$ is still an $n$-dimensional vector and not a scalar. However, we no longer require $n$ coordinates to represent the projection; a single coordinate $\lambda$ suffices if we express it with respect to the basis vector $b$ that spans the subspace $U$.

Remark: $\pi_U(x)$ is an eigenvector of the projection matrix $P_\pi$, with eigenvalue 1.
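This holds because $P_\pi \pi_U(x) = P_\pi^2 x = P_\pi x = \pi_U(x)$. A minimal NumPy check, reusing the example below:
import numpy as np
b = np.array([1.0, 1.0])
P = np.outer(b, b) / np.dot(b, b)     # projection matrix onto the line spanned by b
x = np.array([2.0, 3.0])
xp = P.dot(x)                         # pi_U(x)
print(np.allclose(P.dot(xp), xp))     # True: P applied to pi_U(x) leaves it unchanged
print(np.linalg.eigvals(P))           # the eigenvalues of P are 0 and 1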

Example

Projecting the point [2,3] onto the line through the origin spanned by [1,1]
import numpy as np
x = np.array([2, 3])
b = np.array([1, 1])
print("Projection Matrix b.bT/bT.b")
P = np.outer(b, b) / np.dot(b, b)   # P = b b^T / ||b||^2
print(P)
print("Projection Point")
xp = P.dot(x)                       # pi_U(x) = P x
print(xp)

O/P

Projection Matrix b.bT/bT.b
[[0.5 0.5]
 [0.5 0.5]]
Projection Point
[2.5 2.5]

Projecting the point [1,1,1] onto the line through the origin spanned by [1,2,2]
import numpy as np
x = np.array([1, 1, 1])
b = np.array([1, 2, 2])
print("Projection Matrix b.bT/bT.b")
P = np.outer(b, b) / np.dot(b, b)   # P = b b^T / ||b||^2
print(P)
print("Projection Point")
xp = P.dot(x)                       # pi_U(x) = P x
print(xp)
print("applying projection again will not change anything")
xp = P.dot(xp)                      # projecting the projection: P(P x) = P x
print(xp)
print("Squaring the projection matrix P will give P")
print(P.dot(P))                     # P^2 = P (idempotent)

O/P

Projection Matrix b.bT/bT.b 
[[0.11111111 0.22222222 0.22222222] 
 [0.22222222 0.44444444 0.44444444] 
 [0.22222222 0.44444444 0.44444444]] 
Projection Point 
[0.55555556 1.11111111 1.11111111] 
applying projection again will not change anything 
[0.55555556 1.11111111 1.11111111] 
Squaring the projection matrix P will give P
[[0.11111111 0.22222222 0.22222222]
 [0.22222222 0.44444444 0.44444444] 
 [0.22222222 0.44444444 0.44444444]]

Projection onto General Subspaces
Let us consider orthogonal projections of vectors $x \in \mathbb{R}^n$ onto lower-dimensional subspaces $U \subseteq \mathbb{R}^n$ with $\dim(U) = m \ge 1$.



Assume that $(b_1,\ldots,b_m)$ is an ordered basis of $U$. Any projection $\pi_U(x)$ onto $U$ is necessarily an element of $U$. Therefore, it can be represented as a linear combination of the basis vectors $b_1,\ldots,b_m$ of $U$, such that $\pi_U(x)=\sum_{i=1}^{m}\lambda_ib_i$.
1. Find the coordinates $\lambda_1,\ldots,\lambda_m$ of the projection with respect to the basis of $U$:
$\pi_U(x)=\sum_{i=1}^{m}\lambda_ib_i=B\lambda$
$B=[b_1,\ldots,b_m] \in \mathbb{R}^{n \times m}$,$\lambda=[\lambda_1,\ldots,\lambda_m]^T \in \mathbb{R}^m$
Note that the vector connecting $x \in \mathbb{R}^n$ and $\pi_U(x) \in U$ is orthogonal to all the basis vectors of $U$. Therefore, we obtain $m$ simultaneous conditions:
$<b_1,x-\pi_U(x)> \quad = b_1^T(x-\pi_U(x))=0 $
$ \vdots $
$<b_m,x-\pi_U(x)> \quad = b_m^T(x-\pi_U(x))=0 $
since $\pi_U(x)=B\lambda $
$b_1^T(x-B\lambda)=0$
$ \vdots $
$ b_m^T(x-B \lambda)=0 $

So we get $m$ homogeneous equations
$\begin{bmatrix}
b_1^T\\
\vdots\\
b_m^T
\end{bmatrix}\left ( x-B\lambda \right )=0$

$B^T(x-B\lambda)=0$
$B^Tx=B^TB\lambda$

The last expression is called the normal equation. Since $b_1,\ldots,b_m$ are a basis of $U$ and, therefore, linearly independent, $B^TB \in \mathbb{R}^{m \times m}$ is regular and can be inverted. This allows us to solve for the coefficients/coordinates
$\lambda=(B^TB)^{-1}B^Tx$
The matrix $(B^TB)^{-1}B^T$ is also called the pseudo-inverse of $B$, which can be computed for non-square matrices $B$. It only requires that $B^TB$ is positive definite, which is the case if $B$ has full column rank.
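NumPy provides this pseudo-inverse directly as np.linalg.pinv; a minimal sketch, using the matrix $B$ and the vector $x$ from the example further below:
import numpy as np
x = np.array([6.0, 0.0, 0.0])
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                              # basis vectors as columns
lam = np.linalg.pinv(B).dot(x)                          # coordinates via the pseudo-inverse
lam_manual = np.linalg.inv(B.T.dot(B)).dot(B.T).dot(x)  # (B^T B)^{-1} B^T x
print(lam)                                              # [ 5. -3.]
print(np.allclose(lam, lam_manual))                     # True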

2. Find the projection $\pi_U(x) \in U$. We know that $\pi_U(x)=B\lambda$, so
$\pi_U(x)=B(B^TB)^{-1}B^Tx$

3. Find the projection matrix $P_\pi$:

$P_\pi=B(B^TB)^{-1}B^T$

Note: The projection onto a one-dimensional subspace is a special case: if $\dim(U)=1$, then $B^TB \in \mathbb{R}$ is a scalar and we can rewrite the projection matrix $P_\pi=B(B^TB)^{-1}B^T$ as $P_\pi=\frac{BB^T}{B^TB}$.
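A quick numerical check of this special case, reusing $b=[1,2,2]^T$ from the earlier line example:
import numpy as np
b = np.array([1.0, 2.0, 2.0])
B = b.reshape(-1, 1)                                    # b as an n x 1 matrix
P_general = B.dot(np.linalg.inv(B.T.dot(B))).dot(B.T)   # B (B^T B)^{-1} B^T
P_line = np.outer(b, b) / np.dot(b, b)                  # b b^T / ||b||^2
print(np.allclose(P_general, P_line))                   # True: both formulas agree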

Example
Projecting the point [6,0,0] onto the subspace spanned by [1,1,1] and [0,1,2]
import numpy as np
x = np.array([6, 0, 0])
B = np.array([[1, 0],
              [1, 1],
              [1, 2]])          # basis vectors as columns
print("Projection Matrix B.(BT.B)^-1.BT")
P = B.dot(np.linalg.inv(B.T.dot(B))).dot(B.T)   # P = B (B^T B)^{-1} B^T
print(P)
print("Projection Point")
xp = P.dot(x)                   # pi_U(x) = P x
print(xp)
print("applying projection will not change anything")
xp = P.dot(xp)                  # projecting the projection: P(P x) = P x
print(xp)
print("Squaring the projection matrix P will give P")
print(P.dot(P))                 # P^2 = P (idempotent)
O/P

Projection Matrix B.(BT.B)^-1.BT
[[ 0.83333333 0.33333333 -0.16666667] 
 [ 0.33333333 0.33333333 0.33333333] 
 [-0.16666667 0.33333333 0.83333333]] 
Projection Point
 [ 5. 2. -1.] 
applying projection will not change anything 
[ 5. 2. -1.] 
Squaring the projection matrix P will give P 
[[ 0.83333333 0.33333333 -0.16666667] 
 [ 0.33333333 0.33333333 0.33333333]
 [-0.16666667 0.33333333 0.83333333]]








Remarks: The projections $\pi_U(x)$ are still vectors in $\mathbb{R}^n$ although they lie in an $m$-dimensional subspace $U \subseteq \mathbb{R}^n$. However, to represent a projected vector we only need the $m$ coordinates $\lambda_1,\ldots,\lambda_m$ with respect to the basis vectors $b_1,\ldots,b_m$ of $U$.

Projections allow us to look at situations where we have a linear system $Ax = b$ without a solution. Recall that this means that $b$ does not lie in the span of $A$, i.e., the vector $b$ does not lie in the subspace spanned by the columns of $A$. Given that the linear equation cannot be solved exactly, we can find an approximate solution. The idea is to find the vector in the subspace spanned by the columns of $A$ that is closest to $b$, i.e., we compute the orthogonal projection of $b$ onto the subspace spanned by the columns of $A$. This problem arises often in practice, and the solution is called the least-squares solution (a short numerical sketch is given below).

The projection error is the norm of the difference vector between the original vector and its projection onto $U$, i.e., $\left \| x- \pi_U(x) \right \|$. This displacement vector is orthogonal to all basis vectors of $U$. If the basis vectors $\{b_1,\ldots,b_m\}$ are orthonormal, then the projection equation simplifies to
$\pi_U(x)=BB^Tx$ since $B^TB=I$
The coordinates $\lambda$ become
$\lambda=B^Tx$
This means that we no longer have to compute the inverse $(B^TB)^{-1}$, which saves computation time.
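A minimal sketch of this simplification, assuming we first orthonormalize the basis with a QR decomposition (np.linalg.qr) and reusing $B$ and $x$ from the example above:
import numpy as np
x = np.array([6.0, 0.0, 0.0])
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
Q, _ = np.linalg.qr(B)                       # columns of Q are an orthonormal basis of the column space of B
print(np.allclose(Q.T.dot(Q), np.eye(2)))    # True: Q^T Q = I
print(Q.dot(Q.T).dot(x))                     # projection Q Q^T x -> [ 5.  2. -1.], same as before
lam = Q.T.dot(x)                             # coordinates with respect to the orthonormal basis, no inverse needed
print(lam)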


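Returning to the least-squares remark above, here is a minimal sketch with np.linalg.lstsq, reusing the same column space (the matrix is called $A$ and the unknown $z$ here to match the discussion of $Ax=b$ without clashing with the projected vector $x$):
import numpy as np
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])                    # b does not lie in the column space of A
z, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(z)                                         # least-squares solution, equal to (A^T A)^{-1} A^T b
print(A.dot(z))                                  # orthogonal projection of b onto the column space: [ 5.  2. -1.]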

Example- University Question
Find the projection of the vector $x=\begin{bmatrix} 6\\0\\0 \end{bmatrix}$ in $\mathbb{R}^3$ onto the span of $\begin{bmatrix} 1\\1\\1 \end{bmatrix}$ and $\begin{bmatrix} 0\\1\\2 \end{bmatrix}$.

The projection Matrix is 

$P_\pi=B(B^TB)^{-1}B^T$
$=\begin{bmatrix} 1 & 0 \\ 1 & 1\\1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 3\\ 3 & 5 \\ \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 & 1 \\0 & 1 & 2 \end{bmatrix}$

$=\begin{bmatrix} 1 & 0 \\ 1 & 1\\1 & 2 \end{bmatrix} \begin{bmatrix} 5/6 & -1/2\\ -1/2 & 1/2 \\ \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\0 & 1 & 2 \end{bmatrix}$

$=\begin{bmatrix} 1 & 0 \\ 1 & 1\\1 & 2 \end{bmatrix} \begin{bmatrix} 5/6 & 1/3 & -1/6\\ -1/2 & 0 & 1/2 \\ \end{bmatrix} $
$= \begin{bmatrix} 5/6 & 1/3 & -1/6\\ 1/3 & 1/3 & 1/3\\ -1/6 & 1/3 & 5/6 \end{bmatrix} $
The projection of the vector $x$ is
$\pi_U(x)=P_\pi x$
$= \begin{bmatrix} 5/6 & 1/3 & -1/6\\ 1/3 & 1/3 & 1/3\\ -1/6 & 1/3 & 5/6 \end{bmatrix}\begin{bmatrix} 6 \\ 0 \\ 0  \end{bmatrix} $

$=\begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}$






