
2.8 Orthogonal Projections: Projection onto Lines and General Subspaces


Projections are an important class of linear transformations (besides rotations and reflections) and play an important role in graphics, coding theory, statistics, and machine learning. In machine learning, we often deal with high-dimensional data, which is hard to analyze or visualize. However, high-dimensional data quite often possesses the property that only a few dimensions contain most of the information, and most other dimensions are not essential for describing key properties of the data. When we compress or visualize high-dimensional data, we lose information. To minimize this compression loss, we ideally find the most informative dimensions in the data. More specifically, we can project the original high-dimensional data onto a lower-dimensional feature space and work in this lower-dimensional space to learn more about the dataset and extract relevant patterns.

Machine learning algorithms, such as principal component analysis (PCA) by Pearson (1901) and Hotelling (1933) and deep neural networks (e.g., deep auto-encoders (Deng et al., 2010)), heavily exploit the idea of dimensionality reduction. For a given lower-dimensional subspace, orthogonal projections of high-dimensional data retain as much information as possible and minimize the difference/error between the original data and the corresponding projection.

Definition: Let $V$ be a vector space and $U \subseteq V$ a subspace of $V$. A linear mapping $\pi: V \to U$ is called a projection if $\pi^2 = \pi \circ \pi = \pi$.

Since linear mappings can be expressed by transformation matrices, the definition applies equally to a special class of transformation matrices, the projection matrices $P_\pi$, which exhibit the property that $P_\pi^2 = P_\pi$.
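As a quick numerical illustration (a sketch added here, not part of the original notes), the following checks the idempotence property $P_\pi^2 = P_\pi$ for the line-projection matrix derived later in this section:

import numpy as np

b = np.array([1.0, 1.0])
P = np.outer(b, b) / np.dot(b, b)    # projection matrix onto the line spanned by b

# Applying the projection twice changes nothing: P @ P equals P.
print(np.allclose(P @ P, P))         # True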
Projection onto One-Dimensional Subspaces (Lines)
Assume we are given a line (one-dimensional subspace) through the origin with basis vector $b \in \mathbb{R}^n$. The line is a one-dimensional subspace $U \subseteq \mathbb{R}^n$ spanned by $b$. When we project $x \in \mathbb{R}^n$ onto $U$, we seek the vector $\pi_U(x) \in U$ that is closest to $x$.

Using geometric arguments:
  • The projection $\pi_U(x)$ is the point of $U$ closest to $x$, which means that $\|x - \pi_U(x)\|$ is minimal. It follows that $\pi_U(x) - x$ is orthogonal to $b$, i.e., their inner product is zero: $\langle \pi_U(x) - x, b \rangle = 0$.
  • The projection $\pi_U(x)$ of $x$ onto $U$ must be an element of $U$ and, therefore, a multiple of the basis vector $b$ that spans $U$. Hence $\pi_U(x) = \lambda b$ for some $\lambda \in \mathbb{R}$.
The following three steps determine the projection coordinate $\lambda$, the projection $\pi_U(x) \in U$, and the projection matrix $P_\pi$.

1. Find the coordinate $\lambda$.

$\langle x - \pi_U(x), b\rangle = 0 \;\Leftrightarrow\; \langle x - \lambda b, b\rangle = 0$
$\langle x, b\rangle = \lambda\,\langle b, b\rangle$
$\lambda = \frac{\langle x, b\rangle}{\langle b, b\rangle} = \frac{\langle b, x\rangle}{\langle b, b\rangle}$
$\lambda = \frac{b^T x}{b^T b} = \frac{b^T x}{\|b\|^2}$
If $\|b\| = 1$, then the coordinate $\lambda$ of the projection is simply given by $b^T x$.
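A short numerical check (added for illustration, with sample vectors) computes the coordinate $\lambda$ and verifies that the residual $x - \lambda b$ is orthogonal to $b$:

import numpy as np

x = np.array([2.0, 3.0])
b = np.array([1.0, 1.0])

lam = b @ x / (b @ b)                         # lambda = b^T x / ||b||^2
print(lam)                                    # 2.5

# The displacement x - lambda*b is orthogonal to b.
print(np.isclose(b @ (x - lam * b), 0.0))     # True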

2. Find the projection point $\pi_U(x) \in U$. Since $\pi_U(x) = \lambda b$, we immediately obtain
$\pi_U(x) = \lambda b = \frac{\langle x, b\rangle}{\|b\|^2}\, b = \frac{b^T x}{\|b\|^2}\, b$
If we use the dot product as the inner product, we can also compute the length of the projection:
$\|\pi_U(x)\| = \frac{|b^T x|}{\|b\|^2}\,\|b\| = |\cos\omega|\,\|x\|\,\|b\|\,\frac{\|b\|}{\|b\|^2} = |\cos\omega|\,\|x\|$

Here ω is the angle between x and b.
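The relation $\|\pi_U(x)\| = |\cos\omega|\,\|x\|$ can be verified numerically; the sketch below (added for illustration, with the same sample vectors as above) computes $\cos\omega$ from the dot-product formula:

import numpy as np

x = np.array([2.0, 3.0])
b = np.array([1.0, 1.0])

proj = (b @ x / (b @ b)) * b                                  # pi_U(x)
cos_omega = b @ x / (np.linalg.norm(x) * np.linalg.norm(b))   # b^T x = cos(w) ||x|| ||b||

print(np.isclose(np.linalg.norm(proj), abs(cos_omega) * np.linalg.norm(x)))   # True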

3. Find the projection matrix $P_\pi$ such that $\pi_U(x) = P_\pi x$. Since
$\pi_U(x) = \lambda b = b\lambda = b\,\frac{b^T x}{\|b\|^2} = \frac{b\,b^T}{\|b\|^2}\,x,$
we immediately see that
$P_\pi = \frac{b\,b^T}{\|b\|^2}$

Note that $b\,b^T$ (and, consequently, $P_\pi$) is a symmetric matrix of rank 1, and $\|b\|^2 = \langle b, b\rangle$ is a scalar. The projection matrix $P_\pi$ projects any vector $x \in \mathbb{R}^n$ onto the line through the origin with direction $b$ (equivalently, the subspace $U$ spanned by $b$).
Remark. The projection $\pi_U(x) \in \mathbb{R}^n$ is still an $n$-dimensional vector and not a scalar. However, we no longer require $n$ coordinates to represent the projection; a single coordinate $\lambda$ suffices if we express it with respect to the basis vector $b$ that spans the subspace $U$.

Remark: $\pi_U(x)$ is an eigenvector of the projection matrix $P_\pi$ with eigenvalue 1, since $P_\pi\,\pi_U(x) = P_\pi^2\,x = P_\pi\,x = \pi_U(x)$.
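A brief added check shows this eigenvector property numerically (eigenvalue 1):

import numpy as np

x = np.array([2.0, 3.0])
b = np.array([1.0, 1.0])
P = np.outer(b, b) / np.dot(b, b)

proj = P @ x
# Applying P to the projection returns the projection itself, so proj is an
# eigenvector of P with eigenvalue 1.
print(np.allclose(P @ proj, proj))   # True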

Example

Projecting the point [2, 3] onto the line through the origin spanned by [1, 1]
import numpy as np

x = np.array([2, 3])
b = np.array([1, 1])

print("Projection Matrix b.bT/bT.b")
P = np.outer(b, b) / np.dot(b, b)    # P = b b^T / (b^T b)
print(P)

print("Projection Point")
xp = P.dot(x)                        # pi_U(x) = P x
print(xp)

O/P

Projection Matrix b.bT/bT.b
 [[0.5 0.5]
 [0.5 0.5]]
 Projection Point 
[2.5 2.5]

Projecting the point [1, 1, 1] onto the line through the origin spanned by [1, 2, 2]
import numpy as np

x = np.array([1, 1, 1])
b = np.array([1, 2, 2])

print("Projection Matrix b.bT/bT.b")
P = np.outer(b, b) / np.dot(b, b)    # P = b b^T / (b^T b)
print(P)

print("Projection Point")
xp = P.dot(x)                        # pi_U(x) = P x
print(xp)

print("applying projection again will not change anything")
xp = P.dot(xp)                       # project the already projected point
print(xp)

print("Squaring the projection matrix P will give P")
print(P.dot(P))                      # P^2 = P (idempotence)

O/P

Projection Matrix b.bT/bT.b 
[[0.11111111 0.22222222 0.22222222] 
 [0.22222222 0.44444444 0.44444444] 
 [0.22222222 0.44444444 0.44444444]] 
Projection Point 
[0.55555556 1.11111111 1.11111111] 
applying projection again will not change anything 
[0.55555556 1.11111111 1.11111111] 
Squaring the projection matrix P will give P
[[0.11111111 0.22222222 0.22222222]
 [0.22222222 0.44444444 0.44444444] 
 [0.22222222 0.44444444 0.44444444]]

Projection onto General Subspaces
Let us consider orthogonal projections of vectors $x \in \mathbb{R}^n$ onto lower-dimensional subspaces $U \subseteq \mathbb{R}^n$ with $\dim(U) = m \geq 1$.



Assume that $(b_1, \dots, b_m)$ is an ordered basis of $U$. Any projection $\pi_U(x)$ onto $U$ is necessarily an element of $U$. Therefore, it can be represented as a linear combination of the basis vectors $b_1, \dots, b_m$ of $U$, such that $\pi_U(x) = \sum_{i=1}^{m} \lambda_i b_i$.
1. Let us find the coordinates $\lambda_1, \dots, \lambda_m$ of the projection with respect to the basis of $U$:
$\pi_U(x) = \sum_{i=1}^{m} \lambda_i b_i = B\lambda,$
where $B = [b_1, \dots, b_m] \in \mathbb{R}^{n \times m}$ and $\lambda = [\lambda_1, \dots, \lambda_m]^T \in \mathbb{R}^m$.
The vector connecting $x \in \mathbb{R}^n$ and $\pi_U(x) \in U$ must be orthogonal to all basis vectors of $U$. Therefore, we obtain $m$ simultaneous conditions:
$\langle b_1, x - \pi_U(x)\rangle = b_1^T\,(x - \pi_U(x)) = 0$
$\vdots$
$\langle b_m, x - \pi_U(x)\rangle = b_m^T\,(x - \pi_U(x)) = 0$
Since $\pi_U(x) = B\lambda$, this becomes
$b_1^T\,(x - B\lambda) = 0$
$\vdots$
$b_m^T\,(x - B\lambda) = 0$

So we get $m$ homogeneous equations, which in matrix form read
$$\begin{bmatrix} b_1^T \\ \vdots \\ b_m^T \end{bmatrix}(x - B\lambda) = 0$$
$$\Leftrightarrow\quad B^T(x - B\lambda) = 0$$
$$\Leftrightarrow\quad B^T x = B^T B\,\lambda$$

The last expression is called the normal equation. Since $b_1, \dots, b_m$ are a basis of $U$ and, therefore, linearly independent, $B^T B \in \mathbb{R}^{m \times m}$ is regular and can be inverted. This allows us to solve for the coefficients/coordinates:
$\lambda = (B^T B)^{-1} B^T x$
The matrix $(B^T B)^{-1} B^T$ is also called the pseudo-inverse of $B$, which can be computed for non-square matrices $B$. It only requires $B^T B$ to be positive definite, which is the case if $B$ has full (column) rank.
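As a quick check (a sketch added here, not part of the original notes), NumPy's built-in np.linalg.pinv computes the same pseudo-inverse as the formula $(B^T B)^{-1} B^T$ when $B$ has full column rank:

import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                            # 3x2, full column rank

pinv_formula = np.linalg.inv(B.T @ B) @ B.T           # (B^T B)^-1 B^T
print(np.allclose(pinv_formula, np.linalg.pinv(B)))   # True (SVD-based pseudo-inverse)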

2. Find the projection $\pi_U(x) \in U$. We already know that $\pi_U(x) = B\lambda$, so
$\pi_U(x) = B(B^T B)^{-1} B^T x$

3. Find the projection matrix $P_\pi$. From $\pi_U(x) = P_\pi x$ we can immediately read off

$P_\pi = B(B^T B)^{-1} B^T$

Note: The projection onto a one-dimensional subspace (a line) is a special case. If $\dim(U) = 1$, then $B^T B \in \mathbb{R}$ is a scalar and we can rewrite the projection matrix $P_\pi = B(B^T B)^{-1} B^T$ as $P_\pi = \frac{B B^T}{B^T B}$, which is exactly the projection matrix derived for lines above.
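A small added sketch (illustrative values only) confirms that the general formula collapses to the line-projection matrix when $B$ has a single column:

import numpy as np

b = np.array([1.0, 2.0, 2.0])
B = b.reshape(-1, 1)                              # B with a single column

P_general = B @ np.linalg.inv(B.T @ B) @ B.T      # B (B^T B)^-1 B^T
P_line = np.outer(b, b) / np.dot(b, b)            # b b^T / (b^T b)

print(np.allclose(P_general, P_line))             # True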

Example
Projecting the point [6,0,0] onto the subspace spanned by [1,1,1] and [0,1,2]
import numpy as np

x = np.array([6, 0, 0])
B = np.array([[1, 0],
              [1, 1],
              [1, 2]])

print("Projection Matrix B.(BT.B)^-1.BT")
P = np.dot(B.T, B)          # B^T B
P = np.linalg.inv(P)        # (B^T B)^-1
P = P.dot(B.T)              # (B^T B)^-1 B^T, the pseudo-inverse of B
P = B.dot(P)                # B (B^T B)^-1 B^T
print(P)

print("Projection Point")
xp = P.dot(x)
print(xp)

print("applying projection will not change anything")
xp = P.dot(xp)              # project the already projected point
print(xp)

print("Squaring the projection matrix P will give P")
print(P.dot(P))
O/P

Projection Matrix B.(BT.B)^-1.BT
[[ 0.83333333 0.33333333 -0.16666667] 
 [ 0.33333333 0.33333333 0.33333333] 
 [-0.16666667 0.33333333 0.83333333]] 
Projection Point
 [ 5. 2. -1.] 
applying projection will not change anything 
[ 5. 2. -1.] 
Squaring the projection matrix P will give P 
[[ 0.83333333 0.33333333 -0.16666667] 
 [ 0.33333333 0.33333333 0.33333333]
 [-0.16666667 0.33333333 0.83333333]]








Remarks: The projections $\pi_U(x)$ are still vectors in $\mathbb{R}^n$, although they lie in an $m$-dimensional subspace $U \subseteq \mathbb{R}^n$. However, to represent a projected vector we only need the $m$ coordinates $\lambda_1, \dots, \lambda_m$ with respect to the basis vectors $b_1, \dots, b_m$ of $U$.

Projections allow us to handle situations where a linear system $Ax = b$ has no solution. This means that $b$ does not lie in the span of $A$, i.e., the vector $b$ does not lie in the subspace spanned by the columns of $A$. Given that the linear system cannot be solved exactly, we can find an approximate solution: the idea is to find the vector in the subspace spanned by the columns of $A$ that is closest to $b$, i.e., we compute the orthogonal projection of $b$ onto that subspace. This problem arises often in practice, and the solution is called the least-squares solution.

The projection error is the norm of the difference vector between the original vector and its projection onto $U$, i.e., $\|x - \pi_U(x)\|$. This displacement vector is orthogonal to all basis vectors of $U$. If the basis vectors $\{b_1, \dots, b_m\}$ are orthonormal, the projection equation simplifies to
$\pi_U(x) = B B^T x$, since $B^T B = I$.
The coordinates $\lambda$ become
$\lambda = B^T x$
This means that we no longer have to compute the inverse $(B^T B)^{-1}$, which saves computation time.
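The simplification can be checked numerically. The sketch below (added for illustration; it reuses the B and x from the earlier example) obtains an orthonormal basis Q of the column space of B via np.linalg.qr, confirms that Q Q^T x agrees with the general projection formula, and shows that np.linalg.lstsq returns coordinates whose image under B is exactly this projection, which is the least-squares connection mentioned above.

import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
x = np.array([6.0, 0.0, 0.0])

# General projection onto the column space of B.
P = B @ np.linalg.inv(B.T @ B) @ B.T

# Orthonormal basis Q of the same subspace (reduced QR decomposition).
Q, _ = np.linalg.qr(B)
print(np.allclose(Q @ (Q.T @ x), P @ x))   # True: with orthonormal columns, B B^T x suffices

# Least squares: lstsq finds lambda minimizing ||B lambda - x||,
# so B lambda is the orthogonal projection of x onto the column space of B.
lam, *_ = np.linalg.lstsq(B, x, rcond=None)
print(np.allclose(B @ lam, P @ x))         # True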



Example: University Question
Find the projection of the vector $x = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} \in \mathbb{R}^3$ onto the span of $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$.

The projection matrix is
$$P_\pi = B(B^T B)^{-1} B^T$$
$$= \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}$$
$$= \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 5/6 & -1/2 \\ -1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}$$
$$= \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 5/6 & 1/3 & -1/6 \\ -1/2 & 0 & 1/2 \end{bmatrix}$$
$$= \begin{bmatrix} 5/6 & 1/3 & -1/6 \\ 1/3 & 1/3 & 1/3 \\ -1/6 & 1/3 & 5/6 \end{bmatrix}$$
The projection of the vector $x$ is
$$\pi_U(x) = P_\pi\,x = \begin{bmatrix} 5/6 & 1/3 & -1/6 \\ 1/3 & 1/3 & 1/3 \\ -1/6 & 1/3 & 5/6 \end{bmatrix} \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}$$






