What is a Vector Norm?
We usually use norms for vectors and rarely for matrices. First, let's define what the norm of a vector is. A vector norm can be described as follows:
A function that operates on a vector space and returns a scalar element.
$\left \| \cdot \right \|: V \to \mathbb{R}$
$x \mapsto \left \| x \right \|$
which assigns each vector $x$ its length $\left \| x \right \| \in \mathbb{R}$, such that for all $\lambda \in \mathbb{R}$ and $x,y \in V$, the following hold:
Absolutely homogeneous: $\left \| \lambda x \right \|=|\lambda| \left \| x \right \|$
Triangle inequality: $\left \| x+y \right \| \le \left \| x \right \| + \left \| y \right \|$
Positive definite: $\left \| x \right \| \ge 0$ and $\left \| x \right \| = 0 \Leftrightarrow x=0$
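As a quick sanity check, here is a minimal sketch that verifies these three properties numerically for the Euclidean norm (the vectors and the scalar $\lambda$ below are arbitrary example values):
import numpy as np
from numpy.linalg import norm
x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])
lam = -2.5
# Absolute homogeneity: ||lam * x|| == |lam| * ||x||
print(np.isclose(norm(lam * x), abs(lam) * norm(x)))  # True
# Triangle inequality: ||x + y|| <= ||x|| + ||y||
print(norm(x + y) <= norm(x) + norm(y))               # True
# Positive definiteness: ||x|| >= 0 and ||0|| == 0
print(norm(x) >= 0, norm(np.zeros(3)) == 0)           # True True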
A norm is denoted by $\left \| \cdot \right \|_q$, in which $q$ shows the order of the norm.
The intuition behind the norm is to measure a kind of distance.
A norm is mathematically defined as below (for $q \ge 1$):
$$\parallel x \parallel _q=\left ( \sum_{i=1}^n |x_i|^q \right )^{\frac{1}{q}}$$
The sign $|\cdot|$ is an operation that outputs the absolute value of its argument; for example, $|-2|=2$ and $|2|=2$. You can implement the norm with the following Python code:
# Import Numpy package and the norm function
import numpy as np
from numpy.linalg import norm
# Define a vector
v = np.array([2,3,1,0])
# Take the q-norm with q = 2
q = 2
v_norm = norm(v, ord=q)
# Print values
print('The vector: ', v)
print('The vector norm: ', v_norm)
The vector: [2 3 1 0]
The vector norm: 3.7416573867739413
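To make the formula concrete, here is a minimal sketch of the same computation done by hand: sum $|x_i|^q$ over the elements and take the $q$-th root. It should match numpy's result above (the helper q_norm is illustrative, not a numpy function):
import numpy as np
from numpy.linalg import norm
def q_norm(x, q):
    # Sum of |x_i|^q, then the q-th root
    return np.sum(np.abs(x)**q)**(1.0/q)
v = np.array([2, 3, 1, 0])
print(q_norm(v, 2))    # 3.7416573867739413
print(norm(v, ord=2))  # same value from numpy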
Mostly Used Norms
In the previous section, I described what a norm is in general, and we implemented it in Python. Here, I would like to discuss the norms that are most commonly used in Machine Learning.
$L_1$ Norm (Manhattan Norm)
The $L_1$ norm is simply the summation of the absolute values of a vector's elements.
The Manhattan norm on $\mathbb{R}^n$ is defined for $x \in \mathbb{R}^n$ as
$$\left \| x \right \|_1 := \sum_{i=1}^n |x_i|$$
In Machine Learning, we usually use the $L_1$ norm when the sparsity of a vector matters, i.e., when what counts is the non-zero elements; summing absolute values directly targets them. The $L_1$ norm is often used when fitting machine learning algorithms as a regularization method, i.e., a method to keep the coefficients of the model small and, in turn, the model less complex.
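As a minimal sketch of how an $L_1$ penalty can enter a loss function (the weights, residuals, and penalty strength below are illustrative values, not from the original):
import numpy as np
from numpy.linalg import norm
# Hypothetical model weights and data residuals (illustrative values)
w = np.array([0.5, 0.0, -1.2, 0.0])
residuals = np.array([0.1, -0.3, 0.2])
lam = 0.01  # regularization strength (assumed value)
# L1-regularized loss: data term plus lam * ||w||_1
loss = np.sum(residuals**2) + lam * norm(w, 1)
print('L1-regularized loss:', loss)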
✅ L1 Norm (Manhattan norm / Taxicab norm)
Example
Suppose we have a vector $a = [1, 2, 3]$:
# L1 norm of a vector
from numpy import array
from numpy.linalg import norm
a = array([1, 2, 3])
print(a)
l1 = norm(a, 1)
print(l1)
O/P
[1 2 3]
6.0
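Equivalently, since the $L_1$ norm is just the sum of absolute values, here is a small sketch checking the definition against numpy:
import numpy as np
from numpy.linalg import norm
a = np.array([1, 2, 3])
# L1 norm by definition: sum of absolute values
print(np.abs(a).sum())  # 6
print(norm(a, 1))       # 6.0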
$L_2$ Norm (Euclidean Norm)
The $L_2$ norm is also called the Euclidean norm; it is the Euclidean distance of a vector from the origin. The Euclidean norm of $x \in \mathbb{R}^n$ is defined as
$$\left \| x \right \|_2 := \sqrt{ \sum_{i=1}^n x_i^2}= \sqrt{x^T x}$$
The $L_2$ norm is commonly used in Machine Learning because it is differentiable, which is crucial for optimization purposes. Like the $L_1$ norm, the $L_2$ norm is often used when fitting machine learning algorithms as a regularization method, i.e., a method to keep the coefficients of the model small and, in turn, the model less complex.
By far, the $L2$ norm is more commonly used than other vector norms in machine learning.
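Because $\left \| x \right \|_2 = \sqrt{x^T x}$, the $L_2$ norm can also be computed with a dot product; a minimal sketch:
import numpy as np
from numpy.linalg import norm
x = np.array([2.0, 3.0, 1.0, 0.0])
# L2 norm via the dot product x^T x
print(np.sqrt(x @ x))  # 3.7416573867739413
print(norm(x))         # same value (norm defaults to ord=2)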
✅ L2 Norm (Euclidean norm)
Example
# L2 norm of a vector
from numpy import array
from numpy.linalg import norm
a = array([1, 2, 3])
print(a)
l2 = norm(a)
print(l2)
O/P
[1 2 3]
3.7416573867739413
Let's calculate the $L_2$ norm of a random vector with Python using two approaches. Both should lead to the same result:
# Import Numpy package and the norm function
import numpy as np
from numpy.linalg import norm
# Defining a random vector
v = np.random.rand(1,5)
# Calculate the L-2 norm manually
sum_square = 0
for i in range(v.shape[1]):
    # Accumulate the square of each element
    sum_square += np.square(v[0,i])
L2_norm_approach_1 = np.sqrt(sum_square)
# Calculate L-2 norm using numpy
L2_norm_approach_2 = norm(v, ord=2)
print('L2_norm: ', L2_norm_approach_1)
print('L2_norm with numpy:', L2_norm_approach_2)
Max Norm
Well, you may not see this norm quite often. However, it is a definition you should be familiar with. The max norm is denoted by $\parallel x \parallel _\infty$ and the mathematical formulation is as below:
$$\parallel x \parallel _\infty = \max(|x_1|, |x_2|, \dots, |x_n|)$$
It simply returns the maximum absolute value among the vector's elements.
✅ Max Norm (Infinity norm / Chebyshev norm)
Example
from numpy import inf
from numpy import array
from numpy.linalg import norm
a = array([1, 2, 3])
print(a)
maxnorm = norm(a, inf)
print(maxnorm)
O/P
[1 2 3]
3.0
The max norm is also used for regularization in machine learning, for example on neural network weights, where it is called max-norm regularization.
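As an illustrative sketch of the idea: one common form of max-norm regularization rescales a weight vector whenever its norm exceeds a chosen threshold (the function name and the threshold c below are assumptions for illustration):
import numpy as np
from numpy.linalg import norm
def max_norm_constraint(w, c):
    # Rescale w so that its norm never exceeds the threshold c
    n = norm(w)
    return w * (c / n) if n > c else w
w = np.array([3.0, 4.0])            # norm is 5.0
print(max_norm_constraint(w, 2.0))  # rescaled to norm 2.0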
Norm of a Matrix
For calculating the norm of a matrix, we have the Frobenius norm, which is very similar to the $L_2$ norm of a vector and is as below:
$$\left \| M \right \|_F = \sqrt{ \sum_{i,j}M_{ij}^2 }$$
Python code (computing several vector norms of different orders with np.linalg.norm):
import numpy as np
# Define a random vector of 10 elements
x = 10*np.random.randn(10)
print(x)
print(np.linalg.norm(x, 0))      # "L0": number of non-zero elements (not a true norm)
print(np.linalg.norm(x, 1))      # L1 norm
print(np.linalg.norm(x, 2))      # L2 norm
print(np.linalg.norm(x, np.inf)) # max norm
Output:
[ 16.578067 -5.66057775 1.37715832 16.18872848 4.30709896 10.36359172 -1.45975146 3.24831072 -10.49827027 11.67825408]
10.0
81.3598087617449
30.920523435980932
16.57806699912792
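To tie the Frobenius formula above to code, here is a minimal sketch computing the Frobenius norm of a small matrix both by the definition and with numpy ('fro' is numpy's name for this norm):
import numpy as np
M = np.array([[1, 2], [3, 4]])
# Frobenius norm by definition: square root of the sum of squared entries
print(np.sqrt(np.sum(M**2)))     # 5.477225575051661
print(np.linalg.norm(M, 'fro'))  # same value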
Note: The $L^2$ norm (or the Frobenius norm, in the case of a matrix) and the squared $L^2$ norm are widely used in machine learning, deep learning, and data science in general. For example, norms can be used as cost functions. Let's say that you want to fit a line to a set of data points. One way to find a better line is to start with random parameters and iterate by minimizing the cost function. The cost function is a function that represents the error of your model, so you want this error to be as small as possible. Norms are useful here because they give you a way of measuring this error: the norm maps the vector containing all your errors to a single scalar, and the cost function is this scalar for a given set of parameter values.
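As a minimal sketch of this idea (the data points and candidate parameters below are made up for illustration): the cost of a candidate line $y = wx + b$ is the norm of the residual vector.
import numpy as np
from numpy.linalg import norm
# Toy data for a line-fitting example (illustrative values)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
def cost(w, b):
    # Map the residual vector to a single scalar with the L2 norm
    return norm(y - (w * x + b))
print(cost(2.0, 1.0))  # a good line gives a small cost
print(cost(0.0, 0.0))  # a bad line gives a much larger cost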
We have seen that norms are nothing more than an array reduced to a scalar, and that there are variations according to the function used to calculate it. Choosing which norm to use depends a lot on the problem to be solved, since there are pros and cons to each. For instance, the $L^1$ norm is more robust than the $L^2$ norm: the $L^2$ norm is more sensitive to outliers, since large error values give enormous squared error values.
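A quick numerical illustration of this sensitivity (the error vectors are made up): a single large error inflates the $L^2$ norm far more than the $L^1$ norm.
import numpy as np
from numpy.linalg import norm
errors = np.array([0.5, -0.3, 0.2, 0.4])
errors_outlier = np.array([0.5, -0.3, 0.2, 10.0])  # one large outlier
print(norm(errors, 1), norm(errors_outlier, 1))  # L1 grows in proportion to the outlier
print(norm(errors, 2), norm(errors_outlier, 2))  # L2 is dominated by the outlier's squared value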