Principal Component Analysis (PCA)
New in version 0.6.
Changed in version 1.2.0.
Principal component analysis (PCA) is a statistical method that converts a set of observations of possibly correlated variables into a set of linearly uncorrelated variables (principal components). The number of principal components is less than or equal to the number of original variables. The transformation is defined so that the first principal component has the largest possible variance, and each following component has the largest variance possible while remaining uncorrelated with the preceding ones.
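The transformation described above can be sketched with plain NumPy. This is only an illustration of the general technique (eigendecomposition of the covariance matrix); it is not taken from the padasip source, and the function name pca_sketch is hypothetical.

```python
import numpy as np


def pca_sketch(x, n):
    """Project the rows of x onto the n directions of largest variance.

    Hypothetical helper for illustration only, not part of padasip.
    """
    x = np.asarray(x, dtype=float)
    # Center every column so the covariance is computed around the mean.
    centered = x - x.mean(axis=0)
    # Covariance matrix of the variables (columns).
    cov = np.cov(centered, rowvar=False)
    # eigh is intended for symmetric matrices; it returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so that the largest variance comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Keep only the first n principal directions.
    return centered @ eigenvectors[:, :n], eigenvalues


rng = np.random.default_rng(0)
x = rng.uniform(1, 10, (100, 3))
new_x, eigenvalues = pca_sketch(x, 2)
print(new_x.shape)  # (100, 2)
```

In this sketch the columns of new_x are uncorrelated, and the sample variance of the first column equals the largest eigenvalue, which matches the definition given above.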
See also: Linear Discriminant Analysis (LDA)
Usage Explanation
To reduce a dataset x to n principal components:
new_x = pa.preprocess.PCA(x, n)
To see the ordered eigenvalues of the principal components:
eigenvalues = pa.preprocess.PCA_components(x)
Minimal Working Example
This example generates random data (100 samples with 3 values each). Applying PCA produces a reduced dataset (all 100 samples, but only 2 values each).
import numpy as np
import padasip as pa
np.random.seed(100)
x = np.random.uniform(1, 10, (100, 3))
new_x = pa.preprocess.PCA(x, 2)
If you do not know how many principal components to use, you can check the eigenvalues of the principal components as in the following example:
import numpy as np
import padasip as pa
np.random.seed(100)
x = np.random.uniform(1, 10, (100, 3))
print(pa.preprocess.PCA_components(x))
which prints
>>> [ 8.02948402 7.09335781 5.34116273]
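One common way to turn such eigenvalues into a choice of n is to keep the smallest number of components whose cumulative share of the total variance reaches a chosen threshold. The sketch below applies that heuristic to the eigenvalues printed above; the function components_for_variance and the 0.7 threshold are assumptions made for illustration and are not part of padasip.

```python
import numpy as np


def components_for_variance(eigenvalues, threshold):
    """Smallest number of components whose cumulative variance share
    reaches the threshold (hypothetical helper, not part of padasip)."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # Cumulative fraction of the total variance, largest eigenvalue first.
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # Index of the first entry that reaches the threshold, as a count.
    count = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against floating-point rounding at thresholds close to 1.
    return min(count, len(eigenvalues))


# Eigenvalues copied from the example output above.
eigenvalues = [8.02948402, 7.09335781, 5.34116273]
n = components_for_variance(eigenvalues, 0.7)
print(n)  # 2
```

Because the data in this example are uniform random numbers, the eigenvalues are of similar size and no component clearly dominates; on strongly correlated data the first few eigenvalues would typically account for most of the variance.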
Code Explanation
padasip.preprocess.pca.PCA(x, n=False)
Principal component analysis function.
Args:
x : input matrix (2d array), each row represents one sample
Kwargs:
n : number of features returned (integer) - how many columns should the output keep
Returns:
new_x : matrix with reduced size (lower number of columns)