Principal Component Analysis (PCA)

New in version 0.6.

Principal component analysis (PCA) is a statistical method that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables (principal components). The number of principal components is less than or equal to the number of original variables. The transformation is defined so that the first principal component has the largest possible variance, and each succeeding component has the largest variance possible while being orthogonal to the preceding ones.
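The definition above can be written in matrix form using the standard textbook formulation. The notation is introduced here and is not taken from padasip. Let X be a data matrix with N samples as rows and d variables as columns, let μ hold the column means, and let each v_i have unit length. The principal components are then the eigenvectors of the sample covariance matrix, and each eigenvalue is the variance along its component:

$$
\bar{X} = X - \mathbf{1}\mu^\top, \qquad C = \frac{1}{N-1}\,\bar{X}^\top \bar{X}
$$

$$
C\,v_i = \lambda_i v_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0, \qquad \operatorname{Var}(\bar{X} v_i) = v_i^\top C\, v_i = \lambda_i
$$

$$
X_n = \bar{X} W_n, \qquad W_n = [\,v_1 \; v_2 \; \cdots \; v_n\,] \in \mathbb{R}^{d \times n}, \qquad n \le d
$$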

Usage Explanation

To reduce a data set x to n principal components:

new_x = pa.preprocess.PCA(x, n)
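The following minimal NumPy sketch illustrates what this reduction does, using textbook PCA. `pca_sketch` is a hypothetical helper and is not part of padasip. It centers the data before projecting. That is an assumption, because padasip's own `PCA` may differ in details such as centering, so the two outputs need not match exactly.

```python
import numpy as np

def pca_sketch(x, n):
    """Hypothetical helper: project the rows of x onto the n leading principal components."""
    x = np.asarray(x, dtype=float)
    # Center each column (feature) so the covariance describes spread around the mean
    x_centered = x - x.mean(axis=0)
    # Sample covariance matrix of the features (rowvar=False: columns are variables)
    cov = np.cov(x_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n]]  # shape: (features, n)
    # Each output column is the projection onto one principal component
    return x_centered @ components

np.random.seed(100)
x = np.random.uniform(1, 10, (100, 3))
new_x = pca_sketch(x, 2)
print(new_x.shape)  # (100, 2)
```

The output columns are uncorrelated, and the variance of each column equals the corresponding eigenvalue, with the largest first.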


To see the eigenvalues of the principal components, sorted from largest to smallest:

eigenvalues = pa.preprocess.PCA_components(x)
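The sketch below assumes the eigenvalues are those of the sample covariance matrix of the columns of x. This is the usual definition, but it is an assumption about padasip's internals. Under that assumption, the values can be computed directly with NumPy. `pca_components_sketch` is a hypothetical helper and is not part of padasip.

```python
import numpy as np

def pca_components_sketch(x):
    """Hypothetical helper: eigenvalues of the sample covariance of x, largest first."""
    x = np.asarray(x, dtype=float)
    # Columns of x are the variables, so rowvar=False
    cov = np.cov(x, rowvar=False)
    # eigvalsh returns the real eigenvalues of a symmetric matrix in ascending order,
    # so reverse them to put the largest first
    return np.linalg.eigvalsh(cov)[::-1]

np.random.seed(100)
x = np.random.uniform(1, 10, (100, 3))
eigenvalues = pca_components_sketch(x)
```

The eigenvalues add up to the total variance of the data, which is the trace of the covariance matrix. This is why they can be compared as shares of that total.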


Minimal Working Example

This example generates random data (100 samples with 3 values each). Applying PCA produces a reduced data set that keeps all 100 samples but only 2 values per sample.

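import numpy as np
import padasip as pa

# 100 samples, each with 3 values drawn uniformly from [1, 10)
np.random.seed(100)
x = np.random.uniform(1, 10, (100, 3))
# keep only the 2 leading principal components
new_x = pa.preprocess.PCA(x, 2)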


If you do not know how many principal components to keep, you can inspect their eigenvalues, as in the following example:

import numpy as np
import padasip as pa

np.random.seed(100)
x = np.random.uniform(1, 10, (100, 3))
print(pa.preprocess.PCA_components(x))


which prints:

[ 8.02948402  7.09335781  5.34116273]
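One common way to turn these eigenvalues into a choice of n is to keep just enough components to retain a target share of the total variance. The sketch below uses the eigenvalues printed above. The 70 % threshold is a hypothetical choice for illustration, not a padasip default.

```python
import numpy as np

# Eigenvalues printed by the example above (largest first)
eigenvalues = np.array([8.02948402, 7.09335781, 5.34116273])

# Fraction of the total variance captured by each component
explained_ratio = eigenvalues / eigenvalues.sum()
# Running total: variance kept when using the first 1, 2, ... components
cumulative = np.cumsum(explained_ratio)

# Hypothetical threshold: keep enough components to retain 70 % of the variance
threshold = 0.70
# Index of the first cumulative share >= threshold, plus one to get a count
n = int(np.searchsorted(cumulative, threshold) + 1)
print(n)  # 2
```

In this example the three columns of x are independent draws from a uniform distribution on [1, 10), each with variance 81/12 = 6.75. The eigenvalues are therefore of similar size, and dropping the third component still discards about a quarter of the variance (5.34 / 20.46 ≈ 0.26). PCA compresses most effectively when the original variables are strongly correlated.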


Code Explanation

padasip.preprocess.pca.PCA(x, n=False)

Principal component analysis function.

Args:

• x : input matrix (2d array), every row represents a new sample

Kwargs:

• n : number of features returned (integer), i.e. how many columns the output should keep

Returns:

• new_x : matrix of reduced size (fewer columns)

padasip.preprocess.pca.PCA_components(x)

Principal component analysis helper for inspecting the eigenvalues of the components.

Args:

• x : input matrix (2d array), every row represents a new sample

Returns:

• components : eigenvalues of the principal components, sorted from largest to smallest