Principal Component Analysis (PCA)

New in version 0.6.

Changed in version 1.2.0.

Principal component analysis (PCA) is a statistical method that converts a set of observations of possibly correlated variables into a dataset of linearly uncorrelated variables (principal components). The number of principal components is less than or equal to the number of original variables. The transformation is defined so that the first principal component has the largest possible variance, and each following component has the largest variance possible while remaining uncorrelated with the preceding ones.
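To make the definition above concrete, here is a minimal NumPy sketch of one common way to compute PCA: an eigendecomposition of the covariance matrix of the centered data. The function name pca_sketch and the test data are hypothetical and chosen only for illustration. This is not necessarily how padasip implements PCA; in particular, whether padasip centers the data is not stated here, so the numbers this sketch produces may differ from the library's output.

```python
import numpy as np

def pca_sketch(x, n):
    """Project x (one sample per row) onto its n leading principal components.

    Hypothetical helper for illustration only, not part of padasip.
    Returns the reduced data and all eigenvalues sorted in descending order.
    """
    x = np.asarray(x, dtype=float)
    # Center every column so the covariance describes spread around the mean
    centered = x - x.mean(axis=0)
    # Covariance matrix of the variables (columns)
    cov = np.cov(centered, rowvar=False)
    # eigh is suited to symmetric matrices; it returns ascending eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Keep only the n leading components
    return centered @ eigenvectors[:, :n], eigenvalues

# Correlated toy data: the second column is nearly twice the first
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
x = np.hstack([
    base,
    2.0 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

new_x, eigenvalues = pca_sketch(x, 2)
print(new_x.shape)                  # (200, 2)
print(np.cov(new_x, rowvar=False))  # off-diagonal entries are close to zero
```

The near-zero off-diagonal entries of the final covariance matrix show that the resulting components are uncorrelated, and its diagonal holds the two largest eigenvalues, which are the variances of the kept components.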

See also: Linear Discriminant Analysis (LDA)

Usage Explanation

To reduce the dataset x to n principal components:

new_x = pa.preprocess.PCA(x, n)

If you want to see the ordered eigenvalues of principal components, you can do it as follows:

eigenvalues = pa.preprocess.PCA_components(x)

Minimal Working Example

This example generates random data (100 samples with 3 values each). Applying PCA produces a reduced dataset that keeps all samples but only 2 values each.

import numpy as np
import padasip as pa

np.random.seed(100)
x = np.random.uniform(1, 10, (100, 3))
new_x = pa.preprocess.PCA(x, 2)

If you do not know how many principal components to use, you can check the eigenvalues of the principal components, as in the following example:

import numpy as np
import padasip as pa

np.random.seed(100)
x = np.random.uniform(1, 10, (100, 3))
print(pa.preprocess.PCA_components(x))

which prints:

>>> [ 8.02948402  7.09335781  5.34116273]
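One common heuristic for turning such eigenvalues into a choice of n is to keep the smallest number of components whose cumulative share of the total variance reaches a target. The sketch below applies it to the eigenvalues printed above. The 70 % threshold is an arbitrary assumption for illustration, not a padasip recommendation, and the variable names are hypothetical.

```python
import numpy as np

# Eigenvalues as printed by the example above, already sorted in descending order
eigenvalues = np.array([8.02948402, 7.09335781, 5.34116273])

# Share of the total variance carried by each principal component
explained = eigenvalues / eigenvalues.sum()
# Share carried by the first k components together
cumulative = np.cumsum(explained)

# Assumed target: keep enough components to cover 70 % of the variance
threshold = 0.7
# Index of the first cumulative share that reaches the threshold, plus one
n = int(np.searchsorted(cumulative, threshold)) + 1

print(explained)
print(cumulative)
print(n)  # 2 for these eigenvalues
```

For uniformly random data such as this, the eigenvalues are of similar size, so no component clearly dominates. With strongly correlated variables the leading eigenvalues are much larger than the rest and the cut-off is easier to justify.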

Code Explanation

padasip.preprocess.pca.PCA(x, n=False)

Principal component analysis function.

Args:

  • x : input matrix (2d array), each row represents one sample

Kwargs:

  • n : number of features returned (integer), i.e. how many columns the output should keep

Returns:

  • new_x : matrix with reduced size (lower number of columns)

padasip.preprocess.pca.PCA_components(x)

Principal component analysis helper for inspecting the eigenvalues of the components.

Args:

  • x : input matrix (2d array), each row represents one sample

Returns:

  • components : sorted array of the eigenvalues of the principal components