Source code for padasip.preprocess.pca

"""
.. versionadded:: 0.6
.. versionchanged:: 1.2.0

Principal component analysis (PCA) is a statistical method
that converts a set of observations of possibly correlated
variables into a data-set of linearly uncorrelated variables
(principal components). The number of principal components
is less than or equal to the number of original variables.
This transformation is defined in such a way that the first
principal component has the largest possible variance, and each
following component has the largest possible variance under the
constraint that it is uncorrelated with the preceding ones.
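
A minimal NumPy sketch of this definition follows. It is independent of padasip, and the synthetic data and variable names are illustrative assumptions. The principal directions are taken as the eigenvectors of the covariance matrix, and the variance of each component then equals the corresponding eigenvalue.

.. code-block:: python

    import numpy as np

    # synthetic data with two strongly correlated features
    rng = np.random.default_rng(0)
    a = rng.normal(size=500)
    x = np.column_stack([a, a + 0.1 * rng.normal(size=500)])

    # center the data and compute its covariance matrix (features in columns)
    xc = x - x.mean(axis=0)
    cov = np.cov(xc, rowvar=False)

    # eigh is meant for symmetric matrices; it returns ascending eigenvalues
    eig_vals, eig_vecs = np.linalg.eigh(cov)
    order = np.argsort(eig_vals)[::-1]  # reorder to descending
    eig_vals = eig_vals[order]
    eig_vecs = eig_vecs[:, order]       # columns are the principal directions

    # principal components: coordinates of the samples along those directions
    scores = xc.dot(eig_vecs)

    # each component's variance equals its eigenvalue (largest first) ...
    print(scores.var(axis=0, ddof=1), eig_vals)
    # ... and the components are uncorrelated (off-diagonal entries ~0)
    print(np.cov(scores, rowvar=False))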

.. contents::
   :local:
   :depth: 1

See also: :ref:`preprocess-lda`

Usage Explanation
=======================

To reduce a dataset :code:`x` to :code:`n` principal components:

.. code-block:: python

    new_x = pa.preprocess.PCA(x, n)

To see the eigenvalues of the principal components, ordered from
largest to smallest, use:

.. code-block:: python

    eigenvalues = pa.preprocess.PCA_components(x)
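
The eigenvalues are the variances of the principal components, so they add up to the total variance of the original features, and dividing by that total gives the fraction of variance each component explains. The sketch below computes this with NumPy alone (it does not call padasip, and `ratio` is an illustrative name); its eigenvalues should match those returned by `PCA_components` up to floating-point error.

.. code-block:: python

    import numpy as np

    np.random.seed(100)
    x = np.random.uniform(1, 10, (100, 3))

    # eigenvalues of the covariance matrix, largest first
    eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(x, rowvar=False)))[::-1]

    # the eigenvalues add up to the total variance of the original features
    total_variance = np.var(x, axis=0, ddof=1).sum()
    # fraction of the total variance carried by each component
    ratio = eigenvalues / total_variance
    print(ratio)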

Minimal Working Example
===========================

In this example, random numbers are generated (100 samples with 3 values each).
Applying PCA produces the reduced data-set
(all samples, but only 2 values each).

.. code-block:: python

    import numpy as np
    import padasip as pa

    np.random.seed(100)
    x = np.random.uniform(1, 10, (100, 3))
    new_x = pa.preprocess.PCA(x, 2)
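
To see what `PCA(x, 2)` computes, the following NumPy-only sketch repeats the steps of the `PCA` function in this module: covariance matrix, eigendecomposition, and projection onto the two leading eigenvectors. That function projects `x` without subtracting the column means; the sketch checks that this differs from textbook (mean-centered) PCA scores only by a constant offset in each column. The names `uncentered` and `centered` are illustrative.

.. code-block:: python

    import numpy as np

    np.random.seed(100)
    x = np.random.uniform(1, 10, (100, 3))

    # same steps as PCA(): eigenvectors ordered by descending eigenvalue
    eigen_values, eigen_vectors = np.linalg.eig(np.cov(x.T))
    eigen_order = eigen_vectors.T[(-eigen_values).argsort()]

    # projection as done by PCA(x, 2): the data are not centered
    uncentered = eigen_order[:2].dot(x.T).T
    # textbook PCA scores: project the mean-centered data
    centered = eigen_order[:2].dot((x - x.mean(axis=0)).T).T

    print(uncentered.shape)  # (100, 2)
    # the difference is the same in every row: the projected column means
    print(uncentered[0] - centered[0], eigen_order[:2].dot(x.mean(axis=0)))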

If you do not know how many principal components to use,
you can check the eigenvalues of the principal components,
as in the following example:

.. code-block:: python

    import numpy as np
    import padasip as pa

    np.random.seed(100)
    x = np.random.uniform(1, 10, (100, 3))
    print(pa.preprocess.PCA_components(x))

which prints::

    [ 8.02948402  7.09335781  5.34116273]
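
A common rule of thumb is to keep the smallest number of components whose eigenvalues account for a chosen share of the total variance. The helper below, `choose_n_components`, is hypothetical (it is not part of padasip), and the 0.7 threshold is only an example.

.. code-block:: python

    import numpy as np


    def choose_n_components(eigenvalues, threshold=0.9):
        # hypothetical helper, not part of padasip: the smallest n for which
        # the n largest eigenvalues explain at least `threshold` of the total
        ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
        cumulative = np.cumsum(ev) / ev.sum()
        n = int(np.searchsorted(cumulative, threshold)) + 1
        # guard against rounding leaving cumulative[-1] slightly below 1.0
        return min(n, len(ev))


    # eigenvalues printed by the example above
    eigenvalues = [8.02948402, 7.09335781, 5.34116273]
    print(choose_n_components(eigenvalues, threshold=0.7))  # 2

Because `PCA` asserts that `n` is smaller than the number of columns of `x`, a result equal to the number of features (here 3, for example with the default threshold of 0.9) means that no reduction is possible at that threshold.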

Code Explanation
====================
"""
from __future__ import division
import numpy as np


def PCA_components(x):
    """
    Principal component analysis helper that returns the eigenvalues
    of the principal components.

    **Args:**

    * `x` : input matrix (2d array), each row represents one sample

    **Returns:**

    * `components`: eigenvalues of the principal components, sorted
      in descending order

    """
    # validate inputs
    try:
        x = np.array(x)
    except Exception:
        raise ValueError('Impossible to convert x to a numpy array.')
    # eigenvalues of the data covariance matrix (eigenvectors are not needed)
    eigen_values, _ = np.linalg.eig(np.cov(x.T))
    # return the eigenvalues sorted in descending order
    return eigen_values[(-eigen_values).argsort()]


def PCA(x, n=False):
    """
    Principal component analysis function.

    **Args:**

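    * `x` : input matrix (2d array), each row represents one sample

    **Kwargs:**

    * `n` : number of principal components to keep (integer), i.e. the
      number of columns of the output. It must be smaller than the number
      of columns of `x`; by default it is the number of columns of `x`
      minus one.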

    **Returns:**

    * `new_x` : matrix with reduced size (lower number of columns)
    """
    x = np.asarray(x)
    n = n if n else x.shape[1] - 1
    assert x.shape[1] > n, ("The requested n must be smaller than "
                            "the number of features in x.")
    # eigenvalues and eigenvectors of the data covariance matrix
    eigen_values, eigen_vectors = np.linalg.eig(np.cov(x.T))
    # order the eigenvectors (as rows) by descending eigenvalue
    eigen_order = eigen_vectors.T[(-eigen_values).argsort()]
    # project x (not mean-centered) onto the n leading eigenvectors
    return eigen_order[:n].dot(x.T).T