Linear Discriminant Analysis (LDA)
New in version 0.6.
Linear discriminant analysis (LDA) [1] is a method used to find the linear combinations of features that best separate two or more classes of items. The output of LDA may be used as a linear classifier, or for dimensionality reduction before classification.
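For reference, the common multi-class formulation is given below. It is stated here as background and is not necessarily identical, in every detail, to how padasip computes it. For classes c = 1, ..., C with n_c samples each, class means \mu_c and overall mean \mu, the within-class scatter S_W, the between-class scatter S_B and the eigenproblem that yields the discriminant directions w are:

```latex
S_W = \sum_{c=1}^{C} \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^\top, \qquad
S_B = \sum_{c=1}^{C} n_c \, (\mu_c - \mu)(\mu_c - \mu)^\top, \qquad
S_W^{-1} S_B \, w = \lambda \, w
```

The eigenvectors w with the largest eigenvalues \lambda are kept. Because the weighted deviations n_c(\mu_c - \mu) sum to zero, S_B has rank at most C - 1, so at most C - 1 eigenvalues are nonzero.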
See also: Principal Component Analysis (PCA)
Usage Explanation
To reduce a dataset x, with labels stored in the array labels, to a new dataset new_x containing just n columns:
new_x = pa.preprocess.LDA(x, labels, n)
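To make the idea concrete, the computation behind such a reduction can be sketched in plain NumPy. This is a minimal sketch assuming the classic formulation (within-class scatter S_W, between-class scatter S_B, eigenvectors of S_W^-1 S_B); lda_sketch is a hypothetical name and the code is not padasip's implementation.

```python
import numpy as np


def lda_sketch(x, labels, n):
    """Project x onto its n leading Fisher discriminant directions.

    Hypothetical helper illustrating classic LDA; not padasip's code.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    mean_all = x.mean(axis=0)
    d = x.shape[1]
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        x_c = x[labels == c]
        mean_c = x_c.mean(axis=0)
        centered = x_c - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - mean_all).reshape(-1, 1)
        s_b += len(x_c) * (diff @ diff.T)
    # Solve S_W^-1 S_B w = lambda w (a general, non-symmetric eigenproblem)
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(-eigenvalues.real)  # largest eigenvalues first
    w = eigenvectors[:, order[:n]].real    # keep the n leading directions
    return x @ w


rng = np.random.default_rng(100)
x = rng.random((150, 4))
labels = rng.choice(np.array(["1", "a", "3"]), size=150)
new_x = lda_sketch(x, labels, n=2)
print(new_x.shape)  # (150, 2)
```

np.linalg.solve is used instead of forming the inverse of S_W explicitly, which is the usual numerically safer choice.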
The eigenvalues of the scatter matrix for dataset x with labels labels, sorted in descending order, can be obtained as follows:
eigenvalues = pa.preprocess.LDA_discriminants(x, labels)
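Under the same classic-formulation assumption, the sorted discriminants are just the eigenvalues of S_W^-1 S_B in descending order. The sketch below (lda_discriminants_sketch is a hypothetical name, not padasip's code) shows that with 3 classes only 2 of them are non-negligible.

```python
import numpy as np


def lda_discriminants_sketch(x, labels):
    """Eigenvalues of S_W^-1 S_B, largest first (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    mean_all = x.mean(axis=0)
    d = x.shape[1]
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        x_c = x[labels == c]
        mean_c = x_c.mean(axis=0)
        s_w += (x_c - mean_c).T @ (x_c - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        s_b += len(x_c) * (diff @ diff.T)
    # Real parts only: any imaginary parts are rounding noise here
    eigenvalues = np.linalg.eigvals(np.linalg.solve(s_w, s_b)).real
    return np.sort(eigenvalues)[::-1]  # descending order


rng = np.random.default_rng(100)
x = rng.random((150, 4))
labels = rng.choice(np.array(["1", "a", "3"]), size=150)
eigenvalues = lda_discriminants_sketch(x, labels)
print(eigenvalues)  # two clearly nonzero values, two close to zero
```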
Minimal Working Examples
In this example we create a dataset x of 150 random samples. Every sample is described by 4 values and a label. The labels are stored in the array labels.
First, it is good to look at the eigenvalues of the scatter matrix to determine how many columns it is reasonable to keep:
import numpy as np
import padasip as pa
np.random.seed(100) # constant seed to keep the results consistent
N = 150 # number of samples
classes = np.array(["1", "a", 3]) # names of classes
cols = 4 # number of features (columns in dataset)
x = np.random.random((N, cols)) # random data
labels = np.random.choice(classes, size=N) # random labels
print(pa.preprocess.LDA_discriminants(x, labels))
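which prints:
>>> [ 2.90863957e-02 2.28352079e-02 1.23545720e-18 -1.61163011e-18]
From this output it is clear that a reasonable number of columns to keep is 2: the first two eigenvalues are many orders of magnitude larger than the other two, which are zero up to rounding error. This is expected, because with 3 classes LDA yields at most 3 - 1 = 2 nonzero discriminants. The following code reduces the number of features to 2.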
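The choice of n can also be automated by counting eigenvalues that are non-negligible relative to the largest one. This is a minimal sketch; choose_n is a hypothetical helper and the relative threshold rel_tol is an arbitrary assumption, not a padasip setting.

```python
import numpy as np


def choose_n(eigenvalues, rel_tol=1e-6):
    """Count eigenvalues that are non-negligible relative to the largest one.

    Hypothetical helper; rel_tol is an assumed, tunable threshold.
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    largest = np.abs(eigenvalues).max()
    return int(np.sum(np.abs(eigenvalues) > rel_tol * largest))


# Illustrative values: two clear discriminants, two numerically zero
print(choose_n([2.90863957e-02, 2.28352079e-02, 1.23545720e-18, -1.61163011e-18]))  # 2
```

Any result of such a rule should still be capped at the number of classes minus one.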
import numpy as np
import padasip as pa
np.random.seed(100) # constant seed to keep the results consistent
N = 150 # number of samples
classes = np.array(["1", "a", 3]) # names of classes
cols = 4 # number of features (columns in dataset)
x = np.random.random((N, cols)) # random data
labels = np.random.choice(classes, size=N) # random labels
new_x = pa.preprocess.LDA(x, labels, n=2)
To check that the new dataset really has the expected size, we can print the shapes as follows:
>>> print("Shape of original dataset: {}".format(x.shape))
Shape of original dataset: (150, 4)
>>> print("Shape of new dataset: {}".format(new_x.shape))
Shape of new dataset: (150, 2)
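The reduced data can then be passed to a classifier. As one hedged illustration, the sketch below assigns each new sample the label of the nearest class mean in the projected space. nearest_centroid_predict is a hypothetical helper, this particular classifier is an assumption chosen for brevity, and the toy 2-column arrays merely stand in for LDA output such as new_x.

```python
import numpy as np


def nearest_centroid_predict(z_train, y_train, z_test):
    """Label each test row with the class whose mean is closest (hypothetical helper)."""
    z_train = np.asarray(z_train, dtype=float)
    z_test = np.asarray(z_test, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    # One centroid (class mean) per class in the reduced space
    centroids = np.array([z_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from every test row to every centroid
    dists = np.linalg.norm(z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


z_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array(["a", "a", "b", "b"])
pred = nearest_centroid_predict(z_train, y_train, np.array([[0.1, 0.0], [4.8, 5.2]]))
print(pred)
```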
References
[1] Ronald A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
Code Explanation

padasip.preprocess.lda.LDA(x, labels, n=False)
Linear Discriminant Analysis function.
Args:
 x : input matrix (2d array), every row represents a new sample
 labels : list of labels (iterable), every item should be the label of the sample with the corresponding index
Kwargs:
 n : number of features returned (integer), i.e. how many columns the output should keep
Returns:
 new_x : matrix with reduced size (the number of columns is equal to n)

padasip.preprocess.lda.LDA_base(x, labels)
Base function used for Linear Discriminant Analysis.
Args:
 x : input matrix (2d array), every row represents a new sample
 labels : list of labels (iterable), every item should be the label of the sample with the corresponding index
Returns:
 eigenvalues, eigenvectors : eigenvalues and eigenvectors from LDA analysis
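When eigenvalues and eigenvectors are returned together, any reordering (for example into the descending order used for the discriminants) must permute the eigenvector columns with the same index so that each pair stays aligned. This is a minimal sketch assuming eigenvectors are stored as columns, as np.linalg.eig returns them; sort_eigenpairs is a hypothetical helper and whether LDA_base orders its output this way is an assumption.

```python
import numpy as np


def sort_eigenpairs(eigenvalues, eigenvectors):
    """Order eigenpairs by descending eigenvalue, keeping columns aligned."""
    order = np.argsort(-np.real(eigenvalues))
    return eigenvalues[order], eigenvectors[:, order]


matrix = np.diag([0.5, 3.0, 1.0])
vals, vecs = np.linalg.eig(matrix)
vals, vecs = sort_eigenpairs(vals, vecs)
print(vals)
print(vecs[:, 0])  # eigenvector paired with the largest eigenvalue
```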

padasip.preprocess.lda.LDA_discriminants(x, labels)
Linear Discriminant Analysis helper for determining how many columns the data should be reduced to.
Args:
 x : input matrix (2d array), every row represents a new sample
 labels : list of labels (iterable), every item should be the label of the sample with the corresponding index
Returns:
 discriminants : array of eigenvalues sorted in descending order