Learning Entropy (LE)¶
New in version 1.0.0.
Learning Entropy (LE) is a non-Shannon entropy based on the conformity of individual data samples to the contemporary learned governing law of a learning system.
Content of this page:
Algorithm Explanation
Usage Instructions and Optimal Performance
Minimal Working Example
Code Explanation
Algorithm Explanation¶
Two options for estimating the LE are implemented: the direct approach and the multiscale approach.
Direct approach
With the direct approach, the LE is evaluated for every sample as follows
\(\textrm{LE}_d(k) = \frac{ | \Delta \textbf{w}(k) | - \overline{| \Delta \textbf{w}_M(k) |} } { \sigma({| \Delta \textbf{w}_M(k) |})+\epsilon }\)
where
\(|\Delta \textbf{w}(k)|\) are the absolute values of the current weight increments.
\(\overline{| \Delta \textbf{w}_M(k) |}\) are the averages of the absolute values over the window used for LE evaluation.
\(\sigma (| \Delta \textbf{w}_M(k) |)\) are the standard deviations of the absolute values over the window used for LE evaluation.
\(\epsilon\) is a regularization term that preserves stability for small values of the standard deviation.
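The following NumPy sketch shows how this formula can be evaluated over a weight-history matrix. It is an illustration only, not the padasip implementation; in particular, aggregating the per-weight values by a simple mean is an assumption made for readability, and the first m samples are simply left at zero.
import numpy as np
def le_direct_sketch(w, m=30, eps=1e-10):
    """Illustrative direct-approach LE; w is a 2d array of weight history."""
    dw = np.abs(np.diff(w, axis=0))        # absolute weight increments |dw(k)|
    le = np.zeros(dw.shape[0])
    for k in range(m, dw.shape[0]):
        window = dw[k - m:k]               # window of the last m increments
        mean_w = window.mean(axis=0)       # mean of |dw_M(k)| per weight
        std_w = window.std(axis=0)         # std of |dw_M(k)| per weight
        # per-weight LE, aggregated over weights by a simple mean (illustrative choice)
        le[k] = np.mean((dw[k] - mean_w) / (std_w + eps))
    return le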
Multiscale approach
The value for every sample is defined as follows
\(\textrm{LE}(k) = \frac{1}{n \cdot n_\alpha} \sum f(\Delta w_{i}(k), \alpha ),\)
where \(\Delta w_i(k)\) stands for one weight from the vector \(\Delta \textbf{w}(k)\), \(n\) is the number of weights, and \(n_\alpha\) is the number of used detection sensitivities
\(\alpha=[\alpha_{1}, \alpha_{2}, \ldots, \alpha_{n_{\alpha}}].\)
The function \(f(\Delta w_{i}(k), \alpha)\) is defined as follows
\(f(\Delta w_{i}(k),\alpha)= \begin{cases} 1 & \text{if } \left\vert \Delta w_{i}(k)\right\vert > \alpha \cdot \overline{\left\vert \Delta w_{Mi}(k)\right\vert } \\ 0 & \text{otherwise} \end{cases}\)
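A minimal NumPy sketch of this counting rule follows. It only illustrates the formula above and is not the padasip implementation; the handling of the first m samples is simplified.
import numpy as np
def le_multiscale_sketch(w, m=30, alphas=(8., 9., 10., 11., 12., 13.)):
    """Illustrative multiscale LE; w is a 2d array of weight history."""
    dw = np.abs(np.diff(w, axis=0))         # absolute weight increments
    N, n = dw.shape
    le = np.zeros(N)
    for k in range(m, N):
        mean_w = dw[k - m:k].mean(axis=0)   # window means of |dw_Mi(k)| per weight
        # count the (weight, alpha) pairs where the increment exceeds the scaled window mean
        hits = sum(np.sum(dw[k] > a * mean_w) for a in alphas)
        le[k] = hits / float(n * len(alphas))
    return le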
Usage Instructions and Optimal Performance¶
The LE algorithm can be used as follows
le = pa.detection.learning_entropy(w, m=30, order=1)
in the case of the direct approach. An example for the multiscale approach follows
le = pa.detection.learning_entropy(w, m=30, order=1, alpha=[8., 9., 10., 11., 12., 13.])
where w is the matrix of adaptive parameters (changing in time; every row should represent one time index), m is the window size, order is the LE order, and alpha is the vector of detection sensitivities.
Used adaptive models
In general, it is possible to use any adaptive model. The input of the LE algorithm is a matrix of the adaptive parameters history, where every row represents the parameters used at a particular time and every column represents one parameter over the whole adaptation history.
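For instance, with a padasip adaptive filter the weight history returned by the run method already has this layout (the filter and data below are arbitrary placeholders):
import numpy as np
import padasip as pa
N, n = 500, 4
x = np.random.normal(0, 1, (N, n))   # input matrix
d = np.sum(x, axis=1)                # target
f = pa.filters.FilterLMS(n, mu=0.1)
y, e, w = f.run(d, x)
# w has one row per time index and one column per adaptive parameter
le = pa.detection.learning_entropy(w, m=30, order=1)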
Selection of sensitivities
The optimal number of detection sensitivities and their values depend on the task and the data. The sensitivities should be chosen in a range where the function \(\textrm{LE}(k)\) returns a value lower than 1 for at least one sample in the data, and returns a value of 0 for at most one sample.
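This choice can be checked empirically, for example by evaluating a few candidate sensitivity ranges and inspecting how many samples fall below 1 and how many are exactly 0. The data, model, and candidate values below are arbitrary examples (the setup mirrors the minimal working example below):
import numpy as np
import padasip as pa
np.random.seed(1)
N, n = 2000, 5
x = np.random.normal(0, 1, (N, n))
d = np.sum(x, axis=1) + np.random.normal(0, 0.1, N)
d[1000] += 2.   # one inserted perturbation
y, e, w = pa.filters.FilterNLMS(n, mu=1., w=np.ones(n)).run(d, x)
# try several candidate sensitivity ranges
for alphas in ([4., 5., 6.], [8., 9., 10., 11., 12., 13.], [15., 16., 17.]):
    le = pa.detection.learning_entropy(w, m=30, order=1, alpha=alphas)
    print(alphas, "samples with LE < 1:", np.sum(le < 1.),
          "samples with LE == 0:", np.sum(le == 0.))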
Minimal Working Example¶
This example demonstrates how the multiscale approach to LE can highlight the position of a perturbation inserted into the data. A Normalized Least-mean-square (NLMS) adaptive filter is used as the adaptive model. The perturbation is manually inserted at the sample with index \(k=1000\) (the length of the data is 2000).
import numpy as np
import matplotlib.pylab as plt
import padasip as pa
# data creation
n = 5
N = 2000
x = np.random.normal(0, 1, (N, n))
d = np.sum(x, axis=1) + np.random.normal(0, 0.1, N)
# perturbation insertion
d[1000] += 2.
# creation of learning model (adaptive filter)
f = pa.filters.FilterNLMS(n, mu=1., w=np.ones(n))
y, e, w = f.run(d, x)
# estimation of LE with weights from learning model
le = pa.detection.learning_entropy(w, m=30, order=2, alpha=[8., 9., 10., 11., 12., 13.])
# LE plot
plt.plot(le)
plt.show()
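As a quick check, the index of the maximal LE value can be compared with the position of the inserted perturbation (a continuation of the example above, using the le array computed there):
print("LE peak at sample:", np.argmax(le))  # expected close to the perturbation at k=1000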
Code Explanation¶
padasip.detection.le.learning_entropy(w, m=10, order=1, alpha=False)
This function estimates Learning Entropy.
Args:
w : history of adaptive parameters of an adaptive model (2d array); every row represents the parameters at a given time index.
Kwargs:
m : window size (int) - how many of the last samples are used for the evaluation of every sample.
order : order of the LE (int) - the order of the weight differentiation.
alpha : list of sensitivities (1d array). If not provided, the direct approach to LE is used.
Returns:
Learning Entropy of data (1d array) - one value for every sample.
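A short usage sketch of both calling modes follows; the synthetic weight history is a hypothetical placeholder, not data from the library.
import numpy as np
import padasip as pa
# synthetic weight history: 500 time steps, 3 adaptive parameters (placeholder data)
w = np.cumsum(np.random.normal(0, 0.1, (500, 3)), axis=0)
le_direct = pa.detection.learning_entropy(w, m=10, order=1)                       # direct approach
le_multi = pa.detection.learning_entropy(w, m=10, order=1, alpha=[8., 10., 12.])  # multiscale approach
print(le_direct.shape, le_multi.shape)  # one value for every sample, as documented above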