Multi-layer Perceptron (MLP)

New in version 0.3.

This module contains everything related to the Multi-layer Perceptron (MLP). This neural network can be used for classification and regression.

Minimal Working Example

import numpy as np
import padasip as pa

# data creation
x = np.array([
        [0,0,0,0], [1,0,0,0], [0,1,0,0], [1,1,0,0],
        [0,0,1,0], [1,0,1,0], [0,1,1,0], [1,1,1,0],
        [0,0,0,1], [1,0,0,1], [0,1,0,1], [1,1,0,1],
        [0,0,1,1], [1,0,1,1], [0,1,1,1], [1,1,1,1]
    ])
d = np.array([0,1,1,0,0,1,0,0,1,0,1,0,1,1,1,0])
N = len(d)
n = 4

# creation of neural network
nn = pa.ann.NetworkMLP([5,6], n, outputs=1, activation="tanh", mu="auto")    

# training
e, mse = nn.train(x, d, epochs=200, shuffle=True)    

# get results
y = nn.run(x)

And the result (pairs: target, output) can look like this:

>>> for i in zip(d, y): print(i)
... 
(0, 0.0032477183193071906)
(1, 1.0058082383308447)
(1, 1.0047503447788306)
(0, 0.0046026142618665845)
(0, 0.0003037425037410007)
(1, 1.0017672193832869)
(0, 0.0015817734995124679)
(0, 0.0019115885715706904)
(1, 0.99342117275580499)
(0, 0.00069114178424850147)
(1, 1.0021789943501729)
(0, 0.0021355836851727717)
(1, 0.99809312951378826)
(1, 1.0071488717506856)
(1, 1.0067500768423701)
(0, -0.0045962250501771244)
>>> 

Learning Rate Selection

If you select the learning rate (\(\mu\) in equations, or mu in code) manually, the same value is used for all nodes. Otherwise it is selected automatically [1] as follows

\(\mu_{ij} = m^{-0.5}\)

where \(m\) is the number of inputs of the given node. The automatic selection is the recommended and default option.
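
As a sketch of this rule, assuming plain Python (the helper name below is hypothetical, not part of padasip):

# hypothetical helper illustrating the rule mu_ij = m**-0.5
def auto_learning_rate(m):
    # m is the number of inputs of the given node
    return m ** -0.5

print(auto_learning_rate(4))   # 0.5
print(auto_learning_rate(16))  # 0.25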

Default Values of Weights

The distribution from which the initial weights are drawn is chosen automatically [1]; it has zero mean and a standard deviation estimated as follows

\(\sigma_{w} = m^{-0.5}\)

where the \(m\) is the amount of nodes on input of given node.
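
A minimal sketch of this initialization with NumPy (the shapes are illustrative, not the exact internals of padasip):

import numpy as np

m = 4                                # number of inputs of the given node
sigma_w = m ** -0.5                  # estimated standard deviation
w = np.random.normal(0, sigma_w, m)  # zero-mean initial weights of one node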

References

[1] Yann A. LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. Efficient BackProp. In Neural Networks: Tricks of the Trade, pages 9–48. Springer, 2012.

Code Explanation

class padasip.ann.mlp.Layer(n_layer, n_input, activation_f, mu)[source]

This class represents a single hidden layer of the MLP.

Args:

  • n_layer : size of the layer (int)

  • n_input : how many inputs the layer has (int)

  • activation_f : what function should be used as activation function (str)

  • mu : learning rate (float or str). It can be a float value used directly as the learning rate, or the string “auto” for automatic selection of the learning rate [1].
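
For illustration, a Layer matching the signature above (in typical use the layers are built internally by NetworkMLP, so creating one directly is rarely needed):

import padasip as pa

# hidden layer with 5 nodes, 4 inputs, sigmoid activation,
# and automatic learning rate selection
layer = pa.ann.mlp.Layer(5, 4, "sigmoid", "auto")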

activation(x, f='sigmoid', der=False)[source]

This function processes the values of the layer outputs with the activation function.

Args:

  • x : array to process (1-dimensional array)

Kwargs:

  • f : activation function (str)
  • der : if True, return the derivative of the activation function instead of the normal output (bool)

Returns:

  • values processed with activation function (1-dimensional array)
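
A usage sketch with the layer created above (whether the derivative is evaluated on pre- or post-activation values is an implementation detail not covered here):

import numpy as np

vals = np.array([-1.0, 0.0, 1.0])
out = layer.activation(vals, f="sigmoid")            # activation values
der = layer.activation(vals, f="sigmoid", der=True)  # derivative variant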

predict(x)[source]

This function makes a forward pass through this layer (no update).

Args:

  • x : input vector (1-dimensional array)

Returns:

  • y : output of the layer (float or 1-dimensional array).

    Size depends on the number of nodes in this layer.
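
A usage sketch, continuing with the layer created above:

import numpy as np

x_vec = np.random.uniform(0, 1, 4)  # one input vector (n_input=4 features)
y_vec = layer.predict(x_vec)        # outputs of all 5 nodes in this layer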

update(w, e)[source]

This function makes an update of this layer according to the weights and error backpropagated from the following layer, and the last used input vector.

Args:

  • w : weights of the following layer (array).

  • e : error propagated from the following layer (float or 1-dimensional array).

Returns:

  • w : weights of the layer (2-dimensional array).

    Every row represents one node.

  • e : error used for the update (float or 1-dimensional array).

    Size corresponds to the number of nodes in this layer.

class padasip.ann.mlp.NetworkMLP(layers, n_input, outputs=1, activation='sigmoid', mu='auto')[source]

This class represents a Multi-layer Perceptron neural network.

Args:

  • layers : array describing the hidden layers of the network (1-dimensional array of integers).

    Every number in the array represents one hidden layer. For example, [3, 6, 2] creates a network with three hidden layers: the first layer has 3 nodes, the second 6 nodes, and the last hidden layer 2 nodes.

  • n_input : number of network inputs (int).

Kwargs:

  • outputs : number of network outputs (int). Default is 1.

  • activation : activation function (str)

    • “sigmoid” - sigmoid function
    • “tanh” - hyperbolic tangent
  • mu : learning rate (float or str). It can be:

    • a float value - used directly as mu
    • “auto” - triggers automatic selection of the learning rate according to [1]
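
For example, the [3, 6, 2] network from the layers description above, with 10 inputs (the parameter values are illustrative):

import padasip as pa

# three hidden layers (3, 6 and 2 nodes), 10 inputs, 1 output,
# hyperbolic tangent activation, automatic learning rate
nn = pa.ann.NetworkMLP([3, 6, 2], 10, outputs=1, activation="tanh", mu="auto")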

predict(x)[source]

This function makes a forward pass through the MLP (no update).

Args:

  • x : input vector (1-dimensional array)

Returns:

  • y : output of the MLP (float or 1-dimensional array).

    Size depends on the number of MLP outputs.
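
A usage sketch with the network created above:

import numpy as np

x_vec = np.random.uniform(0, 1, 10)  # one input vector (n_input=10 features)
y = nn.predict(x_vec)                # a single float, because outputs=1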

run(x)[source]

Function for batch usage of an already trained and tested MLP.

Args:

  • x
    : input array (2-dimensional array).

    Every row represents one input vector (features).

Returns:

  • y : output array. Every row represents the output (or outputs) for one input vector.
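
A batch usage sketch (continuing with the network above; the data are random just to show the shapes):

import numpy as np

x_batch = np.random.uniform(0, 1, (100, 10))  # 100 input vectors, 10 features each
y_batch = nn.run(x_batch)                     # one output (row) per input vector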

test(x, d)[source]

Function for batch testing of an already trained MLP.

Args:

  • x
    : input array (2-dimensional array).

    Every row represents one input vector (features).

  • d : target array.

    Every row represents the target for one input vector. The target can be one or more values (in case of multiple outputs).

Returns:

  • e : error array. Every row represents the error (or errors) for one input-output pair.
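
A usage sketch (x_batch as above; the targets are random placeholders):

import numpy as np

d_batch = np.random.uniform(0, 1, 100)  # one target per input vector
e = nn.test(x_batch, d_batch)           # error for every input-output pair
mse = np.mean(e ** 2)                   # overall mean squared error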

train(x, d, epochs=10, shuffle=False)[source]

Function for batch training of the MLP.

Args:

  • x
    : input array (2-dimensional array).

    Every row represents one input vector (features).

  • d : target array.

    Every row represents the target for one input vector. The target can be one or more values (in case of multiple outputs).

Kwargs:

  • epochs : number of epochs (int), i.e. how many times the MLP will iterate over the passed set of data (x, d).

  • shuffle : if True, the order of the input-output pairs is shuffled (bool). That means the pairs are presented in a different order in every epoch.

Returns:

  • e : error array. Every row represents the error (or errors) for one input-output pair in a given epoch. The length of this array is the length of the provided data times the number of epochs (N*epochs).

  • MSE : mean squared error (1-dimensional array). Every value stands for the MSE of one epoch.
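
A usage sketch of inspecting the training progress (x_batch and d_batch as above):

e, mse = nn.train(x_batch, d_batch, epochs=50, shuffle=True)
print(len(e))    # N*epochs = 100*50 = 5000 error records
print(mse[-1])   # MSE of the last epoch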

update(d)[source]

This function makes an update according to the provided target and the last used input vector.

Args:

  • d : target (float or 1-dimensional array).

    Size depends on the number of MLP outputs.

Returns:

  • e : error used for the update (float or 1-dimensional array).

    Size corresponds to the size of the input d.
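
A sample-by-sample (online) usage sketch; it assumes, per the description above, that predict() stores the last used input vector for the subsequent update():

for x_vec, target in zip(x_batch, d_batch):
    y = nn.predict(x_vec)   # forward pass, remembers the input vector
    e = nn.update(target)   # update from the stored input and this target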