
Perceptron

The first part of my "Statistical Learning Methods" notes series corresponds to the second chapter of the book. It borrows heavily from the original explanation and adds my own understanding. The algorithm in the book is implemented in Python and the animation is visualized with Matplotlib, which took quite a bit of work. The material is dense, but it will pay off if you stick with it to the end.

Concept

The perceptron is a binary classification model: its input is the feature vector of an instance, and its output is the instance's class, +1 or -1.

Perceptron model

Definition

Suppose the input space is $\mathcal{X} \subseteq \mathbb{R}^n$ and the output space is $\mathcal{Y} = \{+1, -1\}$, with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Then the following function from the input space to the output space

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

is called a perceptron. Here $w$ and $b$ are the perceptron model parameters: $w \in \mathbb{R}^n$ is called the weight or weight vector, $b \in \mathbb{R}$ is called the bias, and $w \cdot x$ denotes the inner product of $w$ and $x$. $\operatorname{sign}$ is the sign function:

$$\operatorname{sign}(x) = \begin{cases} +1, & x \geq 0 \\ -1, & x < 0 \end{cases}$$

The geometric interpretation of the perceptron is that the linear equation

$$w \cdot x + b = 0$$

divides the feature space into two parts, a positive side and a negative side:

[Figure: the hyperplane splitting the feature space into positive and negative half-spaces]

This hyperplane (which degenerates to a straight line in 2D) is called the separating hyperplane.
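
To make the model concrete, here is a minimal sketch of the decision function (just the forward computation, not the learning algorithm); the function name and the example numbers are my own, not from the book:

import numpy as np

def predict(w, b, x):
    """Perceptron decision function f(x) = sign(w.x + b); returns +1 or -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# the hyperplane x1 + x2 - 3 = 0 as an example
print(predict(np.array([1, 1]), -3, np.array([4, 3])))  # +1: the point lies on the positive side
print(predict(np.array([1, 1]), -3, np.array([1, 1])))  # -1: the point lies on the negative side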

Perceptron learning strategy

Linear separability of data sets

Definition

Given a data set

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$

where $x_i \in \mathcal{X} = \mathbb{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \ldots, N$: if there exists a hyperplane S

$$w \cdot x + b = 0$$

that divides the positive and negative instance points completely and correctly, then T is said to be linearly separable; otherwise T is linearly inseparable.
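
As a quick illustration of this definition, a candidate hyperplane $(w, b)$ can be tested by checking whether $y_i(w \cdot x_i + b) > 0$ holds for every point. The helper name and numbers in this sketch are my own:

import numpy as np

def separates(w, b, data):
    """Return True if the hyperplane w.x + b = 0 classifies every (x, y) in data correctly."""
    return all(y * (np.dot(w, x) + b) > 0 for x, y in data)

# the example data set used later in this article
data = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]
print(separates(np.array([1, 1]), 0, data))    # False: (1, 1) falls on the positive side
print(separates(np.array([1, 1]), -3, data))   # True: this hyperplane separates the three points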

Perceptron learning strategy

Assuming the data set is linearly separable, we want to find a reasonable loss function.

A naive idea is to use the total number of misclassified points, but such a loss function is not a continuously differentiable function of the parameters w and b, so it is hard to track how it changes and hard to optimize (we would not know when to terminate training, or the termination point would not be optimal).

Another idea is to use the total distance from all misclassified points to the hyperplane S. To this end, first define the distance from a point $x_0$ to the hyperplane S:

$$\frac{1}{\|w\|} |w \cdot x_0 + b|$$

The denominator $\|w\|$ is the $L_2$ norm of $w$, i.e. the square root of the sum of the squares of the vector's elements (its length). The formula is easy to understand if you recall the point-to-line distance formula from school:

$$d = \frac{|Ax_0 + By_0 + C|}{\sqrt{A^2 + B^2}}$$

The distance from a point to the hyperplane S here is simply the generalization of that distance to a multidimensional space.
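
For instance, with $w = (1, 1)^T$, $b = -3$ and the point $x_0 = (4, 3)^T$ (numbers chosen purely for illustration), the distance is

$$\frac{|1 \cdot 4 + 1 \cdot 3 - 3|}{\sqrt{1^2 + 1^2}} = \frac{4}{\sqrt{2}} \approx 2.83$$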

Moreover, if a point $(x_i, y_i)$ is misclassified, then

$$-y_i (w \cdot x_i + b) > 0$$

holds, so we can drop the absolute value and obtain the distance from a misclassified point to the hyperplane S:

$$-\frac{1}{\|w\|} y_i (w \cdot x_i + b)$$

Assuming all the misclassified points form a set M, the total distance from the misclassified points to the hyperplane S is

$$-\frac{1}{\|w\|} \sum_{x_i \in M} y_i (w \cdot x_i + b)$$

The denominator has little effect; it is positive in any case. Dropping it, we obtain the loss function of perceptron learning:

$$L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$
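
As a sanity check on this loss function, here is a small sketch (the helper name is mine) that sums $-y_i(w \cdot x_i + b)$ over the misclassified points:

import numpy as np

def perceptron_loss(w, b, data):
    """L(w, b): minus the sum of y * (w.x + b) over points with y * (w.x + b) <= 0."""
    loss = 0.0
    for x, y in data:
        margin = y * (np.dot(w, x) + b)
        if margin <= 0:   # misclassified, or lying exactly on the hyperplane
            loss -= margin
    return loss

data = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]
print(perceptron_loss(np.array([1.0, 1.0]), 0.0, data))   # 2.0: only (1, 1) is misclassified
print(perceptron_loss(np.array([1.0, 1.0]), -3.0, data))  # 0.0: no misclassified points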

Perceptron learning algorithm

Original form

The perceptron learning algorithm solves the following optimization problem:

$$\min_{w, b} L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

The perceptron learning algorithm is misclassification-driven. First a hyperplane is chosen arbitrarily, and then the loss function is minimized step by step with gradient descent. The gradient of the loss function is given by

$$\nabla_w L(w, b) = -\sum_{x_i \in M} y_i x_i$$

$$\nabla_b L(w, b) = -\sum_{x_i \in M} y_i$$

The gradient is a vector that points in the direction in which a scalar field grows fastest, with length equal to that maximum rate of change. A scalar field simply means that every point in space is assigned a scalar value (I think of the scalar as the output of the function).

Randomly select a misclassified point $(x_i, y_i)$ and update the parameters $w, b$:

$$w \leftarrow w + \eta y_i x_i$$

$$b \leftarrow b + \eta y_i$$

In the formulas above, $\eta$ is the learning rate. The parameters are moved in the direction opposite to the gradient, so the loss decreases. Iterating these updates keeps reducing the loss function until it reaches zero. This yields the original form of the perceptron learning algorithm:

Algorithm 2.1 (perceptron learning algorithm, original form)

Input: training data set $T = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, learning rate $\eta$
Output: $w, b$; perceptron model $f(x) = \operatorname{sign}(w \cdot x + b)$

1. Choose initial values $w_0, b_0$;
2. Select a point $(x_i, y_i)$ from the training set;
3. If $y_i (w \cdot x_i + b) \leq 0$, update $w \leftarrow w + \eta y_i x_i$ and $b \leftarrow b + \eta y_i$;
4. Go to step 2, until the training set contains no misclassified points.

For this algorithm, use the following example as test data: positive instances $x_1 = (3, 3)^T$, $x_2 = (4, 3)^T$ and negative instance $x_3 = (1, 1)^T$.

The Python implementation and visualization code are given below.

Perceptron algorithm code

Finally, the most exciting moment: with this knowledge, we can visualize this simple algorithm nicely:

# -*- coding:utf-8 -*-
# Filename: train2.1.py
# Author:hankcs
# Date: 2015/1/30 16:29
import copy
from matplotlib import pyplot as plt
from matplotlib import animation
 
training_set = [[(3, 3), 1], [(4, 3), 1], [(1, 1), -1]]
w = [0, 0]
b = 0
history = []
 
 
def update(item):
    """
    update parameters using stochastic gradient descent
    :param item: an item which is classified into wrong class
    :return: nothing
    """
    global w, b, history
    w[0] += 1 * item[1] * item[0][0]
    w[1] += 1 * item[1] * item[0][1]
    b += 1 * item[1]
    print w, b  # print the parameters to watch the process of stochastic gradient descent
    history.append([copy.copy(w), b])  # record each update so the animation can replay it
 
 
def cal(item):
    """
    calculate the functional distance between 'item' and the decision surface, i.e. yi(w*xi+b).
    :param item:
    :return:
    """
    res = 0
    for i in range(len(item[0])):
        res += item[0][i] * w[i]
    res += b
    res *= item[1]
    return res
 
 
def check():
    """
    check if the hyperplane can classify the examples correctly
    :return: true if it can
    """
    flag = False
    for item in training_set:
        if cal(item) <= 0:
            flag = True
            update(item)
    # draw a graph to show the process
    if not flag:
        print "RESULT: w: " + str(w) + " b: " + str(b)
    return flag


if __name__ == "__main__":
    for i in range(1000):
        if not check(): break

    # first set up the figure, the axis, and the plot element we want to animate
    fig = plt.figure()
    ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
    line, = ax.plot([], [], 'g', lw=2)
    label = ax.text([], [], '')

    # initialization function: plot the background of each frame
    def init():
        line.set_data([], [])
        x, y, x_, y_ = [], [], [], []
        for p in training_set:
            if p[1] > 0:
                x.append(p[0][0])
                y.append(p[0][1])
            else:
                x_.append(p[0][0])
                y_.append(p[0][1])
 
        plt.plot(x, y, 'bo', x_, y_, 'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Perceptron Algorithm (www.hankcs.com)')
        return line, label
 
 
    # animation function.  this is called sequentially
    def animate(i):
        global history, ax, line, label
 
        w = history[i][0]
        b = history[i][1]
        if w[1] == 0: return line, label
        x1 = -7
        y1 = -(b + w[0] * x1) / w[1]
        x2 = 7
        y2 = -(b + w[0] * x2) / w[1]
        line.set_data([x1, x2], [y1, y2])
        x1 = 0
        y1 = -(b + w[0] * x1) / w[1]
        label.set_text(history[i])
        label.set_position([x1, y1])
        return line, label
 
    # call the animator.  blit=true means only re-draw the parts that have changed.
    print history
    anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
                                   blit=True)
    plt.show()
    anim.save('perceptron.gif', fps=2, writer='imagemagick')

Visualization

[Animation: the separating hyperplane being pulled toward each misclassified point until all three points are classified correctly]

As the animation shows, the hyperplane is attracted by the misclassified point and moves toward it, so the distance between the two gradually shrinks until the point is classified correctly. Doesn't this animation give a more intuitive feel for the perceptron's gradient descent algorithm?

Algorithm convergence

Write the input vector in augmented form by appending a constant 1, i.e. $\hat{x} = (x^T, 1)^T$, and let its maximum length be $R = \max_{1 \leq i \leq N} \|\hat{x}_i\|$. Since the data set is linearly separable, there exists a perceptron parameter vector $\hat{w}_{opt}$ satisfying $\|\hat{w}_{opt}\| = 1$ whose hyperplane classifies the data set completely correctly; define the minimum margin

$$\gamma = \min_{i} \, y_i (\hat{w}_{opt} \cdot \hat{x}_i)$$

Then the number of misclassifications k satisfies:

$$k \leq \left( \frac{R}{\gamma} \right)^2$$

For the proof, see Statistical Learning Methods, p. 31.
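
To get a feel for the bound, the following sketch estimates R and gamma for the example data, using the separating hyperplane $x^{(1)} + x^{(2)} - 3 = 0$ as a stand-in for the optimal hyperplane of the theorem (an assumption made purely for illustration, not part of the book's proof):

import numpy as np

# the example data in augmented form x_hat = (x, 1)
x_hat = np.array([[3., 3., 1.], [4., 3., 1.], [1., 1., 1.]])
y = np.array([1., 1., -1.])

# a separating hyperplane: x1 + x2 - 3 = 0, i.e. w_hat = (1, 1, -3),
# normalized because the theorem requires ||w_hat|| = 1
w_hat = np.array([1., 1., -3.])
w_hat /= np.linalg.norm(w_hat)

R = max(np.linalg.norm(xi) for xi in x_hat)                        # largest augmented input length, ~5.10
gamma = min(yi * np.dot(w_hat, xi) for xi, yi in zip(x_hat, y))    # minimum margin, ~0.30
print((R / gamma) ** 2)                                            # ~286, an upper bound on the number of updates k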

Dual form of perceptron learning algorithm

The dual form expresses $w$ and $b$ as linear combinations of the training instances $x_i$ and labels $y_i$, and obtains $w$ and $b$ by solving for the coefficients of that combination. Specifically, if the parameters are updated $n_i$ times at the misclassified point $(x_i, y_i)$, then the increments of $w$ and $b$ contributed by that point are $\alpha_i y_i x_i$ and $\alpha_i y_i$ respectively, where $\alpha_i = n_i \eta$. The final learned parameters can therefore be expressed as:

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i$$

$$b = \sum_{i=1}^{N} \alpha_i y_i$$

So we obtain Algorithm 2.2:

Algorithm 2.2 (perceptron learning algorithm, dual form)

Input: training data set $T$, learning rate $\eta$
Output: $\alpha, b$; perceptron model $f(x) = \operatorname{sign}\left(\sum_{j=1}^{N} \alpha_j y_j x_j \cdot x + b\right)$

1. $\alpha \leftarrow 0$, $b \leftarrow 0$;
2. Select a point $(x_i, y_i)$ from the training set;
3. If $y_i \left(\sum_{j=1}^{N} \alpha_j y_j x_j \cdot x_i + b\right) \leq 0$, update $\alpha_i \leftarrow \alpha_i + \eta$ and $b \leftarrow b + \eta y_i$;
4. Go to step 2, until there are no misclassified points.
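
In the dual form the training instances appear only through inner products $x_j \cdot x_i$, so these can be computed once and stored in the Gram matrix $G = [x_i \cdot x_j]_{N \times N}$ (this is what cal_gram() does in the code below). For the example data $x_1 = (3, 3)^T$, $x_2 = (4, 3)^T$, $x_3 = (1, 1)^T$:

$$G = \begin{bmatrix} 18 & 21 & 6 \\ 21 & 25 & 7 \\ 6 & 7 & 2 \end{bmatrix}$$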

Perceptron dual algorithm code

More matrix calculations are involved, so this version makes heavier use of NumPy:

# -*- coding:utf-8 -*-
# Filename: train2.2.py
# Author:hankcs
# Date: 2015/1/31 15:15
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import animation
 
# An example in that book, the training set and parameters' sizes are fixed
training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1]])
 
a = np.zeros(len(training_set), np.float)
b = 0.0
Gram = None
y = np.array(training_set[:, 1])
x = np.empty((len(training_set), 2), np.float)
for i in range(len(training_set)):
    x[i] = training_set[i][0]
history = []
 
def cal_gram():
    """
    calculate the Gram matrix
    :return:
    """
    g = np.empty((len(training_set), len(training_set)), np.int)
    for i in range(len(training_set)):
        for j in range(len(training_set)):
            g[i][j] = np.dot(training_set[i][0], training_set[j][0])
    return g
 
 
def update(i):
    """
    update parameters using stochastic gradient descent
    :param i:
    :return:
    """
    global a, b
    a[i] += 1
    b = b + y[i]
    history.append([np.dot(a * y, x), b])
    # print a, b # you can uncomment this line to check the process of stochastic gradient descent
 
 
# calculate the judge condition
def cal(i):
    global a, b, x, y
 
    res = np.dot(a * y, Gram[i])
    res = (res + b) * y[i]
    return res
 
 
# check if the hyperplane can classify the examples correctly
def check():
    global a, b, x, y
    flag = False
    for i in range(len(training_set)):
        if cal(i) <= 0:
            flag = True
            update(i)
    if not flag:
        w = np.dot(a * y, x)
        print "RESULT: w: " + str(w) + " b: " + str(b)
        return False
    return True


if __name__ == "__main__":
    Gram = cal_gram()  # initialize the Gram matrix
    for i in range(1000):
        if not check(): break

    # draw an animation to show how it works, the data comes from history
    # first set up the figure, the axis, and the plot element we want to animate
    fig = plt.figure()
    ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
    line, = ax.plot([], [], 'g', lw=2)
    label = ax.text([], [], '')

    # initialization function: plot the background of each frame
    def init():
        line.set_data([], [])
        x, y, x_, y_ = [], [], [], []
        for p in training_set:
            if p[1] > 0:
                x.append(p[0][0])
                y.append(p[0][1])
            else:
                x_.append(p[0][0])
                y_.append(p[0][1])
 
        plt.plot(x, y, 'bo', x_, y_, 'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Perceptron Algorithm 2 (www.hankcs.com)')
        return line, label
 
 
    # animation function.  this is called sequentially
    def animate(i):
        global history, ax, line, label
 
        w = history[i][0]
        b = history[i][1]
        if w[1] == 0: return line, label
        x1 = -7.0
        y1 = -(b + w[0] * x1) / w[1]
        x2 = 7.0
        y2 = -(b + w[0] * x2) / w[1]
        line.set_data([x1, x2], [y1, y2])
        x1 = 0.0
        y1 = -(b + w[0] * x1) / w[1]
        label.set_text(str(history[i][0]) + ' ' + str(b))
        label.set_position([x1, y1])
        return line, label
 
    # call the animator.  blit=true means only re-draw the parts that have changed.
    anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
                                   blit=True)
    plt.show()
    # anim.save('perceptron2.gif', fps=2, writer='imagemagick')

Visualization

[Animation: the dual-form perceptron converging on the same example data]

The result is the same as that of the original-form algorithm. We can also change the data set:

training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1], [[5, 2], -1]])

and we will get a more complicated result:

[Animation: training on the four-point data set]

After reading

The perceptron lets you learn the common concepts and the general workflow of machine learning with the simplest of models.

Also, this article is only a personal note, kept as a memo for myself, and comes with no guarantee of quality or follow-up. A blog can only be a supplement; to really get started, read the classics.

Reference

Some of the code in this article is based on OldPanda's implementation.
