While reading the support vector machine chapter of *Statistical Learning Methods*, I came across the sentence "apply Lagrange duality (see Appendix C), and obtain the optimal solution of the primal problem by solving the dual problem," so I went back and read Appendix C on Lagrange duality. I call these study notes, but they are really an excerpt with a small amount of my own understanding and background material added; the theorems and corollaries can be understood, but I cannot write anything fancier than that. Even the original book gives no proofs for some of the corollaries, so for now it is enough to know that these results exist and to remember to come back and look them up next time. The excerpt follows:
In constrained optimization problems, Lagrange duality is often used to transform the primal problem into a dual problem, and the solution of the primal problem is obtained by solving the dual problem. This method is used in many statistical learning methods, such as the maximum entropy model and the support vector machine. The main concepts and results of Lagrange duality are briefly described here.
1. The primal problem

Assume that $f(x)$, $c_i(x)$, and $h_j(x)$ are continuously differentiable functions defined on $\mathbf{R}^n$. Consider the constrained optimization problem

$$\min_{x \in \mathbf{R}^n} f(x) \tag{C.1}$$
$$\text{s.t.}\quad c_i(x) \leq 0,\quad i = 1, 2, \ldots, k \tag{C.2}$$
$$\qquad\quad h_j(x) = 0,\quad j = 1, 2, \ldots, l \tag{C.3}$$

This constrained optimization problem is called the primal optimization problem, or simply the primal problem.
First, introduce the generalized Lagrange function

$$L(x, \alpha, \beta) = f(x) + \sum_{i=1}^{k} \alpha_i c_i(x) + \sum_{j=1}^{l} \beta_j h_j(x) \tag{C.4}$$

where $x = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})^{\mathrm{T}} \in \mathbf{R}^n$, and $\alpha_i$, $\beta_j$ are Lagrange multipliers, with $\alpha_i \geq 0$. Consider the following function of $x$:

$$\theta_P(x) = \max_{\alpha, \beta:\, \alpha_i \geq 0} L(x, \alpha, \beta) \tag{C.5}$$

Here the subscript $P$ indicates the primal problem.
Suppose $x$ is given. If $x$ violates the constraints of the primal problem, that is, if there exists some $i$ such that $c_i(x) > 0$ or some $j$ such that $h_j(x) \neq 0$, then

$$\theta_P(x) = \max_{\alpha, \beta:\, \alpha_i \geq 0} \left[ f(x) + \sum_{i=1}^{k} \alpha_i c_i(x) + \sum_{j=1}^{l} \beta_j h_j(x) \right] = +\infty \tag{C.6}$$

because if some $i$ makes $c_i(x) > 0$, we can let $\alpha_i \to +\infty$; if some $j$ makes $h_j(x) \neq 0$, we can choose $\beta_j$ so that $\beta_j h_j(x) \to +\infty$; and we take all the remaining $\alpha_i$ and $\beta_j$ to be 0.

Conversely, if $x$ satisfies the constraints (C.2) and (C.3), then from (C.5) and (C.4) we have $\theta_P(x) = f(x)$. (Each term $\alpha_i c_i(x)$ is non-positive and each term $\beta_j h_j(x)$ is 0, so to attain the maximum every $\alpha_i c_i(x)$ must also be 0.) Therefore

$$\theta_P(x) = \begin{cases} f(x), & x \text{ satisfies the primal constraints} \\ +\infty, & \text{otherwise} \end{cases} \tag{C.7}$$
So if we consider the minimization problem

$$\min_x \theta_P(x) = \min_x \max_{\alpha, \beta:\, \alpha_i \geq 0} L(x, \alpha, \beta) \tag{C.8}$$

it is equivalent to the primal optimization problem (C.1)–(C.3), that is, they have the same solution. The problem $\min_x \max_{\alpha, \beta:\, \alpha_i \geq 0} L(x, \alpha, \beta)$ is called the minimax problem of the generalized Lagrange function. In this way, the primal optimization problem is expressed as a minimax problem of the generalized Lagrange function. For convenience, define the optimal value of the primal problem

$$p^* = \min_x \theta_P(x) \tag{C.9}$$

called the value of the primal problem.
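To make the behavior of $\theta_P$ concrete (equal to $f(x)$ on the feasible set, unbounded off it), here is a toy example of my own, not from the book: minimize $f(x) = x^2$ subject to the single inequality constraint $c_1(x) = 1 - x \leq 0$. Scanning ever larger multipliers approximates the inner maximization:

```python
# Toy primal problem (my own example): minimize f(x) = x^2
# subject to c(x) = 1 - x <= 0, i.e. x >= 1.
def f(x):
    return x ** 2

def c(x):
    return 1.0 - x

def L(x, alpha):
    """Generalized Lagrangian L(x, alpha) = f(x) + alpha * c(x)."""
    return f(x) + alpha * c(x)

def theta_P(x, alphas=(0.0, 1.0, 10.0, 1e6)):
    """Approximate theta_P(x) = max over alpha >= 0 of L(x, alpha),
    using a small grid of multipliers as a stand-in for the sup."""
    return max(L(x, a) for a in alphas)

# Feasible point: c(2.0) <= 0, so the max is attained at alpha = 0
# and theta_P equals f.
print(theta_P(2.0))   # 4.0 = f(2.0)
# Infeasible point: c(0.5) > 0, so L grows without bound as alpha grows;
# the grid's largest multiplier dominates.
print(theta_P(0.5))
```

With an unbounded multiplier range, the second value would diverge to $+\infty$, exactly as in (the book's) case analysis.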
2. The dual problem
Define

$$\theta_D(\alpha, \beta) = \min_x L(x, \alpha, \beta) \tag{C.10}$$

and consider again maximizing $\theta_D(\alpha, \beta)$, that is,

$$\max_{\alpha, \beta:\, \alpha_i \geq 0} \theta_D(\alpha, \beta) = \max_{\alpha, \beta:\, \alpha_i \geq 0} \min_x L(x, \alpha, \beta) \tag{C.11}$$

The problem $\max_{\alpha, \beta:\, \alpha_i \geq 0} \min_x L(x, \alpha, \beta)$ is called the maximin problem of the generalized Lagrange function.

The maximin problem of the generalized Lagrange function can be expressed as the constrained optimization problem

$$\max_{\alpha, \beta} \theta_D(\alpha, \beta) = \max_{\alpha, \beta} \min_x L(x, \alpha, \beta) \tag{C.12}$$
$$\text{s.t.}\quad \alpha_i \geq 0,\quad i = 1, 2, \ldots, k \tag{C.13}$$

which is called the dual problem of the primal problem. Define the optimal value of the dual problem

$$d^* = \max_{\alpha, \beta:\, \alpha_i \geq 0} \theta_D(\alpha, \beta) \tag{C.14}$$

called the value of the dual problem.
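For a toy problem of my own (not from the book), the dual can be worked out in closed form: minimize $x^2$ subject to $1 - x \leq 0$. The inner minimization of $L(x, \alpha) = x^2 + \alpha(1 - x)$ over $x$ gives $x = \alpha/2$, so $\theta_D(\alpha) = \alpha - \alpha^2/4$, and a grid search over $\alpha \geq 0$ locates the dual optimum:

```python
def theta_D(alpha):
    """theta_D(alpha) = min over x of [x^2 + alpha*(1 - x)].
    Setting the derivative 2x - alpha to zero gives x = alpha/2."""
    x = alpha / 2.0
    return x ** 2 + alpha * (1.0 - x)

# Maximize theta_D over alpha >= 0 on a grid [0, 4] in steps of 0.01.
alphas = [i / 100.0 for i in range(0, 401)]
best_alpha = max(alphas, key=theta_D)
print(best_alpha, theta_D(best_alpha))  # 2.0 1.0
```

The dual value $d^* = 1$ matches the primal value $p^* = f(1) = 1$, as it should for this convex, strictly feasible problem.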
3. The relationship between the primal problem and the dual problem

The relationship between the primal problem and the dual problem is discussed below.
Theorem C.1 If both the primal problem and the dual problem have optimal values, then

$$d^* = \max_{\alpha, \beta:\, \alpha_i \geq 0} \min_x L(x, \alpha, \beta) \leq \min_x \max_{\alpha, \beta:\, \alpha_i \geq 0} L(x, \alpha, \beta) = p^* \tag{C.15}$$

Proof. From (C.12) and (C.5), for any $\alpha$, $\beta$, and $x$,

$$\theta_D(\alpha, \beta) = \min_x L(x, \alpha, \beta) \leq L(x, \alpha, \beta) \leq \max_{\alpha, \beta:\, \alpha_i \geq 0} L(x, \alpha, \beta) = \theta_P(x) \tag{C.16}$$

that is, $\theta_D(\alpha, \beta) \leq \theta_P(x)$. Since both the primal problem and the dual problem have optimal values,

$$\max_{\alpha, \beta:\, \alpha_i \geq 0} \theta_D(\alpha, \beta) \leq \min_x \theta_P(x)$$

that is, $d^* \leq p^*$.

This seems easy enough to understand: for one and the same function, the largest of its minima over $x$ can never exceed the smallest of its maxima over $\alpha, \beta$.
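The inequality between dual and primal values can be checked numerically on a toy problem of my own (not from the book): minimize $x^2$ subject to $x \geq 1$, whose dual function works out to $\theta_D(\alpha) = \alpha - \alpha^2/4$. Every dual value should lower-bound every feasible primal value:

```python
# Weak duality check for the toy problem min x^2 s.t. x >= 1:
# theta_D(alpha) = alpha - alpha^2/4 must never exceed theta_P(x) = x^2
# at any feasible x (x >= 1) for any alpha >= 0.
def theta_D(alpha):
    return alpha - alpha ** 2 / 4.0

feasible_xs = [1.0 + i / 10.0 for i in range(50)]   # x in [1.0, 5.9]
alphas = [i / 10.0 for i in range(100)]             # alpha in [0.0, 9.9]
gap = min(x ** 2 - theta_D(a) for x in feasible_xs for a in alphas)
print(gap >= 0)  # True: every dual value lower-bounds every primal value
```

The smallest gap on this grid is 0, attained at $x = 1$, $\alpha = 2$, where the two problems meet.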
Corollary C.1 Suppose $x^*$ and $\alpha^*, \beta^*$ are feasible solutions of the primal problem (C.1)–(C.3) and the dual problem (C.12)–(C.13), respectively, and $d^* = p^*$; then $x^*$ and $\alpha^*, \beta^*$ are the optimal solutions of the primal problem and the dual problem, respectively.
Under certain conditions, the optimal values of the primal problem and the dual problem are equal, $d^* = p^*$. In that case, the primal problem can be solved by solving the dual problem instead. The relevant important results are stated below in the form of theorems, without proof.
Theorem C.2 Consider the primal problem (C.1)–(C.3) and the dual problem (C.12)–(C.13). Assume that the functions $f(x)$ and $c_i(x)$ are convex and that the $h_j(x)$ are affine functions;
A bit of trivia: affine functions
An affine function is a special kind of convex function. A function that is both convex and concave is called an affine function; it is necessarily the sum of a linear function and a constant. On a finite-dimensional space, an affine function in this sense is a linear function. The importance of affine functions lies in the fact that every lower semi-continuous convex function on a locally convex space (including normed linear spaces and finite-dimensional spaces) is the upper envelope of a family of continuous affine functions.
— *Mathematics Cihai*, Volume III (unfortunately, I generally cannot follow Chinese academic reference books; I can only understand the sentence highlighted in red.)
An affine function is a vector-valued function of the form $f(x) = Ax + b$. The coefficient $A$ can be a scalar or a dense or sparse matrix, and the constant term $b$ can be a scalar or a column vector.
In geometry, an affine transformation or affine map (from the Latin, affinis, “connected with”) between two vector spaces consists of a linear transformation followed by a translation. In a geometric setting, these are precisely the functions that map straight lines to straight lines.
— https://mathworld.wolfram.com/AffineFunction.html (The English is easy to follow: an affine function is a linear function plus a constant; the input is an $n$-dimensional vector, the parameter $A$ can be a constant or an $m \times n$ matrix, $b$ can be a constant or an $m$-dimensional column vector, and the output is an $m$-dimensional column vector. Geometrically, an affine function is a transformation from one linear space to another.)
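As a quick sketch of the definition quoted above (the coefficients here are made-up examples), an affine map $f(x) = Ax + b$ satisfies Jensen's inequality with equality in both directions, which is exactly what "both convex and concave" means and why affine maps preserve line segments:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])   # example coefficients (m x n = 2 x 3)
b = np.array([5.0, -1.0])          # example constant term (length m)

def f(x):
    """Affine function f(x) = A x + b, mapping R^3 -> R^2."""
    return A @ x + b

x = np.array([1.0, 0.5, 2.0])
y = np.array([-2.0, 1.0, 0.0])
t = 0.3
# For an affine map, f(t*x + (1-t)*y) equals t*f(x) + (1-t)*f(y):
# equality in both the convex and the concave direction.
lhs = f(t * x + (1 - t) * y)
rhs = t * f(x) + (1 - t) * f(y)
print(np.allclose(lhs, rhs))  # True
```

A strictly convex function such as $x^2$ would satisfy only the $\leq$ direction here, not equality.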
and further assume that the inequality constraints $c_i(x)$ are strictly feasible, that is, there exists an $x$ such that $c_i(x) < 0$ for all $i$. Then there exist $x^*$ and $\alpha^*, \beta^*$ such that $x^*$ is a solution of the primal problem, $\alpha^*, \beta^*$ is a solution of the dual problem, and

$$p^* = d^* = L(x^*, \alpha^*, \beta^*)$$
Theorem C.3 For the primal problem (C.1)–(C.3) and the dual problem (C.12)–(C.13), assume that the functions $f(x)$ and $c_i(x)$ are convex, that the $h_j(x)$ are affine, and that the inequality constraints $c_i(x)$ are strictly feasible. Then the necessary and sufficient condition for $x^*$ and $\alpha^*, \beta^*$ to be the solutions of the primal problem and the dual problem, respectively, is that $x^*, \alpha^*, \beta^*$ satisfy the following Karush-Kuhn-Tucker (KKT) conditions:

$$\nabla_x L(x^*, \alpha^*, \beta^*) = 0 \tag{C.20}$$
$$c_i(x^*) \leq 0,\quad i = 1, 2, \ldots, k \tag{C.21}$$
$$h_j(x^*) = 0,\quad j = 1, 2, \ldots, l \tag{C.22}$$
$$\alpha_i^* \geq 0,\quad i = 1, 2, \ldots, k \tag{C.23}$$
$$\alpha_i^* c_i(x^*) = 0,\quad i = 1, 2, \ldots, k \tag{C.24}$$
In particular, equation (C.24) is called the dual complementarity condition of the KKT conditions. It implies that if $\alpha_i^* > 0$, then $c_i(x^*) = 0$.
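The KKT conditions can be verified directly on a toy problem of my own (not from the book): minimize $x^2$ subject to $c(x) = 1 - x \leq 0$, whose solution is $x^* = 1$ with multiplier $\alpha^* = 2$ (there is no equality constraint, so no $\beta$):

```python
# KKT check at the candidate solution of the toy problem
# min x^2  s.t.  c(x) = 1 - x <= 0:  x* = 1, alpha* = 2.
x_star, alpha_star = 1.0, 2.0

grad_L = 2 * x_star - alpha_star          # stationarity: d/dx [x^2 + alpha*(1-x)]
primal_feasible = (1.0 - x_star) <= 0     # c(x*) <= 0
dual_feasible = alpha_star >= 0           # alpha* >= 0
complementarity = alpha_star * (1.0 - x_star)  # alpha* * c(x*) = 0

print(grad_L, primal_feasible, dual_feasible, complementarity)
# 0.0 True True 0.0 -> all KKT conditions hold
```

Here $\alpha^* > 0$, and, as the dual complementarity condition predicts, the constraint is active: $c(x^*) = 0$.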