I recently came across a paper of practical reference value: Nguyen and Guo (2007). It compares the performance of several models and algorithms on part-of-speech tagging and OCR tasks, including HMM, CRF, the averaged perceptron (AP), structured SVM, M3N, SEARN, and SLE, which makes it very useful for algorithm selection. This post records a few takeaways.

## Structured learning model

### Multi-class SVM

Used as a baseline model in the paper; the idea is to reduce the sequence labeling problem to a set of independent multi-class classification problems, one per position.

Here phi is a feature function.
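Since the notation matters for the later models, here is a sketch of what this baseline presumably looks like: the standard Crammer-Singer multi-class SVM applied independently at each position (my reconstruction, not copied from the paper):

```latex
\min_{w,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
w^\top\phi(x_i, y_i) - w^\top\phi(x_i, y) \ge 1 - \xi_i,
\quad \forall i,\ \forall y \ne y_i
```

Prediction is then $\hat{y} = \arg\max_{y} w^\top\phi(x, y)$ at each position.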

### Structured SVM

The only change compared to the multi-class SVM is the introduction of a loss function delta into the margin constraints:
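Under margin rescaling, the constraints take the following shape (a hedged sketch in the same notation, not the paper's exact formula):

```latex
w^\top\phi(x_i, y_i) - w^\top\phi(x_i, y) \ge \Delta(y_i, y) - \xi_i,
\quad \forall i,\ \forall y \ne y_i
```

so that label sequences with larger loss $\Delta$ must be separated by a larger margin.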

Is the author's objective function misprinted? Shouldn't there be a 1/n factor, like this?

The above formula is quoted from “Structured Support Vector Machine Learning Method and Application Research”.
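For the record, the n-slack structured SVM objective I am used to seeing does carry the averaging factor:

```latex
\min_{w,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i
```

subject to the loss-augmented margin constraints; whether C absorbs the 1/n is really just a matter of convention, which would explain the discrepancy.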

### Maximum Margin Markov Networks

Known in Chinese as the maximum-margin Markov network. The idea is to define a complete graph over the sequence labels y, with a potential function for each edge:

Here phi_k is the k-th feature function. Since the number of edges is fixed, the feature function can be defined over the entire instance:

The Markov network model maximizes the following joint conditional probability distribution:
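In the standard notation (my reconstruction, assuming log-linear edge potentials), the pieces described above fit together as:

```latex
\psi(x, y_j, y_{j+1}) = \exp\Big(\sum_k w_k\,\phi_k(x, y_j, y_{j+1})\Big),
\qquad
\Phi_k(x, y) = \sum_j \phi_k(x, y_j, y_{j+1})

P(y \mid x) = \frac{1}{Z(x)} \prod_j \psi(x, y_j, y_{j+1})
            = \frac{1}{Z(x)} \exp\big(w^\top\Phi(x, y)\big)
```

where $Z(x)$ normalizes over all label sequences.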

M3N solves for the weight vector w by the margin-maximization principle; the idea is the same as structural risk minimization in SVM, so it inherits SVM's advantages.

For reference, there is also a paper, "A Chinese Word Segmentation Method Based on the Maximum-Margin Markov Network Model" (PDF).

### Averaged perceptron

This is my recent favorite; see:

“Perceptron”

“Implementing a Perceptron Part-of-Speech Tagger in 200 Lines of Python”

“A Structured Averaged Perceptron Word Segmenter: Java Implementation”

### SEARN

Sequence labeling is viewed from the perspective of search, and a multi-class classifier is used to label the sequence. I haven't read the paper carefully; it feels like an ad-hoc, off-the-cuff algorithm.
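As I understand it, at test time a SEARN-trained model reduces to greedy left-to-right decoding: a multi-class classifier tags each position, and its prediction becomes a feature for the next position. A minimal sketch (the feature set and `toy_classify` rule are made up for illustration, not taken from the paper):

```python
def greedy_decode(words, classify):
    """Greedy left-to-right decoding in the SEARN style: each position
    is tagged by a classifier whose features include the previously
    predicted tag, so earlier decisions influence later ones."""
    tags = []
    for word in words:
        prev_tag = tags[-1] if tags else "<s>"  # "<s>" marks sentence start
        tags.append(classify({"word": word, "prev_tag": prev_tag}))
    return tags

def toy_classify(features):
    """A toy stand-in for the learned classifier."""
    if features["prev_tag"] == "<s>":
        return "DT"
    if features["prev_tag"] == "DT":
        return "NN"
    return "VB"
```

For example, `greedy_decode(["the", "dog", "barks"], toy_classify)` walks the sentence once, feeding each predicted tag forward into the next decision.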

### CRF

The CRF that everyone loves:

Conditional Random Field

CRF++ Code Analysis

### HMM

Hidden Markov Model

### SLE

An algorithm that combines multiple multi-class classifiers into a structured predictor. It adds a transition matrix, multiplies it by the vote counts for each class, and decodes with Viterbi.
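As I read it, the combination can be sketched like this: stack the per-position vote counts into a matrix, weight each step by the transition matrix, and run Viterbi over the products (the vote counts and transition scores below are illustrative, not from the paper):

```python
import numpy as np

def viterbi_over_votes(votes, trans):
    """Decode the best class sequence given per-position vote counts
    (T x K) and a K x K transition matrix, maximizing the product of
    votes and transition scores along the path."""
    T, K = votes.shape
    score = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = votes[0]
    for t in range(1, T):
        # candidate[i, j]: score of ending in class j, coming from class i
        candidate = score[t - 1][:, None] * trans * votes[t][None, :]
        back[t] = candidate.argmax(axis=0)
        score[t] = candidate.max(axis=0)
    # follow the back-pointers from the best final class
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

For example, with `votes = np.array([[3, 1], [1, 3], [2, 2]])` and `trans = np.array([[0.6, 0.4], [0.3, 0.7]])`, the decoder returns the path `[0, 1, 1]`: the high self-transition for class 1 pulls the tied last position toward it.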

Skimming the formulas, that is roughly how it works. A typical lab product: if you cannot invent a new model, you can always combine several old ones, as with SLE, stacked learning, and so on.

## Experiments

Two sets of experiments were performed: one on part-of-speech tagging, and one on handwritten-letter OCR.

### Parameter tuning

The tunable parameters of each model were optimized.

### Part-of-speech tagging results

Error rates at different training-set sizes; lower is better.

In table form:

Surprisingly, even the baseline multi-class SVM achieves such a low error! This supports the claim by the author of “Implementing a Perceptron Part-of-Speech Tagger in 200 Lines of Python” that for part-of-speech tagging one need not worry much about the search algorithm: if the classifier's error at each position is small, the effect of the transition matrix can be ignored.

There are also training times:

| Model | Training time |
|---|---|
| SVMmulticlass | 1.8 min |
| SVMstruct | 6.8 min |
| M3N | 12 h |
| Perceptron | 6.4 min |
| SEARN | 2.1 min |
| CRF | 0.53 h |
| HMM | 0.23 s |

It also reflects the advantages of SVM.

### OCR results

The conclusion is similar: apart from the authors' own pet model, the SVM performs best.

## Conclusion

Setting aside the paper authors' own models, the structured SVM is the best.

## Structured SVM implementation

Cornell University’s C implementation: https://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html

Worth a closer look sometime.

## Reference

Nguyen and Guo (2007). "Comparisons of Sequence Labeling Algorithms and Extensions" (PDF)