Focus on Machine Learning.

Comparison of Sequence Labeling Model Algorithms

Occasionally browsed a paper with practical reference value Nguyen and Guo (2007). This paper compares the performance of some models and algorithms on part-of-speech tagging and OCR tasks, including HMM, CRF, AP, Structured SVM, M3N, SEARN algorithm and SLE algorithm, which is very useful for algorithm selection. This blog keeps track of some points.
屏幕快照 2016-09-14 下午5.31.38.png

Structured learning model

Multi-class SVM

Used as a baseline model in the paper, the idea is to degenerate the sequence labeling problem into multiple classification problems.

屏幕快照 2016-09-14 下午4.25.36.png

Here phi is a feature function.

Structured SVM

The only change compared to multi-class SVM is the introduction of the loss function delta in the constraints of the function interval:

屏幕快照 2016-09-14 下午4.37.27.png

Is the author’s objective function mislabeled? Without a denominator n, shouldn’t it be like this?

屏幕快照 2016-09-14 下午4.39.26.png

The above formula is quoted from “Structured Support Vector Machine Learning Method and Application Research”.

Maximum Margin Markov Networks

The Chinese name is the maximum interval Markov network. The idea is to define a complete graph on the sequence label y. Define a potential function for each edge:

屏幕快照 2016-09-14 下午4.56.02.png

Where phi_k is the kth feature function. Since the number of edges is fixed, the feature function is defined over the entire instance:

屏幕快照 2016-09-14 下午5.01.46.png

The Markov network model maximizes the following joint conditional probability distribution:

屏幕快照 2016-09-14 下午5.08.22.png

M3N solves the weight vector w by the interval maximization principle, and the idea is the same as the structural risk minimization of the SVM, so it has the advantage of SVM.

There is a Chinese word segmentation method based on The maximum interval Markov network model.pdf for reference.

Average perception machine

For the latest new love, please refer to:


“200 lines of Python code implementation perceptual machine part-of-speech tagger”

“Structured Average Perceptron Based Word Segmenter Java Implementation”


The sequence annotation is viewed from the perspective of the search, and the multi-classifier is used for sequence annotation. I haven’t read the paper carefully, it feels like an algorithm shot by my head.


CRF that everyone loves:

Conditional Random Field

CRF++ Code Analysis


Hidden Markov Model


An algorithm that combines multiple multi-class classifiers into structured prediction. Added a transfer matrix, multiplied by the number of votes for each category, and searched with viterbi.

Looking at the formula roughly, this is probably the case, laboratory products, I can not invent new models, but I can combine a number of old models, such as SLE, stacked learning and so on.


Two sets of experiments were performed, a set of part-of-speech tagging, and a set of handwritten letters OCR.


The parameters that can be adjusted for each model are optimized:

屏幕快照 2016-09-14 下午5.30.27.png

Part-of-speech tag test result

The error rate at different training scales is as low as possible.

屏幕快照 2016-09-14 下午5.31.38.png

Form form:

屏幕快照 2016-09-14 下午5.33.00.png

Surprisingly, you can make a baseline multi-class SVM error so low! This verifies the argument that the author of “200 lines of Python code implementation perceptual machine part-of-speech tagger” does not have to worry about the search algorithm for part-of-speech tagging. If the error of a classifier is small every time, then the role of the transfer matrix can be Ignore it.

There is also training time:

SVMmulticlass 1.8m, SVMstruct 6.8m, M3N 12h, Perceptron 6.4m, SEARN 2.1m, CRF 0.53h, HMM 0.23s.

It also reflects the advantages of SVM.

OCR results

屏幕快照 2016-09-14 下午5.38.43.png

The conclusion is similar, except for the author’s son, the SVM performs best.

In conclusion

After ignoring the author of the paper, the structured SVM is the best.

Structured SVM implementation

Cornell University’s C implementation:

Have a look, check it out.


Comparisons of Sequence Labeling Algorithms and Extensions.pdf

Please indicate the source:Esiry » Comparison of Sequence Labeling Model Algorithms

Comment 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址