We already know several ways to improve a language model, such as:
- Better input: word → root → character
- Better regularization and preprocessing
- Combining these methods gives a better language model
Multiple granularities of text:
Finer granularity effectively means a smaller vocabulary, which makes the model's choices easier; tests show that the error rate is indeed reduced.
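As a rough illustration of why finer granularity shrinks the vocabulary, here is a minimal sketch (the toy corpus and counts are my own, not from the original post) comparing word-level and character-level vocabularies:

```python
# Toy corpus, purely illustrative (not from the original post).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]

word_vocab = {w for line in corpus for w in line.split()}   # 7 word types
char_vocab = {c for line in corpus for c in line}           # 14 characters (incl. space)

print(len(word_vocab), len(char_vocab))
```

On a real corpus the gap is far larger: hundreds of thousands of word types versus a few hundred characters, so the output layer has far fewer choices to make.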
Better regularization and preprocessing
Regularization needs no further explanation here.
Preprocessing here means randomly replacing some words in a sentence with other words (for example, replacing one place name with another), or generating the replacement from bigram statistics.
This yields a smoother distribution: high-frequency words give up some of their appearances to low-frequency words. A sketch of the idea follows.
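Below is a minimal sketch of the word-replacement idea, assuming unigram-weighted replacement; the function name `noise_sentence` and all parameters are hypothetical, not from the original post. A bigram-based variant would instead sample the replacement from counts of words that follow the previous token.

```python
import random
from collections import Counter

def noise_sentence(tokens, unigram_counts, replace_prob=0.1):
    """Randomly replace tokens with words sampled from the corpus unigram
    distribution (hypothetical helper; names and defaults are assumptions)."""
    words = list(unigram_counts.keys())
    weights = list(unigram_counts.values())
    out = []
    for tok in tokens:
        if random.random() < replace_prob:
            out.append(random.choices(words, weights=weights, k=1)[0])
        else:
            out.append(tok)
    return out

corpus = ["he lives in paris", "she works in london"]
counts = Counter(w for line in corpus for w in line.split())
print(noise_sentence("he lives in london".split(), counts, replace_prob=0.3))
```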
The effect on the error rate is shown below (regularization on the left, preprocessing on the right):
A better model?
Noise Contrastive Estimation (NCE)
Instead of computing the expensive full softmax cross-entropy loss, we can use an approximation called the NCE loss: the model is trained to distinguish true next words from k sampled noise words. The theory is that when k is large enough, the gradient of the NCE objective is close to that of the full cross-entropy loss.
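For concreteness, here is a rough PyTorch-style sketch of an NCE loss; the shapes, tensor names, and exact formulation are my assumptions, not taken from the original post or paper.

```python
import torch
import torch.nn.functional as F

def nce_loss(hidden, target, out_embed, out_bias, noise_dist, k=64):
    """Hypothetical NCE loss sketch.
    hidden:     [B, H] final LSTM states
    target:     [B]    true next-word ids
    out_embed:  [V, H] output word embeddings
    out_bias:   [V]    output biases
    noise_dist: [V]    noise (e.g. unigram) distribution over the vocabulary
    """
    B = hidden.size(0)
    # draw k noise words per example from the noise distribution
    noise = torch.multinomial(noise_dist, B * k, replacement=True).view(B, k)

    # unnormalized scores for the true and the noise words
    true_logit = (hidden * out_embed[target]).sum(-1) + out_bias[target]                           # [B]
    noise_logit = torch.bmm(out_embed[noise], hidden.unsqueeze(-1)).squeeze(-1) + out_bias[noise]  # [B, k]

    # NCE corrects each score by log(k * q(w)) and trains a binary classifier:
    # true words -> label 1, noise words -> label 0
    true_score = true_logit - torch.log(k * noise_dist[target] + 1e-10)
    noise_score = noise_logit - torch.log(k * noise_dist[noise] + 1e-10)
    return -(F.logsigmoid(true_score).mean() + F.logsigmoid(-noise_score).sum(-1).mean())
```

Only the noise words and the true word are scored, so the cost per step depends on k rather than on the full vocabulary size.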
Larger number of LSTM units
The number of LSTM units is increased to 1024; likewise, the larger the value of k, the better, until GPU memory is exhausted.
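A minimal configuration sketch of such a model is shown below; only the hidden size of 1024 comes from the text, while the embedding size, number of layers, and output layer are my own assumptions.

```python
import torch.nn as nn

class BigLSTMLM(nn.Module):
    """Sketch of a larger LSTM language model with 1024 hidden units."""
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=1024, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        # during training, this full projection would be replaced by the NCE loss above
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.embed(tokens), state)
        return self.proj(h), state
```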
After applying all of these improvements, I finally got the following results: