Abstract
In this paper, a self-defined wake-up-word (WUW) recognition system and its embedded implementation are proposed. The system operates in two phases: a training phase and a testing (comparison) phase. In the training phase, a wake-up word in any language is recorded, and the speech segment is extracted using Voice Activity Detection (VAD). Mel-Frequency Cepstral Coefficients (MFCCs) are then computed as pre-processing to extract features of the input speech signal for subsequent use. The Expectation-Maximization algorithm is used to train a Gaussian Mixture Model, and the Baum-Welch algorithm is used to train a Hidden Markov Model; these two models are combined into a data model of the speaker's speech dataset. In the testing (comparison) phase, an unknown speech segment is input, and VAD and MFCC extraction are applied for the same purpose as in the training phase. The log-likelihood of the extracted features under the Gaussian Mixture Model is then computed to identify the corresponding speaker, and the Viterbi algorithm is used to calculate the state sequence of the unknown speech through the Hidden Markov Model. Finally, Gaussian Mixture Model similarity is computed, and the Levenshtein distance algorithm is used to compare the dataset state sequence with the state sequence of the unknown speech. The system works well with a small amount of training data, and it is implemented on an embedded board to evaluate performance, where recognizing the wake-up word takes 1.4 seconds.
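To make the comparison step concrete, the following is a minimal Python sketch of the two scoring operations named above: GMM log-likelihood scoring of MFCC features (assuming diagonal covariances) and the Levenshtein distance between HMM state sequences. All function names, array shapes, and parameter values here are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the testing-phase scoring, assuming diagonal-covariance GMMs and
# integer-valued HMM state sequences (illustrative, not the paper's code).
import numpy as np

def gmm_log_likelihood(features, weights, means, variances):
    """Average per-frame log-likelihood of MFCC features under a diagonal-covariance GMM.

    features:  (T, D) MFCC vectors; weights: (K,) mixture weights;
    means:     (K, D) component means; variances: (K, D) diagonal covariances.
    """
    diff = features[:, None, :] - means[None, :, :]                     # (T, K, D)
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)             # (T, K)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    log_comp = np.log(weights) + log_norm + exponent                    # (T, K)
    # log-sum-exp over mixture components, then average over frames
    return float(np.mean(np.logaddexp.reduce(log_comp, axis=1)))

def levenshtein(seq_a, seq_b):
    """Edit distance between two HMM state sequences (insert/delete/substitute cost 1)."""
    m, n = len(seq_a), len(seq_b)
    dist = np.zeros((m + 1, n + 1), dtype=int)
    dist[:, 0] = np.arange(m + 1)
    dist[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if seq_a[i - 1] == seq_b[j - 1] else 1
            dist[i, j] = min(dist[i - 1, j] + 1,         # deletion
                             dist[i, j - 1] + 1,         # insertion
                             dist[i - 1, j - 1] + cost)  # substitution
    return int(dist[m, n])

# Example with synthetic data: score random "MFCC" frames under a toy GMM and
# compare a stored wake-up-word state sequence with an unknown utterance.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(50, 13))          # 50 frames of 13-dimensional features
w = np.full(4, 0.25)                      # 4 equally weighted components
mu = rng.normal(size=(4, 13))
var = np.ones((4, 13))
print("Avg log-likelihood:", gmm_log_likelihood(mfcc, w, mu, var))

reference_states = [0, 0, 1, 1, 2, 3, 3, 4]
unknown_states = [0, 1, 1, 2, 2, 3, 4, 4]
print("Levenshtein distance:", levenshtein(reference_states, unknown_states))
```

In practice, the speaker with the highest GMM log-likelihood would be selected, and a small Levenshtein distance between the stored and decoded state sequences would indicate that the unknown utterance matches the enrolled wake-up word.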