Preprocessing

SPRAAK comes with a range of filters and other signal processing algorithms that can be used to preprocess the audio before it is sent to a recogniser. These algorithms are used, for example, to extract the standard MFCC features on which recognition takes place. See http://www.spraak.org/documentation/doxygen/doc/html/spr__um__feat.html for an example preprocessing and feature extraction script.

Tips & Tricks

Preprocessing and existing acoustic models

When using existing acoustic models, it is best not to change the settings in the preprocessing script. This was tried with the models received from KU Leuven, which were trained on the CGN corpus and retrained on elderly speech from the JASMIN corpus. The preprocessing setting 'flength' (the audio frame length) was changed from 0.025 s to 0.032 s, which resulted in better freephone recognition rates. However, in word recognition experiments with a bigram language model, the results contained <PARTIAL> tags (explained here). Returning flength to 0.025 s resolved this issue.
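
To make the effect of the flength change concrete, the sketch below computes how many samples fall into one analysis frame and how many frames one second of audio yields. It is plain Python arithmetic, not SPRAAK code. The 16 kHz sample rate and the 0.010 s frame shift are assumptions chosen for illustration; the page does not state either value, and the function names are hypothetical.

```python
def frame_samples(duration_s, sample_rate_hz):
    """Number of samples covered by a window of the given duration."""
    return round(duration_s * sample_rate_hz)


def num_frames(n_samples, flength_s, fshift_s, sample_rate_hz):
    """Number of complete frames that fit in n_samples of audio."""
    frame = frame_samples(flength_s, sample_rate_hz)
    shift = frame_samples(fshift_s, sample_rate_hz)
    if n_samples < frame:
        return 0
    return 1 + (n_samples - frame) // shift


SAMPLE_RATE_HZ = 16000  # assumed, not given on this page
FSHIFT_S = 0.010        # assumed frame shift, not given on this page

for flength_s in (0.025, 0.032):
    samples = frame_samples(flength_s, SAMPLE_RATE_HZ)
    frames = num_frames(SAMPLE_RATE_HZ, flength_s, FSHIFT_S, SAMPLE_RATE_HZ)
    print(f"flength={flength_s} s: {samples} samples per frame, "
          f"{frames} frames in 1 s of audio")
```

Under these assumptions, each frame grows from 400 to 512 samples, so every feature vector is computed over a longer stretch of signal than the one the existing acoustic models saw during training.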

preprocessing.txt · Last modified: 2015/04/15 15:19 by mganzeboom