
Kaldi ASR Toolkit

This page collects information about the Kaldi ASR Toolkit, such as the URLs and paths where it can be found. Kaldi is a more recent ASR toolkit than SPRAAK. Like SPRAAK, it can train different types of GMM-HMM acoustic models, but it also supports various types of Deep Neural Networks (DNNs), the current standard in ASR. This page provides links to Kaldi's own documentation pages and tips and tricks on using Kaldi in specific contexts. Feel free to add experiences that may be useful to others, so that nobody has to reinvent the wheel.

Recommendation (DEPRECATED!): use your own LaMachine on Ponyland

It is recommended to run Kaldi on Ponyland from your own LaMachine rather than the shared one (note, however, that LaMachine is now deprecated). The instructions to install and prepare your own LaMachine are here: Your own Kaldi-CLAM-LaMachine (just Step 1: Prerequisites).

Please note that you should replace 'cristian' with a different name in every step.

Once you have your own LaMachine-CLAM-Kaldi, you can use these commands to activate your environment:

 $ ssh thunderlane
 $ lamachine-lacristianmachine-activate
 (lacristianmachine)$ cd "$KALDI_ROOT/egs"
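
After activation, it can be useful to confirm that the environment actually exposes Kaldi before starting work. The following is a minimal sketch, assuming that activation exports KALDI_ROOT (as the cd command above implies); check_kaldi_env is a hypothetical helper name and is not part of Kaldi or LaMachine.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kaldi or LaMachine): checks that
# KALDI_ROOT is set and that it contains the egs recipe directory.
check_kaldi_env() {
    if [ -z "${KALDI_ROOT:-}" ]; then
        echo "KALDI_ROOT is not set; activate your LaMachine environment first" >&2
        return 1
    fi
    if [ ! -d "$KALDI_ROOT/egs" ]; then
        echo "No egs directory under $KALDI_ROOT" >&2
        return 1
    fi
    echo "Kaldi recipes found in $KALDI_ROOT/egs"
}

# Report the result without aborting the shell when the check fails.
check_kaldi_env || echo "Kaldi environment is not ready"
```

A typical use would be `check_kaldi_env && cd "$KALDI_ROOT/egs"`, so that the directory change only happens when the check passes.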

Also, thunderlane and rarity are the best servers for working with Kaldi.

Details

Name: Kaldi ASR
Type: open source software
Developer info page: http://www.kaldi-asr.org
Documentation page: http://www.kaldi-asr.org/doc/
Compile and installation instructions: http://kaldi-asr.org/doc/install.html

Location of Kaldi sources: https://github.com/kaldi-asr/kaldi
Location of Linux x86_64 compiled binaries (usable on Ponyland): applejack.science.ru.nl:/vol/custom/opt/lamachine2/opt/kaldi. This is a central location for the Kaldi binaries, which (for now) seem to always be the most recent version. They are compiled with NVIDIA CUDA 9.1.85 GPU support, so NVIDIA CUDA 9.1 must be installed on the target machine (it is already installed on Ponyland).
Note: accessing the applejack location above requires a Ponyland SSH account, which can be obtained from Wessel Stoop.
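
Because the shared binaries are built against CUDA 9.1, it may help to check which CUDA toolkit release a machine reports before running GPU recipes. The sketch below assumes that the nvcc compiler driver is on the PATH, which is not guaranteed even when CUDA is installed; parse_cuda_release is a hypothetical helper name, not part of Kaldi or CUDA.

```shell
#!/bin/sh
# Hypothetical helper: reads `nvcc --version` output on stdin and prints
# the release number (for example "9.1") from the "release X.Y" field.
parse_cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Release the shared Kaldi binaries were compiled against (from the text above).
expected="9.1"

if command -v nvcc >/dev/null 2>&1; then
    release="$(nvcc --version | parse_cuda_release)"
    if [ "$release" = "$expected" ]; then
        echo "CUDA release $release matches the Kaldi build"
    else
        echo "CUDA release is '$release', but the Kaldi binaries expect $expected"
    fi
else
    echo "nvcc not found on PATH; cannot check the CUDA release this way"
fi
```

On a machine without nvcc the script only reports that the check could not be performed; it does not prove that CUDA is missing.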

AlexASR: a Kaldi-based incremental online decoder

AlexASR is an incremental online decoder based on Kaldi. It can be used if you'd like to use ASR in a time-sensitive context. It decodes the speech immediately as it comes in and only requires some finalization after the last audio packet has been received. For example, we used this decoder in the game developed in the CHASING project. To reduce the time players had to wait for ASR results, AlexASR decoded the speech while it was still being recorded from the player.
Location of sources and info: https://github.com/UFAL-DSG/alex-asr
See also Alex ASR.

There are some useful tutorials for getting to know the Kaldi toolkit and learning how to use it for various tasks. Below is a list sorted by topic.
Training acoustic models

More practical tutorials:

Forced alignment using existing acoustic models

kaldi_asr_toolkit.txt · Last modified: 2022/09/23 22:06 by mvangompel