
SPRAAK Distributed computing

SPRAAK offers different ways of distributing the computation for training and for evaluation. See the corresponding sections below.

Training acoustic models

The computation required for training acoustic models can be distributed over multiple hosts. Please refer to SPRAAK's own documentation on how to do that: http://www.spraak.org/documentation/doxygen/doc/html/spr__train_8py.html

Evaluating (i.e. recognition) experiments

To distribute the computation required for, for example, the recognition of a large corpus of speech recordings, the spr_scoreres program can be used.
This program recalculates the totals over multiple .RES result files that SPRAAK produces in a recognition experiment. This lets you manually divide the corpus into multiple parts (for example, as many parts as you have cores in your machine or machines available), run a recognition experiment on each part on its own core or machine, and merge the resulting .RES files into one big result file with spr_scoreres. Merging multiple files takes two steps:

  1. Concatenate the contents of all .RES files into a single text file (e.g. with the 'cat' command on Linux).
  2. Use spr_scoreres to recalculate the scores in the merged file: spr_scoreres -PAR -c <path to .cor file containing the full corpus> -r <path to the file in which all .RES files are merged> -nr <path to output file> -omit "<tags/words to ignore or omit in the output file (e.g. <s>)>"
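The split-and-merge workflow above could be scripted roughly as follows. This is only a sketch: the helper names (split_evenly, merge_res_files, scoreres_command) are made up for illustration, the spr_scoreres flags are copied from step 2 rather than checked against the program itself, and reading utterances from or writing them back to a .cor file is not shown because it depends on the corpus format.

```python
import shutil
from pathlib import Path


def split_evenly(items, n_parts):
    """Split a list into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def merge_res_files(res_paths, merged_path):
    """Concatenate the per-part .RES files in the given order (same effect as 'cat')."""
    with open(merged_path, "wb") as out:
        for path in res_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
    return Path(merged_path)


def scoreres_command(cor_path, merged_path, output_path, omit):
    """Build the spr_scoreres call from step 2 as an argument list.

    Assumption: the flags are taken verbatim from the text above.
    """
    return ["spr_scoreres", "-PAR",
            "-c", str(cor_path),
            "-r", str(merged_path),
            "-nr", str(output_path),
            "-omit", omit]
```

On a machine where spr_scoreres is on the PATH, the list returned by scoreres_command can be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a single string avoids shell quoting problems with tags such as <s>.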

Optimize SPRAAK for N cores

When compiling SPRAAK from source, the config.py configuration file has an option to optimize SPRAAK for multithreading. This option can be set to the number of cores, e.g.
# typical number of threads (CPU cores available)
# 0 : compile single threaded (no support for multi-threading at all)
# 1 : optimize for single threaded operation but allow multiple threads
# N : optimize for N concurrent threads
spraak.Nthreads = 4

It is not clear what exactly is optimized, though. SPRAAK does not seem to use multiple cores by itself when running recognition experiments (i.e. with spr_eval.py).

distributed_computation.txt · Last modified: 2015/04/30 07:55 by mganzeboom