Kaldi Active Grammar Training

Create a custom Kaldi model trained to your own voice

Aug 20, 2021

Set up your environment

This guide is for Linux. I’m on Ubuntu 20.04, but I see no reason it wouldn’t work on 18.04 and other distributions.

Create directories

export KAGT_HOME="${HOME}/kaldi_ag_custom"
mkdir -p "${KAGT_HOME}"
cd "${KAGT_HOME}"

export KAGT_AUDIO_DATA="${KAGT_HOME}/audio_data"
mkdir -p "${KAGT_AUDIO_DATA}"

Clone projects

git clone https://github.com/daanzu/speech-training-recorder.git
git clone https://github.com/daanzu/kaldi_ag_training.git
export KAGT_RECORDER="${KAGT_HOME}/speech-training-recorder"
export KAGT_TRAINING="${KAGT_HOME}/kaldi_ag_training"

Download base model

cd "${KAGT_TRAINING}"
export KAGT_MODEL_BASE=kaldi_model_daanzu_20200905_1ep-mediumlm-base
wget "https://github.com/daanzu/kaldi_ag_training/releases/download/v0.1.0/${KAGT_MODEL_BASE}.zip"
unzip "${KAGT_MODEL_BASE}.zip"
rm "${KAGT_MODEL_BASE}.zip"
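
Every later step relies on the variables exported so far, and they are lost if you open a new terminal. Here is a minimal sketch of a check you could run before continuing. The function name require_vars is my own invention for illustration and is not part of either project.

```shell
# Return non-zero if any of the named environment variables is unset or empty.
# Only pass trusted variable names: the lookup uses eval.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "not set: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

For example, require_vars KAGT_HOME KAGT_AUDIO_DATA KAGT_RECORDER KAGT_TRAINING KAGT_MODEL_BASE prints the name of each variable that still needs to be exported and returns a non-zero status if any is missing.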

Create audio data

You’ll need to record yourself reading prompts. @daanzu has created a great tool that makes this easy. We cloned it earlier, so just move to that directory:

cd "${KAGT_RECORDER}"

Record

export PROMPT="${KAGT_RECORDER}/prompts/rainbow_passage.txt"
python3 recorder.py -d ../audio_data -o -p "${PROMPT}" -c $(wc -l "${PROMPT}" | cut -d" " -f1)

export PROMPT="${KAGT_RECORDER}/prompts/timit.txt"
python3 recorder.py -d ../audio_data -o -p "${PROMPT}" -c $(wc -l "${PROMPT}" | cut -d" " -f1)

export PROMPT="${KAGT_RECORDER}/prompts/arctic.txt"
python3 recorder.py -d ../audio_data -o -p "${PROMPT}" -c $(wc -l "${PROMPT}" | cut -d" " -f1)
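
The three invocations above differ only in the prompt file, so they can be folded into a loop. This is a rough sketch that reuses the same recorder flags as above. The names prompt_count, record_all, and RECORDER_CMD are hypothetical helpers I made up for illustration; they are not part of speech-training-recorder.

```shell
# Count the prompts in a file, assuming one prompt per line.
prompt_count() {
    wc -l < "$1" | tr -d ' '
}

# Run one recording session per prompt file name (given without ".txt").
# RECORDER_CMD is a hypothetical override, e.g. RECORDER_CMD=echo for a dry run.
record_all() {
    for name in "$@"; do
        prompt="${KAGT_RECORDER}/prompts/${name}.txt"
        if [ ! -f "${prompt}" ]; then
            echo "no such prompt file: ${prompt}" >&2
            return 1
        fi
        # Left unquoted on purpose so "python3 recorder.py" splits into two words.
        ${RECORDER_CMD:-python3 recorder.py} -d ../audio_data -o \
            -p "${prompt}" -c "$(prompt_count "${prompt}")" || return 1
    done
}
```

From the recorder directory, record_all rainbow_passage timit arctic would then run the same three sessions in order. Setting RECORDER_CMD=echo first prints the commands instead of running them, which is a cheap way to check the paths and counts.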

Convert TSV to SCP

Go to the training project directory and run the conversion script:

cd "${KAGT_TRAINING}"
export KAGT_TRAINING_DATASET="${KAGT_TRAINING}/dataset"
python3 convert_tsv_to_scp.py "${KAGT_AUDIO_DATA}/recorder.tsv" "${KAGT_TRAINING_DATASET}"

Check that the results aren’t empty:

wc -l "${KAGT_TRAINING_DATASET}/text"
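
If you would rather have the check fail loudly than eyeball a line count, something like the following sketch could work. The function name check_dataset is hypothetical. It only checks the text file by default, because that is the one output file I’m sure of; any other file names you pass are your own assumption about what the conversion script writes.

```shell
# Fail if any expected dataset file is missing or empty.
# Usage: check_dataset DIR [FILE...]   (FILE defaults to "text")
check_dataset() {
    dir="$1"
    shift
    # With no file names given, fall back to checking only "text".
    [ "$#" -gt 0 ] || set -- text
    for f in "$@"; do
        # -s is true only when the file exists and has a size greater than zero.
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${dir}/${f}" >&2
            return 1
        fi
        echo "${f}: $(wc -l < "${dir}/${f}") lines"
    done
}
```

Running check_dataset "${KAGT_TRAINING_DATASET}" prints the line count on success and returns a non-zero status otherwise, so it can guard the training step with &&.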

Train

cd "${KAGT_TRAINING}"
docker run -it --rm -v "${KAGT_TRAINING}:/mnt/input" -v "${KAGT_AUDIO_DATA}:/mnt/audio_data" -w /mnt/input --user "$(id -u):$(id -g)" --runtime=nvidia daanzu/kaldi_ag_training_gpu bash run.finetune.sh "${KAGT_MODEL_BASE}" dataset
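
The command above passes --runtime=nvidia, so it will fail at startup if Docker has no runtime registered under that name. Below is a small sketch of a pre-flight check, assuming that docker info lists the registered runtimes in its Runtimes field. The function name has_nvidia_runtime is made up for illustration.

```shell
# Succeed only if the Docker daemon reports a runtime named "nvidia".
has_nvidia_runtime() {
    # Print the registered runtimes as JSON and look for the "nvidia" key.
    docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'
}
```

You could run has_nvidia_runtime || echo "nvidia runtime not found" before starting a long training job.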

Export model

The original instructions weren’t specific here, so I’m assuming that the last command wants the G.fst file from the new model.

export KAGT_MODEL_FINAL="${KAGT_HOME}/exported_model"
python3 export_trained_model.py finetune "${KAGT_MODEL_FINAL}"
python3 -m kaldi_active_grammar compile_agf_dictation_graph -v -m "${KAGT_MODEL_FINAL}" "${KAGT_MODEL_FINAL}/G.fst"

Branton’s Tech Notes