Set up your environment
This guide is for Linux. I’m on Ubuntu 20.04, but I don’t see why it wouldn’t also work on 18.04 and others.
Create directories
export KAGT_HOME="${HOME}/kaldi_ag_custom"
mkdir "${KAGT_HOME}"
cd "${KAGT_HOME}"
export KAGT_AUDIO_DATA="${KAGT_HOME}/audio_data"
mkdir "${KAGT_AUDIO_DATA}"
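Every later step depends on these exported variables, and they vanish when you open a new terminal. One way to keep them, sketched here under my own assumptions, is to dump every exported `KAGT_*` variable into a small file you can source later. The helper `save_kagt_env` and the filename `kagt_env.sh` are hypothetical names I made up; neither project expects them. Re-run the helper after each new `export KAGT_...` line in the steps below so the file stays current.

```shell
# Save every exported KAGT_* variable to a file that can be sourced later.
# save_kagt_env and kagt_env.sh are hypothetical names, not part of either project.
save_kagt_env() {
  local env_file="${1:-${KAGT_HOME}/kagt_env.sh}"
  # bash prints `declare -x NAME="value"` (POSIX shells print `export NAME=...`);
  # keep only our variables so the file stays small.
  export -p | grep -E '^(declare -x|export) KAGT_' > "${env_file}"
}

# Usage:
#   save_kagt_env
# Later, in a new terminal:
#   . "${HOME}/kaldi_ag_custom/kagt_env.sh"
```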
Clone projects
git clone https://github.com/daanzu/speech-training-recorder.git
git clone https://github.com/daanzu/kaldi_ag_training.git
export KAGT_RECORDER="${KAGT_HOME}/speech-training-recorder"
export KAGT_TRAINING="${KAGT_HOME}/kaldi_ag_training"
Download base model
cd "${KAGT_TRAINING}"
export KAGT_MODEL_BASE=kaldi_model_daanzu_20200905_1ep-mediumlm-base
wget "https://github.com/daanzu/kaldi_ag_training/releases/download/v0.1.0/${KAGT_MODEL_BASE}.zip"
unzip "${KAGT_MODEL_BASE}.zip"
rm "${KAGT_MODEL_BASE}.zip"
Create audio data
You’ll need to record yourself reading prompts. @daanzu has created a great tool that makes this easy. We cloned it earlier, so just move to that directory:
cd "${KAGT_RECORDER}"
Record
export PROMPT="${KAGT_RECORDER}/prompts/rainbow_passage.txt"
python3 recorder.py -d ../audio_data -o -p "${PROMPT}" -c $(wc -l "${PROMPT}" | cut -d" " -f1)
export PROMPT="${KAGT_RECORDER}/prompts/timit.txt"
python3 recorder.py -d ../audio_data -o -p "${PROMPT}" -c $(wc -l "${PROMPT}" | cut -d" " -f1)
export PROMPT="${KAGT_RECORDER}/prompts/arctic.txt"
python3 recorder.py -d ../audio_data -o -p "${PROMPT}" -c $(wc -l "${PROMPT}" | cut -d" " -f1)
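The `-c` argument, judging by how it’s used here, tells the recorder how many prompts to present, and `$(wc -l ... | cut ...)` supplies the prompt file’s line count. One catch: `wc -l` counts newline characters, so if a prompt file has no trailing newline, its last prompt isn’t counted. The helper below (`count_prompts` is a name I made up) uses `grep -c ''` instead, which also counts a final unterminated line and prints only the number, so no `cut` is needed.

```shell
# Count the lines in a prompt file, including a last line that lacks a
# trailing newline (wc -l would miss that one).
count_prompts() {
  grep -c '' "$1"
}

# Example: the same recorder call as above, using the helper.
#   export PROMPT="${KAGT_RECORDER}/prompts/arctic.txt"
#   python3 recorder.py -d ../audio_data -o -p "${PROMPT}" -c "$(count_prompts "${PROMPT}")"
```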
Convert TSV to SCP
Go to the training project directory and run the conversion script:
cd "${KAGT_TRAINING}"
export KAGT_TRAINING_DATASET="${KAGT_TRAINING}/dataset"
python3 convert_tsv_to_scp.py "${KAGT_AUDIO_DATA}/recorder.tsv" "${KAGT_TRAINING_DATASET}"
Check that the results aren’t empty:
wc -l "${KAGT_TRAINING_DATASET}/text"
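Counting lines in `text` only proves that one file isn’t empty. Here is a slightly stricter check, assuming the converter writes a standard Kaldi data directory where files such as `text` and `wav.scp` each hold one line per utterance. The `wav.scp` name is my assumption based on the script’s name, and `check_dataset` is a hypothetical helper. It confirms that each named file exists and is non-empty, and that all of them have the same line count.

```shell
# Check that each named file in a Kaldi-style data directory exists, is
# non-empty, and that all of them have the same number of lines.
# Usage: check_dataset DIR FILE...   (file names are assumptions; adjust as needed)
check_dataset() {
  local dir="$1"; shift
  local expected="" name count
  for name in "$@"; do
    if [ ! -s "${dir}/${name}" ]; then
      echo "missing or empty: ${dir}/${name}" >&2
      return 1
    fi
    count="$(grep -c '' "${dir}/${name}")"
    if [ -z "${expected}" ]; then
      expected="${count}"
    elif [ "${count}" != "${expected}" ]; then
      echo "line count mismatch: ${name} has ${count}, expected ${expected}" >&2
      return 1
    fi
  done
  echo "ok: ${expected} utterances"
}

# Usage:
#   check_dataset "${KAGT_TRAINING_DATASET}" text wav.scp
```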
Train
cd "${KAGT_TRAINING}"
docker run -it --rm -v "${KAGT_TRAINING}:/mnt/input" -v "${KAGT_AUDIO_DATA}:/mnt/audio_data" \
  -w /mnt/input --user "$(id -u):$(id -g)" --runtime=nvidia \
  daanzu/kaldi_ag_training_gpu bash run.finetune.sh "${KAGT_MODEL_BASE}" dataset
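The container sees `${KAGT_TRAINING}` at `/mnt/input`, and `-w /mnt/input` makes that the working directory. So the two arguments passed to `run.finetune.sh`, the base model directory and `dataset`, presumably have to exist directly inside `${KAGT_TRAINING}`. A quick check like this one can catch a wrong path before the container starts. `preflight_training` is a hypothetical helper, not part of kaldi_ag_training.

```shell
# Verify that the paths run.finetune.sh receives will be visible inside the
# container, i.e. that they exist under the directory mounted at /mnt/input.
# Usage: preflight_training TRAINING_DIR MODEL_NAME DATASET_NAME
preflight_training() {
  local training_dir="$1" model="$2" dataset="$3" missing=0 path
  for path in "${training_dir}/${model}" "${training_dir}/${dataset}"; do
    if [ ! -d "${path}" ]; then
      echo "not found (the container will not see it): ${path}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Usage:
#   preflight_training "${KAGT_TRAINING}" "${KAGT_MODEL_BASE}" dataset
```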
Export model
The original instructions weren’t specific here, so I’m assuming the compile step should use the G.fst file from the newly exported model.
export KAGT_MODEL_FINAL="${KAGT_HOME}/exported_model"
python3 export_trained_model.py finetune "${KAGT_MODEL_FINAL}"
python3 -m kaldi_active_grammar compile_agf_dictation_graph -v -m "${KAGT_MODEL_FINAL}" "${KAGT_MODEL_FINAL}/G.fst"
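The last command runs on the host, not in the container. It needs the kaldi_active_grammar package to be importable by `python3` (on PyPI it’s published as `kaldi-active-grammar`), and it reads `G.fst` from the exported model. If the compile fails, a pre-check like this one can narrow down which of the two is missing. `check_export` is a helper I made up for illustration.

```shell
# Check the inputs the dictation-graph compile step relies on:
# the exported model's G.fst and an importable kaldi_active_grammar module.
# Usage: check_export MODEL_DIR
check_export() {
  local model_dir="$1" status=0
  if [ ! -s "${model_dir}/G.fst" ]; then
    echo "missing or empty: ${model_dir}/G.fst (did the export step succeed?)" >&2
    status=1
  fi
  if ! python3 -c 'import kaldi_active_grammar' 2>/dev/null; then
    echo "kaldi_active_grammar is not importable (try: pip install kaldi-active-grammar)" >&2
    status=1
  fi
  return "${status}"
}

# Usage:
#   check_export "${KAGT_MODEL_FINAL}"
```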