
Commit 355f4e1

vimalmanohar authored and danpovey committed
[src] Adding documentation for lattice discriminative training functions (kaldi-asr#1854)
1 parent d8e42b0 commit 355f4e1


1 file changed (+52 -11 lines)

src/lat/lattice-functions.h

Lines changed: 52 additions & 11 deletions
@@ -191,11 +191,39 @@ bool LatticeBoost(const TransitionModel &trans,
 This function implements either the MPFE (minimum phone frame error) or SMBR
 (state-level minimum bayes risk) forward-backward, depending on whether
 "criterion" is "mpfe" or "smbr". It returns the MPFE
-criterion of SMBR criterion for this file, and outputs the posteriors (which
-may be positive or negative) into "arc_post".
-Note: setting one_silence_class to false gives the old traditional behavior,
-true gives a possibly improved behavior which will tend to reduce insertions
-in the trained model.
+criterion or SMBR criterion for this utterance, and outputs the posteriors (which
+may be positive or negative) into "post".
+
+@param [in] trans The transition model. Used to map the
+transition-ids to phones or pdfs.
+@param [in] silence_phones A list of integer ids of silence phones. The
+silence frames, i.e. the frames where num_ali
+corresponds to a silence phone, are treated specially.
+The behavior is determined by 'one_silence_class'
+being false (traditional behavior) or true.
+Usually in our setup, several phones including
+the silence, vocalized noise, non-spoken noise
+and unk are treated as "silence phones".
+@param [in] lat The denominator lattice.
+@param [in] num_ali The numerator alignment.
+@param [in] criterion The objective function. Must be "mpfe" or "smbr"
+for MPFE (minimum phone frame error) or sMBR
+(state minimum bayes risk) training.
+@param [in] one_silence_class Determines how the silence frames are treated.
+Setting this to false gives the old traditional behavior,
+where the silence frames (according to num_ali) are
+treated as incorrect. However, this means that the
+insertions are not penalized by the objective.
+Setting this to true gives the new behavior, where we
+treat silence as any other phone, except that all pdfs
+of silence phones are collapsed into a single class for
+the frame-error computation. This can possibly reduce
+the insertions in the trained model. This is closer to
+the WER metric that we actually care about, since WER is
+generally computed after filtering out noises, but
+does penalize insertions.
+@param [out] post The "MBR posteriors", i.e. derivatives w.r.t. the
+pseudo log-likelihoods of states at each frame.
 */
 BaseFloat LatticeForwardBackwardMpeVariants(
 const TransitionModel &trans,
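As a rough illustration of how "criterion" and "one_silence_class" change the per-frame accuracy described in the comment above, here is a minimal standalone C++ sketch. It is not Kaldi's implementation: the struct FrameLabel and the function HypotheticalFrameAccuracy are invented for this example, and the sketch assumes each frame's hypothesis and reference have already been mapped to a (phone, pdf-id) pair (in Kaldi that mapping would go through the TransitionModel). For MPFE it also assumes that silence phones are collapsed into one class in the same way the comment describes for silence pdfs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical (phone, pdf-id) label for one frame. In Kaldi both would be
// derived from a transition-id via the TransitionModel; here they are given.
struct FrameLabel {
  int32_t phone;
  int32_t pdf;
};

// Illustrative per-frame accuracy for MPFE/sMBR (a sketch, not Kaldi's code).
//   hyp: label on a denominator-lattice arc at frame t.
//   ref: label from the numerator alignment at frame t.
// Returns 1.0 if the hypothesis counts as correct on this frame, else 0.0.
double HypotheticalFrameAccuracy(
    const FrameLabel &hyp, const FrameLabel &ref,
    const std::unordered_set<int32_t> &silence_phones,
    const std::string &criterion, bool one_silence_class) {
  assert(criterion == "mpfe" || criterion == "smbr");
  const bool ref_is_sil = silence_phones.count(ref.phone) != 0;
  const bool hyp_is_sil = silence_phones.count(hyp.phone) != 0;

  if (ref_is_sil) {
    if (!one_silence_class) {
      // Traditional behavior: frames that are silence in the reference count
      // as incorrect for every hypothesis, so the objective expresses no
      // preference there and an inserted speech phone is not penalized.
      return 0.0;
    }
    // one_silence_class: all silence labels form a single class, so any
    // silence hypothesis is correct and any non-silence one is an error.
    return hyp_is_sil ? 1.0 : 0.0;
  }
  // Non-silence reference frame: compare phones (MPFE) or pdf-ids (sMBR).
  if (criterion == "mpfe") return hyp.phone == ref.phone ? 1.0 : 0.0;
  return hyp.pdf == ref.pdf ? 1.0 : 0.0;
}
```

With one_silence_class false, every hypothesis scores 0 on a silence reference frame, so nothing distinguishes a correct silence from an inserted speech phone; with it true, a speech hypothesis on that frame scores 0 while any silence hypothesis scores 1, which is how insertions start to be penalized.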
@@ -212,12 +240,25 @@ BaseFloat LatticeForwardBackwardMpeVariants(
 used in our normal MMI training recipes, where it's instead done using various command
 line programs that each do a part of the job. This function was written for use in
 neural-net MMI training.
-If drop_frames is true, it will not compute any posteriors on frames where the num and
-den have disjoint pdf-ids.
-If "convert_to_pdf_ids" is true, it will convert the output to be at the level of pdf-ids,
-not transition-ids.
-If "cancel" is true, it will cancel out any positive and negative parts from
-the same transition-id (or pdf-id, if convert_to_pdf_ids == true).
+
+@param [in] trans The transition model. Used to map the
+transition-ids to phones or pdfs.
+@param [in] lat The denominator lattice.
+@param [in] num_ali The numerator alignment.
+@param [in] drop_frames If "drop_frames" is true, it will not compute any
+posteriors on frames where the num and den have disjoint
+pdf-ids.
+@param [in] convert_to_pdf_ids If "convert_to_pdf_ids" is true, it will
+convert the output to be at the level of pdf-ids, not
+transition-ids.
+@param [in] cancel If "cancel" is true, it will cancel out any positive and
+negative parts from the same transition-id (or pdf-id,
+if convert_to_pdf_ids == true).
+@param [out] arc_post The output MMI posteriors of transition-ids (or
+pdf-ids if convert_to_pdf_ids == true) at each frame,
+i.e. the difference between the numerator
+and denominator posteriors.
+
 It returns the forward-backward likelihood of the lattice. */
 BaseFloat LatticeForwardBackwardMmi(
 const TransitionModel &trans,
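The drop_frames and cancel options of LatticeForwardBackwardMmi can be pictured as a post-processing step on numerator and denominator posteriors that have already been computed. The following C++ sketch is hypothetical and is not the Kaldi implementation: SparsePosterior and CombineMmiPosteriors are names made up for the example, the denominator posteriors are assumed to come from an earlier forward-backward pass over the lattice, and all ids are assumed to be pdf-ids already (the convert_to_pdf_ids == true case), so the disjointness test for drop_frames is done directly on those ids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sparse per-frame posterior: one list of (id, weight) per frame.
using SparsePosterior = std::vector<std::vector<std::pair<int32_t, double>>>;

// Illustrative MMI posterior combination (a sketch, not Kaldi's code).
//   num_ali:  the numerator id on each frame (numerator posterior 1.0).
//   den_post: denominator posteriors per frame, as id -> posterior.
// Each output frame holds numerator minus denominator posteriors.
SparsePosterior CombineMmiPosteriors(
    const std::vector<int32_t> &num_ali,
    const std::vector<std::map<int32_t, double>> &den_post,
    bool drop_frames, bool cancel) {
  assert(num_ali.size() == den_post.size());
  SparsePosterior out(num_ali.size());
  for (std::size_t t = 0; t < num_ali.size(); ++t) {
    const int32_t num_id = num_ali[t];
    const std::map<int32_t, double> &den = den_post[t];
    // drop_frames: if the numerator id does not occur in the denominator on
    // this frame (num and den ids are disjoint), emit nothing for the frame.
    if (drop_frames && den.count(num_id) == 0) continue;
    if (cancel) {
      // Merge the +1 (numerator) and -gamma (denominator) parts of each id.
      std::map<int32_t, double> merged;
      merged[num_id] += 1.0;
      for (const auto &kv : den) merged[kv.first] -= kv.second;
      for (const auto &kv : merged) out[t].emplace_back(kv.first, kv.second);
    } else {
      // Keep the positive and negative parts as separate entries.
      out[t].emplace_back(num_id, 1.0);
      for (const auto &kv : den) out[t].emplace_back(kv.first, -kv.second);
    }
  }
  return out;
}
```

For example, with numerator id 7 and denominator posteriors {7: 0.75, 8: 0.25} on a frame, cancel == true yields (7, +0.25) and (8, -0.25), while cancel == false keeps (7, +1.0), (7, -0.75) and (8, -0.25) as separate entries.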

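The "@param [out] post" comment of LatticeForwardBackwardMpeVariants describes its output as derivatives with respect to the per-frame pseudo log-likelihoods and notes that they may be positive or negative. Assuming the standard MPE/sMBR formulation (our assumption, not taken from the Kaldi source; acoustic scaling omitted), write P(π) for the lattice posterior of path π, proportional to the product of its arc likelihoods p(q), and A(π) for its summed frame accuracy. Since the derivative of log P(π) with respect to log p(q) is 1[q ∈ π] minus γ_q, the derivative of the expected accuracy for an arc q is:

```latex
\[
\frac{\partial \bar{A}}{\partial \log p(q)} = \gamma_q \bigl( c(q) - c_{\mathrm{avg}} \bigr),
\qquad
\gamma_q = \sum_{\pi \ni q} P(\pi),
\qquad
c(q) = \frac{1}{\gamma_q} \sum_{\pi \ni q} P(\pi)\, A(\pi),
\qquad
c_{\mathrm{avg}} = \bar{A} = \sum_{\pi} P(\pi)\, A(\pi).
\]
```

An arc whose paths are on average more accurate than the lattice as a whole gets a positive value, and one whose paths are less accurate gets a negative value, which is why the MBR posteriors can take either sign.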