File: src/doc/online_programs.dox (123 additions, 0 deletions)
@@ -37,6 +37,9 @@ script found there. The programs are as follows:
37
37
38
38
There is also a Java equivalent of the online-audio-client which contains slightly more features and has a GUI.
39
39
40
+
In addition, there is a GStreamer 1.0 compatible plugin that acts as a filter, taking raw audio as input and producing
41
+
recognized words as output. The plugin is based on \ref OnlineFasterDecoder, like the other online recognition programs.
42
+
40
43
\section audio_server Online Audio Server
41
44
42
45
The main difference between the online-server-gmm-decode-faster and online-audio-server-decode-faster programs is the input: the former accepts feature vectors, while the latter accepts RAW audio.
Or simply double-click the JAR file in the graphical interface.
118
121
122
+
\section gst_plugin GStreamer plugin
123
+
124
+
The Kaldi toolkit comes with a plugin for the <a href="http://gstreamer.freedesktop.org/">GStreamer</a> media streaming framework (version 1.0 or compatible).
125
+
The plugin acts as a filter that accepts raw audio as input and produces recognized words as output.
126
+
127
+
The main benefit of the plugin is that it makes Kaldi's online speech recognition functionality available to all
128
+
programming languages that support GStreamer 1.0 (including Python, Ruby, Java, Vala and many more). It also simplifies the integration
129
+
of the Kaldi online decoder in applications since communicating with the decoder follows GStreamer standards.
130
+
131
+
\subsection gst_plugin_installation Installation
132
+
133
+
The source of the GStreamer plugin is located in the `src/gst-plugin` directory. To compile the plugin, the rest of the Kaldi
134
+
toolkit has to be compiled with the `-fPIC` compilation option. To do this, add `-fPIC` to the `CXXFLAGS` in
135
+
the `src/kaldi.mk` file. Then recompile Kaldi as usual. Also compile the online extensions (`make ext`).
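
The `CXXFLAGS` change described above can also be scripted. The following Python sketch is only an illustration: it assumes that `src/kaldi.mk` contains an ordinary `CXXFLAGS = ...` (or `:=` / `+=`) assignment at the start of a line, and the helper name `add_fpic` and the sample makefile text are hypothetical, not part of Kaldi.

```python
import re


def add_fpic(makefile_text):
    """Return makefile_text with -fPIC added to the first CXXFLAGS assignment.

    Assumption: the file uses a plain `CXXFLAGS =`, `CXXFLAGS :=` or
    `CXXFLAGS +=` assignment at the start of a line. The flag is inserted
    right after the assignment operator, so a trailing backslash
    continuation on that line stays intact.
    """
    # Only the assignment line itself is checked, not its continuation lines.
    if re.search(r"^CXXFLAGS\b.*-fPIC\b", makefile_text, flags=re.MULTILINE):
        return makefile_text
    return re.sub(
        r"^(CXXFLAGS[ \t]*[:+]?=)",
        r"\1 -fPIC",
        makefile_text,
        count=1,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    # Hypothetical excerpt; a real kaldi.mk will look different.
    sample = "CXX = g++\nCXXFLAGS = -msse -msse2 -Wall \\\n    -O2\n"
    print(add_fpic(sample))
```

To apply it, read `src/kaldi.mk`, pass its contents through `add_fpic`, write the result back, and then recompile Kaldi as described above. Editing the file by hand is just as valid.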
136
+
137
+
Make sure the package that provides GStreamer 1.0 development headers is installed on your system (on Debian, the needed package is called
138
+
`libgstreamer1.0-dev`).
139
+
140
+
Finally, run `make depend` and `make` in the `src/gst-plugin` directory. This should result in a file `src/gst-plugin/libgstkaldi.so`
141
+
which contains the GStreamer plugin.
142
+
143
+
To let GStreamer find the Kaldi plugin, you have to add the `src/gst-plugin` directory to its plugin search path. To do this,
144
+
add the directory to the `GST_PLUGIN_PATH` environment variable:
145
+
\verbatim
146
+
export GST_PLUGIN_PATH=$KALDI_ROOT/src/gst-plugin
147
+
\endverbatim
148
+
Of course, replace `$KALDI_ROOT` with the actual location of the Kaldi root folder on your file system.
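
Note that the `export` line above overwrites any value `GST_PLUGIN_PATH` already has. When GStreamer tools are launched from a script, the directory can instead be prepended to the existing search path. The Python sketch below shows one way to do this. The helper name `with_plugin_dir` is hypothetical, and the sketch assumes that the variable holds a list of directories separated by the platform's path separator.

```python
import os


def with_plugin_dir(env, plugin_dir):
    """Return a copy of env whose GST_PLUGIN_PATH starts with plugin_dir.

    Existing entries are kept (in order) after plugin_dir; empty entries
    and earlier copies of plugin_dir are dropped. The input mapping is
    not modified.
    """
    new_env = dict(env)
    existing = [
        p
        for p in new_env.get("GST_PLUGIN_PATH", "").split(os.pathsep)
        if p and p != plugin_dir
    ]
    new_env["GST_PLUGIN_PATH"] = os.pathsep.join([plugin_dir] + existing)
    return new_env


if __name__ == "__main__":
    # "/path/to/kaldi" is a placeholder for the real Kaldi root folder.
    kaldi_root = os.environ.get("KALDI_ROOT", "/path/to/kaldi")
    env = with_plugin_dir(os.environ, os.path.join(kaldi_root, "src", "gst-plugin"))
    print(env["GST_PLUGIN_PATH"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting a GStreamer tool such as `gst-inspect-1.0`, so the change affects only that child process.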
149
+
150
+
Now, running `gst-inspect-1.0 onlinegmmdecodefaster` should provide info about the plugin:
151
+
\verbatim
152
+
# gst-inspect-1.0 onlinegmmdecodefaster
153
+
Factory Details:
154
+
Rank: none (0)
155
+
Long-name: OnlineGmmDecodeFaster
156
+
Klass: Speech/Audio
157
+
Description: Convert speech to text
158
+
Author: Tanel Alumae <tanel.alumae@phon.ioc.ee>
159
+
[..]
160
+
Element Properties:
161
+
name : The name of the object
162
+
flags: readable, writable
163
+
String. Default: "onlinegmmdecodefaster0"
164
+
parent : The parent of the object
165
+
flags: readable, writable
166
+
Object of type "GstObject"
167
+
silent : Determines whether incoming audio is sent to the decoder or not
168
+
flags: readable, writable
169
+
Boolean. Default: false
170
+
model : Filename of the acoustic model
171
+
flags: readable, writable
172
+
String. Default: "final.mdl"
173
+
fst : Filename of the HCLG FST
174
+
flags: readable, writable
175
+
String. Default: "HCLG.fst"
176
+
[..]
177
+
min-cmn-window : Minumum CMN window used at start of decoding (adds latency only at start)
\subsection usage_cli Usage through the command-line
188
+
189
+
The simplest way to use the GStreamer plugin is via the command line. You have to specify the model files used for decoding
190
+
when launching the plugin. To do this, set the `model`, `fst`, `word-syms`, `silence-phones` and optionally the `lda-mat`
191
+
plugin properties (as with Kaldi's command-line online decoders). The decoder accepts only 16 kHz, 16-bit mono audio. Any other audio stream can be converted automatically to the
192
+
required format using GStreamer's `audioresample` and `audioconvert` plugins.
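
As a rough illustration of how these pieces fit together, the following Python sketch assembles an argument list for `gst-launch-1.0`. It is a sketch under stated assumptions, not the command from this documentation: the helper name `build_gst_launch_argv` is hypothetical, the use of `filesrc`, `decodebin` and `filesink location=/dev/stdout` around the decoder is an assumed pipeline layout, and all property values in the example are placeholders. Only the element name `onlinegmmdecodefaster`, the property names and the `audioconvert` / `audioresample` elements come from the text above.

```python
def build_gst_launch_argv(audio_file, properties):
    """Build an argv list for gst-launch-1.0 (assumed pipeline layout).

    properties maps plugin property names to values. Following the text
    above, `model`, `fst`, `word-syms` and `silence-phones` are treated
    as required here and `lda-mat` as optional.
    """
    required = ("model", "fst", "word-syms", "silence-phones")
    missing = [name for name in required if name not in properties]
    if missing:
        raise ValueError("missing plugin properties: " + ", ".join(missing))

    argv = [
        "gst-launch-1.0", "-q",
        "filesrc", "location=" + audio_file,
        "!", "decodebin",
        # Convert whatever was decoded to the format the decoder accepts.
        "!", "audioconvert",
        "!", "audioresample",
        "!", "onlinegmmdecodefaster",
    ]
    for name in required + ("lda-mat",):
        if name in properties:
            argv.append("{}={}".format(name, properties[name]))
    # Assumption: the recognized words are written to stdout via filesink.
    argv += ["!", "filesink", "location=/dev/stdout"]
    return argv


if __name__ == "__main__":
    # All file names and the silence phone list below are placeholders.
    props = {
        "model": "model_dir/final.mdl",
        "fst": "model_dir/HCLG.fst",
        "word-syms": "model_dir/words.txt",
        "silence-phones": "1:2:3",
    }
    print(" ".join(build_gst_launch_argv("input.wav", props)))
```

The list can be passed to `subprocess.run` unchanged. Because no shell is involved, the `!` separators and the property values need no extra quoting.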
193
+
194
+
For example, to decode the file `test1.wav` using the model files in `tri2b_mmi` and have the recognized stream of words printed to stdout, execute: