1+ # -*- coding: utf-8 -*-
2+ #
13# Copyright 2018 Google LLC
24#
35# Licensed under the Apache License, Version 2.0 (the "License");
1921class RecognitionConfig(object):
2022 class AudioEncoding(enum.IntEnum):
2123 """
22- Audio encoding of the data sent in the audio message. All encodings support
23- only 1 channel (mono) audio. Only ``FLAC`` and ``WAV`` include a header that
24- describes the bytes of audio that follow the header. The other encodings
25- are raw audio bytes with no header.
24+ The encoding of the audio data sent in the request.
25+
26+ All encodings support only 1 channel (mono) audio.
2627
2728 For best results, the audio source should be captured and transmitted using
28- a lossless encoding (``FLAC`` or ``LINEAR16``). Recognition accuracy may be
29- reduced if lossy codecs, which include the other codecs listed in
30- this section, are used to capture or transmit the audio, particularly if
31- background noise is present.
29+ a lossless encoding (``FLAC`` or ``LINEAR16``). The accuracy of the speech
30+ recognition can be reduced if lossy codecs are used to capture or transmit
31+ audio, particularly if background noise is present. Lossy codecs include
32+ ``MULAW``, ``AMR``, ``AMR_WB``, ``OGG_OPUS``, and ``SPEEX_WITH_HEADER_BYTE``.
33+
34+ The ``FLAC`` and ``WAV`` audio file formats include a header that describes the
35+ included audio content. You can request recognition for ``WAV`` files that
36+ contain either ``LINEAR16`` or ``MULAW`` encoded audio.
37+ If you send ``FLAC`` or ``WAV`` audio file format in
38+ your request, you do not need to specify an ``AudioEncoding``; the audio
39+ encoding format is determined from the file header. If you specify
40+ an ``AudioEncoding`` when you send ``FLAC`` or ``WAV`` audio, the
41+ encoding configuration must match the encoding described in the audio
42+ header; otherwise the request returns an
43+ ``google.rpc.Code.INVALID_ARGUMENT`` error code.
3244
3345 Attributes:
34- ENCODING_UNSPECIFIED (int): Not specified. Will return result ``google.rpc.Code.INVALID_ARGUMENT``.
46+ ENCODING_UNSPECIFIED (int): Not specified.
3547 LINEAR16 (int): Uncompressed 16-bit signed little-endian samples (Linear PCM).
36- FLAC (int): ``` FLAC`` <https://xiph.org/flac/documentation.html>`_ (Free Lossless Audio
48+ FLAC (int): ``FLAC`` (Free Lossless Audio
3749 Codec) is the recommended encoding because it is
3850 lossless--therefore recognition is not compromised--and
3951 requires only about half the bandwidth of ``LINEAR16``. ``FLAC`` stream
@@ -44,7 +56,7 @@ class AudioEncoding(enum.IntEnum):
4456 AMR_WB (int): Adaptive Multi-Rate Wideband codec. ``sample_rate_hertz`` must be 16000.
4557 OGG_OPUS (int): Opus encoded audio frames in Ogg container
4658 (`OggOpus <https://wiki.xiph.org/OggOpus>`_).
47- ``sample_rate_hertz`` must be 16000.
59+ ``sample_rate_hertz`` must be one of 8000, 12000, 16000, 24000, or 48000.
4860 SPEEX_WITH_HEADER_BYTE (int): Although the use of lossy encodings is not recommended, if a very low
4961 bitrate encoding is required, ``OGG_OPUS`` is highly preferred over
5062 Speex encoding. The `Speex <https://speex.org/>`_ encoding supported by
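The constraints stated in the updated docstring can be checked on the client before a request is sent. The following is a minimal sketch, not part of the library: `check_config`, `ALLOWED_SAMPLE_RATES`, and `LOSSY_ENCODINGS` are hypothetical names, encodings are passed as plain strings rather than enum members, and only the sample-rate rules visible in this diff (`AMR_WB` and `OGG_OPUS`) are encoded.

```python
# Hypothetical client-side pre-check; these names do not exist in the library.

# Sample-rate rules taken from the docstring lines shown in this diff.
ALLOWED_SAMPLE_RATES = {
    "AMR_WB": {16000},
    "OGG_OPUS": {8000, 12000, 16000, 24000, 48000},
}

# Lossy codecs as listed in the updated docstring.
LOSSY_ENCODINGS = {
    "MULAW",
    "AMR",
    "AMR_WB",
    "OGG_OPUS",
    "SPEEX_WITH_HEADER_BYTE",
}


def check_config(encoding, sample_rate_hertz):
    """Return a list of 'error: ...' and 'warning: ...' strings (empty if none)."""
    problems = []

    # Reject sample rates that the docstring says the encoding does not accept.
    allowed = ALLOWED_SAMPLE_RATES.get(encoding)
    if allowed is not None and sample_rate_hertz not in allowed:
        problems.append(
            "error: %s requires sample_rate_hertz in %s, got %d"
            % (encoding, sorted(allowed), sample_rate_hertz)
        )

    # Lossy codecs are permitted, but the docstring notes accuracy may drop.
    if encoding in LOSSY_ENCODINGS:
        problems.append(
            "warning: %s is lossy; FLAC or LINEAR16 is recommended" % encoding
        )

    return problems


if __name__ == "__main__":
    for message in check_config("OGG_OPUS", 44100):
        print(message)
```

A check like this only mirrors the documented rules; the service remains the authority and may still return ``google.rpc.Code.INVALID_ARGUMENT`` for cases not covered here, such as a ``WAV`` header that disagrees with the specified encoding.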