Commit 6a96691

feat: Support MP3, TranscriptNormalization and SpeakerLabels in STT V1 API (googleapis#11967)
- [ ] Regenerate this pull request now.

PiperOrigin-RevId: 578629599

Source-Link: googleapis/googleapis@08facab

Source-Link: https://github.com/googleapis/googleapis-gen/commit/75903e0fe695900f684c72ca8b5b9e6bc160048a
Copy-Tag: eyJwIjoicGFja2FnZXMvZ29vZ2xlLWNsb3VkLXNwZWVjaC8uT3dsQm90LnlhbWwiLCJoIjoiNzU5MDNlMGZlNjk1OTAwZjY4NGM3MmNhOGI1YjllNmJjMTYwMDQ4YSJ9

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Anthonios Partheniou <partheniou@google.com>
1 parent c915e94 commit 6a96691

8 files changed

Lines changed: 114 additions & 20 deletions

packages/google-cloud-speech/CONTRIBUTING.rst

Lines changed: 12 additions & 12 deletions
@@ -35,21 +35,21 @@ Using a Development Checkout
 You'll have to create a development environment using a Git checkout:

 - While logged into your GitHub account, navigate to the
-  ``python-speech`` `repo`_ on GitHub.
+  ``google-cloud-python`` `repo`_ on GitHub.

-- Fork and clone the ``python-speech`` repository to your GitHub account by
+- Fork and clone the ``google-cloud-python`` repository to your GitHub account by
   clicking the "Fork" button.

-- Clone your fork of ``python-speech`` from your GitHub account to your local
+- Clone your fork of ``google-cloud-python`` from your GitHub account to your local
   computer, substituting your account username and specifying the destination
-  as ``hack-on-python-speech``. E.g.::
+  as ``hack-on-google-cloud-python``. E.g.::

    $ cd ${HOME}
-   $ git clone git@github.com:USERNAME/python-speech.git hack-on-python-speech
-   $ cd hack-on-python-speech
-   # Configure remotes such that you can pull changes from the googleapis/python-speech
+   $ git clone git@github.com:USERNAME/google-cloud-python.git hack-on-google-cloud-python
+   $ cd hack-on-google-cloud-python
+   # Configure remotes such that you can pull changes from the googleapis/google-cloud-python
    # repository into your local repository.
-   $ git remote add upstream git@github.com:googleapis/python-speech.git
+   $ git remote add upstream git@github.com:googleapis/google-cloud-python.git
    # fetch and merge changes from upstream into main
    $ git fetch upstream
    $ git merge upstream/main
@@ -60,7 +60,7 @@ repo, from which you can submit a pull request.
 To work on the codebase and run the tests, we recommend using ``nox``,
 but you can also use a ``virtualenv`` of your own creation.

-.. _repo: https://github.com/googleapis/python-speech
+.. _repo: https://github.com/googleapis/google-cloud-python

 Using ``nox``
 =============
@@ -113,7 +113,7 @@ Coding Style
    export GOOGLE_CLOUD_TESTING_BRANCH="main"

   By doing this, you are specifying the location of the most up-to-date
-  version of ``python-speech``. The
+  version of ``google-cloud-python``. The
   remote name ``upstream`` should point to the official ``googleapis``
   checkout and the branch should be the default branch on that remote (``main``).

@@ -209,7 +209,7 @@ The `description on PyPI`_ for the project comes directly from the
 ``README``. Due to the reStructuredText (``rst``) parser used by
 PyPI, relative links which will work on GitHub (e.g. ``CONTRIBUTING.rst``
 instead of
-``https://github.com/googleapis/python-speech/blob/main/CONTRIBUTING.rst``)
+``https://github.com/googleapis/google-cloud-python/blob/main/CONTRIBUTING.rst``)
 may cause problems creating links or rendering the description.

 .. _description on PyPI: https://pypi.org/project/google-cloud-speech
@@ -236,7 +236,7 @@ We support:

 Supported versions can be found in our ``noxfile.py`` `config`_.

-.. _config: https://github.com/googleapis/python-speech/blob/main/packages/google-cloud-speech/noxfile.py
+.. _config: https://github.com/googleapis/google-cloud-python/blob/main/packages/google-cloud-speech/noxfile.py


 **********

packages/google-cloud-speech/docs/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -156,7 +156,7 @@
156156
html_theme_options = {
157157
"description": "Google Cloud Client Libraries for google-cloud-speech",
158158
"github_user": "googleapis",
159-
"github_repo": "python-speech",
159+
"github_repo": "google-cloud-python",
160160
"github_banner": True,
161161
"font_family": "'Roboto', Georgia, sans",
162162
"head_font_family": "'Roboto', Georgia, serif",

packages/google-cloud-speech/google/cloud/speech/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,7 @@
6363
CustomClass,
6464
PhraseSet,
6565
SpeechAdaptation,
66+
TranscriptNormalization,
6667
)
6768

6869
__all__ = (
@@ -104,4 +105,5 @@
104105
"CustomClass",
105106
"PhraseSet",
106107
"SpeechAdaptation",
108+
"TranscriptNormalization",
107109
)

packages/google-cloud-speech/google/cloud/speech_v1/__init__.py

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -55,7 +55,12 @@
5555
UpdateCustomClassRequest,
5656
UpdatePhraseSetRequest,
5757
)
58-
from .types.resource import CustomClass, PhraseSet, SpeechAdaptation
58+
from .types.resource import (
59+
CustomClass,
60+
PhraseSet,
61+
SpeechAdaptation,
62+
TranscriptNormalization,
63+
)
5964

6065
from google.cloud.speech_v1.helpers import SpeechHelpers
6166

@@ -99,6 +104,7 @@ class SpeechClient(SpeechHelpers, SpeechClient):
99104
"StreamingRecognitionResult",
100105
"StreamingRecognizeRequest",
101106
"StreamingRecognizeResponse",
107+
"TranscriptNormalization",
102108
"TranscriptOutputConfig",
103109
"UpdateCustomClassRequest",
104110
"UpdatePhraseSetRequest",

packages/google-cloud-speech/google/cloud/speech_v1/types/__init__.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@
4848
UpdateCustomClassRequest,
4949
UpdatePhraseSetRequest,
5050
)
51-
from .resource import CustomClass, PhraseSet, SpeechAdaptation
51+
from .resource import CustomClass, PhraseSet, SpeechAdaptation, TranscriptNormalization
5252

5353
__all__ = (
5454
"LongRunningRecognizeMetadata",
@@ -85,4 +85,5 @@
8585
"CustomClass",
8686
"PhraseSet",
8787
"SpeechAdaptation",
88+
"TranscriptNormalization",
8889
)

packages/google-cloud-speech/google/cloud/speech_v1/types/cloud_speech.py

Lines changed: 35 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -359,6 +359,13 @@ class RecognitionConfig(proto.Message):
359359
adaptation <https://cloud.google.com/speech-to-text/docs/adaptation>`__
360360
documentation. When speech adaptation is set it supersedes
361361
the ``speech_contexts`` field.
362+
transcript_normalization (google.cloud.speech_v1.types.TranscriptNormalization):
363+
Optional. Use transcription normalization to
364+
automatically replace parts of the transcript
365+
with phrases of your choosing. For
366+
StreamingRecognize, this normalization only
367+
applies to stable partial transcripts (stability
368+
> 0.8) and final transcripts.
362369
speech_contexts (MutableSequence[google.cloud.speech_v1.types.SpeechContext]):
363370
Array of
364371
[SpeechContext][google.cloud.speech.v1.SpeechContext]. A
@@ -551,6 +558,12 @@ class AudioEncoding(proto.Enum):
551558
5574. In other words, each RTP header is replaced with a
552559
single byte containing the block length. Only Speex wideband
553560
is supported. ``sample_rate_hertz`` must be 16000.
561+
MP3 (8):
562+
MP3 audio. MP3 encoding is a Beta feature and only available
563+
in v1p1beta1. Support all standard MP3 bitrates (which range
564+
from 32-320 kbps). When using this encoding,
565+
``sample_rate_hertz`` has to match the sample rate of the
566+
file being used.
554567
WEBM_OPUS (9):
555568
Opus encoded audio frames in WebM container
556569
(`OggOpus <https://wiki.xiph.org/OggOpus>`__).
@@ -565,6 +578,7 @@ class AudioEncoding(proto.Enum):
565578
AMR_WB = 5
566579
OGG_OPUS = 6
567580
SPEEX_WITH_HEADER_BYTE = 7
581+
MP3 = 8
568582
WEBM_OPUS = 9
569583

570584
encoding: AudioEncoding = proto.Field(
@@ -605,6 +619,11 @@ class AudioEncoding(proto.Enum):
605619
number=20,
606620
message=resource.SpeechAdaptation,
607621
)
622+
transcript_normalization: resource.TranscriptNormalization = proto.Field(
623+
proto.MESSAGE,
624+
number=24,
625+
message=resource.TranscriptNormalization,
626+
)
608627
speech_contexts: MutableSequence["SpeechContext"] = proto.RepeatedField(
609628
proto.MESSAGE,
610629
number=6,
@@ -659,7 +678,7 @@ class SpeakerDiarizationConfig(proto.Message):
659678
enable_speaker_diarization (bool):
660679
If 'true', enables speaker detection for each recognized
661680
word in the top alternative of the recognition result using
662-
a speaker_tag provided in the WordInfo.
681+
a speaker_label provided in the WordInfo.
663682
min_speaker_count (int):
664683
Minimum number of speakers in the
665684
conversation. This range gives you more
@@ -1469,8 +1488,17 @@ class WordInfo(proto.Message):
14691488
speaker within the audio. This field specifies which one of
14701489
those speakers was detected to have spoken this word. Value
14711490
ranges from '1' to diarization_speaker_count. speaker_tag is
1472-
set if enable_speaker_diarization = 'true' and only in the
1473-
top alternative.
1491+
set if enable_speaker_diarization = 'true' and only for the
1492+
top alternative. Note: Use speaker_label instead.
1493+
speaker_label (str):
1494+
Output only. A label value assigned for every unique speaker
1495+
within the audio. This field specifies which speaker was
1496+
detected to have spoken this word. For some models, like
1497+
medical_conversation this can be actual speaker role, for
1498+
example "patient" or "provider", but generally this would be
1499+
a number identifying a speaker. This field is only set if
1500+
enable_speaker_diarization = 'true' and only for the top
1501+
alternative.
14741502
"""
14751503

14761504
start_time: duration_pb2.Duration = proto.Field(
@@ -1495,6 +1523,10 @@ class WordInfo(proto.Message):
14951523
proto.INT32,
14961524
number=5,
14971525
)
1526+
speaker_label: str = proto.Field(
1527+
proto.STRING,
1528+
number=6,
1529+
)
14981530

14991531

15001532
class SpeechAdaptationInfo(proto.Message):
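
Reviewer note: the ``WordInfo`` docstring now points callers at the string field ``speaker_label`` rather than the integer ``speaker_tag``. The snippet below is a minimal sketch of how a caller might group consecutive words into per-speaker turns by that label. It does not call the Speech API: ``Word`` is a hypothetical stand-in dataclass carrying only the ``word`` and ``speaker_label`` fields, and ``speaker_turns`` is an illustrative helper name, not part of the library.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Word:
    """Hypothetical stand-in for the WordInfo fields used in this sketch."""

    word: str
    speaker_label: str


def speaker_turns(words: Iterable[Word]) -> List[Tuple[str, str]]:
    """Collapse consecutive words that share a label into (label, text) turns."""
    return [
        (label, " ".join(w.word for w in group))
        for label, group in groupby(words, key=lambda w: w.speaker_label)
    ]


# Labels are strings: usually a number, but possibly a role such as "patient".
words = [
    Word("hello", "1"),
    Word("there", "1"),
    Word("hi", "2"),
    Word("bye", "1"),
]
turns = speaker_turns(words)
```

Because ``groupby`` only merges adjacent items, a speaker who talks twice yields two separate turns, which preserves the order of the conversation.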

packages/google-cloud-speech/google/cloud/speech_v1/types/resource.py

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@
2525
"CustomClass",
2626
"PhraseSet",
2727
"SpeechAdaptation",
28+
"TranscriptNormalization",
2829
},
2930
)
3031

@@ -228,4 +229,54 @@ class ABNFGrammar(proto.Message):
228229
)
229230

230231

232+
class TranscriptNormalization(proto.Message):
233+
r"""Transcription normalization configuration. Use transcription
234+
normalization to automatically replace parts of the transcript
235+
with phrases of your choosing. For StreamingRecognize, this
236+
normalization only applies to stable partial transcripts
237+
(stability > 0.8) and final transcripts.
238+
239+
Attributes:
240+
entries (MutableSequence[google.cloud.speech_v1.types.TranscriptNormalization.Entry]):
241+
A list of replacement entries. We will perform replacement
242+
with one entry at a time. For example, the second entry in
243+
["cat" => "dog", "mountain cat" => "mountain dog"] will
244+
never be applied because we will always process the first
245+
entry before it. At most 100 entries.
246+
"""
247+
248+
class Entry(proto.Message):
249+
r"""A single replacement configuration.
250+
251+
Attributes:
252+
search (str):
253+
What to replace. Max length is 100
254+
characters.
255+
replace (str):
256+
What to replace with. Max length is 100
257+
characters.
258+
case_sensitive (bool):
259+
Whether the search is case sensitive.
260+
"""
261+
262+
search: str = proto.Field(
263+
proto.STRING,
264+
number=1,
265+
)
266+
replace: str = proto.Field(
267+
proto.STRING,
268+
number=2,
269+
)
270+
case_sensitive: bool = proto.Field(
271+
proto.BOOL,
272+
number=3,
273+
)
274+
275+
entries: MutableSequence[Entry] = proto.RepeatedField(
276+
proto.MESSAGE,
277+
number=1,
278+
message=Entry,
279+
)
280+
281+
231282
__all__ = tuple(sorted(__protobuf__.manifest))
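
Reviewer note: the ``entries`` docstring says replacements run one entry at a time, in list order, so an earlier entry can make a later one unreachable. The snippet below is a minimal local sketch of that ordering rule in plain Python. It is not the service's implementation: ``Entry`` is a hypothetical stand-in dataclass mirroring the ``search``, ``replace`` and ``case_sensitive`` fields, ``apply_entries`` is an illustrative helper, and literal substring matching is an assumption (the docstring does not say how the service tokenizes matches).

```python
import re
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for TranscriptNormalization.Entry."""

    search: str
    replace: str
    case_sensitive: bool = False


def apply_entries(transcript: str, entries: Sequence[Entry]) -> str:
    """Apply each entry in list order, one entry at a time."""
    for entry in entries:
        flags = 0 if entry.case_sensitive else re.IGNORECASE
        # re.escape keeps the search text literal; the function replacement
        # stops backslashes in the replacement text from being interpreted.
        pattern = re.compile(re.escape(entry.search), flags)
        transcript = pattern.sub(lambda _m, text=entry.replace: text, transcript)
    return transcript


# "cat" is processed first, so "mountain cat" no longer exists by the time
# the second entry runs.
shadowed = [Entry("cat", "dog"), Entry("mountain cat", "puma")]
# Listing the longer phrase first lets both entries take effect.
reordered = [Entry("mountain cat", "puma"), Entry("cat", "dog")]

shadowed_result = apply_entries("a mountain cat and a cat", shadowed)
reordered_result = apply_entries("a mountain cat and a cat", reordered)
```

Under these assumptions, putting the more specific phrase first is the way to keep a shorter entry from shadowing it.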

scripts/client-post-processing/integrate-isolated-handwritten-code.yaml

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -102,10 +102,12 @@ replacements:
102102
packages/google-cloud-speech/google/cloud/speech_v1/__init__.py,
103103
]
104104
before: |
105-
from .types.resource import CustomClass, PhraseSet, SpeechAdaptation\n
105+
\)
106+
106107
__all__ = \(
107108
after: |
108-
from .types.resource import CustomClass, PhraseSet, SpeechAdaptation\n
109+
)
110+
109111
from google.cloud.speech_v1.helpers import SpeechHelpers\n\n
110112
class SpeechClient(SpeechHelpers, SpeechClient):
111113
__doc__ = SpeechClient.__doc__\n\n
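
Reviewer note: the ``before`` value is written as a regular expression (note the escaped ``\)`` and ``\(``), and the anchor moves from the single-line ``from .types.resource import ...`` statement to the closing parenthesis ahead of ``__all__``, presumably because the generated import now spans several lines. The snippet below is a minimal sketch of that kind of substitution with Python's ``re`` module. It assumes the post-processor treats ``before`` as a regex and ``after`` as replacement text; the sample module text and the shortened ``after`` string are made up for illustration.

```python
import re

# Made-up excerpt of a generated __init__.py with the new multi-line import.
source = (
    "from .types.resource import (\n"
    "    TranscriptNormalization,\n"
    ")\n"
    "\n"
    "__all__ = (\n"
    '    "TranscriptNormalization",\n'
    ")\n"
)

# The old single-line anchor no longer matches the generated text.
old_before = r"from \.types\.resource import CustomClass, PhraseSet, SpeechAdaptation\n"
old_anchor_matches = re.search(old_before, source) is not None

# New anchor: closing paren, blank line, then the start of __all__.
new_before = r"\)\n\n__all__ = \("
# Shortened replacement text for illustration only.
after = (
    ")\n\n"
    "from google.cloud.speech_v1.helpers import SpeechHelpers\n\n\n"
    "__all__ = ("
)

# A function replacement keeps backslashes in `after` from being interpreted.
patched = re.sub(new_before, lambda _m: after, source, count=1)
```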
