Audio: MFCC: Use the MFCC module as compress PCM encoder with discontinuous stream #10814
singalsu wants to merge 4 commits
Note: Running the MFCC compress topologies requires the kernel patches thesofproject/linux#5647 and thesofproject/linux#5789.
Pull request overview
This PR extends the SOF MFCC component and related tooling/topology to support VAD + DTX behavior and to use MFCC as a compress PCM “encoder” that can emit discontinuous (DTX-suppressed) feature frames, including optional IPC4 control notifications for VAD state.
Changes:
- Add MFCC VAD/DTX support in firmware (new VAD implementation, frame header with VAD/energy fields, optional IPC4 notifications, and compress-output mode).
- Add/adjust topology2 definitions to expose MFCC feature capture for both normal PCM and compress PCM on SDW jack/DMIC, including new build targets.
- Update MFCC tuning/export and host-side decode/visualization/transcription tools (Matlab/Octave + Python scripts), plus new documentation.
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/topology/topology2/platform/intel/sdw-jack-audio-feature.conf | Adds MFCC frame sizing define and VAD mixer control naming for jack feature capture. |
| tools/topology/topology2/platform/intel/sdw-jack-audio-feature-compress.conf | New compress PCM MFCC feature-capture topology for jack (MFCC encoder type, blob selection, VAD control). |
| tools/topology/topology2/platform/intel/sdw-dmic-audio-feature.conf | Adds MFCC frame sizing define and VAD mixer control naming for DMIC feature capture. |
| tools/topology/topology2/platform/intel/sdw-dmic-audio-feature-compress.conf | New compress PCM MFCC feature-capture topology for DMIC (MFCC encoder type, blob selection, VAD control). |
| tools/topology/topology2/platform/intel/dmic1-mfcc.conf | Renames MFCC bytes control and adds VAD mixer control naming. |
| tools/topology/topology2/include/pipelines/cavs/host-gateway-src-mfcc-capture.conf | Adds MFCC_FRAME_BYTES-driven ibs/obs to support variable-sized (compress) MFCC frames. |
| tools/topology/topology2/include/components/mfcc/mel80.conf | Updates exported MFCC configuration blob. |
| tools/topology/topology2/include/components/mfcc/mel80_compress.conf | New exported MFCC configuration blob for compress output. |
| tools/topology/topology2/include/components/mfcc/mel80_compress_dtx.conf | New exported MFCC configuration blob for compress output + DTX. |
| tools/topology/topology2/include/components/mfcc/default.conf | Updates exported default MFCC configuration blob. |
| tools/topology/topology2/include/components/mfcc/ceps13_compress_dtx.conf | New exported MFCC configuration blob for cepstral output + compress + DTX. |
| tools/topology/topology2/include/components/mfcc.conf | Adds mixer control template to MFCC widget and allows type override (e.g., encoder). |
| tools/topology/topology2/include/common/common_definitions.conf | Adds default feature flags for SDW jack/DMIC compress MFCC capture. |
| tools/topology/topology2/include/bench/mfcc_controls_playback.conf | Enables an MFCC mixer switch control in bench playback controls. |
| tools/topology/topology2/include/bench/mfcc_controls_capture.conf | Enables an MFCC mixer switch control in bench capture controls. |
| tools/topology/topology2/development/tplg-targets.cmake | Renames MFCC topology targets and adds compress MFCC mel/ceps variants with frame sizing + blob selection. |
| tools/topology/topology2/cavs-sdw.conf | Adds feature-gated includes for new compress MFCC capture topologies. |
| src/include/user/mfcc.h | Extends MFCC config ABI with VAD/DTX/compress flags and timing parameters. |
| src/include/sof/audio/mfcc/mfcc_vad.h | New VAD API/state definitions for MFCC. |
| src/include/sof/audio/mfcc/mfcc_comp.h | Refactors MFCC component interfaces (source/sink API, frame header, VAD/DTX state, IPC4 helpers). |
| src/audio/mfcc/tune/sof_mel_to_text_live_dsp_vad.py | New live Whisper transcription script using DSP VAD embedded in PCM stream. |
| src/audio/mfcc/tune/sof_mel_to_text_live_compress.py | New live Whisper transcription script for compress PCM + DTX/discontinuous frames. |
| src/audio/mfcc/tune/sof_mel_spectrogram_compress.py | New live mel spectrogram viewer for compress PCM MFCC frames. |
| src/audio/mfcc/tune/sof_ceps_spectrogram_compress.py | New live cepstral viewer for compress PCM MFCC frames. |
| src/audio/mfcc/tune/setup_mfcc.m | Updates blob export for new config layout; adds compress + DTX blob exports. |
| src/audio/mfcc/tune/README.txt | Removed in favor of README.md. |
| src/audio/mfcc/tune/README.md | New markdown documentation for tuning, decoding, and live scripts. |
| src/audio/mfcc/tune/decode_mel.m | Updates decoder for new int32 + header format and DTX gap filling. |
| src/audio/mfcc/tune/decode_ceps.m | Updates decoder for new int32 + header format and DTX gap filling. |
| src/audio/mfcc/tune/decode_all.m | Updates batch decode to new decoder signatures and int32 outputs. |
| src/audio/mfcc/mfcc.c | Moves MFCC to source/sink API processing, hooks VAD notifications and compress/DTX behavior. |
| src/audio/mfcc/mfcc_vad.c | New VAD implementation (noise floor tracking + weighted energy + hangover). |
| src/audio/mfcc/mfcc_setup.c | Adds VAD init, DTX/compress state init, buffer free fixes, sample-rate limit check. |
| src/audio/mfcc/mfcc_ipc4.c | New IPC4 control notification plumbing for VAD state reporting. |
| src/audio/mfcc/mfcc_hifi4.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_hifi3.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_generic.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_common.c | Adds source/sink copy funcs, header/VAD handling, legacy vs compress output paths, and DTX suppression logic. |
| src/audio/mfcc/CMakeLists.txt | Registers new mfcc_vad.c and conditionally mfcc_ipc4.c in build. |
| src/audio/base_fw.c | Advertises BESPOKE codec capability for MFCC compress capture. |
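The `mfcc_vad.c` row above names noise floor tracking and hangover as the building blocks of the VAD. As a rough illustration of how those two pieces usually fit together, here is a generic energy-based sketch. It is not the code from this PR: the struct, its fields, and the adaptation rule are all assumptions made for the example, and the energy weighting mentioned in the table is left out.

```c
#include <stdbool.h>

/* Hypothetical VAD state, not the layout used in mfcc_vad.h. */
struct vad_sketch {
	float noise_floor;	/* running estimate of background energy */
	float threshold_ratio;	/* speech if energy > ratio * noise_floor */
	float floor_rise;	/* slow upward adaptation factor, 0..1 */
	int hangover_frames;	/* frames to stay active after energy drops */
	int hangover_left;	/* remaining hangover frames */
};

/* Feed one frame energy, return true while speech is considered active. */
static bool vad_sketch_update(struct vad_sketch *v, float energy)
{
	bool loud = energy > v->threshold_ratio * v->noise_floor;

	if (energy < v->noise_floor)
		v->noise_floor = energy;	/* follow drops immediately */
	else if (!loud)
		v->noise_floor += v->floor_rise * (energy - v->noise_floor);

	if (loud)
		v->hangover_left = v->hangover_frames;	/* re-arm hangover */
	else if (v->hangover_left > 0)
		v->hangover_left--;		/* still inside hangover */
	else
		return false;

	return true;
}
```

The noise floor is frozen while the frame is loud so that speech does not pull the estimate upward, and the hangover keeps the decision active for a few frames after the energy falls so that word endings are not cut off.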
    	cd->source_format = source_format;

    err:
    	comp_set_state(dev, COMP_TRIGGER_RESET);
    	return ret;
    	if (cd->config->compress_output)
    		comp_info(dev, "compress PCM output mode enabled");
Thanks — but the sink format is intentionally unconstrained here. mfcc_output_legacy() computes commit_bytes = sink_get_frame_bytes(sink) * frames, zero-fills that period, then copies the header + int32 payload byte-wise into the period, carrying any leftover via state->out_remain. The commit size always matches what the sink expects, so S16_LE / S24_4LE / S32_LE sinks all produce correctly-sized commits — no truncation or buffer overrun. The bench topologies that wire S16_LE on the MFCC sink rely on this behavior (the host decodes the bytes as an MFCC blob, not as PCM). Rejecting non-S32 sinks would break those bench flows. The only thing being conveyed through the sink is opaque bytes; there is no PCM-format contract on the MFCC output.
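The packing behavior described in this reply can be modeled with a small host-side sketch. It is a simplification and not the firmware function: `legacy_pack` and `struct pack_state` are hypothetical names. Only the steps themselves are taken from the comment: compute the commit size from the sink frame bytes, zero-fill the period, copy bytes without format conversion, and carry the leftover forward.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical carry-over state, standing in for state->out_remain. */
struct pack_state {
	const uint8_t *pending;	/* unsent bytes of the current header + payload */
	size_t out_remain;	/* number of unsent bytes */
};

/*
 * Fill one sink period. The period is always sink_frame_bytes * frames
 * bytes long, whatever sample format the sink declares.
 * Returns the number of bytes committed.
 */
static size_t legacy_pack(struct pack_state *st, uint8_t *period,
			  size_t sink_frame_bytes, size_t frames)
{
	size_t commit_bytes = sink_frame_bytes * frames;
	size_t n = st->out_remain < commit_bytes ? st->out_remain : commit_bytes;

	memset(period, 0, commit_bytes);	/* zero-fill the whole period */
	if (n) {
		memcpy(period, st->pending, n);	/* opaque bytes, no PCM conversion */
		st->pending += n;
		st->out_remain -= n;		/* leftover goes into the next period */
	}

	return commit_bytes;
}
```

With a 344-byte frame and a 192-byte period (for example 48 frames of 4 bytes each), the first call commits 192 bytes and leaves 152 for the next period, which is then padded with 40 zero bytes. The commit size never depends on the payload size, which is the property the reply relies on.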
Switch from process_audio_stream to source/sink API. Add compress PCM output mode (variable-size frames, no zero padding) alongside legacy mode (full period with zero-fill). Unify all output to int32 Q9.23 regardless of source format. Remove out_data_ptr_32, mel_spectra int16 copy, mfcc_func typedef, and per-format output functions from mfcc_common/hifi3/hifi4. Add DTX for compress mode: suppress silence frames after configurable trailing count, with optional periodic keepalive.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
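The DTX rule in this commit message (send a configurable number of trailing silence frames, then suppress, with an optional periodic keepalive) can be sketched as a per-frame decision. The struct, field names, and function below are hypothetical and chosen for illustration; the actual configuration fields in `mfcc.h` may be named and scaled differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical DTX parameters and state; names are not taken from the PR. */
struct dtx_state {
	uint32_t trailing_frames;	/* silence frames still sent after speech ends */
	uint32_t keepalive_interval;	/* send every Nth suppressed frame, 0 = never */
	uint32_t silence_count;		/* consecutive silence frames seen so far */
};

/* Returns true if this frame should be written to the compress stream. */
static bool dtx_should_send(struct dtx_state *st, bool vad_active)
{
	uint32_t suppressed;

	if (vad_active) {
		st->silence_count = 0;	/* speech always goes out */
		return true;
	}

	st->silence_count++;
	if (st->silence_count <= st->trailing_frames)
		return true;		/* trailing silence after speech */

	/* Past the trailing window: drop, except for periodic keepalives. */
	suppressed = st->silence_count - st->trailing_frames;
	return st->keepalive_interval &&
	       suppressed % st->keepalive_interval == 0;
}
```

Because suppressed frames are simply not written, the host sees a discontinuous stream and has to use the frame header to place frames in time, which is why the decode scripts need gap filling.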
Register SND_AUDIOCODEC_BESPOKE capture in codec info TLV when CONFIG_COMP_MFCC is enabled so the kernel detects compress capture support via IPC4_SOF_CODEC_INFO.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Update Octave decode scripts for int32 Q9.23 output and DTX gap filling. Add DTX blob generation to setup_mfcc.m. Add Python compress capture tools: sof_mel_spectrogram_compress.py, sof_ceps_spectrogram_compress.py, sof_mel_to_text_live_compress.py. Refactor sof_mel_to_text_live_dsp_vad.py to use shared compress capture code. Add README with usage examples.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
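For readers unfamiliar with the Q9.23 notation used for the output samples: it is a signed 32-bit fixed-point format with 23 fractional bits, so a raw value is scaled by 2^23. A minimal conversion sketch follows; the helper name is hypothetical, and the decode scripts in this PR do the equivalent in Octave and Python rather than C.

```c
#include <stdint.h>

#define Q9_23_FRAC_BITS 23	/* fractional bits in Q9.23 */

/* Convert one raw Q9.23 sample to floating point (hypothetical helper). */
static inline double q9_23_to_double(int32_t raw)
{
	return (double)raw / (double)(1 << Q9_23_FRAC_BITS);
}
```

The representable range is -256.0 up to just below 256.0, with a resolution of 2^-23.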
Add sdw-jack-audio-feature-compress.conf (PCM 53, pipeline 132) and sdw-dmic-audio-feature-compress.conf (PCM 54, pipeline 133) for compress MFCC capture with DTX blobs. Fix buffer sizes: set MFCC obs and host-copier ibs/obs to 344 bytes (24-byte header + 80 x int32). Add mel and ceps compress topology targets for MTL and ARL. Rename normal MFCC topologies to *-mfcc-mel-normal for clarity.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
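The 344-byte buffer size follows directly from the frame layout given in this commit message. A small sketch of the arithmetic is shown below; the macro and function names are hypothetical, and only the 24-byte header and the int32 coefficient width come from the message.

```c
#include <stddef.h>
#include <stdint.h>

#define MFCC_HEADER_BYTES 24u	/* header size stated in the commit message */

/* Bytes in one output frame carrying num_coeffs int32 coefficients. */
static size_t mfcc_frame_bytes(size_t num_coeffs)
{
	return MFCC_HEADER_BYTES + num_coeffs * sizeof(int32_t);
}
```

For the 80-band mel case this gives 24 + 80 * 4 = 344 bytes. Assuming the cepstral variant uses the same header, a 13-coefficient frame would come out at 76 bytes by the same formula.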
This PR adds commits on top of the previous VAD PR #10782.
Running it also requires a kernel PR that fixes the ALSA controls for the encoder widget type.