Security: Command injection via shell=True in Kaldi command runner#6408
Conversation
The helper executes arbitrary command strings with `subprocess.Popen(..., shell=True)`. If any part of `command` is derived from user-controlled input (directly or indirectly through recipe/config values), an attacker can inject additional shell metacharacters and execute unintended commands. Affected files: data_dir_manipulation_lib.py Signed-off-by: tuanaiseo <221258316+tuanaiseo@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request updates RunKaldiCommand to use shlex.split and sets shell=False in subprocess.Popen to improve security, while also correcting a comparison operator for the return code. However, a significant regression was identified: the function is designed to execute shell pipelines, which will fail with shell=False. It is recommended to either revert to shell=True with proper quoting or refactor the logic to handle pipelines manually.
| if isinstance(command, str): | ||
| command = shlex.split(command) | ||
| p = subprocess.Popen(command, shell = False, |
There was a problem hiding this comment.
This change introduces a functional regression. The docstring for RunKaldiCommand (lines 5-6) explicitly states that it is designed to run command sequences connected by pipes, which requires a shell environment. By applying shlex.split and setting shell=False, shell-specific operators like |, >, and && will be treated as literal arguments rather than shell instructions, causing pipeline commands to fail. To address the security risk while maintaining functionality, consider keeping shell=True but requiring callers to use shlex.quote() on any user-provided variables before constructing the command string, or refactor the function to handle pipelines manually by chaining multiple subprocess.Popen instances.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #6408 +/- ##
=======================================
Coverage 70.32% 70.32%
=======================================
Files 787 787
Lines 73675 73675
=======================================
Hits 51812 51812
Misses 21863 21863
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Problem
The helper executes arbitrary command strings with
subprocess.Popen(..., shell=True). If any part ofcommandis derived from user-controlled input (directly or indirectly through recipe/config values), an attacker can inject additional shell metacharacters and execute unintended commands.Severity:
highFile:
egs2/TEMPLATE/asr1/steps/data/data_dir_manipulation_lib.pySolution
Avoid
shell=Trueand pass commands as an argument list (subprocess.run([...], check=True)). If shell usage is unavoidable for pipelines, strictly validate/allowlist inputs and safely quote each interpolated token withshlex.quotebefore constructing the command.Changes
egs2/TEMPLATE/asr1/steps/data/data_dir_manipulation_lib.py(modified)Testing