Fix GitDagBundle submodule clone not using SSH settings from connection by shunsuke-sugita · Pull Request #64879 · apache/airflow

shunsuke-sugita · 2026-04-08T03:44:45Z

What

GitDagBundle now wraps git submodule sync / submodule update with GIT_SSH_COMMAND from the Git connection (via GitPython custom_environment), using the worktree Repo (self.repo.git), not the bare mirror.

Why

configure_hook_env() populates hook.env, and bare clone passes it with clone_from(..., env=...). Submodule commands are separate Git subprocesses and did not receive GIT_SSH_COMMAND, so SSH URLs in .gitmodules could fail (host key / identity) even when the main repo clone worked.
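
The mechanism can be illustrated without Git at all. The following is a minimal sketch, not the provider's code: it spawns a child Python process as a stand-in for a git subprocess and shows that the child only observes GIT_SSH_COMMAND when the parent passes it in the environment. The helper name `child_sees` and the SSH command string are assumptions made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical value; in the bundle it would come from hook.env.
SSH_CMD = "ssh -i /path/to/key -o StrictHostKeyChecking=no"


def child_sees(env):
    """Run a child process with `env` and return the GIT_SSH_COMMAND it observes."""
    probe = "import os; print(os.environ.get('GIT_SSH_COMMAND', '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from the parent's environment, minus any pre-existing override.
base_env = {k: v for k, v in os.environ.items() if k != "GIT_SSH_COMMAND"}

without_override = child_sees(base_env)
with_override = child_sees({**base_env, "GIT_SSH_COMMAND": SSH_CMD})

print(without_override)
print(with_override)
```

A subprocess started without the variable has no way to learn the SSH settings, which is the situation the submodule commands were in before this change.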

Example failure (before this fix)

Submodule update could fail with errors such as:

2026-04-08T03:39:43.516370889Z Cloning into '{submodule_path}'...
2026-04-08T03:39:43.516372014Z Host key verification failed.
2026-04-08T03:39:43.516373389Z fatal: Could not read from remote repository.
2026-04-08T03:39:43.516374555Z
2026-04-08T03:39:43.516375430Z Please make sure you have the correct access rights
2026-04-08T03:39:43.516376389Z and the repository exists.
2026-04-08T03:39:43.516377430Z fatal: clone of '{submodule}' into submodule path '/tmp/airflow/dag_bundles/{submodule_path}s' failed
2026-04-08T03:39:43.516378680Z Failed to clone '{submodule}' a second time, aborting'
2026-04-08T03:39:43.518691472Z RuntimeError: Error pulling submodule from repository

Notes

Aligns with the existing pattern in _fetch_bare_repo for origin.fetch on the bare repo.
Unit tests still assert that submodule sync/update are invoked when submodules=True; no behavior change to other bundle modes.

Was generative AI tooling used to co-author this PR?

Yes (please specify the tool below)

boring-cyborg · 2026-04-08T03:44:52Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates GitDagBundle submodule operations to inherit SSH configuration from the Git connection by running git submodule sync/update under GitPython’s custom_environment (using the worktree Repo).

Changes:

Wraps submodule sync/update calls with GIT_SSH_COMMAND (when present in hook.env).
Introduces a no-op context (nullcontext) when no SSH env override is needed.

Copilot · 2026-04-10T19:57:25Z

-        self._log.info("Initializing and updating submodules", repo_path=self.repo_path)
-        self.repo.git.submodule("sync", "--recursive")
-        self.repo.git.submodule("update", "--init", "--recursive", "--jobs", "1")
+        cm = nullcontext()


nullcontext must be in scope for this to run. If it isn’t already imported in this module, add from contextlib import nullcontext (or reference it as contextlib.nullcontext) to avoid a NameError at runtime.

nullcontext is already imported at the top of this module: from contextlib import nullcontext (line 21)

Copilot · 2026-04-10T19:57:25Z

+        cm = nullcontext()
+        if self.hook and (cmd := self.hook.env.get("GIT_SSH_COMMAND")):
+            cm = self.repo.git.custom_environment(GIT_SSH_COMMAND=cmd)
+        with cm:


The name cm is very generic for a value that controls an environment override. Consider renaming it to something more descriptive (e.g., env_cm / ssh_env_cm) to make the intent clearer when reading and debugging this method.

Variable name: cm is now ssh_env_cm in _fetch_bare_repo and _fetch_submodules to reflect that this context manager controls the SSH environment for git subprocesses. I left cm as-is for configure_hook_env() blocks since that’s broader hook setup.

fd68b54

Copilot · 2026-04-10T19:57:26Z

+        if self.hook and (cmd := self.hook.env.get("GIT_SSH_COMMAND")):
+            cm = self.repo.git.custom_environment(GIT_SSH_COMMAND=cmd)
+        with cm:
+            self._log.info("Initializing and updating submodules", repo_path=self.repo_path)
+            self.repo.git.submodule("sync", "--recursive")
+            self.repo.git.submodule("update", "--init", "--recursive", "--jobs", "1")


This change introduces new behavior (wrapping submodule sync/update with custom_environment when GIT_SSH_COMMAND is present). Add/extend a unit test to assert custom_environment(GIT_SSH_COMMAND=...) is invoked (and that submodule commands run within that context) when hook.env provides GIT_SSH_COMMAND.

Added tests in providers/git/tests/unit/git/bundles/test_git.py. They assert that custom_environment(GIT_SSH_COMMAND=...) is called when hook.env includes it, that submodule sync/update run inside that context (context manager enter/exit and call order on repo.git), and that custom_environment is not used when GIT_SSH_COMMAND is missing.

7e67b4c

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

shunsuke-sugita · 2026-04-17T01:46:47Z

Hi @kaxil , sorry to bother you while you’re busy — just a gentle reminder on this PR.
I’ve addressed the previous feedback and Copilot checks are passing now.
Could you take another look when you have time? Thanks!

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Copilot · 2026-04-17T01:59:06Z

+        mock_repo = mock.MagicMock()
+        ssh_ctx = mock.MagicMock()
+        mock_repo.git.custom_environment.return_value = ssh_ctx


These new tests create MagicMock() instances without a spec/autospec (for both the repo and the context manager). Using a spec helps catch typos and mismatched GitPython APIs (e.g. git.custom_environment, git.submodule) and prevents silently passing tests when production code changes. Consider switching these mocks to create_autospec(...) or MagicMock(spec=...) targeting the minimal interface you use here.

This has been fixed on my side. See: 6346cb2

Copilot · 2026-04-17T01:59:06Z

+        mock_repo = mock.MagicMock()
+        bundle = GitDagBundle(


This test uses an unspec'd MagicMock() for bundle.repo. Adding a spec/autospec (even if only for the git attribute and its submodule method) will make the test better at detecting API misuse and reduce the chance of false positives.

This has been fixed on my side. See: 6346cb2
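
The difference between an unspec'd and a spec'd mock can be seen in a small sketch. It does not use GitPython: `GitLike` is a hypothetical stand-in class that lists only the two methods these tests rely on, and the misspelled method name is deliberate.

```python
from unittest import mock


class GitLike:
    """Hypothetical stand-in listing only the methods the tests rely on."""

    def custom_environment(self, **env): ...

    def submodule(self, *args): ...


# An unspec'd MagicMock accepts any attribute, so a typo goes unnoticed.
loose = mock.MagicMock()
loose.submodul("sync", "--recursive")  # misspelled on purpose, no error raised

# An autospec'd mock only exposes attributes that exist on the spec.
strict = mock.create_autospec(GitLike, instance=True)
try:
    strict.submodul("sync", "--recursive")
    typo_caught = False
except AttributeError:
    typo_caught = True

# The correctly spelled method still works and records its call.
strict.submodule("sync", "--recursive")
strict.submodule.assert_called_once_with("sync", "--recursive")
print(typo_caught)
```

With the spec in place, a renamed or misspelled method fails the test immediately instead of being absorbed by the mock.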

kaxil · 2026-04-17T02:12:10Z

+            [mock.call("sync", "--recursive"), mock.call("update", "--init", "--recursive", "--jobs", "1")]
+        )
+        assert mock_repo.git.mock_calls[0] == mock.call.custom_environment(GIT_SSH_COMMAND=expected_ssh_cmd)
+        assert mock_repo.git.mock_calls[1] == mock.call.submodule("sync", "--recursive")


This assertion will fail. With a with block, mock_repo.git.mock_calls is [call.custom_environment(...), call.custom_environment().__enter__(), call.submodule("sync", "--recursive"), call.submodule("update", ...), call.custom_environment().__exit__(None, None, None)], so index [1] is __enter__(), not the submodule call. I verified this empirically. Either drop these two index assertions (the assert_has_calls above already enforces submodule ordering) or use index [2].

Implemented the suggestions from #64879 (comment)

See: 2dcfd1d
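
The call ordering described in the comment above can be reproduced with a bare MagicMock standing in for repo.git. This is a small sketch rather than the test from the PR; the SSH command string is a made-up value.

```python
from unittest import mock

# Stand-in for repo.git; no GitPython involved.
git = mock.MagicMock()

with git.custom_environment(GIT_SSH_COMMAND="ssh -i /path/to/key"):
    git.submodule("sync", "--recursive")
    git.submodule("update", "--init", "--recursive", "--jobs", "1")

# Each recorded call is a (name, args, kwargs) tuple; collect the names.
call_names = [recorded[0] for recorded in git.mock_calls]
for position, name in enumerate(call_names):
    print(position, name)
```

Because entering and leaving the with block are recorded alongside the method calls, positional assertions on mock_calls have to account for the `__enter__` and `__exit__` entries.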

kaxil · 2026-04-17T02:12:11Z

+        mock_repo.git.submodule.assert_has_calls(
+            [mock.call("sync", "--recursive"), mock.call("update", "--init", "--recursive", "--jobs", "1")]
+        )
+        assert mock_repo.git.mock_calls[0] == mock.call.custom_environment(GIT_SSH_COMMAND=expected_ssh_cmd)


This pair of mock_calls[0]/mock_calls[1] checks is redundant with assert_has_calls on the line above, which already enforces the order of submodule calls. Dropping both also fixes the broken [1] index. A cleaner check for "custom_environment is entered before submodule runs" is ssh_ctx.__enter__.assert_called_once() plus assert_has_calls, which you already have.

This has been fixed on my side. See: 2dcfd1d

kaxil · 2026-04-17T02:12:11Z

-        self.repo.git.submodule("update", "--init", "--recursive", "--jobs", "1")
+        ssh_env_cm = nullcontext()
+        if self.hook and (cmd := self.hook.env.get("GIT_SSH_COMMAND")):
+            ssh_env_cm = self.repo.git.custom_environment(GIT_SSH_COMMAND=cmd)


Only GIT_SSH_COMMAND is forwarded via custom_environment, but configure_hook_env also populates hook.env with SSH_ASKPASS/DISPLAY/SSH_ASKPASS_REQUIRE for passphrase-protected keys. These work today because _fetch_submodules is always called inside an active configure_hook_env() in _initialize/refresh, which sets those on os.environ. If a future refactor drops that outer context, passphrase-protected keys will silently break while the non-passphrase path keeps working. Would you consider forwarding the full dict (custom_environment(**self.hook.env)) or at minimum a comment documenting that this depends on the outer configure_hook_env context?

This has been fixed on my side. See: fe31907
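
One way to picture the "forward the full dict" option is the sketch below. It is not the merged implementation: `submodule_env_context` and `fetch_submodules` are hypothetical helper names, the environment values are placeholders, and a MagicMock stands in for repo.git so the example runs without GitPython.

```python
from contextlib import nullcontext
from unittest import mock


def submodule_env_context(git, hook_env):
    """Hypothetical helper: apply every key in hook_env, or do nothing if it is empty."""
    if hook_env:
        return git.custom_environment(**hook_env)
    return nullcontext()


def fetch_submodules(git, hook_env):
    """Run the two submodule commands under the environment override, if any."""
    with submodule_env_context(git, hook_env):
        git.submodule("sync", "--recursive")
        git.submodule("update", "--init", "--recursive", "--jobs", "1")


# Placeholder values for a passphrase-protected key setup.
env = {
    "GIT_SSH_COMMAND": "ssh -i /path/to/key",
    "SSH_ASKPASS": "/path/to/askpass",
    "SSH_ASKPASS_REQUIRE": "force",
    "DISPLAY": ":",
}

git = mock.MagicMock()
fetch_submodules(git, env)
git.custom_environment.assert_called_once_with(**env)

# With an empty env, no override is applied but the commands still run.
plain_git = mock.MagicMock()
fetch_submodules(plain_git, {})
plain_git.custom_environment.assert_not_called()
```

Forwarding the whole dict keeps the submodule step independent of whatever the outer configure_hook_env() context happens to place on os.environ.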

shunsuke-sugita · 2026-04-20T01:41:19Z

@kaxil Thanks for the review. I’ve pushed updates addressing your feedback—please take another look when you have a moment.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Copilot · 2026-04-20T01:45:14Z

+    def test_fetch_submodules_skips_custom_environment_without_git_ssh_command(self, mock_githook_class):
+        """When hook.env is empty, submodule update does not use custom_environment."""
+        mock_hook = mock_githook_class.return_value
+        mock_hook.repo_url = "git@github.com:apache/airflow.git"
+        mock_hook.env = {}


The test name test_fetch_submodules_skips_custom_environment_without_git_ssh_command doesn’t match what the test actually verifies: it sets hook.env = {} and asserts custom_environment is not called. In production code, custom_environment is used whenever hook.env is a non-empty dict (even if it didn’t contain GIT_SSH_COMMAND). Rename this test to reflect the empty-env behavior, or change the fixture to use a non-empty env without GIT_SSH_COMMAND and assert the intended behavior.

This has been fixed on my side. See: 6d27243

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot · 2026-04-20T01:57:12Z

+            mock.call.submodule("sync", "--recursive"),
+            mock.call.submodule("update", "--init", "--recursive", "--jobs", "1"),
+            mock.call.exit(None, None, None),
+        ]


Add a blank line between these two test methods to match the surrounding formatting (one blank line between methods inside the test class). As-is, the next @mock.patch decorator starts immediately after the previous assertion block.

Suggested change: keep the closing ] and add one blank line after it.

This has been fixed on my side. See: 76fd8ca

Copilot · 2026-04-20T01:57:12Z

+        bundle._fetch_submodules()
+
+        mock_git.custom_environment.assert_called_once_with(GIT_SSH_COMMAND=expected_ssh_cmd)
+
+        parent = mock.Mock()
+        parent.attach_mock(mock_git.custom_environment, "custom_environment")
+        parent.attach_mock(ssh_ctx.__enter__, "enter")
+        parent.attach_mock(mock_git.submodule, "submodule")
+        parent.attach_mock(ssh_ctx.__exit__, "exit")
+
+        assert parent.mock_calls == [
+            mock.call.custom_environment(GIT_SSH_COMMAND=expected_ssh_cmd),
+            mock.call.enter(),
+            mock.call.submodule("sync", "--recursive"),
+            mock.call.submodule("update", "--init", "--recursive", "--jobs", "1"),
+            mock.call.exit(None, None, None),
+        ]


parent.attach_mock(...) is done after bundle._fetch_submodules() has already run, so parent.mock_calls will not include the earlier calls (calls are only propagated to the parent at call time, when the child is already attached). This makes the call-order assertion unreliable/likely to fail. Attach the mocks to parent before invoking _fetch_submodules() (or assert ordering using the individual mocks’ mock_calls/method_calls instead).

This has been fixed on my side. See: 51f14ca
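
The timing issue with attach_mock can be shown in isolation. This is a simplified sketch with standalone Mock objects rather than the mocks from the PR's test; the `run_submodule_commands` helper is hypothetical.

```python
from unittest import mock


def run_submodule_commands(submodule):
    """Hypothetical stand-in for the code under test."""
    submodule("sync", "--recursive")
    submodule("update", "--init", "--recursive", "--jobs", "1")


# Attaching after the calls: the parent never sees them.
late_parent = mock.Mock()
late_child = mock.Mock()
run_submodule_commands(late_child)
late_parent.attach_mock(late_child, "submodule")

# Attaching before the calls: the parent records them in order.
early_parent = mock.Mock()
early_child = mock.Mock()
early_parent.attach_mock(early_child, "submodule")
run_submodule_commands(early_child)

print(late_parent.mock_calls)
print(early_parent.mock_calls)
```

Calls are propagated to a parent only while the child is attached, so an ordering assertion on the parent needs the attachment to happen before the code under test runs.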

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Fix Git bundle submodule fetch not using SSH settings from connection

4987c26

boring-cyborg Bot added area:providers provider:git labels Apr 8, 2026

Merge branch 'main' into provider-git-fix-fetch-submodul

46f66a4

shunsuke-sugita marked this pull request as draft April 9, 2026 06:23

shunsuke-sugita marked this pull request as ready for review April 9, 2026 06:24

kaxil requested a review from Copilot April 10, 2026 19:55

Copilot AI reviewed Apr 10, 2026

shunsuke-sugita and others added 3 commits April 12, 2026 19:09

Rename cm to ssh_env_cm in _fetch_bare_repo and _fetch_submodules.

fd68b54

Test GitDagBundle._fetch_submodules applies GIT_SSH_COMMAND via custo…

7e67b4c

…m_environment

Merge branch 'main' into provider-git-fix-fetch-submodul

ffb5e5a

shunsuke-sugita requested a review from Copilot April 12, 2026 13:31

Copilot started reviewing on behalf of shunsuke-sugita April 12, 2026 13:32

Copilot AI reviewed Apr 12, 2026

Merge branch 'main' into provider-git-fix-fetch-submodul

799e084

Merge branch 'main' into provider-git-fix-fetch-submodul

21098f9

kaxil requested review from Copilot, ephraimbuddy and jedcunningham April 17, 2026 01:55

Copilot started reviewing on behalf of kaxil April 17, 2026 01:56

Copilot AI reviewed Apr 17, 2026

kaxil reviewed Apr 17, 2026

shunsuke-sugita added 3 commits April 17, 2026 11:12

Use autospec for Git/Repo mocks in submodule fetch unit tests

6346cb2

Drop redundant mock_calls assertions in submodule fetch test

2dcfd1d

GitDagBundle: submodule custom_environment uses full hook.env

fe31907

Merge branch 'main' into provider-git-fix-fetch-submodul

62d9068

shunsuke-sugita requested a review from kaxil April 20, 2026 01:42

shunsuke-sugita requested a review from Copilot April 20, 2026 01:42

Copilot started reviewing on behalf of shunsuke-sugita April 20, 2026 01:42

Copilot AI reviewed Apr 20, 2026

shunsuke-sugita and others added 2 commits April 20, 2026 10:52

Rename submodule test for empty hook.env (custom_environment skip)

6d27243

Update providers/git/tests/unit/git/bundles/test_git.py

4f0d7fc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

shunsuke-sugita requested a review from Copilot April 20, 2026 01:53

Copilot started reviewing on behalf of shunsuke-sugita April 20, 2026 01:53

Copilot AI reviewed Apr 20, 2026

shunsuke-sugita added 3 commits April 20, 2026 11:07

rm airflow

9b6ca63

Add missing blank line between submodule fetch tests

76fd8ca

Fix submodule test call-order assertion (use mock_git.mock_calls, not…

51f14ca

… post-hoc attach_mock)

shunsuke-sugita requested a review from Copilot April 20, 2026 02:12

Copilot started reviewing on behalf of shunsuke-sugita April 20, 2026 02:13

Copilot AI reviewed Apr 20, 2026
