feat(dataproc): Add dataproc source and list/get clusters/jobs tools#2407
duwenxin99 merged 10 commits into googleapis:main
Summary of Changes
Hello @dborowitz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly expands the system's capabilities by integrating with Google Cloud Dataproc. It provides a new data source and a suite of tools that enable users to programmatically interact with Dataproc clusters and jobs, offering functionalities such as retrieving specific resource details and listing resources with filtering options. This enhancement mirrors existing functionalities for Serverless Spark, providing a consistent experience for managing big data processing environments.
|
Code Review
This pull request introduces a new Dataproc source and associated tools for interacting with Dataproc clusters and jobs. While the implementation is generally solid, several critical security and stability issues have been identified. Specifically, the URL generation for Cloud Logging is vulnerable to filter injection if resource names contain double quotes, and the Invoke methods in the list tools use unsafe type assertions on user-supplied parameters, which can lead to application panics and Denial of Service (DoS). Additionally, there are areas for improvement regarding resource handling, data correctness in one of the tools, and inconsistencies in documentation and tests. Addressing these points will significantly enhance the robustness, clarity, and security of the new features.
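The two critical findings above can be illustrated with a minimal Go sketch. It is not the code from this PR: `stringParam`, `escapeFilterValue`, and `clusterLogFilter` are hypothetical helper names, the parameter map shape is assumed, and the backslash escaping of quotes is an assumption about how string literals in a Cloud Logging filter should be protected. The idea is to use the comma-ok form of a type assertion so a wrongly typed parameter returns an error instead of panicking, and to escape double quotes before embedding a resource name in a filter.

```go
package main

import (
	"fmt"
	"strings"
)

// stringParam reads an optional string parameter from user-supplied input.
// The comma-ok assertion returns an error for a wrong type instead of panicking.
// (Hypothetical helper; the parameter map shape is an assumption.)
func stringParam(params map[string]any, name string) (string, error) {
	raw, ok := params[name]
	if !ok || raw == nil {
		return "", nil // parameter not provided
	}
	s, ok := raw.(string)
	if !ok {
		return "", fmt.Errorf("parameter %q must be a string, got %T", name, raw)
	}
	return s, nil
}

// escapeFilterValue escapes backslashes and double quotes so the value cannot
// terminate a double-quoted string literal early. (Assumed escaping scheme.)
func escapeFilterValue(v string) string {
	return strings.NewReplacer(`\`, `\\`, `"`, `\"`).Replace(v)
}

// clusterLogFilter builds an illustrative filter clause for a cluster name.
func clusterLogFilter(clusterName string) string {
	return fmt.Sprintf(`resource.labels.cluster_name="%s"`, escapeFilterValue(clusterName))
}

func main() {
	params := map[string]any{
		"clusterName": `demo" OR severity>="DEFAULT`, // hostile input
		"pageSize":    10,                            // not a string
	}

	name, err := stringParam(params, "clusterName")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(clusterLogFilter(name))

	// A wrongly typed value yields an error rather than a panic.
	if _, err := stringParam(params, "pageSize"); err != nil {
		fmt.Println("error:", err)
	}
}
```

With this shape, the hostile cluster name stays inside one quoted literal, and the handler can return the type error to the caller as an ordinary tool error.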
|
Integration tests are failing because the Cloud Build SA is missing the Dataproc Viewer/Editor IAM role. |
182297a to d0c3c41
|
I made a mistake by basing my original PR on a several-week-old local repo without syncing first. I cribbed from the Serverless implementation but missed this important refactoring; I guess I need to go rewrite this now... |
Done, as a separate commit. I would have tried harder to change the old commits in order if the tests weren't already passing, but they were; this is a pure refactoring. |
e57f17e to 12761ae
|
Hi @dborowitz, other than some refactoring and nits, the PR LGTM. Let me know when the updates are made and I'll approve it. Thanks! |
Very closely analogous to the serverless-spark source and serverless-spark-list-batches tool; these are separate APIs for two closely related GCP products. There are of course minor differences in the APIs; for example, Dataproc generally supports only regions, not general locations. One wrinkle is that a known issue (KI) with the list jobs RPC makes it very slow in a project with many serverless batches (like the test project) unless filtering by cluster. This is mentioned in the param description so LLMs can provide it; in the tests, we always add it based on an env var. Unlike other env vars, the cluster name in the test project is arbitrary but not a secret.
Accounts for the new ToolboxError type in googleapis#2403
…s tools (#2407)

## Description

Add a new source for Dataproc, which is closely related to Serverless Spark. Similar to get/list batches, we have get/list clusters and jobs, with minor API differences.

## PR Checklist

> Thank you for opening a Pull Request! Before submitting your PR, there are a
> few things you can do to make sure it goes smoothly:

- [ ] Make sure you reviewed [CONTRIBUTING.md](https://github.com/googleapis/genai-toolbox/blob/main/CONTRIBUTING.md)
- [ ] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/genai-toolbox/issues/new/choose) before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- [ ] Ensure the tests and linter pass
- [ ] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)
- [ ] Make sure to add `!` if this involve a breaking change

🛠️ Part of #2405

cc05e57
🤖 I have created a release *beep* *boop*

---

## [0.28.0](v0.27.0...v0.28.0) (2026-03-02)

### Features

* Add polling system to dynamic reloading ([#2466](#2466)) ([fcaac9b](fcaac9b))
* Added basic template for sdks doc migrate ([#1961](#1961)) ([87f2eaf](87f2eaf))
* **dataproc:** Add dataproc source and list/get clusters/jobs tools ([#2407](#2407)) ([cc05e57](cc05e57))
* **sources/postgres:** Add configurable pgx query execution mode ([#2477](#2477)) ([57b77bc](57b77bc))
* **sources/redis:** Add TLS support for Redis connections ([#2432](#2432)) ([d6af290](d6af290))
* **tools/looker:** Enable Get All Lookml Tests, Run LookML Tests, and Create View From Table tools for Looker ([#2522](#2522)) ([e01139a](e01139a))
* **tools/looker:** Tools to list/create/delete directories ([#2488](#2488)) ([0036d8c](0036d8c))
* **ui:** Make tool list panel resizable ([#2253](#2253)) ([276cf60](276cf60))

### Bug Fixes

* **ci:** Add path for forked PR unit test runs ([#2540](#2540)) ([04dd2a7](04dd2a7))
* Deflake alloydb omni ([#2431](#2431)) ([62b8309](62b8309))
* **docs/adk:** Resolve dependency duplication ([#2418](#2418)) ([4d44abb](4d44abb))
* **docs/langchain:** Fix core at 0.3.0 and align compatible dependencies ([#2426](#2426)) ([36edfd3](36edfd3))
* Enforce required validation for explicit null parameter values ([#2519](#2519)) ([d5e9512](d5e9512))
* **oracle:** Enable DML operations and resolve incorrect array type error ([#2323](#2323)) ([72146a4](72146a4))
* **server/mcp:** Guard nil dereference in sseManager.get ([#2557](#2557)) ([e534196](e534196)), closes [#2548](#2548)
* **tests/postgres:** Implement uuid-based isolation and reliable resource cleanup ([#2377](#2377)) ([8a96fb1](8a96fb1))
* **tests/postgres:** Restore list_schemas test and implement dynamic owner ([#2521](#2521)) ([7041e79](7041e79))
* **tests:** Resolve LlamaIndex dependency conflict in JS quickstart ([#2597](#2597)) ([ac11f5a](ac11f5a))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Co-authored-by: Wenxin Du <117315983+duwenxin99@users.noreply.github.com>

81253a0
Description
Add a new source for Dataproc, which is closely related to Serverless Spark. Similar to get/list batches, we have get/list clusters and jobs, with minor API differences.
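PR Checklist
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [ ] Make sure you reviewed CONTRIBUTING.md
- [ ] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- [ ] Ensure the tests and linter pass
- [ ] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)
- [ ] Make sure to add ! if this involve a breaking change
🛠️ Part of #2405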