feat: Offline validate-config via local plugin spec schemas #22819

Closed
marianogappa wants to merge 9 commits into main from feat/cli-offline-validate-config-schemas-dir

Conversation

@marianogappa (Contributor) commented May 12, 2026

Regarding https://app.usepylon.com/support/issues/views/all-issues?issueNumber=1315 from MongoDB

Summary

Adds two narrow CLI changes that together enable cloudquery validate-config to run fully offline in CI environments without managing plugin binaries.

New command: cloudquery plugin spec-schema

Exports a plugin's JSON spec schema to a local file. Run once on a machine with network/registry access; commit the output to source control.

# Stdout
cloudquery plugin spec-schema cloudquery/source/aws@v33.0.0

# Specific file
cloudquery plugin spec-schema cloudquery/source/aws@v33.0.0 -o aws.json

# Directory layout consumed by --schemas-dir
cloudquery plugin spec-schema cloudquery/source/aws@v33.0.0 -D ./schemas
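
The examples above all pass a reference of the shape `<team>/<kind>/<name>@<version>`. As a rough illustration of how such a reference could be split into its parts, here is a minimal sketch. The `pluginRef` type and `parseHubRef` function are hypothetical names invented for this example and are not taken from the CLI's code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pluginRef is a hypothetical holder for the parts of a reference such as
// "cloudquery/source/aws@v33.0.0". It is not the CLI's real type.
type pluginRef struct {
	Team, Kind, Name, Version string
}

// parseHubRef splits "<team>/<kind>/<name>@<version>" into its parts.
// The accepted shape is an assumption based on the examples in this PR.
func parseHubRef(ref string) (pluginRef, error) {
	path, version, ok := strings.Cut(ref, "@")
	if !ok || version == "" {
		return pluginRef{}, errors.New("reference must end in @<version>")
	}
	parts := strings.Split(path, "/")
	if len(parts) != 3 {
		return pluginRef{}, fmt.Errorf("expected <team>/<kind>/<name>, got %q", path)
	}
	for _, p := range parts {
		if p == "" {
			return pluginRef{}, fmt.Errorf("empty path segment in %q", path)
		}
	}
	return pluginRef{Team: parts[0], Kind: parts[1], Name: parts[2], Version: version}, nil
}

func main() {
	ref, err := parseHubRef("cloudquery/source/aws@v33.0.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("team=%s kind=%s name=%s version=%s\n", ref.Team, ref.Kind, ref.Name, ref.Version)
}
```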

New flag: validate-config --schemas-dir <dir>

For each source / destination, if <dir>/<spec-name>.json exists, validate against that JSON Schema directly. Plugin spawn and auth are skipped for any entry resolved from a file; if every entry resolves from a file, the entire managedplugin.NewClients and auth.GetAuthTokenIfNeeded path is bypassed. Entries without a matching file fall back to the existing flow.

cloudquery validate-config --schemas-dir ./schemas ./aws.yml

Why

Customers running validate-config in air-gapped CI cannot reach the CloudQuery registry to authenticate or download plugin binaries. The validation pipeline already operates on a JSON Schema string fully decoupled from plugin runtime — only the acquisition of that schema required a live plugin. This PR closes that gap.

Implementation notes

  • validatePluginSpec is split into getSpecSchemaFromPlugin + validateSpecAgainstSchema. Both new entry points (file-mode and the export command) reuse parseJSONSchema / validateSpecAgainstSchema unchanged.
  • Lookup rule: <schemas-dir>/<spec-name>.json, where the name is the spec's name:, not the plugin path.
  • --schemas-dir is opt-in; the no-flag path is byte-for-byte identical.
  • No changes to internal/auth, internal/specs, managedplugin consumers elsewhere, or the plugin SDK.

Test plan

  • Unit: TestPluginTypeFromKind, TestWriteSchemaOutput, TestLookupSchemaFile
  • Integration (offline): TestValidateConfigSchemasDir — exercises both good-spec and bad-spec paths against fixture JSON Schemas, with sources / destinations pointing at non-existent local binaries. Verifies plugin spawn is bypassed (no Initializing source/destination log lines).
  • Manual end-to-end against a real aws → clickhouse config from ~/Code/test-syncs. See transcript below.

Manual validation transcript

Real config used (excerpt — both source and destination use registry: cloudquery):

kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v32.6.0"
  tables: ["*"]
  destinations: ["clickhouse"]
  spec:
    use_paid_apis: true
---
kind: destination
spec:
  name: "clickhouse"
  path: "cloudquery/clickhouse"
  registry: "cloudquery"
  version: "v5.0.8"

1. Export schemas (one-time, machine with network access)
$ cloudquery plugin spec-schema cloudquery/source/aws@v32.6.0 -D ./schemas --log-console
INF CloudQuery CLI version=development
INF Plugin server listening address=...cq-...sock module=aws-source
INF started call grpc.method=GetSpecSchema grpc.service=cloudquery.plugin.v3.Plugin module=aws-source
INF finished call grpc.code=OK grpc.method=GetSpecSchema module=aws-source

$ cloudquery plugin spec-schema cloudquery/destination/clickhouse@v5.0.8 -D ./schemas --log-console
INF CloudQuery CLI version=development
INF Plugin server listening address=...cq-...sock module=clickhouse-destination
INF started call grpc.method=GetSpecSchema grpc.service=cloudquery.plugin.v3.Plugin module=clickhouse-destination
INF finished call grpc.code=OK grpc.method=GetSpecSchema module=clickhouse-destination

$ ls -la schemas/
aws.json          311 KB
clickhouse.json   3.7 KB

2. Validate offline with --schemas-dir (good config)

A fresh empty --cq-dir proves nothing was downloaded.

$ rm -rf .cq-offline
$ cloudquery validate-config aws_to_clickhouse.yaml --schemas-dir ./schemas --cq-dir ./.cq-offline --log-console
INF CloudQuery CLI version=development
INF Loading spec(s) args=["aws_to_clickhouse.yaml"]
INF Validating source against local schema schema=schemas/aws.json source="aws (cloudquery/aws@v32.6.0)"
INF validated successfully source="aws (cloudquery/aws@v32.6.0)"
INF Validating destination against local schema destination="clickhouse (cloudquery/clickhouse@v5.0.8)" schema=schemas/clickhouse.json
INF validated successfully destination="clickhouse (cloudquery/clickhouse@v5.0.8)"

$ ls .cq-offline 2>&1
ls: .cq-offline: No such file or directory   # no plugin spawn, no cache populated

3. Validate offline with --schemas-dir (bad config — type violation)

Same config but with use_paid_apis: "not-a-bool":

$ cloudquery validate-config aws_to_clickhouse_bad.yaml --schemas-dir ./schemas --cq-dir ./.cq-offline --log-console
INF CloudQuery CLI version=development
INF Loading spec(s) args=["aws_to_clickhouse_bad.yaml"]
INF Validating source against local schema schema=./schemas/aws.json source="aws (cloudquery/aws@v32.6.0)"
INF Validating destination against local schema destination="clickhouse (cloudquery/clickhouse@v5.0.8)" schema=./schemas/clickhouse.json
INF validated successfully destination="clickhouse (cloudquery/clickhouse@v5.0.8)"
Error: failed to validate source config aws (cloudquery/aws@v32.6.0): jsonschema validation failed
- at '/use_paid_apis': got string, want boolean

Note: log lines Initializing source / Initializing destination (emitted by the plugin-spawn code path) are absent in both offline runs, confirming the plugin SDK gRPC path is fully bypassed when a matching schema file exists.

🤖 Generated with Claude Code

Add `cloudquery plugin spec-schema <ref>` to export a plugin's JSON
spec schema, and a `--schemas-dir` flag to `validate-config` that uses
those local files instead of spawning the plugin. This lets users
validate configurations in air-gapped CI environments without
downloading plugin binaries or authenticating against the registry.

Validation logic (parseJSONSchema + Validate) is reused unchanged; the
existing in-line plugin flow is preserved for entries without a local
schema file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@marianogappa marianogappa requested a review from a team as a code owner May 12, 2026 08:24
@marianogappa marianogappa requested a review from stoovon May 12, 2026 08:24
@marianogappa marianogappa changed the title feat(cli): offline validate-config via local plugin spec schemas feat(main): Offline validate-config via local plugin spec schemas May 12, 2026
@marianogappa marianogappa changed the title feat(main): Offline validate-config via local plugin spec schemas feat: Offline validate-config via local plugin spec schemas May 12, 2026
- Apply gofmt
- Replace fmt.Errorf with errors.New for static error strings (revive)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@marianogappa marianogappa requested a review from erezrokah May 12, 2026 08:36
@disq (Member) commented May 12, 2026

needs doc generation I think

…emas-dir

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread cli/docs/reference/cloudquery_validate-config.md Outdated
cobra/doc treats the first backtick-quoted token in a flag's usage
string as the value-type label, which caused the rendered reference
to show "--schemas-dir cloudquery plugin spec-schema" instead of
"--schemas-dir string". Switch to single quotes around the command
name to preserve the intended rendering.
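
To make the rendering issue concrete, here is a small standard-library imitation of the behaviour described above: the first back-quoted token in a usage string is taken as the value label, and a fallback label is used when there is none. `valueLabel` and both usage strings are made up for this illustration; the real logic lives in the flag library and may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// valueLabel imitates, in simplified form, how a back-quoted token in a flag's
// usage string is picked up as the value label. It is an illustration only.
func valueLabel(usage, fallback string) (label, cleaned string) {
	start := strings.IndexByte(usage, '`')
	if start < 0 {
		return fallback, usage
	}
	end := strings.IndexByte(usage[start+1:], '`')
	if end < 0 {
		return fallback, usage
	}
	end += start + 1 // convert to an index into usage
	label = usage[start+1 : end]
	// The backticks are removed from the usage text that gets displayed.
	return label, usage[:start] + label + usage[end+1:]
}

func main() {
	// Both usage strings are hypothetical.
	withBackticks := "directory of schemas exported by `cloudquery plugin spec-schema`"
	withQuotes := "directory of schemas exported by 'cloudquery plugin spec-schema'"
	for _, usage := range []string{withBackticks, withQuotes} {
		label, cleaned := valueLabel(usage, "string")
		fmt.Printf("--schemas-dir %s   %s\n", label, cleaned)
	}
}
```

With the back-quoted variant the label becomes the command name, while the single-quoted variant falls back to `string`.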

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread cli/cmd/specs.go
…tSchema

The comment was inadvertently dropped during the helper extraction; it
remains relevant because gRPC Unimplemented results in a nil schema
proto, which the empty-string check still covers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Write to <dir>/<plugin-name>@<version>.json instead of
<plugin-name>.json so validation always pins to the schema matching
the plugin version in the config. validate-config --schemas-dir
now prefers the versioned filename, falling back to the unversioned
name when the spec has no version (e.g. registry: local) or only the
unversioned file is present.

Drop the --output flag — canonical naming under --schemas-dir is the
intended workflow; stdout is preserved when no flag is given for
ad-hoc inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

This PR extends the CloudQuery CLI to support fully offline validate-config runs by validating plugin specs against locally stored JSON Schemas, and adds a companion command to export those schemas from plugins ahead of time.

Changes:

  • Added cloudquery plugin spec-schema to export a plugin’s spec JSON Schema to stdout or to files (including a --schemas-dir layout).
  • Added validate-config --schemas-dir <dir> to validate sources/destinations offline when matching <plugin-name>.json schema files exist, bypassing auth and plugin spawn for those entries.
  • Refactored spec validation helpers to split schema acquisition (getSpecSchemaFromPlugin) from validation (validateSpecAgainstSchema), enabling reuse.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| cli/docs/reference/cloudquery_validate-config.md | Documents --schemas-dir and offline validation example. |
| cli/docs/reference/cloudquery_plugin.md | Adds the new plugin spec-schema command to reference docs. |
| cli/docs/reference/cloudquery_plugin_spec-schema.md | New reference page for exporting spec schemas. |
| cli/cmd/validate_config.go | Implements schema-dir lookup, offline validation path, and conditional plugin spawning/auth. |
| cli/cmd/validate_config_test.go | Adds integration coverage for offline validation + schema lookup unit test. |
| cli/cmd/testdata/validate-config-schemas-dir.yml | Fixture config for offline validation success case. |
| cli/cmd/testdata/validate-config-schemas-dir-bad.yml | Fixture config for offline validation failure case. |
| cli/cmd/testdata/schemas-dir/src.json | Fixture JSON Schema for source spec. |
| cli/cmd/testdata/schemas-dir/dst.json | Fixture JSON Schema for destination spec. |
| cli/cmd/specs.go | Refactors plugin schema retrieval into getSpecSchemaFromPlugin and reuses validator. |
| cli/cmd/root.go | Wires the new plugin spec-schema command into the root plugin command. |
| cli/cmd/plugin_spec_schema.go | Implements schema export command and output routing (stdout/file/dir). |
| cli/cmd/plugin_spec_schema_test.go | Adds unit tests for kind→plugin-type mapping and output writing. |

Comments suppressed due to low confidence (1)

cli/cmd/plugin_spec_schema.go:159

  • writeSchemaOutput’s default case returns the error from fmt.Print. When --log-console is enabled, initLogging closes stdout, so this command can fail with a low-level “bad file descriptor” error when no -o/-D is provided. Consider either (a) emitting schema to stderr when stdout is unavailable, (b) detecting this case and returning a clearer error instructing users to use --output/--schemas-dir or disable --log-console, or (c) aligning with other commands by not failing on stdout write errors.
	}
	return os.WriteFile(filepath.Join(schemasDir, schemaFileName(pluginName, pluginVersion)), []byte(jsonSchema), 0o644)
}

// schemaFileName returns the canonical filename for a plugin's schema under --schemas-dir.
// Version is included whenever non-empty so consumers can pin validation to the right plugin version.
func schemaFileName(pluginName, pluginVersion string) string {
	if pluginVersion == "" {
		return pluginName + ".json"
	}
	return pluginName + "@" + pluginVersion + ".json"
}


Comment on lines +169 to +178
	log.Info().Str("source", source.VersionString()).Str("schema", sourceSchemaFiles[i]).Msg("Validating source against local schema")
	schemaBytes, err := os.ReadFile(sourceSchemaFiles[i])
	if err != nil {
		initErrors = append(initErrors, fmt.Errorf("failed to read schema file for source %v: %w", source.VersionString(), err))
		continue
	}
	if err := validateSpecAgainstSchema(string(schemaBytes), source.Spec); err != nil {
		initErrors = append(initErrors, fmt.Errorf("failed to validate source config %v: %w", source.VersionString(), err))
	} else {
		log.Info().Str("source", source.VersionString()).Msg("validated successfully")
Comment thread cli/cmd/validate_config.go Outdated
Comment on lines +232 to +241
	}
	if version != "" {
		p := filepath.Join(dir, name+"@"+version+".json")
		if _, err := os.Stat(p); err == nil {
			return p
		}
	}
	p := filepath.Join(dir, name+".json")
	if _, err := os.Stat(p); err == nil {
		return p
marianogappa and others added 2 commits May 12, 2026 12:52
… failures

- Add validateSpecAgainstSchemaStrict for the --schemas-dir path: an
  unparseable or empty schema file now fails validation instead of
  being logged and treated as a pass. The lenient validateSpecAgainstSchema
  is retained for plugin-gRPC schemas where buggy plugins should not
  block sync.
- lookupSchemaFile now returns (string, error). Only os.ErrNotExist is
  swallowed; permission errors and other unexpected stat failures are
  surfaced so they cannot be masked by a fallback to the online path.
- Reject spec names containing path separators or equal to "..", to
  prevent a crafted config from escaping --schemas-dir via filepath.Join.

Addresses Copilot review comments on validate_config.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…only

Explicitly state in the command's long help that registry: local,
registry: grpc, and registry: docker plugins are not exportable —
they lack the stable (path, version) identity needed to anchor the
canonical <plugin-name>@<version>.json filename. The hub-ref input
format already constrains the command to the CloudQuery registry;
add an inline comment to keep this invariant if a --registry flag
is ever introduced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iles

TestDoc compares the generated docs/reference listing against a
hardcoded slice; the new cloudquery_plugin_spec-schema.md page now
needs to be in that list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants