Docs versioning: update CI to warn for outdated metadata #3279
Normalize CLI target specs (handle Windows/backslashes, ./, absolute paths) and classify them as file/dir/glob. Implement matching logic and validation (report matched files and unmatched selectors, return rc=2 on unmatched), and apply target specs in scan/update. Add helpers (normalize_target_spec, compile_target_specs, validate_requested_targets, print_target_match_summary, iter_scan_candidate_paths) and adjust main argument help. Update tests to cover directory, glob, Windows-style paths, and CLI reporting.
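As a rough illustration of the normalization and classification described in this commit, here is a minimal sketch. The function names (`normalize_spec_sketch`, `classify_spec_sketch`), the `repo_root` parameter, and the suffix-based file/dir rule are assumptions made for the example and are not taken from the PR's actual helpers.

```python
from typing import Optional

# Assumption: these are the doc suffixes the checker cares about.
DOC_SUFFIXES = (".md", ".markdown", ".ipynb")


def normalize_spec_sketch(raw: str, repo_root: str) -> Optional[str]:
    """Return a repo-relative POSIX-style selector, or None if it cannot be normalized."""
    spec = raw.strip().replace("\\", "/")  # accept Windows-style separators
    root = repo_root.replace("\\", "/").rstrip("/") + "/"
    if spec.startswith(root):  # absolute path inside the repo -> make it relative
        spec = spec[len(root):]
    while spec.startswith("./"):  # drop leading ./ segments
        spec = spec[2:]
    spec = spec.rstrip("/")
    # Anything still empty, absolute, or drive-prefixed is treated as not normalizable.
    if not spec or spec.startswith("/") or (len(spec) > 1 and spec[1] == ":"):
        return None
    return spec


def classify_spec_sketch(spec: str) -> str:
    """Classify a normalized selector as 'glob', 'file', or 'dir' (hypothetical rule)."""
    if any(ch in spec for ch in "*?["):
        return "glob"
    if spec.endswith(DOC_SUFFIXES):
        return "file"
    return "dir"
```

Under these assumptions, `.\docs\intro.md` normalizes to `docs/intro.md` and is classified as a file, while `docs/*.md` is classified as a glob.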
Add a fail_if_metadata_sync_needed policy flag and utilities to detect/collect files whose embedded metadata last_content_updated differs from the computed git content date. New helpers: record_needs_metadata_sync() (skips non-md/ipynb and meta.ignore), collect_metadata_sync_targets() (returns sorted unique paths), and build_metadata_sync_command() (emits a ready-to-run python command to update targets using --set-content-date-from-git and --ack-meta-commit-marker). The enforce() flow now emits violations for out-of-sync md/ipynb files when the flag is enabled.
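The detection and collection steps described in this commit could look roughly like the sketch below. `RecordSketch` and its fields are hypothetical stand-ins for the tool's real record type; the `None` guard reflects behavior that a later commit in this conversation describes adding.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RecordSketch:
    """Hypothetical stand-in for the tool's per-file record."""
    path: str
    embedded_date: Optional[str]  # embedded last_content_updated
    git_date: Optional[str]       # computed git content date
    ignore: bool = False          # stands in for meta.ignore


def needs_sync_sketch(rec: RecordSketch) -> bool:
    if not rec.path.endswith((".md", ".ipynb")):
        return False  # only md/ipynb files carry the embedded metadata
    if rec.ignore:
        return False  # explicitly ignored files are skipped
    if rec.git_date is None:
        return False  # nothing to sync against
    return rec.embedded_date != rec.git_date


def collect_sync_targets_sketch(records: Iterable[RecordSketch]) -> List[str]:
    # Sorted, de-duplicated paths, matching the "sorted unique paths" description.
    return sorted({r.path for r in records if needs_sync_sketch(r)})
```

Returning a sorted, unique list keeps the suggested command stable between runs, which makes the report easier to diff.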
Introduce actionable metadata-sync guidance for maintainers: rename SCHEMA_VERSION to REPORT_SCHEMA_VERSION, restrict ToolConfig.version to 1, and add metadata_sync_targets and metadata_sync_command to the Report model. Add build_git_add_command helper and mark records requiring metadata sync (metadata_sync_needed) so summaries include that count. Extend markdown output and CLI to show which files need metadata updates and to print suggested commands (sync command, git add and commit) for applying fixes locally.
Update the GitHub Actions workflow to only run docs and notebook staleness checks for changed .md/.markdown/.ipynb files. Adds a step to collect changed docs into tmp/docs_nb_checks/changed_docs.txt and expose the count via outputs; conditionally runs the report and optional policy check only when there are changed files, passing the files as --targets to the checker. Adds a no-op step when no docs changed and uploads the changed_docs.txt alongside other artifacts. Also renames the job to reflect the new behavior to reduce unnecessary scanning and noise.
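The changed-docs filtering step can be illustrated in Python as below. This is only a sketch of the filtering logic, not the workflow step itself: the function name is hypothetical, and the input is assumed to be a list of changed paths such as the output lines of `git diff --name-only`.

```python
from typing import Iterable, List

# Suffixes named in the commit message.
DOC_SUFFIXES = (".md", ".markdown", ".ipynb")


def select_changed_docs_sketch(changed_paths: Iterable[str]) -> List[str]:
    """Keep only doc/notebook paths, de-duplicated and sorted."""
    cleaned = (p.strip() for p in changed_paths)
    return sorted({p for p in cleaned if p.endswith(DOC_SUFFIXES)})


changed = select_changed_docs_sketch(
    ["docs/a.md", "src/x.py", "nb/demo.ipynb", "README.markdown", "docs/a.md\n", ""]
)
# The count decides whether the report and policy steps run at all.
should_run_checks = len(changed) > 0
```

When the resulting list is empty, the workflow described above takes its no-op path instead of invoking the checker.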
Pull request overview
This PR updates the docs/notebooks staleness tooling and CI workflow to reduce unnecessary scanning and to add reporting/enforcement around syncing embedded deeplabcut.last_content_updated metadata with git-derived content dates.
Changes:
- Update the GitHub Actions workflow to detect and scan only changed `.md`/`.ipynb` (and `.markdown`) files, skipping the job work when none changed.
- Extend the staleness report to identify files needing metadata sync and provide suggested local commands to update and commit.
- Add a policy option to optionally fail `check` when metadata sync is needed, and bump the report schema version.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tools/docs_and_notebooks_check.py` | Adds metadata-sync detection/reporting, optional enforcement, and updates report schema/version typing. |
| `.github/workflows/docs_and_notebooks_checks.yml` | Optimizes CI by computing changed doc targets and running the tool only on those files. |
Treat target specs that fail normalization as kind "invalid" so they are not silently ignored. compile_target_specs now appends invalid specs, target_spec_matches_path returns False for invalid kinds, and validate_requested_targets records invalid raw selectors as unmatched (and safely iterates when specs may be None). This ensures malformed or non-normalizable CLI selectors are reported back to the user rather than dropped.
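To make the "invalid" handling concrete, here is a small sketch of how malformed selectors can be kept and reported rather than dropped. All names ending in `_sketch`, the dict layout, and the simplified normalization rule are assumptions for illustration; only the behavior (invalid specs never match and end up in the unmatched list, `None` is iterated safely, rc=2 on unmatched) follows the commit descriptions.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Sequence, Tuple


def compile_specs_sketch(raws: Optional[Sequence[str]]) -> List[Dict[str, str]]:
    specs = []
    for raw in raws or []:  # tolerate None
        norm = raw.strip().replace("\\", "/")
        if not norm or norm.startswith("/"):
            # Keep the selector with kind "invalid" instead of silently dropping it.
            specs.append({"raw": raw, "kind": "invalid", "value": ""})
        elif any(ch in norm for ch in "*?["):
            specs.append({"raw": raw, "kind": "glob", "value": norm})
        else:
            specs.append({"raw": raw, "kind": "file", "value": norm})
    return specs


def spec_matches_sketch(spec: Dict[str, str], path: str) -> bool:
    if spec["kind"] == "invalid":
        return False  # invalid selectors never match anything
    if spec["kind"] == "glob":
        return fnmatchcase(path, spec["value"])
    return path == spec["value"]


def validate_targets_sketch(
    raws: Optional[Sequence[str]], candidates: Sequence[str]
) -> Tuple[List[str], List[str]]:
    specs = compile_specs_sketch(raws)
    matched = sorted(
        {p for p in candidates if any(spec_matches_sketch(s, p) for s in specs)}
    )
    unmatched = [
        s["raw"] for s in specs
        if not any(spec_matches_sketch(s, p) for p in candidates)
    ]
    return matched, unmatched


matched, unmatched = validate_targets_sketch(
    ["docs/*.md", "/abs/x.md", "missing.md"], ["docs/a.md", "docs/b.ipynb"]
)
exit_code = 2 if unmatched else 0  # unmatched selectors yield rc=2
```

Here `/abs/x.md` is reported back as unmatched because it is invalid under the sketch's rule, and `missing.md` because no candidate path matches it.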
Introduce shared pytest fixtures (repo, cfg) and import Callable to reduce repetition of tmp_path/git init logic across tests. Refactor many tests to use the new fixtures and ToolConfig factory, and add/adjust tests covering target validation and scanning edge cases (several validate_requested_targets variations, scan_files with invalid only-targets, and main returning 2 for invalid selector). Overall this centralizes repo/config setup and adds coverage for target handling behavior.
as it is unused
tools/docs_and_notebooks_check.py: Return False from record_needs_metadata_sync when computed timestamp is None to avoid triggering sync for files with no computed last_content_updated. Also import shlex and quote paths in build_metadata_sync_command so generated shell commands are safe for paths with spaces/special chars.
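A minimal sketch of the quoting described here, using the standard library's `shlex.quote`. The function name and the exact argument order of the generated command are assumptions; the flags and the `update` subcommand are the ones named elsewhere in this conversation.

```python
import shlex
from typing import Sequence


def build_sync_command_sketch(paths: Sequence[str]) -> str:
    """Build a copy-pasteable command string with every part shell-quoted."""
    parts = [
        "python",
        "tools/docs_and_notebooks_check.py",
        "update",  # assumed subcommand placement
        "--set-content-date-from-git",
        "--ack-meta-commit-marker",
        "--targets",
        *paths,
    ]
    return " ".join(shlex.quote(part) for part in parts)


command = build_sync_command_sketch(["docs/my file.md", "docs/a.md"])
```

Round-tripping the string through `shlex.split` recovers the original paths, which is a quick way to check that a path containing spaces survives quoting.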
Expand and clarify the --targets argument help strings in tools/docs_and_notebooks_check.py for the update and normalize subcommands. The updated messages document that --targets accepts exact files, directories, and glob patterns (with examples) and note that both '/' and '\\' path separators are accepted. This is a documentation-only change to improve user guidance; no functional behavior is altered.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…bCut/DeepLabCut into cy/docs-versioning-tweaks
…m/DeepLabCut/DeepLabCut into cy/docs-versioning-tweaks" This reverts commit 46fdb04, reversing changes made to 6d3b1be.
Introduce TypedDict-based types for FileKind and TargetSpec (with TargetKind) and tighten type annotations across functions (compile_target_specs, target_spec_matches_path, target_matches). Replace PurePosixPath-based glob matching with fnmatch.fnmatchcase only to ensure consistent, shell-style pattern behavior across platforms and remove unused imports. Minor cleanups: import TypedDict and update variable type hints for better static checking and readability.
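The sketch below shows what a TypedDict-based target spec and `fnmatchcase`-only glob matching might look like. The type and field names are assumptions, not the PR's definitions. One consequence of this choice is that `fnmatchcase` lets `*` match across `/`, whereas `PurePosixPath.match` compares path components one by one.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Literal, TypedDict

# Hypothetical type names and fields, for illustration only.
TargetKindSketch = Literal["file", "dir", "glob", "invalid"]


class TargetSpecSketch(TypedDict):
    raw: str
    kind: TargetKindSketch
    value: str


def glob_matches_sketch(spec: TargetSpecSketch, rel_path: str) -> bool:
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
    return spec["kind"] == "glob" and fnmatchcase(rel_path, spec["value"])


spec: TargetSpecSketch = {"raw": "docs/*.md", "kind": "glob", "value": "docs/*.md"}

nested_fnmatch = glob_matches_sketch(spec, "docs/sub/a.md")            # '*' spans '/'
nested_purepath = PurePosixPath("docs/sub/a.md").match("docs/*.md")    # per-component
```

With this pattern, the nested path matches under `fnmatchcase` but not under `PurePosixPath.match`, so a selector like `docs/*.md` also selects files in subdirectories.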
Add parameterized tests covering both markdown and notebook files to verify metadata sync behavior. test_metadata_sync_warning_populates_report_targets_and_command creates a file with an out-of-date embedded last_content_updated, commits a newer git date, and asserts the record gets a metadata_sync_needed warning, that metadata_sync_targets and metadata_sync_command are populated (and include the target and --set-content-date-from-git flag), and that the rendered markdown includes guidance. test_enforce_fails_when_metadata_sync_needed_is_configured asserts that when fail_if_metadata_sync_needed=True an out-of-sync file is reported as a policy violation. These tests ensure reporting and enforcement handle embedded vs git-derived content dates correctly.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Will fix Copilot feedback first, no need to review for now.
Do not mark files as needing metadata sync when metadata could not be read/parsed/validated reliably. Add blocking error prefixes (metadata_read_failed:, markdown_frontmatter_invalid:, nbformat_invalid:) to record_needs_metadata_sync so such scan/repair issues are treated separately. Also clarify the enforcement message to mention embedded last_content_updated and the git content update date for more precise guidance.
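The blocking-prefix guard can be sketched as follows. The three prefixes come from the commit message above; the function names and the idea that a record's messages are available as a plain list of strings are assumptions for the example.

```python
from typing import Iterable, Optional

BLOCKING_ERROR_PREFIXES = (
    "metadata_read_failed:",
    "markdown_frontmatter_invalid:",
    "nbformat_invalid:",
)


def has_blocking_error_sketch(messages: Iterable[str]) -> bool:
    # str.startswith accepts a tuple, so one call checks all prefixes.
    return any(msg.startswith(BLOCKING_ERROR_PREFIXES) for msg in messages)


def needs_sync_with_guard_sketch(
    embedded_date: Optional[str],
    git_date: Optional[str],
    messages: Iterable[str],
) -> bool:
    if has_blocking_error_sketch(messages):
        return False  # metadata could not be read reliably; handle as a scan/repair issue
    if git_date is None:
        return False  # no computed date to compare against
    return embedded_date != git_date
```

Keeping the guard ahead of the date comparison means an unreadable file is never reported as merely out of sync.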
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
deruyter92 left a comment:
LGTM (few minor comments)!
Co-authored-by: Jaap de Ruyter van Steveninck <32810691+deruyter92@users.noreply.github.com>
This pull request enhances the documentation and notebook staleness check workflow: the CI process now scans only documentation files that have changed, and the staleness report gives maintainers guidance on syncing embedded metadata, including example commands.
Additionally, the policy configuration and enforcement logic have been extended to support stricter checks for metadata synchronization.
TODO
Automated summary
Key improvements include:
CI Workflow Efficiency:
- The workflow (`.github/workflows/docs_and_notebooks_checks.yml`) now detects and scans only changed `.md` and `.ipynb` files, skipping the scan if no relevant files were modified. This reduces unnecessary computation and speeds up CI runs.
Metadata Synchronization Guidance:
- The tool (`tools/docs_and_notebooks_check.py`) now identifies files where the embedded `deeplabcut.last_content_updated` metadata is missing or out of sync with the git content date. The generated report lists these files and suggests commands to synchronize metadata and commit the changes.
Policy Configuration and Enforcement:
- Adds a `fail_if_metadata_sync_needed` option to the policy config, allowing CI to fail if any documentation files require metadata synchronization. The enforcement logic was updated to respect this setting.
Reporting and Summarization:
Schema and Type Updates:
These changes streamline the documentation maintenance workflow, improve feedback to contributors, and help keep embedded metadata accurate and up to date.