Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 76 additions & 0 deletions .github/workflows/version_scanner.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
name: Version Scan

on:
push:
branches:
- main
- feat/version-scanner-cicd-upgrades
schedule:
- cron: '0 7 * * 2' # Run weekly on Tuesdays at 7 AM UTC (mirrors main.yml)
workflow_dispatch:

permissions:
contents: read
issues: write

jobs:
scan:
name: Version Scan
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6

- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.14'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install pyyaml

- name: Run Version Scanner
run: |
# Use -o to output the raw CSV to a file, and --stdout to print the summary to the GitHub Actions UI
python scripts/version_scanner/version_scanner.py -d python -v 3.7 --stdout -o version_scanner_output.csv

- name: Upload CSV Results
if: always()
uses: actions/upload-artifact@v7
with:
name: version-scanner-results
path: version_scanner_output.csv

- name: Create or update issue on finding
if: failure()
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
TITLE="Version Scanner found deprecated dependencies"
RUN_URL="${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

# Read the first 50 lines to prevent blowing up the issue body if it's massive
CSV_PREVIEW=$(head -n 50 scanner_output.csv)

BODY="The [Version Scanner]($RUN_URL) found deprecated dependencies in the repository.

**Matches Found:**
\`\`\`csv
$CSV_PREVIEW
\`\`\`
*(If there are more than 50 matches, see the workflow logs for the full list)*"

# Mirroring regenerate-all.yml: check if an issue already exists to prevent spam
EXISTING_ISSUE=$(gh issue list --state open --search "in:title \"$TITLE\"" --json number --jq '.[0].number')

if [ -z "$EXISTING_ISSUE" ]; then
echo "WOULD HAVE CREATED ISSUE:"
echo "gh issue create --title \"$TITLE\" --body \"$BODY\""
# gh issue create --title "$TITLE" --body "$BODY"
else
echo "Issue #$EXISTING_ISSUE already exists."
echo "WOULD HAVE ADDED COMMENT:"
echo "gh issue comment \"$EXISTING_ISSUE\" --body \"Another scanner run found deprecated dependencies: $RUN_URL\""
# gh issue comment "$EXISTING_ISSUE" --body "Another scanner run found deprecated dependencies: $RUN_URL"
fi
14 changes: 7 additions & 7 deletions scripts/version_scanner/regex_config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -58,15 +58,15 @@ rules:
- |
sys\.version_info\s*<\s*\(3,\s*{minor_plus_one}\)
- |
sys\.version_info\.minor\s*==\s*{minor}
sys\.version_info\.minor\s*==\s*{minor}(?!\d)
- |
sys\.version_info\.minor\s*>=\s*{minor}
sys\.version_info\.minor\s*>=\s*{minor}(?!\d)
- |
sys\.version_info\.minor\s*<=\s*{minor}
sys\.version_info\.minor\s*<=\s*{minor}(?!\d)
- |
sys\.version_info\.minor\s*>\s*{minor_minus_one}
sys\.version_info\.minor\s*>\s*{minor_minus_one}(?!\d)
- |
sys\.version_info\.minor\s*<\s*{minor_plus_one}
sys\.version_info\.minor\s*<\s*{minor_plus_one}(?!\d)

- name: python_env_short
description: Finds short python environment names often used in tox or nox.
Expand All @@ -87,7 +87,7 @@ rules:
- "Python3.7"
rules:
- |
python3\.{minor}
python3\.{minor}(?!\d)

- name: combined_version_string
description: Finds combined version strings often used in class or variable names.
Expand All @@ -97,6 +97,6 @@ rules:
- "Python37DeprecationWarning"
rules:
- |
Python{major}{minor}
Python{major}{minor}(?!\d)


58 changes: 56 additions & 2 deletions scripts/version_scanner/tests/unit/test_version_scanner.py
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,7 @@ def test_write_csv_report(tmp_path):
assert rows[0]["file_path"] == "./setup.py"
assert rows[0]["rule_name"] == "python_requires_check"
assert rows[0]["line_number"] == "1"
assert rows[0]["matched_string"] == "python_requires = '>=3.7'"
assert rows[0]["matched_string"] == '="python_requires = \'>=3.7\'"'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If we restrict the Excel-specific wrapping in version_scanner.py to only float-like strings, this test assertion can be reverted to its original clean state.

Suggested change
assert rows[0]["matched_string"] == '="python_requires = \'>=3.7\'"'
assert rows[0]["matched_string"] == "python_requires = '>=3.7'"

assert rows[0]["context_line"] == "python_requires = '>=3.7'"


Expand Down Expand Up @@ -376,7 +376,10 @@ def test_main_loads_ignore_from_script_dir(mock_scan, mock_load_ignore):

with mock.patch('sys.argv', test_args):
from version_scanner import main
main()
try:
main()
except SystemExit:
pass
Comment on lines +379 to +382

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using pytest.raises(SystemExit) is more idiomatic and cleaner than a try-except block in pytest tests.

        with pytest.raises(SystemExit):
            main()


mock_load_ignore.assert_called_once()
args, kwargs = mock_load_ignore.call_args
Expand All @@ -385,6 +388,13 @@ def test_main_loads_ignore_from_script_dir(mock_scan, mock_load_ignore):
assert "scripts/version_scanner" in path


try:
import googleapiclient
HAS_GOOGLE_API = True
except ImportError:
HAS_GOOGLE_API = False

@pytest.mark.skipif(not HAS_GOOGLE_API, reason="Requires googleapiclient")
@mock.patch('googleapiclient.discovery.build')
@mock.patch('google.auth.default')
def test_upload_to_drive(mock_auth, mock_build):
Expand Down Expand Up @@ -479,6 +489,50 @@ def test_regex_examples_from_config():
break
assert matched, f"Example '{example}' in group '{name}' did not match any pattern."

def test_main_exit_code_1():
"""Test that main() calls sys.exit(1) when matches are found."""
# We can mock scan_repository to return a dummy match
test_args = ['version_scanner.py', '-d', 'python', '-v', '3.7']
with mock.patch('sys.argv', test_args):
from version_scanner import main
with mock.patch('version_scanner.scan_repository', return_value=[{'file_path': 'test', 'line_number': 1, 'matched_string': '3.7', 'rule_name': 'test'}]):
with pytest.raises(SystemExit) as excinfo:
main()
assert excinfo.value.code == 1

def test_main_stdout(capsys):
"""Test that --stdout prints the CSV output to stdout."""
test_args = ['version_scanner.py', '-d', 'python', '-v', '3.7', '--stdout']
with mock.patch('sys.argv', test_args):
from version_scanner import main
with mock.patch('version_scanner.scan_repository', return_value=[{'file_path': 'test.py', 'line_number': 1, 'matched_string': '3.7', 'rule_name': 'test'}]):
with pytest.raises(SystemExit):
main()

captured = capsys.readouterr()
assert "=== CSV Output ===" in captured.out
assert "test.py," in captured.out
assert "test," in captured.out
assert '"=""3.7"""' in captured.out

def test_scan_file_truncation_bug(tmp_path):
"""Test that searching for 3.1 does NOT match 3.10 (truncation bug)."""
# Create a file with 3.10
test_file = tmp_path / "test_file.py"
test_file.write_text("python_requires = '>=3.10'\npython3.10\nPython310\n")

from version_scanner import ConfigManager, scan_file

# Init config for 3.1
config_manager = ConfigManager("regex_config.yaml", "python", "3.1")
rules = config_manager.load_config()
import re
compiled_rules = [{"name": r["name"], "pattern": re.compile(r["pattern"], re.IGNORECASE)} for r in rules]

# It should not match anything because all strings are 3.10, not 3.1
matches = scan_file(str(test_file), compiled_rules)
assert len(matches) == 0, f"Expected 0 matches for 3.1 in 3.10 content, but got {len(matches)}: {matches}"


def test_scan_repository_layout_agnostic(tmp_path):
# Create directories under different roots
Expand Down
26 changes: 26 additions & 0 deletions scripts/version_scanner/version_scanner.py
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,11 @@ def format_match_for_csv(
context = formatted.get("context_line", "")
matched = formatted.get("matched_string", "")

# Force spreadsheet apps (Google Sheets, Excel) to treat the match as a string.
# Otherwise, they parse "3.10" as a number and drop the trailing zero, displaying "3.1".
if matched:
formatted["matched_string"] = f'="{matched}"'
Comment on lines +230 to +231

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Wrapping every matched string in ="..." is unnecessary for non-numeric strings (like code lines or warning names) and makes the CSV output harder to read in plain text. It can also cause formula errors in Excel if the matched string contains unescaped double quotes.

We should only apply this Excel-specific wrapping to pure numeric/version strings (like 3.10) that Excel would otherwise misinterpret as floats.

Suggested change
if matched:
formatted["matched_string"] = f'="{matched}"'
if matched and matched.strip().replace('.', '', 1).isdigit():
formatted["matched_string"] = f'="{matched}"'


if len(context) > 500:
match_start = context.find(matched)
if match_start != -1:
Expand Down Expand Up @@ -601,6 +606,12 @@ def main():
help="Upload results to a Google Sheet in Drive"
)

parser.add_argument(
"--stdout",
action="store_true",
help="Print the full CSV report to stdout instead of/in addition to writing to a file"
)

args = parser.parse_args()

# Resolve target packages if filtering is requested
Expand Down Expand Up @@ -670,5 +681,20 @@ def main():
if args.upload:
upload_to_drive(output_path, all_matches, github_repo=args.github_repo, branch=args.branch)

if args.stdout:
print("\n=== CSV Output ===")
import io
output = io.StringIO()
write_csv_report(output_path, all_matches, github_repo=args.github_repo, branch=args.branch) # this writes to the file, but we want stdout too
# Let's just read the file and print it
with open(output_path, 'r', encoding='utf-8') as f:
print(f.read(), end='')
Comment on lines +684 to +691

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The import io and output = io.StringIO() are unused. Additionally, write_csv_report is already called earlier in main(), so calling it again here is redundant and performs duplicate file writes. We can simply read the existing file and print it.

Suggested change
if args.stdout:
print("\n=== CSV Output ===")
import io
output = io.StringIO()
write_csv_report(output_path, all_matches, github_repo=args.github_repo, branch=args.branch) # this writes to the file, but we want stdout too
# Let's just read the file and print it
with open(output_path, 'r', encoding='utf-8') as f:
print(f.read(), end='')
if args.stdout:
print("\n=== CSV Output ===")
with open(output_path, 'r', encoding='utf-8') as f:
print(f.read(), end='')
References
  1. Remove duplicate lines of code, especially duplicate assertions in tests, to keep the codebase clean and avoid redundancy.


# Distinct exit codes for CI/CD
if all_matches:
sys.exit(1)
else:
sys.exit(0)

if __name__ == "__main__":
main()
Loading