Use Allure agent-mode to design, review, validate, debug, and enrich tests in this repository.
- This repository is a Gradle multi-project build. Use `./gradlew` for repo-local test commands.
- Prefer the narrowest relevant scope first, usually a module task such as `:allure-jupiter:test` or a single test via `--tests` (see the sketch after this list).
- CI's broad verification entry point is `./gradlew --no-build-cache cleanTest test`.
- Many modules already emit framework results to `<module>/build/allure-results`; agent mode adds a separate per-run review artifact layer and does not replace those module outputs.
- If `allure run` is unavailable in the local agent environment, fix that first before treating console-only runs as authoritative.
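A minimal scoping sketch; the test class name is illustrative, not an actual test in this repository:

```sh
# Narrowest useful scope first: a single test class in one module
# (the class name below is illustrative).
./gradlew :allure-jupiter:test --tests "io.qameta.allure.junit5.SomeFeatureTest"

# Widen only when needed: the whole module, then CI's broad entry point.
./gradlew :allure-jupiter:test
./gradlew --no-build-cache cleanTest test
```

When a run's result will drive any conclusion, wrap the command with `allure run` as described below.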
Runtime first, source second.
- If a command executes tests and its result will be used for smoke checking, reasoning, review, coverage analysis, debugging, or any user-facing conclusion, run it through `allure run`. It preserves the original console logs and adds agent-mode artifacts when you need them.
- If the agent-mode output is missing or incomplete, debug that first and treat console-only conclusions as provisional.
- Use `allure run` for smoke checks too, even when the change is small or mechanical (see the sketch after this list).
- Only skip agent mode when it is impossible or when you are debugging agent mode itself.
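A minimal sketch of the difference, reusing the module scope from above; the temp-directory handling follows the broader patterns shown later in this document:

```sh
# Console-only run: any conclusion drawn from it stays provisional.
./gradlew :allure-jupiter:test

# The same scope through agent mode: console logs are preserved and
# review artifacts land in a unique temp directory.
TMP_DIR="$(mktemp -d)"
ALLURE_AGENT_OUTPUT="$TMP_DIR/agent-output" \
  allure run -- ./gradlew :allure-jupiter:test
```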
- Identify the exact review scope.
- Create a fresh expectations file for this run in a temp directory.
- Run only that scope with `allure run`.
- Read `index.md`, `manifest/run.json`, `manifest/tests.jsonl`, and `manifest/findings.jsonl` (a reading sketch follows this list).
- Read per-test markdown only for tests that failed, drifted, or have findings.
- Only after runtime review, inspect source code for root cause or coverage gaps.
- If evidence is weak or partial, enrich the tests and rerun.
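A minimal reading sketch, assuming `$TMP_DIR/agent-output` was the run's `ALLURE_AGENT_OUTPUT`; the `status` field used in the filter is an assumption, so confirm it against a real `tests.jsonl` record first:

```sh
OUT="$TMP_DIR/agent-output"

# Run-level summary first.
cat "$OUT/index.md"
cat "$OUT/manifest/run.json"

# Then the per-test and findings manifests; "status" is an assumed field name.
jq -c 'select(.status != "passed")' "$OUT/manifest/tests.jsonl"
jq -c '.' "$OUT/manifest/findings.jsonl"
```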
- Understand the feature or issue.
- Create a fresh expectations file for this run in a temp directory.
- Write or update the tests.
- Run the target Gradle scope with `allure run`.
- Review `index.md`, manifests, and per-test markdown.
- Enrich tests when evidence is weak (see the rerun sketch after this list).
- Rerun until scope and evidence are acceptable.
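A minimal rerun sketch for those steps; the goal, task_id, and test class name are illustrative placeholders, not values defined by agent mode:

```sh
# Fresh expectations file and output directory for each attempt.
ATTEMPT_DIR="$(mktemp -d)"
cat > "$ATTEMPT_DIR/expectations.yaml" <<'YAML'
goal: Add coverage for the new behaviour
task_id: new-feature-tests
YAML
ALLURE_AGENT_OUTPUT="$ATTEMPT_DIR/agent-output" \
ALLURE_AGENT_EXPECTATIONS="$ATTEMPT_DIR/expectations.yaml" \
  allure run -- ./gradlew :allure-jupiter:test --tests "io.qameta.allure.junit5.SomeNewFeatureTest"

# If findings remain or per-test markdown shows weak evidence, enrich and repeat.
wc -l < "$ATTEMPT_DIR/agent-output/manifest/findings.jsonl"
```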
Use this when the run is functionally correct but too weak to review:
- Identify missing or low-signal findings (see the triage sketch after this list).
- Add real steps, attachments, or minimal metadata.
- Rerun the same intended scope.
- Reject noop-style or placeholder evidence.
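For the first step above, a minimal triage sketch over `manifest/findings.jsonl`; the field names in the queries are assumptions about the schema, not documented keys:

```sh
# Inspect one raw record first to confirm the actual findings schema.
head -n 1 "$TMP_DIR/agent-output/manifest/findings.jsonl" | jq '.'

# Rough grouping by whatever kind/type field the manifest really uses (field names assumed).
jq -r '.kind // .type // "unknown"' "$TMP_DIR/agent-output/manifest/findings.jsonl" | sort | uniq -c | sort -rn
```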
- Create a fresh expectations file and temp output directory for the touched scope.
- Run the touched scope with `allure run`, even if the goal is only a smoke check after a mechanical change such as typing cleanup, mock refactors, or helper extraction.
- Review `index.md`, `manifest/run.json`, `manifest/tests.jsonl`, and `manifest/findings.jsonl`.
- Only then make a final statement about regression safety or test correctness.
- Split command, package, or module audits into scoped groups.
- Give each group its own expectations file and temp output directory.
- Run each group with `allure run` (see the per-group sketch after this list).
- Review runtime artifacts first, then inspect source code only after the run explains what actually executed.
- Mark the review incomplete until each scoped group either matched expectations or was explicitly documented as a broad package-health audit.
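A minimal per-group sketch, assuming the audit is split by module; the module list and expectation values are illustrative:

```sh
# One scoped group per module, each with its own expectations file and output directory.
for MODULE in allure-jupiter allure-testng; do
  GROUP_DIR="$(mktemp -d)"
  cat > "$GROUP_DIR/expectations.yaml" <<YAML
goal: Audit test health of ${MODULE}
task_id: audit-${MODULE}
YAML
  ALLURE_AGENT_OUTPUT="$GROUP_DIR/agent-output" \
  ALLURE_AGENT_EXPECTATIONS="$GROUP_DIR/expectations.yaml" \
    allure run -- ./gradlew ":${MODULE}:test"
done
```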
- `ALLURE_AGENT_OUTPUT` must use a unique temp directory per run.
- `ALLURE_AGENT_EXPECTATIONS` must use a unique temp file per run.
- Do not reuse those paths across parallel runs.
- Keep agent-mode artifacts in temp locations, not in committed repo paths or module `build/allure-results` directories.
YAML is preferred for expectations in v1.
Review-oriented expectations example:

```yaml
goal: Review a module-scoped Gradle test run
task_id: module-review
notes:
  - Start with the smallest relevant Gradle test scope.
  - Review runtime evidence before source inspection.
```

Targeted module-run pattern:

```sh
TMP_DIR="$(mktemp -d)"
EXPECTATIONS="$TMP_DIR/expectations.yaml"
cat > "$EXPECTATIONS" <<'YAML'
goal: Review a module-scoped Gradle test run
task_id: module-review
notes:
  - Start with the smallest relevant Gradle test scope.
  - Review runtime evidence before source inspection.
YAML
ALLURE_AGENT_OUTPUT="$TMP_DIR/agent-output" \
ALLURE_AGENT_EXPECTATIONS="$EXPECTATIONS" \
allure run -- ./gradlew :allure-jupiter:test \
  --tests io.qameta.allure.junit5.AllureJunit5Junit6CompatibilityTest
```

Broad repo-smoke pattern:

```sh
TMP_DIR="$(mktemp -d)"
ALLURE_AGENT_OUTPUT="$TMP_DIR/agent-output" \
allure run -- ./gradlew --no-build-cache cleanTest test
```

Broad package-health or repo-health audits may omit expectations, but the resulting scope review is weaker and should be called out explicitly.
- Steps must wrap real setup, actions, state transitions, or assertions.
- Attachments must contain real runtime evidence from that execution.
- Metadata should stay minimal and purposeful.
- Prefer helper-boundary instrumentation over repetitive caller wrapping.
Good example:
- instrument a shared assertion helper once instead of wrapping every caller
Rejected examples:
- empty wrapper steps
- static `test passed` attachments
- labels that no review or policy step uses
- Suite-load, import, or setup failures may appear only in `artifacts/global/stderr.txt` or global errors.
- If `manifest/tests.jsonl` does not account for all visible failures from the test runner, inspect global stderr before concluding the run is fully modeled (see the cross-check sketch after this list).
- Treat that state as a partial runtime review, not as a clean or complete result set.
- If runner-visible failures are present outside logical test files, final conclusions must stay provisional until the missing modeling is understood.
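A minimal cross-check sketch, assuming `$TMP_DIR/agent-output` is the run's `ALLURE_AGENT_OUTPUT`; the `status` field name is an assumption to verify against a real record:

```sh
OUT="$TMP_DIR/agent-output"

# Failures the manifest did model ("status" is an assumed field name).
jq -c 'select(.status != "passed")' "$OUT/manifest/tests.jsonl"

# Failures that may exist only at the global level, e.g. suite-load or setup errors.
sed -n '1,80p' "$OUT/artifacts/global/stderr.txt"
```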
Accept a run only when:
- scope matches expectations
- evidence is strong enough to explain what happened
- no high-confidence noop or placeholder findings remain
A test review is not complete unless:
- the relevant scope was run with agent mode, unless that is impossible
- expectations were created for the intended scope, unless this is a broad package-health audit
- agent artifacts were reviewed before final conclusions
- missing or partial runtime modeling was called out explicitly
- console-only conclusions are treated as provisional when agent output is absent or incomplete