
mlcommons/endpoints-submission-cli

MLCommons Endpoints Submission Tools

A Python package with two tools for managing MLPerf Endpoints benchmark submissions:

  • endpoints-submission-cli — registers benchmark runs, assembles submission packages, runs compliance checks, and opens GitHub pull requests via the PRISM API.
  • submission-checker — validates a submission folder against the §9.1 automated compliance rules before or after upload.

Installation

With pip:

pip install endpoints-submission-cli

From source (editable):

pip install -e ".[dev]"

With uv:

uv sync --extra dev

endpoints-submission-cli

Requirements

  • Python 3.10 or later
  • gh CLI — required for creating, updating, and withdrawing submissions

Authentication

Every command requires a PRISM API token (format mlc_…). Set it in the PRISM_USER_API_TOKEN environment variable, or pass --token on each command:

# Persistent (add to shell profile)
export PRISM_USER_API_TOKEN=mlc_your_token_here

# Per-command override
endpoints-submission-cli runs list --token mlc_your_token_here
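
The precedence described above (a per-command --token overriding PRISM_USER_API_TOKEN) can be sketched as follows. This is a minimal illustration, not the CLI's own code: the helper name resolve_token is hypothetical, and the mlc_ prefix check is an assumption based only on the token format stated here.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_token(cli_token: str | None = None, env: Mapping[str, str] | None = None) -> str:
    """Return the PRISM API token, preferring --token over the environment.

    Hypothetical helper that mirrors the precedence described in this README.
    """
    env = os.environ if env is None else env
    # A value passed with --token wins; otherwise fall back to the env var.
    token = cli_token or env.get("PRISM_USER_API_TOKEN")
    if not token:
        raise SystemExit("No API token: pass --token or set PRISM_USER_API_TOKEN")
    # Assumed validation: the README only says tokens use the mlc_... format.
    if not token.startswith("mlc_"):
        raise ValueError("Token does not look like a PRISM token (expected mlc_ prefix)")
    return token
```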

Commands that create, update, or withdraw submissions act on GitHub pull requests and therefore also require an authenticated gh CLI:

gh auth login

Configuration

| Environment variable | Default | Description |
|---|---|---|
| PRISM_USER_API_TOKEN | (none) | API token. Required unless --token is passed. |
| MLPERF_SUBMISSION_REPO | MLCommons-Systems/test-endpoints-submission-repo | Target GitHub repository for submission PRs (owner/repo). |

To target a repository other than the default, add both variables to your shell profile for a persistent setup:

export PRISM_USER_API_TOKEN=mlc_your_token_here
export MLPERF_SUBMISSION_REPO=MLCommons-Systems/endpoints-submission-repo
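
A script that needs the same target repository can read MLPERF_SUBMISSION_REPO with the documented default, as in the sketch below. The helper resolve_repo and the slug pattern are assumptions for illustration; the README only states that the value has the form owner/repo.

```python
from __future__ import annotations

import os
import re
from collections.abc import Mapping

# Default taken from the configuration table above.
DEFAULT_REPO = "MLCommons-Systems/test-endpoints-submission-repo"

# Assumed shape of an owner/repo slug: two non-empty segments of letters,
# digits, '-', '_' or '.' separated by a single slash.
_SLUG = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def resolve_repo(env: Mapping[str, str] | None = None) -> tuple[str, str]:
    """Return (owner, repo) for submission PRs (hypothetical helper)."""
    env = os.environ if env is None else env
    slug = env.get("MLPERF_SUBMISSION_REPO", DEFAULT_REPO)
    if not _SLUG.match(slug):
        raise ValueError(f"MLPERF_SUBMISSION_REPO must be owner/repo, got {slug!r}")
    owner, repo = slug.split("/")
    return owner, repo
```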

Quick start

# 1. Verify connectivity
endpoints-submission-cli runs list

# 2. Register a benchmark run from a local result folder
endpoints-submission-cli runs create --path /results/llama3_h100_c4
# → Run created: d5d9873e-5eca-4f8d-a487-4be1cb8b440c
RUN_ID=d5d9873e-5eca-4f8d-a487-4be1cb8b440c

# 3. Create a submission (assembles, checks, uploads, opens PR)
endpoints-submission-cli submissions create \
  --division standardized \
  --availability available \
  --run-ids $RUN_ID
# → Submission created: a1b2c3d4-…
# → PR: https://github.com/MLCommons-Systems/…/pull/42
SUB_ID=a1b2c3d4-e5f6-7890-abcd-ef1234567890

# 4. Add another run later
endpoints-submission-cli submissions add-run \
  --submission-id $SUB_ID \
  --run-id <new-run-id>

# 5. Withdraw if needed
endpoints-submission-cli submissions withdraw --submission-id $SUB_ID
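
Step 2 can be scripted by capturing the run ID from the CLI output instead of copying it by hand. This sketch assumes the CLI prints a line of the form "Run created: <uuid>", as shown in the example output above; the helper names parse_run_id and create_run are hypothetical, and the pattern may need adjusting if the real output differs.

```python
import re
import subprocess

# Assumption: output contains "Run created: <uuid>", as in the Quick start above.
_RUN_CREATED = re.compile(
    r"Run created:\s*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def parse_run_id(output: str) -> str:
    """Extract the run UUID from the CLI's output text."""
    match = _RUN_CREATED.search(output)
    if match is None:
        raise RuntimeError("Could not find 'Run created: <id>' in CLI output")
    return match.group(1)


def create_run(result_dir: str) -> str:
    """Register a result folder and return its run ID (needs the CLI on PATH)."""
    proc = subprocess.run(
        ["endpoints-submission-cli", "runs", "create", "--path", result_dir],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    # Search both streams, since some CLIs print status messages to stderr.
    return parse_run_id(proc.stdout + proc.stderr)
```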

Command reference

endpoints-submission-cli
├── runs
│   ├── list        List all runs
│   ├── create      Register a run from a local folder
│   ├── get         Fetch run details
│   ├── delete      Delete a run and its archive
│   ├── pin         Pin a run (prevent expiry)
│   └── unpin       Restore normal expiry
└── submissions
    ├── list        List all submissions
    ├── create      Create a submission from runs (full pipeline)
    ├── get         Fetch submission details
    ├── update      Update run list or metadata
    ├── withdraw    Withdraw a submission
    ├── add-run     Add a run to an existing submission
    └── remove-run  Remove a run from a submission

Use --help on any command for full flag details:

endpoints-submission-cli submissions create --help

submission-checker

CLI tool for validating MLPerf Endpoints submissions against the §9.1 automated compliance checks.

Usage

Check a submission

submission-checker check /path/to/submission

The tool expects the submission root to contain systems/ and pareto/ subdirectories as specified in §8.1.

Options:

| Flag | Description |
|---|---|
| --strict | Treat warnings as errors (exit 1 on any warning) |
| --quiet / -q | Suppress INFO-level passing checks |
| --output FILE / -o FILE | Write full results as JSON to FILE |

Exit codes: 0 = all checks passed, 1 = one or more errors (or warnings with --strict).
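
In a CI job, the documented flags and exit codes are enough to gate a pipeline step. The sketch below is an illustration under assumptions: build_check_command and run_check are hypothetical helpers, options are placed before the path as a conservative choice, and the JSON written by --output is loaded without assuming any particular schema, since its structure is not documented here.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path
from typing import Any


def build_check_command(submission: Path, report: Path, strict: bool = True) -> list[str]:
    """Assemble a submission-checker invocation using only documented flags."""
    cmd = ["submission-checker", "check"]
    if strict:
        cmd.append("--strict")  # warnings also produce exit code 1
    cmd += ["--output", str(report), str(submission)]
    return cmd


def run_check(submission: Path, report: Path, strict: bool = True) -> tuple[bool, Any]:
    """Run the checker and return (passed, parsed JSON results or None).

    Exit code 0 means all checks passed; any non-zero code is treated as failure.
    """
    proc = subprocess.run(build_check_command(submission, report, strict), check=False)
    results = json.loads(report.read_text()) if report.exists() else None
    return proc.returncode == 0, results
```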

Show region boundaries

submission-checker regions --max-concurrency 1024

Prints the concurrency ranges for each region given a declared Maximum Supported Concurrency M (§5.5).

Required submission structure

<org>/
├── systems/
│   └── <system_desc_id>.json         # §8.2 — hardware + software description
└── pareto/
    └── <system_desc_id>/
        └── <benchmark_model>/
            ├── points/
            │   └── point_<N>.yaml    # §8.3 — one config per measurement point
            ├── results/
            │   └── point_<N>/
            │       ├── mlperf_endpoints_log_summary.json
            │       └── mlperf_endpoints_log_detail.json
            └── accuracy/
                ├── accuracy.txt
                └── accuracy_result.json
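
A quick pre-flight check of this layout for one benchmark-model directory might look like the sketch below. It covers only file presence (roughly the result-file-present, result-detail-present, and accuracy-file rules); submission-checker remains the authority. The helper missing_files is hypothetical, and treating N in point_<N>.yaml as the point's concurrency level is an assumption drawn from the point-filename-concurrency rule.

```python
import re
from pathlib import Path

# Assumption: N in point_<N>.yaml encodes the point's concurrency level.
_POINT = re.compile(r"^point_(\d+)\.yaml$")
RESULT_FILES = ("mlperf_endpoints_log_summary.json", "mlperf_endpoints_log_detail.json")
ACCURACY_FILES = ("accuracy.txt", "accuracy_result.json")


def missing_files(model_dir: Path) -> list[str]:
    """List required files absent under pareto/<system_desc_id>/<benchmark_model>/."""
    missing: list[str] = []
    points = sorted(
        p for p in (model_dir / "points").glob("point_*.yaml") if _POINT.match(p.name)
    )
    if not points:
        missing.append("points/point_<N>.yaml")
    # Each point config needs a summary and a detail log in results/point_<N>/.
    for point in points:
        for name in RESULT_FILES:
            rel = Path("results") / point.stem / name
            if not (model_dir / rel).is_file():
                missing.append(rel.as_posix())
    for name in ACCURACY_FILES:
        if not (model_dir / "accuracy" / name).is_file():
            missing.append(f"accuracy/{name}")
    return missing
```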

What gets checked

| Rule | Spec | Description |
|---|---|---|
| path-exists | §1 | Submission root directory exists |
| required-dir | §1 | systems/ and pareto/ present |
| system-description-present | §1 | At least one *.json file found in systems/ |
| system-description-valid | §1 | systems/*.json parses against schema |
| src-dir | §1 | src/ present for Standardized submissions |
| pareto-dir-exists | §1 | pareto/<system_id>/ directory exists |
| benchmark-model-dir | §1 | At least one benchmark-model directory in pareto/<system_id>/ |
| pareto-subdir | §1 | points/, results/, accuracy/ present |
| measurement-points-present | §1 | At least one point_*.yaml found |
| point-config-valid | §1 | YAML parses against PointConfig schema |
| point-filename-concurrency | §1 | Filename concurrency matches declared value |
| result-file-present | §1 | Result summary log exists for each point config |
| result-detail-present | §1 | Result detail log exists for each point config |
| result-file-valid | §1 | Result summary log parses against PointSummary schema |
| point-count | §2, §8 | 7–32 measurement points |
| point-cap | §2, §8 | Point count does not exceed 32 |
| low-latency-coverage | §3 | At least one point in Low Latency region |
| low-throughput-coverage | §4 | At least one point in Low Throughput region |
| med-throughput-coverage | §5 | At least one point in Medium Throughput region |
| high-throughput-coverage | §6 | At least one point in High Throughput region |
| max-concurrency-declared | §7 | max_supported_concurrency field present |
| region-computation | §7 | M > 32 (required for region formula) |
| concurrency-in-range | §9 | Concurrency within region bounds (incl. 10% margin) |
| load-pattern | §10 | load_pattern is concurrency, with a positive concurrency level |
| point-duration | §11 | Point meets per-region minimum duration |
| min-query-count | §12 | n_samples_completed meets dataset-specific minimum (§6.4) |
| streaming-config | §13 | stream_all_chunks is True |
| metric-consistency-duration | §14 | duration_ns > 0 |
| metric-consistency-accounting | §14 | completed + failed == issued |
| metric-consistency-output-tokens | §14 | total_output_tokens ≥ 0 |
| metric-consistency-system-tps | §9.1 | Stored system_tps consistent with derived value |
| metric-consistency-tps-per-user | §9.1 | Stored tps_per_user consistent with system_tps / concurrency |
| accuracy-file | §15 | accuracy.txt and accuracy_result.json present |
| accuracy-valid | §15 | accuracy_result.json parses correctly |
| accuracy-consistency | §15 | passed flag consistent with score ≥ quality_target |
| accuracy-gate | §15 | Score ≥ quality target |
| config-consistency-dataset | §16 | All points use the same dataset |
| config-consistency-model | §16 | Directory name matches benchmark_model |
| region-declared | §8.3 | Declared region field (if present) is valid and matches computed region |
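
The §14, §9.1, and §15 consistency rules are simple arithmetic, and the sketch below restates them for a single point. Several details are assumptions rather than documented behaviour: the field names n_samples_failed and n_samples_issued are hypothetical (the table names only the relation completed + failed == issued), the derived system TPS is assumed to be output tokens per second of duration_ns, and the 1% relative tolerance is a placeholder for whatever threshold the real checker uses.

```python
from __future__ import annotations

import math
from collections.abc import Mapping

REL_TOL = 0.01  # assumed tolerance; the real checker's threshold is not documented here


def metric_problems(summary: Mapping[str, float], concurrency: int) -> list[str]:
    """Return the metric-consistency rule names a point summary would fail (sketch)."""
    problems: list[str] = []
    duration_ns = summary["duration_ns"]
    if duration_ns <= 0:
        problems.append("metric-consistency-duration")
        return problems  # derived rates are meaningless without a positive duration
    # Hypothetical field names for failed and issued samples.
    if summary["n_samples_completed"] + summary["n_samples_failed"] != summary["n_samples_issued"]:
        problems.append("metric-consistency-accounting")
    if summary["total_output_tokens"] < 0:
        problems.append("metric-consistency-output-tokens")
    # Assumption: system TPS = output tokens per second of wall-clock duration.
    derived_tps = summary["total_output_tokens"] / (duration_ns / 1e9)
    if not math.isclose(summary["system_tps"], derived_tps, rel_tol=REL_TOL):
        problems.append("metric-consistency-system-tps")
    # tps_per_user should match system_tps / concurrency, per the table.
    if not math.isclose(summary["tps_per_user"], summary["system_tps"] / concurrency, rel_tol=REL_TOL):
        problems.append("metric-consistency-tps-per-user")
    return problems


def accuracy_problems(score: float, quality_target: float, passed: bool) -> list[str]:
    """Return the failing rules among accuracy-consistency and accuracy-gate."""
    problems: list[str] = []
    if passed != (score >= quality_target):
        problems.append("accuracy-consistency")
    if score < quality_target:
        problems.append("accuracy-gate")
    return problems
```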

Programmatic API

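from pathlib import Path

from submission_checker import Report, SubmissionChecker

# Point the checker at the submission root (the <org>/ directory).
checker = SubmissionChecker(Path("/submissions/acme_corp"))
report: Report = checker.run()

if report.passed:
    print("All checks passed")
else:
    # Each error result carries the rule name and a human-readable message.
    for result in report.errors:
        print(f"[{result.rule}] {result.message}")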

The Report object also exposes report.warnings and can be serialised to JSON with report.model_dump_json().
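
The CLI's --strict exit-code behaviour can be reproduced on top of the API. The sketch relies only on the attributes named above (passed, errors, warnings, model_dump_json()); the gate function and the ReportLike protocol are hypothetical, and because it is not stated whether report.passed already reflects warnings, errors and warnings are checked explicitly.

```python
from __future__ import annotations

from collections.abc import Sequence
from pathlib import Path
from typing import Any, Protocol


class ReportLike(Protocol):
    """The subset of Report used here: the attributes named in this README."""

    passed: bool
    errors: Sequence[Any]
    warnings: Sequence[Any]

    def model_dump_json(self) -> str: ...


def gate(report: ReportLike, strict: bool = False, json_path: Path | None = None) -> int:
    """Return an exit code: 0 if passed, 1 on errors (or on warnings when strict)."""
    if json_path is not None:
        json_path.write_text(report.model_dump_json())
    if report.errors or not report.passed:
        return 1
    if strict and report.warnings:
        return 1
    return 0
```

For example, a CI script could call sys.exit(gate(checker.run(), strict=True)) after constructing the checker as shown above.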


Development

uv run pytest                          # run all tests
uv run pytest --no-cov -x              # stop at the first failure
uv run ruff check src/ tests/          # lint
uv run ruff format src/ tests/         # auto-format

Generated from mlcommons/template