
Comparing changes

base repository: Yolean/y-cluster
base: main
head repository: Yolean/y-cluster
compare: appliance-workflows
  • 16 commits
  • 11 files changed
  • 2 contributors

Commits on May 12, 2026

  1. feat(scripts): appliance build / publish / e2e drivers

    Brings every scripts/ change from agents/appliance-export-import
    to upstream main as a single bump. The Go and testdata side of
    this work landed in #19 (appliance-primitives); this commit is
    the operator-facing bash that drives the released binary
    through the appliance lifecycle, plus the .env-style config
    the scripts source.
    
    What's new vs main:
    
      - appliance-build-hetzner.sh / appliance-build-virtualbox.sh:
        interactive build flows producing a .qcow2 + a VirtualBox-
        importable .ova respectively, both via the released
        y-cluster binary's prepare-export and export subcommands.
      - appliance-publish-hetzner.sh: pushes a built appliance to
        Hetzner Object Storage for handoff.
      - appliance-qemu-to-gcp.sh: end-to-end qemu -> GCP custom
        image flow (export --format=gcp-tar -> gsutil cp -> compute
        images create) with persistent /data/yolean disk preserved
        across redeploys, plus a teardown subcommand.
      - gcp-bootstrap-credentials.sh: one-shot bootstrap for the
        service account / project / key file the GCP flow needs.
      - e2e-appliance-export-import.sh: local qemu -> qemu round-
        trip exercising the full prepare-export / export / import
        cycle without any cloud cred dependency.
      - e2e-appliance-hetzner.{sh,pkr.hcl}: Packer-based snapshot
        flow; lays the snapshot down once, spins fresh servers on
        top to verify boot.
      - e2e-appliance-qemu-to-gcp.sh: non-interactive driver of
        appliance-qemu-to-gcp.sh end to end, including teardown.
      - .env.example + .gitignore: documents every overridable
        knob (GCP_PROJECT, GCP_KEY, H_S3_ENV_FILE, ENV_FILE) with
        a generic example path; .env stays out of git.
    
    Configuration: required values are operator-supplied via env
    vars (no built-in defaults). Each script derives REPO_ROOT
    from BASH_SOURCE and sources $REPO_ROOT/.env via `set -o
    allexport` when present, so the .env path works regardless of
    CWD (including `cd /tmp && bash /path/to/script`). Missing
    required values fail fast with a clear "set $VAR in .env or
    shell env" message.
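The loading behaviour described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scripts' actual code: the helper names `repo_root`, `load_env`, and `require_var` are hypothetical, and the sketch assumes the file sits one directory below the repo root (as scripts/ does).

```shell
#!/usr/bin/env bash
# Sketch only; helper names are hypothetical, not taken from the scripts.

# Resolve the repo root from this file's location rather than from CWD.
# Assumes the file lives one level below the root (e.g. scripts/).
repo_root() {
  (cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)
}

# Source an env file, if present, with allexport on so that every
# assignment in it is exported to child processes.
load_env() {
  local env_file="$1"
  if [[ -f "$env_file" ]]; then
    set -o allexport
    # shellcheck disable=SC1090
    source "$env_file"
    set +o allexport
  fi
}

# Fail fast when a required variable is unset or empty.
require_var() {
  local name="$1"
  if [[ -z "${!name:-}" ]]; then
    echo "set \$$name in .env or shell env" >&2
    return 1
  fi
}
```

A caller would typically run `load_env "$(repo_root)/.env"` followed by `require_var GCP_PROJECT || exit 1`. Because the root is derived from `BASH_SOURCE`, the result does not depend on the directory the script is invoked from.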
    
    Scope: scripts/ + repo-root .env plumbing. The Go side is
    already on main via #19. Both `go build ./...` and `go test
    ./...` are unchanged-clean on this branch -- the scripts add
    no go.mod or testdata edits.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    160e482
  2. test: align local gateway access on port 80 (e2e + scripts)

    Yolean dev / setup scripts that smoke-test the gateway expect a
    host-side port that reaches guest 80. Today's qemu-side host port
    forwards default to 39080 in both the Go e2e helper and the bash
    appliance-build scripts, so any consumer that hardcodes
    http://localhost:80 has to remember the offset.
    
    This host (and most modern Linux distros) ships
    net.ipv4.ip_unprivileged_port_start=80, so qemu's user-mode
    hostfwd inherits the ability to bind port 80 without root. Default
    APP_HTTP_PORT and the e2e port-forward helper to 80 in lockstep:
    
      - e2e/qemu_test.go: e2eUniqueForwards now takes both apiPort
        and httpPort; every test passes its own pair (28443 / 28444 /
        ... vs 26443 / 26444 / ...) keyed off the apiPort so concurrent
        test runs on the same host don't collide. Each test always gets
        a guest-80 forward, matching what the appliance-build scripts
        install.
      - scripts/appliance-{qemu-to-gcp,build-hetzner,build-virtualbox}.sh
        + scripts/e2e-appliance-{export-import,qemu-to-gcp}.sh: the
        APP_HTTP_PORT default flips from 39080 to 80, with YHELP /
        inline curl examples updated to match. Override via env
        (APP_HTTP_PORT=39080 ./scripts/...) on hosts that keep port 80
        privileged.
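Whether a given host can bind port 80 without root depends on the `net.ipv4.ip_unprivileged_port_start` sysctl. The sketch below shows one way an operator might check before choosing a port. It is not part of the scripts: the function names are hypothetical, and falling back to 39080 automatically is an assumption made for illustration (the scripts simply default to 80 and accept an override).

```shell
#!/usr/bin/env bash
# Sketch only; function names and the automatic fallback are illustrative.

# Lowest port an unprivileged process may bind, per the Linux sysctl.
# Falls back to 1024 (the kernel default) when the file is unreadable.
unprivileged_port_start() {
  cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024
}

# Succeeds when PORT can be bound without root given threshold START.
port_is_unprivileged() {
  local port="$1" start="$2"
  (( port >= start ))
}

# Print the host port to forward to guest 80: 80 when allowed, else 39080.
# An explicit threshold can be passed as $1 (useful for testing).
pick_http_port() {
  local start="${1:-$(unprivileged_port_start)}"
  if port_is_unprivileged 80 "$start"; then
    echo 80
  else
    echo 39080
  fi
}
```

For example, `APP_HTTP_PORT="$(pick_http_port)" ./scripts/appliance-build-hetzner.sh` would keep port 80 where the host allows it and use the old offset elsewhere.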
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    c4f1cec
  3. test: bump appliance VM/disk size to 40G (avoid disk-pressure flakes)

    Appliance e2e / build flows install workloads, build a seed
    tarball, prepare-export, and re-boot from the prepared disk -- the
    cumulative footprint pushes the 20G default disk into pressure on
    the kubelet's image-gc thresholds, which surfaces as flaky pod
    evictions mid-test or mid-build.
    
    Bump to 40G everywhere a 20G default sat:
    
      - e2e/qemu_test.go: e2eQEMURuntime overrides DiskSize to 40G so
        every qemu e2e test boots with the larger disk by default.
      - scripts/appliance-{qemu-to-gcp,build-hetzner,build-virtualbox}.sh
        + scripts/e2e-appliance-{export-import,qemu-to-gcp}.sh: the
        generated y-cluster-provision.yaml now sets diskSize: "40G".
      - scripts/appliance-qemu-to-gcp.sh: --boot-disk-size on
        `gcloud compute instances create` flips from 20GB to 40GB so
        the GCE VM doesn't reject the 40G custom image with "Requested
        disk size cannot be smaller than the image size".
    
    qcow2 is sparse, so the host-disk footprint only grows with actual
    usage; the larger virtual size is a no-cost ceiling. The GCE side
    similarly uses a thin-provisioned persistent disk.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    7208521
  4. chore(scripts): drop port defaults from bash, defer to y-cluster's

    The appliance-build / e2e scripts each carried a defaults block:
    
        APP_HTTP_PORT="${APP_HTTP_PORT:-80}"
        APP_API_PORT="${APP_API_PORT:-39443}"
        APP_SSH_PORT="${APP_SSH_PORT:-2229}"
    
    then interpolated those into the heredoc'd y-cluster-provision.yaml.
    Each of these shadows a default that y-cluster already owns
    (80/6443/2222 in pkg/provision/config). The two bash defaults that
    DIFFERED (39443 vs 6443; 2229 vs 2222) were chosen to avoid
    colliding with an operator's regular y-cluster, but they still
    quietly duplicated the same defaulting concept.
    
    Replace the heredoc with a brace block that emits each port field
    ONLY when the env var is set. Net behaviour:
    
      - No env override   -> minimal YAML; y-cluster fills 2222 +
                             {6443:6443, 80:80, 443:443}.
      - APP_HTTP_PORT=N   -> only the host:N -> guest:80 entry lands;
                             API/SSH still y-cluster-default.
      - Multiple set      -> all set entries land; requireHostAPIPort
                             validates that a guest:6443 entry exists.
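The "emit only when set" behaviour in the list above could look roughly like this. It is a sketch under assumptions, not the committed code: `portForwards` and `diskSize: "40G"` appear in this commit log, but the `sshPort`, `host`, and `guest` field names are guesses at y-cluster's schema, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. `portForwards` is named in the commit log; the `sshPort`,
# `host` and `guest` field names are assumptions about y-cluster's schema.

emit_port_config() {
  if [[ -n "${APP_SSH_PORT:-}" ]]; then
    printf 'sshPort: %s\n' "$APP_SSH_PORT"
  fi
  # Emit the portForwards key only when at least one forward is overridden;
  # otherwise the field is omitted and y-cluster fills in its own defaults.
  if [[ -n "${APP_API_PORT:-}${APP_HTTP_PORT:-}" ]]; then
    echo "portForwards:"
    if [[ -n "${APP_API_PORT:-}" ]]; then
      printf '  - host: %s\n    guest: 6443\n' "$APP_API_PORT"
    fi
    if [[ -n "${APP_HTTP_PORT:-}" ]]; then
      printf '  - host: %s\n    guest: 80\n' "$APP_HTTP_PORT"
    fi
  fi
}

# Brace block: fixed fields first, then the conditional port fields.
render_provision_yaml() {
  {
    echo 'diskSize: "40G"'
    emit_port_config
  }
}
```

Using `if` blocks rather than a trailing `[[ ... ]] && ...` keeps the function's exit status at zero when no override is set, which matters under `set -e`.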
    
    Display refs (banner curl examples, ssh commands, smoketest
    probes) use ${APP_*_PORT:-NN} inline so the printed URL/SSH
    command shows the right port whether overridden or default.
    YHELP entries reworded from "(default: 80)" to
    "(y-cluster default: 80)" so the operator sees who owns the
    default.
    
    IMP_HTTP_PORT / IMP_SSH_PORT in e2e-appliance-export-import.sh
    left as-is (test-only; the import-side qemu is started directly,
    no y-cluster CLI involvement, so y-cluster's defaults don't
    apply).
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    b3f4791
  5. chore(scripts): also forward host:guest 443 conditionally

    Symmetric with APP_HTTP_PORT / APP_API_PORT: a new
    APP_HTTPS_PORT env var lets operators override the host port
    forwarded to guest 443. Unset means "let y-cluster apply its
    default" -- the YAML still omits the field when no port var is
    set, which matches the behaviour for the other ports.
    
    Without this, an operator who overrides any one of {HTTP, API}
    silently lost 443 forwarding (the YAML's portForwards block
    became canonical and didn't include 443; previously y-cluster's
    [6443:6443, 80:80, 443:443] default applied only when the bash
    emitted no portForwards at all).
    
    The host:guest match keeps standard ports inside the appliance
    unchanged; the host-side ip_unprivileged_port_start sysctl on
    modern Linux distros allows binding 443 without root the same
    way 80 already does.
    
    YHELP entries updated to surface the new knob.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    fe59a25
  6. feat(scripts/qemu-to-gcp): optional regional HTTPS LB at end-of-build

    Adds a post-deploy step that offers to stand up a GCP regional
    External Application Load Balancer in front of the appliance VM
    with a self-signed cert covering operator-supplied FQDNs.
    Idempotent (describe-then-create) so re-runs converge; teardown
    integrated into the existing teardown subcommand.
    
    Why a self-signed cert and a prompt-not-default
    
    The cert-manager → upload-real-cert path is the eventual
    production shape, but for the dev loop a self-signed cert lets
    the operator verify the LB stack + HTTPRoute hostname matching
    without DNS / CA dependencies. The step is opt-in because the LB
    starts a billing meter (forwarding rule billed ~hourly, plus the
    reserved IP) that the operator should deliberately accept; we
    don't want a forgotten ASSUME_YES run to silently provision one.
    
    Operator UX
    
      - Default: prompts after the HTTP probe with a one-paragraph
        explainer (cost, self-signed cert, HTTPRoute prerequisite),
        accepts comma-separated FQDNs, empty skips.
      - TLS_DOMAINS env var preset: skips the prompt and runs.
      - ASSUME_YES alone: skips silently (unattended e2e shouldn't
        surprise-bill).
      - Final banner prints the LB IP + a single /etc/hosts line
        covering all FQDNs, marks the cert SELF-SIGNED, points at
        the gcloud commands to swap in a real cert later.
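The three-way decision in the list above, plus the single /etc/hosts line for the banner, can be sketched as below. Only `TLS_DOMAINS` and `ASSUME_YES` come from the commit message; the function names `tls_lb_action` and `hosts_line` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only; function names are hypothetical.

# Decide what the TLS LB step should do. Prints one of: run, skip, prompt.
tls_lb_action() {
  if [[ -n "${TLS_DOMAINS:-}" ]]; then
    echo run       # preset FQDN list: no prompt needed
  elif [[ -n "${ASSUME_YES:-}" ]]; then
    echo skip      # unattended run without a preset: never surprise-bill
  else
    echo prompt    # interactive default: explain the cost, then ask
  fi
}

# Build the single /etc/hosts line for the final banner from an IP and a
# comma-separated FQDN list.
hosts_line() {
  local ip="$1" csv="$2"
  printf '%s %s\n' "$ip" "${csv//,/ }"
}
```

Checking `TLS_DOMAINS` before `ASSUME_YES` is what lets an unattended run opt in explicitly while a bare `ASSUME_YES` stays free of billable resources.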
    
    Resources, all named ${NAME}-tls-*
    
      proxy-only subnet (reuses any ACTIVE one in the region;
                         creates per-build only when none exists)
      static regional IP
      SSL cert (uploaded, self-signed)
      HTTP health check on /q/envoy/echo
      zonal NEG with the VM as endpoint
      backend service (EXTERNAL_MANAGED) + add-backend
      URL map (default-service points at the backend)
      target HTTPS proxy
      forwarding rule on :443
    
    Teardown
    
    do_tls_teardown is invoked from the existing do_teardown so a
    plain `appliance-qemu-to-gcp.sh teardown` cleans up the LB
    stack alongside the VM/image/object/disk. Order forces the
    forwarding rule first (stops the meter), then proxy / url-map /
    backend / NEG / health-check / cert / IP. Subnet last and only
    when it's the per-build one (we never delete a reused regional
    subnet). Each delete is idempotent: missing resources are not
    errors. The `Will DELETE:` inventory now lists `${NAME}-tls-*`
    when a forwarding rule of that shape exists.
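As a rough illustration of the ordered, idempotent teardown described above (not the actual do_tls_teardown): the resource name suffixes, the `delete_if_present` helper, and the `GCLOUD` override variable are all hypothetical, and the subnet handling is left out. The `gcloud compute` command groups themselves are real.

```shell
#!/usr/bin/env bash
# Sketch only. The order follows the commit message; the name suffixes,
# delete_if_present and the GCLOUD override are hypothetical.

GCLOUD="${GCLOUD:-gcloud}"

# Delete one resource, treating "already gone" as success.
# Usage: delete_if_present <compute group> <name> <--region=R | --zone=Z>
delete_if_present() {
  local group="$1" name="$2" location="$3"
  if "$GCLOUD" compute "$group" describe "$name" "$location" >/dev/null 2>&1; then
    "$GCLOUD" compute "$group" delete "$name" "$location" --quiet
  else
    echo "skip $group/$name (not found)"
  fi
}

# Forwarding rule first (stops the meter), then its dependencies in order.
tls_teardown() {
  local name="$1" region="--region=$2" zone="--zone=$3"
  delete_if_present forwarding-rules        "$name-tls-fwd"    "$region"
  delete_if_present target-https-proxies    "$name-tls-proxy"  "$region"
  delete_if_present url-maps                "$name-tls-urlmap" "$region"
  delete_if_present backend-services        "$name-tls-be"     "$region"
  delete_if_present network-endpoint-groups "$name-tls-neg"    "$zone"
  delete_if_present health-checks           "$name-tls-hc"     "$region"
  delete_if_present ssl-certificates        "$name-tls-cert"   "$region"
  delete_if_present addresses               "$name-tls-ip"     "$region"
  return 0
}
```

Each step describes before deleting, so a second `tls_teardown` run after a partial failure only touches what is still there.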
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    d30020b
  7. feat(scripts/qemu-to-gcp): :80 -> :443 LB redirect + Gateway-route probe

    Two related fixes for the GCP appliance smoke flow:
    
    1. do_tls_frontend now creates a :80 forwarding rule that 301s
       to :443. Previously the function set up only a :443 listener,
       so any `curl http://<lb-ip>/...` against the LB IP hung at TCP
       connect (no listener on 80). Hangs from `curl ... http://...`
       were diagnosed against the live ext-app01-* LB stack which
       has the same shape.
    
       Mechanism: GCP regional EXTERNAL_MANAGED URL maps can either
       forward (defaultService) or redirect (defaultUrlRedirect),
       not both, so the redirect needs its own URL map. The chain:
    
           :80 fwd -> tls-http-proxy -> tls-redirect URL map (301 to https)
           :443 fwd -> tls-proxy      -> tls-urlmap (existing, ->backend)
    
       `gcloud compute url-maps create` has no flag for default-
       redirect, hence the `url-maps import` from a heredoc.
       Hostname-agnostic on both ports: every request, any Host:,
       either redirects (on :80) or forwards to the VM (on :443).
       The VM's envoy-gateway is the only Host-aware hop.
    
       do_tls_teardown grew matching delete calls in dependency order
       (forwarding rules -> proxies -> URL maps) so re-runs converge
       cleanly.
    
    2. The post-deploy probe at the end of the GCP stage now
       enumerates HTTPRoute + GRPCRoute hostnames via SSH +
       `sudo k3s kubectl ... -o jsonpath` and probes each FQDN
       through `--resolve <fqdn>:80:$PUBLIC_IP`. Replaces the
       single-path `/q/envoy/echo` probe -- which only verified
       "envoy answers anything", not "every advertised route is
       reachable end-to-end".
    
       Reachability == any HTTP status code (2xx/3xx/4xx/5xx),
       not 200: a route that legitimately answers 302 / 401 / 404 is
       still proof the firewall + klipper-lb + envoy-gateway chain
       is working. Only `000` (timeout / refused) counts as
       unreachable. On any unreachable route the script logs a
       warning with diagnostic suggestions (firewall source-ranges
       narrowed, backend Service not Ready, workload still rolling
       out) and continues -- info-level surfacing today, gating /
       strict mode is a deliberately deferred follow-up.
    
    Falls back to the old `/q/envoy/echo` probe when the cluster has
    no Gateway-bound routes (a workload that hasn't applied yet).
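The reachability rule above (any HTTP status passes, only `000` fails) could be sketched as follows. The function names are hypothetical and the hostname enumeration over SSH is omitted; the curl options used (`--resolve`, `-w '%{http_code}'`, `--max-time`) are standard.

```shell
#!/usr/bin/env bash
# Sketch only; function names are hypothetical.

# Any real HTTP status proves the chain works; curl reports 000 when it
# never got a response (timeout, connection refused, ...).
is_reachable() {
  local code="$1"
  [[ "$code" =~ ^[1-5][0-9][0-9]$ ]]
}

# Probe one FQDN on port 80 through a fixed IP, bypassing DNS.
# Prints the status code (000 on failure).
probe_route() {
  local fqdn="$1" ip="$2" code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    --resolve "$fqdn:80:$ip" "http://$fqdn/")" || true
  echo "${code:-000}"
}

# Warn (but keep going) for every unreachable route.
probe_all() {
  local ip="$1" fqdn code
  shift
  for fqdn in "$@"; do
    code="$(probe_route "$fqdn" "$ip")"
    if is_reachable "$code"; then
      echo "ok   $fqdn -> HTTP $code"
    else
      echo "WARN $fqdn unreachable (check firewall source ranges, backend readiness, rollout)" >&2
    fi
  done
}
```

A call such as `probe_all "$PUBLIC_IP" dev.example.com app.example.com` (placeholder hostnames) reports each route and never aborts the caller, matching the info-level behaviour described.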
    
    Verified end-to-end against the live appliance: 4 routes
    enumerated (dev.yolean.net, ext-app01.yolean.se, keycloak-admin,
    keycloak-admin.ext-app01.yolean.se), all returned HTTP 302 on
    the first attempt. The redirect chain itself is intentionally
    NOT exercised against ext-app01-* in this commit (would require
    mutating an in-use LB the operator owns); it lands on the next
    do_tls_frontend run.
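For reference, a redirect-only URL map of the kind described in item 1 can be expressed as a small YAML document and loaded with `gcloud compute url-maps import`. The sketch below is an approximation rather than the committed code: it writes the YAML to a temporary file instead of piping a heredoc, and the function names and the example map name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only; function names and the temp-file handling are hypothetical.

# Print a URL map whose only behaviour is a 301 redirect to HTTPS.
render_redirect_urlmap() {
  local name="$1"
  cat <<EOF
name: ${name}
defaultUrlRedirect:
  httpsRedirect: true
  redirectResponseCode: MOVED_PERMANENTLY_DEFAULT
  stripQuery: false
EOF
}

# Import the map into a region. Requires an authenticated gcloud.
import_redirect_urlmap() {
  local name="$1" region="$2" tmp
  tmp="$(mktemp)"
  render_redirect_urlmap "$name" > "$tmp"
  gcloud compute url-maps import "$name" --source="$tmp" --region="$region" --quiet
  rm -f "$tmp"
}
```

Keeping the redirect in its own map leaves the existing forwarding map untouched, which fits the constraint that a single map either forwards or redirects by default.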
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    dd9db67
  8. feat(scripts/qemu-to-gcp): explicit --reuse-disk / --keep-disk; teardown-side PRESERVED message
    
    State preservation across appliance redeploys is the overarching
    design goal of the data-seed mechanism (commit f69addf +
    APPLIANCE_MAINTENANCE.md). What was missing on the operator-
    facing side: the QA-flow build script silently reused the
    persistent disk on every redeploy, masking the seed-skip from
    build-time-only operators who expected each run to validate the
    seed end-to-end. Conversely, the production "preserve customer
    state across upgrades" intent was never written down where it
    mattered (the operator only saw a generic banner at deploy time,
    not after teardown when the disk-keep decision is most actionable).
    
    Changes:
    
      - Build-flow `--reuse-disk=true|false` with an interactive
        prompt (default Y -- preserve, matching the design goal).
        On `--reuse-disk=false` the script delete-and-recreates the
        persistent disk so the next boot's data-seed unit lands the
        OS image's seed cleanly. Non-TTY callers MUST pass the flag
        explicitly; ASSUME_YES + missing flag fails fast rather than
        silently picking a default for an irreversible decision.
    
      - Teardown `--keep-disk=true|false`. Default behavior is
        unchanged (keep). Legacy `--delete-data-disk` continues to
        work as `--keep-disk=false` with a one-line deprecation
        notice, so any existing automation isn't broken.
    
      - Decoupled the new disk decisions from the existing
        `confirm()` helper (which consults ASSUME_YES). New
        `prompt_yes_default()` helper requires a TTY or an
        explicit flag, never falls back to ASSUME_YES. The umbrella
        ASSUME_YES still covers the existing 'Proceed?' + TLS-LB
        prompts.
    
      - Moved the "Persistent data disk PRESERVED" message from
        the build-success banner to the END of teardown when the
        disk was kept. That's the moment the operator's mental
        model needs the reminder ('what survived?' + 'how do I
        delete it later?'). The build success block keeps a brief
        one-line pointer to teardown's message instead of carrying
        the full paragraph.
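The "explicit flag or TTY, never ASSUME_YES" rule could be sketched like this. `resolve_bool_decision` is a hypothetical stand-in written for illustration; it is not the script's `prompt_yes_default`, whose actual implementation is not shown in this log.

```shell
#!/usr/bin/env bash
# Sketch only; resolve_bool_decision is a hypothetical stand-in, not the
# script's prompt_yes_default.

# Resolve an irreversible yes/no decision. Prints "true" or "false".
#   $1: flag value ("true", "false", or empty when the flag was not passed)
#   $2: question to ask on a TTY
# ASSUME_YES is deliberately never consulted.
resolve_bool_decision() {
  local flag="$1" question="$2" answer
  case "$flag" in
    true|false) echo "$flag"; return 0 ;;
    "") ;;
    *) echo "invalid value '$flag' (want true|false)" >&2; return 2 ;;
  esac
  if [[ ! -t 0 ]]; then
    echo "no TTY and no explicit flag; refusing to pick a default" >&2
    return 1
  fi
  read -r -p "$question [Y/n] " answer
  case "$answer" in
    n|N|no|NO) echo false ;;
    *) echo true ;;
  esac
}
```

For instance, `reuse="$(resolve_bool_decision "$REUSE_DISK_FLAG" 'Reuse the persistent disk?')" || exit 1` (with `REUSE_DISK_FLAG` as a placeholder for the parsed flag) fails fast in CI unless the flag was given.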
    
    Verified end-to-end against yo-sre-appliance-qa over the past
    two days: --reuse-disk=false correctly recreates the disk and
    the data-seed unit extracts the image's seed onto it; the
    recreated disk + grastate.dat workaround round-tripped
    mariadb's keycloak.REALM rows through prepare-export -> seed
    -> fresh-disk -> boot, with `keycloak/auth/realms/ext-bfv01`
    returning 200 from the resulting cluster.
    
    Two follow-up fixes are lined up but not in this commit (kept
    in the working tree for a separate commit): a `return 0` belt at the end
    of do_tls_teardown so its trailing `[[ -n "$subnet" ]] && ...`
    doesn't leak a non-zero exit and abort the caller before the
    new PRESERVED block fires; and the revert of the route-
    enumeration block that this same teardown-issue debugging
    surfaced as post-import SSH+kubectl scope-creep.
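The exit-status leak mentioned above is easy to reproduce in isolation. In this minimal sketch both functions are illustrative and not taken from the script: a function whose last command is a short-circuited `&&` list returns that list's status, so an empty variable turns into a failure.

```shell
#!/usr/bin/env bash
# Sketch only; both functions are illustrative, not the script's code.

# The last command is a short-circuited `&&` list. With an empty $subnet the
# test fails, so the function itself returns 1 even though nothing went wrong.
teardown_leaky() {
  local subnet="$1"
  [[ -n "$subnet" ]] && echo "deleting subnet $subnet"
}

# Same body with an explicit success status at the end.
teardown_belted() {
  local subnet="$1"
  [[ -n "$subnet" ]] && echo "deleting subnet $subnet"
  return 0
}
```

Under `set -e`, a bare call to `teardown_leaky ""` aborts the caller, while `teardown_belted ""` lets it continue to whatever comes next.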
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    475d677
  9. chore(scripts/qemu-to-gcp): bump build VM memory 4096 -> 8192

    The build VM occasionally OOMs during heavier customer
    workloads applied at PROMPT 1 (mariadb + kafka + envoy +
    the bundled controllers all in 4GB is tight). 8GB matches
    the y-cluster default for stand-alone provisions but the
    qemu-to-gcp script was overriding it down to 4GB to keep
    the host's headroom; the headroom is fine on the build
    host, so lift the override.
    
    The y-cluster default itself is unchanged (8192 in
    config.QEMUConfig.applyDefaults), so other provisioner
    flows (multipass, docker, plain qemu) are not affected.
    Disk size stays at 40GB.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    77bdca2
  10. fix(scripts/qemu-to-gcp): drop explicit stop before prepare-export

    PR #20 changed prepare-export to require the cluster RUNNING:
    its live phase clears the per-deploy dns-hint-ip annotation
    and snapshots reconciled Gateway state into <cacheDir>/<name>-
    gateway-state.json (both need the apiserver up). prepare-export
    then stops the VM itself before its offline (virt-customize)
    phase.
    
    The plan called for dropping `y-cluster stop` from the script
    ahead of prepare-export, but the script edit never landed. The
    result: every run of appliance-qemu-to-gcp.sh would stop the
    cluster, then crash with "VM not running; start the cluster
    first" when prepare-export ran against the stopped VM.
    
    Drop the explicit stop call. Update the docstring stage list
    to reflect that prepare-export does its own stop.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    6c2e3fd
  11. feat(scripts/qemu-to-gcp): TLS_DOMAINS=auto derives from gateway state

    Drops the parallel-list footgun: today the operator declares
    hostnames in HTTPRoute manifests AND in TLS_DOMAINS, and drift
    between the two means the LB cert covers hostnames the cluster
    doesn't serve, or vice versa.
    
    Setting TLS_DOMAINS=auto now resolves the FQDN list by calling
    `y-cluster gateway hostnames --csv` against the just-provisioned
    cluster, immediately after PROMPT 1 confirmation. The cluster's
    reconciled HTTPRoute / GRPCRoute hostnames become the LB cert
    SAN list -- one source of truth.
    
    Resolution runs BEFORE prepare-export because by the TLS LB
    stage (after prepare-export + GCP deploy) the local apiserver
    is gone. Other TLS_DOMAINS values (literal CSV / empty /
    prompt) are still handled at the LB stage as before.
    
    Empty result aborts with an explicit error (operator asked for
    auto, none found = something wrong with the cluster state).
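A minimal sketch of the resolution step, assuming `y-cluster gateway hostnames --csv` needs no further arguments (the commit message does not say). The function name and the `Y_CLUSTER` override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. `y-cluster gateway hostnames --csv` is the command named in the
# commit message; the function name and Y_CLUSTER override are hypothetical.

Y_CLUSTER="${Y_CLUSTER:-y-cluster}"

# Resolve TLS_DOMAINS=auto into a literal CSV while the local apiserver is
# still up. Any other value passes through untouched.
resolve_tls_domains() {
  local value="${1:-}" csv
  if [[ "$value" != "auto" ]]; then
    echo "$value"
    return 0
  fi
  csv="$("$Y_CLUSTER" gateway hostnames --csv)" || return 1
  if [[ -z "$csv" ]]; then
    echo "TLS_DOMAINS=auto but the cluster reports no hostnames" >&2
    return 1
  fi
  echo "$csv"
}
```

Running `TLS_DOMAINS="$(resolve_tls_domains "${TLS_DOMAINS:-}")" || exit 1` before prepare-export would leave the later LB stage with a plain CSV to work from.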
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    6b0c5d2
  12. feat(scripts/qemu-to-gcp): APPLIANCE_SEED_CMD + APPLIANCE_VERIFY_CMD hooks
    
    The unattended flow already had ASSUME_YES + TLS_DOMAINS=auto,
    but no hook to do any work during PROMPT 1's hands-on window.
    Result: a build with ASSUME_YES=1 reached prepare-export with
    only the y-cluster echo HTTPRoute applied; TLS_DOMAINS=auto
    then aborted because the cluster had no non-wildcard hostnames
    to derive from.
    
    Add the two hooks documented in
    specs/y-cluster/FEATURE_APPLIANCE_AUTOMATED_FLOW.md:
    
    - APPLIANCE_SEED_CMD runs after echo install, before PROMPT 1.
      Customer workloads applied here populate /data/yolean for the
      data-seed extraction AND give TLS_DOMAINS=auto real hostnames.
    - APPLIANCE_VERIFY_CMD runs at the end, after the GCP deploy
      + optional TLS LB. Receives the LB IP / VM IP / domains via
      the Y_CLUSTER_CURRENT_* surface so a remote probe can curl
      --resolve through the deployed VM without /etc/hosts.
    
    Both fire via `bash -c "$cmd"` so the operator-supplied string
    can pipe / chain / cd freely. Both export a single, consistent
    Y_CLUSTER_CURRENT_* env surface (via the new current_env
    helper) -- a verify script `printenv | grep ^Y_CLUSTER_CURRENT_`
    sees the full surface either way; vars not yet known at the
    seed hook (REMOTE_VM_IP, etc.) are exported as empty strings.
    
    Non-zero exit aborts under set -e. Local cluster / VM / LB
    stay up for inspection.
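The hook mechanics above could be sketched as below. Only the `Y_CLUSTER_CURRENT_` prefix and `bash -c` come from the commit message; the three variable names, the shell variables they are filled from, and the helper names `export_current_env` and `run_hook` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. The Y_CLUSTER_CURRENT_ prefix comes from the commit message;
# the individual variable names and both helpers are hypothetical.

# Export the full surface every time; values not known yet stay empty so a
# hook sees the same set of names at both call sites.
export_current_env() {
  export Y_CLUSTER_CURRENT_REMOTE_VM_IP="${REMOTE_VM_IP:-}"
  export Y_CLUSTER_CURRENT_LB_IP="${LB_IP:-}"
  export Y_CLUSTER_CURRENT_DOMAINS="${TLS_DOMAINS:-}"
}

# Run an operator-supplied command string, if any. `bash -c` lets the string
# use pipes, `&&` chains and `cd`. A non-zero exit propagates to the caller.
run_hook() {
  local label="$1" cmd="${2:-}"
  if [[ -z "$cmd" ]]; then
    return 0
  fi
  export_current_env
  echo "running $label hook" >&2
  bash -c "$cmd"
}
```

The build would then call `run_hook seed "${APPLIANCE_SEED_CMD:-}"` before PROMPT 1 and `run_hook verify "${APPLIANCE_VERIFY_CMD:-}"` at the end; under `set -e` a failing hook stops the run at that point.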
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Yolean k8s-qa and claude committed May 12, 2026
    900ff6a
  13. fix(scripts/qemu-to-gcp): default to e2-standard-2 (e2-medium OOMs the stack)
    
    Observed an appliance build that ran fine for ~2h at 91-93% memory
    on a 4 GiB e2-medium, then died at 100% CPU / 3807 MiB used: ssh
    banner exchange timed out, :443 + :6443 went REFUSED while :80
    kept LISTEN with the userspace too starved to respond. Classic OOM
    spiral. The full appliance stack (k3s + containerd + keycloak +
    envoy gateway + envoy proxy + mysql + kafka) sits within ~300 MiB
    of the 4 GiB ceiling at idle; any workload spike pushes it over.
    
    e2-standard-2 (2 vCPU / 8 GiB) gives the stack the headroom it
    needs. GCE machine types bundle CPU + memory, so there's no
    separate memory override -- that's spelled out in both the help
    text and the default-assignment comment so the next operator
    reading either spot sees why we don't surface a GCP_MEMORY knob.
    GCP_MACHINE_TYPE stays as the escape hatch for highmem / larger
    shapes.
    Yolean k8s-qa committed May 12, 2026
    4654271
  14. GCP NEG endpoint re-attach is idempotent on re-runs (VM is recreated each build)
    Yolean k8s-qa committed May 12, 2026
    009bea4
  15. fix(scripts/qemu-to-gcp): drop apostrophes from YHELP block

    The previous commit added "e2-medium's" and "there's" inside the
    single-quoted YHELP block. A single-quoted string in bash can't
    contain a single quote, so the first apostrophe terminated the
    string mid-block;
    the resumed unquoted "4 GiB OOMs ..." got parsed as a command,
    and any consumer that sourced or executed the help block saw
    "line 76: 4: command not found".
    
    Reworded to avoid the apostrophes entirely. bash -n parses the
    file clean and --help renders the section as intended.
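The commit avoids the problem by rewording. For comparison, two general bash patterns that tolerate apostrophes are sketched below; the help text is made up for illustration and neither pattern is claimed to be what the script uses.

```shell
#!/usr/bin/env bash
# Sketch only; the help text here is made up for illustration.

# Broken shape, kept as a comment because it does not parse as intended:
#   YHELP='... e2-medium's 4 GiB OOMs ...'
# The apostrophe closes the quoted string early and `4` runs as a command.

# A here-document with a quoted delimiter takes the text literally,
# apostrophes and $ signs included.
yhelp() {
  cat <<'EOF'
GCP_MACHINE_TYPE  GCE machine type (there's no separate memory knob;
                  e2-medium's 4 GiB is too small for the stack)
EOF
}

# When a single-quoted string is required, close the quote, add an escaped
# apostrophe, and reopen it: '\''
YHELP_LINE='e2-medium'\''s 4 GiB is too small'
```

Running `bash -n` on the file, as the commit does, remains a cheap guard against this class of quoting mistake.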
    Yolean k8s-qa committed May 12, 2026
    ee8c01e
  16. feat(scripts): fail-fast schema checks on GCP_KEY and H_S3_REGION

    Both files-pointed-at-by-env-var inputs surfaced the same
    foot-gun: a malformed value passed the existence check but
    failed deep inside the tool we shelled into, with a less
    helpful message:
    
      - GCP_KEY pointing at a truncated / wrong-format JSON
        (e.g. a re-exported key that lost its private_key during
        a copy-paste) only erred at `gcloud auth
        activate-service-account`, by which point the operator
        has already proven the file exists. Now `jq -e` checks
        that the four fields GCP requires for a service-account
        auth are populated -- type=service_account, project_id,
        client_email, private_key -- and errors with the missing
        field names so the operator knows what to fix.
    
      - H_S3_REGION accepted any string and only surfaced "could
        not resolve host" when the upload URL hit a non-existent
        endpoint hostname. The help text already documents the
        valid set (fsn1, hel1, nbg1); now the script enforces
        it at config-load time with a message naming the valid
        values.
    
    Both checks fire BEFORE any cloud-side state change. Adds no
    new dependency: jq is already required by the broader
    appliance flow.
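Both checks could be approximated as follows. This is a sketch rather than the committed code: the function names are hypothetical, and the jq filter is one possible way to report missing field names (the commit mentions `jq -e` without showing its filter).

```shell
#!/usr/bin/env bash
# Sketch only; function names are hypothetical and the jq filter is one
# possible way to name the missing fields.

# Print the comma-separated names of missing or wrong fields in a
# service-account key file; print nothing when the key looks complete.
gcp_key_missing_fields() {
  local key_file="$1"
  jq -r '
    . as $k
    | ["project_id", "client_email", "private_key"]
    | map(select(($k[.] // "") == ""))
    | if $k.type == "service_account" then . else ["type"] + . end
    | join(",")
  ' "$key_file"
}

check_gcp_key() {
  local key_file="$1" missing
  if ! missing="$(gcp_key_missing_fields "$key_file")"; then
    echo "GCP_KEY=$key_file is not a valid JSON object" >&2
    return 1
  fi
  if [[ -n "$missing" ]]; then
    echo "GCP_KEY=$key_file is missing or has wrong: $missing" >&2
    return 1
  fi
}

# Accept only the regions listed in the help text.
check_h_s3_region() {
  case "${1:-}" in
    fsn1|hel1|nbg1) return 0 ;;
    *) echo "H_S3_REGION='${1:-}' is invalid; valid values: fsn1, hel1, nbg1" >&2
       return 1 ;;
  esac
}
```

Calling `check_gcp_key "$GCP_KEY"` and `check_h_s3_region "$H_S3_REGION"` right after config load keeps both failures ahead of any gcloud or upload call.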
    Yolean k8s-qa committed May 12, 2026
    61dd495