
Releases: Deep-CodeAI/Agents.KT

v0.7.21

02 Jun 20:31
3a59bcd


Security + de-slop release. Headlined by a nested-agent recursion bound (#3377) and the explicit
skill-routing failure on ambiguity (#3087), plus a build-wide one-type-per-file refactor (#3199) and
new release/quality guards (#3084 / #3089), the start of the AgenticLoop decomposition (#3376), and
honest README positioning (#3085 / #3086). Internal refactors are behavior-preserving; the two
behavior changes (routing, the maxAgentDepth default) are called out below. Drop-in on the 0.7.x line.

Fixed — bound nested agent recursion with maxAgentDepth (#3377, security)

  • Budgets bounded a single agentic loop, but a tool that re-invokes an agent (Swarm absorb,
    agent-as-tool) spun up a fresh loop with a fresh budget — so a self-re-entering agent (A→A) or a
    cycle (A→B→A) recursed one full LLM loop per level until StackOverflowError, a DoS / runaway-cost
    vector (triggerable e.g. by prompt injection into a tool result). Now AgentRuntimeContext carries
    a nested-invocation depth (incremented in newRuntimeContext), and budget { maxAgentDepth }
    (default 16) is enforced at the invocation chokepoint: exceeding it throws
    BudgetExceededException(BudgetReason.AGENT_DEPTH) before the over-deep loop starts — fast, no
    extra LLM calls, no overflow. This is an unconditional safety stop (not extendable via onBudgetExceeded),
    and budget caps now bypass the onError tool-recovery ladder so a nested cap can't be swallowed.
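A minimal sketch of the idea behind the depth bound, assuming a simplified context that carries only the nesting depth. `RuntimeCtx`, `invokeNested`, and `AgentDepthExceeded` are hypothetical stand-ins, not the library's AgentRuntimeContext / BudgetExceededException API; whether the top-level run counts as depth 0 or 1 is also an assumption. What it shows is the ordering: the check runs before the nested loop starts, so an A→A cycle fails fast instead of overflowing the stack.

```kotlin
// Hypothetical stand-in for BudgetExceededException(BudgetReason.AGENT_DEPTH).
class AgentDepthExceeded(val depth: Int, val max: Int) :
    RuntimeException("nested agent depth $depth exceeds maxAgentDepth=$max")

// Hypothetical, simplified runtime context: only the nesting depth is modelled.
data class RuntimeCtx(val depth: Int = 0, val maxAgentDepth: Int = 16) {
    // Each nested invocation gets depth + 1 (mirrors "incremented in newRuntimeContext").
    fun nested(): RuntimeCtx = copy(depth = depth + 1)
}

// The chokepoint: enforce the bound *before* running the (expensive) nested loop.
fun <T> invokeNested(ctx: RuntimeCtx, loop: (RuntimeCtx) -> T): T {
    val child = ctx.nested()
    if (child.depth > child.maxAgentDepth) throw AgentDepthExceeded(child.depth, child.maxAgentDepth)
    return loop(child)
}

// A self-re-entering "agent" (A -> A): without the bound it would recurse until StackOverflowError.
fun selfReentrantAgent(ctx: RuntimeCtx, levels: MutableList<Int>) {
    levels += ctx.depth // stands in for one full LLM loop per level
    invokeNested(ctx) { child -> selfReentrantAgent(child, levels) }
}
```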

Changed — AgenticLoop decomposition: extract rendering + coercion (#3376 batch 1)

  • First slice of breaking up the 1369-line AgenticLoop.kt / 765-line executeAgentic. Extracted
    the pure tool-result/error renderers into ToolResultRendering (formatEscalatedToolError,
    formatDeniedToolError, wrapUntrustedToolResult, renderToolResultForLlm) and output coercion
    into OutputCoercion (parseOutput, coerceSubstituteOutput) — each a new internal file. These
    were private to the loop (untestable); they now have direct unit tests (ToolResultRenderingTest,
    OutputCoercionTest, TDD RED→GREEN). Behavior-preserving — AgenticLoop delegates. Internal
    refactor, no public API change.

Changed — one-type-per-file complete across the codebase (#3199, final batch)

  • Split every remaining multi-type file (rest of model/, all of core/, content/, composition/,
    generation/, runtime/, sandbox/, testing/, and the manifest / observability / langfuse
    / langsmith / detekt submodules) into one top-level type per file — ~110 new files, all
    same-package moves (no FQN / public-API change). checkOneTypePerFile now passes with an empty
    allowlist: zero multi-type files remain anywhere. Renamed 3 files so the filename matches the
    kept type (Snapshot.kt → SessionSnapshot.kt, Memory.kt → MemoryBank.kt,
    HumanApproval.kt → ApprovalBuilder.kt), which also satisfies detekt's MatchingDeclarationName.
  • Minor, non-public visibility consequence of the moves: a handful of file-private helpers that
    were referenced across now-separate files were promoted to internal (still module-scoped, not
    public): the manifest engines (ManifestVerifier/StableJson/ManifestJsonParser/StableYaml/
    ManifestGraph), the policy JSON/YAML helpers (ManifestMaps/ManifestJson/ManifestYaml),
    RuntimeContextThreadLocal, KnowledgeEntry, and the nonBlank helper.
  • Behavior-preserving: full ./gradlew build green (all modules + all tests + detekt 423/423 +
    checkOneTypePerFile 0 + checkReadmeVersion). Completes #3199.

Changed — one-type-per-file: split model error/cache types (#3199, batch 3)

  • Split three agents_engine.model files into one type per file (same package — no FQN/public-API
    change): ToolError.kt → Severity, EscalationException, ToolExecutionException (ToolError
    sealed union stays); CacheHint.kt → CacheSegment (CacheHint stays); OnErrorBuilder.kt →
    RepairResult, RepairScope, ToolErrorHandler (OnErrorBuilder + the executeAgentFix helper
    stay). Allowlist 40 → 37. Behavior-preserving pure moves; detekt baseline unchanged.

Changed — one-type-per-file: split McpServer.kt (#3199, batch 2b)

  • Split the four secondary types out of mcp/McpServer.kt (same package) — RegisteredPrompt,
    RegisteredResource, McpExposeBuilder, ExposedSkill → one file each; McpServer stays
    (597 → 454 lines). Four now-unused imports (constructFromMap, jsonSchema, KClass,
    hasGenerableAnnotation, all moved to ExposedSkill) removed. Completes mcp/ — allowlist
    41 → 40. Behavior-preserving pure moves.

Changed — one-type-per-file: split the mcp/ package (#3199, batch 2)

  • Split five multi-type files in agents_engine.mcp into one type per file (same package — zero
    import churn, no FQN/public-API change): AgentMcpDsl.kt → McpServerBuilder.kt; JsonRpc.kt →
    JsonRpcWire / JsonRpcErrorCode / McpException (+ JsonRpc stays); McpClient.kt →
    McpToolDescriptor; McpRunner.kt → RunnerConfig + McpRunnerBuilder; McpServerSecurity.kt →
    ClientPrincipal / McpHttpRequestContext / McpAuthDecision / McpServerAuth (original file
    removed). Allowlist 46 → 41. mcp/McpServer.kt (597 lines, ExposedSkill needs import surgery)
    is deferred to batch 2b. Behavior-preserving pure moves.

Changed — one-type-per-file convention + checkOneTypePerFile guard (#3199, batch 1)

  • New checkOneTypePerFile Gradle guard (wired into check) fails the build if a main-source .kt
    file declares >1 top-level type and isn't on config/one-type-per-file-allowlist.txt. The allowlist
    is a ratchet that may only shrink — it also fails on a stale entry (a listed file that no longer
    violates), so a split must record its own burndown. Documented sealed-ADT exceptions stay listed.
    Mirrors checkReadmeVersion / checkDetektBaseline.
  • Batch 1 split: mcp/McpServerInfo.kt (12 MCP wire DTOs) → one type per file in the same
    agents_engine.mcp package — zero import churn, no FQN/public-API change. New docs/source-layout.md
    documents the convention, exceptions, and the guard. Remaining multi-type files burn down
    package-by-package in follow-up batches under #3199.
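The ratchet rule can be sketched as a pure function, assuming the per-file count of top-level types has already been computed (a real guard has to parse the .kt sources, which is omitted here). `checkOneTypePerFileSketch` and `RatchetReport` are hypothetical names; the sketch only captures the two failure modes described above: an unlisted multi-type file, and a listed file that no longer violates.

```kotlin
// Result of one ratchet run: both lists must be empty for the build to pass.
data class RatchetReport(val violations: List<String>, val staleEntries: List<String>) {
    val ok: Boolean get() = violations.isEmpty() && staleEntries.isEmpty()
}

// typeCounts: source file -> number of top-level types it declares (assumed precomputed).
fun checkOneTypePerFileSketch(typeCounts: Map<String, Int>, allowlist: Set<String>): RatchetReport {
    val multiType = typeCounts.filterValues { it > 1 }.keys
    // New violation: a multi-type file that is not on the allowlist.
    val violations = (multiType - allowlist).sorted()
    // Stale entry: allowlisted but already split (or deleted), so the list must shrink.
    val stale = (allowlist - multiType).sorted()
    return RatchetReport(violations, stale)
}
```

Failing on stale entries is what makes the list a one-way ratchet: a split that forgets to remove its allowlist line breaks the build, so the burndown is always recorded.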

Changed — skill resolution extracted into SkillResolver (#3088 stage 2, de-slop #3083)

  • The skill-resolution cluster — type-compatible candidate filter, manual skillSelection { }
    selector, LLM router (confidence gate), the before-skill-interceptor ProceedWith compatibility
    check, and the fail-loud ambiguity error — moved out of Agent's God-object body into its own
    SkillResolver collaborator (new SkillResolver.kt). Agent keeps a private val skillResolver
    and delegates. Internal refactor, behavior-preserving — every branch, condition, exception
    type, and message is identical; no public DSL change. Agent.kt is now 1017 lines (1116 → 1017
    across #3088 stages 1+2). Completes the staged decomposition of #3088.
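A rough sketch of a fail-loud resolution step, assuming a simplified model in which a skill is just a name plus the input class it accepts. The order shown (type filter, then manual selector, then a confidence-gated router, then an error) and the 0.7 threshold are assumptions for illustration; `resolveSkill`, `Skill`, `RouteDecision`, and `AmbiguousSkillException` are hypothetical, and the interceptor compatibility check is left out.

```kotlin
data class Skill(val name: String, val accepts: Class<*>)
data class RouteDecision(val skill: String, val confidence: Double)
class AmbiguousSkillException(message: String) : IllegalStateException(message)

fun resolveSkill(
    input: Any,
    skills: List<Skill>,
    selector: ((List<Skill>) -> Skill?)? = null,          // stands in for skillSelection { }
    router: ((List<Skill>) -> RouteDecision?)? = null,    // stands in for the LLM router
    minConfidence: Double = 0.7,                          // assumed gate, not the library default
): Skill {
    // Type-compatible candidate filter.
    val candidates = skills.filter { it.accepts.isInstance(input) }
    require(candidates.isNotEmpty()) { "no skill accepts ${input.javaClass.simpleName}" }
    if (candidates.size == 1) return candidates.single()
    // A manual selector wins if it picks something.
    selector?.invoke(candidates)?.let { return it }
    // The router only counts when it is confident and names a real candidate.
    val routed = router?.invoke(candidates)
        ?.takeIf { it.confidence >= minConfidence }
        ?.let { decision -> candidates.firstOrNull { it.name == decision.skill } }
    if (routed != null) return routed
    // Fail loud instead of silently picking the first candidate.
    throw AmbiguousSkillException("ambiguous input; candidates: ${candidates.joinToString { it.name }}")
}
```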

Changed — README de-slop: honest positioning + accuracy fixes (#3085, #3086, de-slop #3083)

  • Replaced the unqualified hero copy ("The auditable Kotlin agent runtime for regulated teams") with
    a defensible positioning line ("The typed agent runtime for the JVM") plus an up-front pointer to
    the Security Model and threat model, and an explicit "not a compliance product / does not OS-sandbox
    arbitrary tool code" caveat in the intro. The honest enforce/don't-enforce tables already existed;
    the hero no longer contradicts them (#3085).
  • Fixed accuracy drift between "Implemented today" and the limitations/roadmap (#3086): "Four LLM
    providers shipped" → six (adds Kimi + OpenRouter); "Text-only I/O today" → image/document input
    shipped, audio + generation still roadmap; Kotlin badge 2.12.3; Phase 2 roadmap no longer
    lists already-shipped image multimodal as planned. No fabricated benchmark claims were found.

Added — explicit securityCheck gate, checkDetektBaseline burndown, and TESTING.md (#3089, de-slop #3083)

  • New securityCheck aggregate task makes the deterministic security suite addressable on its
    own — sandbox write-confinement (ProcessSandbox: Seatbelt / bwrap / firejail / fallback), tool-
    policy enforcement (#1916), snapshot manifest guard, the arg-size cap (#2888), the tamper-evident
    audit ledger (:agents-kt-observability:securityTest), and the static tool-body rules
    (:agents-kt-detekt:test + detekt). OS-specific confinement skips cleanly off-platform; run
    securityCheck on a macOS job to exercise Seatbelt in CI.
  • New checkDetektBaseline task (wired into check) fails if detekt-baseline.xml grows beyond
    the recorded ceiling (424) — the baseline may only shrink, so new violations get fixed rather than
    grandfathered.
  • New TESTING.md documents, honestly, what the default gate runs and excludes
    (live-llm / live-mcp / interactive are out; live-cloud-api is deliberately in), the
    security gate, the OS-specific confinement matrix, and the baseline ratchet.
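A minimal sketch of the baseline ceiling check, assuming the ceiling is compared against the number of `<ID>` rows in detekt's baseline XML (`<SmellBaseline>` with `<ManuallySuppressedIssues>` and `<CurrentIssues>` sections). Whether the real task counts IDs or something else is an assumption, and `baselineEntryCount` / `checkBaselineCeiling` are hypothetical names.

```kotlin
// Counts <ID>…</ID> rows; XML-escaped content cannot contain a raw '<'.
fun baselineEntryCount(xml: String): Int = Regex("<ID>[^<]*</ID>").findAll(xml).count()

// Passes while the baseline is at or under the recorded ceiling; growth is a failure.
fun checkBaselineCeiling(xml: String, ceiling: Int): Result<Int> {
    val n = baselineEntryCount(xml)
    return if (n <= ceiling) {
        Result.success(n)
    } else {
        Result.failure(
            IllegalStateException("detekt baseline grew to $n entries (ceiling $ceiling); fix new findings instead of baselining them")
        )
    }
}
```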

Added — checkPublishedVersion release gate + release runbook (#3084, de-slop #3083)

  • New checkPublishedVersion Gradle task HEADs Maven Central for ai.deep-code:agents-kt and
    agents-kt-ksp at the current project version and fails unless both resolve (HTTP 200). It is
    not wired into check — it needs network and would (correctly) fail on an unreleased version
    during dev — so it's the manual last gate before anything user-facing names a new version.
    Override the base URL with -PcentralBaseUrl=…. Complements checkReadmeVersion (#2873): one
    stops the README drifting from the build, the other stops the build advertising a version Central
    can't serve — the exact drift (README/Gradle at 0.7.2 while Central served 0.7.1) an...

v0.7.2

02 Jun 09:52
59e4c41


Tool-security hardening — the self-contained first phase of the capability-ABI epic (#2882),
all additive and back-compat: a tamper-evident audit ledger, an argument-size cap, and the
static tool-body guard rails. Plus a release guard so the README's advertised version can't drift
from the build.

Added — release guard: README dependency version must match the Gradle version (#2873)

  • New checkReadmeVersion task (wired into check) fails the build if the
    ai.deep-code:agents-kt:<version> snippet in README.md differs from the Gradle project
    version — the exact drift an external 0.7.0 review flagged. README and version now move together.
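The README check boils down to extracting versions from the dependency coordinate and comparing them with the build's version. This is a hedged sketch, not the actual Gradle task: `verifyReadmeVersion` is a hypothetical name, and treating a missing snippet as a failure is an assumption. Because the pattern requires a colon right after `agents-kt`, a coordinate like `agents-kt-ksp` does not match.

```kotlin
// Captures the version in every ai.deep-code:agents-kt:<version> coordinate.
val AGENTS_KT_COORD = Regex("""ai\.deep-code:agents-kt:([0-9A-Za-z.+-]+)""")

fun verifyReadmeVersion(readme: String, projectVersion: String) {
    val found = AGENTS_KT_COORD.findAll(readme).map { it.groupValues[1] }.toList()
    // Assumption: a README with no snippet at all is treated as a failure too.
    check(found.isNotEmpty()) { "README has no ai.deep-code:agents-kt:<version> snippet" }
    val stale = found.filter { it != projectVersion }
    check(stale.isEmpty()) { "README advertises $stale but the Gradle version is $projectVersion" }
}
```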

Added — ToolCapabilityExtractor: static capability classification (#2884, epic #2882)

  • New ToolCapabilityExtractor in agents-kt-detekt statically classifies what a tool's executor
    body actually does — FS_READ / FS_WRITE / NETWORK / ENVIRONMENT / EXEC — by walking its
    call expressions and matching callee names (writeText/Files.write → write, readText/
    readAllBytes → read, URL/openConnection → network, getenv → env, ProcessBuilder/exec →
    exec). It is the reusable input that the upcoming ToolPolicy↔capability comparator (#2887) checks against the
    declared policy. Syntactic by design (callee-name match, no FQN resolution) and intentionally
    conservative — reflection / aliasing / transitive state are Pillar-3 residual.
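The callee-name classification can be sketched as a table lookup over call-expression text, assuming the callee strings have already been collected from the executor body (the real extractor walks the Kotlin syntax tree; that part is omitted). `Capability`, `CALLEE_CAPABILITIES`, and `classifyCallees` are hypothetical. Like the extractor described above, it is purely syntactic: it matches the full callee text or its last dotted segment, never a resolved FQN.

```kotlin
enum class Capability { FS_READ, FS_WRITE, NETWORK, ENVIRONMENT, EXEC }

// Callee-name table following the mapping listed above.
val CALLEE_CAPABILITIES: Map<String, Capability> = mapOf(
    "writeText" to Capability.FS_WRITE,
    "write" to Capability.FS_WRITE,           // e.g. Files.write(...); conservative, any `write` matches
    "readText" to Capability.FS_READ,
    "readAllBytes" to Capability.FS_READ,
    "URL" to Capability.NETWORK,              // constructor call URL(...)
    "openConnection" to Capability.NETWORK,
    "getenv" to Capability.ENVIRONMENT,
    "ProcessBuilder" to Capability.EXEC,
    "exec" to Capability.EXEC,
)

// Match on the full callee text first, then on its last dotted segment (no FQN resolution).
fun classifyCallees(callees: List<String>): Set<Capability> =
    callees.mapNotNull { callee ->
        CALLEE_CAPABILITIES[callee] ?: CALLEE_CAPABILITIES[callee.substringAfterLast('.')]
    }.toSet()
```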

Added — ToolAuditLedger: tamper-evident, Merkle-chained tool-action log (#2886, epic #2882)

  • New ToolAuditLedger (in agents-kt-observability, sibling to JsonlAuditExporter) — an
    append-only, Merkle-chained, PII-safe record of every tool action. Each row's
    entryHash = SHA-256(prevHash ‖ sequence ‖ callId ‖ toolName ‖ decision ‖ denialReason ‖ resultHash ‖ timestamp) chains to the previous, so ToolAuditLedger.verify(path) recomputes the
    chain and pinpoints the first edited / inserted / deleted / reordered row. The tool result is
    stored only as a hash, never raw (Pillar 2 of #2882).
  • Auto-wire with agent.events.ledger(file) — records PipelineEvent.ToolCalled as APPROVED,
    ToolDenied as DENIED (with reason), ToolHallucinated as HALLUCINATED, and returns the
    ledger for later verify(...). (callId-keying of denied/hallucinated rows lands once
    PipelineEvent carries the callId — a scoped #2886 follow-up.)
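The chaining and verification steps can be sketched with the JDK's `MessageDigest`, using the field order given above. Assumptions, all hypothetical: fields are joined with a unit separator (`\u001F`) rather than raw concatenation, the first row's prevHash is 64 zeros, and rows live in memory rather than in a file. `appendRow` / `firstBrokenRow` are not the ledger's real API. As in the ledger, only a hash of the tool result is kept.

```kotlin
import java.security.MessageDigest

data class LedgerRow(
    val sequence: Long,
    val callId: String,
    val toolName: String,
    val decision: String,
    val denialReason: String,
    val resultHash: String,
    val timestamp: String,
    val prevHash: String,
    val entryHash: String,
)

// Assumed genesis value for the first row's prevHash.
val GENESIS_HASH = "0".repeat(64)

fun sha256Hex(s: String): String =
    MessageDigest.getInstance("SHA-256").digest(s.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it.toInt() and 0xff) }

// entryHash = SHA-256(prevHash ‖ sequence ‖ callId ‖ toolName ‖ decision ‖ denialReason ‖ resultHash ‖ timestamp)
fun rowHash(prev: String, seq: Long, callId: String, tool: String, decision: String,
            denial: String, resultHash: String, ts: String): String =
    sha256Hex(listOf(prev, seq.toString(), callId, tool, decision, denial, resultHash, ts).joinToString("\u001F"))

fun appendRow(rows: MutableList<LedgerRow>, callId: String, tool: String, decision: String,
              denial: String, result: String, ts: String) {
    val prev = rows.lastOrNull()?.entryHash ?: GENESIS_HASH
    val seq = rows.size.toLong()
    val resultHash = sha256Hex(result) // the raw result is never stored, only its hash
    rows += LedgerRow(seq, callId, tool, decision, denial, resultHash, ts, prev,
        rowHash(prev, seq, callId, tool, decision, denial, resultHash, ts))
}

// Index of the first edited / inserted / deleted / reordered row, or null if the chain is intact.
fun firstBrokenRow(rows: List<LedgerRow>): Int? {
    var prev = GENESIS_HASH
    rows.forEachIndexed { i, r ->
        val expected = rowHash(prev, i.toLong(), r.callId, r.toolName, r.decision, r.denialReason, r.resultHash, r.timestamp)
        if (r.prevHash != prev || r.sequence != i.toLong() || r.entryHash != expected) return i
        prev = r.entryHash
    }
    return null
}
```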

Added — maxToolArgsBytes tool-argument size cap (#2888, epic #2882)

  • New budget { maxToolArgsBytes = … } (Long?, default null = off) hard-caps a single tool
    call's argument byte size, checked at one chokepoint (executeToolWithBudget) before the
    executor runs — so an oversized (often prompt-injected) call is rejected, not executed. Resource-
    exhaustion guard (attack A5). Unconditional like perToolTimeout — not extendable via
    onBudgetExceeded; surfaces as BudgetExceededException(reason = BudgetReason.TOOL_ARGS_SIZE).
    Size is the provider wire form (ToolCall.rawArguments) when present, else the serialized arg map.
    Gates both the session and regular executor paths; back-compat (null = unbounded).
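A sketch of the size check at a single chokepoint, assuming UTF-8 byte length as the size measure and an injected serializer for the fallback path. `executeWithArgCap`, `toolArgsBytes`, and `ToolArgsTooLarge` are hypothetical stand-ins for executeToolWithBudget and BudgetExceededException(TOOL_ARGS_SIZE). The important property is that the executor is never invoked for an oversized call.

```kotlin
// Hypothetical stand-in for BudgetExceededException(reason = BudgetReason.TOOL_ARGS_SIZE).
class ToolArgsTooLarge(val size: Long, val max: Long) :
    RuntimeException("tool arguments are $size bytes, cap is $max")

// Prefer the provider wire form; otherwise size a serialized form of the parsed args.
fun toolArgsBytes(rawArguments: String?, args: Map<String, Any?>,
                  serialize: (Map<String, Any?>) -> String): Long =
    (rawArguments ?: serialize(args)).toByteArray(Charsets.UTF_8).size.toLong()

// null cap = unbounded (back-compat); otherwise reject before the executor runs.
fun <R> executeWithArgCap(
    maxToolArgsBytes: Long?,
    rawArguments: String?,
    args: Map<String, Any?>,
    serialize: (Map<String, Any?>) -> String,
    executor: (Map<String, Any?>) -> R,
): R {
    if (maxToolArgsBytes != null) {
        val size = toolArgsBytes(rawArguments, args, serialize)
        if (size > maxToolArgsBytes) throw ToolArgsTooLarge(size, maxToolArgsBytes)
    }
    return executor(args)
}
```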

Added — agents-kt-detekt rule module + ToolBodyForbiddenApis (#2885, epic #2882)

  • New :agents-kt-detekt module ships custom detekt rules (Pillar 1 static layer). The first rule,
    ToolBodyForbiddenApis, flags raw outside-world APIs (java.io.File, java.net.URL /
    HttpURLConnection, ProcessBuilder / Runtime.exec, Class.forName, Unsafe, sockets) used
    inside a tool executor { } body — a tool must reach fs/net/env only through the (forthcoming)
    closed ToolEnvironment ABI, so every action is policy-gated and audited. Suppressible with
    @Suppress("ToolBodyForbiddenApis") + a reviewed reason. Wired into the project's own detekt run
    (scoped to main source — test fixtures legitimately exercise tools). Consumers opt in via
    detektPlugins("ai.deep-code:agents-kt-detekt").
  • Honest limit: syntactic (matches the callee name, not a resolved FQN) — reflection / aliasing /
    transitive state changes are residual risk covered by Pillar 3 (process isolation). The capability
    extractor (#2884) builds on this module next.

v0.7.1

31 May 18:56
de189c3


A hardening release on top of 0.7.0 driven by external review. Small upd…

v0.7.0

31 May 18:47
6968d8e


Boundaries you can enforce externally. The 0.6 line made tool policies declarative and
auditable; 0.7.0 makes them enforced. A tool's declared ToolPolicy now constrains it at
runtime — Layer 1 (in-JVM filesystem-argument gate, #2890) plus Layer 2 OS sandboxing (#1916):
macOS Seatbelt, Linux bubblewrap, a firejail setuid fallback, and a plain
ProcessBuilder + loud UNCONFINED warning where no tool is present. Subprocess-shaped tools are
confined to their declared write roots, a derived environment allow-list, a working directory, and a
default-deny network. And the deterministic permission manifest is now reachable outside
Gradle via the standalone agents-kt CLI (generate / inspect / verify) — a drop-in CI
gate that fails when a change widens a capability boundary.

Deferred to 0.8 (tracked, not shipped here): WasmSandbox (#2894), DockerSandbox (#2895), the
network hostname-allowlist proxy (#2893; default-deny ships, selective allow does not), and the
grants { } hierarchical structure DSL.

Added — standalone agents-kt CLI: permission manifest from a binary (#1923)

  • New :agents-kt-cli module (Gradle application plugin) — the "externally" half of
    the 0.7.0 arc. The deterministic permission manifest, previously reachable only through a
    Gradle task, is now generatable / inspectable / verifiable from a binary, so non-Gradle
    consumers (CI gates, ops, regulators) can enforce capability boundaries:
    • agents-kt generate --entrypoint <FQN> [--classpath a:b] [--format json|yaml] [--out file]
    • agents-kt inspect <manifest.json> [--format json|yaml]
    • agents-kt verify (--entrypoint <FQN> [--classpath a:b] | --current <file>) --baseline <file>
    • Exit codes: 0 ok · 1 verify findings (policy widened) · 2 usage · 3 runtime.
  • The reflective entrypoint→manifest loader was extracted from the Gradle plugin into a
    Gradle-free agents_engine.manifest.ManifestEntrypointLoader, shared by the plugin and
    the CLI — a build and the CLI produce byte-identical manifests (same manifestSha256).
    verify raises the same tool.risk.increased / tool.network.widened /
    tool.filesystem.write.widened findings as the verifyAgentManifest Gradle task. See
    docs/cli.md. (A jlink/native single-file image is a packaging follow-up; the
    entrypoint-loading commands reflect into arbitrary user classes and need a real JVM.)
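To make the verify contract concrete, here is a deliberately simplified sketch of how "policy widened" findings could map to the exit codes listed above. The manifest model (`ToolEntry` with a numeric risk, a network flag, and a set of write globs) is hypothetical and far poorer than the real manifest; comparing a new tool against no capabilities, and treating write globs as plain set membership, are assumptions. Only the three finding names come from the notes.

```kotlin
// Hypothetical, heavily simplified manifest entry.
data class ToolEntry(val risk: Int, val networkOpen: Boolean, val writeGlobs: Set<String>)

// Assumption: a tool missing from the baseline is compared against "no capabilities".
val NO_CAPABILITIES = ToolEntry(risk = 0, networkOpen = false, writeGlobs = emptySet())

fun verifyFindings(baseline: Map<String, ToolEntry>, current: Map<String, ToolEntry>): List<String> =
    current.toSortedMap().flatMap { (tool, now) ->
        val before = baseline[tool] ?: NO_CAPABILITIES
        buildList<String> {
            if (now.risk > before.risk) add("$tool: tool.risk.increased")
            if (now.networkOpen && !before.networkOpen) add("$tool: tool.network.widened")
            // Set containment is a crude stand-in for real glob comparison.
            if (!before.writeGlobs.containsAll(now.writeGlobs)) add("$tool: tool.filesystem.write.widened")
        }
    }

// 0 = ok, 1 = verify findings; usage (2) and runtime (3) errors are not modelled here.
fun exitCodeFor(findings: List<String>): Int = if (findings.isEmpty()) 0 else 1
```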

Added — injectable HttpClient on every provider client (#2385)

  • model { httpClient = … } lets multiple agents share one networking surface:
    a connection pool, a bounded executor that rate-limits concurrent LLM calls, an
    outbound proxy, or an HttpClient already wired to your telemetry. All four provider
    clients (Ollama/Claude/OpenAI/DeepSeek) take an optional httpClient: HttpClient?
    constructor param; ModelConfig.httpClient is threaded into each by defaultClientFor()
    (DeepSeek inherits it via its OpenAiClient superclass).
  • Opt-in, never automatic. null (default) → each client builds its own, byte-for-byte
    unchanged. The framework provides the seam; the rate-limit/circuit-breaker/bulkhead policy
    lives in your injected client. See docs/model-and-tools.md → "Sharing a networking surface".
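A minimal sketch of building one `java.net.http.HttpClient` to share across agents, using only JDK builder methods. The thread count and the 10-second connect timeout are illustrative choices, and `sharedLlmHttpClient` is a hypothetical helper. Note that the client's executor runs its asynchronous work; it is not by itself a strict cap on in-flight requests, so any real rate-limiting or circuit-breaking policy would be layered on top, as the notes say.

```kotlin
import java.net.http.HttpClient
import java.time.Duration
import java.util.concurrent.Executors

// One client = one connection pool, one executor, one place to hang a proxy or telemetry.
fun sharedLlmHttpClient(threads: Int = 4): HttpClient {
    // Daemon threads so the shared pool never keeps the JVM alive on shutdown.
    val executor = Executors.newFixedThreadPool(threads) { r ->
        Thread(r, "llm-http").apply { isDaemon = true }
    }
    return HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(10))
        .executor(executor) // used for the client's asynchronous tasks
        .build()
}
```

Following the notes above, each agent would then receive the same instance via `model { httpClient = shared }`.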

Added — automatic in-JVM tool-policy enforcement (Layer 1 of #1916, #2890)

  • A tool's declared ToolPolicy is now enforced at runtime by default. When a tool
    call carries an absolute filesystem-path argument that falls outside the tool's
    declared read/write globs, the call is denied before its executor runs — surfacing
    through the existing onToolDenied / PipelineEvent.ToolDenied audit path (with
    toolPolicyRisk + usedDeclaredCapability). No hand-written onBeforeToolCall
    interceptor is required anymore. Paths are normalized first, so .. traversal cannot
    escape a declared glob.
  • Opt-in by declaration: a tool that declares no filesystem stance
    (filesystem left Unspecified) is never gated — existing tools are unaffected.
  • Escape hatch: agent { enforceToolPolicies = false } restores the prior 0.6.0
    declare-only (inert) behavior.
  • Scope (this is Layer 1): in-JVM, filesystem-argument enforcement for in-process
    tools. Relative-path precision and network/environment isolation require the
    Layer 2 OS sandbox (ProcessSandbox / WasmSandbox / DockerSandbox, tracked under
    #1916). See docs/tool-policy-enforcement.md.
  • This flips the ToolPolicyEnforcementTest 0.6.0-gap tripwire (#2395) from "restricted
    write still happens" to "restricted write is blocked."
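The Layer 1 idea (normalize first, then match against declared globs) can be sketched with `java.nio.file`. This assumes POSIX-style absolute paths; `pathAllowed` is a hypothetical helper, not the framework's enforcement code. Relative paths pass through ungated here, matching the Layer 1 limitation noted above.

```kotlin
import java.nio.file.FileSystems
import java.nio.file.Paths

// Is an absolute path argument inside any declared glob?
// Normalizing first means "/tmp/work/../../etc/passwd" is judged as "/etc/passwd".
fun pathAllowed(arg: String, declaredGlobs: List<String>): Boolean {
    val p = Paths.get(arg)
    if (!p.isAbsolute) return true // relative paths are not gated at Layer 1
    val normalized = p.normalize()
    return declaredGlobs.any { glob ->
        FileSystems.getDefault().getPathMatcher("glob:$glob").matches(normalized)
    }
}
```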

Added — Layer 2 OS sandbox, first slice: macOS write-confinement (#2906, under #2891)

  • New agents_engine.sandbox.ProcessSandbox — runs a command under macOS Seatbelt
    (sandbox-exec) with a generated profile that denies by default and allows file
    writes only under a single canonical folder. A write to any path outside that
    folder is blocked by the kernel, not just the in-JVM Layer-1 gate — so it holds
    even for paths the tool constructs itself. seatbeltProfile(root) is a pure,
    unit-testable function; isSupported() is false off macOS and run throws there.
  • New sandboxedEchoToFileTool(folder) — the simplest demonstration: a tool that echoes
    text into a given path, OS-confined to folder. In-folder writes succeed; out-of-folder
    writes return an ERROR and create no file.
  • The sandbox now builds its profile from a tool's declared ToolPolicy (#2909):
    ProcessSandbox.forPolicy(policy) derives the writable roots from the filesystem.write
    globs (each glob's directory prefix via globToWriteRoot) and opens network only for
    network = AllowAll; ProcessSandbox.forWritableRoots(roots) confines writes to several
    folders at once. This is the bridge that lets Layer 1's declaration drive Layer 2's OS
    enforcement.
  • processTool(name, policy) { args -> command } (#2914) auto-sandboxes a subprocess tool
    from its declared policy — no hand-wiring of ProcessSandbox. It returns the command's
    stdout on success (or an ERROR: string), carries the policy onto the ToolDef so Layer-1
    (#2890) gates path args too, and fails closed (refuses to run rather than executing
    unsandboxed) where no OS sandbox is available.
  • Linux backend (#2892): ProcessSandbox dispatches by OS at run time: macOS Seatbelt,
    Linux bubblewrap (bwrap), then Linux firejail (the setuid fallback). The Linux paths
    bind/mount the whole filesystem read-only, re-mount the declared write roots read-write, and drop
    the network unless opened — same write-confinement contract as Seatbelt, enforced by the kernel.
    firejail still confines where unprivileged user namespaces are restricted (e.g. Ubuntu 24.04's
    apparmor_restrict_unprivileged_userns) and bwrap can't start. On a host with no sandbox
    tool, run no longer throws — it runs the command via a plain ProcessBuilder and prints a loud
    UNCONFINED warning (isSupported() stays false, so a caller that requires enforcement can
    refuse). isSupported() is true when any backend is present, so processTool / forPolicy work
    across all three. The pure bwrapArgs(...) / firejailArgs(...) are unit-tested everywhere; the
    kernel-level integration (@EnabledOnOs(OS.LINUX) + @Tag("linux_only")) is verified on CI's
    native Ubuntu runner.
  • Subprocess env + cwd honored (#2892): ProcessSandbox now confines the child's environment and
    working directory. forPolicy derives the env from the declared ToolEnvironmentPolicy:
    environment { allow("HOME") } passes only those vars through, environment { denyAll() } gives the
    child an empty environment, unspecified inherits; forWritableRoots(..., env, workingDir) sets them
    explicitly. Applied on the ProcessBuilder, so every backend (Seatbelt / bwrap / firejail) inherits
    the confinement.
  • Network default-deny ships across all backends (#2893 core): only network { allowAll() } opens
    the network; denyAll / Hosts / unspecified stay blocked (Seatbelt no-network, bwrap
    --unshare-net, firejail --net=none). The hostname-allowlist proxy (so Hosts can selectively
    allow domains) remains the deferred part of #2893.
  • Remaining Layer-2 follow-ups: the network hostname-allowlist proxy (#2893), read-confinement, the
    grants { } structure DSL, and the process { } DSL. Wasm/Docker backends are #2894/#2895.
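Three pieces of the Layer 2 contract described above can be sketched in isolation: deriving a write root from a glob, assembling a bubblewrap argument vector, and filtering the child environment on a `ProcessBuilder`. All three helpers are hypothetical and are not the library's globToWriteRoot / bwrapArgs. `--ro-bind`, `--bind`, `--dev`, `--unshare-net`, and the `--` separator are standard bwrap options, but the exact vector the library emits may differ.

```kotlin
// Directory prefix of a write glob: every segment before the first glob metacharacter.
fun globToWriteRootSketch(glob: String): String {
    val meta = setOf('*', '?', '[', '{')
    val fixed = glob.split('/').takeWhile { seg -> seg.none { it in meta } }
    return fixed.joinToString("/").ifEmpty { "/" }
}

// Whole filesystem read-only, declared roots re-bound read-write, network dropped unless opened.
fun bwrapArgsSketch(writeRoots: List<String>, networkOpen: Boolean, command: List<String>): List<String> =
    buildList {
        add("bwrap")
        addAll(listOf("--ro-bind", "/", "/"))
        addAll(listOf("--dev", "/dev"))
        for (root in writeRoots) addAll(listOf("--bind", root, root))
        if (!networkOpen) add("--unshare-net")
        add("--")
        addAll(command)
    }

// Environment confinement on the ProcessBuilder, so every backend inherits it:
// null = inherit, empty set = empty environment, otherwise only the listed variables.
fun confineEnv(pb: ProcessBuilder, allow: Set<String>?): ProcessBuilder {
    if (allow != null) pb.environment().keys.retainAll(allow)
    return pb
}
```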

v0.6.6

30 May 19:50
3bc8e60


Fixed — Session catch swallowed CancellationException as AgentEvent.Failed (#2863)
All six session extensions (AgentSessionExtension, PipelineSessionExtension, ParallelSessionExtension, BranchSessionExtension, LoopSessionExtension, ForumSessionExtension) — the outer catch (t: Throwable) block previously treated every CancellationException as a real failure: it emitted a synthetic AgentEvent.Failed, closed the channel cleanly, and swallowed the cancel from the surrounding scope. Field-reported regression (SSE bridge rendered "FlowSubscription was cancelled" as a user-visible failure, clobbering already-streamed partial output).
Rewritten as ordered multi-catch: TimeoutCancellationException first (real failure → Failed path, must come before bare CancellationException because it's a subtype), then CancellationException (propagate per structured-concurrency contract — close channel with the cancel, rethrow), then Throwable (real failure → Failed).
Pinned by new SessionCancellationTest — 2 structural cases (bare cancellation propagates — no Failed event, executor failure still emits Failed) plus 4 per-vendor cases (Ollama / Claude / OpenAI / DeepSeek) using stub ModelClient injections so a future adapter-specific regression can't slip past CI.
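The catch ordering is the whole fix, so here is a dependency-free sketch of it. To avoid pulling in kotlinx.coroutines, `TimeoutCancellation` stands in for TimeoutCancellationException (which is likewise a CancellationException subtype); `runSessionStep` and `SessionEvent` are hypothetical simplifications of the session extensions and AgentEvent.

```kotlin
import java.util.concurrent.CancellationException

// Stand-in for kotlinx.coroutines.TimeoutCancellationException: also a CancellationException.
class TimeoutCancellation(message: String) : CancellationException(message)

sealed interface SessionEvent {
    data class Completed(val output: String) : SessionEvent
    data class Failed(val error: Throwable) : SessionEvent
}

// Ordered multi-catch: the timeout subtype must precede the bare cancellation catch,
// otherwise timeouts would be treated as cancellations and silently rethrown.
fun runSessionStep(emit: (SessionEvent) -> Unit, block: () -> String) {
    try {
        emit(SessionEvent.Completed(block()))
    } catch (t: TimeoutCancellation) {
        emit(SessionEvent.Failed(t))   // a timeout is a real failure
    } catch (c: CancellationException) {
        throw c                        // genuine cancellation: propagate, emit no Failed event
    } catch (t: Throwable) {
        emit(SessionEvent.Failed(t))   // any other error is a real failure
    }
}
```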
Changed — Maintainability epic #2790 (10 refactor tickets)
A code-smell audit landed 10 focused refactors. All behavior-preserving; no public API removals.

#2806 — Runtime cleanup. Central agents_engine.runtime.Ansi object owns ESC / RESET / ERASE_LINE; AnsiColor.code + wrap + spinner clear route through it; dead AnsiColor.Companion.RESET deleted. Session-extension bracket events (Completed/Failed) on Agent/Pipeline/Branch/Parallel switched from non-suspending trySend → suspending send so terminal events can't be dropped silently; inner per-token emitter stays on trySend (typealias is non-suspending) but now logs JUL warnings on failure. agents_engine.internal.BuildInfo.version reads Implementation-Version from the JAR manifest (stamped by tasks.jar { manifest { ... } }); McpServer.SERVER_VERSION / McpClient.CLIENT_VERSION / McpRunner.VERSION all forward to it — the three constants had drifted to 0.1.3 / 0.1.3 / 0.3.0.
#2805 — Core/generation cleanup. enum class ToolRisk(val manifestName: String); fromManifest derives from entries instead of a duplicate when-block. Agent.describeBudget() reflection (BudgetConfig::class.members) replaced with BudgetConfig.describeOverrides() — restores reflect-optional contract (#1718). Broad catches in GenerableSupport / LenientJsonParser / GeneratedMetaCache FINE-logged via new tryGenerable helper; GeneratedMetaCache.tryLoad narrowed to LinkageError / ReflectiveOperationException / SecurityException. ManifestYaml.parsePolicyMap literal 0/2/4/6 indent levels replaced with named DEPTH_TOPLEVEL/SECTION/LEAF/FILESYSTEM_LEAF constants.
#2804 — Model-layer cleanup. AgenticLoop reuses RESERVED_MEMORY_TOOL_NAMES (no parallel inline set). Named constants MANIFEST_HASH_PREFIX_LEN=12, BLOB_HASH_PREFIX_LEN=12, ANTHROPIC_MAX_CACHE_BREAKPOINTS=4, EPHEMERAL_TTL_BOUNDARY_MINUTES=5L. New MutableList.reserveName(name) collapses 5× duplicated require(...) for "Tool already defined". Severity.valueOf bad parse logs at WARNING. 4 near-identical AgentEvent.ToolCallFinished emit blocks → emitToolFinished(...) helper; 5 inline agents_engine.runtime.events.AgentEvent.… FQNs removed.
#2799 — JSON escape consolidation. JsonEscape moved to agents_engine.internal so generation + core can depend on it without inverting the model→generation direction. generation.GenerableSupport.escapeJson, core.ToolPolicy.ManifestJson.quote, core.Snapshot all flow through toJsonString() now. The repeated {"type":"object","properties":{},"additionalProperties":true} literal promoted to internal.OPEN_EMPTY_OBJECT_SCHEMA_JSON. ClaudeClient removeSuffix("}") + ",$cc}" cache-control surgery extracted to appendCacheControlToBlock / appendCacheControlToLastBlock helpers. New control-char regression test in UntrustedToolOutputTest.
#2796 — Shared JsonRpc helper for MCP. New agents_engine.mcp.JsonRpc consolidates encodeRequest/encodeResult/encodeError/parseEnvelope/isNotification. JsonRpcWire owns the literal "2.0", wire keys, and notification prefix. JsonRpcErrorCode names -32700/-32600/-32601/-32602/-32603 as PARSE_ERROR/INVALID_REQUEST/METHOD_NOT_FOUND/INVALID_PARAMS/INTERNAL_ERROR. New sealed class McpException : IllegalStateException (extends ISE for back-compat) with Transport / Protocol / ToolFailure subclasses.
#2792 — Shared HttpModelClientSupport. HttpModelClientSupport.sendBounded(http, request, providerLabel, maxResponseBytes) consolidates the duplicated bounded-read + OOM-guard pattern; Claude / OpenAI / Ollama sendChat all delegate. ModelClient.chatStream(messages) (default impl) delegates to chatStream(messages, jsonSchema = null) instead of carrying a byte-identical 28-line clone.
#2800 — Dedup MCP client list/text-block + Skills factories. 4 file-private helpers in McpClient.kt: resultArray(result, key), joinTextContent(blocks, contentKey), prefixed(prefix, name), makeMcpSkill(name, description, impl). toolSkills/promptSkills/resourceSkills all flow through the factory (8 boilerplate lines each → 3-4).
#2794 — toLlmInput + jsonSerialize collapse. Both flow through a single parameterised serializeForLlm(value, quoteTopLevelStrings) walker. Deferred (out of scope for the maintainability pass — flagged in commit body): the forEachGenerableParam 6-walker unification and constructFromMapReflective 5-job split.
#2801 — Primary (String) -> Any? overload. New LiveShow.from(invoke, ...) and LiveRunner.serve(invoke, args, ...) overloads. Future operator types just pass myAgent::invokeSuspend — no edit to LiveShow/LiveRunner required. The six typed overloads stay for source-compat.
#2807 — Detekt static analysis. detekt 1.23.7 plugin wired into root build.gradle.kts. detekt.yml enables complexity (LongMethod, LargeClass, CyclomaticComplexMethod, NestedBlockDepth), exceptions (SwallowedException, TooGenericExceptionCaught/Thrown), style (MagicNumber with sensible allowlist, UnusedPrivateMember), naming (FunctionNaming), potential-bugs, empty-blocks. detekt-baseline.xml freezes current violations so the build stays green on existing code; new violations fail. README's "First 10 Minutes" lists ./gradlew detekt alongside ./gradlew test.
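For reference, the five codes #2796 names are the reserved error codes from the JSON-RPC 2.0 specification. The sketch below pairs them with a minimal error-envelope encoder; `JsonRpcErrorCodeSketch` and `encodeErrorSketch` are hypothetical, not the MCP package's JsonRpc API, and the encoder assumes a numeric (or null) id and escapes only backslashes and quotes, which a real JSON escaper would generalize.

```kotlin
// JSON-RPC 2.0 reserved error codes.
enum class JsonRpcErrorCodeSketch(val code: Int) {
    PARSE_ERROR(-32700),
    INVALID_REQUEST(-32600),
    METHOD_NOT_FOUND(-32601),
    INVALID_PARAMS(-32602),
    INTERNAL_ERROR(-32603),
}

// Minimal error envelope: {"jsonrpc":"2.0","id":…,"error":{"code":…,"message":"…"}}.
fun encodeErrorSketch(id: Long?, code: JsonRpcErrorCodeSketch, message: String): String {
    val escaped = message.replace("\\", "\\\\").replace("\"", "\\\"") // backslashes first, then quotes
    return """{"jsonrpc":"2.0","id":${id ?: "null"},"error":{"code":${code.code},"message":"$escaped"}}"""
}
```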
Notes
No API removals; every 0.6.5 caller compiles and runs unchanged.
agents_engine.model.JsonEscape → agents_engine.internal.JsonEscape: JsonEscape is internal rather than public API, so the relocation is binary-compatible for any consumer using only the public API.
The detekt baseline file (788 lines) is checked in; future PRs are held to the rules without retroactively forcing cleanup of the audited code.

v0.6.5

30 May 18:48
87c35aa


Fixed — Hardcoded 60s LLM request timeout killed long Sonnet turns (#2850)

  • ClaudeClient / OpenAiClient / DeepSeekClient / OllamaClient — bumped DEFAULT_REQUEST_TIMEOUT from 60.seconds to 300.seconds. Field report against 0.6.4 showed long Sonnet turns (multi-step agentic loops with extended thinking) consistently breached the 60s cap on the JDK HttpClient, surfacing as HttpTimeoutException: request timed out and tearing down the streaming Flow. New floor matches what production agents actually need; 0.6.5 callers see no behavior change unless they were silently relying on the truncation. DEFAULT_CONNECT_TIMEOUT stays at 10.seconds — healthy networks never spend that long on TCP connect.
  • model { requestTimeout = …; connectTimeout = … } — tunable from the DSL on every built-in provider (Ollama, Claude, OpenAI, DeepSeek). Both fields default to null, which falls back to the adapter's DEFAULT_REQUEST_TIMEOUT / DEFAULT_CONNECT_TIMEOUT. Set the override when long-context calls, big Ollama generations, or extended-thinking turns regularly approach 5 minutes. Wired through ModelConfig.requestTimeout / connectTimeout → defaultClientFor() → each adapter ctor — no shared global; per-agent, per-config, per-test.
  • No public API removals — additive only. Existing ModelBuilder callers compile and run unchanged.
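How a nullable override falls back to the adapter default can be sketched with the JDK HTTP client, assuming the request timeout maps to `HttpRequest.Builder.timeout` (which raises HttpTimeoutException when exceeded) and the connect timeout to `HttpClient.Builder.connectTimeout`. `TimeoutConfig`, `buildClient`, and `buildRequest` are hypothetical; only the 300 s / 10 s defaults come from the notes above.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds
import kotlin.time.toJavaDuration

// Defaults as stated for 0.6.5.
val DEFAULT_REQUEST_TIMEOUT: Duration = 300.seconds
val DEFAULT_CONNECT_TIMEOUT: Duration = 10.seconds

// null = fall back to the default, mirroring requestTimeout / connectTimeout in the DSL.
data class TimeoutConfig(val requestTimeout: Duration? = null, val connectTimeout: Duration? = null)

fun buildClient(cfg: TimeoutConfig): HttpClient =
    HttpClient.newBuilder()
        .connectTimeout((cfg.connectTimeout ?: DEFAULT_CONNECT_TIMEOUT).toJavaDuration())
        .build()

fun buildRequest(cfg: TimeoutConfig, uri: URI): HttpRequest =
    HttpRequest.newBuilder(uri)
        // Per-request wait for the response; exceeding it raises HttpTimeoutException.
        .timeout((cfg.requestTimeout ?: DEFAULT_REQUEST_TIMEOUT).toJavaDuration())
        .GET()
        .build()
```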

Added — Files convenience surface

  • agents_engine.content.Files — one-line file loading for the typed Content hierarchy. Files.load(path, store): Content reads the file, detects modality + mime from the filename extension (case-insensitive, no magic-byte sniffing), puts the bytes into the BlobStore, and returns the right Content variant, with the same ContentRef.hash as a manual store.put. Throws UnknownExtensionException (naming the extension, the path, and the full list of known extensions) on an unrecognised extension.
  • Variants: loadOrNull (null-on-unknown), loadAll (throws on first unknown), loadAllOrSkip (silently skips — directory ingestion), canonicalExtensionFor(content) (inverse mapping), knownExtensions: Set<String> (predicate for callers).
  • Extension coverage: every wireMime on every modality variant has at least one canonical extension. Image: png, jpg/jpeg, gif, webp. Audio: mp3, wav, flac, ogg. Video: mp4, webm, mov. Document: pdf, docx, md/markdown, html/htm, txt.
  • 13 unit tests pin per-extension mapping, hash round-trip, case-insensitivity, unknown-extension behavior on every entry point, and the canonical-extension inverse for all 17 variants.
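Extension-based detection reduces to a case-insensitive table lookup. This sketch uses the extension list above; the MIME strings are common IANA types and are an assumption about what each variant's wireMime is. `detect`, `detectOrNull`, `Detected`, and `UnknownExtensionSketchException` are hypothetical, and the BlobStore write is omitted.

```kotlin
enum class Modality { IMAGE, AUDIO, VIDEO, DOCUMENT }
data class Detected(val modality: Modality, val mime: String)

class UnknownExtensionSketchException(ext: String, path: String, known: Set<String>) :
    IllegalArgumentException("unknown extension '$ext' for $path; known: ${known.sorted().joinToString()}")

// Extension table (MIME strings assumed).
val EXTENSIONS: Map<String, Detected> = buildMap<String, Detected> {
    fun reg(modality: Modality, mime: String, vararg exts: String) {
        for (e in exts) put(e, Detected(modality, mime))
    }
    reg(Modality.IMAGE, "image/png", "png"); reg(Modality.IMAGE, "image/jpeg", "jpg", "jpeg")
    reg(Modality.IMAGE, "image/gif", "gif"); reg(Modality.IMAGE, "image/webp", "webp")
    reg(Modality.AUDIO, "audio/mpeg", "mp3"); reg(Modality.AUDIO, "audio/wav", "wav")
    reg(Modality.AUDIO, "audio/flac", "flac"); reg(Modality.AUDIO, "audio/ogg", "ogg")
    reg(Modality.VIDEO, "video/mp4", "mp4"); reg(Modality.VIDEO, "video/webm", "webm")
    reg(Modality.VIDEO, "video/quicktime", "mov")
    reg(Modality.DOCUMENT, "application/pdf", "pdf")
    reg(Modality.DOCUMENT, "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "docx")
    reg(Modality.DOCUMENT, "text/markdown", "md", "markdown")
    reg(Modality.DOCUMENT, "text/html", "html", "htm")
    reg(Modality.DOCUMENT, "text/plain", "txt")
}

// Filename extension only, lower-cased; no content sniffing.
fun detect(path: String): Detected {
    val ext = path.substringAfterLast('/').substringAfterLast('.', missingDelimiterValue = "").lowercase()
    return EXTENSIONS[ext] ?: throw UnknownExtensionSketchException(ext, path, EXTENSIONS.keys)
}

fun detectOrNull(path: String): Detected? = runCatching { detect(path) }.getOrNull()
```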

Added — Typed agent attachments (#2470 slice b)

  • agent.invokeWithAttachments(input, attachments) + suspending sibling invokeSuspendWithAttachments — user-facing API for vision input via typed Content.Image. The runtime dereferences each ref against the agent's injected BlobStore, base64-encodes once, and attaches ImagePart to the first user LlmMessage. Per-provider wire translation is the slice-a work — this commit routes the typed surface into it.
  • Agent.blobStore: BlobStore? + blobStore(store) DSL — optional injection; null when the agent doesn't take attachments. Passing attachments to an agent with no blobStore errors fast at invoke time with a clear message — caller misconfiguration surfaces before any provider HTTP.
  • Closed mime mapping: ImageMime → ImagePart.WireMime for all four variants (Png, Jpeg, Gif, Webp). No String conversion at any boundary.
  • Forensic-friendly errors — when a ref's blob is missing from the store, the error names the ref's hash prefix. Helps debug snapshot resumes against partially-purged stores.
  • Non-image variants skipped in v1: Content.Text / Document / Audio / Video flow through the attachment path as no-ops. Slice c will wire Document via provider doc-input adapters; Audio/Video land in Stage 2.
  • Empty / all-skipped attachments → null images — no provider sees an empty array; legacy wire shape preserved.
  • Resume composition: the attachments argument is ignored on resume because the restored conversation already carries the original LlmMessage.images on the saved user turn.
  • Tests: 8 unit cases (AgentAttachmentsTest) + 6 live cases (AgentVisionLiveTest) running the same VisionFixtures from slice a through the agent surface on Ollama qwen3-vl:8b, Claude Haiku 4.5, OpenAI gpt-4o-mini. See docs/multimodal.md.
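A sketch of what the closed mapping buys, together with the empty-to-null rule. `Attachment`, `toImageParts`, and these mirror types are hypothetical stand-ins for Content.Image, ImageMime, and ImagePart.WireMime, not the library's declarations.

```kotlin
import java.util.Base64

// Hypothetical mirrors of the two closed mime hierarchies.
sealed interface ImageMime {
    object Png : ImageMime
    object Jpeg : ImageMime
    object Gif : ImageMime
    object Webp : ImageMime
}

sealed class WireMime(val value: String) {
    object Png : WireMime("image/png")
    object Jpeg : WireMime("image/jpeg")
    object Gif : WireMime("image/gif")
    object Webp : WireMime("image/webp")
}

// Exhaustive `when` over a sealed type: adding a fifth ImageMime breaks compilation here.
fun ImageMime.toWire(): WireMime = when (this) {
    ImageMime.Png -> WireMime.Png
    ImageMime.Jpeg -> WireMime.Jpeg
    ImageMime.Gif -> WireMime.Gif
    ImageMime.Webp -> WireMime.Webp
}

sealed interface Attachment {
    class Image(val bytes: ByteArray, val mime: ImageMime) : Attachment
    data class Text(val text: String) : Attachment
}

data class ImagePart(val base64: String, val wireMime: WireMime)

// Non-image attachments are skipped; an empty result becomes null so no provider sees [].
fun toImageParts(attachments: List<Attachment>): List<ImagePart>? =
    attachments.filterIsInstance<Attachment.Image>()
        .map { ImagePart(Base64.getEncoder().encodeToString(it.bytes), it.mime.toWire()) }
        .ifEmpty { null }
```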

Added — Vision input across all providers (#2470 slice a)

  • LlmMessage.images: List<ImagePart>? = null — new optional field; back-compat default leaves the wire shape byte-identical to pre-#2470 for callers that don't pass images. Closed ImagePart(base64, wireMime) with WireMime sealed type (Png, Jpeg, Gif, Webp) — String mime is intentionally not accepted in the public ctor.
  • Per-provider adapters translate vision on role = "user" messages:
    • Ollama: {role:"user", content:"text", images:["<b64>", ...]} — works with qwen3-vl:8b, llava, llama3.2-vision, etc. Non-vision models silently ignore the field.
    • Claude: typed content array — [{type:"text"}, {type:"image", source:{type:"base64", media_type:"image/png", data:"<b64>"}}, ...]. Works with all Claude vision-capable models (Haiku 4.5, Sonnet 4.6, Opus 4.7).
    • OpenAI: typed content array — [{type:"text"}, {type:"image_url", image_url:{url:"data:image/png;base64,<b64>"}}, ...]. Works with gpt-4o, gpt-4o-mini, gpt-4-turbo, the o* reasoning models.
    • DeepSeek: inherits the OpenAI adapter shape; current DeepSeek models lack vision and silently ignore the field. Shape-tested; no live call to avoid spending on a no-op.
  • Role-gated: non-user messages (system/assistant/tool) with non-null images ignore the field on the wire — no provider's API accepts images on those roles. Pinned by tests.
  • Programmatic fixtures in src/test: VisionFixtures.threeSquaresPng() (256×256 red/blue/green squares for "count the squares" eval) and VisionFixtures.housePng() (256×256 cartoon house for "what is this?" eval). Rendered via BufferedImage + ImageIO — reproducible byte-for-byte across machines and CI, no external assets in the repo.
  • Live integration tests (VisionLiveTest) cover all three vision-capable providers with cost discipline (temperature = 0, maxTokens = 80, single-turn, ~5KB base64 payloads): Ollama qwen3-vl:8b (tagged live-llm, runs via :integrationTest), Claude claude-haiku-4-5 and OpenAI gpt-4o-mini (tagged live-cloud-api, runs in default :test with assumeTrue skipping when no key). Model names overridable via env. Assertion shape is loose keyword-match — robust against per-model phrasing variance.
  • 8 wire-format unit tests pin per-provider JSON shape + the no-images back-compat path. See docs/multimodal.md.
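The OpenAI-style shape and the user-only role gate can be illustrated with a small string builder. This is a sketch only: `Msg`, `openAiContent`, and `jsonString` are hypothetical, the JSON is hand-built for brevity, and falling back to plain string content when there are no images is an assumption about how the legacy wire shape is preserved.

```kotlin
// Hypothetical message: role, text, and optional (mime, base64) image pairs.
data class Msg(val role: String, val text: String, val images: List<Pair<String, String>>? = null)

// Minimal JSON string quoting (quotes, backslashes, newlines only).
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) when (c) {
        '"' -> append("\\\"")
        '\\' -> append("\\\\")
        '\n' -> append("\\n")
        else -> append(c)
    }
    append('"')
}

// OpenAI-style content: a text part followed by one image_url part (a data URL) per image.
fun openAiContent(m: Msg): String {
    val images = if (m.role == "user") m.images.orEmpty() else emptyList() // role gate
    if (images.isEmpty()) return jsonString(m.text) // no images: plain string content
    val parts = listOf("""{"type":"text","text":${jsonString(m.text)}}""") +
        images.map { (mime, b64) -> """{"type":"image_url","image_url":{"url":"data:$mime;base64,$b64"}}""" }
    return parts.joinToString(",", "[", "]")
}
```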

Added — Multimodal foundation (#2465 epic, Stage 1)

  • Typed Content hierarchy (#2466): sealed interface Content with variants Text, Image, Audio, Video, Document in package agents_engine.content. Each non-text variant carries a ContentRef plus a typed mime (ImageMime, AudioMime, VideoMime, DocMime). Mime types are closed sealed interfaces with wireMime: String accessors — no String mime in any public API. Extension property Content.modality: String is the audit-stable per-variant name. Stage 1 wires Image + Document end-to-end (the modalities the 0.8 spec → product loop consumes); Audio + Video are modelled now and exercised through provider adapters in Stage 2 (#2470, deferred).
  • ContentRef + BlobStore (#2467) — content-addressed reference (hash: String SHA-256 hex, sizeBytes: Long, wireMime: String). BlobStore interface with InMemoryBlobStore (defensive byte-array copies on put + get) and FileBlobStore(dir) (one file per blob, filename = hash, atomic tmp + rename, survives process restart, idempotent put). Hash family matches the manifest hash (#1912) and snapshot filename hash (#2753) — single algorithm across the audit surface. Public top-level computeContentHash(bytes): String for byte-level comparison without a store.
  • ToolResult (#2469): data class ToolResult(parts: List<Content>) for tools that return mixed content (a screenshot tool returns text + image; OCR returns extracted text + the source PDF ref). It is just another Any? return value from the tool executor — no ToolDef signature change; existing tools that return strings keep working byte-for-byte. AgenticLoop renders multipart returns as text + [modality: <wireMime>] (<hash-prefix>, <size>B) placeholders for the LLM tool-result message; provider-specific multipart rendering (vision-capable Claude/OpenAI/Gemini) is sibling #2470 (deferred). JSONL audit exporter gains an outputParts: List<String>? column on audit rows — for ToolResult returns it emits one entry per part as <modality>:<hash-prefix>:<sizeBytes>:<wireMime> (text parts as text:inline:<charCount>:text/plain); blob bytes never enter the audit row. Field is null for non-multimodal returns — legacy audit rows unchanged. EXPECTED_FIELDS schema-pin updated to include the new column. Composes with snapshot/resume (refs serialise, blobs stay external) and untrustedOutput (the text-summary rendering goes through the existing JSON envelope). See docs/multimodal.md.
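Content addressing means a ref's hash is a pure function of the bytes. A minimal in-memory sketch, assuming SHA-256 hex as stated above; `MemBlobStore` and `Ref` are hypothetical stand-ins for InMemoryBlobStore and ContentRef, showing the defensive copies and the idempotent put.

```kotlin
import java.security.MessageDigest

// Lowercase hex SHA-256 of the bytes: identical input always yields the identical ref.
fun contentHash(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).joinToString("") { "%02x".format(it) }

data class Ref(val hash: String, val sizeBytes: Long, val wireMime: String)

class MemBlobStore {
    private val blobs = HashMap<String, ByteArray>()

    // Copy on the way in: later mutation of the caller's array cannot change the stored blob.
    // Putting the same bytes twice stores them once and returns an equal ref.
    fun put(bytes: ByteArray, wireMime: String): Ref {
        val hash = contentHash(bytes)
        blobs.getOrPut(hash) { bytes.copyOf() }
        return Ref(hash, bytes.size.toLong(), wireMime)
    }

    // Copy on the way out as well.
    fun get(ref: Ref): ByteArray? = blobs[ref.hash]?.copyOf()
}
```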

Added — Eval harness (#2491 epic, feature-complete)

  • DeterministicModelClient (#2492): agents_engine.testing.DeterministicModelClient(scripted: List<LlmResponse>) (or vararg ctor) hands back pre-scripted responses one per chat call. No network, byte-deterministic. requests records every message list the agent built up; remaining() reports unconsumed responses. Exhaustion throws `Determin...
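A scripted fake of this kind can be sketched generically. `ScriptedClient` is hypothetical and models messages as plain strings; the real client's exhaustion exception name is truncated above, so this sketch simply uses `error(...)`.

```kotlin
// Hypothetical scripted fake: returns pre-recorded responses in order, one per chat call.
class ScriptedClient<R : Any>(scripted: List<R>) {
    private val queue = ArrayDeque(scripted)
    private val recorded = mutableListOf<List<String>>()

    // Every message list the caller sent, in call order.
    val requests: List<List<String>> get() = recorded

    fun remaining(): Int = queue.size

    fun chat(messages: List<String>): R {
        recorded += messages.toList() // defensive copy of the caller's list
        return queue.removeFirstOrNull()
            ?: error("Scripted responses exhausted after ${recorded.size - 1} calls")
    }
}
```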

v0.6.3

29 May 11:40


[0.6.3] — 2026-05-29

"Prompt-caching foundation + Koog-bug regression net." Ships the vendor-neutral prompt-caching DSL — the foundation of the #2655 epic — and lands the first eight Koog issue-set regression checks under #2474 (five real fixes including the sealed @Generable parent-dispatch unblock, plus three regression-pin tests against the existing contracts).

Added

  • Vendor-neutral prompt-caching DSL + neutral hint model (#2656, part of the #2655 epic) — agent-controllable prompt caching declared in provider-agnostic terms. New caching { } block: enabled (default true), cacheSystemPrompt / cacheToolDefs (default true — byte-stable system prompt + KSP-stable tool defs, #1703), cacheConversation = None | Rolling (default None; opt-in because rolling has per-vendor write cost), ttl (null = provider default), plus a cacheable(id, ttl) { content } helper for per-segment marking of large retrieved documents / instruction sets. Internally, the agentic loop attaches a neutral CacheHint(segment, ttl, breakpoint) (with sealed CacheSegment { SystemPrompt; ToolDefs; Conversation; Custom(id) }) to LlmMessage at message-assembly time. LlmMessage gains an optional cacheHint: CacheHint? = null field — backward-compatible: existing adapters ignore it, preserving the pre-#2656 wire shape exactly. No provider cache types (cache_control, Gemini cache IDs, …) appear in the public API. Per-provider adapter consumption (Anthropic / OpenAI / Gemini / DeepSeek / Ollama) lands in #2658-#2662; stability guard in #2657; observability in #2663. See the Prompt Caching wiki page.
  • SessionHistory — ergonomic, stable history accessors over AgentSession events (#2485, addresses Koog signal under #2474): class SessionHistory(events: List<AgentEvent<*>>) exposes toolCalls() / toolResults(excludeErrors = false) / assistantMessages() / completedOutput() / failed() / skillsStarted(). Thin wrapper — no new state, deterministic ordering from the source flow, no allocation beyond filtered list materializations. ToolCallRecord(callId, toolName, arguments) and ToolResultRecord(callId, toolName, result, isError) are the surfaced shapes. Not in v1: a userMessages() accessor — the agent input is passed to agent.session(input) directly and is not surfaced as an event; adding it requires a new AgentEvent.UserMessage and is out of scope for this slice.
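A sketch of attaching neutral hints at message-assembly time. All names are hypothetical mirrors of the types described above, and placing the rolling breakpoint on the newest history message is an assumption, not documented behavior.

```kotlin
import kotlin.time.Duration

// Hypothetical mirrors of the neutral hint model; no provider cache types appear here.
sealed interface CacheSegment {
    object SystemPrompt : CacheSegment
    object ToolDefs : CacheSegment
    object Conversation : CacheSegment
    data class Custom(val id: String) : CacheSegment
}

data class CacheHint(val segment: CacheSegment, val ttl: Duration?, val breakpoint: Boolean)

enum class ConversationCaching { None, Rolling }

data class CachingConfig(
    val enabled: Boolean = true,
    val cacheSystemPrompt: Boolean = true,
    val cacheConversation: ConversationCaching = ConversationCaching.None,
    val ttl: Duration? = null, // null = provider default
)

data class Message(val role: String, val content: String, val cacheHint: CacheHint? = null)

// Attaches hints while assembling the message list.
fun assemble(cfg: CachingConfig, system: String, history: List<Message>): List<Message> {
    val sysHint = if (cfg.enabled && cfg.cacheSystemPrompt)
        CacheHint(CacheSegment.SystemPrompt, cfg.ttl, breakpoint = true) else null
    val convo = if (cfg.enabled && cfg.cacheConversation == ConversationCaching.Rolling && history.isNotEmpty())
        // Assumption: the rolling breakpoint sits on the newest history message.
        history.dropLast(1) + history.last().copy(cacheHint = CacheHint(CacheSegment.Conversation, cfg.ttl, breakpoint = true))
    else history
    return listOf(Message("system", system, sysHint)) + convo
}
```

An adapter that does not understand cacheHint never reads the field, which is why the wire shape without caching stays unchanged.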

Changed

  • Unknown / unlisted tool name mid-loop is now recoverable, not fatal (#2476, regression for Koog signal under #2474) — when the model emits a tool name absent from the active skill's allowlist (whether outright unknown or belonging to a different skill on the same agent), the agentic loop previously threw IllegalStateException and the run died. It now appends a tool-result message naming the bad call and listing the skill's allowed tools, then continues — so the model gets a turn to self-correct. The disallowed executor still never runs (authorization boundary unchanged), the skill's allowlist is the only set named (no leak of the wider agent.toolMap), and streaming consumers see a ToolCallFinished(isError = true) for the rejected call. Pinned by KoogRegressionUnknownToolTest; ToolAuthorizationTest rewritten to assert the recovery contract (two of its prior assertions were accidentally passing via fail() message contents — replaced with honest tool-message inspection).
  • McpServer tools/call now serializes @Generable outputs as JSON, not as Kotlin debug toString (#2483, regression for Koog signal under #2474). McpServer.handleToolCall previously rendered the executor's return value through output?.toString(), leaking the Kotlin data-class debug shape (SearchPayload(text=Hello, source=wiki)) into the MCP text content. Routed through toLlmInput instead: @Generable outputs render as JSON ({"text":"Hello","source":"wiki"}), while String and primitive outputs stay clean. Non-@Generable typed outputs still fall back to .toString(), a documented limitation; register a @Generable output type for typed MCP boundaries.
  • Enum-typed fields now appear in JSON Schema with a typed value list (#2479 part 1, regression for Koog signal under #2474). KType.jsonSchemaTypeObject previously fell through to {"type":"string"} for enum-typed constructor parameters, so the LLM had no way to know which values were valid and the constrained-decoding provider path couldn't enforce them. Enums now render as {"type":"string","enum":["veryHigh","normal","low"]} with constant names emitted verbatim from Enum.name — no case mutation, no @SerialName-style lowercasing. Mixed-case constants (RED / Green / blue) survive intact. The tool_choice configurability half of #2479 is a separate slice (ToolChoice { Auto | Required | None | Specific(name) } API + adapter wiring).
  • Sealed @Generable parent classes now deserialize via type-discriminator dispatch (#2482a, regression for Koog signal under #2474). KClass<Sealed>.constructFromMap(...) previously returned null because primaryConstructor is null on sealed parents. The schema-gen path emits {"oneOf": [...]} for sealed types, so any MCP-exposed skill (or other typed entry point) declaring a sealed @Generable input was unusable: the model could produce a matching payload, but the server couldn't read it. constructFromMapReflective now checks isSealed, looks up the matching variant by the type discriminator, and recurses — including the data object case via objectInstance. Unknown variants and missing-discriminator maps return null so the call routes through onError.invalidArgs instead of constructing a wrong-shape value.
  • Stringified-JSON coercion for nested object / list / sealed fields (#2482b, regression for Koog signal under #2474) — when the LLM emits a typed field whose value is a JSON string (instead of a nested object / array), coerceValue now parses the string with LenientJsonParser and continues coercion. Guarded: String fields are NOT JSON-decoded (a value like "The {weather} report" stays the literal string — String::class matches first in the when), and unparseable JSON for an object/list field returns null so the failure routes through onError.invalidArgs. Composes with #2482a — a sealed-typed field accepts a JSON string carrying the type discriminator.
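The enum rendering rule (constant names emitted verbatim from Enum.name) fits in a few lines. `enumSchema` is a hypothetical helper, not KType.jsonSchemaTypeObject itself.

```kotlin
// Renders an enum's JSON Schema with constant names taken verbatim from Enum.name.
inline fun <reified E : Enum<E>> enumSchema(): String =
    enumValues<E>().joinToString(",", prefix = """{"type":"string","enum":[""", postfix = "]}") { "\"${it.name}\"" }

enum class Priority { veryHigh, normal, low }
enum class Color { RED, Green, blue }
```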

Tests

  • Koog issue-set regression suite — first slice (#2474) — pin Agents.KT contracts where Koog broke. #2475 ships KoogRegressionWrongTypedArgsTest (3 cases): (1) scalar Number → String is intentional coercion per coerceValue (not a malformed arg — executor runs with the stringified value); (2) a truly-unparseable value for a typed field (e.g. "abc" for count: Int) routes through onError.invalidArgs with end-to-end recovery via RepairResult.Fixed, executor runs exactly once for the repaired call; (3) without a handler the failure is the framework's ToolExecutionException with typed-arg context — never a raw kotlinx.serialization / NumberFormatException.
  • Koog regression — loop protection (#2480). KoogRegressionLoopProtectionTest (4 cases) pins budget { maxConsecutiveSameTool = N }: same tool past the cap throws BudgetExceededException(reason = CONSECUTIVE_TOOL) naming the offending tool; an interleaved call resets the counter (alpha → beta → alpha → beta) so an alternating agent doesn't trip; name-only semantics — varying args still trip the cap (stricter than the Koog signal's "identical args" framing — Agents.KT catches more loop shapes); pre-cap threshold listener (onBudgetThreshold) fires for CONSECUTIVE_TOOL. Repeated-identical-assistant-output detection mentioned in the Koog signal is NOT yet implemented — known gap, separate detector if/when needed.
  • Koog regression — OpenRouter-style streaming chunk reconstruction (#2478). KoogRegressionStreamingChunkReconstructionTest (3 cases) feeds synthetic chunk sequences through chatOrStream and pins: OpenRouter shape (toolName in the first ToolCallStarted only, args split across N ToolCallArgumentsDelta chunks, finalized by ToolCallFinished) reconstructs into one coherent LlmResponse.ToolCalls entry with full args; every wire arg-delta surfaces as exactly one AgentEvent.ToolCallArgumentsDelta event in arrival order verbatim (so streaming UIs can show JSON building up); interleaved chunks for parallel calls route by callId and reconstruct both calls cleanly; an orphan args delta (no preceding ToolCallStarted) doesn't crash the aggregator and doesn't fabricate a Started — the delta still fires as a consumer event so a UI sees the wire activity.
  • ClaudeClientChatStreamLiveTest stabilised (#2723): the 1..50 prompt was still small enough for Haiku 4.5 to occasionally batch the entire response into ~3 same-millisecond SSE chunks, failing the >=10ms gap OR >=5 chunks assertion intended to catch wire-level re-bundling regressions. Bumped to 1..200; three consecutive validation runs each report 8 chunks across ~1.3s of streaming. Also corrected the test's stale doc comment — the actual @Tag is live-cloud-api running under default :test, not live-llm running under :integrationTest.
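The name-only streak semantics pinned by #2480 can be sketched as a counter keyed on the last tool name. `ConsecutiveToolGuard` is hypothetical, and treating `max` as the number of consecutive calls still allowed (so call max + 1 throws) is an assumed reading of "past the cap".

```kotlin
class ConsecutiveToolExceeded(val tool: String, val streak: Int) :
    RuntimeException("Tool '$tool' called $streak times in a row")

// Hypothetical guard: counts consecutive calls by tool name only; arguments are ignored.
class ConsecutiveToolGuard(private val max: Int) {
    private var last: String? = null
    private var streak = 0

    fun record(toolName: String) {
        streak = if (toolName == last) streak + 1 else 1 // any other tool resets the streak
        last = toolName
        if (streak > max) throw ConsecutiveToolExceeded(toolName, streak)
    }
}
```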

v0.6.2

29 May 09:54



v0.6.2 — Attribution you can filter by


"Attribution you can filter by." Closes the bridge-observability gap that every downstream Langfuse / LangSmith / OTel consumer was working around.
AgentRuntimeContext now carries free-form business attribution alongside the technical correlation fields, so bridges can drop their per-bridge ConcurrentHashMap<requestId, userId> + onBeforeTurn capture pattern and read user / project / dialog identifiers directly off the runtime context.

implementation("ai.deep-code:agents-kt:0.6.2")
implementation("ai.deep-code:agents-kt-ksp:0.6.2")  // optional but recommended

Drop-in for 0.6.0 / 0.6.1 consumers — every new surface is additive, off by default.

---
Headlines

Native attribution on AgentRuntimeContext (#2720)

AgentRuntimeContext.attribution: Map<String, String> plus typed accessors userId / projectId / dialogId (canonical keys via AttributionKeys.{USER_ID, PROJECT_ID, DIALOG_ID}). Set once at the session boundary; every nested AgentEvent / PipelineEvent surfaces it:

withAgentRuntimeContext(
    AgentRuntimeContext.currentOrNew().copy(
        attribution = mapOf(
            AttributionKeys.USER_ID to userId,
            AttributionKeys.PROJECT_ID to projectId,
            AttributionKeys.DIALOG_ID to dialogId,
            "tenantId" to tenantId,   // arbitrary keys round-trip
        ),
    ),
) {
    agent.session(input).events.collect { event ->
        // event.userId / event.projectId / event.dialogId all populated
        // event.attribution["tenantId"] == tenantId
    }
}

Non-breaking — defaults to emptyMap(). Replaces the per-bridge side-channel pattern (ConcurrentHashMap<requestId, userId> + onBeforeTurn capture).

Bundles the 0.6.1 batch

Since 0.6.1 shipped on a parallel release branch and never merged to main, 0.6.2 rolls it up:

- Snapshot/resume foundation (#2416) — experimental — message-history-as-state design; ships SessionSnapshot, SnapshotStore (InMemory + File with atomic temp-write + rename), the executeAgentic turn-boundary checkpoint + resumeFrom seam, MemoryBank snapshot/restore. Round-trip proven by test (3 turns → crash → fresh agent → restore → finish).
- Reasoning/thinking stream (#2406) — opt-in model { reasoning(...) } surfaces a model's reasoning as AgentEvent.Reasoning on a channel separate from the answer Token stream. Claude / DeepSeek / Ollama emit reasoning text; OpenAI Chat Completions reports reasoning_effort + TokenUsage.reasoningTokens.
- onBudgetExceeded (#2412) — raise a budget cap and continue mid-run via BudgetDecision.Extend(newLimit) instead of throwing. Currently wired for the tool-call cap.
- onToolDenied + PipelineEvent.ToolDenied (#2395) — calls blocked by an onBeforeToolCall Decision.Deny are now first-class observable. Previously, denials were silently dropped from onToolUse / observe { }.
- Typed parameter schemas for built-in tools (#2379) — memory_*, forum_return, and swarm absorb carry typed @Generable schemas instead of relying on providers' permissive empty-properties fallback. No public API change.
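The atomic temp-write + rename pattern used by the file-backed store looks roughly like this with java.nio. This is a sketch, not the SnapshotStore source; `writeAtomically` is a hypothetical name.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption

// Write to a sibling temp file, then atomically move it over the target:
// a crash leaves either the old snapshot or the new one, never a torn file.
fun writeAtomically(target: Path, bytes: ByteArray) {
    val dir = target.toAbsolutePath().parent
    Files.createDirectories(dir)
    val tmp = Files.createTempFile(dir, target.fileName.toString(), ".tmp")
    try {
        Files.write(tmp, bytes)
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING)
    } finally {
        Files.deleteIfExists(tmp) // no-op after a successful move
    }
}
```

With ATOMIC_MOVE, the Files.move Javadoc leaves replacing an existing target implementation-specific; on POSIX file systems it maps to rename(2), which does replace it.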

Dependency bumps in the published artifacts

- org.jline:jline 3.27.1 → 4.1.2
- com.google.devtools.ksp:symbol-processing-api 2.3.7 → 2.3.8 (the :agents-kt-ksp module)
- Gradle wrapper 9.5.0 → 9.5.1

---
Verified

Full ./gradlew test green on the new toolchain — 1596 tests, 0 failures across all 7 modules. LiveShow JLine path (terminal builder + history + readLine + EOF/interrupt handling) verified under the 4.x line.

Documentation

- docs/observability.md (https://github.com/Deep-CodeAI/Agents.KT/blob/v0.6.2/docs/observability.md) — bridge consumption pattern for attribution.
- docs/regulated-deployment.md (https://github.com/Deep-CodeAI/Agents.KT/blob/v0.6.2/docs/regulated-deployment.md) — how attribution interacts with the audit story.
- CHANGELOG.md [0.6.2] (https://github.com/Deep-CodeAI/Agents.KT/blob/v0.6.2/CHANGELOG.md) — full line-by-line.

What's next

- Composition snapshots for Pipeline / Forum / Loop / Branch (#2386 Phase 2c)
- Manifest-hash restore guard on snapshot resume (#2386 Phase 2b)
- Mid-tool coroutine suspension (depends on #638)
- The 0.7.0 epic — enterprise policy layer + human-in-the-loop (#2487)

**Optional attachments** to drag-and-drop into the release form (so consumers can verify the Maven Central jars against the exact GPG-signed bundle):
- `build/agents-kt-0.6.2-combined-bundle.zip` — single file, both artifacts
- Or separately: `build/agents-kt-0.6.2-bundle.zip` and `build/agents-kt-ksp-0.6.2-bundle.zip`


v0.6.1

27 May 23:11


Added

  • Reasoning/thinking stream (#2406) — opt-in model { reasoning(budgetTokens = …, effort = …) } surfaces a model's reasoning as AgentEvent.Reasoning, separate from the answer Token stream, so a UI can render live reasoning instead of a spinner. Claude (extended thinking), DeepSeek (reasoning_content), and Ollama (thinking) emit reasoning text; OpenAI Chat Completions reports reasoning_effort + TokenUsage.reasoningTokens only.
  • onBudgetExceeded (#2412) — when a budget cap would throw, return BudgetDecision.Extend(newLimit) to raise the cap and continue, or Stop to throw. A long-running agent can grant itself more tool calls mid-run instead of failing. Wired for the tool-call cap.
  • onToolDenied + PipelineEvent.ToolDenied (#2395) — tool calls blocked by an onBeforeToolCall Decision.Deny are now first-class observable (audit no longer silently drops blocked attempts).
  • Snapshot/resume foundation (#2416, experimental): Snapshotable, SessionSnapshot, SnapshotStore (InMemory + File), and the loop resumeFrom/checkpoint seam. An agent's resumable state is its message history, so resume re-enters the loop rather than suspending a coroutine.
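The extend-or-stop contract can be sketched as a counter that consults a handler before throwing. `ToolCallBudget` and the handler signature are hypothetical; the notes above name only BudgetDecision.Extend(newLimit) and Stop.

```kotlin
sealed interface BudgetDecision {
    data class Extend(val newLimit: Int) : BudgetDecision
    object Stop : BudgetDecision
}

class BudgetExceededException(limit: Int) : RuntimeException("tool-call cap $limit exceeded")

// Hypothetical tool-call counter that asks the handler before throwing.
class ToolCallBudget(
    private var limit: Int,
    private val onExceeded: (used: Int, limit: Int) -> BudgetDecision,
) {
    var used = 0
        private set

    fun beforeToolCall() {
        if (used >= limit) {
            when (val d = onExceeded(used, limit)) {
                is BudgetDecision.Extend -> {
                    require(d.newLimit > limit) { "Extend must raise the cap" }
                    limit = d.newLimit
                }
                BudgetDecision.Stop -> throw BudgetExceededException(limit)
            }
        }
        used++
    }
}
```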

Changed

  • Typed parameter schemas for built-in tools (#2379): memory_*, forum_return, and swarm delegates now declare real schemas instead of the permissive empty-properties fallback. No public API change.

Install

implementation("ai.deep-code:agents-kt:0.6.1")

v0.6.0

25 May 20:10
abae433


[0.6.0] — 2026-05-24

"Boundaries you can audit." The 0.6.0 epic (#1911) turns Agents.KT's typed-boundary model into auditor-ready evidence: deterministic permission manifests with runtime hash correlation, append-only JSONL audit, before-interceptor guardrails, typed tool / MCP-tool hierarchies, vendor-neutral observability bridges (OTel / LangSmith / Langfuse), constrained decoding for @Generable outputs, DeepSeek as a fourth provider, and onTokenUsage telemetry. Existing consumers see no behavior change unless they opt into the new surfaces.

Added

Permission manifest — the 0.6.0 hero feature (#1912)

  • :agents-kt-manifest module, whose agentManifest(agent) returns a deterministic capability graph: every agent, skill, tool, knowledge entry, MCP endpoint, provider, budget, and policy boundary in a system, in YAML or JSON, with stable ordering and masked provider secrets.
  • verifyAgentManifest Gradle task — diffs the current manifest against a checked-in baseline; fails the build on capability widening (new tools, new MCP endpoints, broader policies) so reviewers always see surface-area changes before they merge.
  • Manifest SHA-256 propagates into the runtime — every PipelineEvent / AgentEvent carries the manifestHash of the agent that emitted it, so static manifest and dynamic audit trace tie back to the same approved capability set.
  • Provider secrets masked — API keys, base URLs containing credentials, and any field marked @SecretSafe are redacted from the emitted manifest.
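Why stable ordering and masking matter for the hash: a sketch over a flat key/value manifest (the real manifest is a nested graph, and `canonicalManifest` / `manifestHash` are hypothetical names). Sorting keys makes the text independent of declaration order, and masking secrets keeps a key rotation from changing the approved hash.

```kotlin
import java.security.MessageDigest

// Canonical text: keys sorted, secret values replaced by a fixed mask.
fun canonicalManifest(entries: Map<String, String>, secretKeys: Set<String>): String =
    entries.toSortedMap().entries.joinToString("\n") { (k, v) ->
        "$k: ${if (k in secretKeys) "***" else v}"
    }

// Lowercase hex SHA-256 of the canonical text.
fun manifestHash(canonical: String): String =
    MessageDigest.getInstance("SHA-256").digest(canonical.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }
```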

Runtime event context (#1913)

  • manifestHash, requestId, sessionId on every runtime event: PipelineEvent and AgentEvent both carry them, so JSONL audit / OTel / LangSmith / Langfuse downstreams all bind events to the manifest hash that was authoritative at invocation time.
  • withAgentRuntimeContext { ... } extension — Kotlin-coroutines-context-aware threading so nested compositions (then, branch, loop, forum, wrap) inherit the outer request/session/manifest correlation without re-derivation.

JSONL audit exporter (#1914)

  • :agents-kt-observability JsonlAuditExporter — append-only, one-line-per-event audit format with requestId, sessionId, manifestHash, agent/skill/tool ids, event type, provider, and model. Raw arguments and results are omitted by default; opt-in via includeRawArgs = true / includeRawResults = true when the audit consumer needs them.
  • Stable canonical field ordering — same audit row produces the same JSON line on every run, so the file is grep-friendly and diff-able.
  • PII-safe defaults — designed for the regulated-deployment workflow in docs/regulated-deployment.md.
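A sketch of a fixed-field-order JSONL row with raw arguments omitted by default. `AuditRow` and `toJsonLine` are hypothetical and cover only a subset of the fields listed above; the hand-rolled escaper handles only quotes, backslashes, and newlines.

```kotlin
// Hypothetical audit row covering a subset of the columns listed above.
data class AuditRow(
    val requestId: String,
    val sessionId: String,
    val manifestHash: String,
    val agent: String,
    val tool: String?,
    val type: String,
    val rawArgs: String? = null,
)

// Minimal JSON quoting; a null value becomes a JSON null.
fun q(s: String?): String = if (s == null) "null" else buildString {
    append('"')
    for (c in s) when (c) {
        '"' -> append("\\\"")
        '\\' -> append("\\\\")
        '\n' -> append("\\n")
        else -> append(c)
    }
    append('"')
}

// Fixed field order: the same row always yields the same line, so files diff and grep cleanly.
fun AuditRow.toJsonLine(includeRawArgs: Boolean = false): String {
    val fields = mutableListOf<Pair<String, String?>>(
        "requestId" to requestId, "sessionId" to sessionId, "manifestHash" to manifestHash,
        "agent" to agent, "tool" to tool, "type" to type,
    )
    if (includeRawArgs) fields += "rawArgs" to rawArgs // omitted unless explicitly requested
    return fields.joinToString(",", "{", "}") { (k, v) -> "${q(k)}:${q(v)}" }
}
```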

Before-interceptor guardrails (#1907)

  • onBeforeSkill / onBeforeToolCall / onBeforeTurn — Rails-style interceptors returning a sealed Decision { Proceed | ProceedWith(...) | Deny(reason) | Substitute(result) }. Sibling to the post-hoc onToolUse / onSkillChosen / onError observer hooks already in 0.4.x.
  • Chain semantics — interceptors run in registration order; every interceptor runs; the first non-Proceed wins; Deny short-circuits with an onUnauthorizedToolCall-shaped audit event; Substitute skips the model and returns the substituted value.
  • Unified use cases — per-client tool policy (McpServer per-principal allowlists), action confirmation (Escalate(reason, reviewerRole) resumed by the host app), prompt-injection filtering as a one-liner, uniform perToolTimeout wrapping. See docs/interceptors.md.
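One reading of the chain semantics, sketched with hypothetical types: every interceptor is invoked in registration order, and the first non-Proceed decision is the one returned. This interprets "Deny short-circuits" as short-circuiting the tool call rather than the remaining interceptors, which is an assumption.

```kotlin
sealed interface Decision {
    object Proceed : Decision
    data class ProceedWith(val args: Map<String, Any?>) : Decision
    data class Deny(val reason: String) : Decision
    data class Substitute(val result: Any?) : Decision
}

fun interface Interceptor {
    fun decide(tool: String, args: Map<String, Any?>): Decision
}

// Every interceptor runs, in registration order; the first non-Proceed decision wins.
fun runChain(chain: List<Interceptor>, tool: String, args: Map<String, Any?>): Decision {
    var winner: Decision = Decision.Proceed
    for (interceptor in chain) {
        val d = interceptor.decide(tool, args)
        if (winner == Decision.Proceed && d != Decision.Proceed) winner = d
    }
    return winner
}
```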

Declarative tool policy (#1915)

  • ToolPolicy DSL on tool { policy { … } } — declares tool risk (LOW / MEDIUM / HIGH / CRITICAL) plus filesystem / network / environment declarations. Consumed by the permission manifest and by audit-row formatters.
  • No runtime enforcement yet — the sandbox-enforcement work is deferred to 0.7.0 (#1916). 0.6.0 ships the declaration surface so manifest reviewers can already see "this tool reads ~/.ssh" or "this tool calls *.openai.com" at policy-review time.

Typed tool + MCP-tool hierarchies (#1948)

  • Tool<IN, OUT> typed handles: tool<Args, Result>("name", "desc") { args -> ... } returns a Tool<Args, Result> with phantom types so Skill.tools(addTool, divideTool, …) is compile-time-checked instead of stringly-typed.
  • McpTool<IN, OUT> — every MCP-imported tool also gets a typed handle via McpClient.tools(prefix). Composes with the same Skill.tools(...) builder. Additive alongside the existing MCP-as-skill adapter.

MCP server hardening (#1902)

  • Inbound bearer auth: McpServer.tokens(...) configures principal → token mappings; unauthenticated requests get a structured 401. McpStdioServer shares the same authn surface for stdio deployments.
  • Host / Origin allowlists — DNS-rebinding and CSRF defenses against browser-side localhost exploits; explicit allowlist required for non-loopback hosts.
  • Per-principal tool policy — each principal can have its own subset of agent skills exposed as MCP tools. Policy decisions flow through the onBefore* chain and into audit events.
  • Default-deny — unconfigured server rejects everything except initialize / tools/list; opt-in for each authorization grant.
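The default-deny flow can be sketched as a small request gate. Everything here is hypothetical: the `Gate` class, the inverted token-to-principal map (inverted for lookup), and returning 403 for a rejected Host header are illustrative choices, not the McpServer API.

```kotlin
// Hypothetical inbound request: method, Host header, optional bearer token, tool for tools/call.
data class Request(val method: String, val host: String, val bearer: String?, val tool: String = "")

class Gate(
    private val principalByToken: Map<String, String>,      // token -> principal
    private val toolsByPrincipal: Map<String, Set<String>>, // per-principal tool allowlist
    private val allowedHosts: Set<String> = emptySet(),     // explicit non-loopback hosts
) {
    private val loopback = setOf("localhost", "127.0.0.1", "[::1]")
    private val unauthenticatedMethods = setOf("initialize", "tools/list")

    // Strip the port, keeping bracketed IPv6 literals intact.
    private fun hostOnly(h: String) =
        if (h.startsWith("[")) h.substringBefore("]") + "]" else h.substringBefore(':')

    // HTTP-style status: 200 allow, 401 unauthenticated, 403 forbidden.
    fun decide(req: Request): Int {
        val host = hostOnly(req.host)
        if (host !in loopback && host !in allowedHosts) return 403 // DNS-rebinding defense
        if (req.method in unauthenticatedMethods) return 200
        val principal = req.bearer?.let { principalByToken[it] } ?: return 401 // default-deny
        val allowed = toolsByPrincipal[principal].orEmpty()
        if (req.method == "tools/call" && req.tool !in allowed) return 403
        return 200
    }
}
```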

Stdio MCP server transport (#2045)

  • McpStdioServer.from(agent) — exposes the same agent surface (tools, prompts, resources, tools/listChanged: false) over line-delimited stdio instead of HTTP. Same authentication + policy plumbing as the HTTP server.
  • McpRunner --stdio — picocli-style one-liner for shipping agents as stdio-MCP services without a Gradle dependency on :server-style infrastructure.

LiveShow line editing (#985)

  • LineEditor — line-discipline-aware input handling for the LiveShow runner: cursor movement, history, kill-line, basic readline-style navigation, all while the agent streams events to the display.
  • Cancellation-safe — collector cancellation propagates through the editor; no orphaned threads.

Runtime observability bridge (#1908)

  • ObservabilityBridge in :agents-kt-observability — vendor-neutral bridge contract with onPipelineEvent, onAgentEvent, and onInterceptorDecision, plus .observe(bridge) for one-call wiring.
  • :agents-kt-otel module — OpenTelemetry adapter that maps agent sessions to agent.invoke spans, model turns to gen_ai.chat spans, tool calls to gen_ai.tool child spans, errors to span status, usage to GenAI attrs, and before-interceptor decisions to span events.
  • :agents-kt-langsmith module — LangSmith run-tree adapter that maps skill invocations to chain runs, model turns to child llm runs, tool calls to child tool runs, failures to run errors, budget threshold events to run extras, and interceptor decisions to run tags. Dispatch is asynchronous, batched, oldest-drop under backpressure, and never throws into the agent path.
  • :agents-kt-langfuse module — Langfuse trace adapter that maps skill invocations to traces, model turns to generations, tool calls to spans, runtime events to Langfuse events, and interceptor decisions to tags plus interceptor.decision observations. Dispatch is asynchronous, batched, oldest-drop under backpressure, and uses Langfuse's native ingestion endpoint without a vendor SDK.
  • Core remains vendor-free — OTel, LangSmith, and Langfuse integration code is isolated to adapter modules.
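Oldest-drop backpressure keeps the agent path non-blocking: when the buffer is full, the oldest queued event is discarded instead of making the producer wait or throw. A minimal sketch with a hypothetical `DropOldestBuffer`; the adapters' real queueing and batch sizes are not documented here.

```kotlin
// Hypothetical bridge buffer: bounded, drops the oldest event on overflow, never blocks the producer.
class DropOldestBuffer<E>(private val capacity: Int) {
    init { require(capacity > 0) { "capacity must be positive" } }

    private val q = ArrayDeque<E>()
    var dropped = 0L
        private set

    @Synchronized
    fun offer(e: E) {
        if (q.size == capacity) { q.removeFirst(); dropped++ }
        q.addLast(e)
    }

    // Drains up to maxBatch events for one batched export call.
    @Synchronized
    fun drain(maxBatch: Int): List<E> {
        val out = ArrayList<E>()
        while (out.size < maxBatch && q.isNotEmpty()) out += q.removeFirst()
        return out
    }
}
```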

Provider constrained decoding (#1949)

  • @Generable schemas are threaded into provider payloads — OpenAI receives response_format.json_schema, Ollama receives format, and Anthropic receives a structured-output tool path for typed agentic outputs.
  • Provider capability detection: ModelClient.supportsConstrainedDecoding gates schema forwarding so unsupported adapters keep the existing repair-loop behavior.

DeepSeek provider adapter

  • model { deepseek(name); apiKey = ... } — OpenAI-compatible Chat Completions adapter with DeepSeek provider identity, configurable deepSeekBaseUrl, usage normalization, streaming through the OpenAI-compatible SSE path, and manifest provider metadata.
  • Constrained decoding stays disabled for DeepSeek — the adapter does not send OpenAI response_format.json_schema because DeepSeek documents JSON-object mode rather than that schema payload.

Token usage telemetry (#2354, #2355, #2356, #2357)

  • Public Agent.onTokenUsage { usage: TokenUsage -> } listener — fires once per successful LLM round-trip that reports usage, including streaming paths at end-of-stream. Tool-use cycles fire once per provider response, not once per agent invocation.
  • Widened TokenUsage — now carries promptTokens, completionTokens, cachedInputTokens, provider, and model. total remains prompt + completion; cached tokens are a provider-visible subset of prompt tokens, not an extra addend.
  • Provider-normalized usage mapping — Anthropic maps input_tokens / output_tokens / cache_read_input_tokens with provider = "claude"; OpenAI maps prompt_tokens / completion_tokens / prompt_tokens_details.cached_tokens with provider = "openai"; Ollama maps prompt_eval_count / eval_count with cachedInputTokens = null and provider = "ollama".
  • Listener safety semantics — missing usage does not fire, LLM failures do not fire and remain covered by onError, multiple listeners run in registration order, and listener exceptions are logged and swallowed so telemetry cannot break the agent run.
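The total arithmetic and the listener safety rules can be sketched as follows. `UsageListeners` is hypothetical, this TokenUsage mirror keeps only the fields listed above, and logging to stderr stands in for whatever logger the library uses.

```kotlin
// Hypothetical mirror of the widened TokenUsage fields.
data class TokenUsage(
    val promptTokens: Int,
    val completionTokens: Int,
    val cachedInputTokens: Int?,
    val provider: String,
    val model: String,
) {
    // Cached tokens are a subset of prompt tokens, so they are not added again.
    val total: Int get() = promptTokens + completionTokens
}

class UsageListeners {
    private val listeners = mutableListOf<(TokenUsage) -> Unit>()

    fun add(listener: (TokenUsage) -> Unit) { listeners += listener }

    // Fires only when usage was reported; listeners run in registration order,
    // and a throwing listener is logged and skipped so the run continues.
    fun fire(usage: TokenUsage?) {
        if (usage == null) return
        for (l in listeners) {
            runCatching { l(usage) }.onFailure { System.err.println("onTokenUsage listener failed: $it") }
        }
    }
}
```

For example, 100 prompt tokens (80 of them cached) plus 20 completion tokens give total = 120; the 80 cached tokens are not counted twice.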

Tests

  • Added OnTokenUsageTest coverage for widened fields, multi-listener ordering, listener-error swallowing, missing-usage skip, model-failure skip with onError, multi-turn tool-use ordering, and streaming single-fire behavior.
  • Updated Anthropic, OpenAI, and Ollama adapter tests to assert ...