Releases: Deep-CodeAI/Agents.KT
v0.7.21
Security + de-slop release. Headlined by a nested-agent recursion bound (#3377) and the explicit
skill-routing failure on ambiguity (#3087), plus a build-wide one-type-per-file refactor (#3199) and
new release/quality guards (#3084 / #3089), the start of the AgenticLoop decomposition (#3376), and
honest README positioning (#3085 / #3086). Internal refactors are behavior-preserving; the two
behavior changes (routing, the maxAgentDepth default) are called out below. Drop-in on the 0.7.x line.
Fixed — bound nested agent recursion with maxAgentDepth (#3377, security)
- Budgets bounded a single agentic loop, but a tool that re-invokes an agent (Swarm
absorb,
agent-as-tool) spun up a fresh loop with a fresh budget — so a self-re-entering agent (A→A) or a
cycle (A→B→A) recursed one full LLM loop per level untilStackOverflowError, a DoS / runaway-cost
vector (triggerable e.g. by prompt injection into a tool result). NowAgentRuntimeContextcarries
a nested-invocationdepth(incremented innewRuntimeContext), andbudget { maxAgentDepth }
(default 16) is enforced at the invocation chokepoint: exceeding it throws
BudgetExceededException(BudgetReason.AGENT_DEPTH)before the over-deep loop starts — fast, no
extra LLM calls, no overflow. An unconditional safety stop (not extendable viaonBudgetExceeded),
and budget caps now bypass theonErrortool-recovery ladder so a nested cap can't be swallowed.
Changed — AgenticLoop decomposition: extract rendering + coercion (#3376 batch 1)
- First slice of breaking up the 1369-line
AgenticLoop.kt/ 765-lineexecuteAgentic. Extracted
the pure tool-result/error renderers intoToolResultRendering(formatEscalatedToolError,
formatDeniedToolError,wrapUntrustedToolResult,renderToolResultForLlm) and output coercion
intoOutputCoercion(parseOutput,coerceSubstituteOutput) — each a newinternalfile. These
wereprivateto the loop (untestable); they now have direct unit tests (ToolResultRenderingTest,
OutputCoercionTest, TDD RED→GREEN). Behavior-preserving —AgenticLoopdelegates. Internal
refactor, no public API change.
Changed — one-type-per-file complete across the codebase (#3199, final batch)
- Split every remaining multi-type file (rest of
model/, all ofcore/,content/,composition/,
generation/,runtime/,sandbox/,testing/, and themanifest/observability/langfuse
/langsmith/detektsubmodules) into one top-level type per file — ~110 new files, all
same-package moves (no FQN / public-API change).checkOneTypePerFilenow passes with an empty
allowlist: zero multi-type files remain anywhere. Renamed 3 files so the filename matches the
kept type (Snapshot.kt→SessionSnapshot.kt,Memory.kt→MemoryBank.kt,
HumanApproval.kt→ApprovalBuilder.kt), which also satisfies detekt'sMatchingDeclarationName. - Minor, non-public visibility consequence of the moves: a handful of file-
privatehelpers that
were referenced across now-separate files were promoted tointernal(still module-scoped, not
public): the manifest engines (ManifestVerifier/StableJson/ManifestJsonParser/StableYaml/
ManifestGraph), the policy JSON/YAML helpers (ManifestMaps/ManifestJson/ManifestYaml),
RuntimeContextThreadLocal,KnowledgeEntry, and thenonBlankhelper. - Behavior-preserving: full
./gradlew buildgreen (all modules + all tests + detekt 423/423 +
checkOneTypePerFile0 +checkReadmeVersion). Completes #3199.
Changed — one-type-per-file: split model error/cache types (#3199, batch 3)
- Split three
agents_engine.modelfiles into one type per file (same package — no FQN/public-API
change):ToolError.kt→Severity,EscalationException,ToolExecutionException(ToolError
sealed union stays);CacheHint.kt→CacheSegment(CacheHintstays);OnErrorBuilder.kt→
RepairResult,RepairScope,ToolErrorHandler(OnErrorBuilder+ theexecuteAgentFixhelper
stay). Allowlist 40 → 37. Behavior-preserving pure moves; detekt baseline unchanged.
Changed — one-type-per-file: split McpServer.kt (#3199, batch 2b)
- Split the four secondary types out of
mcp/McpServer.kt(same package) —RegisteredPrompt,
RegisteredResource,McpExposeBuilder,ExposedSkill→ one file each;McpServerstays
(597 → 454 lines). Four now-unused imports (constructFromMap,jsonSchema,KClass,
hasGenerableAnnotation, all moved toExposedSkill) removed. Completesmcp/— allowlist
41 → 40. Behavior-preserving pure moves.
Changed — one-type-per-file: split the mcp/ package (#3199, batch 2)
- Split five multi-type files in
agents_engine.mcpinto one type per file (same package — zero
import churn, no FQN/public-API change):AgentMcpDsl.kt→McpServerBuilder.kt;JsonRpc.kt→
JsonRpcWire/JsonRpcErrorCode/McpException(+JsonRpcstays);McpClient.kt→
McpToolDescriptor;McpRunner.kt→RunnerConfig+McpRunnerBuilder;McpServerSecurity.kt→
ClientPrincipal/McpHttpRequestContext/McpAuthDecision/McpServerAuth(original file
removed). Allowlist 46 → 41.mcp/McpServer.kt(597 lines,ExposedSkillneeds import surgery)
is deferred to batch 2b. Behavior-preserving pure moves.
Changed — one-type-per-file convention + checkOneTypePerFile guard (#3199, batch 1)
- New
checkOneTypePerFileGradle guard (wired intocheck) fails the build if a main-source.kt
file declares >1 top-level type and isn't onconfig/one-type-per-file-allowlist.txt. The allowlist
is a ratchet that may only shrink — it also fails on a stale entry (a listed file that no longer
violates), so a split must record its own burndown. Documented sealed-ADT exceptions stay listed.
MirrorscheckReadmeVersion/checkDetektBaseline. - Batch 1 split:
mcp/McpServerInfo.kt(12 MCP wire DTOs) → one type per file in the same
agents_engine.mcppackage — zero import churn, no FQN/public-API change. Newdocs/source-layout.md
documents the convention, exceptions, and the guard. Remaining multi-type files burn down
package-by-package in follow-up batches under #3199.
Changed — skill resolution extracted into SkillResolver (#3088 stage 2, de-slop #3083)
- The skill-resolution cluster — type-compatible candidate filter, manual
skillSelection { }
selector, LLM router (confidence gate), the before-skill-interceptorProceedWithcompatibility
check, and the fail-loud ambiguity error — moved out ofAgent's God-object body into its own
SkillResolvercollaborator (newSkillResolver.kt).Agentkeeps aprivate val skillResolver
and delegates. Internal refactor, behavior-preserving — every branch, condition, exception
type, and message is identical; no public DSL change.Agent.ktis now 1017 lines (1116 → 1017
across #3088 stages 1+2). Completes the staged decomposition of #3088.
Changed — README de-slop: honest positioning + accuracy fixes (#3085, #3086, de-slop #3083)
- Replaced the unqualified hero copy ("The auditable Kotlin agent runtime for regulated teams") with
a defensible positioning line ("The typed agent runtime for the JVM") plus an up-front pointer to
the Security Model and threat model, and an explicit "not a compliance product / does not OS-sandbox
arbitrary tool code" caveat in the intro. The honest enforce/don't-enforce tables already existed;
the hero no longer contradicts them (#3085). - Fixed accuracy drift between "Implemented today" and the limitations/roadmap (#3086): "Four LLM
providers shipped" → six (adds Kimi + OpenRouter); "Text-only I/O today" → image/document input
shipped, audio + generation still roadmap; Kotlin badge2.1→2.3; Phase 2 roadmap no longer
lists already-shipped image multimodal as planned. No fabricated benchmark claims were found.
Added — explicit securityCheck gate, checkDetektBaseline burndown, and TESTING.md (#3089, de-slop #3083)
- New
securityCheckaggregate task makes the deterministic security suite addressable on its
own — sandbox write-confinement (ProcessSandbox: Seatbelt / bwrap / firejail / fallback), tool-
policy enforcement (#1916), snapshot manifest guard, the arg-size cap (#2888), the tamper-evident
audit ledger (:agents-kt-observability:securityTest), and the static tool-body rules
(:agents-kt-detekt:test+detekt). OS-specific confinement skips cleanly off-platform; run
securityCheckon a macOS job to exercise Seatbelt in CI. - New
checkDetektBaselinetask (wired intocheck) fails ifdetekt-baseline.xmlgrows beyond
the recorded ceiling (424) — the baseline may only shrink, so new violations get fixed rather than
grandfathered. - New
TESTING.mddocuments, honestly, what the default gate runs and excludes
(live-llm/live-mcp/interactiveare out;live-cloud-apiis deliberately in), the
security gate, the OS-specific confinement matrix, and the baseline ratchet.
Added — checkPublishedVersion release gate + release runbook (#3084, de-slop #3083)
- New
checkPublishedVersionGradle task HEADs Maven Central forai.deep-code:agents-ktand
agents-kt-kspat the current project version and fails unless both resolve (HTTP 200). It is
not wired intocheck— it needs network and would (correctly) fail on an unreleased version
during dev — so it's the manual last gate before anything user-facing names a new version.
Override the base URL with-PcentralBaseUrl=…. ComplementscheckReadmeVersion(#2873): one
stops the README drifting from the build, the other stops the build advertising a version Central
can't serve — the exact drift (README/Gradle at0.7.2while Central served0.7.1) an...
v0.7.2
Tool-security hardening — the self-contained first phase of the capability-ABI epic (#2882),
all additive and back-compat: a tamper-evident audit ledger, an argument-size cap, and the
static tool-body guard rails. Plus a release guard so the README's advertised version can't drift
from the build.
Added — release guard: README dependency version must match the Gradle version (#2873)
- New
checkReadmeVersiontask (wired intocheck) fails the build if the
ai.deep-code:agents-kt:<version>snippet inREADME.mddiffers from the Gradle project
version — the exact drift an external 0.7.0 review flagged. README and version now move together.
Added — ToolCapabilityExtractor: static capability classification (#2884, epic #2882)
- New
ToolCapabilityExtractorinagents-kt-detektstatically classifies what a tool's executor
body actually does —FS_READ/FS_WRITE/NETWORK/ENVIRONMENT/EXEC— by walking its
call expressions and matching callee names (writeText/Files.write→ write,readText/
readAllBytes→ read,URL/openConnection→ network,getenv→ env,ProcessBuilder/exec→
exec). The reusable input the upcomingToolPolicy↔capability comparator (#2887) checks against the
declared policy. Syntactic by design (callee-name match, no FQN resolution) and intentionally
conservative — reflection / aliasing / transitive state are Pillar-3 residual.
Added — ToolAuditLedger: tamper-evident, Merkle-chained tool-action log (#2886, epic #2882)
- New
ToolAuditLedger(inagents-kt-observability, sibling toJsonlAuditExporter) — an
append-only, Merkle-chained, PII-safe record of every tool action. Each row's
entryHash = SHA-256(prevHash ‖ sequence ‖ callId ‖ toolName ‖ decision ‖ denialReason ‖ resultHash ‖ timestamp)chains to the previous, soToolAuditLedger.verify(path)recomputes the
chain and pinpoints the first edited / inserted / deleted / reordered row. The tool result is
stored only as a hash, never raw (Pillar 2 of #2882). - Auto-wire with
agent.events.ledger(file)— recordsPipelineEvent.ToolCalledasAPPROVED,
ToolDeniedasDENIED(with reason),ToolHallucinatedasHALLUCINATED, and returns the
ledger for laterverify(...). (callId-keying of denied/hallucinated rows lands once
PipelineEventcarries the callId — a scoped #2886 follow-up.)
Added — maxToolArgsBytes tool-argument size cap (#2888, epic #2882)
- New
budget { maxToolArgsBytes = … }(Long?, defaultnull= off) hard-caps a single tool
call's argument byte size, checked at one chokepoint (executeToolWithBudget) before the
executor runs — so an oversized (often prompt-injected) call is rejected, not executed. Resource-
exhaustion guard (attack A5). Unconditional likeperToolTimeout— not extendable via
onBudgetExceeded; surfaces asBudgetExceededException(reason = BudgetReason.TOOL_ARGS_SIZE).
Size is the provider wire form (ToolCall.rawArguments) when present, else the serialized arg map.
Gates both the session and regular executor paths; back-compat (null = unbounded).
Added — agents-kt-detekt rule module + ToolBodyForbiddenApis (#2885, epic #2882)
- New
:agents-kt-detektmodule ships custom detekt rules (Pillar 1 static layer). The first rule,
ToolBodyForbiddenApis, flags raw outside-world APIs (java.io.File,java.net.URL/
HttpURLConnection,ProcessBuilder/Runtime.exec,Class.forName,Unsafe, sockets) used
inside a toolexecutor { }body — a tool must reach fs/net/env only through the (forthcoming)
closedToolEnvironmentABI, so every action is policy-gated and audited. Suppressible with
@Suppress("ToolBodyForbiddenApis")+ a reviewed reason. Wired into the project's own detekt run
(scoped to main source — test fixtures legitimately exercise tools). Consumers opt in via
detektPlugins("ai.deep-code:agents-kt-detekt"). - Honest limit: syntactic (matches the callee name, not a resolved FQN) — reflection / aliasing /
transitive state changes are residual risk covered by Pillar 3 (process isolation). The capability
extractor (#2884) builds on this module next.
v0.7.1
A hardening release on top of 0.7.0 driven by external review. Small upd
v0.7.0
Boundaries you can enforce externally. The 0.6 line made tool policies declarative and
auditable; 0.7.0 makes them enforced. A tool's declared ToolPolicy now constrains it at
runtime — Layer 1 (in-JVM filesystem-argument gate, #2890) plus Layer 2 OS sandboxing (#1916):
macOS Seatbelt, Linux bubblewrap, a firejail setuid fallback, and a plain
ProcessBuilder + loud UNCONFINED warning where no tool is present. Subprocess-shaped tools are
confined to their declared write roots, a derived environment allow-list, a working directory, and a
default-deny network. And the deterministic permission manifest is now reachable outside
Gradle via the standalone agents-kt CLI (generate / inspect / verify) — a drop-in CI
gate that fails when a change widens a capability boundary.
Deferred to 0.8 (tracked, not shipped here): WasmSandbox (#2894), DockerSandbox (#2895), the
network hostname-allowlist proxy (#2893; default-deny ships, selective allow does not), and the
grants { } hierarchical structure DSL.
Added — standalone agents-kt CLI: permission manifest from a binary (#1923)
- New
:agents-kt-climodule (Gradleapplicationplugin) — the "externally" half of
the 0.7.0 arc. The deterministic permission manifest, previously reachable only through a
Gradle task, is now generatable / inspectable / verifiable from a binary, so non-Gradle
consumers (CI gates, ops, regulators) can enforce capability boundaries:agents-kt generate --entrypoint <FQN> [--classpath a:b] [--format json|yaml] [--out file]agents-kt inspect <manifest.json> [--format json|yaml]agents-kt verify (--entrypoint <FQN> [--classpath a:b] | --current <file>) --baseline <file>- Exit codes:
0ok ·1verify findings (policy widened) ·2usage ·3runtime.
- The reflective entrypoint→manifest loader was extracted from the Gradle plugin into a
Gradle-freeagents_engine.manifest.ManifestEntrypointLoader, shared by the plugin and
the CLI — a build and the CLI produce byte-identical manifests (samemanifestSha256).
verifyraises the sametool.risk.increased/tool.network.widened/
tool.filesystem.write.widenedfindings as theverifyAgentManifestGradle task. See
docs/cli.md. (A jlink/native single-file image is a packaging follow-up; the
entrypoint-loading commands reflect into arbitrary user classes and need a real JVM.)
Added — injectable HttpClient on every provider client (#2385)
model { httpClient = … }lets multiple agents share one networking surface —
a connection pool, a bounded executor that rate-limits concurrent LLM calls, an
outbound proxy, or anHttpClientalready wired to your telemetry. All four provider
clients (Ollama/Claude/OpenAI/DeepSeek) take an optionalhttpClient: HttpClient?
constructor param;ModelConfig.httpClientis threaded into each bydefaultClientFor()
(DeepSeek inherits it via itsOpenAiClientsuperclass).- Opt-in, never automatic.
null(default) → each client builds its own, byte-for-byte
unchanged. The framework provides the seam; the rate-limit/circuit-breaker/bulkhead policy
lives in your injected client. Seedocs/model-and-tools.md→ "Sharing a networking surface".
Added — automatic in-JVM tool-policy enforcement (Layer 1 of #1916, #2890)
- A tool's declared
ToolPolicyis now enforced at runtime by default. When a tool
call carries an absolute filesystem-path argument that falls outside the tool's
declaredread/writeglobs, the call is denied before its executor runs — surfacing
through the existingonToolDenied/PipelineEvent.ToolDeniedaudit path (with
toolPolicyRisk+usedDeclaredCapability). No hand-writtenonBeforeToolCall
interceptor is required anymore. Paths are normalized first, so..traversal cannot
escape a declared glob. - Opt-in by declaration: a tool that declares no filesystem stance
(filesystemleftUnspecified) is never gated — existing tools are unaffected. - Escape hatch:
agent { enforceToolPolicies = false }restores the prior 0.6.0
declare-only (inert) behavior. - Scope (this is Layer 1): in-JVM, filesystem-argument enforcement for in-process
tools. Relative-path precision andnetwork/environmentisolation require the
Layer 2 OS sandbox (ProcessSandbox/WasmSandbox/DockerSandbox, tracked under
#1916). Seedocs/tool-policy-enforcement.md. - This flips the
ToolPolicyEnforcementTest0.6.0-gap tripwire (#2395) from "restricted
write still happens" to "restricted write is blocked."
Added — Layer 2 OS sandbox, first slice: macOS write-confinement (#2906, under #2891)
- New
agents_engine.sandbox.ProcessSandbox— runs a command under macOS Seatbelt
(sandbox-exec) with a generated profile that denies by default and allows file
writes only under a single canonical folder. A write to any path outside that
folder is blocked by the kernel, not just the in-JVM Layer-1 gate — so it holds
even for paths the tool constructs itself.seatbeltProfile(root)is a pure,
unit-testable function;isSupported()is false off macOS andrunthrows there. - New
sandboxedEchoToFileTool(folder)— the simplest demonstration: a tool that echoes
text into a given path, OS-confined tofolder. In-folder writes succeed; out-of-folder
writes return anERRORand create no file. - The sandbox now builds its profile from a tool's declared
ToolPolicy(#2909):
ProcessSandbox.forPolicy(policy)derives the writable roots from thefilesystem.write
globs (each glob's directory prefix viaglobToWriteRoot) and opens network only for
network = AllowAll;ProcessSandbox.forWritableRoots(roots)confines writes to several
folders at once. This is the bridge that lets Layer 1's declaration drive Layer 2's OS
enforcement. processTool(name, policy) { args -> command }(#2914) auto-sandboxes a subprocess tool
from its declared policy — no hand-wiring ofProcessSandbox. It returns the command's
stdout on success (or anERROR:string), carries the policy onto theToolDefso Layer-1
(#2890) gates path args too, and fails closed (refuses to run rather than executing
unsandboxed) where no OS sandbox is available.- Linux backend (#2892) —
ProcessSandboxdispatches by OS at run time: macOS Seatbelt,
Linux bubblewrap (bwrap), then Linux firejail (the setuid fallback). The Linux paths
bind/mount the whole filesystem read-only, re-mount the declared write roots read-write, and drop
the network unless opened — same write-confinement contract as Seatbelt, enforced by the kernel.
firejail still confines where unprivileged user namespaces are restricted (e.g. Ubuntu 24.04's
apparmor_restrict_unprivileged_userns) andbwrapcan't start. On a host with no sandbox
tool,runno longer throws — it runs the command via a plainProcessBuilderand prints a loud
UNCONFINEDwarning (isSupported()stays false, so a caller that requires enforcement can
refuse).isSupported()is true when any backend is present, soprocessTool/forPolicywork
across all three. The purebwrapArgs(...)/firejailArgs(...)are unit-tested everywhere; the
kernel-level integration (@EnabledOnOs(OS.LINUX)+@Tag("linux_only")) is verified on CI's
native Ubuntu runner. - Subprocess env + cwd honored (#2892):
ProcessSandboxnow confines the child's environment and
working directory.forPolicyderives the env from the declaredToolEnvironmentPolicy—
environment { allow("HOME") }passes only those vars through,environment { denyAll() }gives the
child an empty environment, unspecified inherits;forWritableRoots(..., env, workingDir)sets them
explicitly. Applied on theProcessBuilder, so every backend (Seatbelt / bwrap / firejail) inherits
the confinement. - Network default-deny ships across all backends (#2893 core): only
network { allowAll() }opens
the network;denyAll/Hosts/ unspecified stay blocked (Seatbelt no-network, bwrap
--unshare-net, firejail--net=none). The hostname-allowlist proxy (soHostscan selectively
allow domains) remains the deferred part of #2893. - Remaining Layer-2 follow-ups: the network hostname-allowlist proxy (#2893), read-confinement, the
grants { }structure DSL, and theprocess { }DSL. Wasm/Docker backends are #2894/#2895.
v0.6.6
Fixed — Session catch swallowed CancellationException as AgentEvent.Failed (#2863)
All six session extensions (AgentSessionExtension, PipelineSessionExtension, ParallelSessionExtension, BranchSessionExtension, LoopSessionExtension, ForumSessionExtension) — the outer catch (t: Throwable) block previously treated every CancellationException as a real failure: it emitted a synthetic AgentEvent.Failed, closed the channel cleanly, and swallowed the cancel from the surrounding scope. Field-reported regression (SSE bridge rendered "FlowSubscription was cancelled" as a user-visible failure, clobbering already-streamed partial output).
Rewritten as ordered multi-catch: TimeoutCancellationException first (real failure → Failed path, must come before bare CancellationException because it's a subtype), then CancellationException (propagate per structured-concurrency contract — close channel with the cancel, rethrow), then Throwable (real failure → Failed).
Pinned by new SessionCancellationTest — 2 structural cases (bare cancellation propagates — no Failed event, executor failure still emits Failed) plus 4 per-vendor cases (Ollama / Claude / OpenAI / DeepSeek) using stub ModelClient injections so a future adapter-specific regression can't slip past CI.
Changed — Maintainability epic #2790 (10 refactor tickets)
A code-smell audit landed 10 focused refactors. All behavior-preserving; no public API removals.
#2806 — Runtime cleanup. Central agents_engine.runtime.Ansi object owns ESC / RESET / ERASE_LINE; AnsiColor.code + wrap + spinner clear route through it; dead AnsiColor.Companion.RESET deleted. Session-extension bracket events (Completed/Failed) on Agent/Pipeline/Branch/Parallel switched from non-suspending trySend → suspending send so terminal events can't be dropped silently; inner per-token emitter stays on trySend (typealias is non-suspending) but now logs JUL warnings on failure. agents_engine.internal.BuildInfo.version reads Implementation-Version from the JAR manifest (stamped by tasks.jar { manifest { ... } }); McpServer.SERVER_VERSION / McpClient.CLIENT_VERSION / McpRunner.VERSION all forward to it — the three constants had drifted to 0.1.3 / 0.1.3 / 0.3.0.
#2805 — Core/generation cleanup. enum class ToolRisk(val manifestName: String); fromManifest derives from entries instead of a duplicate when-block. Agent.describeBudget() reflection (BudgetConfig::class.members) replaced with BudgetConfig.describeOverrides() — restores reflect-optional contract (#1718). Broad catches in GenerableSupport / LenientJsonParser / GeneratedMetaCache FINE-logged via new tryGenerable helper; GeneratedMetaCache.tryLoad narrowed to LinkageError / ReflectiveOperationException / SecurityException. ManifestYaml.parsePolicyMap literal 0/2/4/6 indent levels replaced with named DEPTH_TOPLEVEL/SECTION/LEAF/FILESYSTEM_LEAF constants.
#2804 — Model-layer cleanup. AgenticLoop reuses RESERVED_MEMORY_TOOL_NAMES (no parallel inline set). Named constants MANIFEST_HASH_PREFIX_LEN=12, BLOB_HASH_PREFIX_LEN=12, ANTHROPIC_MAX_CACHE_BREAKPOINTS=4, EPHEMERAL_TTL_BOUNDARY_MINUTES=5L. New MutableList.reserveName(name) collapses 5× duplicated require(...) for "Tool already defined". Severity.valueOf bad parse logs at WARNING. 4 near-identical AgentEvent.ToolCallFinished emit blocks → emitToolFinished(...) helper; 5 inline agents_engine.runtime.events.AgentEvent.… FQNs removed.
#2799 — JSON escape consolidation. JsonEscape moved to agents_engine.internal so generation + core can depend on it without inverting the model→generation direction. generation.GenerableSupport.escapeJson, core.ToolPolicy.ManifestJson.quote, core.Snapshot all flow through toJsonString() now. The repeated {"type":"object","properties":{},"additionalProperties":true} literal promoted to internal.OPEN_EMPTY_OBJECT_SCHEMA_JSON. ClaudeClient removeSuffix("}") + ",$cc}" cache-control surgery extracted to appendCacheControlToBlock / appendCacheControlToLastBlock helpers. New control-char regression test in UntrustedToolOutputTest.
#2796 — Shared JsonRpc helper for MCP. New agents_engine.mcp.JsonRpc consolidates encodeRequest/encodeResult/encodeError/parseEnvelope/isNotification. JsonRpcWire owns the literal "2.0", wire keys, and notification prefix. JsonRpcErrorCode names -32700/-32600/-32601/-32602/-32603 as PARSE_ERROR/INVALID_REQUEST/METHOD_NOT_FOUND/INVALID_PARAMS/INTERNAL_ERROR. New sealed class McpException : IllegalStateException (extends ISE for back-compat) with Transport / Protocol / ToolFailure subclasses.
#2792 — Shared HttpModelClientSupport. HttpModelClientSupport.sendBounded(http, request, providerLabel, maxResponseBytes) consolidates the duplicated bounded-read + OOM-guard pattern; Claude / OpenAI / Ollama sendChat all delegate. ModelClient.chatStream(messages) (default impl) delegates to chatStream(messages, jsonSchema = null) instead of carrying a byte-identical 28-line clone.
#2800 — Dedup MCP client list/text-block + Skills factories. 4 file-private helpers in McpClient.kt: resultArray(result, key), joinTextContent(blocks, contentKey), prefixed(prefix, name), makeMcpSkill(name, description, impl). toolSkills/promptSkills/resourceSkills all flow through the factory (8 boilerplate lines each → 3-4).
#2794 — toLlmInput + jsonSerialize collapse. Both flow through a single parameterised serializeForLlm(value, quoteTopLevelStrings) walker. Deferred (out of scope for the maintainability pass — flagged in commit body): the forEachGenerableParam 6-walker unification and constructFromMapReflective 5-job split.
#2801 — Primary (String) -> Any? overload. New LiveShow.from(invoke, ...) and LiveRunner.serve(invoke, args, ...) overloads. Future operator types just pass myAgent::invokeSuspend — no edit to LiveShow/LiveRunner required. The six typed overloads stay for source-compat.
#2807 — Detekt static analysis. detekt 1.23.7 plugin wired into root build.gradle.kts. detekt.yml enables complexity (LongMethod, LargeClass, CyclomaticComplexMethod, NestedBlockDepth), exceptions (SwallowedException, TooGenericExceptionCaught/Thrown), style (MagicNumber with sensible allowlist, UnusedPrivateMember), naming (FunctionNaming), potential-bugs, empty-blocks. detekt-baseline.xml freezes current violations so the build stays green on existing code; new violations fail. README's "First 10 Minutes" lists ./gradlew detekt alongside ./gradlew test.
Notes
No API removals; every 0.6.5 caller compiles and runs unchanged.
agents_engine.model.JsonEscape → agents_engine.internal.JsonEscape: only the internal qualifier is package-visible, so this is a binary-compatible relocation for any consumer using only the public API.
The detekt baseline file (788 lines) is checked in; future PRs are held to the rules without retroactively forcing cleanup of the audited code.
v0.6.5
Fixed — Hardcoded 60s LLM request timeout killed long Sonnet turns (#2850)
ClaudeClient/OpenAiClient/DeepSeekClient/OllamaClient— bumpedDEFAULT_REQUEST_TIMEOUTfrom60.secondsto300.seconds. Field report against 0.6.4 showed long Sonnet turns (multi-step agentic loops with extended thinking) consistently breached the 60s cap on the JDK HttpClient, surfacing asHttpTimeoutException: request timed outand tearing down the streamingFlow. New floor matches what production agents actually need; 0.6.5 callers see no behavior change unless they were silently relying on the truncation.DEFAULT_CONNECT_TIMEOUTstays at10.seconds— healthy networks never spend that long on TCP connect.model { requestTimeout = …; connectTimeout = … }— tunable from the DSL on every built-in provider (Ollama, Claude, OpenAI, DeepSeek). Both fields default tonull, which falls back to the adapter'sDEFAULT_REQUEST_TIMEOUT/DEFAULT_CONNECT_TIMEOUT. Set the override when long-context calls, big Ollama generations, or extended-thinking turns regularly approach 5 minutes. Wired throughModelConfig.requestTimeout/connectTimeout→defaultClientFor()→ each adapter ctor — no shared global; per-agent, per-config, per-test.- No public API removals — additive only. Existing
ModelBuildercallers compile and run unchanged.
Added — Files convenience surface
agents_engine.content.Files— one-line file loading for the typedContenthierarchy.Files.load(path, store): Contentreads the file, detects modality + mime from filename extension (case-insensitive, no magic-byte sniffing), puts bytes via theBlobStore, returns the rightContentvariant. SameContentRef.hashas a manualstore.put. ThrowsUnknownExtensionException(names the extension + path + full list of known extensions) on unrecognised.- Variants:
loadOrNull(null-on-unknown),loadAll(throws on first unknown),loadAllOrSkip(silently skips — directory ingestion),canonicalExtensionFor(content)(inverse mapping),knownExtensions: Set<String>(predicate for callers). - Extension coverage: every
wireMimeon every modality variant has at least one canonical extension. Image:png,jpg/jpeg,gif,webp. Audio:mp3,wav,flac,ogg. Video:mp4,webm,mov. Document:pdf,docx,md/markdown,html/htm,txt. - 13 unit tests pin per-extension mapping, hash round-trip, case-insensitivity, unknown-extension behavior on every entry point, and the canonical-extension inverse for all 17 variants.
Added — Typed agent attachments (#2470 slice b)
agent.invokeWithAttachments(input, attachments)+ suspending siblinginvokeSuspendWithAttachments— user-facing API for vision input via typedContent.Image. The runtime dereferences each ref against the agent's injectedBlobStore, base64-encodes once, and attachesImagePartto the first userLlmMessage. Per-provider wire translation is the slice-a work — this commit routes the typed surface into it.Agent.blobStore: BlobStore?+blobStore(store)DSL — optional injection; null when the agent doesn't take attachments. Passing attachments to an agent with noblobStoreerrors fast at invoke time with a clear message — caller misconfiguration surfaces before any provider HTTP.- Closed mime mapping —
ImageMime → ImagePart.WireMimefor all four variants (Png,Jpeg,Gif,Webp). NoStringconversion at any boundary. - Forensic-friendly errors — when a ref's blob is missing from the store, the error names the ref's hash prefix. Helps debug snapshot resumes against partially-purged stores.
- Non-image variants skipped in v1 —
Content.Text/Document/Audio/Videoflow through the attachment path as no-ops. Slice c will wire Document via provider doc-input adapters; Audio/Video land in Stage 2. - Empty / all-skipped attachments → null images — no provider sees an empty array; legacy wire shape preserved.
- Resume composition —
attachmentsargument is ignored on resume because the restored conversation already carries the originalLlmMessage.imageson the saved user turn. - Tests: 8 unit cases (
AgentAttachmentsTest) + 6 live cases (AgentVisionLiveTest) running the sameVisionFixturesfrom slice a through the agent surface on Ollama qwen3-vl:8b, Claude Haiku 4.5, OpenAI gpt-4o-mini. See docs/multimodal.md.
Added — Vision input across all providers (#2470 slice a)
LlmMessage.images: List<ImagePart>? = null— new optional field; back-compat default leaves the wire shape byte-identical to pre-#2470 for callers that don't pass images. ClosedImagePart(base64, wireMime)withWireMimesealed type (Png,Jpeg,Gif,Webp) —Stringmime is intentionally not accepted in the public ctor.- Per-provider adapters translate vision on
role = "user"messages:- Ollama:
{role:"user", content:"text", images:["<b64>", ...]}— works withqwen3-vl:8b,llava,llama3.2-vision, etc. Non-vision models silently ignore the field. - Claude: typed content array —
[{type:"text"}, {type:"image", source:{type:"base64", media_type:"image/png", data:"<b64>"}}, ...]. Works with all Claude vision-capable models (Haiku 4.5, Sonnet 4.6, Opus 4.7). - OpenAI: typed content array —
[{type:"text"}, {type:"image_url", image_url:{url:"data:image/png;base64,<b64>"}}, ...]. Works with gpt-4o, gpt-4o-mini, gpt-4-turbo, the o* reasoning models. - DeepSeek: inherits the OpenAI adapter shape; current DeepSeek models lack vision and silently ignore the field. Shape-tested; no live call to avoid spending on a no-op.
- Ollama:
- Role-gated: non-user messages (system/assistant/tool) with non-null
imagesignore the field on the wire — no provider's API accepts images on those roles. Pinned by tests. - Programmatic fixtures in
src/test:VisionFixtures.threeSquaresPng()(256×256 red/blue/green squares for "count the squares" eval) andVisionFixtures.housePng()(256×256 cartoon house for "what is this?" eval). Rendered viaBufferedImage+ImageIO— reproducible byte-for-byte across machines and CI, no external assets in the repo. - Live integration tests (
VisionLiveTest) cover all three vision-capable providers with cost discipline (temperature = 0,maxTokens = 80, single-turn, ~5KB base64 payloads): Ollamaqwen3-vl:8b(taggedlive-llm, runs via:integrationTest), Claudeclaude-haiku-4-5and OpenAIgpt-4o-mini(taggedlive-cloud-api, runs in default:testwithassumeTrueskipping when no key). Model names overridable via env. Assertion shape is loose keyword-match — robust against per-model phrasing variance. - 8 wire-format unit tests pin per-provider JSON shape + the no-images back-compat path. See docs/multimodal.md.
Added — Multimodal foundation (#2465 epic, Stage 1)
- Typed
Contenthierarchy (#2466) —sealed interface Contentwith variantsText,Image,Audio,Video,Documentin packageagents_engine.content. Each non-text variant carries aContentRefplus a typed mime (ImageMime,AudioMime,VideoMime,DocMime). Mime types are closed sealed interfaces withwireMime: Stringaccessors — noStringmime in any public API. Extension propertyContent.modality: Stringis the audit-stable per-variant name. Stage 1 wires Image + Document end-to-end (the modalities the 0.8 spec → product loop consumes); Audio + Video are modelled now and exercised through provider adapters in Stage 2 (#2470, deferred). ContentRef+BlobStore(#2467) — content-addressed reference (hash: StringSHA-256 hex,sizeBytes: Long,wireMime: String).BlobStoreinterface withInMemoryBlobStore(defensive byte-array copies on put + get) andFileBlobStore(dir)(one file per blob, filename = hash, atomic tmp + rename, survives process restart, idempotent put). Hash family matches the manifest hash (#1912) and snapshot filename hash (#2753) — single algorithm across the audit surface. Public top-levelcomputeContentHash(bytes): Stringfor byte-level comparison without a store.ToolResult(#2469) —data class ToolResult(parts: List<Content>)for tools that return mixed content (a screenshot tool returns text + image; OCR returns extracted text + the source PDF ref). Just anotherAny?the tool executor returns — noToolDefsignature change; existing tools that return strings keep working byte-for-byte. AgenticLoop renders multipart returns as text +[modality: <wireMime>] (<hash-prefix>, <size>B)placeholders for the LLM tool-result message; provider-specific multipart rendering (vision-capable Claude/OpenAI/Gemini) is sibling #2470 (deferred). JSONL audit exporter gains anoutputParts: List<String>?column on audit rows — forToolResultreturns it emits one entry per part as<modality>:<hash-prefix>:<sizeBytes>:<wireMime>(text parts astext:inline:<charCount>:text/plain); blob bytes never enter the audit row. Field is null for non-multimodal returns — legacy audit rows unchanged.EXPECTED_FIELDSschema-pin updated to include the new column. Composes with snapshot/resume (refs serialise, blobs stay external) anduntrustedOutput(the text-summary rendering goes through the existing JSON envelope). See docs/multimodal.md.
Added — Eval harness (#2491 epic, feature-complete)
DeterministicModelClient(#2492) —agents_engine.testing.DeterministicModelClient(scripted: List<LlmResponse>)(or vararg ctor) hands back pre-scripted responses one perchatcall. No network, byte-deterministic.requestsrecords every message list the agent built up;remaining()reports unconsumed responses. Exhaustion throws `Determin...
v0.6.3
[0.6.3] — 2026-05-29
"Prompt-caching foundation + Koog-bug regression net." Ships the vendor-neutral prompt-caching DSL — the foundation of the #2655 epic — and lands the first eight Koog issue-set regression checks under #2474 (five real fixes including the sealed @Generable parent-dispatch unblock, plus three regression-pin tests against the existing contracts).
Added
- Vendor-neutral prompt-caching DSL + neutral hint model (#2656, part of the #2655 epic) — agent-controllable prompt caching declared in provider-agnostic terms. New
caching { }block:enabled(default true),cacheSystemPrompt/cacheToolDefs(default true — byte-stable system prompt + KSP-stable tool defs, #1703),cacheConversation = None | Rolling(defaultNone; opt-in because rolling has per-vendor write cost),ttl(null = provider default), plus acacheable(id, ttl) { content }helper for per-segment marking of large retrieved documents / instruction sets. Internally, the agentic loop attaches a neutralCacheHint(segment, ttl, breakpoint)(withsealed CacheSegment { SystemPrompt; ToolDefs; Conversation; Custom(id) }) toLlmMessageat message-assembly time.LlmMessagegains an optionalcacheHint: CacheHint? = nullfield — backward-compatible: existing adapters ignore it, preserving the pre-#2656 wire shape exactly. No provider cache types (cache_control, Gemini cache IDs, …) appear in the public API. Per-provider adapter consumption (Anthropic / OpenAI / Gemini / DeepSeek / Ollama) lands in #2658-#2662; stability guard in #2657; observability in #2663. See the Prompt Caching wiki page. SessionHistory— ergonomic, stable history accessors overAgentSessionevents (#2485, addresses Koog signal under #2474) —class SessionHistory(events: List<AgentEvent<*>>)exposestoolCalls()/toolResults(excludeErrors = false)/assistantMessages()/completedOutput()/failed()/skillsStarted(). Thin wrapper — no new state, deterministic ordering from the source flow, no allocation beyond filtered list materializations.ToolCallRecord(callId, toolName, arguments)andToolResultRecord(callId, toolName, result, isError)are the surfaced shapes. Not in v1: auserMessages()accessor — the agent input is passed toagent.session(input)directly and is not surfaced as an event; adding it requires a newAgentEvent.UserMessageand is out of scope for this slice.
Changed
- Unknown / unlisted tool name mid-loop is now recoverable, not fatal (#2476, regression for Koog signal under #2474) — when the model emits a tool name absent from the active skill's allowlist (whether outright unknown or belonging to a different skill on the same agent), the agentic loop previously threw
IllegalStateExceptionand the run died. It now appends a tool-result message naming the bad call and listing the skill's allowed tools, then continues — so the model gets a turn to self-correct. The disallowed executor still never runs (authorization boundary unchanged), the skill's allowlist is the only set named (no leak of the wideragent.toolMap), and streaming consumers see aToolCallFinished(isError = true)for the rejected call. Pinned byKoogRegressionUnknownToolTest;ToolAuthorizationTestrewritten to assert the recovery contract (two of its prior assertions were accidentally passing viafail()message contents — replaced with honest tool-message inspection). McpServertools/call now serializes@Generableoutputs as JSON, not as Kotlin debugtoString(#2483, regression for Koog signal under #2474) —McpServer.handleToolCallpreviously rendered the executor's return value throughoutput?.toString(), leaking the Kotlin data-class debug shape (SearchPayload(text=Hello, source=wiki)) into the MCP text content. Routed throughtoLlmInputinstead:@Generableoutputs render as JSON ({"text":"Hello","source":"wiki"}),Stringstays clean, and primitives stay clean. Non-@Generabletyped outputs still fall back to.toString()— documented limitation, register a@Generableoutput type for typed MCP boundaries.- Enum-typed fields now appear in JSON Schema with a typed value list (#2479 part 1, regression for Koog signal under #2474) —
KType.jsonSchemaTypeObjectpreviously fell through to{"type":"string"}for enum-typed constructor parameters, so the LLM had no way to know which values were valid and the constrained-decoding provider path couldn't enforce them. Enums now render as{"type":"string","enum":["veryHigh","normal","low"]}with constant names emitted verbatim fromEnum.name— no case mutation, no@SerialName-style lowercasing. Mixed-case constants (RED/Green/blue) survive intact. The tool_choice configurability half of #2479 is a separate slice (ToolChoice { Auto | Required | None | Specific(name) }API + adapter wiring). - Sealed
@Generableparent classes now deserialize via type-discriminator dispatch (#2482a, regression for Koog signal under #2474) —KClass<Sealed>.constructFromMap(...)previously returned null becauseprimaryConstructoris null on sealed parents. The schema-gen path emits{"oneOf": [...]}for sealed types, so any MCP-exposed skill (or other typed entry point) declaring a sealed@Generableinput was unusable — the model could produce a matching payload, the server couldn't read it.constructFromMapReflectivenow checksisSealed, looks up the matching variant by thetypediscriminator, and recurses — including thedata objectcase viaobjectInstance. Unknown variants and missing-discriminator maps return null so the call routes throughonError.invalidArgsinstead of constructing a wrong-shape value. - Stringified-JSON coercion for nested object / list / sealed fields (#2482b, regression for Koog signal under #2474) — when the LLM emits a typed field whose value is a JSON string (instead of a nested object / array),
coerceValuenow parses the string withLenientJsonParserand continues coercion. Guarded:Stringfields are NOT JSON-decoded (a value like"The {weather} report"stays the literal string —String::classmatches first in thewhen), and unparseable JSON for an object/list field returns null so the failure routes throughonError.invalidArgs. Composes with #2482a — a sealed-typed field accepts a JSON string carrying the type discriminator.
Tests
- Koog issue-set regression suite — first slice (#2474) — pin Agents.KT contracts where Koog broke. #2475 ships
KoogRegressionWrongTypedArgsTest(3 cases): (1) scalarNumber → Stringis intentional coercion percoerceValue(not a malformed arg — executor runs with the stringified value); (2) a truly-unparseable value for a typed field (e.g."abc"forcount: Int) routes throughonError.invalidArgswith end-to-end recovery viaRepairResult.Fixed, executor runs exactly once for the repaired call; (3) without a handler the failure is the framework'sToolExecutionExceptionwith typed-arg context — never a rawkotlinx.serialization/NumberFormatException. - Koog regression — loop protection (#2480) —
KoogRegressionLoopProtectionTest(4 cases) pinsbudget { maxConsecutiveSameTool = N }: same tool past the cap throwsBudgetExceededException(reason = CONSECUTIVE_TOOL)naming the offending tool; an interleaved call resets the counter (alpha → beta → alpha → beta) so an alternating agent doesn't trip; name-only semantics — varying args still trip the cap (stricter than the Koog signal's "identical args" framing — Agents.KT catches more loop shapes); pre-cap threshold listener (onBudgetThreshold) fires forCONSECUTIVE_TOOL. Repeated-identical-assistant-output detection mentioned in the Koog signal is NOT yet implemented — known gap, separate detector if/when needed. - Koog regression — OpenRouter-style streaming chunk reconstruction (#2478) —
KoogRegressionStreamingChunkReconstructionTest(3 cases) feeds synthetic chunk sequences throughchatOrStreamand pins: OpenRouter shape (toolName in the firstToolCallStartedonly, args split across NToolCallArgumentsDeltachunks, finalized byToolCallFinished) reconstructs into one coherentLlmResponse.ToolCallsentry with full args; every wire arg-delta surfaces as exactly oneAgentEvent.ToolCallArgumentsDeltaevent in arrival order verbatim (so streaming UIs can show JSON building up); interleaved chunks for parallel calls route bycallIdand reconstruct both calls cleanly; an orphan args delta (no precedingToolCallStarted) doesn't crash the aggregator and doesn't fabricate aStarted— the delta still fires as a consumer event so a UI sees the wire activity. ClaudeClientChatStreamLiveTeststabilised (#2723) —1..50prompt was still small enough for Haiku 4.5 to occasionally batch the entire response into ~3 same-millisecond SSE chunks, failing the>=10ms gap OR >=5 chunksassertion intended to catch wire-level re-bundling regressions. Bumped to1..200; three consecutive validation runs each report 8 chunks across ~1.3s of streaming. Also corrected the test's stale doc comment — the actual@Tagislive-cloud-apirunning under default:test, notlive-llmrunning under:integrationTest.
v0.6.2
Title:
v0.6.2 — Attribution you can filter by
Body (paste into the description field):
"Attribution you can filter by." Closes the bridge-observability gap that every downstream Langfuse / LangSmith / OTel consumer was working around.
AgentRuntimeContextnow carries free-form business attribution alongside the technical correlation fields, so bridges can drop their per-bridge
ConcurrentHashMap<requestId, userId>+onBeforeTurncapture pattern and read user / project / dialog identifiers directly off the runtime context.
implementation("ai.deep-code:agents-kt:0.6.2")
implementation("ai.deep-code:agents-kt-ksp:0.6.2") // optional but recommended
Drop-in for 0.6.0 / 0.6.1 consumers — every new surface is additive, off by default.
---
Headlines
Native attribution on AgentRuntimeContext (#2720)
AgentRuntimeContext.attribution: Map<String, String> plus typed accessors userId / projectId / dialogId (canonical keys via AttributionKeys.{USER_ID,
PROJECT_ID, DIALOG_ID}). Set once at the session boundary; every nested AgentEvent / PipelineEvent surfaces it:
withAgentRuntimeContext(
AgentRuntimeContext.currentOrNew().copy(
attribution = mapOf(
AttributionKeys.USER_ID to userId,
AttributionKeys.PROJECT_ID to projectId,
AttributionKeys.DIALOG_ID to dialogId,
"tenantId" to tenantId, // arbitrary keys round-trip
),
),
) {
agent.session(input).events.collect { event ->
// event.userId / event.projectId / event.dialogId all populated
// event.attribution["tenantId"] == tenantId
}
}
Non-breaking — defaults to emptyMap(). Replaces the per-bridge side-channel pattern (ConcurrentHashMap<requestId, userId> + onBeforeTurn capture).
Bundles the 0.6.1 batch
Since 0.6.1 shipped on a parallel release branch and never merged to main, 0.6.2 rolls it up:
- Snapshot/resume foundation (#2416) — experimental — message-history-as-state design; ships SessionSnapshot, SnapshotStore (InMemory + File with atomic
temp-write + rename), the executeAgentic turn-boundary checkpoint + resumeFrom seam, MemoryBank snapshot/restore. Round-trip proven by test (3 turns →
crash → fresh agent → restore → finish).
- Reasoning/thinking stream (#2406) — opt-in model { reasoning(...) } surfaces a model's reasoning as AgentEvent.Reasoning on a channel separate from the
answer Token stream. Claude / DeepSeek / Ollama emit reasoning text; OpenAI Chat Completions reports reasoning_effort + TokenUsage.reasoningTokens.
- onBudgetExceeded (#2412) — raise a budget cap and continue mid-run via BudgetDecision.Extend(newLimit) instead of throwing. Currently wired for the
tool-call cap.
- onToolDenied + PipelineEvent.ToolDenied (#2395) — calls blocked by an onBeforeToolCall Decision.Deny are now first-class observable. Previously denials
silently dropped from onToolUse / observe { }.
- Typed parameter schemas for built-in tools (#2379) — memory_*, forum_return, swarm absorb carry typed @Generable schemas instead of relying on
providers' permissive empty-properties fallback. No public API change.
Dependency bumps in the published artifacts
- org.jline:jline 3.27.1 → 4.1.2
- com.google.devtools.ksp:symbol-processing-api 2.3.7 → 2.3.8 (the :agents-kt-ksp module)
- Gradle wrapper 9.5.0 → 9.5.1
---
Verified
Full ./gradlew test green on the new toolchain — 1596 tests, 0 failures across all 7 modules. LiveShow JLine path (terminal builder + history + readLine +
EOF/interrupt handling) verified under the 4.x line.
Documentation
- docs/observability.md (https://github.com/Deep-CodeAI/Agents.KT/blob/v0.6.2/docs/observability.md) — bridge consumption pattern for attribution.
- docs/regulated-deployment.md (https://github.com/Deep-CodeAI/Agents.KT/blob/v0.6.2/docs/regulated-deployment.md) — how attribution interacts with the
audit story.
- CHANGELOG.md [0.6.2] (https://github.com/Deep-CodeAI/Agents.KT/blob/v0.6.2/CHANGELOG.md) — full line-by-line.
What's next
- Composition snapshots for Pipeline / Forum / Loop / Branch (#2386 Phase 2c)
- Manifest-hash restore guard on snapshot resume (#2386 Phase 2b)
- Mid-tool coroutine suspension (depends on #638)
- The 0.7.0 epic — enterprise policy layer + human-in-the-loop (#2487)
**Optional attachments** to drag-and-drop into the release form (so consumers can verify the Maven Central jars against the exact GPG-signed bundle):
- `build/agents-kt-0.6.2-combined-bundle.zip` — single file, both artifacts
- Or separately: `build/agents-kt-0.6.2-bundle.zip` and `build/agents-kt-ksp-0.6.2-bundle.zip`
Check **"Set as the latest release"**, leave the other defaults, click **Publish release**.v0.6.1
Added
- Reasoning/thinking stream (#2406) — opt-in
model { reasoning(budgetTokens = …, effort = …) }surfaces a model's reasoning asAgentEvent.Reasoning, separate from the
answerTokenstream, so a UI can render live reasoning instead of a spinner. Claude (extended thinking), DeepSeek (reasoning_content), and Ollama (thinking) emit reasoning
text; OpenAI Chat Completions reportsreasoning_effort+TokenUsage.reasoningTokensonly. onBudgetExceeded(#2412) — when a budget cap would throw, returnBudgetDecision.Extend(newLimit)to raise the cap and continue, orStopto throw. A long-running
agent can grant itself more tool calls mid-run instead of failing. Wired for the tool-call cap.onToolDenied+PipelineEvent.ToolDenied(#2395) — tool calls blocked by anonBeforeToolCallDecision.Denyare now first-class observable (audit no longer silently
drops blocked attempts).- Snapshot/resume foundation (#2416, experimental) —
Snapshotable,SessionSnapshot,SnapshotStore(InMemory+File), and the loopresumeFrom/checkpoint seam. An
agent's resumable state is its message history, so resume re-enters the loop rather than suspending a coroutine.
Changed
- Typed parameter schemas for built-in tools (#2379) —
memory_*,forum_return, and swarm delegates now declare real schemas instead of the permissive empty-properties
fallback. No public API change.
Install
implementation("ai.deep-code:agents-kt:0.6.1")v0.6.0
[0.6.0] — 2026-05-24
"Boundaries you can audit." The 0.6.0 epic (#1911) turns Agents.KT's typed-boundary model into auditor-ready evidence: deterministic permission manifests with runtime hash correlation, append-only JSONL audit, before-interceptor guardrails, typed tool / MCP-tool hierarchies, vendor-neutral observability bridges (OTel / LangSmith / Langfuse), constrained decoding for @Generable outputs, DeepSeek as a fourth provider, and onTokenUsage telemetry. Existing consumers see no behavior change unless they opt into the new surfaces.
Added
Permission manifest — the 0.6.0 hero feature (#1912)
:agents-kt-manifestmodule —agentManifest(agent)returns a deterministic capability graph: every agent, skill, tool, knowledge entry, MCP endpoint, provider, budget, and policy boundary in a system, in YAML or JSON, with stable ordering and masked provider secrets.verifyAgentManifestGradle task — diffs the current manifest against a checked-in baseline; fails the build on capability widening (new tools, new MCP endpoints, broader policies) so reviewers always see surface-area changes before they merge.- Manifest SHA-256 propagates into the runtime — every
PipelineEvent/AgentEventcarries themanifestHashof the agent that emitted it, so static manifest and dynamic audit trace tie back to the same approved capability set. - Provider secrets masked — API keys, base URLs containing credentials, and any field marked
@SecretSafeare redacted from the emitted manifest.
Runtime event context (#1913)
manifestHash,requestId,sessionIdon every runtime event —PipelineEventandAgentEventboth carry them, so JSONL audit / OTel / LangSmith / Langfuse downstreams all bind events to the manifest hash that was authoritative at invocation time.withAgentRuntimeContext { ... }extension — Kotlin-coroutines-context-aware threading so nested compositions (then,branch,loop,forum,wrap) inherit the outer request/session/manifest correlation without re-derivation.
JSONL audit exporter (#1914)
:agents-kt-observabilityJsonlAuditExporter— append-only, one-line-per-event audit format withrequestId,sessionId,manifestHash, agent/skill/tool ids, event type, provider, and model. Raw arguments and results are omitted by default; opt-in viaincludeRawArgs = true/includeRawResults = truewhen the audit consumer needs them.- Stable canonical field ordering — same audit row produces the same JSON line on every run, so the file is grep-friendly and diff-able.
- PII-safe defaults — designed for the regulated-deployment workflow in
docs/regulated-deployment.md.
Before-interceptor guardrails (#1907)
onBeforeSkill/onBeforeToolCall/onBeforeTurn— Rails-style interceptors returning a sealedDecision { Proceed | ProceedWith(...) | Deny(reason) | Substitute(result) }. Sibling to the post-hoconToolUse/onSkillChosen/onErrorobserver hooks already in 0.4.x.- Chain semantics — interceptors run in registration order; every interceptor runs; the first non-
Proceedwins;Denyshort-circuits with anonUnauthorizedToolCall-shaped audit event;Substituteskips the model and returns the substituted value. - Unified use cases — per-client tool policy (McpServer per-principal allowlists), action confirmation (
Escalate(reason, reviewerRole)resumed by the host app), prompt-injection filtering as a one-liner, uniformperToolTimeoutwrapping. Seedocs/interceptors.md.
Declarative tool policy (#1915)
ToolPolicyDSL ontool { policy { … } }— declares tool risk (LOW/MEDIUM/HIGH/CRITICAL) plus filesystem / network / environment declarations. Consumed by the permission manifest and by audit-row formatters.- No runtime enforcement yet — the sandbox-enforcement work is deferred to 0.7.0 (#1916). 0.6.0 ships the declaration surface so manifest reviewers can already see "this tool reads
~/.ssh" or "this tool calls*.openai.com" at policy-review time.
Typed tool + MCP-tool hierarchies (#1948)
Tool<IN, OUT>typed handles —tool<Args, Result>("name", "desc") { args -> ... }returns aTool<Args, Result>with phantom types soSkill.tools(addTool, divideTool, …)is compile-time-checked instead of stringly-typed.McpTool<IN, OUT>— every MCP-imported tool also gets a typed handle viaMcpClient.tools(prefix). Composes with the sameSkill.tools(...)builder. Additive alongside the existingMCP-as-skilladapter.
MCP server hardening (#1902)
- Inbound bearer auth —
McpServer.tokens(...)configures principal → token mappings; unauthenticated requests get a structured 401.McpStdioServershares the same authn surface for stdio deployments. - Host / Origin allowlists — DNS-rebinding and CSRF defenses against browser-side
localhostexploits; explicit allowlist required for non-loopback hosts. - Per-principal tool policy — each principal can have its own subset of agent skills exposed as MCP tools. Policy decisions flow through the
onBefore*chain and into audit events. - Default-deny — unconfigured server rejects everything except
initialize/tools/list; opt-in for each authorization grant.
Stdio MCP server transport (#2045)
McpStdioServer.from(agent)— exposes the same agent surface (tools, prompts, resources,tools/listChanged: false) over line-delimited stdio instead of HTTP. Same authentication + policy plumbing as the HTTP server.McpRunner --stdio— picocli-style one-liner for shipping agents as stdio-MCP services without a Gradle dependency on:server-style infrastructure.
LiveShow line editing (#985)
LineEditor— line-discipline-aware input handling for the LiveShow runner: cursor movement, history, kill-line, basic readline-style navigation, all while the agent streams events to the display.- Cancellation-safe — collector cancellation propagates through the editor; no orphaned threads.
Runtime observability bridge (#1908)
ObservabilityBridgein:agents-kt-observability— vendor-neutral bridge contract withonPipelineEvent,onAgentEvent, andonInterceptorDecision, plus.observe(bridge)for one-call wiring.:agents-kt-otelmodule — OpenTelemetry adapter that maps agent sessions toagent.invokespans, model turns togen_ai.chatspans, tool calls togen_ai.toolchild spans, errors to span status, usage to GenAI attrs, and before-interceptor decisions to span events.:agents-kt-langsmithmodule — LangSmith run-tree adapter that maps skill invocations tochainruns, model turns to childllmruns, tool calls to childtoolruns, failures to run errors, budget threshold events to run extras, and interceptor decisions to run tags. Dispatch is asynchronous, batched, oldest-drop under backpressure, and never throws into the agent path.:agents-kt-langfusemodule — Langfuse trace adapter that maps skill invocations to traces, model turns to generations, tool calls to spans, runtime events to Langfuse events, and interceptor decisions to tags plusinterceptor.decisionobservations. Dispatch is asynchronous, batched, oldest-drop under backpressure, and uses Langfuse's native ingestion endpoint without a vendor SDK.- Core remains vendor-free — OTel, LangSmith, and Langfuse integration code is isolated to adapter modules.
Provider constrained decoding (#1949)
@Generableschemas are threaded into provider payloads — OpenAI receivesresponse_format.json_schema, Ollama receivesformat, and Anthropic receives a structured-output tool path for typed agentic outputs.- Provider capability detection —
ModelClient.supportsConstrainedDecodinggates schema forwarding so unsupported adapters keep the existing repair-loop behavior.
DeepSeek provider adapter
model { deepseek(name); apiKey = ... }— OpenAI-compatible Chat Completions adapter with DeepSeek provider identity, configurabledeepSeekBaseUrl, usage normalization, streaming through the OpenAI-compatible SSE path, and manifest provider metadata.- Constrained decoding stays disabled for DeepSeek — the adapter does not send OpenAI
response_format.json_schemabecause DeepSeek documents JSON-object mode rather than that schema payload.
Token usage telemetry (#2354, #2355, #2356, #2357)
- Public
Agent.onTokenUsage { usage: TokenUsage -> }listener — fires once per successful LLM round-trip that reports usage, including streaming paths at end-of-stream. Tool-use cycles fire once per provider response, not once per agent invocation. - Widened
TokenUsage— now carriespromptTokens,completionTokens,cachedInputTokens,provider, andmodel.totalremains prompt + completion; cached tokens are a provider-visible subset of prompt tokens, not an extra addend. - Provider-normalized usage mapping — Anthropic maps
input_tokens/output_tokens/cache_read_input_tokenswithprovider = "claude"; OpenAI mapsprompt_tokens/completion_tokens/prompt_tokens_details.cached_tokenswithprovider = "openai"; Ollama mapsprompt_eval_count/eval_countwithcachedInputTokens = nullandprovider = "ollama". - Listener safety semantics — missing usage does not fire, LLM failures do not fire and remain covered by
onError, multiple listeners run in registration order, and listener exceptions are logged and swallowed so telemetry cannot break the agent run.
Tests
- Added
OnTokenUsageTestcoverage for widened fields, multi-listener ordering, listener-error swallowing, missing-usage skip, model-failure skip withonError, multi-turn tool-use ordering, and streaming single-fire behavior. - Updated Anthropic, OpenAI, and Ollama adapter tests to assert ...