Notes on LLMs in a service discovery ecosystem

Preamble and concepts

Automated "service discovery" has been one of the holy grails of network computing infrastructure since some time in the 1990s. Many protocol schemes and standards have been established or proposed in this area, but only a few of these have gone on to widespread use, typically in specialized domains (e.g., Bonjour, some uses of DHCP). Many others, however, including some of the more ambitious efforts, have been high-profile flops (the most salient example in my mind being UDDI).

A lot of these schemes involve the publication of machine-readable API specifications in some kind of IDL, combined with some kind of directory service that maps API availability to service endpoint addresses, possibly with some kind of naming or branding mechanism mixed in.

On the service provisioning end, automation is fairly straightforward (in concept at least; the engineering per se might be substantial and complicated). When a service is implemented, automated tooling either derives IDL descriptions from the implementation code or the developer specifies the IDL and the tooling automatically produces the relevant header files, class definitions, code stubs, configuration descriptors, or whatever other glue or metadata is needed to build and deploy the service in the context of the "service discovery" framework. The key is that in either case the developers are the source of all intentionality: they define what the service is based on what they want it to be.

On the service consumption end, however, affordances for automation have been more constrained. The protocols that have seen success qualify less as service discovery mechanisms than as service location mechanisms -- everything is predicated on the consumer knowing in advance precisely what they are looking for. These mechanisms are typically designed to support two types of consumers:

  • Users, who wish to make use of some well understood category of service through the mediation of local applications or OS functionality.

  • Developers, who wish to have a thing they are developing make use of the service.

In either case, everything exists within a pre-established subject domain that everybody involved already understands. A great example of this is printing: Bonjour (generally built into your OS) works very well to let your computer find out what printers exist on the local network and what their properties as printers are, and enables your applications that want to print things to do so. Once it's been established that we're dealing in the realm of printing, the variations that might need to be supported (e.g., print resolution, color vs. B&W, one-sided vs. two-sided, collation, etc.) are baked into standardized printer description metadata formats that the various printers know how to present to potential clients and the various applications that print things know how to display to the user and let the user make selections from. This works because all the variations are captured in standards already agreed upon by the interested parties (namely, printer makers and OS developers) on either end of the toolchain.

In particular, the software that is the consumer (or works on behalf of the consumer) typically has the client side of the service APIs coded directly into it, with any published IDL being primarily a documentation resource for the client developer. In this mode, the API description is a development-time resource, whereas the service location is a run-time resource. To the extent that the API documentation is published in machine readable form, the machine doing the reading will be part of the development tooling rather than the software that is actually using the service. Many real world "service discovery" frameworks don't traffic in API descriptions at all, since making this information available at run-time adds cost and complication, while the value of run-time access to this information is low to zero.

These frameworks are fine and even very useful in contexts where the participants already begin with a notion of what they are doing, but it's not really service discovery in the sense of assisting you when you wonder "what kinds of services exist out there and what use might I potentially make of them?" or "I have a task I'm trying to accomplish and I wonder what services might be available to assist in doing it?". In other words, what we're discovering is where the services are and what standardized options they may support, but we're not learning what they are.

Ocaps provide a new perspective on this situation. One of the fundamental dogmas of the capability model is "don't separate designation from authority". In other words, the entity that you employ to make use of some authority is the same entity that indicates which authority it is that you are intending to make use of. Turning this phrasing around, it also means that you can't designate a thing to someone without at the same time giving them access to it. On its surface, this feels like a frustrating new limitation that we are unaccustomed to dealing with, but actually it just highlights a problem we already had but perhaps were unaware of: the older practice of freely passing around designations while relying on confused-deputy-prone ACL mechanisms to secure access to what the designations designate leads to a lot of practical confusion. In particular, we get a lot of confusion surrounding what linguists like to call the use vs. mention distinction. Ocaps, in contrast, compel you to be clear on this distinction by dint of simply decreeing that all references are use references and that mention references aren't a thing. However, though we are now unambiguously clear on what a reference is, we are left without a useful means to make use of that clarity, because we have resolved the confusion by the expedient of pretending that one side of the use/mention distinction doesn't exist. Present and historical ocap systems typically don't have any codified way to talk about a capability without directly referring to the capability itself. This is precisely the same problem that efforts to support the dream of "service discovery" have been trying to address.

The challenge, of course, is that the description of what a service does, what it is useful for, and why it is structured the way it is are all things that don't lend themselves to a precise representation with formal syntax the way that an API does. These things are deeply entangled with the intentions of whomever might be consuming the API rather than type information about the API itself. Because of this open-ended quality, this information generally needs to be presented as free form text to be read in the context of a broader understanding of the world. Consequently, this information is traditionally packaged as documentation targeted at a human audience. While LLMs have proven surprisingly effective at reading and interpreting such documentation, the documentation itself still needs to exist and to be available, which typically only happens if the thing being documented is part of a project or product substantial enough to have, e.g., its own web presence. This could be a sophisticated product support website, but it might be as simple as a Github source repo that can be scanned for JSDoc comments (or even just source code that can be scanned for its meaning). However it is presented, this information is likely to be at a very coarse level of granularity. Furthermore, since none of this is standardized, each act of API consumption typically involves some kind of bespoke activity to actually locate the relevant descriptive information and absorb it. This could be as simple as picking the top choice out of a Google search, but it might entail trolling through Stack Overflow archives or Reddit threads or other social media, or even involve more substantial kinds of digging to puzzle over details or edge cases. LLMs have proven unreasonably proficient at this as well, but even so it tends to be very hit-or-miss (though to be fair to the LLMs, this is also true for human developers confronted with the same information sources).

Outside the context of an established service product (using the word "product" very loosely), if you want to simply expose an ad hoc service endpoint as, e.g., a capability that can be invoked like E(service).method(args...) AND you want external entities to be able to understand what it does, you're pretty much out of luck if you're developing with status quo tooling. (In practice, people often do a lot of hacking around from guesses based on method names and parameter types, but this is approximate and error prone.) On the other hand, defining a simple "tell me about this object's interface" protocol that presents free form textual descriptions in addition to formal API interface type information seems straightforward -- and it doesn't just seem that way: we've actually done it, with two slightly different variants no less, in our current ocap kernel codebase in support of our LLM integration efforts.
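For illustration, here is a minimal TypeScript sketch of what such a protocol might look like. The names and shapes below are hypothetical placeholders, not the actual variants in our kernel codebase:

```ts
/** Formal type information for one method on an interface. */
type MethodSpec = {
  name: string;
  params: { name: string; type: string }[]; // parameter types as strings, e.g. "string", "bigint"
  returns: string; // return type as a string; "void" if none
};

/** What a describable object can say about itself on request. */
type InterfaceDescription = {
  /** Formal API type information. */
  methods: MethodSpec[];
  /** Free form natural-language description, readable by a human or an LLM. */
  description: string;
};

/** The protocol itself: any describable object answers this one question. */
interface Describable {
  describeInterface(): Promise<InterfaceDescription>;
}
```

A client, or an LLM acting on its behalf, would invoke something like E(service).describeInterface() and read the description field alongside the formal method list.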

However, there is a broader "service discovery" story. In this story, the aforementioned "tell me about this object" mechanism, or something like it, will very probably play a role, but that mechanism is not itself the complete solution. I think I can best explain this by going more deeply into one of the key challenges we were striving to puzzle out at the late lamented Electric Communities. (Just to whet your appetite for reading further, I should mention that at EC we referred to this challenge as the "whacking people's heads off problem".)

To set the stage: the goal was to create a decentralized, permissionless graphical virtual world platform. The idea was to have a single, seemingly seamless world in which different parties could host different parts of the world that would each operate according to rules determined by its host. Users could move around in this world via their avatars, including moving between parts of the world with different hosts. The users would be able to carry the various objects they possessed with them as they did this, potentially picking items up from one place and putting them down or using them in another. The big technical challenge was to arrange things so that, whatever these objects actually did, they would continue to do that wherever they happened to be, so long as what they did was consistent with the local rules of the place they were in. The world was intended to be user extensible, in the sense that not only could new objects be created but new kinds of objects could be introduced that might do new kinds of things, and these would also continue to work properly wherever they went, though subject, once again, to being constrained by local rules.

"Constrained by local rules" is the key problem. The motivating scenario was: we consider two parts of the virtual world. One part is a fantasy RPG like Diablo or World of Warcraft. The other part is a virtual stock exchange where traders buy and sell securities online. We don't want a barbarian from the FRP area to be able to go into the stock exchange and start whacking peoples' heads off with her axe, and we don't want the stock broker who has wandered over to the RPG world for some recreation during his lunch hour to have his stock portfolio made off with when he is set upon by brigands.

When we try to introduce an object into a new environment, it needs to get wired up to the capabilities that will enable it to work there. This requires doing two things: (1) matching the capabilities it thinks it wants with the capabilities that might theoretically be available, and (2) determining whether it is appropriate to grant it those capabilities.

The first of these entails some determination of relevance: if the capability the object wants doesn't exist in this particular environment, then the attempt to match wants with availability will fail. Another way to frame this is that the object and the environment must have commensurate world models. Obviously you won't be whacking people's heads off if that's just not a thing here.

Notably, this capability matching operation is essentially just another framing of the service discovery challenge. However, thinking about it in this problem context helps highlight different ways it might be approached:

  • Offering first: a service publishes a description of what it has to offer, then potential clients match their needs against this.

  • Requesting first: a potential client presents its wants to a service and the service proposes a subset of what it has available that is responsive to those.

One important difference between these is where each puts the primary burden of matching computation: on the service provider or on the potential client. In particular, in the offering first approach, the service descriptions could be posted to a registration service or harvested by a search engine, putting them into an aggregated catalog that could be searched by prospective clients before actually contacting the service itself for more detailed negotiation. In this case, the work of determining relevance could be the joint effort of the potential client and the search engine, where the client presents its requirements to the search engine, which then proposes possible services that might satisfy them.

Whatever the path by which a potential client converges on a potential service, I think the rough form of the matching itself is mostly the same. Since we are talking about comparing descriptions which are presented partially in natural language, it follows that, without a human in the loop, an LLM (or whatever succeeds LLMs in the AI space) will be necessary.
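To make the two approaches concrete, here is a hypothetical matcher interface in TypeScript that covers both. Every name in it is an assumption for illustration; as discussed later, the actual matching mechanism and protocols are deliberately left open:

```ts
/** A minimal stand-in for the service description defined later in this
 * document; just enough information for matching to operate on. */
type ServiceSummary = {
  contactUrl: string; // ocap URL of the service's contact endpoint
  description: string; // natural-language account of what the service does
};

interface ServiceMatcher {
  /** Offering first: a provider publishes what it has to offer. */
  register(summary: ServiceSummary): Promise<void>;

  /**
   * Requesting first: a client states its wants in free form text and the
   * matcher (presumably LLM-assisted) proposes candidates, best match first.
   */
  query(need: string, maxResults?: number): Promise<ServiceSummary[]>;
}
```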

Assuming that we have managed to successfully determine that a potential client wants to be granted a particular capability from a service, we come to the second part of the object introduction handshake: determining whether it will be allowed to have it. This is a decision altogether unlike the matching phase, since it now brings in issues of trust and trustworthiness. I think it's plausible that some kind of AI machinery will figure into this part of the story as well, but it's not obvious to me how, so I'm going to breeze past that for the time being.

There are several imaginable flavors of permission mechanism that one might reasonably want here, depending on your use case. I'll start with the most basic and work up from there in rough order of complexity. I don't think any of the more esoteric or difficult possibilities are likely to be needed in the short term, so for the time being I think we can get away with relegating them to future research topics.

Public API

The service is available to all comers. The only concern is API matching, to ensure that the service offered is one that the client is seeking. Any particular operation that is available on the interface can be used by anyone.

Permissioned API

The client must authenticate using some kind of credential that indicates they have been authorized to use the API. Exactly how this authorization was obtained or why is none of our business.

Pay For Service

This is just a permissioned API in which the permission is obtained by paying money to use the service. It is worth calling out as its own case because (a) the fee collection machinery could be part of the authentication handshake instead of being handled out of band, and (b) fee collection could be on a pay-as-you-go basis where payment for specific actions is part of the API itself. Note that particular arrangements are still points in a very large design space: there might be a one-time payment for access, a one-time payment for a specific amount of time or a specific quantity of use, a billing credential (such as a credit card) that gets automatically debited as needed according to any of many possible patterns, etc.
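A few of those points in the design space, sketched as a TypeScript discriminated union (all of the variant and field names here are illustrative assumptions, not part of any defined protocol):

```ts
// A few illustrative points in the payment design space; not exhaustive.
type PaymentArrangement =
  | { kind: 'oneTimeAccess'; price: bigint; currency: string }
  | { kind: 'timeLimited'; price: bigint; currency: string; durationMs: number }
  | { kind: 'quotaLimited'; price: bigint; currency: string; maxUses: number }
  | { kind: 'perCall'; pricePerCall: bigint; currency: string }
  | { kind: 'billingCredential'; debitSchedule: string }; // e.g. "monthly", "as-needed"
```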

Validated Client

In the ocap world we don't generally make access control decisions based on user identity. This is because of confused deputy problems and the related issue that there can be a potentially unbounded number of intermediary interests interposed between the initiation of an action and the resulting request for service. Instead, we are more concerned that the wielder of a capability be an entity that we are willing to entrust with the authority the capability represents. One way to do this is by validating the code that will be doing the wielding. In general, when you are communicating over the network to some other computer, you have no way to verify that the software it's running is actually the software it claims to be. However, it is possible to run foreign software in an execution environment that you either control yourself or trust to behave in a manner compatible with your interests.

In such an arrangement, you have the option of wiring an object to its desired capabilities by having its actual code delivered into the trusted execution environment to run. The actual wielding of the capability takes place there, with more remote parties only having to be trusted to honestly express their intentions, which can be defined tautologically as whatever intentions they actually express.

This opens up the possibility that you can answer the question "do I trust this object with this authority?" via direct analysis of its code or by having it be accompanied by some kind of certification that it has been vetted by some body that you trust to correctly verify that it has the qualities you care about. The latter certification can be the product of arbitrarily sophisticated formal analysis or it could simply have been reviewed by someone you trust to look at things competently and honestly. The decision of how rigorous or paranoid you want to be is up to you, and this may vary depending on what the stakes at hand actually are. In addition, some properties you care about might be innately based on subjective or aesthetic criteria rather than some kind of Turing-challenging computational analysis. For example, you might want to have somebody verify that the submitted entity does not contain any NSFW imagery that might upset the family friendly game world you're running.
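One way such certifications might be represented and checked, as a TypeScript sketch. The types and the acceptance policy here are assumptions for illustration, and actual signature verification is elided:

```ts
type Certification = {
  issuer: string; // identity of the vetting body
  claims: string[]; // e.g. ["terminates", "no NSFW imagery"]
  signature: string; // over the bundle hash and claims (verification elided)
};

type CodeSubmission = {
  bundleHash: string; // content hash of the submitted vat code bundle
  certifications: Certification[];
};

/**
 * Accept a submission only if every property we require is claimed by at
 * least one issuer we trust. How long requiredClaims is, and who counts as
 * trusted, is the local rigor-vs-paranoia policy choice described above.
 */
function acceptable(
  submission: CodeSubmission,
  trustedIssuers: Set<string>,
  requiredClaims: string[],
): boolean {
  return requiredClaims.every((claim) =>
    submission.certifications.some(
      (cert) => trustedIssuers.has(cert.issuer) && cert.claims.includes(claim),
    ),
  );
}
```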

One common objection I heard to this approach when we went down this path at EC was to observe that what we want is essentially a solution to the halting problem -- in the general case, the combination of open extensibility with strongly enforced local controls almost certainly produces an intractable analysis challenge.

However, the halting problem actually has a practical "solution": though you can't prove the haltingness of every possible piece of code, you can partition the code into three buckets: "provably will halt", "provably won't halt", and "undeterminable within some bounded amount of effort". Then, as a policy choice, treat the "undeterminable" category the same as "won't halt", i.e., you exclude it. While this excludes a large (actually, infinite) set of code that could be proven acceptable with additional analysis work, in practice we find it mostly doesn't matter. The easily analyzable set is large enough to encompass code that can do everything we actually care about. In particular, it is often straightforward to transform code from "undeterminable" into similar code that does the same job but can be shown to terminate, since things that are useful don't usually require pathological coding patterns to realize them.
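In code, the policy amounts to something like this sketch, where the analyzer itself is assumed rather than specified:

```ts
type Verdict = 'provablyHalts' | 'provablyDiverges' | 'undeterminable';

// The analyzer is an assumption here; bounding its effort is what keeps
// the analysis tractable.
declare function analyze(code: string, effortBudget: number): Verdict;

/** Policy choice: treat "undeterminable" the same as "provably won't halt". */
function admit(code: string, effortBudget: number): boolean {
  return analyze(code, effortBudget) === 'provablyHalts';
}
```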

Realization

Now let's place these ideas into the context of the ocap kernel. For the sake of simplicity, what follows will refer to singular things -- a service, an API, an ocap, etc. I believe the model here readily generalizes to cover multiplicity of any of these, but for now that's unnecessary complication we can gloss over.

Unfortunately, what follows has some unavoidable circularity, common when trying to describe a system with a bunch of interacting parts. You may have to go through this more than once to put together the model in your head. Sorry.

First, some definitions and terminological conventions:

  • "service": Something that somebody does that somebody else might want done. In the model being presented here, a service is realized by code running in a vat.

  • "service provider": The somebody who does the thing. The service provider hosts the service vat in an ocap kernel that it trusts, and determines the code that runs in it.

  • "service vat": A vat running code determined by the service provider, which actually does the work of providing the service.

  • "contact vat": A vat running code determined by the service provider, which contains the contact endpoint for the service. The service vat and the contact vat may be the same.

  • "service consumer": The somebody who wants the thing done. The service consumer is presumed to be non-local, that is, it is not presumed to be code hosted in a vat within the kernel that the service provider is using, nor that it is code hosted in a vat at all, nor indeed that it is even code.

  • "service API": A specific method interface on for the service, via which the service consumer makes requests to the service provider to perform the service.

  • "service description": A JSON-serializable data object that describes a service, consisting at the top level of three parts:

    • A formal specification of the service API: names of methods, the types and order of the arguments to those methods, the types of the return values (if any) from those methods, plus the same kinds of specifications for any objects or ocaps that are passed in as arguments or returned from methods.

    • An informal description of the service itself (i.e., what it does and what you'd use it for) and of the various elements in the API specification, all presented in natural language that can be understood by a human or LLM.

    • An ocap URL for the service's contact endpoint. Note that this is a URL rather than a direct ocap reference so that the service description can be published outside the ocap ecosystem as a pure data object.

  • "service endpoint": a service provider ocap implementing the service API.

  • "contact endpoint": a service provider ocap implementing the contact protocol, allowing a possible service consumer to learn about the service and initiate contact with it.

  • "contact protocol": the method interface of contact endpoints. It is the same for all contact endpoints and is specified by this design. It provides methods to:

    • Obtain the service description.

    • Initiate (or attempt to initiate) contact with the service itself.

    • Learn about affiliated, related, and alternative services (optional).

  • "service matcher": something that collects service descriptions and enables potential service consumers to match their needs and desires against what's in the collection. The address of the service matcher and the protocols for interacting with it are expected to be well known to the specific population of service providers and service consumers that it is intended to serve.

    Note that this definition is deliberately vague with respect to the scope of services collected, the scope of availability of the service matcher itself, and in particular the mechanism by which this matching operation is performed. While I expect the latter to be some kind of LLM-like or LLM-entangled process finding associations between service descriptions and comparably loose descriptions of requirements or desiderata, the details are intentionally left open to experimentation. The details of the protocols for interacting with the service matcher are also left open. I do expect this document to eventually include a draft design for some concrete realization of this in order to actually implement a first cut of the service discovery model I'm advocating here. I simply want to emphasize that the fundamental model itself is generally agnostic as to what that design is.
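To make the preceding definitions concrete, here is one possible TypeScript rendering of the service description and the contact protocol. The shapes follow the prose above, but the exact names and field layouts are assumptions, not a settled design:

```ts
/** Formal specification of a service API: methods, argument and return types. */
type ApiSpec = {
  methods: {
    name: string;
    params: { name: string; type: string }[];
    returns: string;
  }[];
};

/** "service description": a JSON-serializable data object in three parts. */
type ServiceDescription = {
  api: ApiSpec; // formal specification of the service API
  description: string; // informal natural-language account of the service
  contactUrl: string; // ocap URL for the service's contact endpoint
};

/** "contact protocol": the method interface common to all contact endpoints. */
interface ContactEndpoint {
  /** Obtain the service description. */
  getServiceDescription(): Promise<ServiceDescription>;
  /** Initiate (or attempt to initiate) contact with the service itself. */
  initiateContact(): Promise<unknown>; // result shape is access-model specific
  /** Learn about affiliated, related, and alternative services (optional). */
  getRelatedServices?(): Promise<ServiceDescription[]>;
}
```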

How this might work

In describing specific scenarios, the labels Provider, Consumer, and Matcher will be used for the service provider, the (potential) service consumer, and the service matcher respectively.

Provider makes the service known to Matcher. While there are many ways this could work, they all come down to communicating the service description, the contact endpoint URL, or a direct ocap reference to the contact endpoint. In the latter two cases, Matcher interacts with the contact endpoint to obtain the service description.

Consumer interacts with Matcher, providing a description of what it is looking for or otherwise somehow querying for possible services of interest, possibly engaging in an extended dialog to narrow down to a particular service or set of services that satisfy the interest (the particulars of this dialog and how it works are very interesting and very important, but out of scope here). Ultimately, assuming success in this, the Consumer ends up with a reference to a service contact endpoint.
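As a consumer-side sketch of this step: the flow below reuses the hypothetical ServiceMatcher shape from earlier, with E standing in for the ocap ecosystem's eventual-send helper (as in E(service).method(args...)); everything else is an illustrative assumption:

```ts
// E is assumed to be the eventual-send helper from the surrounding ocap
// ecosystem; it is declared here rather than imported from any specific package.
declare function E(target: unknown): any;

type ServiceSummary = { contactUrl: string; description: string };

async function findService(
  matcher: unknown,
  need: string,
): Promise<ServiceSummary | undefined> {
  // State the need in free form text and let the matcher propose candidates.
  const candidates: ServiceSummary[] = await E(matcher).query(need, 5);
  // A real client might iterate over these, or bring an LLM into the choice.
  return candidates[0];
}
```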

Consumer sends a message to the contact endpoint initiating contact with the service, beginning a process that should end up with the consumer holding a capability to the service itself. How this handshake goes will vary depending on the service's specific access model. (It's unclear to me whether we want to include some representation of the access model in the service description, or whether this should be part of the service initiation protocol. At the moment I'm leaning towards the latter, because my intuition is that it provides a cleaner extensibility hook given the dynamism that some access models may require. That is what this example will describe, but we should consider that this remains a design choice that probably warrants further attention.) The handshake plays out differently under each access model (a rough code sketch follows the list):

  • Public API: Provider returns a service endpoint.

  • Permissioned API: Provider returns a descriptor that indicates the flavor of permission credentials that will be required, plus an ocap to submit these to. The result of that submission will in turn, assuming success, be a service endpoint. In principle this could be extended to encompass multiple rounds of message exchange. For example, if access requires payment, the Provider might give a price quote before the Consumer submits payment authorization.

  • Validated client: Provider returns an ocap to which Consumer submits a vat code bundle, possibly accompanied by various validation certificates. Provider verifies these certificates, if any, and performs whatever additional validation checks of its own that it wants. Assuming these all pass muster, Provider launches a vat containing the submitted code bundle, passing it the service endpoint as one of its endowments. It then returns a reference to the vat root (or possibly another ocap returned by some API call specified by whatever standard we have for how a validated client vat is configured) to the Consumer.
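A rough TypeScript sketch of what the result of initiating contact might look like under each of these models, as a discriminated union (all of these shapes are illustrative assumptions; pay-for-service is folded into the permissioned case):

```ts
type ServiceEndpoint = unknown; // an ocap implementing the service API
type Ocap = unknown; // some other capability reference

type ContactResult =
  | { kind: 'public'; service: ServiceEndpoint }
  | {
      kind: 'permissioned';
      credentialKinds: string[]; // flavors of credential that will be required
      submitCredentials: Ocap; // yields a ServiceEndpoint on success
    }
  | {
      kind: 'validatedClient';
      submitBundle: Ocap; // takes a code bundle plus any validation certificates;
                          // on success, yields the launched vat's root ocap
    };
```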

All of the above is speculative and conditional, and makes use of a lot of abstractions that are only vaguely or partially defined. The next step is to translate this into a set of types and interface definitions that render something like what is described into concrete form.
