refactor(web): move and rename SearchResult as TokenResultMapping 🚂 #15814
jahorton wants to merge 3 commits into epic/autocorrect
Conversation
User Test Results: User tests are not required.

Build-bot: skip build:web
Test-bot: skip
Force-pushed from c92585c to 77b8af2.
ermshiperete left a comment:
LGTM. One comment about a potential bug that devin.ai identified — the current code is probably correct with the upcoming changes, but I thought I'd point it out.
```diff
  currentReturns[node.resultKey] = node;
  // Do not track yielded time.
- return new SearchResult(node, newResult.spaceId);
+ return new TokenResultMapping(newResult.finalNode);
```
Devin.ai marked this as a potential bug. I can see that this might become a problem if newResult.finalNode.spaceId != newResult.spaceId can ever happen.
Dropped spaceId argument in getBestMatches during rename refactoring
The refactoring from SearchResult to TokenResultMapping in getBestMatches dropped the explicit newResult.spaceId argument. The old code was new SearchResult(node, newResult.spaceId), but the new code is new TokenResultMapping(newResult.finalNode), which falls back to node.spaceId via the constructor default.

When a SearchQuotientCluster is the search module, SearchQuotientCluster.handleNextNode() overrides the result's spaceId to this.spaceId (the cluster's own ID) at search-quotient-cluster.ts:148, which differs from finalNode.spaceId (the child spur's ID). The downstream consumer at predict-helpers.ts:539 uses match.spaceId to look up the correct tokenization via tokenizations.find(t => t.spaceId == match.spaceId) — a wrong spaceId would cause tokenization to be undefined, leading to errors in buildAndMapPredictions.

In the current production flow, only LegacyQuotientSpur/LegacyQuotientRoot are passed directly (where node.spaceId == result.spaceId), so the bug is latent, but the semantic contract is broken and would fail if SearchQuotientCluster is ever used as a direct search module.
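A minimal sketch of the hazard, using hypothetical stand-in classes (not the actual Keyman implementation): when a constructor defaults its spaceId to the node's own value, an upstream override of the result's spaceId is silently lost unless the caller passes it explicitly.

```typescript
// Stand-in types, assumed for illustration only.
interface SearchNode {
  spaceId: number;
}

// Mimics TokenResultMapping's assumed constructor default:
// falls back to the node's own spaceId when none is given.
class TokenResultMapping {
  readonly finalNode: SearchNode;
  readonly spaceId: number;

  constructor(finalNode: SearchNode, spaceId?: number) {
    this.finalNode = finalNode;
    this.spaceId = spaceId ?? finalNode.spaceId;
  }
}

// Simulates a cluster (like SearchQuotientCluster) overriding the
// result's spaceId to its own ID, distinct from the child node's ID.
const childNode: SearchNode = { spaceId: 7 }; // child spur's ID
const clusterSpaceId = 42;                    // cluster's own ID

// Old call style: the overridden ID is preserved.
const explicit = new TokenResultMapping(childNode, clusterSpaceId);
// New call style: silently falls back to the child's ID.
const implicit = new TokenResultMapping(childNode);

console.log(explicit.spaceId); // 42
console.log(implicit.spaceId); // 7 — a downstream spaceId lookup would miss
```

With the implicit form, a consumer keying tokenizations by the cluster's spaceId would find no match, which is exactly the latent failure described above.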
Note #15861 - I'm working to resolve that, though it'll become its own PR.
This is the first of four PRs designed to, in sequence, add an abstraction layer to the correction-search algorithm so that we can use the same core code to search for corrections in different ways.
In particular, up until now, we've always searched for corrections for just one token at a time - the current token. To better support whitespace fat-fingering, we'll need the ability to answer the following question: "Which tokenization pattern is the closest to matching the current text after corrections?" The core logic of the search is the same, but the backing data will be different - something that abstraction can handle well.
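One way the described abstraction could look — a hypothetical sketch with illustrative names, not the actual Keyman API: the shared search core is written against an interface, and different backing data (a single token, or a set of tokenization patterns) plugs in behind it.

```typescript
// Hypothetical interface: a source of correction candidates with a cost.
// The shared core logic minimizes cost regardless of the backing data.
interface CorrectionSearchSource<Result> {
  // Yields candidate results from this source's backing data.
  candidates(): Result[];
  // Edit-distance-like cost of a candidate; lower is better.
  cost(result: Result): number;
}

// The abstracted core: picks the cheapest candidate from any source.
function bestMatch<Result>(
  source: CorrectionSearchSource<Result>
): Result | undefined {
  let best: Result | undefined;
  for (const candidate of source.candidates()) {
    if (best === undefined || source.cost(candidate) < source.cost(best)) {
      best = candidate;
    }
  }
  return best;
}

// Usage: a toy single-token source correcting "teh".
const tokenSource: CorrectionSearchSource<string> = {
  candidates: () => ["teh", "the", "then"],
  cost: (word) => (word === "the" ? 0 : 1),
};

console.log(bestMatch(tokenSource)); // "the"
```

A tokenization-pattern source would implement the same interface, letting the identical core answer "which tokenization pattern is closest after corrections?" without duplicating the search logic.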
To accomplish the goal of abstracting the core search mechanisms, work will proceed in the following manner: