[Feature] Add generate_until() support for lm-eval by yannicks1 · Pull Request #514 · foundation-model-stack/foundation-model-stack

yannicks1 · 2026-03-09T15:38:44Z

This PR implements the generate_until() method for FMS models in lm-eval, enabling generation-based evaluation tasks.

Features:

Greedy decoding with configurable max_gen_toks
Stop string detection (earliest match)
EOS token handling
KV cache support
Robust logits shape handling (2D/3D)
Proper logging throughout using tqdm
Passing the lm-eval argument num_samples to run a subset (n samples) of a task
Passing the lm-eval argument confirm_run_unsafe_code, which some tasks require
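
As an illustration of the "stop string detection (earliest match)" item, here is a minimal sketch of how generated text might be truncated at whichever stop string occurs first. The function name `truncate_at_earliest_stop` is hypothetical and is not taken from the PR; the actual implementation may differ.

```python
from typing import List, Optional


def truncate_at_earliest_stop(text: str, stop_strings: List[str]) -> str:
    """Cut `text` just before the earliest occurrence of any stop string.

    Hypothetical helper for illustration; not the PR's code.
    """
    earliest: Optional[int] = None
    for stop in stop_strings:
        if not stop:
            # An empty stop string would match at index 0 and erase everything.
            continue
        idx = text.find(stop)
        if idx != -1 and (earliest is None or idx < earliest):
            earliest = idx
    # No stop string found: return the text unchanged.
    return text if earliest is None else text[:earliest]
```

Choosing the earliest match rather than the first stop string in the list matters when several stop sequences appear in the same output, because the order of the list should not affect where the text is cut.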

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

maxdebayser

There are a lot of similarities between generate_until and utils.generation.generate. The main differences seem to be that the list of requests is processed one by one instead of batched, that tqdm is added in the generation loop, and that stop sequences are supported. It seems to me that generate_until could be a wrapper around generate and implement tqdm logging and stop sequences with the post_iteration_hook.
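
The wrapper idea can be sketched with a toy greedy loop that accepts a per-iteration hook. Everything here is a stand-in for illustration: `greedy_generate`, `make_stop_hook`, and the simplified hook signature are assumptions and do not reproduce the signature of generate or its post_iteration_hook in FMS.

```python
from typing import Callable, List, Optional

# Simplified, hypothetical hook type: receives the step index and the ids
# generated so far, and returns True when generation should stop.
Hook = Callable[[int, List[int]], bool]


def greedy_generate(
    next_token: Callable[[List[int]], int],  # stand-in for forward pass + argmax
    prompt_ids: List[int],
    max_new_tokens: int,
    eos_token_id: Optional[int] = None,
    post_iteration_hook: Optional[Hook] = None,
) -> List[int]:
    """Toy greedy loop; returns only the newly generated ids."""
    ids = list(prompt_ids)
    generated: List[int] = []
    for step in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        generated.append(token)
        if eos_token_id is not None and token == eos_token_id:
            break
        if post_iteration_hook is not None and post_iteration_hook(step, generated):
            break
    return generated


def make_stop_hook(decode: Callable[[List[int]], str], stop_strings: List[str]) -> Hook:
    """Build a hook that stops once any stop string appears in the decoded text."""

    def hook(step: int, generated: List[int]) -> bool:
        text = decode(generated)
        return any(stop in text for stop in stop_strings)

    return hook
```

With this shape, the generation loop stays generic and the lm-eval specifics (stop sequences, progress reporting) live in the hook, which is the separation the comment above suggests.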

maxdebayser · 2026-03-25T18:18:32Z

+            # Tokenize the prompt
+            context_tokens = self.tokenizer.tokenize(prompt)
+            context_ids = self.tokenizer.convert_tokens_to_ids(context_tokens)
+            if not len(context_ids):


Can an error be raised here instead, or would that break a test run? Not all tokenizers have a bos_token_id; for example, Qwen/Qwen3-0.6B doesn't.
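
One way to address this, sketched under the assumption that the empty-prompt branch currently falls back to a BOS token: use the BOS id when the tokenizer defines one, and raise a clear error otherwise. The helper name `prepare_context_ids` is hypothetical and not part of the PR.

```python
from typing import List, Optional


def prepare_context_ids(context_ids: List[int], bos_token_id: Optional[int]) -> List[int]:
    """Return non-empty context ids or fail loudly.

    Hypothetical helper: falls back to BOS for an empty prompt when the
    tokenizer defines one, and raises otherwise instead of passing None on.
    """
    if context_ids:
        return context_ids
    if bos_token_id is None:
        raise ValueError(
            "Empty prompt and the tokenizer defines no bos_token_id; "
            "cannot build a non-empty context."
        )
    return [bos_token_id]
```

Whether raising is acceptable depends on whether any lm-eval task actually sends empty prompts, which is the open question in the comment above.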

maxdebayser · 2026-03-25T18:19:22Z

+            input_ids = torch.tensor(context_ids, dtype=torch.long, device=self.device)
+
+            # model requires batch dimension
+            if not len(input_ids.shape) > 1:


This is always True

maxdebayser · 2026-03-25T18:21:04Z

+        result: List[str] = []
+        kwargs: MutableMapping[str, Any] = dict()
+
+        # KV caching settings


If this method is expanded for batch inference in the future, use_cache will be required, because without it the attention mask is not generated correctly for batch size > 1.

maxdebayser · 2026-03-25T18:22:11Z

+                    logits = out
+
+                # Handle both 3D and 2D logits
+                if logits.dim() == 3:


Since the unsqueeze above always adds a batch dimension, it should always enter here.

yannicks1 · 2026-03-30T07:48:21Z

Thanks for the great feedback, @maxdebayser. It makes a lot of sense to reuse these functions where possible. I will have a look at this and rework the PR!

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

yannicks1 added 7 commits March 9, 2026 15:41

implementing generate_until() for lm-eval

23d8260

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

better handling of max_gen_toks

e5a7641

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

passing num_samples to be run in benchmark

0090d4c

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

proper handling of kv caching

167c04c

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

some fixes, cleanup

f185a5b

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

Apply ruff formatting

735557e

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

Remove debug logging

031977f

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

yannicks1 force-pushed the generate-until-feature branch from ce86e03 to 031977f Compare March 9, 2026 15:41

Fix MyPy errors: add tqdm and fix tensor type issues

12fd7bb

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

yannicks1 force-pushed the generate-until-feature branch from 2bddeb5 to 12fd7bb Compare March 9, 2026 16:18

yannicks1 and others added 2 commits March 9, 2026 16:25

add types-tqdm to finally silence the linter

cb56006

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

Merge branch 'main' into generate-until-feature

f61676d

maxdebayser reviewed Mar 25, 2026


Merge branch 'main' into generate-until-feature

5424472

yannicks1 marked this pull request as draft March 30, 2026 07:48

yannicks1 added 4 commits April 19, 2026 15:39

merge main

90487a6

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

rmv redundant import

4ff42b4

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

fix ruff

468287a

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

remove function redefinition

c266b22

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

yannicks1 wants to merge 15 commits into foundation-model-stack:main from yannicks1:generate-until-feature


2 participants
