[Feature] Add generate_until() support for lm-eval #514

Draft
yannicks1 wants to merge 15 commits into foundation-model-stack:main from yannicks1:generate-until-feature

Conversation

@yannicks1
Contributor

This PR implements the generate_until() method for FMS models in lm-eval, enabling generation-based evaluation tasks.

Features:

  • Greedy decoding with configurable max_gen_toks
  • Stop string detection (earliest match)
  • EOS token handling
  • KV cache support
  • Robust logits shape handling (2D/3D)
  • Proper logging throughout using tqdm
  • Passing the lm-eval argument num_samples to run a subset (n samples) of the task
  • Passing the lm-eval argument confirm_run_unsafe_code, which is needed for some tasks
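
As an illustration of the "earliest match" stop string behaviour listed above, here is a minimal sketch. The helper name `truncate_at_earliest_stop` is hypothetical and is not taken from the PR; it only shows one way to cut generated text at whichever stop string appears first.

```python
from typing import Optional, Sequence


def truncate_at_earliest_stop(text: str, stop_strings: Sequence[str]) -> str:
    """Cut `text` just before whichever stop string occurs first.

    Hypothetical helper for illustration; not the code from this PR.
    """
    earliest: Optional[int] = None
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would trivially match at index 0
        idx = text.find(stop)
        if idx != -1 and (earliest is None or idx < earliest):
            earliest = idx
    # No stop string found: return the text unchanged
    return text if earliest is None else text[:earliest]
```

The order of `stop_strings` does not matter here: the cut is decided by position in the text, so a stop string listed later still wins if it occurs earlier.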

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
@yannicks1 yannicks1 force-pushed the generate-until-feature branch from ce86e03 to 031977f Compare March 9, 2026 15:41
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
@yannicks1 yannicks1 force-pushed the generate-until-feature branch from 2bddeb5 to 12fd7bb Compare March 9, 2026 16:18
yannicks1 and others added 2 commits March 9, 2026 16:25
Contributor

@maxdebayser maxdebayser left a comment

There are a lot of similarities between generate_until and utils.generation.generate. The main differences seem to be that the list of requests is processed one by one instead of batched, that tqdm is added in the generation loop, and that stop sequences are supported. It seems to me that generate_until could be a wrapper around generate, implementing tqdm logging and stop sequences with the post_iteration_hook.
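
To make the wrapper idea concrete, here is a rough, self-contained sketch. It does not call FMS: `toy_generate` is a stand-in for a generic greedy loop, and its hook signature `(step, new_ids) -> bool` is invented for this sketch, not the signature of the hook in utils.generation.generate. `generate_until_sketch`, `next_token_fn`, and `decode` are likewise hypothetical names.

```python
from typing import Callable, List, Optional, Sequence

# Assumed hook shape for this sketch only: return True to stop generating.
StopHook = Callable[[int, List[int]], bool]


def toy_generate(
    next_token_fn: Callable[[List[int]], int],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    post_iteration_hook: Optional[StopHook] = None,
) -> List[int]:
    """Stand-in greedy loop that calls a hook after every new token."""
    ids = list(prompt_ids)
    n_prompt = len(ids)
    for step in range(max_new_tokens):
        ids.append(next_token_fn(ids))
        if post_iteration_hook is not None and post_iteration_hook(step, ids[n_prompt:]):
            break
    return ids[n_prompt:]


def generate_until_sketch(
    next_token_fn: Callable[[List[int]], int],
    decode: Callable[[List[int]], str],
    prompt_ids: Sequence[int],
    stop_strings: Sequence[str],
    max_gen_toks: int,
) -> str:
    """Wrap the generic loop; stop-sequence logic lives entirely in the hook."""

    def stop_hook(step: int, new_ids: List[int]) -> bool:
        text = decode(new_ids)
        return any(stop in text for stop in stop_strings)

    new_ids = toy_generate(next_token_fn, prompt_ids, max_gen_toks, stop_hook)
    text = decode(new_ids)
    # Trim at the earliest stop string, if any was produced
    hits = [i for i in (text.find(stop) for stop in stop_strings) if i != -1]
    return text[: min(hits, default=len(text))]
```

A progress bar update could be placed in the same hook, which would keep the generation loop itself free of evaluation-specific code.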

Comment thread fms/utils/evaluation.py
# Tokenize the prompt
context_tokens = self.tokenizer.tokenize(prompt)
context_ids = self.tokenizer.convert_tokens_to_ids(context_tokens)
if not len(context_ids):

Can an error be raised here instead, or would that break a test run? Not all tokenizers have a bos_token_id; for example, Qwen/Qwen3-0.6B doesn't.
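
One possible shape for this, sketched under assumptions: the empty-prompt branch presumably falls back to a BOS token, so the sketch keeps that fallback when a BOS id exists and raises otherwise. The function name `initial_context_ids` and passing `bos_token_id` as an optional integer are hypothetical choices, not code from the PR.

```python
from typing import List, Optional


def initial_context_ids(context_ids: List[int], bos_token_id: Optional[int]) -> List[int]:
    """Return the ids to start generation from, or fail loudly.

    Hypothetical helper: falls back to BOS for an empty prompt and raises
    when the tokenizer does not define a BOS token at all.
    """
    if context_ids:
        return list(context_ids)
    if bos_token_id is None:
        raise ValueError(
            "Prompt tokenized to an empty sequence and the tokenizer "
            "defines no bos_token_id to start generation from."
        )
    return [bos_token_id]
```

Whether raising is acceptable depends on whether any lm-eval task actually sends an empty context, which is the open question in the comment above.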

Comment thread fms/utils/evaluation.py
input_ids = torch.tensor(context_ids, dtype=torch.long, device=self.device)

# model requires batch dimension
if not len(input_ids.shape) > 1:

This is always True

Comment thread fms/utils/evaluation.py
result: List[str] = []
kwargs: MutableMapping[str, Any] = dict()

# KV caching settings

If this method is expanded for batch inference in the future, use_cache will be required, because without it the attention mask is not generated correctly for batch size > 1.

Comment thread fms/utils/evaluation.py
logits = out

# Handle both 3D and 2D logits
if logits.dim() == 3:

Since the unsqueeze above always adds a batch dimension, it should always enter here.

@yannicks1
Contributor Author

Thanks for the great feedback, @maxdebayser. It makes a lot of sense to reuse these functions if possible. I will have a look at this and rework the PR!

@yannicks1 yannicks1 marked this pull request as draft March 30, 2026 07:48
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>