[Feature] Add generate_until() support for lm-eval #514
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
maxdebayser left a comment:
There are a lot of similarities between generate_until and utils.generation.generate. The main differences seem to be that the list of requests is processed one by one instead of batched, that tqdm is added in the generation loop, and that stop sequences are supported. It seems to me that generate_until could be a wrapper around generate and implement tqdm logging and stop sequences with the post_iteration_hook.
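To illustrate the wrapper idea, here is a minimal sketch in plain Python. It does not use the real fms `generate`: `toy_generate`, its hook signature (step index and generated ids in, a stop flag out), and the `next_token` / `decode` callables are all hypothetical stand-ins, so the actual `post_iteration_hook` contract may look different.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical hook type: called after each decoding step, returns True to stop.
Hook = Callable[[int, List[int]], bool]


def toy_generate(
    next_token: Callable[[List[int]], int],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    post_iteration_hook: Optional[Hook] = None,
) -> List[int]:
    """Stand-in for a generic decoding loop that exposes a per-step hook."""
    generated: List[int] = []
    for step in range(max_new_tokens):
        generated.append(next_token(list(prompt_ids) + generated))
        if post_iteration_hook is not None and post_iteration_hook(step, generated):
            break
    return generated


def generate_until(
    next_token: Callable[[List[int]], int],
    decode: Callable[[List[int]], str],
    prompt_ids: Sequence[int],
    until: Sequence[str],
    max_gen_toks: int,
) -> str:
    """Wrap the generic loop; stop-sequence handling lives entirely in the hook."""

    def stop_hook(step: int, generated: List[int]) -> bool:
        # A progress bar (e.g. tqdm) could also be updated here.
        text = decode(generated)
        return any(stop in text for stop in until)

    text = decode(toy_generate(next_token, prompt_ids, max_gen_toks, stop_hook))
    # Cut the output at the first occurrence of any stop sequence.
    for stop in until:
        text = text.split(stop)[0]
    return text
```

With this shape, the generation loop itself stays untouched and only the hook knows about stop sequences and logging.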
    # Tokenize the prompt
    context_tokens = self.tokenizer.tokenize(prompt)
    context_ids = self.tokenizer.convert_tokens_to_ids(context_tokens)
    if not len(context_ids):
Can an error be raised here instead, or would that break a test run? Not all tokenizers have a bos_token_id; for example, Qwen/Qwen3-0.6B doesn't.
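One possible shape for this, as a sketch only: it assumes the current code falls back to the BOS token for an empty prompt, and `resolve_context_ids` is a hypothetical helper name, not something in the PR.

```python
from typing import List, Optional


def resolve_context_ids(context_ids: List[int], bos_token_id: Optional[int]) -> List[int]:
    """Return ids the model can consume, failing loudly when nothing is usable."""
    if context_ids:
        return context_ids
    if bos_token_id is None:
        # Assumed behavior: an empty prompt without a BOS token is unrecoverable.
        raise ValueError("Empty prompt and the tokenizer defines no bos_token_id")
    # Assumed fallback: start generation from the BOS token.
    return [bos_token_id]
```

Whether raising is acceptable depends on whether any lm-eval task actually sends empty contexts, which this sketch does not answer.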
    input_ids = torch.tensor(context_ids, dtype=torch.long, device=self.device)
    # model requires batch dimension
    if not len(input_ids.shape) > 1:

    result: List[str] = []
    kwargs: MutableMapping[str, Any] = dict()
    # KV caching settings
If this method is expanded for batch inference in the future, use_cache will be required, because without it the attention mask is not generated correctly for batch size > 1.
    logits = out
    # Handle both 3D and 2D logits
    if logits.dim() == 3:
Since the unsqueeze above always adds a batch dimension, the code should always take this branch.
Thanks for the great feedback, @maxdebayser. It makes a lot of sense to reuse these functions if possible. I will have a look at this and rework the PR!
This PR implements the generate_until() method for FMS models in lm-eval, enabling generation-based evaluation tasks.

Features:
- max_gen_toks
- tqdm
- num_samples to run a subset (n samples) of the task
- confirm_run_unsafe_code, which is needed for some tasks
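For context, lm-eval passes each generation request as a context string plus a dict of generation arguments. A minimal sketch of pulling `until` and `max_gen_toks` out of that dict might look like the following; `parse_gen_kwargs` is a hypothetical helper, and the default of 256 tokens is an assumption rather than something stated in this PR.

```python
import copy
from typing import Any, Dict, List, Tuple


def parse_gen_kwargs(
    gen_kwargs: Dict[str, Any], default_max_gen_toks: int = 256
) -> Tuple[List[str], int, Dict[str, Any]]:
    """Split per-request generation arguments into stop sequences, budget, and the rest."""
    kwargs = copy.deepcopy(gen_kwargs)  # do not mutate the caller's request
    until = kwargs.pop("until", [])
    if isinstance(until, str):
        # A single stop string is normalized to a one-element list.
        until = [until]
    max_gen_toks = int(kwargs.pop("max_gen_toks", default_max_gen_toks))
    return list(until), max_gen_toks, kwargs
```

Anything left in the returned dict (for example sampling settings) can then be forwarded to the generation call or rejected explicitly.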