Fix: Limit the number of generated questions #13596
Conversation
nerdai
left a comment
Thanks @tuomastik! Looks good. Though let's not update legacy llama-index at this point (I've marked in my review where we can revert your changes).
I guess this doesn't cover the case where fewer than the desired number of questions were generated, but that problem existed even before your PR.
That's correct. One approach to solving that issue would be to regenerate questions until the desired number of questions has been generated.
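The regeneration idea could be sketched roughly as follows (a hypothetical helper, not code from this PR; `generate_fn` stands in for whatever LLM call produces a batch of questions):

```python
from typing import Callable


def generate_with_retries(
    generate_fn: Callable[[], list[str]],
    target: int,
    max_attempts: int = 3,
) -> list[str]:
    """Call the generator repeatedly until `target` unique questions exist.

    Sketch of the regeneration approach discussed above; not actual PR code.
    """
    questions: list[str] = []
    for _ in range(max_attempts):
        # Collect new, non-duplicate questions from each attempt.
        for q in generate_fn():
            if q not in questions:
                questions.append(q)
        if len(questions) >= target:
            break
    # Cap the result at the requested count even if the last batch overshot.
    return questions[:target]
```

A `max_attempts` cap would be needed in practice so a stubborn LLM can't send this into an unbounded loop.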
@tuomastik that makes sense. Maybe for now, we should at least raise a warning to the user that the LLM call resulted in fewer than the desired number of questions per chunk. Similar to here:
✅ Done
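The agreed-on warning behavior could look roughly like this (a minimal sketch, not the actual PR code; the function name is hypothetical, `num_questions_per_chunk` is the parameter from the PR):

```python
import warnings


def limit_questions(questions: list[str], num_questions_per_chunk: int) -> list[str]:
    """Truncate to the requested count, warning when the LLM under-delivered.

    Sketch of the behavior discussed above; not the actual implementation.
    """
    if len(questions) < num_questions_per_chunk:
        warnings.warn(
            f"{len(questions)} questions generated, but "
            f"{num_questions_per_chunk} were requested.",
            stacklevel=2,
        )
    return questions[:num_questions_per_chunk]
```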
Description
Limit the number of generated questions to avoid cases where more questions are generated than requested by the parameter `num_questions_per_chunk`. Fixes #10694 and #10089.
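The core of the fix can be illustrated like this (a minimal sketch assuming the questions arrive as lines of LLM output; the function name is hypothetical):

```python
def parse_generated_questions(raw_output: str, num_questions_per_chunk: int) -> list[str]:
    # Split the LLM output into non-empty lines, then keep at most the
    # requested number of questions (the limit this PR introduces).
    questions = [line.strip() for line in raw_output.splitlines() if line.strip()]
    return questions[:num_questions_per_chunk]
```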
Added the fix to all the modules mentioning `num_questions_per_chunk`, except the following modules that contain deprecated classes that possibly do not have this issue due to their `num` parameter in the `_agenerate_dataset` method:
- `llama-index-core/llama_index/core/evaluation/dataset_generation.py`: `QueryResponseDataset`, which is deprecated in favor of `LabelledRagDataset`
- `llama-index-legacy/llama_index/legacy/evaluation/dataset_generation.py`: `DatasetGenerator`, which is deprecated in favor of `RagDatasetGenerator`

New Package?
Version Bump?
Type of Change
How Has This Been Tested?
Suggested Checklist:
`make format; make lint` to appease the lint gods