HumanEval Evaluation Script for DeepSpeed-FastGen

DISCLAIMER

This evaluation executes untrusted model-generated code. In line with the OpenAI warning, we strongly recommend running it in a sandboxed environment, as described in the human-eval paper.

Setup

Running the human-eval evaluation requires installing human_eval with code execution enabled, which requires a local change to execution.py. The following steps set up human-eval for execution:

git clone https://github.com/openai/human-eval.git
sed -i '/exec(check_program, exec_globals)/ s/^# //' human-eval/human_eval/execution.py
cd human-eval
python -m pip install -e .
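
To make the effect of the sed command above easier to inspect, here is a rough Python equivalent. It is only a sketch: the helper name `uncomment_exec` and the sample text are hypothetical, and the sample is not the actual content of execution.py.

```python
import re

# The statement that the sed command looks for (matched as a literal substring).
TARGET = "exec(check_program, exec_globals)"


def uncomment_exec(source: str) -> str:
    """Strip a leading '# ' from every line that contains TARGET.

    Mirrors: sed '/exec(check_program, exec_globals)/ s/^# //'
    """
    out = []
    for line in source.splitlines(keepends=True):
        if TARGET in line:
            # Remove only the '# ' prefix at the start of the line;
            # the remaining indentation is left as it is.
            line = re.sub(r"^# ", "", line)
        out.append(line)
    return "".join(out)


# Hypothetical sample text, used only to illustrate the transformation.
sample = "    pass\n#     exec(check_program, exec_globals)\n"
print(uncomment_exec(sample))
```

Lines that are already uncommented are left unchanged, so applying the helper twice gives the same result as applying it once.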

This evaluation also requires the installation of DeepSpeed-MII:

python -m pip install deepspeed-mii

Additional installation details can be found in the DeepSpeed-MII documentation.

Run the Evaluation

The following command shows how to run a benchmark using the codellama/CodeLlama-7b-Python-hf model:

python run_human_eval.py --model codellama/CodeLlama-7b-Python-hf --max-tokens 512 --num-samples-per-task 20
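
The sketch below illustrates the general shape of a sample-generation loop with several samples per task. It is not the code in run_human_eval.py: the names `build_samples`, `write_jsonl`, and `generate_one` are hypothetical, the model call is replaced by a stub, and the record layout (`task_id`, `completion`) is assumed from the samples format described in the human-eval README.

```python
import io
import json


def build_samples(problems, generate_one, num_samples_per_task):
    """Collect num_samples_per_task completions for every task.

    problems maps a task id to a dict with a "prompt" entry;
    generate_one is any callable that turns a prompt into a completion.
    """
    samples = []
    for task_id, problem in problems.items():
        for _ in range(num_samples_per_task):
            samples.append(
                {"task_id": task_id, "completion": generate_one(problem["prompt"])}
            )
    return samples


def write_jsonl(samples, fp):
    """Write one JSON object per line to an open text file object."""
    for sample in samples:
        fp.write(json.dumps(sample) + "\n")


# Stub data and a stub generator standing in for a real model call.
problems = {"HumanEval/0": {"prompt": "def f():\n"}}
samples = build_samples(problems, lambda prompt: "    return 1\n", 3)

buffer = io.StringIO()
write_jsonl(samples, buffer)
print(buffer.getvalue())
```

In a real run, `generate_one` would wrap the model inference call and the output would go to a file rather than an in-memory buffer.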

Run Evaluation on Samples

Once samples have been generated, they can be evaluated independently using the evaluate_functional_correctness command. For example, the following command will evaluate mii_samples.jsonl:

evaluate_functional_correctness mii_samples.jsonl

The evaluation results will be saved to mii_samples.jsonl_results.jsonl.
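
As a rough illustration of how the results file could be summarized afterwards, the sketch below computes a mean pass@k with the estimator from the human-eval paper. It assumes each results line is a JSON object with `task_id` and a boolean `passed` field; the function names are hypothetical, and skipping tasks that have fewer than k samples is a simplification made for this sketch.

```python
import json
from collections import defaultdict
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for n samples of which c are correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(lines, k=1):
    """Average pass@k over tasks, given an iterable of JSONL lines."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        totals[record["task_id"]] += 1
        correct[record["task_id"]] += bool(record["passed"])
    # Simplification: tasks with fewer than k samples are skipped.
    scores = [pass_at_k(totals[t], correct[t], k) for t in totals if totals[t] >= k]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical results lines, used only to illustrate the calculation.
example_lines = [
    '{"task_id": "A", "passed": true}',
    '{"task_id": "A", "passed": false}',
    '{"task_id": "B", "passed": true}',
    '{"task_id": "B", "passed": true}',
]
print(mean_pass_at_k(example_lines, k=1))
```

To apply it to real output, pass an open file such as `open("mii_samples.jsonl_results.jsonl")` in place of `example_lines`.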
