# LooGLE v2: A novel real-world benchmark for long-dependency understanding
First, create a conda environment and install the required dependencies:

```bash
conda create -n loogle python=3.10
conda activate loogle
pip install vllm
```

Then, clone the benchmark repository:
```bash
git clone https://github.com/GraphPKU/LooGLE-v2.git
cd LooGLE-v2
```

You can download the benchmark dataset into the ./datasets directory with the following command:
```bash
git clone https://huggingface.co/datasets/GraphPKU/LooGLE-v2 ./datasets/LooGLE-v2
```
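Cloning from the Hugging Face Hub may require git-lfs for the large data files. If you prefer a pure-Python route, the same snapshot can be fetched with huggingface_hub (a minimal sketch, assuming `pip install huggingface_hub`):

```python
# Alternative to git clone: download the dataset snapshot via the Hub API.
# Public datasets need no token.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="GraphPKU/LooGLE-v2",
    repo_type="dataset",
    local_dir="./datasets/LooGLE-v2",  # same target path as the git clone above
)
```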
We take Llama-3.1-8B-Instruct as an example for inference. First, launch the model server using vllm serve:

```bash
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --api-key GraphPKU \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 131072 \
  --trust-remote-code
```

Note: `--tensor-parallel-size` should be set to the number of available GPUs.
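Once the server is up, you can check that it is reachable through vLLM's OpenAI-compatible API before launching the full run. A minimal sketch, assuming the default port 8000 (change `base_url` if you pass `--port`) and the openai Python client (`pip install openai`):

```python
# Smoke-test the vLLM server through its OpenAI-compatible endpoint.
# The api_key must match the --api-key passed to vllm serve.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="GraphPKU")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Reply with one word: ready"}],
    max_tokens=8,
)
print(response.choices[0].message.content)
```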
To run predictions on the benchmark using your model:

```bash
python predict.py \
  --model Llama-3.1-8B-Instruct \
  --data_dir ./datasets/LooGLE-v2
```
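Before scoring, you can sanity-check the predictions file. A minimal sketch, assuming predict.py writes one JSON object per line to `./results/<model>.jsonl` (the path the evaluation command below expects); the exact field names depend on the script:

```python
# Inspect the predictions JSONL before evaluation.
# Field names are script-specific; this only checks the file parses and counts records.
import json

path = "./results/Llama-3.1-8B-Instruct.jsonl"
with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"{len(records)} predictions loaded")
print("fields in first record:", sorted(records[0].keys()))
```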
After inference is complete, run the evaluation script:

```bash
python eval/eval.py \
  --input_path ./results/Llama-3.1-8B-Instruct.jsonl
```

This will compute accuracy and other metrics for the model's performance on LooGLE-v2.