qareen

A multimodal few-shot companion that balances relevance and diversity for LLM-as-a-Judge workflows.

Built with Python, Hugging Face, ChromaDB, and Pydantic.

Overview

qareen (قرين) means "constant companion"—a guide that subtly shapes decisions. The project plays the same role for Large Language Models: it supplies the right few-shot examples at the right moment, guiding model judgments for multimodal tasks that mix text and images.

Why it matters

Few-shot selection significantly influences LLM-as-a-Judge quality. Position bias, redundant examples, and modality imbalance can all distort evaluations. qareen addresses these pitfalls by extending Maximum Marginal Relevance (MMR) to multimodal retrieval with a tunable alpha parameter that controls text–image weighting.
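
As a rough sketch of the idea (illustrative only, not qareen's API; it assumes L2-normalized embedding matrices and a convex alpha-weighted blend of text and image cosine similarities):

import numpy as np

def multimodal_mmr(query_text, query_image, cand_text, cand_image,
                   k=5, alpha=0.5, lam=0.7):
    """Greedy MMR over a combined text+image similarity.

    alpha : assumed weight of text vs. image similarity (0 = image only, 1 = text only).
    lam   : classic MMR trade-off between relevance and diversity.
    Embeddings are assumed L2-normalized, so dot products are cosine similarities.
    """
    # Relevance of each candidate to the query, blended across modalities.
    relevance = alpha * (cand_text @ query_text) + (1 - alpha) * (cand_image @ query_image)
    # Candidate-candidate similarity, used to penalize redundant picks.
    pairwise = alpha * (cand_text @ cand_text.T) + (1 - alpha) * (cand_image @ cand_image.T)

    selected, remaining = [], list(range(len(relevance)))
    while remaining and len(selected) < k:
        if selected:
            redundancy = pairwise[np.ix_(remaining, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lam * relevance[remaining] - (1 - lam) * redundancy
        selected.append(remaining.pop(int(np.argmax(scores))))
    return selected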

Key features

  • Multimodal MMR retrieval: Balance relevance and diversity across text and image signals.
  • Model flexibility: Swap in CLIP, SIGLIP, or other Hugging Face embedding models via transformers and sentence-transformers (see the sketch after this list).
  • Interactive exploration: Adjust modality weights live through a Gradio UI to see how examples shift.
  • GPU-aware runtime: Detects CUDA availability and guides you to install a compatible PyTorch build when acceleration is possible.
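
For example, a minimal sketch of encoding text and images with sentence-transformers (the checkpoint name and image path below are placeholders, not qareen-specific identifiers):

from PIL import Image
from sentence_transformers import SentenceTransformer

# Any CLIP-style checkpoint from the Hugging Face Hub can be dropped in here.
model = SentenceTransformer("clip-ViT-B-32")

text_emb = model.encode(["red running shoes"], normalize_embeddings=True)
image_emb = model.encode([Image.open("product.jpg")], normalize_embeddings=True)  # placeholder path

# Cosine similarity between the text query and the product image.
print((text_emb @ image_emb.T).item())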

Demo

We demonstrate qareen on the Shopping Queries Image Dataset (SQID; Al Ghossein et al., 2024), an image-enriched extension of Amazon's ESCI benchmark for product search.

Getting started

Installation

Install the base package from PyPI:

pip install qareen

To set up the full development environment with linting, type-checking, and testing tools:
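
A likely workflow, assuming the repository is managed with uv (the Quick Start below uses uv run); check the repository's pyproject.toml for the authoritative setup:

git clone https://github.com/zaffnet/qareen.git
cd qareen
uv sync  # installs the package plus its development dependency group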

Python 3.13+ Compatibility: The package supports Python 3.13 and 3.14. Both sentencepiece and PyTorch now provide prebuilt cp313 wheels on PyPI. If you encounter rare environment-specific build issues, Python 3.11 or 3.12 remain stable fallback options. CI testing currently targets Python 3.11 and 3.12.

Usage

Note on GPU support: The gpu extra is currently a placeholder and does not install any GPU-specific packages. For GPU support, install a CUDA-enabled PyTorch build from the official PyTorch installation guide before installing qareen. The package works with CPU-only PyTorch as well.
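
For example (the CUDA index URL below is illustrative; use the one the official PyTorch guide recommends for your CUDA version):

# Install a CUDA-enabled PyTorch build first, then install qareen on top of it.
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install qareen

# Verify that PyTorch can see the GPU.
python -c "import torch; print(torch.cuda.is_available())"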

Marqo Fashion Experiment: Run ./experiments/marqo_fashion/run_experiment.sh to compare 4 embedding models across 9 alpha values on a fashion dataset.

Configuration

See docs/CONFIGURATION.md for environment variables, logging setup, and telemetry settings.

See docs/DISTANCE_METRIC.md for details on the cosine distance metric used for similarity search.
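
As a quick reminder, cosine distance follows the standard convention of one minus cosine similarity, so identical directions give a distance of 0. A minimal illustration:

import numpy as np

a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # 0.96
cos_dist = 1.0 - cos_sim                                   # 0.04
print(cos_sim, cos_dist)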

Quick Start

Run the end-to-end example (requires dataset and environment configuration):

# Set required environment variables (or use a .env config file)
export QAREEN_EMBEDDING_MODELS='["google/siglip-base-patch16-224"]'
export QAREEN_ALPHA_VALUES='[0.0, 0.5, 1.0]'
export QAREEN_ENVIRONMENT="dev"
# ... see docs/CONFIGURATION.md for all required settings

uv run python scripts/build_index.py --dataset-name <data_dir>

See docs/LOCAL_DATA_GUIDE.md for detailed steps.

References

  1. Shi, L., et al. (2024). Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge. arXiv preprint arXiv:2406.07791. https://arxiv.org/abs/2406.07791
  2. Tang, Y., et al. (2025). The Few-shot Dilemma: Over-prompting Large Language Models. arXiv preprint arXiv:2509.13196. https://arxiv.org/abs/2509.13196
  3. Carbonell, J., & Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. Proceedings of SIGIR '98, 335-336. https://doi.org/10.1145/290941.291025
  4. Al Ghossein, M., Chen, C.-W., & Tang, J. (2024). Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search. Part of the Shopping Queries Dataset by Amazon.
  5. Zhao, T. Z., et al. (2021). Calibrate Before Use: Improving Few-Shot Performance of Language Models. Proceedings of the International Conference on Machine Learning (ICML). https://arxiv.org/abs/2102.09690