Conversation

@claucondor (Contributor) commented Jan 18, 2026

Following up on #13, I ran more experiments to find an alternative to RRF+MMR.

Background

RRF fusion on LoCoMo scored 7.9% worse than baseline. Makes sense: LoCoMo queries are conversational ("When did X happen?", "What did Y say?"), so there are no exact terms to match.

Keyword search helps with a different class of queries: function names (parseJWT), error codes (CVE-2017-3156), versions (Oracle 12c). So I tried an adaptive approach: let the planning LLM decide when to use it.
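Roughly, the gating looks like the sketch below. This is a minimal sketch: the prompt wording and the `llm.generate` interface are my assumptions, not the PR's actual planning code.

```python
import json

def build_planning_prompt(query: str) -> str:
    # Hypothetical wording; the real planning prompt in this PR differs.
    return (
        "List any terms in the query that need exact lexical matching "
        "(function names, error codes, version strings).\n"
        'Reply with JSON: {"exact_match_terms": [...]}\n\n'
        f"Query: {query}"
    )

def detect_exact_match_terms(llm, query: str) -> list[str]:
    """Ask the planning LLM whether the query has exact-match terms."""
    raw = llm.generate(build_planning_prompt(query))  # llm.generate is assumed
    try:
        return json.loads(raw).get("exact_match_terms", [])
    except (json.JSONDecodeError, AttributeError):
        return []  # on parse failure, fall back to pure semantic search
```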

What I tested

Fusion method: Convex Combination vs RRF

CC:  S = α · S_sem + (1-α) · S_kw
RRF: S = Σ 1/(k + rank)

CC keeps score magnitude; RRF uses only ranks. CC worked better here.
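For concreteness, a minimal sketch of both rules. The function names and the min-max normalization step are my assumptions (cosine and BM25 scores live on different scales, so CC needs them on a common one); the PR itself only specifies the two formulas.

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize scores to [0, 1]; assumed preprocessing, not shown in the PR."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def convex_combination(sem: dict[str, float], kw: dict[str, float],
                       alpha: float = 0.7) -> dict[str, float]:
    """S = α·S_sem + (1-α)·S_kw over the union of candidates."""
    sem, kw = minmax(sem), minmax(kw)
    docs = set(sem) | set(kw)
    return {d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
            for d in docs}

def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """S = Σ 1/(k + rank): rank-only, score magnitude is discarded."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores
```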

Alpha values (LoCoMo, 60 questions):

| α | F1 |
|---|---|
| 0.7 | 0.3298 ← best |
| 0.9 | 0.2999 |
| 0.5 | 0.2822 |

Adaptive alpha (varying α with keyword_importance) performed worse than fixed α = 0.7. Even for technical queries, aggressive keyword weighting hurts; planning already captures the terms semantically.

Results

| Dataset | Type | Boost triggered | F1 change |
|---------|------|-----------------|-----------|
| LoCoMo | Conversational | 0% | No change |
| TechQA | Technical | 75% | +8.5% |

The system correctly skips the boost for conversational queries and applies it to technical ones.

Limitations

  • Small samples (20 TechQA, 5 LoCoMo) due to hardware constraints
  • Tested with Mistral-7B and all-MiniLM-L6-v2
  • Larger benchmarks needed for statistical confidence

References

Adds adaptive hybrid retrieval that activates BM25 keyword search
only when the planning LLM detects technical terms requiring exact
lexical matching (function names, error codes, versions, etc.).

Changes:
- Modified planning prompt to detect exact_match_terms
- Added _convex_combination_fusion method: S = α·S_sem + (1-α)·S_kw
- Added keyword_search_with_scores for BM25 FTS with scoring
- New config: CC_ALPHA (default 0.7)
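
To show how these changes might compose at query time: a minimal sketch reusing `convex_combination` from above, assuming `semantic_search` returns `{doc_id: score}` and the plan is a dict. The actual retriever in this PR may be structured differently; only `keyword_search_with_scores`, the fusion, and `CC_ALPHA` are named in the changes.

```python
CC_ALPHA = 0.7  # new config default introduced by this PR

def retrieve(query: str, plan: dict, semantic_search,
             keyword_search_with_scores, top_k: int = 10) -> list[str]:
    sem = semantic_search(query)                 # assumed existing search
    terms = plan.get("exact_match_terms", [])
    if not terms:                                # conversational query: no boost
        fused = sem
    else:                                        # technical query: BM25 FTS + CC fusion
        kw = keyword_search_with_scores(terms)
        fused = convex_combination(sem, kw, alpha=CC_ALPHA)
    return sorted(fused, key=fused.get, reverse=True)[:top_k]
```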

Related to aiming-lab#13.