
Persona Evaluation: Ground-Truth Ablation, BGT Only (Hallucination Pass Skipped)

| Field | Value |
|---|---|
| Persona | Nimesh Kulatunga |
| Judge model | openai/gpt-4.1-mini |
| Embed model | text-embedding-3-small |
| Rubric draws (k) | 3 |
| Total suggestions | 27 |
| Pipeline mode | hall_pass=skipped (active=False) |

Bucket Distribution

| Bucket | Count | % of total |
|---|---|---|
| Task Critical | 20 | 74.1% |
| Quality Of Life | 4 | 14.8% |
| Noise | 3 | 11.1% |
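Each percentage is the bucket's count divided by the 27 total suggestions. A minimal Python sketch of that arithmetic follows; the `bucket_distribution` helper is hypothetical and not part of the evaluation pipeline.

```python
from collections import Counter

def bucket_distribution(labels):
    """Return {bucket: (count, percent_of_total)} for a list of bucket labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Percent rounded to one decimal place, as in the table above.
    return {b: (n, round(100 * n / total, 1)) for b, n in counts.items()}

# Rebuild the label list from the reported counts (27 suggestions in total).
labels = ["Task Critical"] * 20 + ["Quality Of Life"] * 4 + ["Noise"] * 3
for bucket, (count, pct) in bucket_distribution(labels).items():
    print(f"{bucket:<16} {count:>3} {pct:>5.1f}%")
```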

Set-Level Diversity Metrics

| Metric | Value | Interpretation |
|---|---|---|
| DPP log-det | -23.4229 | Higher means a more diverse, higher-quality set |
| Cluster coverage | 1.000 | Fraction of BGT clusters with at least one task-critical hit |
| ILAD | 0.6463 | Intra-list average distance (mean pairwise distance); higher means more diverse |
| Redundancy rate | 0.000 | Fraction of near-duplicate suggestions (cosine similarity > 0.9) |
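The four metrics can be sketched in NumPy as follows. This is a hedged illustration, not the pipeline's code: it assumes cosine similarity between suggestion embeddings, an L-ensemble kernel `L = diag(q) S diag(q)` for the DPP log-det (with `q` the per-suggestion quality and `S` the cosine matrix), redundancy counted per suggestion rather than per pair, and cluster assignments supplied by the caller. All function names are hypothetical.

```python
import numpy as np

def _cosine_matrix(emb):
    # Normalise rows so the dot product equals cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

def dpp_logdet(emb, quality, jitter=1e-6):
    """Log-determinant of the assumed kernel L = diag(q) S diag(q)."""
    q = np.asarray(quality, dtype=float)
    kernel = q[:, None] * _cosine_matrix(emb) * q[None, :]
    kernel += jitter * np.eye(len(q))  # keep the kernel numerically positive definite
    sign, logdet = np.linalg.slogdet(kernel)
    return logdet if sign > 0 else float("-inf")

def ilad(emb):
    """Intra-list average distance: mean of (1 - cosine) over all unordered pairs."""
    sim = _cosine_matrix(emb)
    iu = np.triu_indices(len(emb), k=1)  # upper triangle, excluding the diagonal
    return float(np.mean(1.0 - sim[iu]))

def redundancy_rate(emb, threshold=0.9):
    """Fraction of suggestions whose cosine to some other suggestion exceeds threshold."""
    sim = _cosine_matrix(emb)
    np.fill_diagonal(sim, -1.0)  # ignore self-similarity
    return float(np.mean(sim.max(axis=1) > threshold))

def cluster_coverage(bgt_clusters, hit_clusters):
    """Fraction of BGT cluster ids matched by at least one task-critical suggestion."""
    bgt = set(bgt_clusters)
    return len(bgt & set(hit_clusters)) / len(bgt)
```

On orthogonal embeddings `ilad` returns 1.0 and `redundancy_rate` returns 0.0; duplicating a row pushes the kernel toward singularity, so a redundant set gets a much lower log-det. Under this assumed kernel the log-det cannot exceed zero (ignoring the jitter), so a negative value such as -23.4229 is expected.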

Composite Score

| Component | Weight | Value |
|---|---|---|
| DPP set score (normalised) | 0.5 | |
| Cluster coverage | 0.3 | 1.000 |
| Mean quality (non-hallucinated) | 0.2 | |
| Hallucination penalty | alpha = 0.5 | × 1.0000 |

Composite score: 0.6026
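The composite combines the three weighted components with the hallucination penalty. The report does not spell out the formula, so the sketch below assumes the penalty multiplies the weighted sum, which is consistent with the multiplier shown in the penalty row; `composite_score` is a hypothetical name.

```python
def composite_score(dpp_norm, coverage, mean_quality, penalty=1.0,
                    weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three components, scaled by the hallucination penalty.

    Assumption: the penalty is applied multiplicatively to the whole sum.
    """
    w_dpp, w_cov, w_q = weights
    return (w_dpp * dpp_norm + w_cov * coverage + w_q * mean_quality) * penalty
```

Because the normalised DPP score and the mean quality are not listed in the table, 0.6026 cannot be reproduced from the report alone. Under this assumed form, with coverage at 1.000 and the penalty at 1.0, the two missing terms would have to satisfy 0.5 × DPP + 0.2 × quality = 0.3026.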

Hallucination Summary

Filter: skipped. The hallucination pass was skipped because no anti-GT was supplied. The penalty is pinned to 1.0 so that the composite formula and weights stay identical to those of the active mode, and scores remain comparable across runs.
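The skipped mode can be pictured as a penalty function that short-circuits to 1.0. The linear form `1 - alpha * rate` in the active branch is an assumption; the report only confirms alpha = 0.5 and the pinned value of 1.0. `hallucination_penalty` is a hypothetical helper.

```python
def hallucination_penalty(n_hallucinated, n_total, alpha=0.5, active=True):
    """Multiplicative penalty in [1 - alpha, 1]; pinned to 1.0 when the pass is skipped."""
    if not active or n_total == 0:
        return 1.0  # skipped mode: no anti-GT, so nothing is penalised
    rate = n_hallucinated / n_total  # assumed: fraction of flagged suggestions
    return 1.0 - alpha * rate
```

Pinning the value, rather than dropping the term, keeps a skipped-mode composite on the same scale as an active-mode one.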

Filtered Hallucinations

Filter inactive: no suggestions were inspected for anti-GT hallucinations.

Top 5 Task-Critical Suggestions

| # | ID | Quality | Title |
|---|---|---|---|
| 1 | 4 | 0.980 | Auto-populate research doc with per-org summary stubs |
| 2 | 8 | 0.970 | Summarise SOB Student Guide pages user read into an appli... |
| 3 | 9 | 0.970 | Offer a proposal draft outline after 60+ minutes of SOB r... |
| 4 | 7 | 0.947 | Explain Bitcoin-Fuzz Issues #295 and #490 that user opened |
| 5 | 3 | 0.937 | Detect that 'Summer of Bitcoin Interested Projects' Googl... |
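Each top-5 list is consistent with filtering suggestions by bucket and sorting by quality, highest first. The sketch below assumes that rule plus a tie-break on lower ID, which reproduces the order of IDs 8 and 9 (both 0.970) above; the report does not state how ties are broken, and `Suggestion` and `top_k` are hypothetical names. The sample rows are the ones shown in the tables.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    id: int
    bucket: str
    quality: float
    title: str

def top_k(suggestions, bucket, k=5):
    """Highest-quality suggestions in one bucket, best first (ties broken by lower id)."""
    in_bucket = [s for s in suggestions if s.bucket == bucket]
    return sorted(in_bucket, key=lambda s: (-s.quality, s.id))[:k]

rows = [
    Suggestion(3, "Task Critical", 0.937, "Detect that 'Summer of Bitcoin Interested Projects' Googl..."),
    Suggestion(9, "Task Critical", 0.970, "Offer a proposal draft outline after 60+ minutes of SOB r..."),
    Suggestion(4, "Task Critical", 0.980, "Auto-populate research doc with per-org summary stubs"),
    Suggestion(17, "Quality Of Life", 0.870, "Offer to resume GitHub docs task when Canva work wraps"),
    Suggestion(7, "Task Critical", 0.947, "Explain Bitcoin-Fuzz Issues #295 and #490 that user opened"),
    Suggestion(8, "Task Critical", 0.970, "Summarise SOB Student Guide pages user read into an appli..."),
]
print([s.id for s in top_k(rows, "Task Critical")])  # [4, 8, 9, 7, 3]
```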

Top 5 Quality-of-Life Suggestions

| # | ID | Quality | Title |
|---|---|---|---|
| 1 | 17 | 0.870 | Offer to resume GitHub docs task when Canva work wraps |
| 2 | 27 | 0.807 | Recognise Google Docs as primary note-taking surface for ... |
| 3 | 23 | 0.610 | Detect WhatsApp Web photo downloads and offer folder orga... |
| 4 | 11 | 0.597 | Offer to open Google Calendar to block SOB proposal writi... |