Persona Evaluation: Ground-Truth Ablation, BGT + IGT (hallucination pass skipped)

Field	Value
Persona	Nimesh Kulatunga
Judge model	`openai/gpt-4.1-mini`
Embed model	`text-embedding-3-small`
Rubric draws (k)	3
Total suggestions	43
Pipeline mode	hall_pass=skipped (active=False)

Bucket Distribution

Bucket	Count	% of total
Task Critical	29	67.4%
Quality Of Life	8	18.6%
Noise	6	14.0%
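Each percentage is that bucket's share of the 43 suggestions, rounded to one decimal place. For example, 6/43 = 13.95%, which is shown as 14.0%. The sketch below shows that arithmetic. It assumes each suggestion carries exactly one bucket label, and it rebuilds the label list from the counts above.

```python
from collections import Counter

def bucket_distribution(labels):
    """Map each bucket to (count, percent of total rounded to one decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {bucket: (n, round(100 * n / total, 1)) for bucket, n in counts.items()}

# Labels reconstructed from the counts in the table (29 + 8 + 6 = 43).
labels = ["Task Critical"] * 29 + ["Quality Of Life"] * 8 + ["Noise"] * 6
dist = bucket_distribution(labels)
```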

Set-Level Diversity Metrics

Metric	Value	Interpretation
DPP log-det	-41.3785	Higher = more diverse, higher-quality set
Cluster coverage	1.000	Fraction of BGT clusters with at least one task-critical hit
ILAD (intra-list average distance)	0.6742	Mean pairwise embedding distance; higher = more diverse
Redundancy rate	0.000	Fraction of near-duplicate suggestions (cosine similarity > 0.9)
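The report does not state the exact formulas behind these four numbers. The sketch below gives one plausible reading and rests on several assumptions:

- Similarity is cosine similarity between `text-embedding-3-small` vectors.
- ILAD is the mean of (1 - cosine) over all unordered pairs.
- A suggestion counts as redundant if any other suggestion exceeds cosine 0.9.
- The DPP kernel uses the common quality-diversity decomposition L_ij = q_i S_ij q_j, plus a small diagonal jitter. The `eps` parameter is hypothetical and only keeps the Cholesky factorisation defined.

Under that assumed kernel, Hadamard's inequality gives log det L ≤ 2 Σ log q_i ≤ 0 when every quality is at most 1. A negative value such as -41.3785 is therefore expected for 43 suggestions and is not a sign of error.

```python
import math

# Assumption: similarity = cosine similarity between embedding vectors.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_matrix(embeddings):
    n = len(embeddings)
    return [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]

def ilad(sim):
    """Intra-list average distance: mean of (1 - cosine) over all unordered pairs."""
    n = len(sim)
    dists = [1.0 - sim[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)

def redundancy_rate(sim, threshold=0.9):
    """Fraction of suggestions with at least one other suggestion above threshold."""
    n = len(sim)
    dupes = sum(1 for i in range(n)
                if any(sim[i][j] > threshold for j in range(n) if j != i))
    return dupes / n

def dpp_logdet(sim, quality, eps=1e-6):
    """log det of L = diag(q) S diag(q) + eps*I, computed via Cholesky (L = C C^T)."""
    n = len(sim)
    kernel = [[quality[i] * sim[i][j] * quality[j] + (eps if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = kernel[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    # det L = prod(C_ii)^2, so log det L = 2 * sum(log C_ii).
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def cluster_coverage(bgt_clusters, hit_clusters):
    """Fraction of BGT clusters matched by at least one task-critical suggestion."""
    clusters = set(bgt_clusters)
    return len(clusters & set(hit_clusters)) / len(clusters)
```

A set of mutually orthogonal embeddings scores the maximum ILAD of 1.0 and a log-det near 0. Adding a near-duplicate pulls the log-det sharply down and raises the redundancy rate.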

Composite Score

Component	Weight	Value
DPP set score (normalised)	0.5	—
Cluster coverage	0.3	1.000
Mean quality (non-hallucinated)	0.2	—
Hallucination penalty	alpha = 0.5	× 1.0000

Composite score: 0.5878
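The table lists weights but not the formula that combines them. The sketch below assumes the composite is a weighted sum of the three components, scaled by the hallucination penalty. Under that assumption, with the penalty at × 1.0000, coverage contributes 0.3 × 1.000 = 0.300 of the 0.5878. The two components shown as "—" would then contribute the remaining 0.2878 between them.

The active-mode penalty below (1 - alpha × hallucination rate) is a hypothetical form chosen for illustration. The report only gives alpha = 0.5.

```python
def hallucination_penalty(hallucination_rate, alpha=0.5, active=True):
    # Inactive pass: pin the multiplier to 1.0, as this report does.
    if not active:
        return 1.0
    # HYPOTHETICAL active-mode form: linear in the hallucination rate, floored at 0.
    return max(0.0, 1.0 - alpha * hallucination_rate)

def composite_score(dpp_norm, coverage, mean_quality, penalty,
                    weights=(0.5, 0.3, 0.2)):
    # Assumed form: weighted sum of the three components, scaled by the penalty.
    w_dpp, w_cov, w_quality = weights
    return (w_dpp * dpp_norm + w_cov * coverage + w_quality * mean_quality) * penalty
```

With `active=False` the multiplier is exactly 1.0, so one function and one set of weights serve both modes. This matches the comparability argument in the Hallucination Summary.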

Hallucination Summary

Filter: skipped. The hallucination pass did not run because no anti-GT was supplied. The penalty is pinned to 1.0, so the composite formula and weights stay identical to active mode and scores remain comparable across runs.

Filtered Hallucinations

Filter inactive — no suggestions were inspected for anti-GT hallucinations.

Top 5 Task-Critical Suggestions

#	ID	Quality	Title
1	4	0.980	Auto-populate research doc with per-org summary stubs
2	3	0.970	Detect that 'Summer of Bitcoin Interested Projects' Googl...
3	8	0.970	Summarise SOB Student Guide pages user read into an appli...
4	9	0.970	Offer a proposal draft outline after 60+ minutes of SOB r...
5	10	0.970	Remind that SOB requires a competency test before proposa...

Top 5 Quality-of-Life Suggestions

#	ID	Quality	Title
1	33	0.937	Auto-draft SOB April 30 deadline and 4 proposal work-bloc...
2	17	0.870	Offer to resume GitHub docs task when Canva work wraps
3	27	0.807	Recognise Google Docs as primary note-taking surface for ...
4	40	0.807	Flag that caption writing is a deferred task (your usual ...
5	30	0.670	Produce maintainer dossier with contribution preferences ...
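The rows in both tables are consistent with one ranking rule: each bucket is sorted by quality, descending, with ties broken by ascending suggestion ID. The tied cases are IDs 3, 8, 9 and 10 at 0.970, and IDs 27 and 40 at 0.807. This tie-break is inferred from the ordering rather than documented. The sketch below uses the `Suggestion` record and `top_k` helper, which are hypothetical names. Its sample data is only the ten rows shown above, not the full list of 43 suggestions.

```python
from typing import List, NamedTuple

class Suggestion(NamedTuple):
    id: int
    bucket: str
    quality: float

def top_k(suggestions: List[Suggestion], bucket: str, k: int = 5) -> List[Suggestion]:
    """Top-k suggestions in one bucket: quality descending, ties by ascending ID."""
    in_bucket = [s for s in suggestions if s.bucket == bucket]
    return sorted(in_bucket, key=lambda s: (-s.quality, s.id))[:k]

# Usage: the ten rows from the two tables above, deliberately shuffled.
rows = [
    Suggestion(10, "Task Critical", 0.970),
    Suggestion(30, "Quality Of Life", 0.670),
    Suggestion(8, "Task Critical", 0.970),
    Suggestion(40, "Quality Of Life", 0.807),
    Suggestion(4, "Task Critical", 0.980),
    Suggestion(27, "Quality Of Life", 0.807),
    Suggestion(9, "Task Critical", 0.970),
    Suggestion(33, "Quality Of Life", 0.937),
    Suggestion(3, "Task Critical", 0.970),
    Suggestion(17, "Quality Of Life", 0.870),
]
tc_ids = [s.id for s in top_k(rows, "Task Critical")]
qol_ids = [s.id for s in top_k(rows, "Quality Of Life")]
```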