Persona Evaluation: condition2__screen_with_metadata__sample500_seed99

Field	Value
Persona	Nimesh Kulatunga
Judge model	`openai/gpt-4.1-mini`
Embed model	`text-embedding-3-small`
Rubric draws (k)	3
Total suggestions	50
Pipeline mode	hall_pass=skipped (active=False)

Bucket Distribution

Bucket	Count	% of total
Task Critical	15	30.0%
Quality Of Life	23	46.0%
Noise	12	24.0%

Set-Level Diversity Metrics

Metric	Value	Interpretation
DPP log-det	-31.9247	Higher = more diverse + high-quality set
Cluster coverage	1.000	Fraction of BGT clusters with a task-critical hit
ILAD	0.7399	Mean pairwise distance; higher = more diverse
Redundancy rate	0.000	Fraction of near-duplicate suggestions (cos > 0.9)
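
The interpretations in the table above can be made concrete with a minimal sketch. It is not the pipeline's actual code: the function names are hypothetical, cosine distance is assumed for ILAD, the redundancy rate is assumed to count suggestions that have at least one neighbour with cosine similarity above 0.9, and the DPP kernel is assumed to be the common quality-weighted form L = diag(q) S diag(q), where S is the cosine-similarity matrix.

```python
import numpy as np

def unit_rows(embeddings):
    """L2-normalise each row so that dot products equal cosine similarities."""
    x = np.asarray(embeddings, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def ilad(embeddings):
    """Intra-list average distance: mean cosine distance over unordered pairs."""
    x = unit_rows(embeddings)
    sim = x @ x.T
    upper = np.triu_indices(len(sim), k=1)  # each pair counted once
    return float(np.mean(1.0 - sim[upper]))

def redundancy_rate(embeddings, threshold=0.9):
    """Fraction of items with at least one other item at cosine > threshold.

    Assumption: the report may instead count near-duplicate pairs.
    """
    x = unit_rows(embeddings)
    sim = x @ x.T
    np.fill_diagonal(sim, -1.0)  # ignore self-similarity
    return float(np.mean(sim.max(axis=1) > threshold))

def dpp_log_det(embeddings, quality):
    """log det of the assumed kernel L = diag(q) S diag(q)."""
    x = unit_rows(embeddings)
    q = np.asarray(quality, dtype=float)
    kernel = q[:, None] * (x @ x.T) * q[None, :]
    sign, logdet = np.linalg.slogdet(kernel)
    # A singular or non-positive kernel has no finite log-determinant.
    return float(logdet) if sign > 0 else float("-inf")

# Toy input: three mutually orthogonal embeddings with quality 1.0 each.
toy = np.eye(3)
print(ilad(toy), redundancy_rate(toy), dpp_log_det(toy, [1.0, 1.0, 1.0]))
```

Under these assumptions, a log-determinant is negative whenever the kernel's determinant is below 1, which is typical when quality scores are at most 1 and embeddings are partially correlated, so a negative DPP value such as the one reported here is only meaningful relative to other runs.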

Composite Score

Component	Weight	Value
DPP set score (normalised)	0.5	—
Cluster coverage	0.3	1.000
Mean quality (non-hallucinated)	0.2	—
Hallucination penalty	alpha=0.5	x 1.0000

Composite score: 0.6274
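
How the components combine is not spelled out in this report. A plausible reading of the table, offered only as an assumption, is a weighted sum of the three components multiplied by the hallucination penalty. On that reading, cluster coverage contributes 0.3 x 1.000 = 0.3, and the remaining 0.3274 of the composite would come from the two components whose values are not shown. The sketch below uses a hypothetical function name and placeholder inputs, not values from this run.

```python
def composite_score(dpp_norm, coverage, mean_quality, penalty=1.0,
                    weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three components, scaled by the penalty.

    Assumption: the composite is multiplicative in the penalty and linear
    in the components; the report does not confirm this form.
    """
    w_dpp, w_cov, w_quality = weights
    base = w_dpp * dpp_norm + w_cov * coverage + w_quality * mean_quality
    return penalty * base

# Placeholder inputs for illustration only (not taken from the report).
print(composite_score(dpp_norm=0.5, coverage=1.0, mean_quality=0.5))
```

Keeping the penalty as an explicit multiplier means a skipped hallucination pass can pass `penalty=1.0` without changing the weights, which matches the comparability goal stated in the hallucination summary.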

Hallucination Summary

Filter: skipped. The hallucination pass did not run because no anti-GT was provided. The penalty is pinned to 1.0, so the composite formula and weights are identical to the active mode and scores remain comparable across runs.

Filtered Hallucinations

Filter inactive: no suggestions were inspected for anti-GT hallucinations.

Top 5 Task-Critical Suggestions

#	ID	Quality	Title
1	16	1.000	Automate WhatsApp Export Parsing with Python
2	22	1.000	Automate WhatsApp Ingestion for GUMBO via Python Script
3	36	1.000	Map Python Skills to Nostr NIP-01 Implementation
4	41	1.000	Implement RPKI-to-ASmap Validation Logic
5	2	0.980	Configure Cursor 'Rules for AI' for GUMBO Context

Top 5 Quality-of-Life Suggestions

#	ID	Quality	Title
1	7	1.000	Automate DB Isolation with venv-aware Shell Script
2	28	1.000	Implement Local Polling for WhatsApp Export Parsing
3	39	1.000	Optimize RPKI Validation with Routinator Filters
4	42	0.980	Automate BGP Data Fetching via RIS-Live
5	31	0.970	Implement Erlay (PR #21515) to Mitigate Eclipse Attacks