Persona Evaluation — condition7__focus_only__sample500_seed99
| Field | Value |
|---|---|
| Persona | Nimesh Kulatunga |
| Judge model | openai/gpt-4.1-mini |
| Embed model | text-embedding-3-small |
| Rubric draws (k) | 3 |
| Total suggestions | 30 |
| Pipeline mode | hall_pass=skipped (active=False) |
Bucket Distribution
| Bucket | Count | % of total |
|---|---|---|
| Task Critical | 6 | 20.0% |
| Quality Of Life | 19 | 63.3% |
| Noise | 5 | 16.7% |
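The percentages in the bucket table follow directly from the counts and the total of 30 suggestions. A minimal sketch of that arithmetic is shown here; the function name `bucket_percentages` is hypothetical and is not taken from the evaluation pipeline.

```python
def bucket_percentages(counts):
    """Return each bucket's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts copied from the bucket table in this report.
counts = {"Task Critical": 6, "Quality Of Life": 19, "Noise": 5}
shares = bucket_percentages(counts)
```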
Set-Level Diversity Metrics
| Metric | Value | Interpretation |
|---|---|---|
| DPP log-det | -27.6616 | Higher = more diverse + high-quality set |
| Cluster coverage | 1.000 | Fraction of BGT clusters with a task-critical hit |
| ILAD | 0.7023 | Mean pairwise distance; higher = more diverse |
| Redundancy rate | 0.000 | Fraction of near-duplicate suggestions (cos > 0.9) |
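For readers unfamiliar with these metrics, the sketch below shows one common way to compute ILAD, a redundancy rate, and a DPP log-determinant from suggestion embeddings and quality scores. It is an illustration under stated assumptions, not the pipeline's actual code: it assumes cosine distance for ILAD, counts a suggestion as redundant if any other suggestion exceeds the 0.9 cosine threshold, and builds the DPP kernel as `q_i * cos(e_i, e_j) * q_j`. All function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ilad(embeddings):
    """Mean pairwise cosine distance (1 - similarity) over all unordered pairs."""
    pairs = list(combinations(embeddings, 2))
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)


def redundancy_rate(embeddings, threshold=0.9):
    """Fraction of items with at least one other item above the similarity threshold.

    Assumption: the report does not say whether it counts items or pairs.
    """
    n = len(embeddings)
    redundant = 0
    for i in range(n):
        if any(j != i and cosine(embeddings[i], embeddings[j]) > threshold
               for j in range(n)):
            redundant += 1
    return redundant / n


def dpp_log_det(embeddings, qualities):
    """log det of the assumed kernel L[i][j] = q_i * cos(e_i, e_j) * q_j.

    Uses a Cholesky factorisation, so the kernel must be positive definite.
    """
    n = len(embeddings)
    kernel = [[qualities[i] * cosine(embeddings[i], embeddings[j]) * qualities[j]
               for j in range(n)] for i in range(n)]
    chol = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = kernel[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("kernel is not positive definite")
                chol[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return log_det
```

Under this assumed kernel, the log-determinant can never be positive when every quality score is at most 1, because the determinant of a cosine-similarity matrix is at most 1. A strongly negative value for a 30-item set is therefore expected and is only meaningful when compared against other runs of the same size.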
Composite Score
| Component | Weight | Value |
|---|---|---|
| DPP set score (normalised) | 0.5 | — |
| Cluster coverage | 0.3 | 1.000 |
| Mean quality (non-hallucinated) | 0.2 | — |
| Hallucination penalty | alpha=0.5 | x 1.0000 |
Composite score: 0.5777
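The report lists the component weights but not the exact formula. A plausible reading, sketched below, is a weighted sum of the three components multiplied by the hallucination penalty. This form is an assumption, the function name `composite_score` is hypothetical, and how `alpha` enters the penalty is not stated in the report, so the penalty is taken here as an already-computed multiplier.

```python
def composite_score(dpp_norm, coverage, mean_quality, penalty=1.0,
                    weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three components, scaled by the hallucination penalty.

    Assumed form: penalty * (w_dpp * dpp_norm + w_cov * coverage + w_q * mean_quality).
    """
    w_dpp, w_cov, w_q = weights
    base = w_dpp * dpp_norm + w_cov * coverage + w_q * mean_quality
    return penalty * base


# Illustrative inputs only; the report does not show the normalised DPP score
# or the mean quality, so these values are placeholders.
example = composite_score(dpp_norm=0.5, coverage=1.0, mean_quality=0.5)
```

If this form is right, a coverage of 1.000 and a penalty of 1.0 would mean the two unreported components together contribute 0.5777 - 0.3 = 0.2777 to the composite score.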
Hallucination Summary
Filter: skipped. The hallucination pass was skipped because no anti-GT is available. The penalty is pinned to 1.0 so that the composite formula and weights stay identical to the active mode and scores remain comparable across runs.
Filtered Hallucinations
Filter inactive — no suggestions were inspected for anti-GT hallucinations.
Top 5 Task-Critical Suggestions
| # | ID | Quality | Title |
|---|---|---|---|
| 1 | 11 | 0.980 | ASMap Project Core: Focus on Erlay and P2P Efficiency |
| 2 | 27 | 0.970 | Pre-emptive Setup for Summer of Bitcoin 2026 Proposal |
| 3 | 28 | 0.937 | Access ASMap Data Sources for Stable Channels Research |
| 4 | 30 | 0.870 | Automate CSV to GitHub Markdown for Project Selection |
| 5 | 7 | 0.750 | Bridge Theory of Computing with GUMBO Logic |
Top 5 Quality-of-Life Suggestions
| # | ID | Quality | Title |
|---|---|---|---|
| 1 | 17 | 0.970 | Automate CSV Validation in Cursor with Python |
| 2 | 2 | 0.887 | Implement 'GUMBO' Documentation via Screen-to-Markdown |
| 3 | 6 | 0.870 | Implement Slack-to-Task Automation for logical-eng-ext |
| 4 | 8 | 0.840 | Automate GUMBO Documentation via Screen Recording |
| 5 | 18 | 0.837 | Sync Slack/Gmail Updates via 'Logical Data' Labels |