Persona Evaluation — condition7__focus_only__sample500_seed99

| Field | Value |
| --- | --- |
| Persona | Nimesh Kulatunga |
| Judge model | openai/gpt-4.1-mini |
| Embed model | text-embedding-3-small |
| Rubric draws (k) | 3 |
| Total suggestions | 30 |
| Pipeline mode | hall_pass=anti_gt (active=True) |

Bucket Distribution

| Bucket | Count | % of total |
| --- | --- | --- |
| Task Critical | 6 | 20.0% |
| Quality Of Life | 18 | 60.0% |
| Noise | 5 | 16.7% |
| Hallucinated | 1 | 3.3% |
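
The percentages above follow directly from the bucket counts over the 30 suggestions. A minimal sketch of that arithmetic, assuming each suggestion carries exactly one bucket label (the function name `bucket_distribution` is hypothetical and not taken from the pipeline):

```python
from collections import Counter


def bucket_distribution(labels):
    """Return {bucket: (count, percent of total)} with percent rounded to 1 decimal."""
    counts = Counter(labels)
    total = len(labels)
    return {bucket: (n, round(100.0 * n / total, 1)) for bucket, n in counts.items()}


# Labels rebuilt from the counts reported in the table (6 + 18 + 5 + 1 = 30).
labels = (
    ["Task Critical"] * 6
    + ["Quality Of Life"] * 18
    + ["Noise"] * 5
    + ["Hallucinated"] * 1
)

dist = bucket_distribution(labels)
for bucket, (n, pct) in dist.items():
    print(f"{bucket}: {n} ({pct}%)")
```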

Set-Level Diversity Metrics

| Metric | Value | Interpretation |
| --- | --- | --- |
| DPP log-det | -25.7966 | Higher = more diverse + high-quality set |
| Cluster coverage | 1.000 | Fraction of BGT clusters with a task-critical hit |
| ILAD | 0.7088 | Mean pairwise distance; higher = more diverse |
| Redundancy rate | 0.000 | Fraction of near-duplicate suggestions (cos > 0.9) |
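
To make two of these metrics concrete, the sketch below computes ILAD and the redundancy rate on toy vectors. It rests on assumptions the report does not confirm: that "distance" means cosine distance (1 minus cosine similarity), and that a suggestion counts as redundant when it is a near-duplicate of an earlier one. The function names and the toy embeddings are hypothetical; the real pipeline would use `text-embedding-3-small` vectors.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ilad(embeddings):
    """Intra-list average distance: mean cosine distance over all unordered pairs."""
    pairs = list(combinations(embeddings, 2))
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def redundancy_rate(embeddings, threshold=0.9):
    """Fraction of items whose similarity to some earlier item exceeds the threshold."""
    redundant = set()
    for i, j in combinations(range(len(embeddings)), 2):
        if cosine_similarity(embeddings[i], embeddings[j]) > threshold:
            redundant.add(j)  # j > i, so j is the later near-duplicate
    return len(redundant) / len(embeddings)


# Toy 2-D embeddings (assumption): the third is almost a copy of the first.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
print(f"ILAD: {ilad(toy):.4f}")
print(f"Redundancy rate: {redundancy_rate(toy):.3f}")
```

Under this reading, a redundancy rate of 0.000 means no pair of the 30 suggestions exceeded the 0.9 cosine threshold.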

Composite Score

| Component | Weight | Value |
| --- | --- | --- |
| DPP set score (normalised) | 0.5 | |
| Cluster coverage | 0.3 | 1.000 |
| Mean quality (non-hallucinated) | 0.2 | |
| Hallucination penalty | alpha=0.5 | x 0.9833 |

Composite score: 0.5716
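
The component table suggests a weighted sum of three terms scaled by the hallucination penalty. The sketch below encodes that reading; the formula is an assumption inferred from the weights and the "x" in the penalty row, and the function name `composite_score` is hypothetical. The report does not list the normalised DPP score or the mean quality, so they stay as parameters and are not filled in here.

```python
def composite_score(dpp_norm, coverage, mean_quality, penalty,
                    weights=(0.5, 0.3, 0.2)):
    """Assumed form: (w_dpp*dpp + w_cov*coverage + w_q*quality) * penalty."""
    w_dpp, w_cov, w_q = weights
    weighted = w_dpp * dpp_norm + w_cov * coverage + w_q * mean_quality
    return weighted * penalty


# Only coverage (1.000) and the penalty (0.9833) are stated in the report.
# Zeroing the two unreported terms isolates the coverage contribution.
coverage_only = composite_score(dpp_norm=0.0, coverage=1.0,
                                mean_quality=0.0, penalty=0.9833)
print(f"Coverage contribution under the assumed formula: {coverage_only:.4f}")
```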

Hallucination Summary

Filter: anti_gt (active). Flagged 1 / 30 suggestions (rate 3.3%). Composite hallucination penalty: 0.9833.
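
The reported penalty is consistent with a linear form, 1 - alpha * rate, using alpha = 0.5 and the flagged rate 1/30. This is an inference from the numbers, not a formula stated in the report, and the function name `hallucination_penalty` is hypothetical:

```python
def hallucination_penalty(flagged, total, alpha=0.5):
    """Assumed linear penalty: 1 - alpha * (flagged / total)."""
    rate = flagged / total
    return 1.0 - alpha * rate


penalty = hallucination_penalty(flagged=1, total=30)
print(f"{penalty:.4f}")  # 0.9833
```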

Filtered Hallucinations

| ID | Failure mode | Title | Reasoning |
| --- | --- | --- | --- |
| 23 | over_elaboration | Sync CS3063 Logic with Lightning Network Peer A... | Extends from a valid proposition to fabricated technical specifics linking CS... |

Top 5 Task-Critical Suggestions

| # | ID | Quality | Title |
| --- | --- | --- | --- |
| 1 | 11 | 0.980 | ASMap Project Core: Focus on Erlay and P2P Efficiency |
| 2 | 27 | 0.970 | Pre-emptive Setup for Summer of Bitcoin 2026 Proposal |
| 3 | 28 | 0.937 | Access ASMap Data Sources for Stable Channels Research |
| 4 | 30 | 0.870 | Automate CSV to GitHub Markdown for Project Selection |
| 5 | 7 | 0.750 | Bridge Theory of Computing with GUMBO Logic |

Top 5 Quality-of-Life Suggestions

| # | ID | Quality | Title |
| --- | --- | --- | --- |
| 1 | 17 | 0.970 | Automate CSV Validation in Cursor with Python |
| 2 | 2 | 0.887 | Implement 'GUMBO' Documentation via Screen-to-Markdown |
| 3 | 6 | 0.870 | Implement Slack-to-Task Automation for logical-eng-ext |
| 4 | 8 | 0.840 | Automate GUMBO Documentation via Screen Recording |
| 5 | 18 | 0.837 | Sync Slack/Gmail Updates via 'Logical Data' Labels |