Skip to content

Persona Evaluation — condition5__discord_only__sample500_seed99

Field Value
Persona Nimesh Kulatunga
Judge model openai/gpt-4.1-mini
Embed model text-embedding-3-small
Rubric draws (k) 3
Total suggestions 50
Pipeline mode hall_pass=anti_gt (active=True)

Bucket Distribution

Bucket Count % of total
Task Critical 7 14.0%
Quality Of Life 17 34.0%
Noise 6 12.0%
Hallucinated 20 40.0%

Set-Level Diversity Metrics

Metric Value Interpretation
DPP log-det -21.1009 Higher = more diverse + high-quality set
Cluster coverage 1.000 Fraction of BGT clusters with a task-critical hit
ILAD 0.6134 Mean pairwise distance; higher = more diverse
Redundancy rate 0.000 Fraction of near-duplicate suggestions (cos > 0.9)

Composite Score

Component Weight Value
DPP set score (normalised) 0.5
Cluster coverage 0.3 1.000
Mean quality (non-hallucinated) 0.2
Hallucination penalty alpha=0.5 x 0.8000

Composite score: 0.4994

Hallucination Summary

Filter: anti_gt (active). Flagged 20 / 50 suggestions (rate 40.0%). Composite hallucination penalty: 0.8000.

Filtered Hallucinations

ID Failure mode Title Reasoning
1 passive_viewing_as_active_interest Access Essential Bitcoin Developer Resources The suggestion prescribes Bitcoin Dev Kit and Bitcoin Optech newsletters base...
2 passive_viewing_as_active_interest Master Bitcoin Core Onboarding Terminology Memorizing Bitcoin Core terms and using PR Review Club is prescriptive withou...
3 passive_viewing_as_active_interest Sync Open-Source Contributions with Global Main... Scheduling coding tasks to align with Bitcoin Core maintainers' active hours ...
4 passive_viewing_as_active_interest Automate Your Internship Application Tracking Prescribing a specific Notion structure and linking GitHub to Discord bio is ...
5 passive_viewing_as_active_interest Optimize Discord for Deep Work and Networking Workflow improvements for Discord notifications and status are generic produc...
6 passive_viewing_as_active_interest Access Essential Bitcoin Developer Resources The suggestion prescribes specific Bitcoin Core libraries and newsletters bas...
9 over_elaboration Prepare for the 'Proof of Work' Application Phase The suggestion fabricates specific project recommendations and GitHub repos w...
10 passive_viewing_as_active_interest Master Summer of Bitcoin Onboarding Terminology The suggestion prescribes onboarding terminology and project focus without us...
11 over_elaboration Secure Portal Access and Profile Optimization S... The suggestion prescribes specific Bitcoin libraries and profile updates with...
12 over_elaboration Master Coinswap & Privacy Tech Terminology Prescriptive use of technical terms and newsletter references not grounded in...
14 over_elaboration Apply 'Strategic Under-Crowding' to Project Sel... Recommending niche Bitcoin Core and Lightning tooling focus without user indi...
17 passive_viewing_as_active_interest Access Bitcoin Dev Resources for Competitive Edge References to specific Bitcoin Dev Kit and Lightning Dev Kit libraries are pr...
25 over_elaboration Deep-Dive into Coinswap and Mercury Layer Extends from user's interest in Coinswap to detailed technical topics and spe...
33 passive_viewing_as_active_interest Master Bitcoin Technical Terminology for Proposals References Bitcoin Optech newsletter and technical terms without evidence use...
34 over_elaboration Prepare Proposal Drafts Using Standard Templates Prescribes specific Bitcoin libraries and detailed proposal structure not gro...
41 over_elaboration Strict Communication Compliance Protocol The suggestion prescribes specific communication protocols and channels for t...
42 over_elaboration Master Lightning Network Core Terminology The suggestion prescribes detailed Lightning Network terminology and resource...
43 over_elaboration Access Open-Source Lightning Repositories The suggestion prescribes specific open-source repos and project focus strate...
44 over_elaboration Optimize Proposal Workflow with Polar The suggestion recommends a specific tool (Polar) and workflow optimization n...
45 over_elaboration Leverage Peak Cognitive Load for Protocol Design The suggestion prescribes detailed timing and cognitive load strategies witho...

Top 5 Task-Critical Suggestions

# ID Quality Title
1 22 1.000 Apply Proposal Formatting for April 25 Deadline
2 21 0.970 Pre-Verify Stipend Wallet Requirements
3 32 0.970 Access Core Summer of Bitcoin Guide Resources
4 39 0.970 Prepare Technical Proposal Components in Advance
5 50 0.970 Apply Matt Morehouse’s Qualitative Judging Criteria

Top 5 Quality-of-Life Suggestions

# ID Quality Title
1 16 0.970 Prepare Round 2 Pivot List by April 11
2 18 0.937 Apply 'High-Demand' Strategy for Round 1 Selection
3 48 0.937 Prepare for Round 2 Registration (April 12–14)
4 30 0.927 Project Selection Strategy: The '2-1-1' Method
5 31 0.920 Optimize Discord Monitoring with Keyword Alerts