Persona Evaluation — condition8__all_sources__sample500_seed99

| Field | Value |
|---|---|
| Persona | Nimesh Kulatunga |
| Judge model | openai/gpt-4.1-mini |
| Embed model | text-embedding-3-small |
| Rubric draws (k) | 3 |
| Total suggestions | 50 |
| Pipeline mode | hall_pass=anti_gt (active=True) |

Bucket Distribution

| Bucket | Count | % of total |
|---|---|---|
| Task Critical | 2 | 4.0% |
| Quality Of Life | 10 | 20.0% |
| Noise | 24 | 48.0% |
| Hallucinated | 14 | 28.0% |

Set-Level Diversity Metrics

| Metric | Value | Interpretation |
|---|---|---|
| DPP log-det | -7.3036 | Higher = more diverse + high-quality set |
| Cluster coverage | 1.000 | Fraction of BGT clusters with a task-critical hit |
| ILAD | 0.6627 | Mean pairwise distance; higher = more diverse |
| Redundancy rate | 0.000 | Fraction of near-duplicate suggestions (cos > 0.9) |
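As a rough illustration of the ILAD and redundancy-rate rows above, the sketch below computes both from a list of embedding vectors. It assumes cosine distance for ILAD and treats a suggestion as redundant when at least one other suggestion has cosine similarity above the threshold; the report does not spell out either choice, and the function names and toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ilad(vectors):
    """Mean pairwise cosine distance (1 - similarity) over all pairs."""
    pairs = list(combinations(vectors, 2))
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)


def redundancy_rate(vectors, threshold=0.9):
    """Fraction of items with at least one neighbour above the threshold.

    Assumption: the report may instead count pairs rather than items.
    """
    flagged = set()
    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) > threshold:
            flagged.update((i, j))
    return len(flagged) / len(vectors)


# Hypothetical 2-d "embeddings": the first and third are near-duplicates.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
print(f"ILAD = {ilad(toy):.4f}")
print(f"Redundancy rate = {redundancy_rate(toy):.3f}")
```

With real embeddings, a redundancy rate of 0.000 as reported would mean no pair of suggestions exceeds the 0.9 similarity threshold.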

Composite Score

| Component | Weight | Value |
|---|---|---|
| DPP set score (normalised) | 0.5 | |
| Cluster coverage | 0.3 | 1.000 |
| Mean quality (non-hallucinated) | 0.2 | |
| Hallucination penalty | alpha=0.5 | × 0.8600 |

Composite score: 0.5600
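The report does not state how the components combine. One reading consistent with the table layout is a weighted sum of the three components, multiplied by the hallucination penalty. The sketch below follows that assumption; the DPP and mean-quality values are not given above, so the inputs in the usage lines are hypothetical and do not reproduce the reported 0.5600.

```python
def composite_score(dpp_norm, coverage, mean_quality, penalty,
                    w_dpp=0.5, w_cov=0.3, w_quality=0.2):
    """Weighted sum of the three components, scaled by the penalty factor.

    Assumption: the penalty is applied multiplicatively to the weighted sum.
    """
    weighted = w_dpp * dpp_norm + w_cov * coverage + w_quality * mean_quality
    return weighted * penalty


# Hypothetical inputs; only coverage (1.000) and penalty (0.8600)
# are taken from the report.
example = composite_score(dpp_norm=0.4, coverage=1.0,
                          mean_quality=0.9, penalty=0.86)
print(f"Composite (illustrative inputs) = {example:.4f}")
```

Under the same assumption, a composite of 0.5600 with a penalty of 0.8600 would correspond to a pre-penalty weighted sum of about 0.6512.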

Hallucination Summary

Filter: anti_gt (active). Flagged 14 / 50 suggestions (rate 28.0%). Composite hallucination penalty: 0.8600.
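The reported penalty matches the formula 1 - alpha × rate with alpha = 0.5 and rate = 14 / 50. The formula itself is inferred from the numbers rather than stated in the report, so the sketch below should be read as one plausible reconstruction; the function name is hypothetical.

```python
def hallucination_penalty(flagged, total, alpha=0.5):
    """Penalty factor, assumed to be 1 - alpha * (flagged / total)."""
    rate = flagged / total
    return 1.0 - alpha * rate


rate = 14 / 50
penalty = hallucination_penalty(14, 50, alpha=0.5)
print(f"rate = {rate:.1%}, penalty = {penalty:.4f}")
```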

Filtered Hallucinations

| ID | Failure mode | Title | Reasoning |
|---|---|---|---|
| 4 | passive_viewing_as_active_interest | Prepare for the 'Proof of Work' Initial Task | The suggestion prescribes Bitcoin Core setup based on generic onboarding cont... |
| 8 | passive_viewing_as_active_interest | Automate Bitcoin Dev Environment Setup | Prescriptive environment setup advice assumes active Bitcoin development inte... |
| 17 | over_elaboration | Use a Structured Project Proposal Template | The suggestion over-elaborates by prescribing use of Bitcoin Dev Kit and stru... |
| 18 | over_elaboration | Pre-emptive Setup of Bitcoin Core Regtest Envir... | The suggestion fabricates specific technical requirements (lnprototest, Braid... |
| 22 | passive_viewing_as_active_interest | Set Up Polar for Local Lightning Testing | Recommending Polar for Lightning testing assumes active Lightning Network foc... |
| 32 | passive_viewing_as_active_interest | Implement 'LDK-Sample' for Rapid Lightning Prot... | The suggestion prescribes specific Lightning Dev Kit usage based on generic m... |
| 33 | passive_viewing_as_active_interest | Automate Discord Milestone Tracking with Webhooks | Automating Discord milestone tracking is a prescriptive workflow improvement ... |
| 34 | passive_viewing_as_active_interest | Master Lightning Fuzzing Terminology | Providing detailed fuzzing terminology assumes active user interest from pass... |
| 35 | over_elaboration | Schedule Lightning Fuzzing Tasks for 04:00-07:0... | Scheduling specific fuzzing tasks with detailed timing and tool recommendatio... |
| 36 | passive_viewing_as_active_interest | Automate Discord Resource Archiving to Google D... | Automating resource archiving based on monitoring a channel infers active wor... |
| 37 | over_elaboration | Essential Bioinformatics Fuzzing Resources | The suggestion fabricates a link between Bitcoin fuzzing and DNA research too... |
| 38 | over_elaboration | Apply LibFuzzer to Bioinformatics Sequence Parsers | Prescribes specific LibFuzzer code for DNA parsing based on unrelated Bitcoin... |
| 49 | source_context_confusion | Apply Bitcoin Design Principles to DNA Visualiz... | Applying Bitcoin design principles to DNA visualization reflects source/conte... |
| 102 | passive_viewing_as_active_interest | Master Bitcoin Open-Source Terminology | Suggestion prescribes Bitcoin Core terminology and resources based on generic... |

Top 5 Task-Critical Suggestions

| # | ID | Quality | Title |
|---|---|---|---|
| 1 | 107 | 0.970 | Pre-empt Proposal Feedback with Edge Case Testing |
| 2 | 119 | 0.870 | Automate Summer of Bitcoin GitHub Monitoring |

Top 5 Quality-of-Life Suggestions

| # | ID | Quality | Title |
|---|---|---|---|
| 1 | 81 | 0.967 | Implement Pre-Merge Conflict Checks |
| 2 | 27 | 0.920 | Sync Selection Timeline to Calendar with Buffer |
| 3 | 25 | 0.903 | Optimize Discord for Role-Based Opportunities |
| 4 | 87 | 0.903 | Standardize 'Merge Meeting' Agenda for guna_012_53896 |
| 5 | 30 | 0.870 | Automate Email-to-Discord Alerts via Zapier |