Persona Evaluation: condition3__email_only__sample500_seed99

Bucket Distribution

Metric	Value	Interpretation
DPP log-det	-21.1657	Higher = more diverse + high-quality set
Cluster coverage	0.000	Fraction of BGT clusters with a task-critical hit
ILAD	0.6872	Mean pairwise distance; higher = more diverse
Redundancy rate	0.100	Fraction of near-duplicate suggestions (cos > 0.9)
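
Three of the metrics above (DPP log-det, ILAD, redundancy rate) can be computed from suggestion embeddings and per-suggestion quality scores. The following is a minimal pure-Python sketch, not the evaluation's actual code. It assumes cosine similarity between embeddings, a quality-weighted DPP kernel L_ij = q_i * S_ij * q_j, ILAD as the mean pairwise cosine distance, and redundancy as the share of suggestions with at least one other suggestion above the 0.9 similarity threshold. The names `cosine`, `logdet_psd`, and `set_metrics` are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def logdet_psd(M):
    """Log-determinant via Cholesky; returns -inf if M is not positive definite."""
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")
                C[i][i] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]
    return 2.0 * sum(math.log(C[i][i]) for i in range(n))

def set_metrics(embeddings, quality, dup_threshold=0.9):
    """Hypothetical helper: set-level metrics for one list of suggestions (n >= 2)."""
    n = len(embeddings)
    S = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    # Assumed kernel: similarity scaled by the quality of both items.
    L = [[quality[i] * S[i][j] * quality[j] for j in range(n)] for i in range(n)]
    pairs = list(combinations(range(n), 2))
    # ILAD: mean cosine distance over all unordered pairs.
    ilad = sum(1.0 - S[i][j] for i, j in pairs) / len(pairs)
    # A suggestion counts as redundant if any other suggestion exceeds the threshold.
    redundant = {k for i, j in pairs if S[i][j] > dup_threshold for k in (i, j)}
    return {
        "dpp_logdet": logdet_psd(L),
        "ilad": ilad,
        "redundancy_rate": len(redundant) / n,
    }
```

Under this reading, a strongly negative log-det such as the one reported is expected whenever quality scores are below 1 or some suggestions are close to collinear, since both shrink the determinant of the kernel.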

Component	Weight	Value
DPP set score (normalised)	0.5	—
Cluster coverage	0.3	0.000
Mean quality (non-hallucinated)	0.2	—
Hallucination penalty	alpha=0.5	x 0.9900

Composite score: 0.2786
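
The component table suggests a weighted sum of the three components, scaled by the hallucination penalty. The sketch below assumes that multiplicative form; it is not confirmed by the report, and `composite_score` is a hypothetical name. Because the normalised DPP score and mean quality are not shown in this report, the example only back-solves the pre-penalty weighted sum implied by the reported composite (0.2786) and penalty (0.9900).

```python
def composite_score(dpp_norm, coverage, mean_quality, penalty,
                    w_dpp=0.5, w_cov=0.3, w_qual=0.2):
    """Assumed form: weighted sum of components times the hallucination penalty."""
    weighted = w_dpp * dpp_norm + w_cov * coverage + w_qual * mean_quality
    return weighted * penalty

# Pre-penalty weighted sum implied by the reported composite and penalty.
implied_weighted_sum = 0.2786 / 0.9900
print(round(implied_weighted_sum, 4))  # 0.2814
```

With cluster coverage at 0.000, the 0.3 weight contributes nothing under this form, so the implied sum of about 0.2814 would come entirely from the DPP and mean-quality terms.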

Filter: anti_gt (active). Flagged 1 / 50 suggestions (rate 2.0%). Composite hallucination penalty: 0.9900.
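
The reported penalty is consistent with a linear form, 1 - alpha * flagged_rate: with alpha = 0.5 and 1 of 50 suggestions flagged, that gives 1 - 0.5 * 0.02 = 0.99. The sketch below encodes that reading. The formula is inferred from the numbers in this report rather than stated by it, and `hallucination_penalty` is a hypothetical name.

```python
def hallucination_penalty(flagged, total, alpha=0.5):
    """Assumed linear penalty: 1 minus alpha times the flagged fraction."""
    rate = flagged / total
    return 1.0 - alpha * rate

penalty = hallucination_penalty(flagged=1, total=50)
print(f"{penalty:.4f}")  # 0.9900
```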

ID	Failure mode	Title	Reasoning
14	over_elaboration	Elicit Onboarding Milestone Checklist	Suggestion invents specific onboarding steps and Customer.io triggers not sup...

#	ID	Quality	Title
1	32	0.690	Fix CSE Pipeline: Update GitHub Actions YAML

#	ID	Quality	Title
1	7	0.947	Master Elicit's 'Extract Data' for Literature Reviews
2	1	0.903	Master Elicit's 'Extract Data' for Literature Reviews
3	31	0.853	Sync Pipeline Maintenance with CSE Market Close
4	8	0.840	Use Semantic Scholar API for Citation Mapping
5	19	0.837	Automate Research-to-Trade Workflow with Elicit & IBKR