| Field | Value |
| --- | --- |
| Persona | Nimesh Kulatunga |
| Judge model | openai/gpt-4.1-mini |
| Embed model | text-embedding-3-small |
| Rubric draws (k) | 3 |
| Total suggestions | 50 |
| Pipeline mode | hall_pass=anti_gt (active=True) |
Bucket Distribution
| Bucket | Count | % of total |
| --- | --- | --- |
| Task Critical | 14 | 28.0% |
| Quality Of Life | 22 | 44.0% |
| Noise | 10 | 20.0% |
| Hallucinated | 4 | 8.0% |
Set-Level Diversity Metrics
| Metric | Value | Interpretation |
| --- | --- | --- |
| DPP log-det | -29.9868 | Higher = more diverse + high-quality set |
| Cluster coverage | 1.000 | Fraction of BGT clusters with a task-critical hit |
| ILAD | 0.7387 | Mean pairwise distance; higher = more diverse |
| Redundancy rate | 0.000 | Fraction of near-duplicate suggestions (cos > 0.9) |
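As a rough illustration of how two of the metrics above could be computed from suggestion embeddings, here is a minimal sketch. It assumes ILAD uses cosine distance (1 - cosine similarity) averaged over all unordered pairs, and that the redundancy rate counts suggestions that have at least one other suggestion with cosine similarity above 0.9. The report states neither detail, and the function names and toy vectors are hypothetical. DPP log-det is left out because it depends on a quality-weighted kernel that the report does not specify.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ilad(embeddings):
    """Mean pairwise cosine distance over all unordered pairs (assumed definition)."""
    pairs = list(combinations(embeddings, 2))
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)


def redundancy_rate(embeddings, threshold=0.9):
    """Fraction of items with at least one near-duplicate (assumed definition)."""
    flagged = set()
    for i, j in combinations(range(len(embeddings)), 2):
        if cosine(embeddings[i], embeddings[j]) > threshold:
            flagged.update((i, j))
    return len(flagged) / len(embeddings)


# Toy 2-D vectors, not real text-embedding-3-small outputs.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
print(round(ilad(toy), 4), round(redundancy_rate(toy), 4))
```

In the toy set the first and third vectors are near-duplicates, so two of the three items are flagged. A redundancy rate of 0.000, as reported above, would mean no pair exceeded the threshold.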
Composite Score
| Component | Weight | Value |
| --- | --- | --- |
| DPP set score (normalised) | 0.5 | — |
| Cluster coverage | 0.3 | 1.000 |
| Mean quality (non-hallucinated) | 0.2 | — |
| Hallucination penalty | alpha=0.5 | x 0.9600 |
Composite score: 0.6030
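The report does not spell out how the components combine. One plausible reading, given the weights and the "x 0.9600" multiplier, is a weighted sum scaled by the hallucination penalty. The sketch below assumes that form (the `composite` function is hypothetical) and backs out what the two unreported components would have to contribute for the reported score to hold.

```python
def composite(dpp_norm, coverage, mean_quality, penalty, weights=(0.5, 0.3, 0.2)):
    """Assumed form: penalty * weighted sum of the three components."""
    w_dpp, w_cov, w_quality = weights
    return penalty * (w_dpp * dpp_norm + w_cov * coverage + w_quality * mean_quality)


reported_score = 0.6030   # from the report (rounded to 4 decimals)
penalty = 0.9600          # from the report
coverage = 1.000          # from the report

# Weighted sum before the penalty is applied, under the assumed form.
pre_penalty = reported_score / penalty

# Contribution left for the two components shown without a value:
# 0.5 * dpp_norm + 0.2 * mean_quality.
unreported_share = pre_penalty - 0.3 * coverage

print(round(pre_penalty, 4), round(unreported_share, 4))
```

Under this assumption the DPP set score and mean quality together contribute about 0.328 of the pre-penalty sum. Their individual values cannot be recovered from the report alone.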
Hallucination Summary
Filter: anti_gt (active). Flagged 4 / 50 suggestions (rate 8.0%). Composite hallucination penalty: 0.9600.
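The figures above are consistent with a linear penalty of the form 1 - alpha * rate, using alpha=0.5 from the Composite Score table. This is inferred from the numbers rather than stated in the report; a minimal sketch with a hypothetical function name:

```python
def hallucination_penalty(flagged, total, alpha=0.5):
    """Assumed linear penalty: 1 - alpha * (flagged / total)."""
    rate = flagged / total
    return 1.0 - alpha * rate


# 4 of 50 suggestions flagged, alpha = 0.5 as listed in the report.
print(f"{hallucination_penalty(4, 50):.4f}")
```

With 4 flagged out of 50 the rate is 0.08, and 1 - 0.5 * 0.08 reproduces the reported 0.9600.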
Filtered Hallucinations
| ID | Failure mode | Title | Reasoning |
| --- | --- | --- | --- |
| 14 | over_elaboration | Master CS3243: The 1982 CMU Vending Machine Cas... | Fabricates the 'Finger' protocol and ARPANET connection details not supported... |
| 21 | passive_viewing_as_active_interest | Target High-Impact Bitcoin Repositories for 2026 | The suggestion prescribes specific libraries and projects (LDK, Stratum V2, F... |
| 23 | over_elaboration | Optimize Cursor/Sonnet 4.6 for Bitcoin Protocol... | The suggestion over-elaborates by prescribing a detailed Sonnet prompt with s... |
| 40 | empty_fallback | Schedule Deep-Work for GitHub Issue Contributions | Suggestion is a generic productivity prompt lacking actionable value, matchin... |
Top 5 Task-Critical Suggestions
| # | ID | Quality | Title |
| --- | --- | --- | --- |
| 1 | 16 | 1.000 | Automate WhatsApp Export Parsing with Python |
| 2 | 22 | 1.000 | Automate WhatsApp Ingestion for GUMBO via Python Script |
| 3 | 36 | 1.000 | Map Python Skills to Nostr NIP-01 Implementation |
| 4 | 41 | 1.000 | Implement RPKI-to-ASmap Validation Logic |
| 5 | 2 | 0.980 | Configure Cursor 'Rules for AI' for GUMBO Context |
Top 5 Quality-of-Life Suggestions
| # | ID | Quality | Title |
| --- | --- | --- | --- |
| 1 | 7 | 1.000 | Automate DB Isolation with venv-aware Shell Script |
| 2 | 28 | 1.000 | Implement Local Polling for WhatsApp Export Parsing |
| 3 | 39 | 1.000 | Optimize RPKI Validation with Routinator Filters |
| 4 | 42 | 0.980 | Automate BGP Data Fetching via RIS-Live |
| 5 | 31 | 0.970 | Implement Erlay (PR #21515) to Mitigate Eclipse Attacks |