v0.1.0 Release Notes
Release Date: 2026-01-26
Overview
Initial release of structured-evaluation, a reusable evaluation framework for LLM-as-Judge and multi-agent workflows.
Highlights
- Evaluation Report type for detailed LLM-as-Judge assessments with weighted category scores
- Summary Report type for GO/NO-GO deterministic checks
- InfoSec severity levels (Critical, High, Medium, Low, Info) with blocking thresholds
- DAG-based aggregation for multi-agent coordination using topological sort
- Terminal renderers with box-format and detailed output
- CLI tool (`sevaluation`) for rendering, validation, and pass/fail checks
Packages
evaluation/
Core types for LLM-as-Judge evaluations:
- `EvaluationReport` - Main report type with metadata, categories, findings, decision
- `CategoryScore` - Weighted scores (0-10) with pass/warn/fail status
- `Finding` - Issues with severity, recommendations, and ownership
- `Severity` - InfoSec levels with blocking rules
- `PassCriteria` - Configurable approval thresholds
- `Decision` - Pass/fail/conditional/human_review outcomes
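As a rough illustration of how weighted category scores might roll up into one overall score, the sketch below computes a weight-normalized mean. The `CategoryScore` fields and the `WeightedScore` helper are hypothetical stand-ins; the release notes do not specify the package's actual field names or its aggregation formula.

```go
package main

import "fmt"

// CategoryScore is a hypothetical stand-in for the package's type.
// The field names are assumptions, not the library's actual API.
type CategoryScore struct {
	Name   string
	Weight float64 // relative weight on any positive scale
	Score  float64 // score in the range 0-10
}

// WeightedScore returns the weight-normalized mean of the category scores.
// It returns 0 when there are no categories or the total weight is zero.
func WeightedScore(cats []CategoryScore) float64 {
	var sum, total float64
	for _, c := range cats {
		sum += c.Weight * c.Score
		total += c.Weight
	}
	if total == 0 {
		return 0
	}
	return sum / total
}

func main() {
	cats := []CategoryScore{
		{Name: "accuracy", Weight: 2, Score: 8},
		{Name: "clarity", Weight: 1, Score: 5},
	}
	// (2*8 + 1*5) / (2 + 1) = 7.0
	fmt.Printf("%.1f\n", WeightedScore(cats))
}
```

Normalizing by the total weight means the weights do not have to sum to 1, which keeps category definitions easy to edit.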
summary/
Types for deterministic GO/NO-GO checks:
- `SummaryReport` - Aggregated team results
- `TeamSection` - Agent/team outputs with dependencies
- `TaskResult` - Individual check outcomes
- `Status` - GO/WARN/NO-GO/SKIP with emoji icons
combine/
Multi-agent coordination:
- `SortByDAG()` - Topological sort using Kahn's algorithm
- `AggregateResults()` - Combine agent outputs
- `AggregateWithDAG()` - Combine with explicit dependencies
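For readers unfamiliar with Kahn's algorithm, here is a minimal sketch of a dependency-ordered sort. The `SortByDeps` function, its map-based signature, and the team names are illustrative assumptions; they are not the signature of `SortByDAG()`, which these notes do not document.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// SortByDeps orders nodes so that each node appears after all of its
// dependencies. deps maps a node to the nodes it depends on.
// Hypothetical helper: not the library's SortByDAG signature.
func SortByDeps(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int)        // unmet dependency count per node
	dependents := make(map[string][]string) // reverse edges: dependency -> dependents
	for node, ds := range deps {
		if _, ok := indegree[node]; !ok {
			indegree[node] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[node]++
			dependents[d] = append(dependents[d], node)
		}
	}

	// Seed the queue with dependency-free nodes; sort for deterministic output.
	var queue []string
	for n, deg := range indegree {
		if deg == 0 {
			queue = append(queue, n)
		}
	}
	sort.Strings(queue)

	order := make([]string, 0, len(indegree))
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		order = append(order, n)
		next := dependents[n]
		sort.Strings(next)
		for _, m := range next {
			indegree[m]--
			if indegree[m] == 0 {
				queue = append(queue, m)
			}
		}
	}

	// Any node left unprocessed is part of a cycle.
	if len(order) != len(indegree) {
		return nil, errors.New("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := SortByDeps(map[string][]string{
		"build":    nil,
		"qa":       {"build"},
		"security": {"build"},
		"release":  {"qa", "security"},
	})
	fmt.Println(order, err) // [build qa security release] <nil>
}
```

Kahn's algorithm detects cycles as a side effect: if the queue drains before every node has been emitted, the remaining nodes depend on each other.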
render/
Terminal output:
- `box.Renderer` - Box-format for summary reports
- `detailed.TerminalRenderer` - Detailed format for evaluation reports
schema/
JSON Schema support:
- `GenerateEvaluationSchema()` - Generate from Go types
- `GenerateSummarySchema()` - Generate from Go types
- Embedded schemas via `//go:embed`
CLI Commands
sevaluation render <file.json> --format=box|detailed|json
sevaluation check <file.json> # Exit 0=pass, 1=fail
sevaluation validate <file.json>
sevaluation schema generate -o <dir>
Pass Criteria
Default criteria for approval:
| Threshold | Value |
|---|---|
| Max Critical | 0 |
| Max High | 0 |
| Max Medium | Unlimited |
| Min Score | 7.0 |
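The default thresholds in the table could be applied roughly as in the sketch below. The struct fields, the `Counts` type, the use of a negative limit to mean "Unlimited", and the inclusive comparison against the minimum score are all assumptions made for illustration; the package's real `PassCriteria` type may differ.

```go
package main

import "fmt"

// PassCriteria mirrors the default thresholds from the table.
// Field names are assumptions; a negative limit means "unlimited".
type PassCriteria struct {
	MaxCritical int
	MaxHigh     int
	MaxMedium   int
	MinScore    float64
}

// DefaultCriteria returns the documented defaults.
func DefaultCriteria() PassCriteria {
	return PassCriteria{MaxCritical: 0, MaxHigh: 0, MaxMedium: -1, MinScore: 7.0}
}

// Counts holds the number of findings at each blocking-relevant severity.
type Counts struct{ Critical, High, Medium int }

// Passes reports whether the finding counts and overall score satisfy
// every threshold. The score comparison is assumed to be inclusive.
func (c PassCriteria) Passes(n Counts, score float64) bool {
	within := func(count, limit int) bool { return limit < 0 || count <= limit }
	return within(n.Critical, c.MaxCritical) &&
		within(n.High, c.MaxHigh) &&
		within(n.Medium, c.MaxMedium) &&
		score >= c.MinScore
}

func main() {
	c := DefaultCriteria()
	fmt.Println(c.Passes(Counts{Medium: 3}, 7.5)) // true: medium findings are unlimited
	fmt.Println(c.Passes(Counts{High: 1}, 9.0))   // false: one high finding blocks
	fmt.Println(c.Passes(Counts{}, 6.9))          // false: score below 7.0
}
```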
Installation
Full Changelog
See the CHANGELOG for complete details.