Skip to content

v0.1.0 Release Notes

Release Date: 2026-01-26

Overview

Initial release of structured-evaluation, a reusable evaluation framework for LLM-as-Judge and multi-agent workflows.

Highlights

  • Evaluation Report type for detailed LLM-as-Judge assessments with weighted category scores
  • Summary Report type for GO/NO-GO deterministic checks
  • InfoSec severity levels (Critical, High, Medium, Low, Info) with blocking thresholds
  • DAG-based aggregation for multi-agent coordination using topological sort
  • Terminal renderers with box-format and detailed output
  • CLI tool (sevaluation) for rendering, validation, and pass/fail checks

Packages

evaluation/

Core types for LLM-as-Judge evaluations:

  • EvaluationReport - Main report type with metadata, categories, findings, decision
  • CategoryScore - Weighted scores (0-10) with pass/warn/fail status
  • Finding - Issues with severity, recommendations, and ownership
  • Severity - InfoSec levels with blocking rules
  • PassCriteria - Configurable approval thresholds
  • Decision - Pass/fail/conditional/human_review outcomes

summary/

Types for deterministic GO/NO-GO checks:

  • SummaryReport - Aggregated team results
  • TeamSection - Agent/team outputs with dependencies
  • TaskResult - Individual check outcomes
  • Status - GO/WARN/NO-GO/SKIP with emoji icons

combine/

Multi-agent coordination:

  • SortByDAG() - Topological sort using Kahn's algorithm
  • AggregateResults() - Combine agent outputs
  • AggregateWithDAG() - Combine with explicit dependencies

render/

Terminal output:

  • box.Renderer - Box-format for summary reports
  • detailed.TerminalRenderer - Detailed format for evaluation reports

schema/

JSON Schema support:

  • GenerateEvaluationSchema() - Generate from Go types
  • GenerateSummarySchema() - Generate from Go types
  • Embedded schemas via //go:embed

CLI Commands

sevaluation render <file.json> --format=box|detailed|json
sevaluation check <file.json>      # Exit 0=pass, 1=fail
sevaluation validate <file.json>
sevaluation schema generate -o <dir>

Pass Criteria

Default criteria for approval:

Threshold Value
Max Critical 0
Max High 0
Max Medium Unlimited
Min Score 7.0

Installation

go get github.com/plexusone/structured-evaluation@v0.1.0

Full Changelog

See the CHANGELOG for complete details.