Changelog
All notable changes to structured-evaluation are documented here.
For the canonical changelog, see CHANGELOG.md in the repository.
v0.4.0 - 2026-05-23
Breaking Changes: Switch from numeric scores to categorical pass/partial/fail values.
Highlights
- Switch from numeric scores to categorical pass/partial/fail values
- New terminal and markdown renderers for evaluation reports
Changed
- Breaking: `CategoryScore` renamed to `CategoryResult` with `Score` field (pass/partial/fail)
- Breaking: `ScoreStatus` renamed to `ScoreValue` with values pass/partial/fail
- Breaking: Removed `WeightedScore` from `EvaluationReport`
- Added `CategoryCounts` to `Decision` (pass/partial/fail counts)
- Updated detailed renderer to display category counts
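As a rough illustration of the categorical model described above, the following Go sketch tallies pass/partial/fail values into counts. The type names `ScoreValue`, `CategoryResult`, and `CategoryCounts` come from this changelog, but their field layouts, the constant names, and the `CountCategories` helper are assumptions made for illustration and may not match the library's actual definitions.

```go
package main

import "fmt"

// ScoreValue is a categorical outcome. The type name appears in the
// changelog; representing it as a string is an assumption.
type ScoreValue string

// Constant names are hypothetical; only the values pass/partial/fail
// are taken from the changelog.
const (
	ScorePass    ScoreValue = "pass"
	ScorePartial ScoreValue = "partial"
	ScoreFail    ScoreValue = "fail"
)

// CategoryResult pairs a category with its categorical score.
// The Category field is an assumption; Score is named in the changelog.
type CategoryResult struct {
	Category string
	Score    ScoreValue
}

// CategoryCounts holds per-value tallies (hypothetical field names).
type CategoryCounts struct {
	Pass    int
	Partial int
	Fail    int
}

// CountCategories is a hypothetical helper that tallies results by value.
// Unrecognized values are ignored.
func CountCategories(results []CategoryResult) CategoryCounts {
	var c CategoryCounts
	for _, r := range results {
		switch r.Score {
		case ScorePass:
			c.Pass++
		case ScorePartial:
			c.Partial++
		case ScoreFail:
			c.Fail++
		}
	}
	return c
}

func main() {
	results := []CategoryResult{
		{Category: "accuracy", Score: ScorePass},
		{Category: "clarity", Score: ScorePartial},
		{Category: "safety", Score: ScorePass},
	}
	c := CountCategories(results)
	fmt.Printf("pass=%d partial=%d fail=%d\n", c.Pass, c.Partial, c.Fail)
}
```

Code that previously compared a numeric score against a threshold would, under this model, branch on the categorical value or on the counts instead.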
Added
- `render/terminal` package with ANSI-colored output and UTF-8 icons
- `render/markdown` package for Markdown report generation
- CLI `terminal` and `markdown` render format options
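To give a sense of what ANSI-colored output with UTF-8 icons can look like, here is a minimal Go sketch that decorates a pass/partial/fail value for a terminal. The `ColorizeScore` function, the color choices, and the icons are all assumptions for illustration; this is not the `render/terminal` package's API.

```go
package main

import "fmt"

// Standard ANSI SGR escape sequences for foreground colors.
const (
	ansiGreen  = "\x1b[32m"
	ansiYellow = "\x1b[33m"
	ansiRed    = "\x1b[31m"
	ansiReset  = "\x1b[0m"
)

// ColorizeScore is a hypothetical helper that wraps a categorical score
// in a color and prefixes it with an icon. Unknown values pass through
// unchanged.
func ColorizeScore(score string) string {
	switch score {
	case "pass":
		return ansiGreen + "✓ pass" + ansiReset
	case "partial":
		return ansiYellow + "~ partial" + ansiReset
	case "fail":
		return ansiRed + "✗ fail" + ansiReset
	default:
		return score
	}
}

func main() {
	for _, s := range []string{"pass", "partial", "fail"} {
		fmt.Println(ColorizeScore(s))
	}
}
```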
v0.3.1 - 2026-05-19
Build
- Updated CI workflows to use shared workflows
- Renamed workflow files to standard filenames
Dependencies
- Bump github.com/invopop/jsonschema from 0.13.0 to 0.14.0
v0.3.0 - 2026-03-01
Changed
- Breaking: Module path changed from `github.com/agentplexus/structured-evaluation` to `github.com/plexusone/structured-evaluation`
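Because the module path changed, import statements in consuming code need the new prefix. The sketch below shows the substitution as a plain string rewrite over Go source text. `RewriteImports` is a hypothetical helper written for this example, not something the project ships, and the `evaluation` subpackage in the sample import is assumed to sit directly under the module root.

```go
package main

import (
	"fmt"
	"strings"
)

// Old and new module paths, as stated in the changelog.
const (
	oldModulePath = "github.com/agentplexus/structured-evaluation"
	newModulePath = "github.com/plexusone/structured-evaluation"
)

// RewriteImports is a hypothetical helper that replaces every occurrence
// of the old module path with the new one. It is a plain text
// substitution and does not parse the Go source.
func RewriteImports(src string) string {
	return strings.ReplaceAll(src, oldModulePath, newModulePath)
}

func main() {
	src := `import "github.com/agentplexus/structured-evaluation/evaluation"`
	fmt.Println(RewriteImports(src))
}
```

The `require` line in a consumer's `go.mod` needs the same change.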
v0.2.0 - 2026-01-26
Highlights
- Rubric definitions with score anchors
- Judge metadata tracking
- Pairwise comparison mode
- Multi-judge aggregation
Added
- Rubric and ScoreAnchor types
- JudgeMetadata for tracking LLM configuration
- PairwiseComparison for relative evaluation
- MultiJudgeResult for aggregating evaluations
v0.1.0 - 2026-01-26
Highlights
- Initial release
- LLM-as-Judge evaluation reports
- GO/NO-GO summary reports
- DAG-based aggregation
Added
- `evaluation` package with core types
- `summary` package for deterministic checks
- `combine` package with DAG aggregation
- `render` packages for terminal and box outputs
- `evaluation` CLI tool