# v0.4.0 Release Notes
Release Date: 2026-05-23
## Overview
v0.4.0 replaces numeric scores with categorical scoring: each category is now rated `pass`, `partial`, or `fail`. This is a breaking change, made because categories match how LLM judges naturally assess quality better than a numeric scale does.
## Breaking Changes
### CategoryScore → CategoryResult
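The `CategoryScore` type has been renamed to `CategoryResult`, and its structure has changed: `Weight`, `MaxScore`, and `Status` are gone, `Score` now holds a categorical `ScoreValue` instead of a number, and `Justification` is now `Reasoning`.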
Before (v0.3.x):
```go
type CategoryScore struct {
	Category      string      `json:"category"`
	Weight        float64     `json:"weight"`
	Score         float64     `json:"score"`
	MaxScore      float64     `json:"max_score"`
	Status        ScoreStatus `json:"status"`
	Justification string      `json:"justification"`
}
```
After (v0.4.0):
```go
type CategoryResult struct {
	Category  string     `json:"category"`
	Score     ScoreValue `json:"score"` // "pass", "partial", "fail"
	Reasoning string     `json:"reasoning"`
}
```
### ScoreStatus → ScoreValue
Before:
```go
const (
	ScoreStatusPass ScoreStatus = "pass"
	ScoreStatusWarn ScoreStatus = "warn"
	ScoreStatusFail ScoreStatus = "fail"
)
```
After:
```go
const (
	ScorePass    ScoreValue = "pass"
	ScorePartial ScoreValue = "partial"
	ScoreFail    ScoreValue = "fail"
)
```
### Removed WeightedScore
The `WeightedScore` field has been removed from `EvaluationReport`. Use the per-category counts in `report.Decision.CategoryCounts` instead:
```go
// Before
fmt.Printf("Score: %.1f/10\n", report.WeightedScore)

// After
counts := report.Decision.CategoryCounts
fmt.Printf("Results: %d pass, %d partial, %d fail\n",
	counts.Pass, counts.Partial, counts.Fail)
```
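The report populates `Decision.CategoryCounts` itself. If you aggregate your own slice of results (for example, a subset of categories), the tally is a single pass. This sketch uses local mirror types; the `Pass`, `Partial`, and `Fail` field names come from the snippet above, and `countResults` is a hypothetical helper.

```go
package main

import "fmt"

type ScoreValue string

const (
	ScorePass    ScoreValue = "pass"
	ScorePartial ScoreValue = "partial"
	ScoreFail    ScoreValue = "fail"
)

type CategoryResult struct {
	Category  string
	Score     ScoreValue
	Reasoning string
}

// CategoryCounts mirrors the fields used in the snippet above.
type CategoryCounts struct {
	Pass, Partial, Fail int
}

// countResults (hypothetical) tallies results by categorical score.
// Values outside the three known categories are ignored.
func countResults(results []CategoryResult) CategoryCounts {
	var c CategoryCounts
	for _, r := range results {
		switch r.Score {
		case ScorePass:
			c.Pass++
		case ScorePartial:
			c.Partial++
		case ScoreFail:
			c.Fail++
		}
	}
	return c
}

func main() {
	counts := countResults([]CategoryResult{
		{Category: "problem_definition", Score: ScorePass},
		{Category: "solution_design", Score: ScorePartial},
		{Category: "risks", Score: ScorePass},
	})
	// Prints: Results: 2 pass, 1 partial, 0 fail
	fmt.Printf("Results: %d pass, %d partial, %d fail\n",
		counts.Pass, counts.Partial, counts.Fail)
}
```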
## Migration Guide
### Updating Category Creation
Before:
```go
report.AddCategory(evaluation.NewCategoryScore(
	"problem_definition",
	0.20, // weight
	8.5,  // score
	"Clear problem statement",
))
```
After:
```go
report.AddCategory(evaluation.CategoryResult{
	Category:  "problem_definition",
	Score:     evaluation.ScorePass,
	Reasoning: "Clear problem statement with measurable goals",
})
```
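If you have v0.3.x results persisted as JSON, they can be converted field by field: `category` carries over, `justification` becomes `reasoning`, and the old `status` maps onto the new categories. Mapping `warn` to `partial` is an assumption of this sketch (these notes do not specify it), as is discarding `weight`, `score`, and `max_score`. The `migrateCategory` function and the local types are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ScoreValue string

// v03Category decodes only the v0.3.x fields the migration needs;
// encoding/json ignores the remaining keys.
type v03Category struct {
	Category      string `json:"category"`
	Status        string `json:"status"`
	Justification string `json:"justification"`
}

type CategoryResult struct {
	Category  string     `json:"category"`
	Score     ScoreValue `json:"score"`
	Reasoning string     `json:"reasoning"`
}

// statusToValue maps v0.3.x statuses onto v0.4.0 values.
// Assumption: "warn" is treated as "partial".
var statusToValue = map[string]ScoreValue{
	"pass": "pass",
	"warn": "partial",
	"fail": "fail",
}

// migrateCategory converts one v0.3.x category object into a v0.4.0 result.
func migrateCategory(raw []byte) (CategoryResult, error) {
	var old v03Category
	if err := json.Unmarshal(raw, &old); err != nil {
		return CategoryResult{}, err
	}
	v, ok := statusToValue[old.Status]
	if !ok {
		return CategoryResult{}, fmt.Errorf("unknown v0.3 status %q", old.Status)
	}
	return CategoryResult{Category: old.Category, Score: v, Reasoning: old.Justification}, nil
}

func main() {
	stored := []byte(`{"category":"problem_definition","weight":0.2,"score":6.0,` +
		`"max_score":10,"status":"warn","justification":"Goals not measurable"}`)
	res, err := migrateCategory(stored)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(res)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```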
### Updating Decision Checks
Before:
After:
```go
if report.Decision.Passed {
	// Passed
}

// Or check category counts
if report.Decision.CategoryCounts.Fail == 0 {
	// No failing categories
}
```
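`Decision.Passed` reflects the library's own decision rule, which these notes do not spell out. If a pipeline needs a different gate, it can be expressed directly over the counts. The policy below (no failing categories, and at most `maxPartial` partial ones) is a hypothetical example, not the library's rule; `gate` and the local `CategoryCounts` mirror are illustrative.

```go
package main

import "fmt"

// CategoryCounts mirrors the fields used in the snippet above.
type CategoryCounts struct {
	Pass, Partial, Fail int
}

// gate is a hypothetical CI policy: block on any failing category and
// tolerate at most maxPartial partial results.
func gate(c CategoryCounts, maxPartial int) bool {
	return c.Fail == 0 && c.Partial <= maxPartial
}

func main() {
	fmt.Println(gate(CategoryCounts{Pass: 5, Partial: 1}, 1)) // true
	fmt.Println(gate(CategoryCounts{Pass: 5, Partial: 2}, 1)) // false
	fmt.Println(gate(CategoryCounts{Pass: 4, Fail: 1}, 3))    // false
}
```

Keeping the threshold as a parameter lets stricter and looser pipelines share one check.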
### Updating Renderers
The `render/detailed` package has been updated. If your custom rendering code used numeric scores, switch it to the categorical values:
```go
// Before (cs is a v0.3.x CategoryScore)
fmt.Printf("%.1f/%.0f", cs.Score, cs.MaxScore)

// After (cr is a v0.4.0 CategoryResult)
fmt.Printf("%s", cr.Score) // "pass", "partial", or "fail"
```
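A custom renderer typically switches on the three values rather than formatting a number. The sketch below produces one line per category with an ANSI color and a UTF-8 icon; the colors, icons, and the `style`, `formatLine`, and `renderLine` names are illustrative choices, not taken from the library's `render/terminal` package.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

type ScoreValue string

const (
	ScorePass    ScoreValue = "pass"
	ScorePartial ScoreValue = "partial"
	ScoreFail    ScoreValue = "fail"
)

const ansiReset = "\033[0m"

// style returns an ANSI color code and an icon for a categorical score.
// Unknown values get no color and a "?" icon.
func style(v ScoreValue) (color, icon string) {
	switch v {
	case ScorePass:
		return "\033[32m", "✓" // green
	case ScorePartial:
		return "\033[33m", "~" // yellow
	case ScoreFail:
		return "\033[31m", "✗" // red
	default:
		return "", "?"
	}
}

// formatLine builds "<icon> <category>: <score>", wrapped in the score's color.
func formatLine(category string, v ScoreValue) string {
	color, icon := style(v)
	if color == "" {
		return fmt.Sprintf("%s %s: %s", icon, category, v)
	}
	return fmt.Sprintf("%s%s %s: %s%s", color, icon, category, v, ansiReset)
}

// renderLine writes one formatted line to w.
func renderLine(w io.Writer, category string, v ScoreValue) {
	fmt.Fprintln(w, formatLine(category, v))
}

func main() {
	renderLine(os.Stdout, "problem_definition", ScorePass)
	renderLine(os.Stdout, "risks", ScoreFail)
}
```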
## New Features
### Terminal Renderer
A new ANSI-colored terminal renderer with UTF-8 icons:
```go
import (
	"os"

	"github.com/plexusone/structured-evaluation/render/terminal"
)

renderer := terminal.New(os.Stdout)
renderer.Render(&report)
```
### Markdown Renderer
A new Markdown renderer for generating documentation:
```go
import (
	"os"

	"github.com/plexusone/structured-evaluation/render/markdown"
)

renderer := markdown.New(os.Stdout)
renderer.Render(&report)
```
### CLI Formats
New CLI render formats:
## Why Categorical Scoring?
- Clearer semantics: "pass" is unambiguous, while "7.5" requires interpretation.
- Better LLM alignment: LLMs naturally reason in categories.
- Simpler aggregation: majority voting replaces weighted averages.
- Reduced bias: no false precision, such as a meaningless distinction between 7.2 and 7.3.
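The majority-voting point can be made concrete: when several judge runs score the same category, the most frequent value wins. The tie-breaking rule here (prefer the more severe value, so a 1-1 split between `pass` and `fail` yields `fail`) is an assumption of this sketch, not documented library behavior, and `majority` is a hypothetical helper.

```go
package main

import "fmt"

type ScoreValue string

const (
	ScorePass    ScoreValue = "pass"
	ScorePartial ScoreValue = "partial"
	ScoreFail    ScoreValue = "fail"
)

// majority returns the most frequent known value among votes and true,
// or "" and false when no known value was cast.
// Assumption: ties are broken toward the more severe value.
func majority(votes []ScoreValue) (ScoreValue, bool) {
	counts := make(map[ScoreValue]int)
	for _, v := range votes {
		counts[v]++
	}
	var best ScoreValue
	bestN := 0
	// Candidates run from least to most severe; using >= means a more
	// severe value wins any tie. Unknown values are counted but never chosen.
	for _, v := range []ScoreValue{ScorePass, ScorePartial, ScoreFail} {
		if n := counts[v]; n > 0 && n >= bestN {
			best, bestN = v, n
		}
	}
	return best, bestN > 0
}

func main() {
	// prints: pass true
	fmt.Println(majority([]ScoreValue{ScorePass, ScorePass, ScorePartial}))
	// prints: fail true (tie resolved conservatively)
	fmt.Println(majority([]ScoreValue{ScorePass, ScoreFail}))
}
```

Compared with averaging numbers, there is no fractional result to interpret: the output is always one of the three categories.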
## Full Changelog
See the CHANGELOG for complete details.