Likert Scales

Structured-evaluation supports Likert scales (1-5 numeric ratings) alongside categorical scoring. Numeric ratings make it possible to compare LLM scores against human ratings and to run inter-rater reliability analysis.

When to Use Likert Scales

Use Case	Recommended Scale
LLM-as-Judge decisions	Categorical (pass/partial/fail)
Human comparison studies	Likert (1-5)
Inter-rater reliability	Likert (1-5)
Calibration analysis	Likert (1-5)
Simple automation	Categorical

Creating Likert Categories

import "github.com/plexusone/structured-evaluation/rubric"

// Using standard 1-5 anchors
cat := rubric.NewCategory("quality", "Content Quality", "Overall quality assessment").
    WithLikert5(rubric.StandardLikert5Anchors())

// Custom anchors (a separate variable, since cat is already declared above)
clarity := rubric.NewCategory("clarity", "Clarity", "How clear is the writing").
    WithLikert5([]rubric.LikertAnchor{
        {Value: 5, Label: "Crystal Clear", Description: "No ambiguity, easy to understand"},
        {Value: 4, Label: "Clear", Description: "Minor clarifications needed"},
        {Value: 3, Label: "Adequate", Description: "Understandable with effort"},
        {Value: 2, Label: "Unclear", Description: "Significant confusion"},
        {Value: 1, Label: "Incomprehensible", Description: "Cannot understand"},
    })

Standard Anchors

The StandardLikert5Anchors() helper provides:

Score	Label	Description
5	Excellent	Exceeds all expectations
4	Good	Meets expectations with minor improvements possible
3	Adequate	Meets minimum requirements
2	Needs Improvement	Below expectations
1	Poor	Does not meet requirements

Automatic Categorical Mapping

Likert scores are automatically mapped to categorical scores so they can drive pass/partial/fail decisions:

Likert Score	Categorical
4-5	Pass
3	Partial
1-2	Fail

Thresholds are configurable:

passThreshold := 4
partialThreshold := 3
config := &rubric.LikertConfig{
    Min:              1,
    Max:              5,
    PassThreshold:    &passThreshold,
    PartialThreshold: &partialThreshold,
}
cat.WithLikert(config)
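
The mapping rule implied by the table and thresholds above can be sketched as a small standalone function. This is only an illustration, not the library's implementation: the `Score` type, its constants, and `likertToCategorical` are hypothetical stand-ins, and the sketch assumes both thresholds are inclusive (a score equal to the threshold qualifies), which is consistent with 4 mapping to Pass and 3 mapping to Partial.

```go
package main

import "fmt"

// Score is a hypothetical stand-in for the library's categorical score type.
type Score string

const (
	ScorePass    Score = "pass"
	ScorePartial Score = "partial"
	ScoreFail    Score = "fail"
)

// likertToCategorical is an illustrative helper, not part of the rubric package.
// Assumption: thresholds are inclusive, so a score equal to passThreshold passes
// and a score equal to partialThreshold is partial. Anything lower fails.
func likertToCategorical(score, passThreshold, partialThreshold int) Score {
	switch {
	case score >= passThreshold:
		return ScorePass
	case score >= partialThreshold:
		return ScorePartial
	default:
		return ScoreFail
	}
}

func main() {
	// Thresholds from the example above: pass at 4, partial at 3.
	for s := 1; s <= 5; s++ {
		fmt.Printf("%d -> %s\n", s, likertToCategorical(s, 4, 3))
	}
}
```

Raising the pass threshold to 5 under the same rule would move a score of 4 into the partial band, which is one way to make an automated gate stricter without changing the rating scale itself.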

Recording Results

From Likert Score

// Categorical score is derived automatically
result := rubric.NewCategoryResultFromLikert(
    "quality",   // category ID
    4,           // Likert score
    config,      // LikertConfig
    "Good overall quality with minor issues",
)
// result.Score == rubric.ScorePass (4 meets the pass threshold)
// result.NumericScore == 4.0

Dual Scores

// Record both categorical and numeric
result := rubric.NewCategoryResultWithNumeric(
    "quality",
    rubric.ScorePass,
    4.5,  // numeric for human comparison
    "Reasoning here",
)

Adding Numeric to Existing

result := rubric.NewCategoryResult("quality", rubric.ScorePass, "Good").
    SetNumericScore(4.5)

Accessing Numeric Scores

if result.HasNumericScore() {
    score := result.GetNumericScore()
    fmt.Printf("Numeric: %.1f\n", score)
}

Validation

Rubric validation checks Likert configurations:

rs := rubric.NewRubricSet("test", "Test", "1.0")
rs.AddCategory(*cat)

issues := rs.Validate()
// Checks:
// - LikertConfig is present for likert scale type
// - Min < Max
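
To make the two checks concrete, here is a minimal standalone sketch of what such validation might look like. It does not reproduce the library's code: the `likertConfig` and `category` types, the `"likert"` scale-type string, and `validateCategory` are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// likertConfig is a hypothetical, trimmed-down mirror of a Likert configuration.
type likertConfig struct {
	Min, Max int
}

// category is a hypothetical category carrying a scale type and an optional config.
type category struct {
	ID        string
	ScaleType string        // assumed values: "likert" or "categorical"
	Likert    *likertConfig // nil when no Likert configuration is attached
}

// validateCategory is an illustrative helper, not part of the rubric package.
// It applies the two checks listed above and returns one issue per failed check.
func validateCategory(c category) []error {
	var issues []error
	if c.ScaleType != "likert" {
		return issues // non-Likert categories need no Likert configuration
	}
	if c.Likert == nil {
		// Without a config there is nothing further to inspect.
		return append(issues, fmt.Errorf("category %q: likert scale requires a LikertConfig", c.ID))
	}
	if c.Likert.Min >= c.Likert.Max {
		issues = append(issues, fmt.Errorf("category %q: Min (%d) must be less than Max (%d)",
			c.ID, c.Likert.Min, c.Likert.Max))
	}
	return issues
}

func main() {
	bad := category{ID: "quality", ScaleType: "likert", Likert: &likertConfig{Min: 5, Max: 1}}
	for _, issue := range validateCategory(bad) {
		fmt.Println(issue)
	}
}
```

Returning a list of issues rather than a single error lets a caller report every problem in a rubric set in one pass.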

Best Practices

Use categorical for automation: pass/partial/fail maps directly onto automated decisions.
Use Likert for calibration: compare LLM ratings with human ground truth.
Include anchors: detailed anchor descriptions improve rater consistency.
Store both when needed: use the NumericScore field for analysis while keeping the categorical score for decisions.
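
As an illustration of the calibration use case, the sketch below compares an LLM judge's Likert scores with human scores for the same items. It is independent of the rubric package: `agreement` and `compareRaters` are hypothetical names, and the choice of statistics (exact agreement, mean absolute difference, and unweighted Cohen's kappa) is an assumption about what a calibration report might contain, not something the library prescribes.

```go
package main

import (
	"fmt"
	"math"
)

// agreement summarizes how closely two raters' Likert scores match.
type agreement struct {
	Exact   float64 // share of items with identical scores
	MeanAbs float64 // mean absolute difference in scale points
	Kappa   float64 // unweighted Cohen's kappa; NaN when undefined
}

// compareRaters is an illustrative helper, not part of the rubric package.
// Both slices must have the same non-zero length, with scores in [lo, hi].
func compareRaters(a, b []int, lo, hi int) (agreement, error) {
	if hi < lo || len(a) == 0 || len(a) != len(b) {
		return agreement{}, fmt.Errorf("need a valid range and two equal-length, non-empty score lists")
	}
	n := float64(len(a))
	k := hi - lo + 1
	countA := make([]float64, k) // how often rater A used each scale point
	countB := make([]float64, k) // how often rater B used each scale point
	var same, absSum float64
	for i := range a {
		if a[i] < lo || a[i] > hi || b[i] < lo || b[i] > hi {
			return agreement{}, fmt.Errorf("score at index %d is outside [%d, %d]", i, lo, hi)
		}
		countA[a[i]-lo]++
		countB[b[i]-lo]++
		if a[i] == b[i] {
			same++
		}
		absSum += math.Abs(float64(a[i] - b[i]))
	}
	observed := same / n
	// Agreement expected by chance, from each rater's marginal distribution.
	var expected float64
	for i := 0; i < k; i++ {
		expected += (countA[i] / n) * (countB[i] / n)
	}
	kappa := math.NaN() // undefined when chance agreement is already total
	if expected < 1 {
		kappa = (observed - expected) / (1 - expected)
	}
	return agreement{Exact: observed, MeanAbs: absSum / n, Kappa: kappa}, nil
}

func main() {
	human := []int{5, 4, 3, 2, 4} // made-up example ratings
	llm := []int{5, 4, 4, 2, 3}
	result, err := compareRaters(human, llm, 1, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("exact=%.2f meanAbs=%.2f kappa=%.2f\n", result.Exact, result.MeanAbs, result.Kappa)
}
```

Unweighted kappa treats a 4-versus-5 disagreement the same as a 1-versus-5 disagreement. For ordinal scales like this one, a weighted variant is often preferred, and the mean absolute difference is included here as a simple complement that does respect distance.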