Agent Experience (AX) Integration Case Study

Executive Summary

This case study documents the integration of Agent Experience (AX) principles into opik-go, an LLM observability SDK. The integration enables AI agents to reliably trace their own behavior with machine-readable error handling, explicit retry policies, and pre-flight validation.

Key Results:

19 domain-specific error codes defined
201 operations mapped with retry policies
67 operations have required field definitions
7 capability types including streaming and evaluation

Background

The Observability Challenge

AI agents need to observe their own behavior to learn and improve. This creates a unique challenge: the observability system itself must be agent-friendly.

Aspect	Human Developer	AI Agent
Error handling	Read logs, debug	Programmatic recovery
Missing traces	Manually investigate	Auto-create resources
Evaluation	Review dashboards	Run programmatically
Retries	Intuitive judgment	Explicit policies needed

The Project

opik-go is a Go SDK for Comet ML's Opik observability platform:

201 API endpoints covering traces, spans, datasets, experiments, and evaluations
14,820-line OpenAPI specification
LLM evaluation framework with heuristic and model-based scorers
Streaming support for large result sets
Used by AI agents to observe and improve their own behavior

The Problem

When tracing fails, agents face ambiguous errors:

{
  "status": 404,
  "message": "Not found"
}

Questions the agent cannot answer:

What wasn't found? — Trace? Span? Dataset? Project?
Should it retry? — Will it help or create duplicates?
How to recover? — Create the resource? Use a different one?
Is it permanent? — Or a transient failure?

Before AX Integration

trace, err := client.GetTrace(ctx, traceID)
if err != nil {
    if apiErr, ok := err.(*APIError); ok {
        if apiErr.StatusCode == 404 {
            // Which resource is missing?
            // The trace? The project? The workspace?
            // No way to determine programmatically
        }
    }
    return nil, err
}

Solution: AX Integration

Error Code Design

19 domain-specific error codes organized by category:

Category	Error Codes	HTTP Status
not_found	TRACE_NOT_FOUND, SPAN_NOT_FOUND, DATASET_NOT_FOUND, EXPERIMENT_NOT_FOUND, PROMPT_NOT_FOUND, PROJECT_NOT_FOUND, FEEDBACK_NOT_FOUND, ATTACHMENT_NOT_FOUND, WORKSPACE_NOT_FOUND, EVALUATOR_NOT_FOUND, ALERT_NOT_FOUND, QUEUE_NOT_FOUND, DASHBOARD_NOT_FOUND	404
auth	UNAUTHORIZED, FORBIDDEN	401, 403
validation	INVALID_INPUT	400
conflict	CONFLICT	409
rate_limit	RATE_LIMITED	429
server	INTERNAL_ERROR	500
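
One way to make the table above machine-readable is a typed code with a category lookup. The following is a minimal sketch rather than the actual `ax` package: the `ErrorCode` type, the `categories` map, and `CategoryOf` are hypothetical names, and only four of the 19 codes are shown.

```go
package main

import "fmt"

// ErrorCode is a hypothetical typed error code; the real ax package may differ.
type ErrorCode string

const (
	ErrTraceNotFound   ErrorCode = "TRACE_NOT_FOUND"
	ErrProjectNotFound ErrorCode = "PROJECT_NOT_FOUND"
	ErrUnauthorized    ErrorCode = "UNAUTHORIZED"
	ErrRateLimited     ErrorCode = "RATE_LIMITED"
)

// categories maps each code to its category from the table (subset only).
var categories = map[ErrorCode]string{
	ErrTraceNotFound:   "not_found",
	ErrProjectNotFound: "not_found",
	ErrUnauthorized:    "auth",
	ErrRateLimited:     "rate_limit",
}

// CategoryOf returns the category for a code and whether the code is known.
func CategoryOf(code ErrorCode) (string, bool) {
	cat, ok := categories[code]
	return cat, ok
}

func main() {
	cat, ok := CategoryOf(ErrProjectNotFound)
	fmt.Println(cat, ok)
}
```

Returning a second boolean lets an agent distinguish an unknown code from a known one without relying on an empty string.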

Retry Policy Mapping

All 201 operations mapped with retry safety:

var RetryPolicy = map[string]bool{
    // Safe to retry (GET operations)
    "getTraceById":      true,
    "findTraces":        true,
    "getSpanById":       true,
    "findDatasets":      true,
    "streamExperiments": true,

    // Not safe (mutations)
    "createTrace":       false,
    "createSpan":        false,
    "createExperiment":  false,
    "deleteTraceById":   false,
    "evaluateTraces":    false,
}

Distribution:

HTTP method	Count	Retryable
GET (read)	78	Yes
POST (create)	62	No
PUT/PATCH (update)	31	No
DELETE	30	No
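
To illustrate how an agent might consult such a map before retrying, here is a minimal sketch. The `retryIfSafe` helper, the two-entry `policy` map, and the attempt limit are assumptions made for this example and are not part of opik-go.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient stands in for any failure worth retrying.
var errTransient = errors.New("transient failure")

// policy is a small, illustrative subset of the retry map shown above.
var policy = map[string]bool{
	"getTraceById": true,
	"createTrace":  false,
}

// retryIfSafe (hypothetical helper) always runs call once, and repeats it up
// to maxAttempts calls in total only when the operation is marked retryable.
// Operations missing from the map are treated as not retryable.
func retryIfSafe(operationID string, maxAttempts int, call func() error) (attempts int, err error) {
	limit := 1
	if policy[operationID] && maxAttempts > 1 {
		limit = maxAttempts
	}
	for attempts < limit {
		attempts++
		if err = call(); err == nil {
			return attempts, nil
		}
	}
	return attempts, err
}

func main() {
	failing := func() error { return errTransient }

	n, _ := retryIfSafe("getTraceById", 3, failing)
	fmt.Println("read attempts:", n)

	n, _ = retryIfSafe("createTrace", 3, failing)
	fmt.Println("create attempts:", n)
}
```

Defaulting unknown operations to a single attempt is the conservative choice: an unmapped mutation is never repeated by accident.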

Required Fields Extraction

67 operations have required field definitions:

var RequiredFields = map[string][]string{
    "createTrace":       {"name"},
    "createSpan":        {"trace_id", "name"},
    "createExperiment":  {"dataset_name", "name"},
    "createDataset":     {"name"},
    "createPrompt":      {"name", "template"},
    "evaluateTraces":    {"trace_ids", "evaluator_ids"},
    "evaluateSpans":     {"span_ids", "evaluator_ids"},
    // ... 60 more
}
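
A check against this map can be very small. The sketch below assumes a hypothetical `missingFields` helper and a two-entry subset of the map. It is meant to show the idea behind pre-flight validation, not how the `ax` package implements it.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredFields is an illustrative subset of the map shown above.
var requiredFields = map[string][]string{
	"createSpan":       {"trace_id", "name"},
	"createExperiment": {"dataset_name", "name"},
}

// missingFields (hypothetical helper) returns the required fields of an
// operation that are absent from present, in their declared order.
// Operations without an entry have no required fields.
func missingFields(operationID string, present []string) []string {
	have := make(map[string]bool, len(present))
	for _, f := range present {
		have[f] = true
	}
	var missing []string
	for _, f := range requiredFields[operationID] {
		if !have[f] {
			missing = append(missing, f)
		}
	}
	return missing
}

func main() {
	if m := missingFields("createSpan", []string{"name"}); len(m) > 0 {
		fmt.Println("missing required fields:", strings.Join(m, ", "))
	}
}
```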

Capability Mapping

7 capability types for observability operations:

const (
    CapRead      Capability = "read"      // Data retrieval
    CapWrite     Capability = "write"     // Data creation/modification
    CapDelete    Capability = "delete"    // Data removal
    CapAdmin     Capability = "admin"     // Administrative operations
    CapStream    Capability = "stream"    // Streaming responses
    CapEvaluate  Capability = "evaluate"  // LLM evaluation
    CapAnalytics Capability = "analytics" // Metrics and BI
)
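
Capability lookups of this kind reduce to a registry keyed by operation ID. The following sketch uses a hypothetical `operationCaps` map with illustrative assignments. The real mapping in the `ax` package is not shown in this case study and may be organized differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Capability mirrors the string-typed constants listed above.
type Capability string

const (
	CapRead     Capability = "read"
	CapStream   Capability = "stream"
	CapEvaluate Capability = "evaluate"
)

// operationCaps is a hypothetical registry; the assignments are illustrative.
var operationCaps = map[string][]Capability{
	"getTraceById":       {CapRead},
	"streamDatasetItems": {CapRead, CapStream},
	"streamExperiments":  {CapRead, CapStream},
	"evaluateTraces":     {CapEvaluate},
}

// hasCapability reports whether op is registered with capability c.
func hasCapability(op string, c Capability) bool {
	for _, have := range operationCaps[op] {
		if have == c {
			return true
		}
	}
	return false
}

// operationsByCapability returns the matching operation IDs, sorted so the
// result is stable despite Go's randomized map iteration order.
func operationsByCapability(c Capability) []string {
	var ops []string
	for op := range operationCaps {
		if hasCapability(op, c) {
			ops = append(ops, op)
		}
	}
	sort.Strings(ops)
	return ops
}

func main() {
	fmt.Println(operationsByCapability(CapStream))
	fmt.Println(hasCapability("evaluateTraces", CapEvaluate))
}
```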

Results

Error Handling Improvement

Before:

trace, err := client.GetTrace(ctx, traceID)
if err != nil {
    // Generic error handling
    log.Printf("Error: %v", err)
    return nil, err
}

After:

trace, err := client.GetTrace(ctx, traceID)
if err != nil {
    code, ok := opik.GetAXErrorCode(err)
    if !ok {
        return nil, err
    }

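    switch code {
    case ax.ErrTraceNotFound:
        // Create the trace first, then hand it back to the caller.
        // CreateTrace is assumed to return only an error, as in the
        // Self-Healing Tracing Pattern below.
        created := &Trace{ID: traceID, Name: "auto-created"}
        if err := client.CreateTrace(ctx, created); err != nil {
            return nil, err
        }
        return created, nil

    case ax.ErrProjectNotFound:
        // Create the project, then retry.
        // NOTE: the result of CreateProject is not checked here.
        client.CreateProject(ctx, projectName)
        return client.GetTrace(ctx, traceID)

    case ax.ErrUnauthorized:
        // Signal that the caller must re-authenticate
        return nil, ErrNeedsAuth

    case ax.ErrRateLimited:
        // Back off once, honoring cancellation, then retry
        select {
        case <-ctx.Done():
            return nil, ctx.Err()
        case <-time.After(time.Second):
        }
        return client.GetTrace(ctx, traceID)
    }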

    return nil, err
}
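
The case study does not show how `opik.GetAXErrorCode` works internally. One plausible approach, sketched here with a hypothetical `codedError` type and `errorCode` helper, is to carry the code on a custom error type and recover it with the standard library's `errors.As`, which also sees through wrapped errors.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError is a hypothetical error type carrying a machine-readable code.
type codedError struct {
	Code    string
	Message string
}

func (e *codedError) Error() string { return e.Code + ": " + e.Message }

// errorCode walks the error chain until it finds a *codedError and returns
// its code. The boolean is false when no coded error is present.
func errorCode(err error) (string, bool) {
	var ce *codedError
	if errors.As(err, &ce) {
		return ce.Code, true
	}
	return "", false
}

func main() {
	base := &codedError{Code: "TRACE_NOT_FOUND", Message: "no such trace"}
	wrapped := fmt.Errorf("get trace: %w", base)

	code, ok := errorCode(wrapped)
	fmt.Println(code, ok)
}
```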

Self-Healing Tracing Pattern

func (a *Agent) recordAction(ctx context.Context, action Action) error {
    trace := &Trace{
        Name:  action.Name,
        Input: action.Input,
    }

    err := a.client.CreateTrace(ctx, trace)
    if err == nil {
        return nil
    }

    // Self-healing based on AX metadata
    info := opik.GetAXErrorInfo(err)
    if info == nil {
        return err
    }

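    switch info.Category {
    case "not_found":
        // Create the missing project, then record the trace again.
        // NOTE: the result of CreateProject is not checked here.
        if code, _ := opik.GetAXErrorCode(err); code == ax.ErrProjectNotFound {
            a.client.CreateProject(ctx, a.projectName)
            return a.client.CreateTrace(ctx, trace)
        }

    case "rate_limit":
        // Fixed 2-second backoff, then retry (only if marked retryable).
        // The recursion is unbounded; cap the attempts in real code.
        if info.Retryable {
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(2 * time.Second):
            }
            return a.recordAction(ctx, action)
        }

    case "conflict":
        // Resource exists: fetch it and update it instead.
        // NOTE: trace.ID is never set above; populate it for this lookup to work.
        existing, getErr := a.client.GetTrace(ctx, trace.ID)
        if getErr != nil {
            return getErr
        }
        return a.updateTrace(ctx, existing, action)
    }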

    return err
}
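
The rate-limit branch above waits a fixed interval and retries without an upper bound. A bounded exponential backoff could look like the sketch below. `withBackoff`, `errRateLimited`, and `simulate` are hypothetical names for this example, and the delays are shortened so it runs quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRateLimited stands in for a rate-limit error from the API.
var errRateLimited = errors.New("rate limited")

// withBackoff (hypothetical helper) retries call while it returns
// errRateLimited, doubling the delay after each attempt. It makes at most
// maxAttempts calls (maxAttempts should be at least 1) and stops early if
// ctx is cancelled.
func withBackoff(ctx context.Context, base time.Duration, maxAttempts int, call func() error) error {
	delay := base
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = call()
		if !errors.Is(err, errRateLimited) {
			return err // success, or an error that backoff cannot fix
		}
		if attempt == maxAttempts {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
	}
	return err
}

// simulate runs withBackoff against a fake call that is rate limited for the
// first `failures` calls and succeeds afterwards.
func simulate(failures, maxAttempts int) (calls int, err error) {
	err = withBackoff(context.Background(), time.Millisecond, maxAttempts, func() error {
		calls++
		if calls <= failures {
			return errRateLimited
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := simulate(2, 4)
	fmt.Println("calls:", calls, "err:", err)
}
```

Using a loop with an attempt cap instead of recursion keeps the retry budget explicit and lets the final rate-limit error reach the caller.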

Pre-flight Validation

func validateRequest(operationID string, req interface{}) error {
    // Extract present fields via reflection or manual mapping
    present := extractPresentFields(req)

    if msg := ax.ValidateFields(operationID, present); msg != "" {
        return fmt.Errorf("validation failed: %s", msg)
    }
    return nil
}

// Usage
func (c *Client) CreateExperiment(ctx context.Context, req *ExperimentRequest) error {
    if err := validateRequest("createExperiment", req); err != nil {
        return err // Fail fast without API call
    }
    return c.api.CreateExperiment(ctx, req)
}
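
The `extractPresentFields` helper is not shown in the source. Below is one possible reflection-based version, written as a sketch under stated assumptions: request types are structs with `json` tags, and a field counts as present when it holds a non-zero value. `presentFields` and this `ExperimentRequest` definition are illustrative and do not come from the SDK.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// ExperimentRequest is an illustrative request type, not the SDK's own.
type ExperimentRequest struct {
	Name        string `json:"name"`
	DatasetName string `json:"dataset_name"`
	Description string `json:"description,omitempty"`
}

// presentFields returns the JSON names of all exported, non-zero fields of a
// struct or pointer to struct. Anything else yields nil.
func presentFields(req any) []string {
	v := reflect.ValueOf(req)
	for v.Kind() == reflect.Pointer {
		if v.IsNil() {
			return nil
		}
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return nil
	}
	var present []string
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		// The tag looks like "name,omitempty"; keep only the name part.
		name, _, _ := strings.Cut(f.Tag.Get("json"), ",")
		if name == "" || name == "-" {
			continue
		}
		if !v.Field(i).IsZero() {
			present = append(present, name)
		}
	}
	return present
}

func main() {
	fmt.Println(presentFields(&ExperimentRequest{Name: "baseline"}))
}
```

Treating zero values as absent is a simplification: a field legitimately set to `0`, `""`, or `false` would be reported as missing, so pointer fields or a manual mapping may be a better fit for such cases.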

Capability-Based Discovery

// Find operations that support streaming
streamOps := ax.GetOperationsByCapability(ax.CapStream)
// ["streamDatasetItems", "streamExperimentItems", "streamExperiments"]

// Check if evaluation is available
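if ax.HasCapability("evaluateTraces", ax.CapEvaluate) {
    // Run automatic quality evaluation
    scores, err := client.EvaluateTraces(ctx, traceIDs, evaluatorIDs)
    if err != nil {
        log.Printf("evaluation failed: %v", err)
    } else {
        log.Printf("evaluation scores: %v", scores)
    }
}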

// Find analytics operations for dashboards
analyticsOps := ax.GetOperationsByCapability(ax.CapAnalytics)
// ["getProjectMetrics", "getProjectStats", "costsSummary", ...]

Metrics

Code Changes

Component	Files	Lines
ax package	6 new files	~950 lines
errors.go	1 modified	~80 lines
Total	7 files	~1,030 lines

Coverage

Metadata Type	Count	Coverage
Error codes	19	Domain-complete
Retry policies	201	100% of operations
Required fields	67	33% (mutation operations)
Capabilities	~100	Key operations

Test Results

$ go test -v ./ax/...
=== RUN   TestIsErrorCode
--- PASS: TestIsErrorCode (0.00s)
=== RUN   TestContainsErrorCode
--- PASS: TestContainsErrorCode (0.00s)
=== RUN   TestGetErrorInfo
--- PASS: TestGetErrorInfo (0.00s)
=== RUN   TestErrorCategoryHelpers
--- PASS: TestErrorCategoryHelpers (0.00s)
=== RUN   TestIsRetryable
--- PASS: TestIsRetryable (0.00s)
=== RUN   TestRetryableCount
--- PASS: TestRetryableCount (0.00s)
=== RUN   TestGetRequiredFields
--- PASS: TestGetRequiredFields (0.00s)
=== RUN   TestCapabilities
--- PASS: TestCapabilities (0.00s)
...
PASS
ok      github.com/plexusone/opik-go/ax

Key Learnings

1. Observability Needs Precision

Generic "not found" errors are insufficient when agents trace themselves. Each resource type needs its own error code:

TRACE_NOT_FOUND — The trace doesn't exist
PROJECT_NOT_FOUND — The project doesn't exist (create it first)
SPAN_NOT_FOUND — The span doesn't exist (but trace might)

2. Self-Healing Patterns are Essential

Agents that observe themselves must handle their own tracing failures:

// Bad: Agent stops observing itself on first error
// Good: Agent recovers and continues observing

if code == ax.ErrProjectNotFound {
    createProject()
    retryTrace()
}

3. Domain-Specific Capabilities Matter

Observability has unique capabilities that generic CRUD doesn't capture:

CapEvaluate — LLM quality evaluation
CapStream — Large result streaming
CapAnalytics — Metrics and BI operations

4. Retry Policies Prevent Data Corruption

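Observability data is append-only, so a retried write adds a second record instead of overwriting the first. Retrying a mutation therefore risks duplicates, while reads stay safe: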

createTrace (not retryable) — Would create duplicate traces
evaluateTraces (not retryable) — Would run evaluation twice
getTraceById (retryable) — Safe to retry reads

5. HTTP Status is Not Enough

Status 404 could mean:

Trace not found
Span not found
Project not found
Workspace not found
Any of the 9 other "not found" conditions (13 in total)

AX error codes identify exactly which resource is missing.

Comparison with elevenlabs-go

Aspect	elevenlabs-go	opik-go
Domain	Voice generation	LLM observability
Endpoints	204	201
Error codes	9 (API discovered)	19 (domain defined)
Retry policies	236	201
Required fields	72	67
Special capabilities	-	Stream, Evaluate, Analytics
Self-healing	Media errors	Tracing errors

Both SDKs benefit from AX, but the specific error codes and capabilities differ by domain.

Future Work

API Discovery

Run ax-spec discovery against the real Opik API to find additional error codes not documented in the spec.

Idempotency Support

Add x-ax-idempotent extension support for safe retry of create operations with idempotency keys.

Batch Operation Handling

Define patterns for partial success in batch operations (some items succeed, some fail).
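
As a starting point for that discussion, partial success can be represented by recording each item's outcome separately instead of returning a single error. The `BatchResult` type and `processBatch` function below are hypothetical and only sketch one possible shape. No such API exists in opik-go today.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalid stands in for a per-item failure.
var errInvalid = errors.New("invalid item")

// BatchResult (hypothetical) records the outcome of each item in a batch.
type BatchResult struct {
	Succeeded []string         // IDs that were processed, in input order
	Failed    map[string]error // ID -> reason it failed
}

// Partial reports whether some, but not all, items failed.
func (r BatchResult) Partial() bool {
	return len(r.Failed) > 0 && len(r.Succeeded) > 0
}

// processBatch applies process to every ID and keeps going after failures,
// so one bad item does not hide the outcome of the others.
func processBatch(ids []string, process func(id string) error) BatchResult {
	res := BatchResult{Failed: make(map[string]error)}
	for _, id := range ids {
		if err := process(id); err != nil {
			res.Failed[id] = err
			continue
		}
		res.Succeeded = append(res.Succeeded, id)
	}
	return res
}

func main() {
	res := processBatch([]string{"t1", "t2", "t3"}, func(id string) error {
		if id == "t2" {
			return errInvalid
		}
		return nil
	})
	fmt.Println(res.Succeeded, len(res.Failed), res.Partial())
}
```

With per-item errors available, an agent can resubmit only the failed IDs and leave the successful ones alone.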

Evaluation Metadata

Expose evaluator capabilities (what metrics they produce, what inputs they need).

Conclusion

The AX integration transforms opik-go from a basic observability SDK into an agent-friendly one:

Aspect	Before	After
Error handling	HTTP status parsing	19 typed error codes
Retry decisions	Hardcoded or missing	201 operations mapped
Validation	Runtime API errors	Pre-flight validation
Capabilities	Unknown	Domain-specific discovery
Agent behavior	Fragile tracing	Self-healing observability

For AI agents that need to observe themselves, reliable observability is foundational. AX makes that reliability achievable.