Lifecycle States

Root causes and remediations progress through defined lifecycle states. Understanding these states is essential for tracking issue resolution and measuring operational health.

RootCause Lifecycle

stateDiagram-v2
    [*] --> new: Signal mapped
    new --> active: Confirmed
    active --> mitigating: Fix in progress
    mitigating --> validating: Fix deployed
    validating --> stable: Efficacy confirmed
    validating --> regressed: Signals resurged
    stable --> resolved: No recurrence
    regressed --> active: Re-investigate
    stable --> regressed: New signals

States

| State | Description | Typical Actions |
|-------|-------------|-----------------|
| new | Recently identified, not yet confirmed | Review, assign owner, confirm validity |
| active | Confirmed issue requiring attention | Prioritize, plan remediation |
| mitigating | Remediation in progress | Implement fix, review PRs |
| validating | Fix deployed, measuring effectiveness | Monitor signal rate, collect validation signals |
| stable | Fix appears effective, under observation | Continue monitoring for regression |
| regressed | Signals resurged during validation or after stabilization | Re-analyze root cause, revise fix |
| resolved | Issue confirmed fixed, no recurrence | Archive, close tickets |

State Transitions

new → active

  • Owner assigned
  • Impact assessed
  • Confirmed as valid issue (not noise)

active → mitigating

  • Remediation created and linked
  • Implementation started
  • Code changes in progress

mitigating → validating

  • Fix deployed to production
  • Monitoring period started
  • Validation signals being collected

validating → stable

  • Signal rate dropped significantly relative to the pre-fix baseline (e.g., by more than 80%)
  • No new related signals for defined period
  • Efficacy metric meets threshold

validating → regressed

  • Signals continued or increased after deployment
  • Efficacy below threshold
  • New symptoms appeared
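The two outcomes of validation can be expressed as a simple threshold check. The sketch below is only an illustration under stated assumptions: efficacy is taken to be the fractional drop in signal rate against a pre-fix baseline, the 0.8 threshold mirrors the ">80%" example above, and the function and parameter names (`efficacy`, `evaluate_validation`, `monitoring_complete`) are hypothetical rather than part of any documented API.

```python
def efficacy(baseline_rate: float, current_rate: float) -> float:
    """Fractional drop in signal rate relative to the pre-fix baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return 1.0 - current_rate / baseline_rate


def evaluate_validation(baseline_rate: float, current_rate: float,
                        monitoring_complete: bool, threshold: float = 0.8) -> str:
    """Suggest the next state for a root cause currently in `validating`."""
    if not monitoring_complete:
        # Keep observing until the defined monitoring period has ended.
        return "validating"
    if efficacy(baseline_rate, current_rate) >= threshold:
        return "stable"
    # Signals continued or increased: efficacy is below the threshold.
    return "regressed"
```

For example, a drop from 50 signals per day to 5 gives an efficacy of about 0.9 and suggests `stable`, while a drop to 40 gives about 0.2 and suggests `regressed`. A rate above the baseline yields a negative efficacy, which also lands in `regressed`.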

stable → resolved

  • Extended period without recurrence
  • Related tickets closed
  • Documentation updated

stable → regressed / regressed → active

  • New signals matching pattern
  • Recurrence detected
  • Re-investigation required
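The transitions listed above can be encoded as a lookup table so that illegal moves are rejected early. This is a minimal sketch, not a reference implementation: the state names come from the RootCause diagram, while `ROOT_CAUSE_TRANSITIONS` and `transition` are hypothetical names chosen for illustration.

```python
# Allowed RootCause transitions, copied from the state diagram.
ROOT_CAUSE_TRANSITIONS = {
    "new": {"active"},
    "active": {"mitigating"},
    "mitigating": {"validating"},
    "validating": {"stable", "regressed"},
    "stable": {"resolved", "regressed"},
    "regressed": {"active"},
    "resolved": set(),  # no outgoing transitions in the diagram
}


def transition(current: str, target: str) -> str:
    """Return `target` if current -> target is allowed, else raise ValueError."""
    if current not in ROOT_CAUSE_TRANSITIONS:
        raise ValueError(f"unknown state: {current!r}")
    if target not in ROOT_CAUSE_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

With this table, `transition("new", "active")` succeeds, while `transition("new", "resolved")` raises `ValueError` because it skips the intermediate states.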

Remediation Lifecycle

stateDiagram-v2
    [*] --> planned: Created
    planned --> in_progress: Work started
    in_progress --> deployed: Released
    deployed --> validated: Efficacy confirmed
    deployed --> failed: Did not work
    validated --> [*]
    failed --> planned: Revise approach

States

| State | Description |
|-------|-------------|
| planned | Remediation identified and scoped |
| in_progress | Implementation underway |
| deployed | Released to production |
| validated | Confirmed effective via validation signals |
| failed | Did not achieve desired outcome |
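The remediation states can be modeled as an enumeration with an explicit successor map. The sketch below assumes nothing beyond the diagram above; `RemediationState`, `NEXT_STATES`, and `can_transition` are illustrative names, not an existing API.

```python
from enum import Enum


class RemediationState(Enum):
    PLANNED = "planned"
    IN_PROGRESS = "in_progress"
    DEPLOYED = "deployed"
    VALIDATED = "validated"
    FAILED = "failed"


# Successors per state, as drawn in the Remediation diagram.
NEXT_STATES = {
    RemediationState.PLANNED: {RemediationState.IN_PROGRESS},
    RemediationState.IN_PROGRESS: {RemediationState.DEPLOYED},
    RemediationState.DEPLOYED: {RemediationState.VALIDATED, RemediationState.FAILED},
    RemediationState.VALIDATED: set(),  # terminal
    RemediationState.FAILED: {RemediationState.PLANNED},  # revise approach
}


def can_transition(current: RemediationState, target: RemediationState) -> bool:
    """True if the diagram allows moving from `current` to `target`."""
    return target in NEXT_STATES[current]
```

Note that `failed` is not terminal: a failed remediation returns to `planned` so the approach can be revised.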

Signal Status

Signals have simpler states reflecting their processing status:

| Status | Description |
|--------|-------------|
| raw | Received but not yet processed |
| mapped | Successfully mapped to a root cause |
| duplicate | Identified as duplicate of existing signal |
| noise | Determined to be false positive or irrelevant |
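One way to picture how a signal leaves the `raw` status is a small triage function. Everything beyond the four status names is an assumption made for illustration: the source does not say how duplicates or noise are detected, so the fingerprint set, the `is_false_positive` flag, and the order of the checks are all hypothetical.

```python
from typing import Optional, Set


def triage(fingerprint: str,
           seen_fingerprints: Set[str],
           root_cause_id: Optional[str],
           is_false_positive: bool = False) -> str:
    """Pick a status for a raw signal (hypothetical rules, for illustration)."""
    if is_false_positive:
        return "noise"
    if fingerprint in seen_fingerprints:
        return "duplicate"
    if root_cause_id is not None:
        return "mapped"
    # Nothing matched yet: the signal stays unprocessed.
    return "raw"
```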

Best Practices

Use Status Transitions for Metrics

Track the time spent in each state, and how often fixes regress, to identify bottlenecks:

  • Mean Time to Mitigate (new → mitigating)
  • Mean Time to Validate (mitigating → stable)
  • Regression Rate (stable → regressed)
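These metrics can be derived from a log of state changes. The sketch below assumes each root cause has a chronological history of `(state, entered_at)` pairs; that data shape and the helper names (`time_between`, `mean_hours`, `regression_rate`) are hypothetical. The regression rate here counts only regressions that occur after a root cause has reached `stable`.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional, Tuple

History = List[Tuple[str, datetime]]  # chronological (state, entered_at) pairs


def time_between(history: History, start: str, end: str) -> Optional[timedelta]:
    """Time from first entering `start` to the next entry into `end`, if any."""
    t_start = next((t for s, t in history if s == start), None)
    if t_start is None:
        return None
    t_end = next((t for s, t in history if s == end and t >= t_start), None)
    return None if t_end is None else t_end - t_start


def mean_hours(histories: List[History], start: str, end: str) -> Optional[float]:
    """Mean start -> end duration in hours over histories that completed it."""
    durations = [d for h in histories
                 if (d := time_between(h, start, end)) is not None]
    if not durations:
        return None
    return mean(d.total_seconds() for d in durations) / 3600


def regression_rate(histories: List[History]) -> Optional[float]:
    """Share of root causes that reached `stable` and later regressed."""
    reached = regressed = 0
    for history in histories:
        states = [s for s, _ in history]
        if "stable" in states:
            reached += 1
            if "regressed" in states[states.index("stable") + 1:]:
                regressed += 1
    return regressed / reached if reached else None
```

Mean Time to Mitigate would then be `mean_hours(histories, "new", "mitigating")` and Mean Time to Validate `mean_hours(histories, "mitigating", "stable")`.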

Don't Skip States

Always move through states along the defined transitions. Skipping states (e.g., new → resolved) loses important tracking data and breaks the time-in-state metrics.

Automate Transitions

Where possible, automate state transitions based on events from your tooling:

  • new → active: when an owner is assigned in the ticket system
  • mitigating → validating: when the PR is merged and deployed
  • validating → stable: when the signal rate drops below the threshold
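The automation rules above amount to a mapping from events to transitions, applied only when the root cause is in the expected state. The following is a rough sketch: the event names (`owner_assigned`, `fix_deployed`, `signal_rate_below_threshold`) are invented for illustration and would need to match whatever your ticketing, deployment, and monitoring systems actually emit.

```python
# Hypothetical event names mapped to (required current state, next state).
AUTO_TRANSITIONS = {
    "owner_assigned": ("new", "active"),
    "fix_deployed": ("mitigating", "validating"),
    "signal_rate_below_threshold": ("validating", "stable"),
}


def apply_event(state: str, event: str) -> str:
    """Return the new state, or the unchanged state if the event does not apply."""
    rule = AUTO_TRANSITIONS.get(event)
    if rule is None:
        return state  # unknown event: ignore
    required, target = rule
    return target if state == required else state
```

Ignoring events that arrive in the wrong state, rather than raising, keeps the automation from skipping states: a `fix_deployed` event on a root cause that is still `new` leaves it in `new`.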