Release Notes: v0.10.0

Release Date: 2026-06-13

Highlights

  • Voice Gateway Interface: Provider-agnostic interfaces for PSTN and WebRTC voice communication
  • Session Persistence: Call state storage with in-memory and Redis backends
  • Barge-in Detection: User interruption handling during agent speech
  • WAV Codec Support: Audio format conversion for TTS provider compatibility

New Features

Voice Gateway Interface

The new gateway package provides provider-agnostic interfaces for voice communication:

import "github.com/plexusone/omnivoice-core/gateway"

// Gateway handles PSTN phone calls (Twilio, Telnyx, Vonage, Plivo)
type Gateway interface {
    Name() ProviderName
    Start(ctx context.Context) error
    Stop() error
    OnCall(handler CallHandler)
    MakeCall(ctx context.Context, to string) (Session, error)
    GetSession(callID string) (Session, bool)
    ListSessions() []Session
}

// WebRTCGateway handles browser/mobile voice (LiveKit, Daily)
type WebRTCGateway interface {
    Name() ProviderName
    Start(ctx context.Context) error
    Stop() error
    OnParticipantJoined(handler ParticipantHandler)
    JoinRoom(ctx context.Context, roomName string) error
    LeaveRoom() error
    GetSession(participantID string) (WebRTCSession, bool)
    ListSessions() []WebRTCSession
    GenerateClientToken(roomName, identity, displayName string) (string, error)
}
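
Because application code can depend on the interface rather than on a provider SDK, a test double is easy to write. The sketch below is illustrative only: `CallPlacer` is a hypothetical subset of `Gateway`, and `ProviderName`, `Session` (with its `ID()` accessor), `fakeGateway`, `dial`, and `run` are local stand-ins defined here, not the package's actual types.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ProviderName and Session are local stand-ins so the example compiles on its
// own. The real definitions live in the gateway package and may differ.
type ProviderName string

type Session interface {
	ID() string // hypothetical accessor
}

// CallPlacer is a hypothetical subset of the Gateway interface.
type CallPlacer interface {
	Name() ProviderName
	MakeCall(ctx context.Context, to string) (Session, error)
	GetSession(callID string) (Session, bool)
}

type fakeSession struct{ id string }

func (s fakeSession) ID() string { return s.id }

// fakeGateway records calls in a map instead of contacting a carrier.
type fakeGateway struct {
	mu       sync.Mutex
	next     int
	sessions map[string]Session
}

func newFakeGateway() *fakeGateway {
	return &fakeGateway{sessions: make(map[string]Session)}
}

func (g *fakeGateway) Name() ProviderName { return "fake" }

func (g *fakeGateway) MakeCall(ctx context.Context, to string) (Session, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // honor cancellation before doing any work
	}
	if to == "" {
		return nil, errors.New("empty destination number")
	}
	g.mu.Lock()
	defer g.mu.Unlock()
	g.next++
	s := fakeSession{id: fmt.Sprintf("call-%d", g.next)}
	g.sessions[s.id] = s
	return s, nil
}

func (g *fakeGateway) GetSession(callID string) (Session, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	s, ok := g.sessions[callID]
	return s, ok
}

// dial depends only on the interface, so a real provider can replace the fake.
func dial(ctx context.Context, gw CallPlacer, to string) (string, error) {
	s, err := gw.MakeCall(ctx, to)
	if err != nil {
		return "", fmt.Errorf("%s: %w", gw.Name(), err)
	}
	return s.ID(), nil
}

// run places one call through the fake and reports whether it is tracked.
func run(to string) (id string, tracked bool, err error) {
	gw := newFakeGateway()
	id, err = dial(context.Background(), gw, to)
	if err != nil {
		return "", false, err
	}
	_, tracked = gw.GetSession(id)
	return id, tracked, nil
}

func main() {
	id, tracked, err := run("+14155556789")
	fmt.Println(id, tracked, err)
}
```

Keeping `dial` written against the narrow interface means the same function should work unchanged once a real provider implementation is passed in.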

Session Events

The gateway emits events for call lifecycle management:

| Event Type | Description |
|---|---|
| SessionStarted | Call connected |
| SessionEnded | Call terminated |
| UserSpeechStart | User began speaking |
| UserSpeechEnd | User stopped speaking |
| UserTranscript | STT transcription available |
| AgentThinking | LLM processing input |
| AgentSpeechStart | Agent began speaking |
| AgentSpeechEnd | Agent stopped speaking |
| AgentTranscript | Agent response text |
| ToolCall | Function/tool invocation |
| Interruption | User interrupted agent |
| Error | Error occurred |
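
One way to consume these events is a single switch over the event type. The following is a minimal sketch under stated assumptions: `EventType`, `Event` (including its `Text` field), `Summary`, and `Summarize` are hypothetical local definitions that mirror the names in the table, and delivery over a channel is assumed purely for illustration.

```go
package main

import "fmt"

// EventType and Event are local stand-ins. The real gateway.Event may use
// different constant names and carry different fields.
type EventType string

const (
	SessionStarted  EventType = "SessionStarted"
	SessionEnded    EventType = "SessionEnded"
	UserTranscript  EventType = "UserTranscript"
	AgentTranscript EventType = "AgentTranscript"
	Interruption    EventType = "Interruption"
	Error           EventType = "Error"
)

type Event struct {
	Type EventType
	Text string // hypothetical payload: transcript or error message
}

// Summary accumulates per-call counters from the event stream.
type Summary struct {
	UserTurns     int
	AgentTurns    int
	Interruptions int
	Errors        int
	Ended         bool
}

// Apply updates the summary for one event. Unknown types are ignored so that
// newly added event types do not break existing consumers.
func (s *Summary) Apply(ev Event) {
	switch ev.Type {
	case UserTranscript:
		s.UserTurns++
	case AgentTranscript:
		s.AgentTurns++
	case Interruption:
		s.Interruptions++
	case Error:
		s.Errors++
	case SessionEnded:
		s.Ended = true
	}
}

// Summarize drains the channel until it is closed.
func Summarize(events <-chan Event) Summary {
	var s Summary
	for ev := range events {
		s.Apply(ev)
	}
	return s
}

func main() {
	events := make(chan Event, 5)
	events <- Event{Type: SessionStarted}
	events <- Event{Type: UserTranscript, Text: "hi"}
	events <- Event{Type: AgentTranscript, Text: "hello"}
	events <- Event{Type: Interruption}
	events <- Event{Type: SessionEnded}
	close(events)
	fmt.Printf("%+v\n", Summarize(events))
}
```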

Session Persistence

The storage package provides call state persistence for session recovery:

import "github.com/plexusone/omnivoice-core/storage"

// In-memory storage (default)
store := storage.NewMemoryStore()

// Redis storage (distributed)
store, err := storage.NewRedisStore("redis://localhost:6379",
    storage.WithTTL(24 * time.Hour),
    storage.WithPrefix("omnivoice:"),
)

// Save session state
err = store.Save(ctx, &storage.SessionState{
    ID:        callSID,
    Provider:  "twilio",
    Direction: "inbound",
    From:      "+14155551234",
    To:        "+14155556789",
    Status:    storage.StatusActive,
    History:   turns,
    Metrics:   metrics,
})

// Recover active sessions after restart
sessions, err := store.ListActive(ctx)
for _, id := range sessions {
    state, _ := store.Load(ctx, id)
    // Reconnect to active call...
}
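
The recovery loop above ignores load errors for brevity. The sketch below shows one possible shape for the save, list, and load cycle with error handling. It does not use the storage package: `mapStore`, `ErrNotFound`, `recoverActive`, and the trimmed `SessionState` are hypothetical stand-ins, and the method signatures are assumptions inferred from the snippet above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Local stand-ins for illustration; the storage package's types may differ.
type SessionStatus string

const (
	StatusActive SessionStatus = "active"
	StatusEnded  SessionStatus = "ended"
)

type SessionState struct {
	ID     string
	Status SessionStatus
}

var ErrNotFound = errors.New("session not found")

// mapStore is a hypothetical in-process store guarded by a mutex.
type mapStore struct {
	mu     sync.RWMutex
	states map[string]SessionState
}

func newMapStore() *mapStore {
	return &mapStore{states: make(map[string]SessionState)}
}

// Save stores a copy so later caller mutations do not leak into the store.
func (m *mapStore) Save(ctx context.Context, s *SessionState) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.states[s.ID] = *s
	return nil
}

func (m *mapStore) Load(ctx context.Context, id string) (*SessionState, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	s, ok := m.states[id]
	if !ok {
		return nil, ErrNotFound
	}
	return &s, nil
}

// ListActive returns the IDs of active sessions in sorted order.
func (m *mapStore) ListActive(ctx context.Context) ([]string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	ids := make([]string, 0, len(m.states))
	for id, s := range m.states {
		if s.Status == StatusActive {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids, nil
}

// recoverActive loads every active session. A session that disappears between
// listing and loading is skipped instead of aborting the whole recovery.
func recoverActive(ctx context.Context, st *mapStore) ([]*SessionState, error) {
	ids, err := st.ListActive(ctx)
	if err != nil {
		return nil, err
	}
	var out []*SessionState
	for _, id := range ids {
		state, err := st.Load(ctx, id)
		if errors.Is(err, ErrNotFound) {
			continue
		}
		if err != nil {
			return nil, err
		}
		out = append(out, state)
	}
	return out, nil
}

// run seeds the store and returns the IDs that recovery would reconnect.
func run() ([]string, error) {
	ctx := context.Background()
	st := newMapStore()
	seed := []SessionState{
		{ID: "call-b", Status: StatusActive},
		{ID: "call-a", Status: StatusActive},
		{ID: "call-c", Status: StatusEnded},
	}
	for i := range seed {
		if err := st.Save(ctx, &seed[i]); err != nil {
			return nil, err
		}
	}
	states, err := recoverActive(ctx, st)
	if err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(states))
	for _, s := range states {
		ids = append(ids, s.ID)
	}
	return ids, nil
}

func main() {
	ids, err := run()
	fmt.Println(ids, err)
}
```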

SessionState Fields

| Field | Type | Description |
|---|---|---|
| ID | string | Unique session identifier |
| CallID | string | Provider-specific call ID |
| Provider | string | Voice gateway provider name |
| Direction | string | "inbound" or "outbound" |
| From | string | Caller phone number |
| To | string | Called phone number |
| Status | SessionStatus | One of pending, active, ended, failed |
| History | []Turn | Conversation transcript |
| Metrics | SessionMetrics | Performance metrics |
| RecoveryData | map[string]any | Provider-specific recovery data |

Barge-in Detection

The bargein package detects when users interrupt agent speech:

import "github.com/plexusone/omnivoice-core/bargein"

detector := bargein.New(bargein.Config{
    Mode:                bargein.ModeImmediate, // or ModeAfterSentence, ModeDisabled
    MinSpeechDurationMs: 200,  // Minimum speech to trigger interruption
    SilenceThresholdMs:  500,  // Silence before considering speech ended
})

// Connect to STT events and TTS pipeline
detector.AttachSTTEvents(sttEvents)
detector.AttachTTS(ttsPipeline)

// Handle interruptions
detector.OnInterrupt(func(event gateway.Event) {
    log.Println("User interrupted agent")
    // TTS automatically stopped
})

detector.Start(ctx)

Interruption Modes

| Mode | Behavior |
|---|---|
| ModeImmediate | Stop TTS immediately when user speaks |
| ModeAfterSentence | Wait for agent to finish current sentence |
| ModeDisabled | Never interrupt agent speech |
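
The three modes can be read as a small decision function. This is a sketch of the behavior described in the table, not the detector's actual logic: `Mode`, `ShouldStopTTS`, and the `atSentenceEnd` flag are hypothetical, and applying the minimum speech duration before any mode check is an assumption.

```go
package main

import "fmt"

// Mode is a local stand-in for the package's interruption mode constants.
type Mode int

const (
	ModeImmediate Mode = iota
	ModeAfterSentence
	ModeDisabled
)

// ShouldStopTTS reports whether agent speech should be cut off.
//
//	speechMs      how long the user has been speaking
//	minSpeechMs   threshold below which speech is treated as noise
//	atSentenceEnd whether the agent has just finished a sentence
func ShouldStopTTS(mode Mode, speechMs, minSpeechMs int, atSentenceEnd bool) bool {
	if speechMs < minSpeechMs {
		return false // too short to count as a real interruption
	}
	switch mode {
	case ModeImmediate:
		return true
	case ModeAfterSentence:
		return atSentenceEnd
	default: // ModeDisabled and any unknown mode never interrupt
		return false
	}
}

func main() {
	fmt.Println(ShouldStopTTS(ModeImmediate, 250, 200, false))
	fmt.Println(ShouldStopTTS(ModeAfterSentence, 250, 200, false))
	fmt.Println(ShouldStopTTS(ModeDisabled, 250, 200, true))
}
```

Checking the duration threshold first keeps brief noises such as coughs from cutting off the agent in any mode.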

WAV Codec Support

The audio/codec package now includes WAV encoding for TTS provider compatibility:

import "github.com/plexusone/omnivoice-core/audio/codec"

// Convert raw μ-law audio to WAV format
// Required for speech-to-text providers such as OpenAI Whisper that expect a WAV container
wavData := codec.MulawToWAV(mulawAudio)

// WAV format: RIFF header + 8kHz mono μ-law data
// Compatible with most audio processing tools
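
To make the container layout concrete, the sketch below wraps raw μ-law bytes in a RIFF/WAVE header by hand. It is not the library's `MulawToWAV`: `wrapMulawWAV` is a hypothetical helper, and the header layout it writes (an 18-byte `fmt ` chunk with format tag 7, plus a `fact` chunk, as the WAVE format describes for non-PCM data) is an assumption about what such a conversion might produce.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wrapMulawWAV is a hypothetical helper that prepends a 58-byte RIFF/WAVE
// header to 8 kHz mono mu-law samples. The audio bytes are not re-encoded.
func wrapMulawWAV(mulaw []byte) []byte {
	const (
		formatMulaw   = 7 // WAVE_FORMAT_MULAW
		channels      = 1
		sampleRate    = 8000
		bitsPerSample = 8
		blockAlign    = channels * bitsPerSample / 8
		byteRate      = sampleRate * blockAlign
	)
	dataLen := uint32(len(mulaw))
	pad := dataLen % 2 // RIFF chunks are padded to an even length
	// "WAVE" (4) + fmt chunk (8+18) + fact chunk (8+4) + data header (8) = 50
	riffSize := 50 + dataLen + pad

	out := make([]byte, 0, 58+len(mulaw)+1)
	u16 := func(v uint16) { out = binary.LittleEndian.AppendUint16(out, v) }
	u32 := func(v uint32) { out = binary.LittleEndian.AppendUint32(out, v) }

	out = append(out, "RIFF"...)
	u32(riffSize)
	out = append(out, "WAVE"...)

	out = append(out, "fmt "...)
	u32(18) // non-PCM formats use the extended fmt chunk
	u16(formatMulaw)
	u16(channels)
	u32(sampleRate)
	u32(byteRate)
	u16(blockAlign)
	u16(bitsPerSample)
	u16(0) // cbSize: no extra format bytes

	out = append(out, "fact"...)
	u32(4)
	u32(dataLen) // one byte per sample, so sample count equals data length

	out = append(out, "data"...)
	u32(dataLen)
	out = append(out, mulaw...)
	if pad == 1 {
		out = append(out, 0)
	}
	return out
}

func main() {
	wav := wrapMulawWAV([]byte{0xFF, 0x7F, 0x00})
	fmt.Printf("%d bytes, header % x\n", len(wav), wav[:12])
}
```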

API Reference

Gateway Package

| Type | Description |
|---|---|
| Gateway | PSTN voice gateway interface |
| WebRTCGateway | WebRTC voice gateway interface |
| Session | Active PSTN call session |
| WebRTCSession | Active WebRTC session |
| Event | Session lifecycle event |
| EventType | Event type constants |
| Turn | Conversation turn (user/agent) |
| ToolCall | LLM tool invocation |
| Metrics | Performance metrics |
| CallInfo | Incoming call information |
| CallHandler | Incoming call handler function |

Storage Package

| Type | Description |
|---|---|
| SessionStore | Storage interface |
| MemoryStore | In-process storage |
| RedisStore | Redis-backed storage |
| SessionState | Persisted session data |
| SessionStatus | Status constants |
| Turn | Conversation turn |
| SessionMetrics | Aggregated metrics |

Bargein Package

| Type | Description |
|---|---|
| Detector | Barge-in detector |
| Config | Detector configuration |
| InterruptionMode | Mode constants |

Codec Package (New)

| Function | Description |
|---|---|
| MulawToWAV(data []byte) []byte | Convert μ-law to WAV format |

Installation

go get github.com/plexusone/omnivoice-core@v0.10.0

Migration Guide

From v0.9.0

No breaking changes. New packages are additive:

  1. Update dependency:
     go get github.com/plexusone/omnivoice-core@v0.10.0
  2. Import new packages as needed:
     import (
         "github.com/plexusone/omnivoice-core/gateway"
         "github.com/plexusone/omnivoice-core/storage"
         "github.com/plexusone/omnivoice-core/bargein"
     )
  3. The Twilio example has moved to the omni-twilio repository.

Dependencies

  • Added github.com/redis/go-redis/v9 for Redis session storage

Full Changelog

See CHANGELOG.md for the complete list of changes.