Voice Gateway Guide

The voice gateway connects Twilio phone calls to an AI agent for full-duplex voice conversations. It supports two pipeline modes, each optimized for a different use case.

Pipeline Modes

Mode      Latency      Best For
Text      500-1000ms   Custom voices, domain-specific STT, tool calling
Realtime  100-200ms    Natural conversation, low-latency requirements

Text Mode (STT → LLM → TTS)

A traditional pipeline that chains speech-to-text (STT), a language model (LLM), and text-to-speech (TTS):

Audio In → STT → Text → LLM → Text → TTS → Audio Out

Advantages:

  • Mix and match providers (Deepgram STT + Claude + ElevenLabs TTS)
  • Use custom/cloned voices
  • Full tool/function calling support
  • Domain-specific STT models

Trade-offs:

  • Higher latency (500-1000ms round-trip)
  • Multiple API calls per turn
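Because each turn makes three sequential calls, the round-trip latency is the sum of the STT, LLM, and TTS stages. The sketch below illustrates that with hypothetical `SpeechToText`, `LanguageModel`, and `TextToSpeech` interfaces and stub implementations; none of these names come from the gateway package, which wires its providers internally.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Hypothetical stage interfaces (not the gateway's API).
type SpeechToText interface {
	Transcribe(audio []byte) (string, error)
}

type LanguageModel interface {
	Reply(prompt string) (string, error)
}

type TextToSpeech interface {
	Synthesize(text string) ([]byte, error)
}

// Stub providers so the sketch runs without network access.
type stubSTT struct{}

func (stubSTT) Transcribe(audio []byte) (string, error) { return string(audio), nil }

type stubLLM struct{}

func (stubLLM) Reply(prompt string) (string, error) { return "You said: " + prompt, nil }

// stubTTS pretends that uppercased text bytes are "audio".
type stubTTS struct{}

func (stubTTS) Synthesize(text string) ([]byte, error) { return []byte(strings.ToUpper(text)), nil }

// runTurn chains STT -> LLM -> TTS and records each stage's duration.
// The turn's total latency is the sum of the three timings.
func runTurn(stt SpeechToText, llm LanguageModel, tts TextToSpeech, audioIn []byte) ([]byte, map[string]time.Duration, error) {
	timings := make(map[string]time.Duration)

	start := time.Now()
	text, err := stt.Transcribe(audioIn)
	if err != nil {
		return nil, timings, fmt.Errorf("stt: %w", err)
	}
	timings["stt"] = time.Since(start)

	start = time.Now()
	reply, err := llm.Reply(text)
	if err != nil {
		return nil, timings, fmt.Errorf("llm: %w", err)
	}
	timings["llm"] = time.Since(start)

	start = time.Now()
	audioOut, err := tts.Synthesize(reply)
	if err != nil {
		return nil, timings, fmt.Errorf("tts: %w", err)
	}
	timings["tts"] = time.Since(start)

	return audioOut, timings, nil
}

func main() {
	out, timings, err := runTurn(stubSTT{}, stubLLM{}, stubTTS{}, []byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // YOU SAID: HELLO
	fmt.Println(len(timings)) // 3
}
```

Realtime mode collapses these three hops into a single provider connection, which is where its latency advantage comes from.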

Realtime Mode (Voice-to-Voice)

Native voice-to-voice processing using the OpenAI Realtime API or Gemini Live:

Audio In → Realtime API → Audio Out

Advantages:

  • Ultra-low latency (~100-200ms)
  • Natural conversation flow
  • Single API connection

Trade-offs:

  • Limited to OpenAI or Google voices
  • Provider-specific tool calling

Quick Start

Text Mode

package main

import (
    "context"
    "os"

    "github.com/plexusone/omni-twilio/omnivoice/gateway"
)

func main() {
    gw, err := gateway.New(gateway.Config{
        // Twilio credentials
        AccountSID:  os.Getenv("TWILIO_ACCOUNT_SID"),
        AuthToken:   os.Getenv("TWILIO_AUTH_TOKEN"),
        PhoneNumber: "+15551234567",

        // Server
        PublicURL:  "https://your-server.com",
        ListenAddr: ":8080",

        // STT provider
        STTProvider: "deepgram",
        STTAPIKey:   os.Getenv("DEEPGRAM_API_KEY"),

        // LLM provider
        LLMProvider:     "anthropic",
        LLMAPIKey:       os.Getenv("ANTHROPIC_API_KEY"),
        LLMModel:        "claude-sonnet-4-20250514",
        LLMSystemPrompt: "You are a helpful voice assistant. Keep responses brief.",

        // TTS provider
        TTSProvider: "elevenlabs",
        TTSAPIKey:   os.Getenv("ELEVENLABS_API_KEY"),
        TTSVoiceID:  "21m00Tcm4TlvDq8ikWAM", // Rachel
    })
    if err != nil {
        panic(err)
    }

    // Handle incoming calls
    gw.OnCall(func(call *gateway.CallInfo) error {
        return nil // Accept all calls
    })

    // Start gateway
    ctx := context.Background()
    gw.Start(ctx)
}

Realtime Mode

package main

import (
    "context"
    "os"

    "github.com/plexusone/omni-twilio/omnivoice/gateway"
    coregateway "github.com/plexusone/omnivoice-core/gateway"
    openaiRealtime "github.com/plexusone/omni-openai/omnivoice/realtime"
)

func main() {
    gw, err := gateway.New(gateway.Config{
        // Twilio credentials
        AccountSID:  os.Getenv("TWILIO_ACCOUNT_SID"),
        AuthToken:   os.Getenv("TWILIO_AUTH_TOKEN"),
        PhoneNumber: "+15551234567",

        // Server
        PublicURL:  "https://your-server.com",
        ListenAddr: ":8080",

        // Enable realtime mode
        Mode:             coregateway.PipelineModeRealtime,
        RealtimeProvider: openaiRealtime.NewFactory(),
        RealtimeConfig: &coregateway.RealtimeConfig{
            Provider:     "openai",
            APIKey:       os.Getenv("OPENAI_API_KEY"),
            Model:        "gpt-4o-realtime-preview-2024-12-17",
            Voice:        "alloy",
            Instructions: "You are a helpful voice assistant. Keep responses brief.",
        },
    })
    if err != nil {
        panic(err)
    }

    gw.OnCall(func(call *gateway.CallInfo) error {
        return nil
    })

    ctx := context.Background()
    gw.Start(ctx)
}

Provider Configuration

Text Mode Providers

STT Providers

Provider    Environment Variable   Model
Deepgram    DEEPGRAM_API_KEY       nova-2 (default)
OpenAI      OPENAI_API_KEY         whisper-1
ElevenLabs  ELEVENLABS_API_KEY     scribe

// Deepgram (recommended for real-time)
STTProvider: "deepgram",
STTAPIKey:   os.Getenv("DEEPGRAM_API_KEY"),
STTModel:    "nova-2",
STTLanguage: "en",

// OpenAI Whisper
STTProvider: "openai",
STTAPIKey:   os.Getenv("OPENAI_API_KEY"),

TTS Providers

Provider    Environment Variable   Voices
ElevenLabs  ELEVENLABS_API_KEY     100+ voices, voice cloning
OpenAI      OPENAI_API_KEY         alloy, echo, fable, onyx, nova, shimmer
Deepgram    DEEPGRAM_API_KEY       Aura voices

// ElevenLabs (recommended for quality)
TTSProvider: "elevenlabs",
TTSAPIKey:   os.Getenv("ELEVENLABS_API_KEY"),
TTSVoiceID:  "21m00Tcm4TlvDq8ikWAM", // Rachel

// OpenAI
TTSProvider: "openai",
TTSAPIKey:   os.Getenv("OPENAI_API_KEY"),
TTSVoiceID:  "alloy",

LLM Providers

Provider    Environment Variable   Models
Anthropic   ANTHROPIC_API_KEY      claude-sonnet-4, claude-opus-4
OpenAI      OPENAI_API_KEY         gpt-4o, gpt-4-turbo

// Anthropic Claude
LLMProvider:     "anthropic",
LLMAPIKey:       os.Getenv("ANTHROPIC_API_KEY"),
LLMModel:        "claude-sonnet-4-20250514",
LLMSystemPrompt: "You are a helpful assistant...",

// OpenAI
LLMProvider:     "openai",
LLMAPIKey:       os.Getenv("OPENAI_API_KEY"),
LLMModel:        "gpt-4o",

LLM Provider Injection (Advanced)

By default, the gateway uses thin providers from omnillm-core (native HTTP implementations). For applications that need thick providers (official SDKs with additional features), you can inject a pre-configured LLM client.

Thin vs Thick Providers:

Type   Package        Features
Thin   omnillm-core   Basic chat completion, minimal dependencies
Thick  omnillm        Official SDKs, streaming, better error handling, full API support

Injecting a Thick Provider:

import (
    "os"

    "github.com/plexusone/omni-twilio/omnivoice/gateway"
    "github.com/plexusone/omnillm"
)

// Create omnillm client (imports thick providers automatically)
llmClient, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {
            Provider: omnillm.ProviderNameAnthropic,
            APIKey:   os.Getenv("ANTHROPIC_API_KEY"),
        },
    },
})
if err != nil {
    panic(err)
}

// Inject the provider into gateway config
gw, err := gateway.New(gateway.Config{
    // ... Twilio config ...

    // Inject thick provider (LLMProvider/LLMAPIKey are ignored when LLMClient is set)
    LLMClient:       llmClient.Provider(),
    LLMModel:        "claude-sonnet-4-20250514",
    LLMSystemPrompt: "You are a helpful voice assistant.",
})
if err != nil {
    panic(err)
}

When to use thick providers:

  • You need streaming responses
  • You want official SDK error handling and retries
  • You need features like conversation memory or caching from omnillm
  • You're building a larger application that already uses omnillm
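The precedence rule noted in the config comment (an injected `LLMClient` wins, and `LLMProvider`/`LLMAPIKey` are ignored) is ordinary interface-based dependency injection. The following is a hypothetical model of that rule; `ChatProvider`, `LLMSettings`, and `resolveLLM` are illustrative names, not omnillm or gateway types.

```go
package main

import (
	"errors"
	"fmt"
)

// ChatProvider is a minimal hypothetical LLM interface.
type ChatProvider interface {
	Name() string
}

// thinProvider stands in for a built-in provider selected by name.
type thinProvider struct{ name string }

func (p thinProvider) Name() string { return "thin/" + p.name }

// thickProvider stands in for a pre-configured, injected SDK client.
type thickProvider struct{}

func (thickProvider) Name() string { return "thick/anthropic-sdk" }

// LLMSettings mirrors the three config fields involved in the rule.
type LLMSettings struct {
	LLMClient   ChatProvider // injected client; takes precedence when non-nil
	LLMProvider string       // used only when LLMClient is nil
	LLMAPIKey   string       // used only when LLMClient is nil
}

// resolveLLM returns the injected client if present, otherwise builds a
// thin provider from the name and key.
func resolveLLM(s LLMSettings) (ChatProvider, error) {
	if s.LLMClient != nil {
		return s.LLMClient, nil // provider name and API key are ignored
	}
	if s.LLMProvider == "" || s.LLMAPIKey == "" {
		return nil, errors.New("LLMProvider and LLMAPIKey are required when no LLMClient is injected")
	}
	return thinProvider{name: s.LLMProvider}, nil
}

func main() {
	p, _ := resolveLLM(LLMSettings{LLMClient: thickProvider{}, LLMProvider: "openai"})
	fmt.Println(p.Name()) // thick/anthropic-sdk

	p, _ = resolveLLM(LLMSettings{LLMProvider: "anthropic", LLMAPIKey: "key"})
	fmt.Println(p.Name()) // thin/anthropic
}
```

Keeping the dependency behind an interface is what lets a larger application hand the gateway the same client it already uses elsewhere.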

Realtime Providers

OpenAI Realtime (~100ms latency)

import (
    coregateway "github.com/plexusone/omnivoice-core/gateway"
    openaiRealtime "github.com/plexusone/omni-openai/omnivoice/realtime"
)

Mode:             coregateway.PipelineModeRealtime,
RealtimeProvider: openaiRealtime.NewFactory(),
RealtimeConfig: &coregateway.RealtimeConfig{
    Provider:     "openai",
    APIKey:       os.Getenv("OPENAI_API_KEY"),
    Model:        "gpt-4o-realtime-preview-2024-12-17",
    Voice:        "alloy", // e.g. alloy, ash, ballad, coral, echo, sage, shimmer, verse
    Instructions: "You are a helpful assistant.",
},

Gemini Live (~200ms latency)

import (
    coregateway "github.com/plexusone/omnivoice-core/gateway"
    googleRealtime "github.com/plexusone/omni-google/omnivoice/realtime"
)

Mode:             coregateway.PipelineModeRealtime,
RealtimeProvider: googleRealtime.NewFactory(),
RealtimeConfig: &coregateway.RealtimeConfig{
    Provider:     "gemini",
    APIKey:       os.Getenv("GOOGLE_API_KEY"),
    Model:        "gemini-2.0-flash-live",
    Voice:        "Puck", // Puck, Charon, Kore, Fenrir, Aoede
    Instructions: "You are a helpful assistant.",
},

Session Management

Handling Calls

// Accept/reject incoming calls
gw.OnCall(func(call *gateway.CallInfo) error {
    log.Printf("Call from %s to %s", call.From, call.To)

    // Return nil to accept, error to reject
    // isBlocked is your own lookup; it is not provided by the gateway package
    if isBlocked(call.From) {
        return fmt.Errorf("blocked caller")
    }
    return nil
})
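`isBlocked` is left to the application. One possible sketch, assuming caller IDs arrive as E.164-style strings such as `+15551234567`, normalizes each number and checks it against an in-memory set. The `blocklist` type, `newBlocklist`, and `normalize` are hypothetical helpers, and the mutex is a precaution in case `OnCall` handlers run concurrently (an assumption, not documented behavior).

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// blocklist is a hypothetical set of blocked caller IDs, guarded by a
// RWMutex so lookups stay safe if calls arrive concurrently.
type blocklist struct {
	mu      sync.RWMutex
	numbers map[string]struct{}
}

func newBlocklist(numbers ...string) *blocklist {
	b := &blocklist{numbers: make(map[string]struct{})}
	for _, n := range numbers {
		b.numbers[normalize(n)] = struct{}{}
	}
	return b
}

// normalize keeps a leading '+' and the digits, so "+1 (555) 000-1111"
// and "+15550001111" compare equal.
func normalize(number string) string {
	var sb strings.Builder
	for i, r := range number {
		if (r == '+' && i == 0) || (r >= '0' && r <= '9') {
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

func (b *blocklist) isBlocked(number string) bool {
	b.mu.RLock()
	defer b.mu.RUnlock()
	_, ok := b.numbers[normalize(number)]
	return ok
}

func main() {
	bl := newBlocklist("+15550001111")
	fmt.Println(bl.isBlocked("+1 (555) 000-1111")) // true
	fmt.Println(bl.isBlocked("+15551234567"))      // false
}
```

Inside the handler, `bl.isBlocked(call.From)` would replace the bare `isBlocked(call.From)` call.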

Session Events

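// Look up an active session by its Twilio Call SID.
// Ranging over Events() blocks until the channel closes.
session, ok := gw.GetSession(callSID)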
if !ok {
    return
}

for event := range session.Events() {
    switch event.Type {
    case gateway.EventSessionStarted:
        log.Println("Call connected")

    case gateway.EventUserTranscript:
        log.Printf("User: %s", event.Data)

    case gateway.EventAgentTranscript:
        log.Printf("Agent: %s", event.Data)

    case gateway.EventToolCall:
        log.Printf("Tool called: %v", event.Data)

    case gateway.EventInterruption:
        log.Println("User interrupted")

    case gateway.EventSessionEnded:
        log.Println("Call ended")
        return

    case gateway.EventError:
        log.Printf("Error: %v", event.Error)
    }
}
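Because ranging over `session.Events()` blocks, the loop usually runs in its own goroutine. The sketch below models that pattern with a hypothetical `Event` type whose string `Data` and `EventType` constants stand in for the gateway's `Event*` values; it collects a transcript until a session-ended event arrives or the channel closes.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the gateway's event types.
type EventType string

const (
	EventUserTranscript  EventType = "user_transcript"
	EventAgentTranscript EventType = "agent_transcript"
	EventSessionEnded    EventType = "session_ended"
)

type Event struct {
	Type EventType
	Data string
}

// collectTranscript drains events until the session ends or the channel
// closes, returning the conversation as "Speaker: text" lines.
func collectTranscript(events <-chan Event) []string {
	var lines []string
	for ev := range events {
		switch ev.Type {
		case EventUserTranscript:
			lines = append(lines, "User: "+ev.Data)
		case EventAgentTranscript:
			lines = append(lines, "Agent: "+ev.Data)
		case EventSessionEnded:
			return lines
		}
	}
	return lines
}

func main() {
	events := make(chan Event)
	var transcript []string
	var wg sync.WaitGroup

	// Consume events in a goroutine so the caller is not blocked.
	wg.Add(1)
	go func() {
		defer wg.Done()
		transcript = collectTranscript(events)
	}()

	events <- Event{Type: EventUserTranscript, Data: "What time do you open?"}
	events <- Event{Type: EventAgentTranscript, Data: "We open at nine."}
	events <- Event{Type: EventSessionEnded}
	close(events)

	wg.Wait() // transcript is safe to read after Wait returns
	for _, line := range transcript {
		fmt.Println(line)
	}
}
```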

Session Metrics

metrics := session.Metrics()
log.Printf("Duration: %s", metrics.Duration)
log.Printf("User turns: %d", metrics.UserTurnCount)
log.Printf("Agent turns: %d", metrics.AgentTurnCount)
log.Printf("Interruptions: %d", metrics.InterruptionCount)

Tool Calling (Text Mode)

Define tools for the LLM to call during conversation:

gw, err := gateway.New(gateway.Config{
    // ... other config ...

    Tools: []gateway.ToolDefinition{
        {
            Name:        "get_weather",
            Description: "Get current weather for a location",
            Parameters: map[string]any{
                "type": "object",
                "properties": map[string]any{
                    "location": map[string]any{
                        "type":        "string",
                        "description": "City name",
                    },
                },
                "required": []string{"location"},
            },
        },
        {
            Name:        "schedule_appointment",
            Description: "Schedule an appointment",
            Parameters: map[string]any{
                "type": "object",
                "properties": map[string]any{
                    "date": map[string]any{"type": "string"},
                    "time": map[string]any{"type": "string"},
                },
                "required": []string{"date", "time"},
            },
        },
    },

    ToolHandlers: map[string]gateway.ToolHandler{
        "get_weather": func(ctx context.Context, args map[string]any) (string, error) {
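            location, ok := args["location"].(string)
            if !ok {
                return "", fmt.Errorf("get_weather: missing or non-string location")
            }
            // Call weather API (placeholder response below)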
            return fmt.Sprintf("It's 72°F and sunny in %s", location), nil
        },
        "schedule_appointment": func(ctx context.Context, args map[string]any) (string, error) {
            // Schedule the appointment
            return "Appointment scheduled successfully", nil
        },
    },
})

Greeting Message

Play a greeting when the call connects:

gw, err := gateway.New(gateway.Config{
    // ... other config ...

    Greeting: "Hello! Thank you for calling. How can I help you today?",
})

Session Limits

Configure session behavior:

gw, err := gateway.New(gateway.Config{
    // ... other config ...

    // Maximum call duration (default: 30 minutes)
    MaxSessionDuration: 15 * time.Minute,

    // Interruption handling
    // "immediate" - Stop speaking immediately when user talks
    // "after_sentence" - Finish current sentence
    // "disabled" - Don't allow interruptions
    InterruptionMode: "immediate",
})
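The three `InterruptionMode` values differ in how much of the pending reply is still spoken once the caller starts talking. The sketch below is a hypothetical model of that decision, not the gateway's implementation: given the reply and the byte offset already played, `remainingSpeech` returns what would still be spoken under each mode (ASCII text is assumed so byte offsets do not split characters).

```go
package main

import (
	"fmt"
	"strings"
)

// remainingSpeech models the three InterruptionMode values. reply is the
// full agent reply; spoken is the byte offset already played when the
// caller interrupts. It returns the text that would still be played.
func remainingSpeech(mode, reply string, spoken int) (string, error) {
	// Clamp the offset into range.
	if spoken < 0 {
		spoken = 0
	}
	if spoken > len(reply) {
		spoken = len(reply)
	}
	rest := reply[spoken:]

	switch mode {
	case "immediate":
		// Stop speaking right away.
		return "", nil
	case "after_sentence":
		// Finish the sentence in progress, up to and including its terminator.
		if i := strings.IndexAny(rest, ".!?"); i >= 0 {
			return rest[:i+1], nil
		}
		return rest, nil
	case "disabled":
		// Interruptions are ignored; the rest of the reply is played.
		return rest, nil
	default:
		return "", fmt.Errorf("unknown interruption mode %q", mode)
	}
}

func main() {
	reply := "Our hours are nine to five. We are closed on Sundays."
	for _, mode := range []string{"immediate", "after_sentence", "disabled"} {
		rest, err := remainingSpeech(mode, reply, 10)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-15s %q\n", mode, rest)
	}
}
```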

Choosing a Pipeline Mode

Use Text Mode When:

  • You need custom or cloned voices (ElevenLabs)
  • You require domain-specific STT (medical, legal terminology)
  • You need complex tool calling workflows
  • Latency of 500-1000ms is acceptable
  • You want to mix best-of-breed providers

Use Realtime Mode When:

  • Low latency is critical (<300ms)
  • Natural conversation flow is important
  • OpenAI or Google voices are acceptable
  • You want simpler architecture (single API)

Environment Variables

Text Mode

# Twilio
export TWILIO_ACCOUNT_SID="ACxxxxxxxx"
export TWILIO_AUTH_TOKEN="your-token"

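# STT (choose one)
export DEEPGRAM_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
export ELEVENLABS_API_KEY="your-key"

# LLM (choose one)
export ANTHROPIC_API_KEY="your-key"
export OPENAI_API_KEY="your-key"

# TTS (choose one)
export ELEVENLABS_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
export DEEPGRAM_API_KEY="your-key"

# A key shared by several stages (for example OPENAI_API_KEY) only needs to be exported once.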

Realtime Mode

# Twilio
export TWILIO_ACCOUNT_SID="ACxxxxxxxx"
export TWILIO_AUTH_TOKEN="your-token"

# Realtime provider (choose one)
export OPENAI_API_KEY="your-key"
export GOOGLE_API_KEY="your-key"

Registry Pattern

Instead of direct construction with gateway.New(), you can use omnivoice-core's provider registry for automatic discovery. This is useful when building applications that work with multiple gateway providers.

import (
    "os"

    omnivoice "github.com/plexusone/omnivoice-core"
    "github.com/plexusone/omnivoice-core/registry"
    _ "github.com/plexusone/omni-twilio/omnivoice/gateway" // Auto-register
)

// Get gateway via registry
gw, err := omnivoice.GetGatewayProvider("twilio",
    registry.WithAPIKey(os.Getenv("TWILIO_AUTH_TOKEN")),
    registry.WithExtension("accountSID", os.Getenv("TWILIO_ACCOUNT_SID")),
    registry.WithExtension("phoneNumber", "+15551234567"),
    registry.WithExtension("publicURL", "https://your-server.com"),
    registry.WithExtension("listenAddr", ":8080"),
)
if err != nil {
    panic(err)
}
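The blank import works because a provider package can register itself from an `init` function. The sketch below is a self-contained model of that pattern (a name-to-factory map, `init`-time registration, and functional options). `Register`, `Get`, `Option`, `WithAPIKey`, and `WithExtension` here are hypothetical stand-ins and do not reproduce omnivoice-core's actual registry code.

```go
package main

import (
	"fmt"
	"sync"
)

// Options collects settings supplied through functional options.
type Options struct {
	APIKey     string
	Extensions map[string]any
}

type Option func(*Options)

func WithAPIKey(key string) Option { return func(o *Options) { o.APIKey = key } }

func WithExtension(name string, value any) Option {
	return func(o *Options) { o.Extensions[name] = value }
}

// Gateway is a minimal stand-in for a gateway provider interface.
type Gateway interface{ Name() string }

type Factory func(Options) (Gateway, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// Register is what a provider package would call from its init function.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// Get applies the options and builds the named provider.
func Get(name string, opts ...Option) (Gateway, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("gateway provider %q not registered", name)
	}
	o := Options{Extensions: map[string]any{}}
	for _, opt := range opts {
		opt(&o)
	}
	return f(o)
}

// fakeTwilio plays the role of the blank-imported provider package.
type fakeTwilio struct{ from string }

func (g fakeTwilio) Name() string { return "twilio:" + g.from }

func init() {
	Register("twilio", func(o Options) (Gateway, error) {
		if o.APIKey == "" {
			return nil, fmt.Errorf("twilio: API key required")
		}
		phone, _ := o.Extensions["phoneNumber"].(string)
		return fakeTwilio{from: phone}, nil
	})
}

func main() {
	gw, err := Get("twilio", WithAPIKey("token"), WithExtension("phoneNumber", "+15551234567"))
	if err != nil {
		panic(err)
	}
	fmt.Println(gw.Name()) // twilio:+15551234567
}
```

The design lets application code depend only on the registry and a provider name, while the concrete package is linked in purely for its registration side effect.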

Type-Safe Options

For provider-specific configuration, use the type-safe option functions:

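import (
    "os"

    omnivoice "github.com/plexusone/omnivoice-core"
    coregateway "github.com/plexusone/omnivoice-core/gateway"
    "github.com/plexusone/omnivoice-core/registry"
    openaiRealtime "github.com/plexusone/omni-openai/omnivoice/realtime"
    "github.com/plexusone/omni-twilio/omnivoice/gateway"
)

gw, err := omnivoice.GetGatewayProvider("twilio",
    registry.WithAPIKey(os.Getenv("TWILIO_AUTH_TOKEN")),
    // Type-safe options (handlers and myLLMProvider are values you define;
    // use the options that match your pipeline mode)
    gateway.WithTools([]gateway.ToolDefinition{...}),
    gateway.WithToolHandlers(handlers),
    gateway.WithLLMClient(myLLMProvider),
    gateway.WithRealtimeProviderFactory(openaiRealtime.NewFactory()),
    gateway.WithRealtimeConfig(&coregateway.RealtimeConfig{...}),
)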

Accessing the Underlying Gateway

The registry returns a wrapper type. To access the full Twilio Gateway API:

wrapper, ok := gw.(*gateway.GatewayWrapper)
if !ok {
    panic("registry did not return a Twilio gateway wrapper")
}
twilioGw := wrapper.Gateway()

// Now use Twilio-specific methods
twilioGw.OnCall(func(call *gateway.CallInfo) error {
    return nil
})

Next Steps