Release Notes: v0.12.0

Release Date: 2026-06-14

Highlights

  • Realtime Provider Interface: Unified interface for native voice-to-voice LLMs (OpenAI Realtime, Gemini Live)
  • Audio Format Conversion: Automatic conversion between telephony (mulaw 8kHz) and realtime provider formats (PCM16 16/24kHz)
  • RealtimeBridge: Bridges telephony WebSocket audio to realtime providers, with automatic audio format conversion
  • Dual Pipeline Support: Choose between text (STT→LLM→TTS, ~500-1000ms) or realtime (voice-to-voice, ~100-200ms) modes

New Features

Realtime Provider Interface

The new realtime package provides a unified interface for voice-to-voice LLM providers:

import "github.com/plexusone/omnivoice-core/realtime"

// Provider interface for voice-to-voice processing
type Provider interface {
    ProcessAudioStream(ctx context.Context, audioIn <-chan []byte, config ProcessConfig) (
        audioCh <-chan AudioChunk, transcriptCh <-chan Transcript, err error)
    Name() string
    Close() error
}

Implementations:

  • github.com/plexusone/omni-openai/omnivoice/realtime — OpenAI Realtime API
  • github.com/plexusone/omni-google/omnivoice/realtime — Gemini Live API

Audio Format Package

Predefined audio formats for common telephony gateways and realtime providers:

import "github.com/plexusone/omnivoice-core/audio/format"

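// Predefined formats are package-level values
var (
    telephony = format.Twilio       // mulaw 8kHz mono (Twilio; same parameters as format.Telnyx)
    openAI    = format.OpenAI       // PCM16 24kHz mono (OpenAI Realtime)
    geminiIn  = format.GeminiInput  // PCM16 16kHz mono (Gemini Live input)
    geminiOut = format.GeminiOutput // PCM16 24kHz mono (Gemini Live output)
)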

Audio Converter

Automatic audio format conversion:

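import (
    "github.com/plexusone/omnivoice-core/audio/converter"
    "github.com/plexusone/omnivoice-core/audio/format"
)

// One-shot conversion between any two formats
conv := converter.New()
pcm24k, err := conv.Convert(mulaw8k, format.Twilio, format.OpenAI)
if err != nil {
    return err
}

// Convenience functions for common pairs (errors ignored for brevity)
openAIIn, _ := converter.TwilioToOpenAI(mulawAudio)    // mulaw 8kHz → PCM16 24kHz
twilioOut, _ := converter.OpenAIToTwilio(openAIAudio)  // PCM16 24kHz → mulaw 8kHz
geminiIn, _ := converter.TwilioToGemini(mulawAudio)    // mulaw 8kHz → PCM16 16kHz
twilioOut2, _ := converter.GeminiToTwilio(geminiAudio) // PCM16 24kHz → mulaw 8kHz

// Streaming conversion, one chunk at a time
sc := converter.NewStreamConverter(format.Twilio, format.OpenAI)
for chunk := range audioChunks {
    converted, err := sc.Convert(chunk)
    if err != nil {
        return err
    }
    sendToProvider(converted)
}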
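Converting Twilio audio for OpenAI Realtime takes two steps. First, G.711 mu-law bytes are expanded to linear 16-bit samples. Second, the 8 kHz stream is resampled to 24 kHz. The sketch below shows one way to do both. The function name twilioToOpenAISketch is hypothetical, the little-endian byte order is an assumption about the target format, and linear interpolation stands in for whatever resampler the converter package actually uses.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// mulawDecode expands one G.711 mu-law byte to a linear 16-bit sample.
func mulawDecode(u byte) int16 {
	u = ^u // mu-law bytes are stored bit-inverted
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := u & 0x0F
	magnitude := ((int(mantissa) << 3) + 0x84) << exponent
	magnitude -= 0x84 // remove the encoder bias
	if sign != 0 {
		return int16(-magnitude)
	}
	return int16(magnitude)
}

// upsample3 triples the sample rate (8 kHz -> 24 kHz) by linear
// interpolation between neighbouring samples; the last sample is held.
func upsample3(in []int16) []int16 {
	out := make([]int16, 0, len(in)*3)
	for i, s := range in {
		next := s
		if i+1 < len(in) {
			next = in[i+1]
		}
		for k := 0; k < 3; k++ {
			out = append(out, int16(int(s)+(int(next)-int(s))*k/3))
		}
	}
	return out
}

// twilioToOpenAISketch: mulaw 8 kHz -> PCM16 24 kHz, little-endian bytes.
func twilioToOpenAISketch(mulaw []byte) []byte {
	pcm := make([]int16, len(mulaw))
	for i, b := range mulaw {
		pcm[i] = mulawDecode(b)
	}
	up := upsample3(pcm)
	out := make([]byte, 2*len(up))
	for i, s := range up {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

func main() {
	out := twilioToOpenAISketch([]byte{0xFF, 0x80})
	fmt.Println(len(out)) // 12: 2 input samples * 3 * 2 bytes
}
```

Linear interpolation is cheap and good enough for illustration. A production resampler would normally also apply a low-pass filter to suppress imaging artifacts.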

RealtimeBridge

Bridges telephony WebSocket audio to realtime providers:

import (
    "fmt"

    "github.com/plexusone/omnivoice-core/gateway"
    "github.com/plexusone/omnivoice-core/realtime"
)

// Create bridge for Twilio + OpenAI Realtime
bridge := gateway.NewRealtimeBridgeForTwilio(provider, realtime.ProcessConfig{
    Instructions: "You are a helpful voice assistant...",
    Voice:        "alloy",
})

// Start processing
bridge.Start(ctx)

// Forward telephony audio to the bridge (call for each inbound chunk)
bridge.SendAudio(twilioAudio)

// Monitor events in a separate goroutine so it runs alongside the audio loop
go func() {
    for event := range bridge.Events() {
        switch event.Type {
        case gateway.EventUserTranscript:
            fmt.Println("User:", event.Data)
        case gateway.EventAgentTranscript:
            fmt.Println("Agent:", event.Data)
        }
    }
}()

// Receive converted audio for telephony (blocks until the channel is closed)
for audio := range bridge.AudioOut() {
    sendToTwilio(audio)
}
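The bridge example leaves twilioAudio and sendToTwilio to the caller. With Twilio Media Streams, inbound audio arrives as JSON messages whose media.payload field carries base64-encoded mulaw. Outbound audio goes back in the same shape. The sketch below models only the fields it needs. The helper names and the example stream SID "MZ123" are hypothetical, and the message layout should be checked against Twilio's Media Streams documentation.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Subset of the Twilio Media Streams message shape used in this sketch.
type twilioMedia struct {
	Payload string `json:"payload"` // base64-encoded mulaw 8 kHz audio
}

type twilioMessage struct {
	Event     string       `json:"event"`
	StreamSid string       `json:"streamSid,omitempty"`
	Media     *twilioMedia `json:"media,omitempty"`
}

// inboundAudio returns the raw mulaw bytes of a "media" message, ready for
// bridge.SendAudio; other events such as "start" or "stop" yield ok=false.
func inboundAudio(raw []byte) (audio []byte, ok bool, err error) {
	var msg twilioMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return nil, false, err
	}
	if msg.Event != "media" || msg.Media == nil {
		return nil, false, nil
	}
	audio, err = base64.StdEncoding.DecodeString(msg.Media.Payload)
	if err != nil {
		return nil, false, err
	}
	return audio, true, nil
}

// outboundMessage wraps mulaw audio (e.g. from bridge.AudioOut()) in a
// "media" message addressed to the given stream.
func outboundMessage(streamSid string, mulaw []byte) ([]byte, error) {
	return json.Marshal(twilioMessage{
		Event:     "media",
		StreamSid: streamSid,
		Media:     &twilioMedia{Payload: base64.StdEncoding.EncodeToString(mulaw)},
	})
}

func main() {
	raw := []byte(`{"event":"media","streamSid":"MZ123","media":{"payload":"/38="}}`)
	audio, ok, err := inboundAudio(raw)
	if err != nil || !ok {
		panic("not a media message")
	}
	fmt.Println(audio) // [255 127]
	out, _ := outboundMessage("MZ123", audio)
	fmt.Println(string(out))
}
```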

Pipeline Mode Selection

The gateway now supports two pipeline modes, selected with Config.Mode:

import (
    "os"

    "github.com/plexusone/omnivoice-core/gateway"
)

// Text pipeline: STT → LLM → TTS (~500-1000ms latency)
textCfg := gateway.Config{
    Mode: gateway.PipelineModeText,
    // STT, LLM, TTS configuration...
}

// Realtime pipeline: voice-to-voice (~100-200ms latency)
realtimeCfg := gateway.Config{
    Mode:             gateway.PipelineModeRealtime,
    RealtimeProvider: realtimeFactory, // provider factory, e.g. openaiRealtime.NewFactory()
    RealtimeConfig: &gateway.RealtimeConfig{
        Provider:     "openai",
        APIKey:       os.Getenv("OPENAI_API_KEY"),
        Model:        "gpt-4o-realtime-preview-2024-12-17",
        Voice:        "alloy",
        Instructions: "You are a helpful voice assistant...",
    },
}
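The two modes need different fields, so it can help to validate a config before accepting calls. The following sketch uses hypothetical stand-in types. Which fields the real gateway requires in each mode is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for gateway.PipelineMode, gateway.RealtimeConfig,
// and gateway.Config; the real types carry more fields.
type PipelineMode string

const (
	PipelineModeText     PipelineMode = "text"
	PipelineModeRealtime PipelineMode = "realtime"
)

type RealtimeConfig struct {
	Provider, APIKey, Model, Voice, Instructions string
}

type Config struct {
	Mode             PipelineMode
	RealtimeProvider any // the provider factory; untyped in this sketch
	RealtimeConfig   *RealtimeConfig
}

// validate rejects configs missing what the chosen mode needs, so a
// misconfiguration fails at startup rather than on the first call.
func validate(cfg Config) error {
	switch cfg.Mode {
	case PipelineModeText:
		return nil // STT/LLM/TTS checks omitted from this sketch
	case PipelineModeRealtime:
		if cfg.RealtimeProvider == nil {
			return errors.New("realtime mode requires RealtimeProvider")
		}
		if cfg.RealtimeConfig == nil || cfg.RealtimeConfig.APIKey == "" {
			return errors.New("realtime mode requires RealtimeConfig with an APIKey")
		}
		return nil
	default:
		return fmt.Errorf("unknown pipeline mode %q", cfg.Mode)
	}
}

func main() {
	cfg := Config{Mode: PipelineModeRealtime, RealtimeConfig: &RealtimeConfig{Provider: "openai"}}
	fmt.Println(validate(cfg)) // realtime mode requires RealtimeProvider
}
```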

Multi-Provider Client

A client that automatically falls back from the primary provider to the listed fallback providers, in order:

import "github.com/plexusone/omnivoice-core/realtime"

client := realtime.NewClient(primaryProvider, fallbackProvider1, fallbackProvider2)
audioCh, transcriptCh, err := client.ProcessAudioStream(ctx, audioIn, config)

API Reference

realtime Package

Type                 Description
Provider             Interface for voice-to-voice LLM providers
ProcessConfig        Configuration for audio processing (instructions, voice, functions)
AudioChunk           Audio output from the provider, with a finality flag
Transcript           Text transcript with input/final flags
FunctionDeclaration  Tool/function declaration for the model
Client               Multi-provider client with fallback support

audio/format Package

Constant      Encoding  Sample Rate  Channels
Twilio        mulaw     8000 Hz      1
Telnyx        mulaw     8000 Hz      1
Vonage        PCM16     16000 Hz     1
OpenAI        PCM16     24000 Hz     1
GeminiInput   PCM16     16000 Hz     1
GeminiOutput  PCM16     24000 Hz     1

audio/converter Package

Function                      Description
New()                         Create a new converter
Convert(audio, from, to)      Convert between formats (method on the converter returned by New)
TwilioToOpenAI(audio)         mulaw 8kHz → PCM16 24kHz
OpenAIToTwilio(audio)         PCM16 24kHz → mulaw 8kHz
TwilioToGemini(audio)         mulaw 8kHz → PCM16 16kHz
GeminiToTwilio(audio)         PCM16 24kHz → mulaw 8kHz
NewStreamConverter(from, to)  Create a streaming converter
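The reverse direction, as in OpenAIToTwilio, undoes both steps. It reduces 24 kHz PCM16 to 8 kHz and compresses each sample to G.711 mu-law. In this sketch, averaging each group of three samples is a crude stand-in for a proper anti-aliasing filter. Trailing samples that do not fill a group of three are dropped, whereas a streaming converter would carry them into the next chunk. The function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	muLawBias = 0x84  // 132, added before the segment search (G.711)
	muLawClip = 32635 // clip so that magnitude+bias fits in 15 bits
)

// mulawEncode compresses one linear 16-bit sample to a G.711 mu-law byte.
func mulawEncode(s int16) byte {
	sample := int(s)
	sign := 0
	if sample < 0 {
		sample = -sample
		sign = 0x80
	}
	if sample > muLawClip {
		sample = muLawClip
	}
	sample += muLawBias
	// The segment (exponent) is the position of the highest set bit above bit 7.
	exponent := 7
	for mask := 0x4000; sample&mask == 0 && exponent > 0; mask >>= 1 {
		exponent--
	}
	mantissa := (sample >> (exponent + 3)) & 0x0F
	return ^byte(sign | exponent<<4 | mantissa) // stored bit-inverted
}

// downsample3 reduces 24 kHz to 8 kHz by averaging each group of three samples.
func downsample3(in []int16) []int16 {
	out := make([]int16, 0, len(in)/3)
	for i := 0; i+2 < len(in); i += 3 {
		sum := int(in[i]) + int(in[i+1]) + int(in[i+2])
		out = append(out, int16(sum/3))
	}
	return out
}

// openAIToTwilioSketch: PCM16 24 kHz little-endian bytes -> mulaw 8 kHz.
func openAIToTwilioSketch(pcm []byte) []byte {
	samples := make([]int16, len(pcm)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(pcm[2*i:]))
	}
	down := downsample3(samples)
	out := make([]byte, len(down))
	for i, s := range down {
		out[i] = mulawEncode(s)
	}
	return out
}

func main() {
	pcm := make([]byte, 12)                // 6 silent PCM16 samples at 24 kHz
	fmt.Println(openAIToTwilioSketch(pcm)) // [255 255]
}
```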

gateway Package Additions

Type                     Description
PipelineMode             Pipeline mode: "text" or "realtime"
RealtimeConfig           Realtime provider configuration
RealtimeProviderFactory  Factory for creating realtime providers
RealtimeBridge           Bridge between telephony and a realtime provider
AudioFormat              Audio encoding/sample rate descriptor
AudioConverter           Interface for format conversion
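A provider factory lets the gateway create a provider from configuration, such as RealtimeConfig.Provider set to "openai", rather than from a concrete provider value. The sketch below shows one common way to structure such a name-keyed registry. Register, NewProvider, and the Factory signature are hypothetical, and the actual RealtimeProviderFactory type may look quite different.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical stand-ins: Provider is reduced to Name(), and Factory plays
// the role of a realtime provider factory keyed by RealtimeConfig.Provider.
type Provider interface{ Name() string }

type RealtimeConfig struct {
	Provider, APIKey, Voice string
}

type Factory func(cfg RealtimeConfig) (Provider, error)

var factories = map[string]Factory{}

// Register makes a factory available under a provider name such as "openai".
func Register(name string, f Factory) { factories[name] = f }

// NewProvider builds a provider with the factory registered for cfg.Provider.
func NewProvider(cfg RealtimeConfig) (Provider, error) {
	f, ok := factories[cfg.Provider]
	if !ok {
		names := make([]string, 0, len(factories))
		for name := range factories {
			names = append(names, name)
		}
		sort.Strings(names) // deterministic error message
		return nil, fmt.Errorf("unknown realtime provider %q (registered: %v)", cfg.Provider, names)
	}
	return f(cfg)
}

// namedProvider is a trivial Provider used for the demo.
type namedProvider string

func (n namedProvider) Name() string { return string(n) }

func main() {
	Register("openai", func(cfg RealtimeConfig) (Provider, error) {
		return namedProvider("openai/" + cfg.Voice), nil
	})
	p, err := NewProvider(RealtimeConfig{Provider: "openai", Voice: "alloy"})
	fmt.Println(p.Name(), err) // openai/alloy <nil>
	_, err = NewProvider(RealtimeConfig{Provider: "gemini"})
	fmt.Println(err) // unknown realtime provider "gemini" (registered: [openai])
}
```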

Installation

go get github.com/plexusone/omnivoice-core@v0.12.0

Migration Guide

From v0.11.0

No breaking changes. The realtime package is additive.

  1. Update the dependency:
go get github.com/plexusone/omnivoice-core@v0.12.0
  2. To use realtime mode, import a realtime provider and configure the gateway:
import (
    "os"

    "github.com/plexusone/omnivoice-core/gateway"
    openaiRealtime "github.com/plexusone/omni-openai/omnivoice/realtime"
)

factory := openaiRealtime.NewFactory()
cfg := gateway.Config{
    Mode:             gateway.PipelineModeRealtime,
    RealtimeProvider: factory,
    RealtimeConfig: &gateway.RealtimeConfig{
        Provider: "openai",
        APIKey:   os.Getenv("OPENAI_API_KEY"),
        Voice:    "alloy",
    },
}
  3. Existing text pipeline code continues to work unchanged.

Full Changelog

See CHANGELOG.md for the complete list of changes.