# Release Notes: v0.12.0
Release Date: 2026-06-14
## Highlights
- Realtime Provider Interface: Unified interface for native voice-to-voice LLMs (OpenAI Realtime, Gemini Live)
- Audio Format Conversion: Automatic conversion between telephony (mulaw 8kHz) and realtime provider formats (PCM16 16/24kHz)
- RealtimeBridge: Bridges telephony WebSocket audio to realtime providers, converting audio formats in both directions
- Dual Pipeline Support: Choose between text (STT→LLM→TTS, ~500-1000ms) or realtime (voice-to-voice, ~100-200ms) modes
## New Features

### Realtime Provider Interface

The new `realtime` package provides a unified interface for voice-to-voice LLM providers:
```go
import "github.com/plexusone/omnivoice-core/realtime"

// Provider interface for voice-to-voice processing
type Provider interface {
	ProcessAudioStream(ctx context.Context, audioIn <-chan []byte, config ProcessConfig) (
		audioCh <-chan AudioChunk, transcriptCh <-chan Transcript, err error)
	Name() string
	Close() error
}
```
Implementations:

- `github.com/plexusone/omni-openai/omnivoice/realtime`: OpenAI Realtime API
- `github.com/plexusone/omni-google/omnivoice/realtime`: Gemini Live API
### Audio Format Package

Predefined audio formats for common voice gateways and providers:

```go
import "github.com/plexusone/omnivoice-core/audio/format"

var (
	twilio       = format.Twilio       // mulaw 8kHz mono (Twilio, Telnyx)
	openAI       = format.OpenAI       // PCM16 24kHz mono (OpenAI Realtime)
	geminiInput  = format.GeminiInput  // PCM16 16kHz mono (Gemini Live input)
	geminiOutput = format.GeminiOutput // PCM16 24kHz mono (Gemini Live output)
)
```
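Each format boils down to an encoding, a sample rate, and a channel count, which is enough to size audio frames. The sketch below uses hypothetical local types (`Encoding`, `AudioFormat`, `FrameBytes`) rather than the real `format` package, whose field names these notes do not show, and assumes 20 ms frames, a common telephony packet duration.

```go
package main

// Encoding is a hypothetical stand-in for the format package's encoding field.
type Encoding int

const (
	Mulaw Encoding = iota // 8-bit G.711 mu-law
	PCM16                 // 16-bit linear PCM
)

// AudioFormat is a hypothetical descriptor: encoding, sample rate, channel count.
type AudioFormat struct {
	Encoding   Encoding
	SampleRate int // Hz
	Channels   int
}

// Values taken from the format table in these notes.
var (
	Twilio       = AudioFormat{Encoding: Mulaw, SampleRate: 8000, Channels: 1}
	OpenAI       = AudioFormat{Encoding: PCM16, SampleRate: 24000, Channels: 1}
	GeminiInput  = AudioFormat{Encoding: PCM16, SampleRate: 16000, Channels: 1}
	GeminiOutput = AudioFormat{Encoding: PCM16, SampleRate: 24000, Channels: 1}
)

// BytesPerSample: mu-law stores one sample per byte, PCM16 uses two.
func (f AudioFormat) BytesPerSample() int {
	if f.Encoding == PCM16 {
		return 2
	}
	return 1
}

// FrameBytes returns the payload size of a frame lasting ms milliseconds.
func (f AudioFormat) FrameBytes(ms int) int {
	return f.SampleRate * ms / 1000 * f.Channels * f.BytesPerSample()
}
```

By this arithmetic, one 20 ms Twilio frame (160 bytes of mulaw) becomes 960 bytes once converted to OpenAI's PCM16 24kHz: three times as many samples, at two bytes each.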
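### Audio Converter

Converts audio between formats, either one buffer at a time or as a stream:

```go
import (
	"github.com/plexusone/omnivoice-core/audio/converter"
	"github.com/plexusone/omnivoice-core/audio/format"
)

// One-shot conversion between any two formats
conv := converter.New()
pcm24k, err := conv.Convert(mulaw8k, format.Twilio, format.OpenAI)
if err != nil {
	return err
}

// Convenience functions for common pairs (errors ignored for brevity)
toOpenAI, _ := converter.TwilioToOpenAI(mulawAudio)    // mulaw 8kHz → PCM16 24kHz
fromOpenAI, _ := converter.OpenAIToTwilio(openAIAudio) // PCM16 24kHz → mulaw 8kHz
toGemini, _ := converter.TwilioToGemini(mulawAudio)    // mulaw 8kHz → PCM16 16kHz
fromGemini, _ := converter.GeminiToTwilio(geminiAudio) // PCM16 24kHz → mulaw 8kHz

// Streaming conversion, chunk by chunk
sc := converter.NewStreamConverter(format.Twilio, format.OpenAI)
for chunk := range audioChunks {
	converted, err := sc.Convert(chunk)
	if err != nil {
		return err
	}
	providerIn <- converted // forward to the realtime provider
}
```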
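To show what a mulaw 8kHz → PCM16 24kHz conversion involves, here is a self-contained sketch of the three steps: G.711 μ-law decoding, 3× upsampling by linear interpolation, and PCM16 serialization. The little-endian byte order and the function names are assumptions, and this is not the `converter` package's implementation; a production resampler would normally apply a proper low-pass filter rather than plain linear interpolation.

```go
package main

import "encoding/binary"

// mulawDecode expands one G.711 mu-law byte into a linear 16-bit sample.
func mulawDecode(u byte) int16 {
	u = ^u // mu-law bytes are stored bit-inverted
	exponent := (u >> 4) & 0x07
	mantissa := int(u & 0x0F)
	sample := ((mantissa << 3) + 0x84) << exponent // 0x84 is the G.711 bias
	sample -= 0x84
	if u&0x80 != 0 { // sign bit set: negative sample
		return int16(-sample)
	}
	return int16(sample)
}

// upsample3x triples the sample rate (8 kHz → 24 kHz) by linear interpolation.
func upsample3x(in []int16) []int16 {
	out := make([]int16, 0, len(in)*3)
	for i, cur := range in {
		next := cur // the last sample is held
		if i+1 < len(in) {
			next = in[i+1]
		}
		c, n := int(cur), int(next)
		out = append(out, cur, int16(c+(n-c)/3), int16(c+2*(n-c)/3))
	}
	return out
}

// twilioToOpenAISketch converts mu-law 8 kHz to little-endian PCM16 24 kHz bytes.
func twilioToOpenAISketch(mulaw []byte) []byte {
	pcm := make([]int16, len(mulaw))
	for i, b := range mulaw {
		pcm[i] = mulawDecode(b)
	}
	up := upsample3x(pcm)
	out := make([]byte, 2*len(up))
	for i, s := range up {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}
```

Each input byte becomes three 16-bit samples, so the output is six times the input length in bytes.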
### RealtimeBridge

`RealtimeBridge` connects telephony WebSocket audio to a realtime provider, converting audio formats in both directions:

```go
import (
	"fmt"

	"github.com/plexusone/omnivoice-core/gateway"
	"github.com/plexusone/omnivoice-core/realtime"
)

// Create a bridge for Twilio + OpenAI Realtime
bridge := gateway.NewRealtimeBridgeForTwilio(provider, realtime.ProcessConfig{
	Instructions: "You are a helpful voice assistant...",
	Voice:        "alloy",
})

// Start processing
bridge.Start(ctx)

// Monitor events in a separate goroutine so they don't block the audio loop
go func() {
	for event := range bridge.Events() {
		switch event.Type {
		case gateway.EventUserTranscript:
			fmt.Println("User:", event.Data)
		case gateway.EventAgentTranscript:
			fmt.Println("Agent:", event.Data)
		}
	}
}()

// Forward incoming telephony audio (mulaw 8kHz) to the bridge
bridge.SendAudio(twilioAudio)

// Receive audio converted back to the telephony format
for audio := range bridge.AudioOut() {
	sendToTwilio(audio)
}
```
### Pipeline Mode Selection

The gateway now supports two pipeline modes, selected with `Config.Mode`:

```go
import (
	"os"

	"github.com/plexusone/omnivoice-core/gateway"
)

// Text pipeline: STT → LLM → TTS (~500-1000ms latency)
textCfg := gateway.Config{
	Mode: gateway.PipelineModeText,
	// STT, LLM, TTS configuration...
}

// Realtime pipeline: voice-to-voice (~100-200ms latency)
realtimeCfg := gateway.Config{
	Mode:             gateway.PipelineModeRealtime,
	RealtimeProvider: realtimeFactory,
	RealtimeConfig: &gateway.RealtimeConfig{
		Provider:     "openai",
		APIKey:       os.Getenv("OPENAI_API_KEY"),
		Model:        "gpt-4o-realtime-preview-2024-12-17",
		Voice:        "alloy",
		Instructions: "You are a helpful voice assistant...",
	},
}
```
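Because `Mode` decides which components a config needs, it helps to think of mode selection as a small validation step. The sketch below uses local stand-ins for the gateway types and a hypothetical `resolveMode` helper. Treating an empty `Mode` as text is an assumption, chosen because the migration notes say existing text-pipeline code keeps working; requiring an `APIKey` in realtime mode is likewise illustrative rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

type PipelineMode string

const (
	PipelineModeText     PipelineMode = "text"
	PipelineModeRealtime PipelineMode = "realtime"
)

// Stand-ins for gateway.RealtimeConfig and gateway.Config (fields trimmed).
type RealtimeConfig struct {
	Provider, APIKey, Model, Voice, Instructions string
}

type Config struct {
	Mode           PipelineMode
	RealtimeConfig *RealtimeConfig
}

// resolveMode returns the effective pipeline mode or a configuration error.
func resolveMode(cfg Config) (PipelineMode, error) {
	switch cfg.Mode {
	case "", PipelineModeText: // empty Mode assumed to mean text
		return PipelineModeText, nil
	case PipelineModeRealtime:
		if cfg.RealtimeConfig == nil || cfg.RealtimeConfig.APIKey == "" {
			return "", errors.New("realtime mode requires RealtimeConfig with an APIKey")
		}
		return PipelineModeRealtime, nil
	default:
		return "", fmt.Errorf("unknown pipeline mode %q", cfg.Mode)
	}
}
```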
### Multi-Provider Client

`realtime.Client` wraps several realtime providers with automatic fallback, starting with the primary provider:

```go
import "github.com/plexusone/omnivoice-core/realtime"

// Primary provider first, then fallbacks
client := realtime.NewClient(primaryProvider, fallbackProvider1, fallbackProvider2)

audioCh, transcriptCh, err := client.ProcessAudioStream(ctx, audioIn, config)
if err != nil {
	return err
}
```
## API Reference

### `realtime` Package

| Type | Description |
|---|---|
| `Provider` | Interface for voice-to-voice LLM providers |
| `ProcessConfig` | Configuration for audio processing (instructions, voice, functions) |
| `AudioChunk` | Audio output from the provider, with a finality flag |
| `Transcript` | Text transcript with input/final flags |
| `FunctionDeclaration` | Tool/function declaration for the model |
| `Client` | Multi-provider client with fallback support |
### `audio/format` Package

| Constant | Encoding | Sample Rate | Channels |
|---|---|---|---|
| `Twilio` | mulaw | 8000 Hz | 1 |
| `Telnyx` | mulaw | 8000 Hz | 1 |
| `Vonage` | PCM16 | 16000 Hz | 1 |
| `OpenAI` | PCM16 | 24000 Hz | 1 |
| `GeminiInput` | PCM16 | 16000 Hz | 1 |
| `GeminiOutput` | PCM16 | 24000 Hz | 1 |
### `audio/converter` Package

| Function | Description |
|---|---|
| `New()` | Create a new converter |
| `Convert(audio, from, to)` | Convert between formats |
| `TwilioToOpenAI(audio)` | mulaw 8kHz → PCM16 24kHz |
| `OpenAIToTwilio(audio)` | PCM16 24kHz → mulaw 8kHz |
| `TwilioToGemini(audio)` | mulaw 8kHz → PCM16 16kHz |
| `GeminiToTwilio(audio)` | PCM16 24kHz → mulaw 8kHz |
| `NewStreamConverter(from, to)` | Create a streaming converter |
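The reverse direction (`OpenAIToTwilio`, and likewise `GeminiToTwilio`, since both start from PCM16 24kHz) decimates 3:1 and re-encodes to μ-law. Here is a self-contained sketch, assuming little-endian PCM16 input; averaging each group of three samples is a crude low-pass filter chosen for brevity, and the function names are hypothetical rather than the `converter` package's internals.

```go
package main

import "encoding/binary"

// mulawEncode compresses a linear 16-bit sample into one G.711 mu-law byte.
func mulawEncode(s int16) byte {
	const bias, clip = 0x84, 32635
	sample := int(s)
	sign := 0
	if sample < 0 {
		sample = -sample
		sign = 0x80
	}
	if sample > clip {
		sample = clip
	}
	sample += bias
	// The exponent is the position of the highest set bit among bits 14..7.
	exponent := 7
	for mask := 0x4000; sample&mask == 0 && exponent > 0; mask >>= 1 {
		exponent--
	}
	mantissa := (sample >> (exponent + 3)) & 0x0F
	return ^byte(sign | exponent<<4 | mantissa) // stored bit-inverted
}

// openAIToTwilioSketch turns little-endian PCM16 at 24 kHz into mu-law at 8 kHz
// by averaging each group of three samples and encoding the result.
func openAIToTwilioSketch(pcm []byte) []byte {
	n := len(pcm) / 2 // an odd trailing byte is ignored
	out := make([]byte, 0, (n+2)/3)
	for i := 0; i < n; i += 3 {
		sum, count := 0, 0
		for j := i; j < i+3 && j < n; j++ {
			sum += int(int16(binary.LittleEndian.Uint16(pcm[2*j:])))
			count++
		}
		out = append(out, mulawEncode(int16(sum/count)))
	}
	return out
}
```

Every six input bytes (three PCM16 samples) collapse to one mulaw byte; a trailing partial group is averaged over the samples it has.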
### `gateway` Package Additions

| Type | Description |
|---|---|
| `PipelineMode` | `"text"` or `"realtime"` |
| `RealtimeConfig` | Realtime provider configuration |
| `RealtimeProviderFactory` | Factory for creating realtime providers |
| `RealtimeBridge` | Bridge between telephony and a realtime provider |
| `AudioFormat` | Audio encoding/sample rate descriptor |
| `AudioConverter` | Interface for format conversion |
## Installation

Install or upgrade with `go get github.com/plexusone/omnivoice-core@v0.12.0`.
## Migration Guide

### From v0.11.0

No breaking changes. The `realtime` package is additive.
- Update the `omnivoice-core` dependency to v0.12.0.
- To use realtime mode, import a realtime provider implementation and pass its factory in the gateway config:

  ```go
  import (
  	"os"

  	"github.com/plexusone/omnivoice-core/gateway"
  	openaiRealtime "github.com/plexusone/omni-openai/omnivoice/realtime"
  )

  factory := openaiRealtime.NewFactory()
  cfg := gateway.Config{
  	Mode:             gateway.PipelineModeRealtime,
  	RealtimeProvider: factory,
  	RealtimeConfig: &gateway.RealtimeConfig{
  		Provider: "openai",
  		APIKey:   os.Getenv("OPENAI_API_KEY"),
  		Voice:    "alloy",
  	},
  }
  ```

- Existing text pipeline code continues to work unchanged.
## Full Changelog

See `CHANGELOG.md` for the complete list of changes.