Changelog¶

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. Commits follow Conventional Commits, and this changelog is generated by Structured Changelog.

Unreleased¶

v0.14.0 - 2026-06-15¶

Highlights¶

Gateway provider registry for voice gateways (Twilio, Telnyx)
Realtime provider registry for native voice-to-voice (OpenAI Realtime, Gemini Live)
Side-effect registration via init() for provider discovery

Added¶

RegisterGatewayProvider() for registering voice gateway providers (609dc45)
GetGatewayProvider() for retrieving registered gateway providers (609dc45)
ListGatewayProviders() for discovering available gateway providers (609dc45)
HasGatewayProvider() for checking gateway registration (609dc45)
RegisterRealtimeProvider() for registering native voice-to-voice providers (609dc45)
GetRealtimeProvider() for retrieving registered realtime providers (609dc45)
ListRealtimeProviders() for discovering available realtime providers (609dc45)
HasRealtimeProvider() for checking realtime registration (609dc45)
registry.Gateway interface for gateway provider implementations (609dc45)
registry.RealtimeProvider interface for realtime provider implementations (609dc45)
registry.GatewayProviderFactory and registry.RealtimeProviderFactory types (609dc45)
WithListener() option for pre-configured gateway listeners (609dc45)
WithPublicURL() option for gateway webhook callbacks (609dc45)
WithListenAddr() option for gateway server addresses (609dc45)
WithConnectionID() option for Telnyx connection IDs (609dc45)
WithVoice() option for realtime provider voice selection (609dc45)
WithModel() option for realtime provider model selection (609dc45)
WithInstructions() option for realtime provider system prompts (609dc45)
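
The registry functions above follow the usual Go pattern of side-effect registration: a provider package calls a register function from its init(), and applications discover it by name. The sketch below illustrates that pattern only. The Gateway and GatewayFactory shapes, the single-file layout, and the fakeTwilio type are assumptions made for illustration; the real registry.Gateway and registry.GatewayProviderFactory types may differ.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Gateway is a hypothetical stand-in for the registry.Gateway interface.
type Gateway interface {
	Name() string
}

// GatewayFactory is an assumed factory shape; the real
// registry.GatewayProviderFactory may accept options.
type GatewayFactory func() (Gateway, error)

var (
	mu        sync.RWMutex
	factories = map[string]GatewayFactory{}
)

// RegisterGatewayProvider stores a factory under a provider name.
func RegisterGatewayProvider(name string, f GatewayFactory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// HasGatewayProvider reports whether a provider name is registered.
func HasGatewayProvider(name string) bool {
	mu.RLock()
	defer mu.RUnlock()
	_, ok := factories[name]
	return ok
}

// ListGatewayProviders returns the registered names in sorted order.
func ListGatewayProviders() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for n := range factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// GetGatewayProvider builds a gateway from the registered factory.
func GetGatewayProvider(name string) (Gateway, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("gateway provider %q not registered", name)
	}
	return f()
}

// fakeTwilio is a hypothetical provider used only for this sketch.
type fakeTwilio struct{}

func (fakeTwilio) Name() string { return "twilio" }

// In a real provider module this init() would live in the provider
// package, so a blank import is enough to register it.
func init() {
	RegisterGatewayProvider("twilio", func() (Gateway, error) {
		return fakeTwilio{}, nil
	})
}

func main() {
	fmt.Println(ListGatewayProviders())
	gw, err := GetGatewayProvider("twilio")
	if err != nil {
		panic(err)
	}
	fmt.Println(gw.Name())
}
```

Keeping registration in init() means the application never references a provider type directly, so adding a provider is a matter of adding one blank import.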

v0.13.0 - 2026-06-15¶

Highlights¶

Typed Encoding system for audio format handling
Encoding normalization handles pcm16, ulaw, wav, and other aliases
IsRawEncoding() for detecting formats that need explicit parameters

Added¶

format.Encoding type with constants: Linear16, MuLaw, ALaw, MP3, Opus, FLAC, AAC, Speex, WebM (48d4fe4)
Encoding.Normalize() method for canonical encoding names with alias mapping (48d4fe4)
Encoding.IsRaw() method to detect raw audio formats (48d4fe4)
IsRawEncoding(string) convenience function (48d4fe4)
WAV normalization to Linear16 (WAV is a container for raw PCM) (5451f73)
Speex and WebM encoding constants (5451f73)
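
As a rough illustration of alias normalization, the sketch below maps the aliases named in this release (pcm16, ulaw, wav) onto canonical constants. The canonical string values, the alias table, and the choice of which encodings count as raw are assumptions for this example, not the actual contents of the format package.

```go
package main

import (
	"fmt"
	"strings"
)

// Encoding is a typed audio encoding name.
type Encoding string

// Canonical values are assumed for illustration.
const (
	Linear16 Encoding = "linear16"
	MuLaw    Encoding = "mulaw"
	ALaw     Encoding = "alaw"
	MP3      Encoding = "mp3"
)

// aliases maps lower-cased input names to canonical encodings.
var aliases = map[string]Encoding{
	"linear16": Linear16,
	"pcm16":    Linear16,
	"wav":      Linear16, // WAV is treated as a container for raw PCM
	"mulaw":    MuLaw,
	"ulaw":     MuLaw,
	"alaw":     ALaw,
	"mp3":      MP3,
}

// Normalize returns the canonical encoding for a name or alias.
// Unknown names are returned lower-cased and trimmed.
func (e Encoding) Normalize() Encoding {
	key := strings.ToLower(strings.TrimSpace(string(e)))
	if canon, ok := aliases[key]; ok {
		return canon
	}
	return Encoding(key)
}

// IsRaw reports whether the encoding is headerless audio, which
// needs an explicit sample rate and channel count to be decoded.
func (e Encoding) IsRaw() bool {
	switch e.Normalize() {
	case Linear16, MuLaw, ALaw:
		return true
	}
	return false
}

// IsRawEncoding is the string convenience wrapper.
func IsRawEncoding(s string) bool { return Encoding(s).IsRaw() }

func main() {
	for _, name := range []string{"pcm16", "ULAW", "wav", "mp3"} {
		e := Encoding(name)
		fmt.Printf("%-6s -> %-9s raw=%v\n", name, e.Normalize(), e.IsRaw())
	}
}
```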

Changed¶

audio/converter uses format.Encoding type for codec selection (48d4fe4)
Predefined AudioFormat vars use encoding constants instead of strings (48d4fe4)

Tests¶

Comprehensive tests for Encoding.Normalize() with all aliases (48d4fe4)
Tests for IsRawEncoding() with raw and container formats (48d4fe4)

v0.12.1 - 2026-06-14¶

Highlights¶

Internal refactoring to reduce code duplication
Generic provider.Client[T] for multi-provider management

Added¶

format.PCM16_24kHz and format.PCM16_16kHz aliases for provider-agnostic usage (33b946a)

v0.12.0 - 2026-06-14¶

Highlights¶

Realtime Provider interface for native voice-to-voice LLMs (OpenAI Realtime, Gemini Live)
Audio format conversion between telephony (mulaw 8kHz) and realtime providers (PCM16 16/24kHz)
RealtimeBridge for seamless telephony-to-realtime provider integration
Dual pipeline modes: text (STT→LLM→TTS) or realtime (voice-to-voice)

Added¶

realtime package with Provider interface for voice-to-voice LLM providers (55c873a)
ProcessConfig with instructions, voice, functions, and callbacks (55c873a)
AudioChunk and Transcript types for provider output (55c873a)
FunctionDeclaration for model function calling (55c873a)
Client for multi-provider management with automatic fallback (226915d)
audio/format package with common audio format constants (Twilio, OpenAI, Gemini) (4d1b2e2)
audio/converter package for audio format conversion (4e4d1b6)
TwilioToOpenAI(), OpenAIToTwilio(), TwilioToGemini(), GeminiToTwilio() convenience functions (4e4d1b6)
StreamConverter for efficient streaming audio conversion (4e4d1b6)
PipelineMode type with PipelineModeText and PipelineModeRealtime constants (235584e)
RealtimeConfig for configuring realtime providers (235584e)
RealtimeProviderFactory interface for creating realtime providers from config (235584e)
RealtimeBridge for bridging telephony WebSocket to realtime providers (df38137)
NewRealtimeBridgeForTwilio() and NewRealtimeBridgeForTwilioGemini() convenience constructors (df38137)

Tests¶

Unit tests for realtime.Provider interface and Client (c009f3d)
Tests for audio/converter with codec and sample rate conversion (4e4d1b6)
Tests for RealtimeBridge lifecycle and event handling (df38137)

v0.11.0 - 2026-06-13¶

Highlights¶

Global provider registry with priority-based registration
Architecture documentation for provider implementation

Added¶

Global RegisterSTTProvider(), RegisterTTSProvider(), RegisterCallSystemProvider() functions at package level (49c11e3)
GetSTTProvider(), GetTTSProvider(), GetCallSystemProvider() for retrieving registered providers (49c11e3)
ListSTTProviders(), ListTTSProviders(), ListCallSystemProviders() for discovering available providers (49c11e3)
HasSTTProvider(), HasTTSProvider(), HasCallSystemProvider() for checking registration (49c11e3)
GetSTTProviderPriority(), GetTTSProviderPriority(), GetCallSystemProviderPriority() for querying priority (49c11e3)
PriorityThin (0) and PriorityThick (10) constants for provider layering (49c11e3)
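
One plausible reading of priority-based registration is that a higher-priority registration for a name wins over a lower-priority one, regardless of registration order. The sketch below illustrates that reading under stated assumptions: providers are represented as plain strings instead of factories, equal priorities let the later registration win, and the map is not safe for concurrent use. The changelog does not confirm any of these details.

```go
package main

import "fmt"

// Priority constants as listed in this release.
const (
	PriorityThin  = 0
	PriorityThick = 10
)

// registration pairs an implementation with its priority. The impl
// field is a string stand-in for a real provider factory.
type registration struct {
	impl     string
	priority int
}

var sttProviders = map[string]registration{}

// RegisterSTTProvider keeps the existing entry when it has a strictly
// higher priority; otherwise the new registration replaces it.
func RegisterSTTProvider(name, impl string, priority int) {
	if cur, ok := sttProviders[name]; ok && cur.priority > priority {
		return
	}
	sttProviders[name] = registration{impl: impl, priority: priority}
}

// GetSTTProvider returns the winning implementation for a name.
func GetSTTProvider(name string) (string, bool) {
	r, ok := sttProviders[name]
	return r.impl, ok
}

// GetSTTProviderPriority returns the priority of the winning entry.
func GetSTTProviderPriority(name string) (int, bool) {
	r, ok := sttProviders[name]
	return r.priority, ok
}

func main() {
	// Registration order does not matter: the thick entry wins.
	RegisterSTTProvider("deepgram", "thick SDK-based client", PriorityThick)
	RegisterSTTProvider("deepgram", "thin HTTP client", PriorityThin)

	impl, _ := GetSTTProvider("deepgram")
	prio, _ := GetSTTProviderPriority("deepgram")
	fmt.Println(impl, prio)
}
```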

Documentation¶

CLAUDE.md with provider registry architecture and implementation guidelines (51f6883)
Dependency architecture diagram (omnivoice-core → providers → omnivoice) (51f6883)
Step-by-step guide for adding new providers (51f6883)

v0.10.0 - 2026-06-13¶

Highlights¶

Voice Gateway interface for provider-agnostic PSTN and WebRTC communication
Session persistence with in-memory and Redis backends
Barge-in detection for user interruption handling

Added¶

gateway package with Gateway interface for PSTN telephony providers (Twilio, Telnyx, Vonage, Plivo) (8f375df)
WebRTCGateway interface for browser/mobile voice (LiveKit, Daily) (8f375df)
Session and WebRTCSession interfaces for active call management (8f375df)
Event types for call lifecycle: SessionStarted, SessionEnded, UserSpeechStart, AgentSpeechStart, Interruption, etc. (8f375df)
storage package with SessionStore interface for call state persistence (b1c8cf9)
MemoryStore for in-process session storage (b1c8cf9)
RedisStore for distributed session storage with configurable TTL (b1c8cf9)
SessionState type with conversation history, metrics, and recovery data (b1c8cf9)
bargein package with Detector for user interruption detection (0ab0e4b)
Interruption modes: ModeImmediate, ModeAfterSentence, ModeDisabled (0ab0e4b)
Automatic TTS cancellation on user speech detection (0ab0e4b)
MulawToWAV() function in audio/codec for WAV container encoding (4a34433)

Changed¶

Twilio example moved to omni-twilio repository (26d2703)

Dependencies¶

Added github.com/redis/go-redis/v9 for Redis session storage (b1c8cf9)
Bump github.com/grokify/mogo from 0.74.3 to 0.74.6 (e22b0e9)

Documentation¶

Voice architecture guide with system design overview (b2b6215)
Barge-in detection usage guide (b2b6215)
Session storage configuration guide (b2b6215)

Tests¶

Storage interface conformance tests for MemoryStore and RedisStore (b1c8cf9)
Barge-in detector tests for interruption modes and timing (0ab0e4b)

v0.9.0 - 2026-05-02¶

Highlights¶

Canonical Transcript format for STT output with embedded JSON Schema
DurationMilliseconds type for JSON-friendly duration serialization

Added¶

Canonical Transcript type in stt package for standardized STT output across providers (af49b8b)
TranscriptSegment and TranscriptWord types with timing information (af49b8b)
TranscriptMetadata for provenance (provider, model, options) (af49b8b)
NewTranscript() constructor from TranscriptionResult (af49b8b)
LoadTranscript() and SaveJSON() for file I/O (af49b8b)
TotalDuration(), SegmentDuration(), WordDuration() convenience methods (af49b8b)
schema package with embedded TranscriptV1Schema JSON Schema (d794eb0)
Duration fields use duration.DurationMilliseconds for integer millisecond JSON serialization (af49b8b)
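
To show what integer millisecond serialization looks like in practice, here is a minimal sketch of a duration type with custom JSON methods. The Millis type, the Word struct, and the JSON field names are hypothetical; they are not the DurationMilliseconds type from github.com/grokify/mogo or the transcript schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Millis is a hypothetical duration type that serializes to JSON as
// an integer number of milliseconds.
type Millis time.Duration

// MillisOf builds a Millis from a millisecond count.
func MillisOf(ms int64) Millis {
	return Millis(time.Duration(ms) * time.Millisecond)
}

// MarshalJSON writes the duration as whole milliseconds.
func (m Millis) MarshalJSON() ([]byte, error) {
	return json.Marshal(time.Duration(m).Milliseconds())
}

// UnmarshalJSON reads an integer millisecond count.
func (m *Millis) UnmarshalJSON(b []byte) error {
	var ms int64
	if err := json.Unmarshal(b, &ms); err != nil {
		return err
	}
	*m = MillisOf(ms)
	return nil
}

// Word is an illustrative timed token; field names are assumptions.
type Word struct {
	Text  string `json:"text"`
	Start Millis `json:"start_ms"`
	End   Millis `json:"end_ms"`
}

// encodeWord and decodeWord wrap the JSON round trip.
func encodeWord(w Word) string {
	out, err := json.Marshal(w)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func decodeWord(s string) (Word, error) {
	var w Word
	err := json.Unmarshal([]byte(s), &w)
	return w, err
}

func main() {
	w := Word{Text: "hello", Start: MillisOf(250), End: MillisOf(900)}
	// prints {"text":"hello","start_ms":250,"end_ms":900}
	fmt.Println(encodeWord(w))
}
```

Integer milliseconds avoid both the nanosecond integers that time.Duration produces by default and the parsing ambiguity of duration strings.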

Dependencies¶

Added github.com/grokify/mogo v0.74.3 for DurationMilliseconds type (0f83fc8)

Tests¶

TestNewTranscript validates transcript creation from transcription results (92755a8)
TestTranscriptJSONRoundTrip validates JSON serialization and deserialization (92755a8)

v0.8.0 - 2026-04-03¶

Highlights¶

New resilience package for provider-agnostic error classification and retry logic
Smart fallback in TTS and STT clients - only switch providers on permanent errors
8 error categories for actionable error handling decisions

Added¶

resilience package with error categorization system (53a72b1)
ErrorCategory type with 8 categories: transient, rate_limit, validation, auth, not_found, server, quota, unknown (53a72b1)
ErrorInfo struct with category, retryability, code, message, suggestion, and retry-after hint (53a72b1)
ProviderError type wrapping provider errors with classification metadata (53a72b1)
ErrorClassifier interface for provider-specific error classification (53a72b1)
HTTPStatusClassifier for HTTP status code classification (53a72b1)
Retry() and RetryWithResult[T]() generic retry functions with configurable backoff (53a72b1)
RetryConfig with max attempts, backoff strategy, classifier, and OnRetry callback (53a72b1)
RetryError type for exhausted retry attempts with attempt count (53a72b1)
Backoff strategies: ExponentialBackoff, LinearBackoff, ConstantBackoff, NoBackoff (53a72b1)
DefaultBackoff() and DefaultRetryConfig() with sensible defaults (53a72b1)
IsProviderError() helper to extract ProviderError from error chain (53a72b1)
IsRetryable() helper to check error retryability (53a72b1)
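
The retry helpers can be pictured as a loop that calls a function, classifies any error, and sleeps for a backoff interval between attempts. The following is a minimal sketch under stated assumptions: the RetryConfig fields, the Backoff function type, the extra attempt-count return value, and the absence of a context parameter are all simplifications for illustration and are not the resilience package's actual signatures.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errTransient = errors.New("transient failure")
	errPermanent = errors.New("permanent failure")
)

// Backoff returns the delay to wait after the given attempt (1-based).
type Backoff func(attempt int) time.Duration

// exponential doubles the delay on each attempt, capped at limit.
func exponential(base, limit time.Duration) Backoff {
	return func(attempt int) time.Duration {
		d := base
		for i := 1; i < attempt && d < limit; i++ {
			d *= 2
		}
		if d > limit {
			return limit
		}
		return d
	}
}

// RetryConfig is an assumed, simplified configuration shape.
type RetryConfig struct {
	MaxAttempts int
	Backoff     Backoff
	IsRetryable func(error) bool // nil means every error is retryable
}

// RetryWithResult calls fn until it succeeds, returns a non-retryable
// error, or the attempts run out. It also returns the attempt count.
func RetryWithResult[T any](cfg RetryConfig, fn func() (T, error)) (T, int, error) {
	var zero T
	if cfg.MaxAttempts < 1 {
		cfg.MaxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= cfg.MaxAttempts; attempt++ {
		var v T
		if v, err = fn(); err == nil {
			return v, attempt, nil
		}
		if cfg.IsRetryable != nil && !cfg.IsRetryable(err) {
			return zero, attempt, err // permanent: do not retry
		}
		if attempt < cfg.MaxAttempts && cfg.Backoff != nil {
			time.Sleep(cfg.Backoff(attempt))
		}
	}
	return zero, cfg.MaxAttempts,
		fmt.Errorf("retries exhausted after %d attempts: %w", cfg.MaxAttempts, err)
}

// flaky returns a function that fails transiently a fixed number of
// times before succeeding.
func flaky(failures int) func() (string, error) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", errTransient
		}
		return "ok", nil
	}
}

func main() {
	cfg := RetryConfig{
		MaxAttempts: 4,
		Backoff:     exponential(10*time.Millisecond, 40*time.Millisecond),
		IsRetryable: func(err error) bool { return errors.Is(err, errTransient) },
	}
	v, attempts, err := RetryWithResult(cfg, flaky(2))
	fmt.Println(v, attempts, err)
}
```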

Changed¶

TTS client now uses smart fallback - only switches providers on permanent (non-retryable) errors (53a72b1)
STT client now uses smart fallback - only switches providers on permanent (non-retryable) errors (53a72b1)
Fallback behavior is now determined by error classification, not error occurrence (53a72b1)
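
The fallback rule described above can be sketched as a loop over providers that only advances on errors classified as non-retryable. This is an illustration under assumptions, not the client implementation: the ProviderError and Provider shapes are simplified stand-ins, a retryable error is returned to the caller so it can retry the same provider, and unclassified errors are treated as permanent.

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderError is a simplified stand-in carrying a retryability flag.
type ProviderError struct {
	Retryable bool
	Msg       string
}

func (e *ProviderError) Error() string { return e.Msg }

// Provider is a hypothetical TTS provider reduced to one function.
type Provider struct {
	Name       string
	Synthesize func(text string) (string, error)
}

// working returns a provider that always succeeds.
func working(name string) Provider {
	return Provider{Name: name, Synthesize: func(text string) (string, error) {
		return name + ":" + text, nil
	}}
}

// failing returns a provider that always fails with the given class.
func failing(name string, retryable bool) Provider {
	return Provider{Name: name, Synthesize: func(string) (string, error) {
		return "", &ProviderError{Retryable: retryable, Msg: name + " failed"}
	}}
}

// synthesizeWithFallback tries providers in order. It switches to the
// next provider only when the error is permanent.
func synthesizeWithFallback(providers []Provider, text string) (string, error) {
	lastErr := errors.New("no providers configured")
	for _, p := range providers {
		out, err := p.Synthesize(text)
		if err == nil {
			return out, nil
		}
		lastErr = err
		var pe *ProviderError
		if errors.As(err, &pe) && pe.Retryable {
			// Transient failure: stay on this provider and let the
			// caller (or a retry wrapper) try again.
			return "", err
		}
		// Permanent failure: fall through to the next provider.
	}
	return "", lastErr
}

func main() {
	backup := working("backup")

	out, err := synthesizeWithFallback([]Provider{failing("primary", false), backup}, "hi")
	fmt.Println(out, err)

	out, err = synthesizeWithFallback([]Provider{failing("primary", true), backup}, "hi")
	fmt.Println(out, err)
}
```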

Documentation¶

Package documentation with examples in resilience/doc.go (53a72b1)

Tests¶

44 tests in resilience package covering retry logic, backoff strategies, error types, and categories (53a72b1)
4 TTS fallback behavior tests in tts/fallback_test.go (53a72b1)
4 STT fallback behavior tests in stt/fallback_test.go (53a72b1)

v0.7.0 - 2026-03-30¶

Highlights¶

Enhanced TTS mock fixtures with WAV audio generation
Provider-specific mocks for ElevenLabs, Deepgram, and OpenAI
Configurable mock behaviors for testing (latency, errors, timing)

Added¶

GenerateWAVFixture(durationMs, sampleRate) creates valid WAV files with proper RIFF headers for testing (17ef8b9)
GenerateShortWAV() and GenerateOneSecondWAV() convenience fixtures (17ef8b9)
MockProviderOption functional options pattern for configurable mock behavior (17ef8b9)
WithFixedDuration(ms) option for mocks returning fixed-length audio (17ef8b9)
WithRealisticTiming() option for text-length-proportional audio duration (17ef8b9)
WithError(err) option for error injection testing (17ef8b9)
WithLatency(duration) option for simulating network delays (17ef8b9)
WithFailAfterN(n, err) option for testing retry/failover logic (17ef8b9)
NewElevenLabsMock() with 3 preconfigured voices (Rachel, Bella, Antoni) (17ef8b9)
NewDeepgramMock() with 3 preconfigured voices (Asteria, Luna, Orion) (17ef8b9)
NewOpenAIMock() with 6 preconfigured voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer) (17ef8b9)
Common test errors: ErrMockRateLimit, ErrMockQuotaExceeded, ErrMockNetworkError, ErrMockInvalidVoice (17ef8b9)
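
For readers unfamiliar with what a valid WAV fixture involves, the sketch below writes the standard 44-byte RIFF/WAVE header followed by silent samples. The function name silentWAV and the fixed mono 16-bit PCM layout are assumptions for this example; GenerateWAVFixture may produce different audio content or parameters.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// silentWAV returns a mono, 16-bit PCM WAV file of the given duration
// containing only silence. It is a hypothetical helper for this sketch.
func silentWAV(durationMs, sampleRate int) []byte {
	const channels, bitsPerSample = 1, 16
	blockAlign := channels * bitsPerSample / 8 // bytes per sample frame
	dataLen := sampleRate * durationMs / 1000 * blockAlign

	wav := make([]byte, 44+dataLen) // zeroed payload is silence
	le := binary.LittleEndian

	// RIFF chunk descriptor.
	copy(wav[0:], "RIFF")
	le.PutUint32(wav[4:], uint32(36+dataLen)) // file size minus 8
	copy(wav[8:], "WAVE")

	// "fmt " sub-chunk describing the audio format.
	copy(wav[12:], "fmt ")
	le.PutUint32(wav[16:], 16) // sub-chunk size for PCM
	le.PutUint16(wav[20:], 1)  // audio format 1 = PCM
	le.PutUint16(wav[22:], channels)
	le.PutUint32(wav[24:], uint32(sampleRate))
	le.PutUint32(wav[28:], uint32(sampleRate*blockAlign)) // byte rate
	le.PutUint16(wav[32:], uint16(blockAlign))
	le.PutUint16(wav[34:], bitsPerSample)

	// "data" sub-chunk holding the samples.
	copy(wav[36:], "data")
	le.PutUint32(wav[40:], uint32(dataLen))
	return wav
}

func main() {
	wav := silentWAV(1000, 8000)
	fmt.Println(len(wav), string(wav[0:4]), string(wav[8:12]))
}
```

A fixture built this way can be handed to any decoder that checks header fields, which is what makes it useful for provider conformance tests.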

Fixed¶

Fixed gofmt formatting and added G115 nolint annotations for bounded integer conversions in WAV generation (aa28bc8)

Documentation¶

Testing with Mocks guide covering mock providers, WAV fixtures, and common test patterns (1980a53)

Tests¶

Comprehensive tests for WAV fixture generation and header validation (17ef8b9)
Tests for all MockProviderOption behaviors including context cancellation (17ef8b9)
Provider conformance tests for ElevenLabs, Deepgram, and OpenAI mocks (17ef8b9)

v0.6.0 - 2026-03-21¶

Highlights¶

Observability package for voice instrumentation with hooks and events
Registry package for provider discovery and registration
CallSystem client with multi-provider failover support
SMS messaging support via SMSProvider interface

Added¶

observability package with VoiceEvent, VoiceObserver, TTSHook, and STTHook interfaces (dda0212)
Event types for call lifecycle (initiated, ringing, answered, ended, failed) (dda0212)
NoOpTTSHook and NoOpSTTHook for optional instrumentation (dda0212)
registry package with Registry interface for provider discovery (0ec559f)
Factory types for TTS, STT, and CallSystem providers (0ec559f)
callsystem.Client for managing multiple CallSystem providers with automatic failover (c12a16a)
SMSProvider interface and SMSMessage type for SMS support (d7d8c63)
Hook field in SynthesisConfig for TTS observability (8b3c38a)
Hook field in TranscriptionConfig for STT observability (8b3c38a)
Observer field in CallSystemConfig and CallOptions for call events (8b3c38a)
WithObserver CallOption for per-call observability (8b3c38a)
ObservableCallSystem interface combining CallSystem with Observable (8b3c38a)
SetHook() and Hook() methods on TTS and STT clients (8b3c38a)

Fixed¶

Resolved gosec G120 warnings in Twilio webhook example by adding http.MaxBytesReader (ac115b7)

v0.5.0 - 2026-02-28¶

Highlights¶

Organization rename from agentplexus to plexusone

Changed¶

Breaking: Go module path changed from github.com/agentplexus/omnivoice to github.com/plexusone/omnivoice-core (bf46b07)

v0.4.3 - 2026-02-15¶

Highlights¶

Comprehensive tests for English and Chinese subtitle generation

Tests¶

TestWordsToSubtitleCues_EnglishWordGrouping for word-based cue grouping (0ddb8bc)
TestWordsToSubtitleCues_ChineseCharacters for character-by-character tokenization (0ddb8bc)
TestWordsToSubtitleCues_MixedChineseEnglish for mixed language content (0ddb8bc)
TestWordsToSubtitleCues_LongChineseText for multi-cue splitting (0ddb8bc)

v0.4.2 - 2026-02-15¶

Highlights¶

Fixed subtitle word cutoff at line boundaries

Fixed¶

Subtitle cue chunking now checks actual wrapped line count instead of total character count, preventing words from being cut off when they would appear on a third line (a301897)
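
The difference between the two checks is easiest to see with a small example: text can fit within the total character budget and still wrap onto too many lines. The sketch below uses a simple greedy word wrap. The function names and the wrapping rule are assumptions for illustration and may not match the subtitle package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrapLines greedily packs words into lines of at most maxChars
// characters. A single word longer than maxChars gets its own line.
func wrapLines(words []string, maxChars int) []string {
	var lines []string
	cur := ""
	for _, w := range words {
		switch {
		case cur == "":
			cur = w
		case utf8.RuneCountInString(cur)+1+utf8.RuneCountInString(w) <= maxChars:
			cur += " " + w
		default:
			lines = append(lines, cur)
			cur = w
		}
	}
	if cur != "" {
		lines = append(lines, cur)
	}
	return lines
}

// fitsByCharCount is the older style of check: total characters only.
func fitsByCharCount(words []string, maxChars, maxLines int) bool {
	total := utf8.RuneCountInString(strings.Join(words, " "))
	return total <= maxChars*maxLines
}

// fitsByLineCount checks the actual number of wrapped lines.
func fitsByLineCount(words []string, maxChars, maxLines int) bool {
	return len(wrapLines(words, maxChars)) <= maxLines
}

func main() {
	words := []string{"aaaaaa", "bbbbbb", "cccccc"}
	// 20 characters fit a 2x10 budget, yet no two words share a line.
	fmt.Println(fitsByCharCount(words, 10, 2))
	fmt.Println(fitsByLineCount(words, 10, 2))
	fmt.Println(wrapLines(words, 10))
}
```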

Tests¶

TestWordsToSubtitleCues_LineCountLimit verifies cues split correctly at line boundaries (a301897)

v0.4.1 - 2026-02-14¶

Highlights¶

STT conformance tests for TranscribeFile and TranscribeURL batch transcription methods

Tests¶

TranscribeFile conformance test for local file transcription (c441944)
TranscribeURL conformance test for remote URL transcription (c441944)

v0.4.0 - 2026-02-14¶

Highlights¶

Subtitle generation from STT transcription results
Extensible config maps for provider-specific settings

Added¶

Subtitle package for SRT/VTT generation from transcription results (17730a7)
Configurable max characters per line and lines per cue for subtitles (17730a7)
Word-level timestamp-based cue splitting (17730a7)
Extensions map in TranscriptionConfig for provider-specific STT settings (84c37f5)
Extensions map in SynthesisConfig for provider-specific TTS settings (665c3be)

Fixed¶

Subtitle wrapText no longer clips words when text exceeds line limit (63144bb)

Documentation¶

Voice cloning guide with recording tips and phonetically balanced text (1f0cdd8)

Tests¶

Call system provider conformance tests (MakeCall, ListCalls, OnIncomingCall) (9683ca2)
Transport provider conformance tests (Listen, Connect, Protocol) (9683ca2)

v0.3.0 - 2026-01-24¶

Highlights¶

Provider conformance test suites for TTS and STT implementations

Added¶

TTS provider conformance test suite (Synthesize, SynthesizeStream, SynthesizeFromReader) (e3705c7)
Mock TTS provider for self-testing with configurable audio format responses (e3705c7)
STT provider conformance test suite (Transcribe, TranscribeStream) (69cfd20)
Mock STT provider with streaming transcription simulation (69cfd20)

Fixed¶

MCP session and tool handlers now log Close() errors instead of discarding (6099072)

Documentation¶

Provider conformance testing TRD describing test categories and API design (58a9697)

Build¶

MIT LICENSE file (f124dcf)

v0.2.0 - 2026-01-18¶

Highlights¶

Audio codec package with PCM, mu-law, and a-law support for telephony
MCP server enabling Claude Code to make voice calls
Pipeline components connecting STT, TTS, and transport providers

Added¶

Audio codec package with PCM sample conversions (int16, float32, float64, bytes) (f64fe1e)
Mu-law encoding/decoding for Twilio Media Streams (f64fe1e)
A-law encoding/decoding for international telephony (f64fe1e)
Audio resampling, normalization, and analysis utilities (f64fe1e)
MCP server with stdio transport for voice interactions (721cbac)
Voice interaction tools: initiate_call, continue_call, speak_to_user, end_call (721cbac)
Session management for tracking active voice calls (721cbac)
TTSPipeline for streaming TTS output to transport connections (11c906d)
StreamingTTSPipeline for connecting streaming LLM text to TTS to transport (11c906d)
STTPipeline for streaming audio from transport to STT with event callbacks (11c906d)
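
For context on the mu-law support listed above, the sketch below shows the widely used G.711 mu-law companding routine that converts between 16-bit linear PCM and 8-bit mu-law bytes. The function names are hypothetical, and this is a reference-style illustration, not the codec package's own code.

```go
package main

import "fmt"

const (
	muLawBias = 0x84  // 132, added before finding the segment
	muLawClip = 32635 // largest magnitude that survives the bias
)

// linearToMuLaw compresses one 16-bit PCM sample to a mu-law byte.
func linearToMuLaw(sample int16) byte {
	s := int(sample)
	sign := 0
	if s < 0 {
		s = -s
		sign = 0x80
	}
	if s > muLawClip {
		s = muLawClip
	}
	s += muLawBias

	// Find the segment (exponent): position of the highest set bit.
	exponent := 7
	for mask := 0x4000; s&mask == 0 && exponent > 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F

	// Mu-law bytes are stored bit-inverted.
	return byte(^(sign | exponent<<4 | mantissa))
}

// muLawToLinear expands one mu-law byte back to a 16-bit PCM sample.
func muLawToLinear(u byte) int16 {
	u = ^u
	exponent := int(u>>4) & 0x07
	mantissa := int(u & 0x0F)
	s := (((mantissa << 3) + muLawBias) << exponent) - muLawBias
	if u&0x80 != 0 {
		s = -s
	}
	return int16(s)
}

func main() {
	for _, in := range []int16{0, 1000, -1000, 32767} {
		enc := linearToMuLaw(in)
		fmt.Printf("%6d -> 0x%02X -> %6d\n", in, enc, muLawToLinear(enc))
	}
}
```

Companding is lossy: small samples are reproduced closely while large ones are quantized more coarsely, which is why a round trip returns a nearby value and not the original sample.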

Documentation¶

Voice integration PRD outlining goals, user stories, and success metrics (fd86611)
Twilio integration TRD detailing Media Streams architecture (fd86611)

Tests¶

Comprehensive unit tests for audio codec functions (mu-law, a-law, PCM) (f64fe1e)

v0.1.0 - 2025-12-28¶

Highlights¶

Initial OmniVoice voice abstraction layer for multi-provider telephony

Added¶

Voice abstraction layer with provider-agnostic interfaces (8a54bc2)
STT (speech-to-text) provider interface with streaming support (8a54bc2)
TTS (text-to-speech) provider interface with streaming support (8a54bc2)
Transport interface for audio connections (Twilio, Zoom, etc.) (8a54bc2)
Export CallOptions for provider implementations (7e1b52d)

Documentation¶

README with project overview and shields (4f298df)
Marp presentation for OmniVoice (d2d67cf)

Build¶

GitHub Actions CI workflow (4bad35d)
golangci-lint configuration and fixes (3693297)
