Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, this project adheres to Semantic Versioning, commits follow Conventional Commits, and this changelog is generated by Structured Changelog.
Unreleased¶
v0.14.0 - 2026-06-15¶
Highlights¶
- Gateway provider registry for voice gateways (Twilio, Telnyx)
- Realtime provider registry for native voice-to-voice (OpenAI Realtime, Gemini Live)
- Side-effect registration via init() for provider discovery
Added¶
- `RegisterGatewayProvider()` for registering voice gateway providers (`609dc45`)
- `GetGatewayProvider()` for retrieving registered gateway providers (`609dc45`)
- `ListGatewayProviders()` for discovering available gateway providers (`609dc45`)
- `HasGatewayProvider()` for checking gateway registration (`609dc45`)
- `RegisterRealtimeProvider()` for registering native voice-to-voice providers (`609dc45`)
- `GetRealtimeProvider()` for retrieving registered realtime providers (`609dc45`)
- `ListRealtimeProviders()` for discovering available realtime providers (`609dc45`)
- `HasRealtimeProvider()` for checking realtime registration (`609dc45`)
- `registry.Gateway` interface for gateway provider implementations (`609dc45`)
- `registry.RealtimeProvider` interface for realtime provider implementations (`609dc45`)
- `registry.GatewayProviderFactory` and `registry.RealtimeProviderFactory` types (`609dc45`)
- `WithListener()` option for pre-configured gateway listeners (`609dc45`)
- `WithPublicURL()` option for gateway webhook callbacks (`609dc45`)
- `WithListenAddr()` option for gateway server addresses (`609dc45`)
- `WithConnectionID()` option for Telnyx connection IDs (`609dc45`)
- `WithVoice()` option for realtime provider voice selection (`609dc45`)
- `WithModel()` option for realtime provider model selection (`609dc45`)
- `WithInstructions()` option for realtime provider system prompts (`609dc45`)
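The side-effect registration pattern mentioned in the highlights can be illustrated with a minimal, self-contained sketch. The function names mirror the entries above, but the signatures, the `Gateway` interface shape, the `GatewayFactory` type, and the `fakeTwilio` provider are assumptions made for illustration and may not match the actual `registry` package.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Gateway is a hypothetical stand-in for the registry.Gateway interface.
type Gateway interface{ Name() string }

// GatewayFactory builds a Gateway; the real factory signature may differ.
type GatewayFactory func() (Gateway, error)

var (
	mu        sync.RWMutex
	factories = map[string]GatewayFactory{}
)

// RegisterGatewayProvider stores a factory under a provider name.
func RegisterGatewayProvider(name string, f GatewayFactory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// HasGatewayProvider reports whether a provider name is registered.
func HasGatewayProvider(name string) bool {
	mu.RLock()
	defer mu.RUnlock()
	_, ok := factories[name]
	return ok
}

// ListGatewayProviders returns registered names in sorted order.
func ListGatewayProviders() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for name := range factories {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// fakeTwilio is an illustrative provider, not a real implementation.
type fakeTwilio struct{}

func (fakeTwilio) Name() string { return "twilio" }

// init runs when the package is loaded, so importing the package is
// enough to make the provider discoverable.
func init() {
	RegisterGatewayProvider("twilio", func() (Gateway, error) {
		return fakeTwilio{}, nil
	})
}

func main() {
	fmt.Println(ListGatewayProviders(), HasGatewayProvider("twilio"))
}
```

In a multi-package layout, the `init()` function would typically live in the provider's own package, and an application would pull it in with a blank import such as `import _ "example.com/providers/twilio"` (a placeholder path).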
v0.13.0 - 2026-06-15¶
Highlights¶
- Typed Encoding system for audio format handling
- Encoding normalization handles pcm16, ulaw, wav, and other aliases
- IsRawEncoding() for detecting formats that need explicit parameters
Added¶
- `format.Encoding` type with constants: `Linear16`, `MuLaw`, `ALaw`, `MP3`, `Opus`, `FLAC`, `AAC`, `Speex`, `WebM` (`48d4fe4`)
- `Encoding.Normalize()` method for canonical encoding names with alias mapping (`48d4fe4`)
- `Encoding.IsRaw()` method to detect raw audio formats (`48d4fe4`)
- `IsRawEncoding(string)` convenience function (`48d4fe4`)
- WAV normalization to Linear16 (WAV is a container for raw PCM) (`5451f73`)
- `Speex` and `WebM` encoding constants (`5451f73`)
Changed¶
- `audio/converter` uses `format.Encoding` type for codec selection (`48d4fe4`)
- Predefined `AudioFormat` vars use encoding constants instead of strings (`48d4fe4`)
Tests¶
- Comprehensive tests for `Encoding.Normalize()` with all aliases (`48d4fe4`)
- Tests for `IsRawEncoding()` with raw and container formats (`48d4fe4`)
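As a rough illustration of the alias normalization described in this release, the sketch below maps a few aliases to canonical names. The string values of the constants, the contents of the alias table, and the choice of which encodings count as raw are assumptions; only the type and method names come from the entries above.

```go
package main

import (
	"fmt"
	"strings"
)

// Encoding is a typed audio encoding name.
type Encoding string

// Canonical names; the underlying string values are assumed.
const (
	Linear16 Encoding = "linear16"
	MuLaw    Encoding = "mulaw"
	ALaw     Encoding = "alaw"
	MP3      Encoding = "mp3"
)

// aliases maps alternative spellings to canonical encodings.
// This table is illustrative, not the library's actual list.
var aliases = map[string]Encoding{
	"pcm16": Linear16,
	"wav":   Linear16, // WAV is a container for raw PCM
	"ulaw":  MuLaw,
}

// Normalize lowercases the name and resolves known aliases.
func (e Encoding) Normalize() Encoding {
	key := strings.ToLower(strings.TrimSpace(string(e)))
	if canon, ok := aliases[key]; ok {
		return canon
	}
	return Encoding(key)
}

// IsRaw reports whether the encoding is headerless audio, which
// needs explicit parameters such as sample rate (assumed set).
func (e Encoding) IsRaw() bool {
	switch e.Normalize() {
	case Linear16, MuLaw, ALaw:
		return true
	}
	return false
}

// IsRawEncoding is the string-based convenience wrapper.
func IsRawEncoding(s string) bool { return Encoding(s).IsRaw() }

func main() {
	for _, name := range []string{"PCM16", "ulaw", "wav", "mp3"} {
		fmt.Println(name, Encoding(name).Normalize(), IsRawEncoding(name))
	}
}
```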
v0.12.1 - 2026-06-14¶
Highlights¶
- Internal refactoring to reduce code duplication
- Generic provider.Client[T] for multi-provider management
Added¶
- `format.PCM16_24kHz` and `format.PCM16_16kHz` aliases for provider-agnostic usage (`33b946a`)
v0.12.0 - 2026-06-14¶
Highlights¶
- Realtime Provider interface for native voice-to-voice LLMs (OpenAI Realtime, Gemini Live)
- Audio format conversion between telephony (mulaw 8kHz) and realtime providers (PCM16 16/24kHz)
- RealtimeBridge for seamless telephony-to-realtime provider integration
- Dual pipeline modes: text (STT→LLM→TTS) or realtime (voice-to-voice)
Added¶
- `realtime` package with `Provider` interface for voice-to-voice LLM providers (`55c873a`)
- `ProcessConfig` with instructions, voice, functions, and callbacks (`55c873a`)
- `AudioChunk` and `Transcript` types for provider output (`55c873a`)
- `FunctionDeclaration` for model function calling (`55c873a`)
- `Client` for multi-provider management with automatic fallback (`226915d`)
- `audio/format` package with common audio format constants (Twilio, OpenAI, Gemini) (`4d1b2e2`)
- `audio/converter` package for audio format conversion (`4e4d1b6`)
- `TwilioToOpenAI()`, `OpenAIToTwilio()`, `TwilioToGemini()`, `GeminiToTwilio()` convenience functions (`4e4d1b6`)
- `StreamConverter` for efficient streaming audio conversion (`4e4d1b6`)
- `PipelineMode` type with `PipelineModeText` and `PipelineModeRealtime` constants (`235584e`)
- `RealtimeConfig` for configuring realtime providers (`235584e`)
- `RealtimeProviderFactory` interface for creating realtime providers from config (`235584e`)
- `RealtimeBridge` for bridging telephony WebSocket to realtime providers (`df38137`)
- `NewRealtimeBridgeForTwilio()` and `NewRealtimeBridgeForTwilioGemini()` convenience constructors (`df38137`)
Tests¶
- Unit tests for `realtime.Provider` interface and `Client` (`c009f3d`)
- Tests for `audio/converter` with codec and sample rate conversion (`4e4d1b6`)
- Tests for `RealtimeBridge` lifecycle and event handling (`df38137`)
v0.11.0 - 2026-06-13¶
Highlights¶
- Global provider registry with priority-based registration
- Architecture documentation for provider implementation
Added¶
- Global `RegisterSTTProvider()`, `RegisterTTSProvider()`, `RegisterCallSystemProvider()` functions at package level (`49c11e3`)
- `GetSTTProvider()`, `GetTTSProvider()`, `GetCallSystemProvider()` for retrieving registered providers (`49c11e3`)
- `ListSTTProviders()`, `ListTTSProviders()`, `ListCallSystemProviders()` for discovering available providers (`49c11e3`)
- `HasSTTProvider()`, `HasTTSProvider()`, `HasCallSystemProvider()` for checking registration (`49c11e3`)
- `GetSTTProviderPriority()`, `GetTTSProviderPriority()`, `GetCallSystemProviderPriority()` for querying priority (`49c11e3`)
- `PriorityThin` (0) and `PriorityThick` (10) constants for provider layering (`49c11e3`)
Documentation¶
- CLAUDE.md with provider registry architecture and implementation guidelines (`51f6883`)
- Dependency architecture diagram (omnivoice-core → providers → omnivoice) (`51f6883`)
- Step-by-step guide for adding new providers (`51f6883`)
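Priority-based registration could work along the lines of the sketch below. The rule that a registration only replaces an existing one of lower or equal priority is an assumption, as are the function signatures, the boolean return value, and the use of plain strings in place of provider factories; only the constant names and values come from this release. The sketch omits locking for brevity.

```go
package main

import "fmt"

// Priority constants as listed in this release.
const (
	PriorityThin  = 0
	PriorityThick = 10
)

// entry pairs an implementation label with its priority.
// A real registry would store a factory instead of a string.
type entry struct {
	impl     string
	priority int
}

var sttProviders = map[string]entry{}

// RegisterSTTProvider records impl under name unless a higher-priority
// registration already exists (assumed rule). It reports whether the
// registration took effect.
func RegisterSTTProvider(name, impl string, priority int) bool {
	if cur, ok := sttProviders[name]; ok && cur.priority > priority {
		return false
	}
	sttProviders[name] = entry{impl: impl, priority: priority}
	return true
}

// GetSTTProviderPriority returns the priority of a registered provider.
func GetSTTProviderPriority(name string) (int, bool) {
	e, ok := sttProviders[name]
	return e.priority, ok
}

func main() {
	RegisterSTTProvider("deepgram", "thick-sdk", PriorityThick)
	replaced := RegisterSTTProvider("deepgram", "thin-http", PriorityThin)
	p, _ := GetSTTProviderPriority("deepgram")
	fmt.Println(replaced, sttProviders["deepgram"].impl, p)
}
```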
v0.10.0 - 2026-06-13¶
Highlights¶
- Voice Gateway interface for provider-agnostic PSTN and WebRTC communication
- Session persistence with in-memory and Redis backends
- Barge-in detection for user interruption handling
Added¶
- `gateway` package with `Gateway` interface for PSTN telephony providers (Twilio, Telnyx, Vonage, Plivo) (`8f375df`)
- `WebRTCGateway` interface for browser/mobile voice (LiveKit, Daily) (`8f375df`)
- `Session` and `WebRTCSession` interfaces for active call management (`8f375df`)
- Event types for call lifecycle: `SessionStarted`, `SessionEnded`, `UserSpeechStart`, `AgentSpeechStart`, `Interruption`, etc. (`8f375df`)
- `storage` package with `SessionStore` interface for call state persistence (`b1c8cf9`)
- `MemoryStore` for in-process session storage (`b1c8cf9`)
- `RedisStore` for distributed session storage with configurable TTL (`b1c8cf9`)
- `SessionState` type with conversation history, metrics, and recovery data (`b1c8cf9`)
- `bargein` package with `Detector` for user interruption detection (`0ab0e4b`)
- Interruption modes: `ModeImmediate`, `ModeAfterSentence`, `ModeDisabled` (`0ab0e4b`)
- Automatic TTS cancellation on user speech detection (`0ab0e4b`)
- `MulawToWAV()` function in `audio/codec` for WAV container encoding (`4a34433`)
Changed¶
- Twilio example moved to `omni-twilio` repository (`26d2703`)
Dependencies¶
- Added `github.com/redis/go-redis/v9` for Redis session storage (`b1c8cf9`)
- Bump `github.com/grokify/mogo` from 0.74.3 to 0.74.6 (`e22b0e9`)
Documentation¶
- Voice architecture guide with system design overview (`b2b6215`)
- Barge-in detection usage guide (`b2b6215`)
- Session storage configuration guide (`b2b6215`)
Tests¶
- Storage interface conformance tests for MemoryStore and RedisStore (`b1c8cf9`)
- Barge-in detector tests for interruption modes and timing (`0ab0e4b`)
v0.9.0 - 2026-05-02¶
Highlights¶
- Canonical Transcript format for STT output with embedded JSON Schema
- DurationMilliseconds type for JSON-friendly duration serialization
Added¶
- Canonical `Transcript` type in `stt` package for standardized STT output across providers (`af49b8b`)
- `TranscriptSegment` and `TranscriptWord` types with timing information (`af49b8b`)
- `TranscriptMetadata` for provenance (provider, model, options) (`af49b8b`)
- `NewTranscript()` constructor from `TranscriptionResult` (`af49b8b`)
- `LoadTranscript()` and `SaveJSON()` for file I/O (`af49b8b`)
- `TotalDuration()`, `SegmentDuration()`, `WordDuration()` convenience methods (`af49b8b`)
- `schema` package with embedded `TranscriptV1Schema` JSON Schema (`d794eb0`)
- Duration fields use `duration.DurationMilliseconds` for integer millisecond JSON serialization (`af49b8b`)
Dependencies¶
- Added `github.com/grokify/mogo v0.74.3` for `DurationMilliseconds` type (`0f83fc8`)
Tests¶
- `TestNewTranscript` validates transcript creation from transcription results (`92755a8`)
- `TestTranscriptJSONRoundTrip` validates JSON serialization and deserialization (`92755a8`)
v0.8.0 - 2026-04-03¶
Highlights¶
- New `resilience` package for provider-agnostic error classification and retry logic
- Smart fallback in TTS and STT clients - only switch providers on permanent errors
- 8 error categories for actionable error handling decisions
Added¶
- `resilience` package with error categorization system (`53a72b1`)
- `ErrorCategory` type with 8 categories: transient, rate_limit, validation, auth, not_found, server, quota, unknown (`53a72b1`)
- `ErrorInfo` struct with category, retryability, code, message, suggestion, and retry-after hint (`53a72b1`)
- `ProviderError` type wrapping provider errors with classification metadata (`53a72b1`)
- `ErrorClassifier` interface for provider-specific error classification (`53a72b1`)
- `HTTPStatusClassifier` for HTTP status code classification (`53a72b1`)
- `Retry()` and `RetryWithResult[T]()` generic retry functions with configurable backoff (`53a72b1`)
- `RetryConfig` with max attempts, backoff strategy, classifier, and OnRetry callback (`53a72b1`)
- `RetryError` type for exhausted retry attempts with attempt count (`53a72b1`)
- Backoff strategies: `ExponentialBackoff`, `LinearBackoff`, `ConstantBackoff`, `NoBackoff` (`53a72b1`)
- `DefaultBackoff()` and `DefaultRetryConfig()` with sensible defaults (`53a72b1`)
- `IsProviderError()` helper to extract `ProviderError` from error chain (`53a72b1`)
- `IsRetryable()` helper to check error retryability (`53a72b1`)
Changed¶
- TTS client now uses smart fallback - only switches providers on permanent (non-retryable) errors (`53a72b1`)
- STT client now uses smart fallback - only switches providers on permanent (non-retryable) errors (`53a72b1`)
- Fallback behavior is now determined by error classification, not error occurrence (`53a72b1`)
Documentation¶
- Package documentation with examples in `resilience/doc.go` (`53a72b1`)
Tests¶
- 44 tests in resilience package covering retry logic, backoff strategies, error types, and categories (`53a72b1`)
- 4 TTS fallback behavior tests in `tts/fallback_test.go` (`53a72b1`)
- 4 STT fallback behavior tests in `stt/fallback_test.go` (`53a72b1`)
v0.7.0 - 2026-03-30¶
Highlights¶
- Enhanced TTS mock fixtures with WAV audio generation
- Provider-specific mocks for ElevenLabs, Deepgram, and OpenAI
- Configurable mock behaviors for testing (latency, errors, timing)
Added¶
- `GenerateWAVFixture(durationMs, sampleRate)` creates valid WAV files with proper RIFF headers for testing (`17ef8b9`)
- `GenerateShortWAV()` and `GenerateOneSecondWAV()` convenience fixtures (`17ef8b9`)
- `MockProviderOption` functional options pattern for configurable mock behavior (`17ef8b9`)
- `WithFixedDuration(ms)` option for mocks returning fixed-length audio (`17ef8b9`)
- `WithRealisticTiming()` option for text-length-proportional audio duration (`17ef8b9`)
- `WithError(err)` option for error injection testing (`17ef8b9`)
- `WithLatency(duration)` option for simulating network delays (`17ef8b9`)
- `WithFailAfterN(n, err)` option for testing retry/failover logic (`17ef8b9`)
- `NewElevenLabsMock()` with 3 preconfigured voices (Rachel, Bella, Antoni) (`17ef8b9`)
- `NewDeepgramMock()` with 3 preconfigured voices (Asteria, Luna, Orion) (`17ef8b9`)
- `NewOpenAIMock()` with 6 preconfigured voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer) (`17ef8b9`)
- Common test errors: `ErrMockRateLimit`, `ErrMockQuotaExceeded`, `ErrMockNetworkError`, `ErrMockInvalidVoice` (`17ef8b9`)
Fixed¶
- Fixed gofmt formatting and added G115 nolint annotations for bounded integer conversions in WAV generation (`aa28bc8`)
Documentation¶
- Testing with Mocks guide covering mock providers, WAV fixtures, and common test patterns (`1980a53`)
Tests¶
- Comprehensive tests for WAV fixture generation and header validation (`17ef8b9`)
- Tests for all MockProviderOption behaviors including context cancellation (`17ef8b9`)
- Provider conformance tests for ElevenLabs, Deepgram, and OpenAI mocks (`17ef8b9`)
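For readers unfamiliar with the RIFF layout, a WAV fixture generator can be sketched as below. The function name `generateWAV` is hypothetical, and the choice of mono 16-bit PCM silence as the payload is an assumption; the library's `GenerateWAVFixture` may produce different audio content.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// generateWAV returns a mono, 16-bit PCM WAV file containing silence.
// It writes the canonical 44-byte RIFF/WAVE header followed by the data.
func generateWAV(durationMs, sampleRate int) []byte {
	const (
		channels      = 1
		bitsPerSample = 16
	)
	blockAlign := channels * bitsPerSample / 8 // bytes per sample frame
	dataSize := durationMs * sampleRate / 1000 * blockAlign

	var buf bytes.Buffer
	// le writes a fixed-size value in little-endian order.
	// Writes to a bytes.Buffer do not fail, so the error is dropped.
	le := func(v any) { _ = binary.Write(&buf, binary.LittleEndian, v) }

	buf.WriteString("RIFF")
	le(uint32(36 + dataSize)) // file size minus the first 8 bytes
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	le(uint32(16))                      // fmt chunk size for PCM
	le(uint16(1))                       // audio format 1 = PCM
	le(uint16(channels))                // channel count
	le(uint32(sampleRate))              // samples per second
	le(uint32(sampleRate * blockAlign)) // byte rate
	le(uint16(blockAlign))              // block align
	le(uint16(bitsPerSample))           // bits per sample
	buf.WriteString("data")
	le(uint32(dataSize))
	buf.Write(make([]byte, dataSize)) // zeroed samples = silence
	return buf.Bytes()
}

func main() {
	wav := generateWAV(100, 8000)
	fmt.Println(len(wav), string(wav[:4]), string(wav[8:12]))
}
```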
v0.6.0 - 2026-03-21¶
Highlights¶
- Observability package for voice instrumentation with hooks and events
- Registry package for provider discovery and registration
- CallSystem client with multi-provider failover support
- SMS messaging support via SMSProvider interface
Added¶
- `observability` package with VoiceEvent, VoiceObserver, TTSHook, and STTHook interfaces (`dda0212`)
- Event types for call lifecycle (initiated, ringing, answered, ended, failed) (`dda0212`)
- NoOpTTSHook and NoOpSTTHook for optional instrumentation (`dda0212`)
- `registry` package with Registry interface for provider discovery (`0ec559f`)
- Factory types for TTS, STT, and CallSystem providers (`0ec559f`)
- `callsystem.Client` for managing multiple CallSystem providers with automatic failover (`c12a16a`)
- `SMSProvider` interface and `SMSMessage` type for SMS support (`d7d8c63`)
- `Hook` field in `SynthesisConfig` for TTS observability (`8b3c38a`)
- `Hook` field in `TranscriptionConfig` for STT observability (`8b3c38a`)
- `Observer` field in `CallSystemConfig` and `CallOptions` for call events (`8b3c38a`)
- `WithObserver` CallOption for per-call observability (`8b3c38a`)
- `ObservableCallSystem` interface combining CallSystem with Observable (`8b3c38a`)
- `SetHook()` and `Hook()` methods on TTS and STT clients (`8b3c38a`)
Fixed¶
- Resolved gosec G120 warnings in Twilio webhook example by adding http.MaxBytesReader (`ac115b7`)
v0.5.0 - 2026-02-28¶
Highlights¶
- Organization rename from agentplexus to plexusone
Changed¶
- Breaking: Go module path changed from `github.com/agentplexus/omnivoice` to `github.com/plexusone/omnivoice-core` (`bf46b07`)
v0.4.3 - 2026-02-15¶
Highlights¶
- Comprehensive tests for English and Chinese subtitle generation
Tests¶
- `TestWordsToSubtitleCues_EnglishWordGrouping` for word-based cue grouping (`0ddb8bc`)
- `TestWordsToSubtitleCues_ChineseCharacters` for character-by-character tokenization (`0ddb8bc`)
- `TestWordsToSubtitleCues_MixedChineseEnglish` for mixed language content (`0ddb8bc`)
- `TestWordsToSubtitleCues_LongChineseText` for multi-cue splitting (`0ddb8bc`)
v0.4.2 - 2026-02-15¶
Highlights¶
- Fixed subtitle word cutoff at line boundaries
Fixed¶
- Subtitle cue chunking now checks actual wrapped line count instead of total character count, preventing words from being cut off when they would appear on a third line (`a301897`)
Tests¶
- `TestWordsToSubtitleCues_LineCountLimit` verifies cues split correctly at line boundaries (`a301897`)
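The difference between a character-count check and a wrapped-line-count check can be seen in a small sketch. The helpers `wrapText` and `fitsCue` below are illustrative reimplementations under assumed behavior (greedy word wrapping, byte-based lengths), not the package's actual code; text in languages such as Chinese would need rune-aware handling.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapText greedily packs whole words into lines of at most maxChars
// characters. A word longer than maxChars gets a line of its own.
func wrapText(text string, maxChars int) []string {
	var lines []string
	cur := ""
	for _, w := range strings.Fields(text) {
		switch {
		case cur == "":
			cur = w
		case len(cur)+1+len(w) <= maxChars:
			cur += " " + w
		default:
			lines = append(lines, cur)
			cur = w
		}
	}
	if cur != "" {
		lines = append(lines, cur)
	}
	return lines
}

// fitsCue reports whether the words fit in a cue once wrapped,
// checking the real line count rather than a total character budget.
func fitsCue(words []string, maxChars, maxLines int) bool {
	return len(wrapText(strings.Join(words, " "), maxChars)) <= maxLines
}

func main() {
	words := []string{"hello", "world", "again"}
	total := len(strings.Join(words, " "))
	// 17 characters fit a naive 2x10 budget, yet wrapping needs 3 lines.
	fmt.Println(total <= 2*10, fitsCue(words, 10, 2))
}
```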
v0.4.1 - 2026-02-14¶
Highlights¶
- STT conformance tests for `TranscribeFile` and `TranscribeURL` batch transcription methods
Tests¶
- `TranscribeFile` conformance test for local file transcription (`c441944`)
- `TranscribeURL` conformance test for remote URL transcription (`c441944`)
v0.4.0 - 2026-02-14¶
Highlights¶
- Subtitle generation from STT transcription results
- Extensible config maps for provider-specific settings
Added¶
- Subtitle package for SRT/VTT generation from transcription results (`17730a7`)
- Configurable max characters per line and lines per cue for subtitles (`17730a7`)
- Word-level timestamp-based cue splitting (`17730a7`)
- `Extensions` map in `TranscriptionConfig` for provider-specific STT settings (`84c37f5`)
- `Extensions` map in `SynthesisConfig` for provider-specific TTS settings (`665c3be`)
Fixed¶
- Subtitle `wrapText` no longer clips words when text exceeds line limit (`63144bb`)
Documentation¶
- Voice cloning guide with recording tips and phonetically balanced text (`1f0cdd8`)
Tests¶
- Call system provider conformance tests (`MakeCall`, `ListCalls`, `OnIncomingCall`) (`9683ca2`)
- Transport provider conformance tests (`Listen`, `Connect`, `Protocol`) (`9683ca2`)
v0.3.0 - 2026-01-24¶
Highlights¶
- Provider conformance test suites for TTS and STT implementations
Added¶
- TTS provider conformance test suite (`Synthesize`, `SynthesizeStream`, `SynthesizeFromReader`) (`e3705c7`)
- Mock TTS provider for self-testing with configurable audio format responses (`e3705c7`)
- STT provider conformance test suite (`Transcribe`, `TranscribeStream`) (`69cfd20`)
- Mock STT provider with streaming transcription simulation (`69cfd20`)
Fixed¶
- MCP session and tool handlers now log `Close()` errors instead of discarding (`6099072`)
Documentation¶
- Provider conformance testing TRD describing test categories and API design (`58a9697`)
Build¶
- MIT LICENSE file (`f124dcf`)
v0.2.0 - 2026-01-18¶
Highlights¶
- Audio codec package with PCM, mu-law, and a-law support for telephony
- MCP server enabling Claude Code to make voice calls
- Pipeline components connecting STT, TTS, and transport providers
Added¶
- Audio codec package with PCM sample conversions (int16, float32, float64, bytes) (`f64fe1e`)
- Mu-law encoding/decoding for Twilio Media Streams (`f64fe1e`)
- A-law encoding/decoding for international telephony (`f64fe1e`)
- Audio resampling, normalization, and analysis utilities (`f64fe1e`)
- MCP server with stdio transport for voice interactions (`721cbac`)
- Voice interaction tools: `initiate_call`, `continue_call`, `speak_to_user`, `end_call` (`721cbac`)
- Session management for tracking active voice calls (`721cbac`)
- `TTSPipeline` for streaming TTS output to transport connections (`11c906d`)
- `StreamingTTSPipeline` for connecting streaming LLM text to TTS to transport (`11c906d`)
- `STTPipeline` for streaming audio from transport to STT with event callbacks (`11c906d`)
Documentation¶
- Voice integration PRD outlining goals, user stories, and success metrics (`fd86611`)
- Twilio integration TRD detailing Media Streams architecture (`fd86611`)
Tests¶
- Comprehensive unit tests for audio codec functions (mu-law, a-law, PCM) (`f64fe1e`)
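As background for the mu-law support added in this release, the following is a sketch of the widely used G.711 mu-law companding routine for 16-bit PCM samples. The function names `encodeMuLaw` and `decodeMuLaw` are hypothetical and the code is not taken from the project's codec package.

```go
package main

import "fmt"

const (
	muLawBias = 0x84  // bias added before finding the segment
	muLawClip = 32635 // largest magnitude that survives the bias
)

// encodeMuLaw compresses a 16-bit linear PCM sample to 8-bit mu-law.
func encodeMuLaw(sample int16) byte {
	s := int(sample)
	sign := 0
	if s < 0 {
		s = -s
		sign = 0x80
	}
	if s > muLawClip {
		s = muLawClip
	}
	s += muLawBias

	// The exponent is the position of the highest set bit, from bit 14 down.
	exponent := 7
	for mask := 0x4000; s&mask == 0 && exponent > 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F
	// Mu-law bytes are stored bit-inverted.
	return ^byte(sign | exponent<<4 | mantissa)
}

// decodeMuLaw expands an 8-bit mu-law byte back to 16-bit linear PCM.
func decodeMuLaw(u byte) int16 {
	u = ^u
	exponent := int(u>>4) & 0x07
	mantissa := int(u & 0x0F)
	s := (((mantissa << 3) + muLawBias) << exponent) - muLawBias
	if u&0x80 != 0 {
		s = -s
	}
	return int16(s)
}

func main() {
	for _, in := range []int16{0, 1000, -1000, 32767} {
		enc := encodeMuLaw(in)
		fmt.Printf("%6d -> 0x%02X -> %6d\n", in, enc, decodeMuLaw(enc))
	}
}
```

Because mu-law is lossy, a round trip returns a value close to the input rather than the exact sample; the quantization step grows with the sample's magnitude.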
v0.1.0 - 2025-12-28¶
Highlights¶
- Initial OmniVoice voice abstraction layer for multi-provider telephony
Added¶
- Voice abstraction layer with provider-agnostic interfaces (`8a54bc2`)
- STT (speech-to-text) provider interface with streaming support (`8a54bc2`)
- TTS (text-to-speech) provider interface with streaming support (`8a54bc2`)
- Transport interface for audio connections (Twilio, Zoom, etc.) (`8a54bc2`)
- Export `CallOptions` for provider implementations (`7e1b52d`)