Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. Commits follow Conventional Commits, and this changelog is generated by Structured Changelog.

Unreleased

v0.14.0 - 2026-06-15

Highlights

  • Gateway provider registry for voice gateways (Twilio, Telnyx)
  • Realtime provider registry for native voice-to-voice (OpenAI Realtime, Gemini Live)
  • Side-effect registration via init() for provider discovery

Added

  • RegisterGatewayProvider() for registering voice gateway providers (609dc45)
  • GetGatewayProvider() for retrieving registered gateway providers (609dc45)
  • ListGatewayProviders() for discovering available gateway providers (609dc45)
  • HasGatewayProvider() for checking gateway registration (609dc45)
  • RegisterRealtimeProvider() for registering native voice-to-voice providers (609dc45)
  • GetRealtimeProvider() for retrieving registered realtime providers (609dc45)
  • ListRealtimeProviders() for discovering available realtime providers (609dc45)
  • HasRealtimeProvider() for checking realtime registration (609dc45)
  • registry.Gateway interface for gateway provider implementations (609dc45)
  • registry.RealtimeProvider interface for realtime provider implementations (609dc45)
  • registry.GatewayProviderFactory and registry.RealtimeProviderFactory types (609dc45)
  • WithListener() option for pre-configured gateway listeners (609dc45)
  • WithPublicURL() option for gateway webhook callbacks (609dc45)
  • WithListenAddr() option for gateway server addresses (609dc45)
  • WithConnectionID() option for Telnyx connection IDs (609dc45)
  • WithVoice() option for realtime provider voice selection (609dc45)
  • WithModel() option for realtime provider model selection (609dc45)
  • WithInstructions() option for realtime provider system prompts (609dc45)
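
The side-effect registration pattern mentioned in the highlights can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the names Gateway, GatewayFactory, RegisterGateway, HasGateway, and ListGateways are hypothetical stand-ins, and the real factory types likely accept options.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Gateway is a hypothetical stand-in for a voice gateway provider.
type Gateway interface {
	Name() string
}

// GatewayFactory builds a Gateway (assumed shape; the real type may differ).
type GatewayFactory func() (Gateway, error)

var (
	mu        sync.RWMutex
	factories = map[string]GatewayFactory{}
)

// RegisterGateway stores a factory under a provider name.
func RegisterGateway(name string, f GatewayFactory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// HasGateway reports whether a provider name is registered.
func HasGateway(name string) bool {
	mu.RLock()
	defer mu.RUnlock()
	_, ok := factories[name]
	return ok
}

// ListGateways returns the registered names in sorted order.
func ListGateways() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for n := range factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

type fakeGateway struct{ name string }

func (g fakeGateway) Name() string { return g.name }

// In a real provider package, this init would run when the package is
// imported for its side effects (a blank import), making the provider
// discoverable without any explicit wiring in the application.
func init() {
	RegisterGateway("fake", func() (Gateway, error) {
		return fakeGateway{name: "fake"}, nil
	})
}

func main() {
	fmt.Println(ListGateways(), HasGateway("fake"))
}
```

The appeal of this design is that applications choose providers by importing them, and the registry never needs to depend on any provider package.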

v0.13.0 - 2026-06-15

Highlights

  • Typed Encoding system for audio format handling
  • Encoding normalization handles pcm16, ulaw, wav, and other aliases
  • IsRawEncoding() for detecting formats that need explicit parameters

Added

  • format.Encoding type with constants: Linear16, MuLaw, ALaw, MP3, Opus, FLAC, AAC, Speex, WebM (48d4fe4)
  • Encoding.Normalize() method for canonical encoding names with alias mapping (48d4fe4)
  • Encoding.IsRaw() method to detect raw audio formats (48d4fe4)
  • IsRawEncoding(string) convenience function (48d4fe4)
  • WAV normalization to Linear16 (WAV is a container for raw PCM) (5451f73)
  • Speex and WebM encoding constants (5451f73)
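
As a rough illustration of alias normalization, the sketch below maps a few aliases to canonical names. The string values of the constants, the alias table, and the choice of which encodings count as raw are assumptions made for the example; only the idea (aliases such as pcm16, ulaw, and wav collapsing to canonical encodings) comes from the entries above.

```go
package main

import (
	"fmt"
	"strings"
)

// Encoding is a typed audio encoding name.
type Encoding string

// Canonical values here are assumed for illustration.
const (
	Linear16 Encoding = "linear16"
	MuLaw    Encoding = "mulaw"
	ALaw     Encoding = "alaw"
	MP3      Encoding = "mp3"
)

// aliases maps lower-cased alternative spellings to canonical encodings.
var aliases = map[string]Encoding{
	"pcm16": Linear16,
	"wav":   Linear16, // WAV is treated as a container for raw PCM
	"ulaw":  MuLaw,
}

// Normalize lower-cases and trims the name, then resolves known aliases.
// Unknown names are returned in their lower-cased form.
func (e Encoding) Normalize() Encoding {
	key := strings.ToLower(strings.TrimSpace(string(e)))
	if canon, ok := aliases[key]; ok {
		return canon
	}
	return Encoding(key)
}

// IsRaw reports whether the encoding is headerless audio, which needs
// explicit parameters such as sample rate (assumed set of raw formats).
func (e Encoding) IsRaw() bool {
	switch e.Normalize() {
	case Linear16, MuLaw, ALaw:
		return true
	}
	return false
}

func main() {
	for _, name := range []Encoding{"PCM16", "wav", "ulaw", "mp3"} {
		fmt.Println(name, "->", name.Normalize(), "raw:", name.IsRaw())
	}
}
```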

Changed

  • audio/converter uses format.Encoding type for codec selection (48d4fe4)
  • Predefined AudioFormat vars use encoding constants instead of strings (48d4fe4)

Tests

  • Comprehensive tests for Encoding.Normalize() with all aliases (48d4fe4)
  • Tests for IsRawEncoding() with raw and container formats (48d4fe4)

v0.12.1 - 2026-06-14

Highlights

  • Internal refactoring to reduce code duplication
  • Generic provider.Client[T] for multi-provider management

Added

  • format.PCM16_24kHz and format.PCM16_16kHz aliases for provider-agnostic usage (33b946a)

v0.12.0 - 2026-06-14

Highlights

  • Realtime Provider interface for native voice-to-voice LLMs (OpenAI Realtime, Gemini Live)
  • Audio format conversion between telephony (mulaw 8kHz) and realtime providers (PCM16 16/24kHz)
  • RealtimeBridge for seamless telephony-to-realtime provider integration
  • Dual pipeline modes: text (STT→LLM→TTS) or realtime (voice-to-voice)

Added

  • realtime package with Provider interface for voice-to-voice LLM providers (55c873a)
  • ProcessConfig with instructions, voice, functions, and callbacks (55c873a)
  • AudioChunk and Transcript types for provider output (55c873a)
  • FunctionDeclaration for model function calling (55c873a)
  • Client for multi-provider management with automatic fallback (226915d)
  • audio/format package with common audio format constants (Twilio, OpenAI, Gemini) (4d1b2e2)
  • audio/converter package for audio format conversion (4e4d1b6)
  • TwilioToOpenAI(), OpenAIToTwilio(), TwilioToGemini(), GeminiToTwilio() convenience functions (4e4d1b6)
  • StreamConverter for efficient streaming audio conversion (4e4d1b6)
  • PipelineMode type with PipelineModeText and PipelineModeRealtime constants (235584e)
  • RealtimeConfig for configuring realtime providers (235584e)
  • RealtimeProviderFactory interface for creating realtime providers from config (235584e)
  • RealtimeBridge for bridging telephony WebSocket to realtime providers (df38137)
  • NewRealtimeBridgeForTwilio() and NewRealtimeBridgeForTwilioGemini() convenience constructors (df38137)
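
Converting 8 kHz telephony audio to 16 kHz or 24 kHz involves upsampling by an integer factor (2 or 3) after mu-law decoding. The sketch below shows one simple way to do that with linear interpolation. It is an assumption-laden illustration, not the audio/converter implementation: the upsample function is hypothetical, and production resamplers typically apply a low-pass filter as well.

```go
package main

import "fmt"

// upsample raises the sample rate by an integer factor using linear
// interpolation between neighboring samples. The final input sample is
// repeated because it has no successor to interpolate toward.
func upsample(in []int16, factor int) []int16 {
	if factor <= 1 || len(in) == 0 {
		return append([]int16(nil), in...)
	}
	out := make([]int16, 0, len(in)*factor)
	for i, s := range in {
		next := s
		if i+1 < len(in) {
			next = in[i+1]
		}
		for k := 0; k < factor; k++ {
			// Interpolate in int to avoid int16 overflow in the difference.
			v := int(s) + (int(next)-int(s))*k/factor
			out = append(out, int16(v))
		}
	}
	return out
}

func main() {
	// 8 kHz -> 24 kHz corresponds to factor 3.
	fmt.Println(upsample([]int16{0, 300}, 3)) // prints [0 100 200 300 300 300]
}
```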

Tests

  • Unit tests for realtime.Provider interface and Client (c009f3d)
  • Tests for audio/converter with codec and sample rate conversion (4e4d1b6)
  • Tests for RealtimeBridge lifecycle and event handling (df38137)

v0.11.0 - 2026-06-13

Highlights

  • Global provider registry with priority-based registration
  • Architecture documentation for provider implementation

Added

  • Global RegisterSTTProvider(), RegisterTTSProvider(), RegisterCallSystemProvider() functions at package level (49c11e3)
  • GetSTTProvider(), GetTTSProvider(), GetCallSystemProvider() for retrieving registered providers (49c11e3)
  • ListSTTProviders(), ListTTSProviders(), ListCallSystemProviders() for discovering available providers (49c11e3)
  • HasSTTProvider(), HasTTSProvider(), HasCallSystemProvider() for checking registration (49c11e3)
  • GetSTTProviderPriority(), GetTTSProviderPriority(), GetCallSystemProviderPriority() for querying priority (49c11e3)
  • PriorityThin (0) and PriorityThick (10) constants for provider layering (49c11e3)
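
Priority-based registration could work along the lines of the sketch below. The override rule is an assumption, since the entries above only name the constants: here a registration replaces an existing one unless the existing one has strictly higher priority. The Register function and the providers map are hypothetical, and the sketch omits locking for brevity.

```go
package main

import "fmt"

const (
	PriorityThin  = 0  // lightweight implementation
	PriorityThick = 10 // full-featured implementation
)

type entry struct {
	priority int
	factory  func() string
}

// providers holds the winning registration per name (not goroutine-safe).
var providers = map[string]entry{}

// Register stores the factory unless a higher-priority one already exists.
// It reports whether the registration took effect. Equal priority replaces
// the earlier registration (an assumed tie-break).
func Register(name string, priority int, factory func() string) bool {
	if cur, ok := providers[name]; ok && cur.priority > priority {
		return false
	}
	providers[name] = entry{priority: priority, factory: factory}
	return true
}

func main() {
	Register("example", PriorityThin, func() string { return "thin" })
	Register("example", PriorityThick, func() string { return "thick" })
	// A later thin registration does not displace the thick one.
	ok := Register("example", PriorityThin, func() string { return "thin again" })
	fmt.Println(ok, providers["example"].factory()) // prints false thick
}
```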

Documentation

  • CLAUDE.md with provider registry architecture and implementation guidelines (51f6883)
  • Dependency architecture diagram (omnivoice-core → providers → omnivoice) (51f6883)
  • Step-by-step guide for adding new providers (51f6883)

v0.10.0 - 2026-06-13

Highlights

  • Voice Gateway interface for provider-agnostic PSTN and WebRTC communication
  • Session persistence with in-memory and Redis backends
  • Barge-in detection for user interruption handling

Added

  • gateway package with Gateway interface for PSTN telephony providers (Twilio, Telnyx, Vonage, Plivo) (8f375df)
  • WebRTCGateway interface for browser/mobile voice (LiveKit, Daily) (8f375df)
  • Session and WebRTCSession interfaces for active call management (8f375df)
  • Event types for call lifecycle: SessionStarted, SessionEnded, UserSpeechStart, AgentSpeechStart, Interruption, etc. (8f375df)
  • storage package with SessionStore interface for call state persistence (b1c8cf9)
  • MemoryStore for in-process session storage (b1c8cf9)
  • RedisStore for distributed session storage with configurable TTL (b1c8cf9)
  • SessionState type with conversation history, metrics, and recovery data (b1c8cf9)
  • bargein package with Detector for user interruption detection (0ab0e4b)
  • Interruption modes: ModeImmediate, ModeAfterSentence, ModeDisabled (0ab0e4b)
  • Automatic TTS cancellation on user speech detection (0ab0e4b)
  • MulawToWAV() function in audio/codec for WAV container encoding (4a34433)

Changed

  • Twilio example moved to omni-twilio repository (26d2703)

Dependencies

  • Added github.com/redis/go-redis/v9 for Redis session storage (b1c8cf9)
  • Bump github.com/grokify/mogo from 0.74.3 to 0.74.6 (e22b0e9)

Documentation

  • Voice architecture guide with system design overview (b2b6215)
  • Barge-in detection usage guide (b2b6215)
  • Session storage configuration guide (b2b6215)

Tests

  • Storage interface conformance tests for MemoryStore and RedisStore (b1c8cf9)
  • Barge-in detector tests for interruption modes and timing (0ab0e4b)

v0.9.0 - 2026-05-02

Highlights

  • Canonical Transcript format for STT output with embedded JSON Schema
  • DurationMilliseconds type for JSON-friendly duration serialization

Added

  • Canonical Transcript type in stt package for standardized STT output across providers (af49b8b)
  • TranscriptSegment and TranscriptWord types with timing information (af49b8b)
  • TranscriptMetadata for provenance (provider, model, options) (af49b8b)
  • NewTranscript() constructor from TranscriptionResult (af49b8b)
  • LoadTranscript() and SaveJSON() for file I/O (af49b8b)
  • TotalDuration(), SegmentDuration(), WordDuration() convenience methods (af49b8b)
  • schema package with embedded TranscriptV1Schema JSON Schema (d794eb0)
  • Duration fields use duration.DurationMilliseconds for integer millisecond JSON serialization (af49b8b)
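
Integer-millisecond JSON serialization of durations can be sketched with a small custom type. The Millis type, the Word struct, and the JSON field names below are hypothetical and are not the DurationMilliseconds implementation from github.com/grokify/mogo; the sketch only shows the general technique of implementing json.Marshaler and json.Unmarshaler.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Millis is a duration that serializes as an integer count of milliseconds.
type Millis time.Duration

// MillisOf builds a Millis from a millisecond count.
func MillisOf(ms int64) Millis { return Millis(time.Duration(ms) * time.Millisecond) }

// MarshalJSON emits the duration as a bare integer, e.g. 1500.
func (m Millis) MarshalJSON() ([]byte, error) {
	return json.Marshal(time.Duration(m).Milliseconds())
}

// UnmarshalJSON reads an integer millisecond count.
func (m *Millis) UnmarshalJSON(b []byte) error {
	var ms int64
	if err := json.Unmarshal(b, &ms); err != nil {
		return err
	}
	*m = MillisOf(ms)
	return nil
}

// Word is a hypothetical timed word, loosely modeled on a transcript entry.
type Word struct {
	Text  string `json:"text"`
	Start Millis `json:"start_ms"`
	End   Millis `json:"end_ms"`
}

func encode(w Word) (string, error) {
	b, err := json.Marshal(w)
	return string(b), err
}

func decode(s string) (Word, error) {
	var w Word
	err := json.Unmarshal([]byte(s), &w)
	return w, err
}

func main() {
	s, _ := encode(Word{Text: "hi", Start: MillisOf(1500), End: MillisOf(2000)})
	fmt.Println(s) // prints {"text":"hi","start_ms":1500,"end_ms":2000}
}
```

Integer milliseconds avoid the nanosecond integers that time.Duration produces by default and are easy to consume from non-Go clients.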

Dependencies

  • Added github.com/grokify/mogo v0.74.3 for DurationMilliseconds type (0f83fc8)

Tests

  • TestNewTranscript validates transcript creation from transcription results (92755a8)
  • TestTranscriptJSONRoundTrip validates JSON serialization and deserialization (92755a8)

v0.8.0 - 2026-04-03

Highlights

  • New resilience package for provider-agnostic error classification and retry logic
  • Smart fallback in TTS and STT clients - only switch providers on permanent errors
  • 8 error categories for actionable error handling decisions

Added

  • resilience package with error categorization system (53a72b1)
  • ErrorCategory type with 8 categories: transient, rate_limit, validation, auth, not_found, server, quota, unknown (53a72b1)
  • ErrorInfo struct with category, retryability, code, message, suggestion, and retry-after hint (53a72b1)
  • ProviderError type wrapping provider errors with classification metadata (53a72b1)
  • ErrorClassifier interface for provider-specific error classification (53a72b1)
  • HTTPStatusClassifier for HTTP status code classification (53a72b1)
  • Retry() and RetryWithResult[T]() generic retry functions with configurable backoff (53a72b1)
  • RetryConfig with max attempts, backoff strategy, classifier, and OnRetry callback (53a72b1)
  • RetryError type for exhausted retry attempts with attempt count (53a72b1)
  • Backoff strategies: ExponentialBackoff, LinearBackoff, ConstantBackoff, NoBackoff (53a72b1)
  • DefaultBackoff() and DefaultRetryConfig() with sensible defaults (53a72b1)
  • IsProviderError() helper to extract ProviderError from error chain (53a72b1)
  • IsRetryable() helper to check error retryability (53a72b1)

Changed

  • TTS client now uses smart fallback - only switches providers on permanent (non-retryable) errors (53a72b1)
  • STT client now uses smart fallback - only switches providers on permanent (non-retryable) errors (53a72b1)
  • Fallback behavior is now determined by error classification rather than by the mere occurrence of an error (53a72b1)
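
The smart fallback behavior can be illustrated with the sketch below: retryable errors are retried on the same provider, and only a permanent error moves on to the next one. All names here (classifiedError, provider, callWithFallback) are hypothetical and do not reflect the resilience package API. What the real clients do once retries are exhausted is not stated above; this sketch simply returns the last error, and it omits backoff delays between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// classifiedError is a hypothetical error carrying a retryability flag.
type classifiedError struct {
	msg       string
	retryable bool
}

func (e *classifiedError) Error() string { return e.msg }

// isRetryable looks through the error chain for a classifiedError.
func isRetryable(err error) bool {
	var ce *classifiedError
	return errors.As(err, &ce) && ce.retryable
}

// provider is a fake backend whose behavior depends on the call number.
type provider struct {
	name  string
	calls int
	fn    func(call int) (string, error)
}

func (p *provider) synthesize() (string, error) {
	p.calls++
	return p.fn(p.calls)
}

// callWithFallback retries transient failures on the current provider and
// switches to the next provider only after a permanent failure.
func callWithFallback(providers []*provider, maxAttempts int) (string, error) {
	lastErr := errors.New("no providers configured")
	for _, p := range providers {
		permanent := false
		for attempt := 1; attempt <= maxAttempts; attempt++ {
			out, err := p.synthesize()
			if err == nil {
				return out, nil
			}
			lastErr = err
			if !isRetryable(err) {
				permanent = true
				break
			}
			// Transient error: a real client would wait (backoff) here.
		}
		if !permanent {
			return "", lastErr // retries exhausted; do not switch providers
		}
	}
	return "", lastErr
}

func main() {
	primary := &provider{name: "primary", fn: func(int) (string, error) {
		return "", &classifiedError{msg: "invalid API key", retryable: false}
	}}
	backup := &provider{name: "backup", fn: func(int) (string, error) {
		return "audio from backup", nil
	}}
	out, err := callWithFallback([]*provider{primary, backup}, 3)
	fmt.Println(out, err)
}
```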

Documentation

  • Package documentation with examples in resilience/doc.go (53a72b1)

Tests

  • 44 tests in resilience package covering retry logic, backoff strategies, error types, and categories (53a72b1)
  • 4 TTS fallback behavior tests in tts/fallback_test.go (53a72b1)
  • 4 STT fallback behavior tests in stt/fallback_test.go (53a72b1)

v0.7.0 - 2026-03-30

Highlights

  • Enhanced TTS mock fixtures with WAV audio generation
  • Provider-specific mocks for ElevenLabs, Deepgram, and OpenAI
  • Configurable mock behaviors for testing (latency, errors, timing)

Added

  • GenerateWAVFixture(durationMs, sampleRate) creates valid WAV files with proper RIFF headers for testing (17ef8b9)
  • GenerateShortWAV() and GenerateOneSecondWAV() convenience fixtures (17ef8b9)
  • MockProviderOption functional options pattern for configurable mock behavior (17ef8b9)
  • WithFixedDuration(ms) option for mocks returning fixed-length audio (17ef8b9)
  • WithRealisticTiming() option for text-length-proportional audio duration (17ef8b9)
  • WithError(err) option for error injection testing (17ef8b9)
  • WithLatency(duration) option for simulating network delays (17ef8b9)
  • WithFailAfterN(n, err) option for testing retry/failover logic (17ef8b9)
  • NewElevenLabsMock() with 3 preconfigured voices (Rachel, Bella, Antoni) (17ef8b9)
  • NewDeepgramMock() with 3 preconfigured voices (Asteria, Luna, Orion) (17ef8b9)
  • NewOpenAIMock() with 6 preconfigured voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer) (17ef8b9)
  • Common test errors: ErrMockRateLimit, ErrMockQuotaExceeded, ErrMockNetworkError, ErrMockInvalidVoice (17ef8b9)
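
For readers unfamiliar with what a WAV fixture with a proper RIFF header involves, here is a minimal sketch that builds a silent 16-bit mono PCM file in memory. The silentWAV function is hypothetical and is not the GenerateWAVFixture implementation; it follows the standard 44-byte canonical WAV header layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// silentWAV returns a WAV file (16-bit mono PCM) containing silence of the
// requested duration. The header is the canonical 44-byte RIFF/WAVE layout.
func silentWAV(durationMs, sampleRate int) []byte {
	const channels, bitsPerSample = 1, 16
	blockAlign := channels * bitsPerSample / 8 // bytes per sample frame
	dataLen := sampleRate * durationMs / 1000 * blockAlign

	buf := make([]byte, 44+dataLen) // samples stay zero, i.e. silence
	le := binary.LittleEndian

	copy(buf[0:], "RIFF")
	le.PutUint32(buf[4:], uint32(36+dataLen)) // file size minus 8 bytes
	copy(buf[8:], "WAVE")

	copy(buf[12:], "fmt ")
	le.PutUint32(buf[16:], 16) // size of the fmt chunk body
	le.PutUint16(buf[20:], 1)  // audio format 1 = PCM
	le.PutUint16(buf[22:], channels)
	le.PutUint32(buf[24:], uint32(sampleRate))
	le.PutUint32(buf[28:], uint32(sampleRate*blockAlign)) // byte rate
	le.PutUint16(buf[32:], uint16(blockAlign))
	le.PutUint16(buf[34:], bitsPerSample)

	copy(buf[36:], "data")
	le.PutUint32(buf[40:], uint32(dataLen))
	return buf
}

func main() {
	wav := silentWAV(1000, 8000)
	fmt.Println(len(wav), string(wav[0:4]), string(wav[8:12])) // prints 16044 RIFF WAVE
}
```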

Fixed

  • Fixed gofmt formatting and added G115 nolint annotations for bounded integer conversions in WAV generation (aa28bc8)

Documentation

  • Testing with Mocks guide covering mock providers, WAV fixtures, and common test patterns (1980a53)

Tests

  • Comprehensive tests for WAV fixture generation and header validation (17ef8b9)
  • Tests for all MockProviderOption behaviors including context cancellation (17ef8b9)
  • Provider conformance tests for ElevenLabs, Deepgram, and OpenAI mocks (17ef8b9)

v0.6.0 - 2026-03-21

Highlights

  • Observability package for voice instrumentation with hooks and events
  • Registry package for provider discovery and registration
  • CallSystem client with multi-provider failover support
  • SMS messaging support via SMSProvider interface

Added

  • observability package with VoiceEvent, VoiceObserver, TTSHook, and STTHook interfaces (dda0212)
  • Event types for call lifecycle (initiated, ringing, answered, ended, failed) (dda0212)
  • NoOpTTSHook and NoOpSTTHook for optional instrumentation (dda0212)
  • registry package with Registry interface for provider discovery (0ec559f)
  • Factory types for TTS, STT, and CallSystem providers (0ec559f)
  • callsystem.Client for managing multiple CallSystem providers with automatic failover (c12a16a)
  • SMSProvider interface and SMSMessage type for SMS support (d7d8c63)
  • Hook field in SynthesisConfig for TTS observability (8b3c38a)
  • Hook field in TranscriptionConfig for STT observability (8b3c38a)
  • Observer field in CallSystemConfig and CallOptions for call events (8b3c38a)
  • WithObserver CallOption for per-call observability (8b3c38a)
  • ObservableCallSystem interface combining CallSystem with Observable (8b3c38a)
  • SetHook() and Hook() methods on TTS and STT clients (8b3c38a)

Fixed

  • Resolved gosec G120 warnings in Twilio webhook example by adding http.MaxBytesReader (ac115b7)

v0.5.0 - 2026-02-28

Highlights

  • Organization rename from agentplexus to plexusone

Changed

  • Breaking: Go module path changed from github.com/agentplexus/omnivoice to github.com/plexusone/omnivoice-core (bf46b07)

v0.4.3 - 2026-02-15

Highlights

  • Comprehensive tests for English and Chinese subtitle generation

Tests

  • TestWordsToSubtitleCues_EnglishWordGrouping for word-based cue grouping (0ddb8bc)
  • TestWordsToSubtitleCues_ChineseCharacters for character-by-character tokenization (0ddb8bc)
  • TestWordsToSubtitleCues_MixedChineseEnglish for mixed language content (0ddb8bc)
  • TestWordsToSubtitleCues_LongChineseText for multi-cue splitting (0ddb8bc)

v0.4.2 - 2026-02-15

Highlights

  • Fixed subtitle word cutoff at line boundaries

Fixed

  • Subtitle cue chunking now checks actual wrapped line count instead of total character count, preventing words from being cut off when they would appear on a third line (a301897)
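
The difference between the two checks can be seen in a small sketch. The functions wrappedLines and fitsCue are hypothetical and use a simple greedy word wrap, which may differ from the package's wrapping logic. With 10 characters per line and 2 lines per cue, three 6-letter words total 20 characters including spaces, so a character-count check would accept them, yet they wrap onto three lines.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// wrappedLines counts how many lines a greedy word wrap produces when each
// line holds at most maxChars characters. Words are never split.
func wrappedLines(words []string, maxChars int) int {
	lines, cur := 0, 0
	for _, w := range words {
		n := utf8.RuneCountInString(w)
		switch {
		case lines == 0:
			lines, cur = 1, n // first word opens the first line
		case cur+1+n <= maxChars:
			cur += 1 + n // word fits after a space
		default:
			lines++ // word starts a new line
			cur = n
		}
	}
	return lines
}

// fitsCue reports whether the words fit in one cue of maxLines lines.
func fitsCue(words []string, maxChars, maxLines int) bool {
	return wrappedLines(words, maxChars) <= maxLines
}

func main() {
	words := []string{"aaaaaa", "bbbbbb", "cccccc"}
	fmt.Println(wrappedLines(words, 10), fitsCue(words, 10, 2)) // prints 3 false
}
```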

Tests

  • TestWordsToSubtitleCues_LineCountLimit verifies cues split correctly at line boundaries (a301897)

v0.4.1 - 2026-02-14

Highlights

  • STT conformance tests for TranscribeFile and TranscribeURL batch transcription methods

Tests

  • TranscribeFile conformance test for local file transcription (c441944)
  • TranscribeURL conformance test for remote URL transcription (c441944)

v0.4.0 - 2026-02-14

Highlights

  • Subtitle generation from STT transcription results
  • Extensible config maps for provider-specific settings

Added

  • Subtitle package for SRT/VTT generation from transcription results (17730a7)
  • Configurable max characters per line and lines per cue for subtitles (17730a7)
  • Word-level timestamp-based cue splitting (17730a7)
  • Extensions map in TranscriptionConfig for provider-specific STT settings (84c37f5)
  • Extensions map in SynthesisConfig for provider-specific TTS settings (665c3be)

Fixed

  • Subtitle wrapText no longer clips words when text exceeds line limit (63144bb)

Documentation

  • Voice cloning guide with recording tips and phonetically balanced text (1f0cdd8)

Tests

  • Call system provider conformance tests (MakeCall, ListCalls, OnIncomingCall) (9683ca2)
  • Transport provider conformance tests (Listen, Connect, Protocol) (9683ca2)

v0.3.0 - 2026-01-24

Highlights

  • Provider conformance test suites for TTS and STT implementations

Added

  • TTS provider conformance test suite (Synthesize, SynthesizeStream, SynthesizeFromReader) (e3705c7)
  • Mock TTS provider for self-testing with configurable audio format responses (e3705c7)
  • STT provider conformance test suite (Transcribe, TranscribeStream) (69cfd20)
  • Mock STT provider with streaming transcription simulation (69cfd20)

Fixed

  • MCP session and tool handlers now log Close() errors instead of discarding them (6099072)

Documentation

  • Provider conformance testing TRD describing test categories and API design (58a9697)

v0.2.0 - 2026-01-18

Highlights

  • Audio codec package with PCM, mu-law, and a-law support for telephony
  • MCP server enabling Claude Code to make voice calls
  • Pipeline components connecting STT, TTS, and transport providers

Added

  • Audio codec package with PCM sample conversions (int16, float32, float64, bytes) (f64fe1e)
  • Mu-law encoding/decoding for Twilio Media Streams (f64fe1e)
  • A-law encoding/decoding for international telephony (f64fe1e)
  • Audio resampling, normalization, and analysis utilities (f64fe1e)
  • MCP server with stdio transport for voice interactions (721cbac)
  • Voice interaction tools: initiate_call, continue_call, speak_to_user, end_call (721cbac)
  • Session management for tracking active voice calls (721cbac)
  • TTSPipeline for streaming TTS output to transport connections (11c906d)
  • StreamingTTSPipeline for connecting streaming LLM text to TTS to transport (11c906d)
  • STTPipeline for streaming audio from transport to STT with event callbacks (11c906d)
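
As background for the mu-law entries above, the following sketch shows the widely used G.711 mu-law companding algorithm for 16-bit samples. It is a generic reference-style implementation with hypothetical function names (encodeMuLaw, decodeMuLaw), not the code from the audio codec package.

```go
package main

import "fmt"

const (
	muLawBias = 0x84  // added before the logarithmic segment search
	muLawClip = 32635 // largest magnitude that still fits after the bias
)

// encodeMuLaw compresses a 16-bit linear PCM sample to one mu-law byte.
func encodeMuLaw(sample int16) byte {
	s := int(sample)
	sign := 0
	if s < 0 {
		s = -s
		sign = 0x80
	}
	if s > muLawClip {
		s = muLawClip
	}
	s += muLawBias

	// Find the segment: position of the highest set bit among bits 14..8.
	exponent := 7
	for mask := 0x4000; exponent > 0 && s&mask == 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F

	// Mu-law bytes are stored bit-inverted.
	return byte(^(sign | exponent<<4 | mantissa))
}

// decodeMuLaw expands one mu-law byte back to a 16-bit linear PCM sample.
func decodeMuLaw(u byte) int16 {
	u = ^u
	exponent := int(u>>4) & 0x07
	mantissa := int(u & 0x0F)
	s := ((mantissa<<3)+muLawBias)<<exponent - muLawBias
	if u&0x80 != 0 {
		return int16(-s)
	}
	return int16(s)
}

func main() {
	for _, v := range []int16{0, 1000, -1000, 32767} {
		b := encodeMuLaw(v)
		fmt.Printf("%6d -> 0x%02X -> %6d\n", v, b, decodeMuLaw(b))
	}
}
```

The encoding is lossy: decoding returns the quantized value for the segment, so round trips are approximate for everything except values such as zero.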

Documentation

  • Voice integration PRD outlining goals, user stories, and success metrics (fd86611)
  • Twilio integration TRD detailing Media Streams architecture (fd86611)

Tests

  • Comprehensive unit tests for audio codec functions (mu-law, a-law, PCM) (f64fe1e)

v0.1.0 - 2025-12-28

Highlights

  • Initial OmniVoice voice abstraction layer for multi-provider telephony

Added

  • Voice abstraction layer with provider-agnostic interfaces (8a54bc2)
  • STT (speech-to-text) provider interface with streaming support (8a54bc2)
  • TTS (text-to-speech) provider interface with streaming support (8a54bc2)
  • Transport interface for audio connections (Twilio, Zoom, etc.) (8a54bc2)
  • Export CallOptions for provider implementations (7e1b52d)

Documentation

  • README with project overview and shields (4f298df)
  • Marp presentation for OmniVoice (d2d67cf)

Build

  • GitHub Actions CI workflow (4bad35d)
  • golangci-lint configuration and fixes (3693297)