Release Notes: v0.15.0
Release Date: 2026-06-27
Summary
Major release adding local TTS/STT providers for Apple Silicon, a development CLI (omnictl), a voice profile library for managing reference audio, and audio analysis utilities for selecting the best segments for voice cloning.
Highlights
- Local TTS/STT Providers: The F5-TTS MLX and Whisper MLX providers communicate via gRPC over a Unix domain socket for low-latency local inference on Apple Silicon
- CLI Tool (omnictl): Development CLI for proto generation, server management, and voice profile operations
- Voice Profile Library: Manage voice profiles with reference audio for zero-shot voice cloning
- Audio Analysis: Find optimal 10-15 second segments from longer recordings for voice cloning
What's New
Local TTS Provider: F5-TTS MLX
Zero-shot voice cloning with F5-TTS running locally on Apple Silicon:
import (
	omnivoice "github.com/plexusone/omnivoice-core"
	"github.com/plexusone/omnivoice-core/registry"
	"github.com/plexusone/omnivoice-core/tts" // assumed import path for the tts package used below
	_ "github.com/plexusone/omnivoice-core/providers/f5tts-mlx"
)
// Get local TTS provider
provider, err := omnivoice.GetTTSProvider("f5tts-mlx",
	registry.WithEndpoint("unix:///tmp/omnivoice-f5tts.sock"),
)
if err != nil {
	return err
}
// Synthesize with voice cloning if the provider supports it
if cloner, ok := provider.(tts.ReferenceSynthesizer); ok {
	audio, err := cloner.SynthesizeWithReference(ctx,
		"Hello, this is my cloned voice.",
		referenceAudio,
		referenceText,
	)
	if err != nil {
		return err
	}
	_ = audio // placeholder: use the synthesized audio here
}
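Both local providers are addressed through a unix:// endpoint, so it can help to confirm that a server is listening before requesting a provider. The following is a minimal sketch using only the Go standard library. The helper names socketPath and reachable are hypothetical and not part of omnivoice-core, and the check only verifies that the socket accepts connections, not that the gRPC service behind it is healthy.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"time"
)

// socketPath extracts the filesystem path from a unix:// endpoint.
// Hypothetical helper for illustration; not an omnivoice-core API.
func socketPath(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if u.Scheme != "unix" || u.Path == "" {
		return "", fmt.Errorf("not a unix socket endpoint: %q", endpoint)
	}
	return u.Path, nil
}

// reachable returns nil when something accepts connections on the socket.
func reachable(endpoint string, timeout time.Duration) error {
	path, err := socketPath(endpoint)
	if err != nil {
		return err
	}
	conn, err := net.DialTimeout("unix", path, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	endpoint := "unix:///tmp/omnivoice-f5tts.sock"
	if err := reachable(endpoint, time.Second); err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	fmt.Println("server socket is accepting connections")
}
```

For an actual health check, omnictl health is the tool these notes describe. The sketch is only a quick way to tell a missing server apart from other failures.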
Local STT Provider: Whisper MLX
Fast local transcription with Whisper running on Apple Silicon:
import (
	omnivoice "github.com/plexusone/omnivoice-core"
	"github.com/plexusone/omnivoice-core/registry"
	"github.com/plexusone/omnivoice-core/stt" // assumed import path for the stt package used below
	_ "github.com/plexusone/omnivoice-core/providers/whisper-mlx"
)
// Get local STT provider
provider, err := omnivoice.GetSTTProvider("whisper-mlx",
	registry.WithEndpoint("unix:///tmp/omnivoice-whisper.sock"),
)
if err != nil {
	return err
}
// Transcribe with word-level timestamps
result, err := provider.Transcribe(ctx, audioReader, stt.TranscriptionConfig{
	Language: "en",
})
if err != nil {
	return err
}
_ = result // placeholder: use the transcription result here
CLI Tool: omnictl
New development CLI for omnivoice-core operations:
# Generate Go/Python code from proto files
omnictl generate proto
# Manage local TTS/STT servers
omnictl server start f5tts
omnictl server list
omnictl server stop f5tts
# Check server health
omnictl health
# Analyze audio for voice cloning
omnictl voice analyze recording.wav --duration 15
# Extract best segment
omnictl voice extract recording.wav -o reference.wav --best
# Manage voice profiles
omnictl voice list
omnictl voice create --slug john-doe reference.wav --transcript "Hello..."
Voice Profile Library
Programmatic voice profile management:
import "github.com/plexusone/omnivoice-core/voices"
// Create library with default path (~/.plexusone/omnivoice/voices)
lib, err := voices.NewLibrary("")
// List available profiles
profiles, err := lib.List()
// Create a new profile
err = lib.Create("john-doe", referenceAudio, referenceText, &voices.Metadata{
	Name:     "John Doe",
	Language: "en-US",
})
// Load profile for synthesis
audio, text, err := lib.LoadReferenceAudio("john-doe")
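Passing an empty string to voices.NewLibrary selects the default location noted in the comment above. As a rough illustration of what that fallback could look like, the sketch below resolves the directory with the standard library. The function resolveLibraryDir is hypothetical, and the library's actual resolution logic may differ.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveLibraryDir returns dir unchanged when it is set. For an empty
// string it falls back to ~/.plexusone/omnivoice/voices.
// Hypothetical helper for illustration; not an omnivoice-core API.
func resolveLibraryDir(dir string) (string, error) {
	if dir != "" {
		return dir, nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", fmt.Errorf("resolve home directory: %w", err)
	}
	return filepath.Join(home, ".plexusone", "omnivoice", "voices"), nil
}

func main() {
	dir, err := resolveLibraryDir("")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("voice library directory:", dir)
}
```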
Audio Analysis for Voice Cloning
Find optimal segments from longer recordings:
import (
	"errors"

	"github.com/plexusone/omnivoice-core/audio"
)
// Read WAV file
header, samples, err := audio.ReadWAV("recording.wav")
if err != nil {
	return err
}
// Find best 15-second segments
config := audio.DefaultAnalyzeConfig()
config.TargetDuration = 15.0
config.TopN = 5
segments, err := audio.FindBestSegments(samples, header.SampleRate, config)
if err != nil {
	return err
}
if len(segments) == 0 {
	return errors.New("no suitable segment found")
}
// Extract best segment
best := segments[0]
segment := audio.ExtractSegment(samples, best.StartSample, best.EndSample)
// Write to file
err = audio.WriteWAV("reference.wav", segment, header.SampleRate)
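These notes do not describe how segments are scored. To give a feel for what segment selection involves, the sketch below slides a fixed-length window over the samples and keeps the window with the highest RMS energy. The scoring rule and the name bestWindow are assumptions for illustration only; this is not the algorithm behind audio.FindBestSegments, which may weigh other criteria.

```go
package main

import (
	"fmt"
	"math"
)

// window describes a candidate segment by sample offsets (End is exclusive).
type window struct {
	Start, End int
	RMS        float64
}

// bestWindow slides a window of winLen samples over the signal in steps of
// hop samples and returns the window with the highest RMS energy.
// The scoring rule is an assumption made for this sketch.
func bestWindow(samples []float64, winLen, hop int) (window, bool) {
	if winLen <= 0 || hop <= 0 || len(samples) < winLen {
		return window{}, false
	}
	best := window{RMS: -1}
	for start := 0; start+winLen <= len(samples); start += hop {
		var sum float64
		for _, s := range samples[start : start+winLen] {
			sum += s * s
		}
		rms := math.Sqrt(sum / float64(winLen))
		if rms > best.RMS {
			best = window{Start: start, End: start + winLen, RMS: rms}
		}
	}
	return best, true
}

func main() {
	// Toy signal: 8 silent samples followed by 8 samples at constant amplitude.
	samples := make([]float64, 16)
	for i := 8; i < 16; i++ {
		samples[i] = 0.5
	}
	w, ok := bestWindow(samples, 8, 4)
	fmt.Println(w.Start, w.End, ok) // prints "8 16 true"
}
```

Energy alone would also favor loud noise, so a practical scorer would likely combine it with other signals such as clipping or silence detection.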
Capability Interfaces for Local Providers
Optional interfaces that local providers can implement:
| Interface | Description |
|---|---|
| VoiceCloner | Create voice profiles from reference audio |
| ReferenceSynthesizer | Zero-shot synthesis with inline reference |
| ProfileCacher | Manage cached voice embeddings |
| ModelManager | Load/unload models dynamically |
| RuntimeChecker | Get runtime environment info |
| HealthChecker | Health status checks |
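Because these interfaces are optional, callers detect them with a type assertion, as in the ReferenceSynthesizer example earlier. The sketch below shows the pattern with stand-in types. The Provider interface and the CheckHealth method are hypothetical and do not reflect the library's actual method sets.

```go
package main

import "fmt"

// Provider is a stand-in for a base provider interface (hypothetical).
type Provider interface {
	Name() string
}

// HealthChecker is a stand-in for an optional capability. The method name
// and signature are assumptions, not the library's definition.
type HealthChecker interface {
	CheckHealth() error
}

// basicProvider implements only the base interface.
type basicProvider struct{}

func (basicProvider) Name() string { return "basic" }

// localProvider also implements the optional capability.
type localProvider struct{}

func (localProvider) Name() string       { return "local" }
func (localProvider) CheckHealth() error { return nil }

// describe reports whether p supports the optional HealthChecker capability
// and, if so, what the check returned.
func describe(p Provider) string {
	hc, ok := p.(HealthChecker)
	if !ok {
		return p.Name() + ": health checks not supported"
	}
	if err := hc.CheckHealth(); err != nil {
		return p.Name() + ": unhealthy: " + err.Error()
	}
	return p.Name() + ": healthy"
}

func main() {
	fmt.Println(describe(basicProvider{})) // prints "basic: health checks not supported"
	fmt.Println(describe(localProvider{})) // prints "local: healthy"
}
```

Keeping capabilities in separate small interfaces lets a provider implement only what its backend supports, while callers degrade gracefully when a capability is missing.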
gRPC Protocol Buffers
New proto definitions for local providers:
- proto/localtts/v1/localtts.proto: TTS synthesis, cloning, caching
- proto/localstt/v1/localstt.proto: Transcription, streaming, models
- proto/localvoice/v1/localvoice.proto: Combined voice services
Bug Fixes
- WriteWAV error handling: Properly track and report write errors
- File permissions: Create files with mode 0o600 instead of 0o644 so they are readable and writable only by their owner
- CLI error handling: Handle unchecked errors in server and voice commands
- Integer conversions: Add nolint for safe int32 conversions in providers
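The first two fixes follow a common Go pattern: remember the first write error instead of dropping it, and create the output file with owner-only permissions. The sketch below illustrates both with a minimal 16-bit mono PCM WAV writer. It is not the library's WriteWAV implementation, and errWriter and writeMonoPCM16 are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// errWriter remembers the first write error so the caller checks it once.
type errWriter struct {
	w   io.Writer
	err error
}

// write encodes v as little-endian bytes unless an earlier write failed.
func (ew *errWriter) write(v any) {
	if ew.err != nil {
		return
	}
	ew.err = binary.Write(ew.w, binary.LittleEndian, v)
}

// writeMonoPCM16 writes 16-bit mono PCM samples as a WAV file created with
// mode 0o600. Hypothetical helper; not the library's WriteWAV.
func writeMonoPCM16(path string, samples []int16, sampleRate uint32) (err error) {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	defer func() {
		// Report a close error only if nothing failed earlier.
		if cerr := f.Close(); err == nil {
			err = cerr
		}
	}()
	dataLen := uint32(len(samples) * 2)
	ew := &errWriter{w: f}
	ew.write([]byte("RIFF"))
	ew.write(36 + dataLen) // remaining file size after this field
	ew.write([]byte("WAVEfmt "))
	ew.write(uint32(16))     // fmt chunk size
	ew.write(uint16(1))      // audio format: PCM
	ew.write(uint16(1))      // channels: mono
	ew.write(sampleRate)     // samples per second
	ew.write(sampleRate * 2) // byte rate: rate * channels * 2 bytes
	ew.write(uint16(2))      // block align
	ew.write(uint16(16))     // bits per sample
	ew.write([]byte("data"))
	ew.write(dataLen)
	ew.write(samples)
	return ew.err
}

func main() {
	path := filepath.Join(os.TempDir(), "sketch-reference.wav")
	defer os.Remove(path)
	if err := writeMonoPCM16(path, []int16{0, 1000, -1000, 0}, 16000); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	info, err := os.Stat(path)
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Println("bytes:", info.Size(), "mode:", info.Mode().Perm())
}
```

Note that the mode passed to os.OpenFile applies only when the file is created. An existing file keeps its current permissions.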
Installation
Prerequisites for Local Providers
Local TTS/STT requires an Apple Silicon Mac (M1/M2/M3/M4) and a Python environment with the MLX packages installed:
# Create Python environment
python3 -m venv ~/.plexusone/omnivoice/venv
source ~/.plexusone/omnivoice/venv/bin/activate
# Install F5-TTS MLX
pip install f5-tts-mlx grpcio protobuf
# Install Whisper MLX
pip install mlx-whisper grpcio protobuf
Migration Guide
From v0.14.x
No breaking changes. All existing code continues to work.
New functionality is additive:
// New: Local provider with endpoint
provider, _ := omnivoice.GetTTSProvider("f5tts-mlx",
	registry.WithEndpoint("unix:///tmp/omnivoice-f5tts.sock"),
)
// New: Voice cloning capability check
if cloner, ok := provider.(tts.ReferenceSynthesizer); ok {
	audio, _ := cloner.SynthesizeWithReference(ctx, text, refAudio, refText)
}
// New: Voice profile library
lib, _ := voices.NewLibrary("")
profiles, _ := lib.List()
Documentation
- Local TTS Providers Guide - Setup and usage for F5-TTS and Whisper
- Voice Cloning Guide - Tips for recording reference audio
Full Changelog
See CHANGELOG.md for the complete list of changes.