Release Notes: v0.15.0

Release Date: 2026-06-27

Summary

Major release adding local TTS/STT providers for Apple Silicon, a development CLI (omnictl), a voice profile library for managing reference audio, and audio analysis utilities that select the best segments of a recording for voice cloning.

Highlights

Local TTS/STT Providers: The F5-TTS MLX and Whisper MLX providers communicate via gRPC over a Unix domain socket for low-latency local inference on Apple Silicon
CLI Tool (omnictl): Development CLI for proto generation, server management, and voice profile operations
Voice Profile Library: Manage voice profiles with reference audio for zero-shot voice cloning
Audio Analysis: Find optimal 10-15 second segments from longer recordings for voice cloning

What's New

Local TTS Provider: F5-TTS MLX

Zero-shot voice cloning with F5-TTS running locally on Apple Silicon:

import (
    omnivoice "github.com/plexusone/omnivoice-core"
    "github.com/plexusone/omnivoice-core/registry"
    "github.com/plexusone/omnivoice-core/tts" // assumed path for the package that defines tts.ReferenceSynthesizer
    _ "github.com/plexusone/omnivoice-core/providers/f5tts-mlx"
)

// Get local TTS provider
provider, err := omnivoice.GetTTSProvider("f5tts-mlx",
    registry.WithEndpoint("unix:///tmp/omnivoice-f5tts.sock"),
)
if err != nil {
    return err
}

// Synthesize with voice cloning if the provider supports it
if cloner, ok := provider.(tts.ReferenceSynthesizer); ok {
    audio, err := cloner.SynthesizeWithReference(ctx,
        "Hello, this is my cloned voice.",
        referenceAudio,
        referenceText,
    )
    if err != nil {
        return err
    }
    // audio now holds the synthesized result
}
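
The endpoint passed to registry.WithEndpoint above is a unix:// URI whose path is the socket file. As a minimal sketch, assuming you want to check that a local server is listening before requesting a provider, the following uses only the Go standard library. The helper names socketPath and reachable are hypothetical and are not part of omnivoice-core.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"time"
)

// socketPath extracts the filesystem path from a unix:// endpoint.
// Hypothetical helper for illustration; not an omnivoice-core API.
func socketPath(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("parse endpoint: %w", err)
	}
	if u.Scheme != "unix" || u.Path == "" {
		return "", fmt.Errorf("not a unix socket endpoint: %q", endpoint)
	}
	return u.Path, nil
}

// reachable reports whether something accepts connections on the socket.
// It only proves that a listener exists, not that it speaks gRPC.
func reachable(path string) bool {
	conn, err := net.DialTimeout("unix", path, 500*time.Millisecond)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	path, err := socketPath("unix:///tmp/omnivoice-f5tts.sock")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("socket path:", path)
	fmt.Println("server listening:", reachable(path))
}
```

A check like this gives a clearer error message when the server has not been started yet, for example before running omnictl server start f5tts.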

Local STT Provider: Whisper MLX

Fast local transcription with Whisper running on Apple Silicon:

import (
    omnivoice "github.com/plexusone/omnivoice-core"
    "github.com/plexusone/omnivoice-core/registry"
    "github.com/plexusone/omnivoice-core/stt" // assumed path for the package that defines stt.TranscriptionConfig
    _ "github.com/plexusone/omnivoice-core/providers/whisper-mlx"
)

// Get local STT provider
provider, err := omnivoice.GetSTTProvider("whisper-mlx",
    registry.WithEndpoint("unix:///tmp/omnivoice-whisper.sock"),
)
if err != nil {
    return err
}

// Transcribe with word-level timestamps
result, err := provider.Transcribe(ctx, audioReader, stt.TranscriptionConfig{
    Language: "en",
})
if err != nil {
    return err
}
// result now holds the transcription

CLI Tool: omnictl

New development CLI for omnivoice-core operations:

# Generate Go/Python code from proto files
omnictl generate proto

# Manage local TTS/STT servers
omnictl server start f5tts
omnictl server list
omnictl server stop f5tts

# Check server health
omnictl health

# Analyze audio for voice cloning
omnictl voice analyze recording.wav --duration 15

# Extract best segment
omnictl voice extract recording.wav -o reference.wav --best

# Manage voice profiles
omnictl voice list
omnictl voice create --slug john-doe reference.wav --transcript "Hello..."

Voice Profile Library

Programmatic voice profile management:

import "github.com/plexusone/omnivoice-core/voices"

// Create library with default path (~/.plexusone/omnivoice/voices)
lib, err := voices.NewLibrary("")

// List available profiles
profiles, err := lib.List()

// Create a new profile
err = lib.Create("john-doe", referenceAudio, referenceText, &voices.Metadata{
    Name:     "John Doe",
    Language: "en-US",
})

// Load profile for synthesis
audio, text, err := lib.LoadReferenceAudio("john-doe")

Audio Analysis for Voice Cloning

Find optimal segments from longer recordings:

import "github.com/plexusone/omnivoice-core/audio"

// Read WAV file
header, samples, err := audio.ReadWAV("recording.wav")

// Find best 15-second segments
config := audio.DefaultAnalyzeConfig()
config.TargetDuration = 15.0
config.TopN = 5

segments, err := audio.FindBestSegments(samples, header.SampleRate, config)

// Extract best segment (check err and len(segments) > 0 before indexing)
best := segments[0]
segment := audio.ExtractSegment(samples, best.StartSample, best.EndSample)

// Write to file
err = audio.WriteWAV("reference.wav", segment, header.SampleRate)
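
To give a feel for what segment selection involves, here is a minimal sketch of one plausible heuristic: slide a fixed-length window over the samples and keep the window with the highest total energy. This is an assumption for illustration only. It is not the scoring used by audio.FindBestSegments, and the bestWindow function and the float64 sample type are hypothetical.

```go
package main

import "fmt"

// bestWindow returns the [start, end) sample range of the window of the
// given duration with the highest total energy (sum of squared samples).
// Illustrative heuristic only; not the omnivoice-core implementation.
func bestWindow(samples []float64, sampleRate int, seconds float64) (start, end int) {
	win := int(seconds * float64(sampleRate))
	if win <= 0 || win > len(samples) {
		return 0, len(samples)
	}

	// prefix[i] holds the sum of squares of samples[:i], so the energy of
	// any window is a single subtraction.
	prefix := make([]float64, len(samples)+1)
	for i, s := range samples {
		prefix[i+1] = prefix[i] + s*s
	}

	// Advance the window in steps of roughly 100 ms.
	hop := sampleRate / 10
	if hop < 1 {
		hop = 1
	}

	bestStart, bestEnergy := 0, -1.0
	for i := 0; i+win <= len(samples); i += hop {
		if e := prefix[i+win] - prefix[i]; e > bestEnergy {
			bestStart, bestEnergy = i, e
		}
	}
	return bestStart, bestStart + win
}

func main() {
	// Toy signal at 10 samples per second: 1 s silence, 2 s tone, 1 s silence.
	const rate = 10
	samples := make([]float64, 4*rate)
	for i := rate; i < 3*rate; i++ {
		samples[i] = 0.5
	}
	start, end := bestWindow(samples, rate, 2.0)
	fmt.Println(start, end) // prints "10 30"
}
```

A real analyzer would likely also penalize clipping, long pauses, and background noise. Energy alone simply favors the loudest stretch of the recording.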

Capability Interfaces for Local Providers

Optional interfaces that local providers can implement:

Interface	Description
`VoiceCloner`	Create voice profiles from reference audio
`ReferenceSynthesizer`	Zero-shot synthesis with inline reference
`ProfileCacher`	Manage cached voice embeddings
`ModelManager`	Load/unload models dynamically
`RuntimeChecker`	Get runtime environment info
`HealthChecker`	Health status checks
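
Optional interfaces are discovered at runtime with a type assertion, the same pattern used with tts.ReferenceSynthesizer elsewhere in these notes. The sketch below shows the pattern in isolation. The interface names match the table, but the method signatures, the Provider interface, and both provider types are assumptions made up for this example.

```go
package main

import (
	"context"
	"fmt"
)

// Provider stands in for the base provider interface (hypothetical).
type Provider interface {
	Name() string
}

// The method signatures below are assumptions for illustration only.
type HealthChecker interface {
	CheckHealth(ctx context.Context) error
}

type ModelManager interface {
	LoadModel(ctx context.Context, name string) error
}

// localProvider implements one optional capability.
type localProvider struct{}

func (localProvider) Name() string                      { return "local" }
func (localProvider) CheckHealth(context.Context) error { return nil }

// basicProvider implements none.
type basicProvider struct{}

func (basicProvider) Name() string { return "basic" }

// capabilities lists the optional interfaces that p implements.
func capabilities(p Provider) []string {
	var caps []string
	if _, ok := p.(HealthChecker); ok {
		caps = append(caps, "HealthChecker")
	}
	if _, ok := p.(ModelManager); ok {
		caps = append(caps, "ModelManager")
	}
	return caps
}

func main() {
	fmt.Println(capabilities(localProvider{})) // prints "[HealthChecker]"
	fmt.Println(capabilities(basicProvider{})) // prints "[]"
}
```

Because each capability is a separate small interface, callers can degrade gracefully when a provider does not support one, instead of requiring every provider to stub out every method.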

gRPC Protocol Buffers

New proto definitions for local providers:

proto/localtts/v1/localtts.proto: TTS synthesis, cloning, caching
proto/localstt/v1/localstt.proto: Transcription, streaming, models
proto/localvoice/v1/localvoice.proto: Combined voice services

Bug Fixes

WriteWAV error handling: Properly track and report write errors
File permissions: Use 0o600 instead of 0o644 so that written files are readable and writable only by their owner
CLI error handling: Handle unchecked errors in server and voice commands
Integer conversions: Add nolint directives for int32 conversions in providers that are known to be safe

Installation

go get github.com/plexusone/omnivoice-core@v0.15.0

Prerequisites for Local Providers

The local TTS/STT providers require an Apple Silicon Mac (M1/M2/M3/M4) and a Python environment with MLX:

# Create Python environment
python3 -m venv ~/.plexusone/omnivoice/venv
source ~/.plexusone/omnivoice/venv/bin/activate

# Install F5-TTS MLX
pip install f5-tts-mlx grpcio protobuf

# Install Whisper MLX
pip install mlx-whisper grpcio protobuf
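
Because the local providers only run on Apple Silicon, an application may want to check the platform before selecting them. A minimal sketch using the Go standard library follows. The isAppleSilicon helper is hypothetical, and the fallback behavior is left to the caller.

```go
package main

import (
	"fmt"
	"runtime"
)

// isAppleSilicon reports whether a GOOS/GOARCH pair is macOS on arm64.
// Hypothetical helper for illustration; not part of omnivoice-core.
func isAppleSilicon(goos, goarch string) bool {
	return goos == "darwin" && goarch == "arm64"
}

func main() {
	if !isAppleSilicon(runtime.GOOS, runtime.GOARCH) {
		fmt.Println("local MLX providers unavailable on", runtime.GOOS+"/"+runtime.GOARCH)
		return
	}
	fmt.Println("Apple Silicon detected; local MLX providers can be used")
}
```

Note that runtime.GOARCH reflects the architecture the binary was built for, so an amd64 build running under Rosetta on an Apple Silicon Mac reports amd64.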

Migration Guide

From v0.14.x

No breaking changes. All existing code continues to work.

New functionality is additive:

// New: Local provider with endpoint
provider, _ := omnivoice.GetTTSProvider("f5tts-mlx",
    registry.WithEndpoint("unix:///tmp/omnivoice-f5tts.sock"),
)

// New: Voice cloning capability check
if cloner, ok := provider.(tts.ReferenceSynthesizer); ok {
    audio, _ := cloner.SynthesizeWithReference(ctx, text, refAudio, refText)
}

// New: Voice profile library
lib, _ := voices.NewLibrary("")
profiles, _ := lib.List()

Documentation

Local TTS Providers Guide - Setup and usage for F5-TTS and Whisper
Voice Cloning Guide - Tips for recording reference audio

Full Changelog

See CHANGELOG.md for the complete list of changes.