
Release Notes: v0.15.0

Release Date: 2026-06-27

Summary

This release adds local TTS/STT providers for Apple Silicon, a development CLI tool (omnictl), a voice profile library for managing reference audio, and audio analysis utilities for selecting the best segments for voice cloning.

Highlights

  • Local TTS/STT Providers: The F5-TTS MLX and Whisper MLX providers communicate via gRPC over a Unix domain socket for low-latency local inference on Apple Silicon
  • CLI Tool (omnictl): Development CLI for proto generation, server management, and voice profile operations
  • Voice Profile Library: Manage voice profiles with reference audio for zero-shot voice cloning
  • Audio Analysis: Find optimal 10-15 second segments from longer recordings for voice cloning

What's New

Local TTS Provider: F5-TTS MLX

Zero-shot voice cloning with F5-TTS running locally on Apple Silicon:

import (
    omnivoice "github.com/plexusone/omnivoice-core"
    "github.com/plexusone/omnivoice-core/registry"
    _ "github.com/plexusone/omnivoice-core/providers/f5tts-mlx"
)

// Get local TTS provider
provider, err := omnivoice.GetTTSProvider("f5tts-mlx",
    registry.WithEndpoint("unix:///tmp/omnivoice-f5tts.sock"),
)
if err != nil {
    return err
}

// Synthesize with voice cloning, if the provider supports it
if cloner, ok := provider.(tts.ReferenceSynthesizer); ok {
    audio, err := cloner.SynthesizeWithReference(ctx,
        "Hello, this is my cloned voice.",
        referenceAudio,
        referenceText,
    )
    if err != nil {
        return err
    }
    _ = audio // use the synthesized audio
}

Local STT Provider: Whisper MLX

Fast local transcription with Whisper running on Apple Silicon:

import (
    omnivoice "github.com/plexusone/omnivoice-core"
    "github.com/plexusone/omnivoice-core/registry"
    _ "github.com/plexusone/omnivoice-core/providers/whisper-mlx"
)

// Get local STT provider
provider, err := omnivoice.GetSTTProvider("whisper-mlx",
    registry.WithEndpoint("unix:///tmp/omnivoice-whisper.sock"),
)
if err != nil {
    return err
}

// Transcribe with word-level timestamps
result, err := provider.Transcribe(ctx, audioReader, stt.TranscriptionConfig{
    Language: "en",
})
if err != nil {
    return err
}
_ = result // use the transcription result

CLI Tool: omnictl

New development CLI for omnivoice-core operations:

# Generate Go/Python code from proto files
omnictl generate proto

# Manage local TTS/STT servers
omnictl server start f5tts
omnictl server list
omnictl server stop f5tts

# Check server health
omnictl health

# Analyze audio for voice cloning
omnictl voice analyze recording.wav --duration 15

# Extract best segment
omnictl voice extract recording.wav -o reference.wav --best

# Manage voice profiles
omnictl voice list
omnictl voice create --slug john-doe reference.wav --transcript "Hello..."

Voice Profile Library

Programmatic voice profile management:

import "github.com/plexusone/omnivoice-core/voices"

// Create library with default path (~/.plexusone/omnivoice/voices)
lib, err := voices.NewLibrary("")

// List available profiles
profiles, err := lib.List()

// Create a new profile
err = lib.Create("john-doe", referenceAudio, referenceText, &voices.Metadata{
    Name:     "John Doe",
    Language: "en-US",
})

// Load profile for synthesis
audio, text, err := lib.LoadReferenceAudio("john-doe")

Audio Analysis for Voice Cloning

Find optimal segments from longer recordings:

import "github.com/plexusone/omnivoice-core/audio"

// Read WAV file
header, samples, err := audio.ReadWAV("recording.wav")

// Find best 15-second segments
config := audio.DefaultAnalyzeConfig()
config.TargetDuration = 15.0
config.TopN = 5

segments, err := audio.FindBestSegments(samples, header.SampleRate, config)

// Extract best segment
best := segments[0]
segment := audio.ExtractSegment(samples, best.StartSample, best.EndSample)

// Write to file
err = audio.WriteWAV("reference.wav", segment, header.SampleRate)
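
These notes do not describe how FindBestSegments scores candidates. To give a feel for the general idea of picking a fixed-length window from a longer recording, the sketch below selects the window with the highest total energy using a sliding sum. This is an illustrative heuristic only: bestWindow is a hypothetical function, the []float64 sample type is an assumption, and a real analyzer would likely also weigh silence, clipping, and noise.

```go
package main

// bestWindow returns the [start, end) sample range of the fixed-length window
// with the highest sum of squared amplitudes. It returns ok=false when the
// recording is shorter than the requested duration.
func bestWindow(samples []float64, sampleRate int, seconds float64) (start, end int, ok bool) {
	win := int(seconds * float64(sampleRate))
	if win <= 0 || win > len(samples) {
		return 0, 0, false
	}

	// Energy of the first window.
	var sum float64
	for _, s := range samples[:win] {
		sum += s * s
	}
	best, bestStart := sum, 0

	// Slide one sample at a time: add the entering sample, drop the leaving one.
	for i := win; i < len(samples); i++ {
		sum += samples[i]*samples[i] - samples[i-win]*samples[i-win]
		if sum > best {
			best, bestStart = sum, i-win+1
		}
	}
	return bestStart, bestStart + win, true
}
```

The sliding sum keeps the scan linear in the number of samples, which matters for recordings several minutes long.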

Capability Interfaces for Local Providers

Optional interfaces that local providers can implement:

Interface             Description
VoiceCloner           Create voice profiles from reference audio
ReferenceSynthesizer  Zero-shot synthesis with inline reference
ProfileCacher         Manage cached voice embeddings
ModelManager          Load/unload models dynamically
RuntimeChecker        Get runtime environment info
HealthChecker         Health status checks

gRPC Protocol Buffers

New proto definitions for local providers:

  • proto/localtts/v1/localtts.proto: TTS synthesis, cloning, caching
  • proto/localstt/v1/localstt.proto: Transcription, streaming, models
  • proto/localvoice/v1/localvoice.proto: Combined voice services

Bug Fixes

  • WriteWAV error handling: Properly track and report write errors
  • File permissions: Use 0o600 instead of 0o644 for security
  • CLI error handling: Handle unchecked errors in server and voice commands
  • Integer conversions: Add nolint directives for safe int32 conversions in providers

Installation

go get github.com/plexusone/omnivoice-core@v0.15.0

Prerequisites for Local Providers

Local TTS/STT requires Apple Silicon (M1/M2/M3/M4) with Python MLX:

# Create Python environment
python3 -m venv ~/.plexusone/omnivoice/venv
source ~/.plexusone/omnivoice/venv/bin/activate

# Install F5-TTS MLX
pip install f5-tts-mlx grpcio protobuf

# Install Whisper MLX
pip install mlx-whisper grpcio protobuf

Migration Guide

From v0.14.x

No breaking changes. All existing code continues to work.

New functionality is additive:

// New: Local provider with endpoint
provider, _ := omnivoice.GetTTSProvider("f5tts-mlx",
    registry.WithEndpoint("unix:///tmp/omnivoice-f5tts.sock"),
)

// New: Voice cloning capability check
if cloner, ok := provider.(tts.ReferenceSynthesizer); ok {
    audio, _ := cloner.SynthesizeWithReference(ctx, text, refAudio, refText)
}

// New: Voice profile library
lib, _ := voices.NewLibrary("")
profiles, _ := lib.List()

Documentation

Full Changelog

See CHANGELOG.md for the complete list of changes.