OmniVoice Deepgram Provider

OmniVoice provider implementation for Deepgram speech-to-text and text-to-speech services.

This package adapts the official Deepgram Go SDK to the OmniVoice interfaces, enabling Deepgram's STT and TTS capabilities within the OmniVoice framework.

Features

Speech-to-Text (STT)

  • Real-time streaming transcription via WebSocket
  • Batch transcription from files, URLs, or bytes
  • Support for telephony audio formats (mu-law, a-law)
  • Interim and final transcription results
  • Speech start/end detection for natural turn-taking
  • Speaker diarization support
  • Keyword boosting

Text-to-Speech (TTS)

  • Non-streaming synthesis via REST API
  • Real-time streaming synthesis via WebSocket
  • Streaming input support (pipe LLM output directly to TTS)
  • Automatic sentence splitting for natural speech
  • Multiple Aura voices (male/female, US/UK/IE accents)
  • Multiple output formats (mp3, linear16, mulaw, opus, etc.)

Agent Experience (AX)

  • Error classification with 8 categories (transient, rate_limit, validation, auth, not_found, server, quota, unknown)
  • Automatic retry decisions based on error type
  • Actionable error suggestions for recovery
  • Integration with resilience.ProviderError from omnivoice-core

Installation

go get github.com/plexusone/omnivoice-deepgram

Quick Start

Speech-to-Text

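package main

import (
    "context"
    "fmt"
    "log"

    "github.com/plexusone/omnivoice-core/stt"
    deepgramstt "github.com/plexusone/omnivoice-deepgram/omnivoice/stt"
)

func main() {
    ctx := context.Background()

    // Create the Deepgram STT provider with your API key.
    provider, err := deepgramstt.New(deepgramstt.WithAPIKey("your-api-key"))
    if err != nil {
        log.Fatal(err)
    }

    // Batch-transcribe a remote audio file.
    result, err := provider.TranscribeURL(ctx, "https://example.com/audio.mp3", stt.TranscriptionConfig{
        Model:    "nova-2",
        Language: "en-US",
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%+v\n", result)
}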

Text-to-Speech

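package main

import (
    "context"
    "log"

    "github.com/plexusone/omnivoice-core/tts"
    deepgramtts "github.com/plexusone/omnivoice-deepgram/omnivoice/tts"
)

func main() {
    ctx := context.Background()

    // Create the Deepgram TTS provider with your API key.
    provider, err := deepgramtts.New(deepgramtts.WithAPIKey("your-api-key"))
    if err != nil {
        log.Fatal(err)
    }

    // Synthesize a short phrase with an Aura voice.
    result, err := provider.Synthesize(ctx, "Hello, world!", tts.SynthesisConfig{
        VoiceID:      "aura-asteria-en",
        OutputFormat: "mp3",
    })
    if err != nil {
        log.Fatal(err)
    }
    _ = result // result carries the synthesis output
}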

Requirements