Command-Line Interface

OmniVoice includes a CLI for speech-to-text transcription without writing code.

Installation

go install github.com/plexusone/omnivoice/cmd/omnivoice@latest

Commands

transcribe

Transcribe audio files to text.

omnivoice transcribe [flags] <audio-file>

Flags:

Flag          Short  Default   Description
----          -----  -------   -----------
--provider    -p     deepgram  STT provider (deepgram, openai, elevenlabs)
--output      -o     stdout    Output file path
--format      -f     text      Output format (text, json, srt, vtt)
--language    -l     en-US     Language code (BCP-47)
--model       -m               Provider-specific model
--diarize            false     Enable speaker diarization
--timestamps         false     Enable word timestamps
--verbose     -v     false     Verbose output
--quiet       -q     false     Suppress non-essential output

providers list

List available STT providers and their configuration status.

omnivoice providers list

Examples

Basic Transcription

export DEEPGRAM_API_KEY="your-api-key"

# Output to terminal
omnivoice transcribe podcast.mp3

# Save to file
omnivoice transcribe -o transcript.txt podcast.mp3

JSON Output (OmniVoice Transcript Format)

Get full metadata including timestamps, speakers, and confidence scores:

omnivoice transcribe -p deepgram --diarize --timestamps -f json -o transcript.json podcast.mp3

Output:

{
  "$schema": "https://omnivoice.dev/schema/transcript-v1.json",
  "version": "1.0",
  "text": "Hello and welcome to the podcast...",
  "language": "en-US",
  "duration_ms": 330000,
  "segments": [
    {
      "text": "Hello and welcome to the podcast.",
      "start_ms": 0,
      "end_ms": 2500,
      "speaker": "Speaker 1",
      "words": [
        {"text": "Hello", "start_ms": 0, "end_ms": 300}
      ]
    }
  ],
  "metadata": {
    "provider": "deepgram",
    "created_at": "2026-04-28T10:30:00Z",
    "audio_file": "podcast.mp3"
  }
}
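The JSON format is meant for programmatic use. A minimal Python sketch of reading it, assuming the field names shown in the sample above (`segments`, `start_ms`, `speaker`, `text`); in practice you would load the file written with `-o` rather than the inline string used here:

```python
import json

# Sample transcript as produced by `omnivoice transcribe -f json` (see the
# sample above). In practice: transcript = json.load(open("transcript.json")).
raw = """
{
  "version": "1.0",
  "text": "Hello and welcome to the podcast...",
  "language": "en-US",
  "duration_ms": 330000,
  "segments": [
    {
      "text": "Hello and welcome to the podcast.",
      "start_ms": 0,
      "end_ms": 2500,
      "speaker": "Speaker 1",
      "words": [{"text": "Hello", "start_ms": 0, "end_ms": 300}]
    }
  ]
}
"""

transcript = json.loads(raw)

def labeled_lines(t):
    """Return one '[start] Speaker: text' line per segment."""
    lines = []
    for seg in t["segments"]:
        start_s = seg["start_ms"] / 1000
        speaker = seg.get("speaker", "Unknown")
        lines.append(f"[{start_s:.2f}s] {speaker}: {seg['text']}")
    return lines

for line in labeled_lines(transcript):
    print(line)
# → [0.00s] Speaker 1: Hello and welcome to the podcast.
```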

Subtitle Generation

SRT (SubRip)

omnivoice transcribe -p deepgram -f srt -o subtitles.srt podcast.mp3

Output:

1
00:00:00,000 --> 00:00:02,500
Hello and welcome to the podcast.

2
00:00:02,800 --> 00:00:05,200
Today we're talking about AI.

WebVTT

omnivoice transcribe -p deepgram -f vtt -o subtitles.vtt podcast.mp3

Output:

WEBVTT

1
00:00:00.000 --> 00:00:02.500
Hello and welcome to the podcast.

2
00:00:02.800 --> 00:00:05.200
Today we're talking about AI.
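As the two outputs above show, SRT and WebVTT use the same HH:MM:SS timestamp layout; SRT separates milliseconds with a comma, WebVTT with a dot (and adds the WEBVTT header). A sketch of the conversion from the `start_ms`/`end_ms` values in the JSON format, using hypothetical helper names:

```python
def srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, msec = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{msec:03d}"

def vtt_time(ms: int) -> str:
    """WebVTT uses the same layout with a dot before the milliseconds."""
    return srt_time(ms).replace(",", ".")

print(srt_time(2500))  # → 00:00:02,500
print(vtt_time(2800))  # → 00:00:02.800
```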

Speaker Diarization

Identify different speakers in the audio:

omnivoice transcribe -p deepgram --diarize -f json -o meeting.json meeting.mp3
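With diarization enabled, each segment in the JSON output carries a `speaker` label (as in the sample above), which makes per-speaker analysis straightforward. A sketch, with hypothetical segment data shaped like that sample:

```python
from collections import defaultdict

def talk_time_ms(segments):
    """Sum segment durations per speaker from a diarized transcript."""
    totals = defaultdict(int)
    for seg in segments:
        totals[seg.get("speaker", "Unknown")] += seg["end_ms"] - seg["start_ms"]
    return dict(totals)

# Hypothetical diarized segments, shaped like transcript["segments"] above.
segments = [
    {"speaker": "Speaker 1", "start_ms": 0, "end_ms": 2500},
    {"speaker": "Speaker 2", "start_ms": 2800, "end_ms": 5200},
    {"speaker": "Speaker 1", "start_ms": 5400, "end_ms": 9000},
]
print(talk_time_ms(segments))  # → {'Speaker 1': 6100, 'Speaker 2': 2400}
```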

Using Different Providers

# Deepgram (default)
export DEEPGRAM_API_KEY="your-key"
omnivoice transcribe -p deepgram audio.mp3

# OpenAI Whisper
export OPENAI_API_KEY="your-key"
omnivoice transcribe -p openai audio.mp3

# ElevenLabs
export ELEVENLABS_API_KEY="your-key"
omnivoice transcribe -p elevenlabs audio.mp3

Check Provider Status

omnivoice providers list

Output:

PROVIDER    ENV VAR             CONFIGURED  FEATURES
--------    -------             ----------  --------
deepgram    DEEPGRAM_API_KEY    Yes         streaming, diarization, timestamps, punctuation
elevenlabs  ELEVENLABS_API_KEY  No          diarization, timestamps
openai      OPENAI_API_KEY      Yes         timestamps, punctuation

Output Formats

Format  Use Case             Features
------  --------             --------
text    Simple transcript    Plain text only
json    Programmatic access  Full metadata, timestamps, speakers, confidence
srt     Video subtitles      Timed text, widely supported
vtt     Web video            W3C standard, HTML5 video support

Supported Audio Formats

  • MP3
  • WAV
  • FLAC
  • OGG
  • M4A
  • WebM

Environment Variables

Variable            Provider    Required
--------            --------    --------
DEEPGRAM_API_KEY    Deepgram    For the Deepgram provider
OPENAI_API_KEY      OpenAI      For the OpenAI provider
ELEVENLABS_API_KEY  ElevenLabs  For the ElevenLabs provider