Release Notes: v0.9.0

Release Date: 2026-05-02

Highlights

  • Canonical Transcript Format: Standardized JSON output for STT transcription results
  • DurationMilliseconds Type: Durations serialize as integer milliseconds for JSON interoperability
  • Embedded JSON Schema: Validate transcripts with the embedded JSON Schema v1

New Features

Canonical Transcript Format

The stt package now includes a canonical Transcript type for consistent output across all STT providers:

import "github.com/plexusone/omnivoice-core/stt"

// Convert transcription result to canonical format
transcript := stt.NewTranscript(result, "deepgram", "nova-2", "audio.mp3", config)

// Save as JSON
err := transcript.SaveJSON("output.transcript.json")

// Load from JSON
loaded, err := stt.LoadTranscript("output.transcript.json")

Transcript Structure

The transcript format includes:

Field        Description
$schema      JSON Schema URL for validation
version      Format version (currently "1.0")
text         Complete transcription text
language     BCP-47 language code (e.g., "en-US")
duration_ms  Audio duration in milliseconds
segments     Array of transcript segments
metadata     Provider, model, and options used

Segment and Word Timing

// Access segment timing
for _, seg := range transcript.Segments {
    fmt.Printf("Segment: %s (%.1fs - %.1fs)\n",
        seg.Text,
        seg.Start.Duration().Seconds(),
        seg.End.Duration().Seconds())

    // Word-level timing (if enabled)
    for _, word := range seg.Words {
        fmt.Printf("  %s: %dms\n", word.Text, word.Start.Milliseconds())
    }
}

DurationMilliseconds Type

All duration fields use duration.DurationMilliseconds from github.com/grokify/mogo:

  • Go semantics: Full time.Duration functionality via .Duration() method
  • JSON interop: Serializes as integer milliseconds (not nanoseconds)
  • Type safety: Distinct type prevents mixing with raw integers

import "github.com/grokify/mogo/time/duration"

// Create from time.Duration
d := duration.FromDuration(5 * time.Second)

// Create from milliseconds (same value as above)
d = duration.FromMilliseconds(5000)

// Access as time.Duration
td := d.Duration()

// JSON serialization
data, _ := json.Marshal(d) // -> 5000 (a JSON number)

Embedded JSON Schema

The schema package provides an embedded JSON Schema for validating transcripts:

import "github.com/plexusone/omnivoice-core/schema"

// Get the transcript schema
schemaJSON := schema.TranscriptV1Schema

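// Use with any JSON Schema validator. The jsonschema package below stands in
// for the validator library of your choice; its compile call may differ.
validator := jsonschema.MustCompile(schemaJSON)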

JSON Format

Example transcript JSON:

{
  "$schema": "https://omnivoice.dev/schema/transcript-v1.json",
  "version": "1.0",
  "text": "Hello world",
  "language": "en-US",
  "duration_ms": 5000,
  "segments": [
    {
      "text": "Hello world",
      "start_ms": 0,
      "end_ms": 2500,
      "speaker": "speaker_1",
      "confidence": 0.98,
      "words": [
        {
          "text": "Hello",
          "start_ms": 0,
          "end_ms": 1000,
          "confidence": 0.99
        },
        {
          "text": "world",
          "start_ms": 1200,
          "end_ms": 2500,
          "confidence": 0.97
        }
      ]
    }
  ],
  "metadata": {
    "provider": "deepgram",
    "model": "nova-2",
    "created_at": "2026-05-02T12:00:00Z",
    "audio_file": "audio.mp3",
    "options": {
      "enable_punctuation": true,
      "enable_word_timestamps": true,
      "enable_speaker_diarization": true
    }
  }
}

API Reference

Types

Type                Description
Transcript          Complete transcription with metadata
TranscriptSegment   Segment (sentence/phrase) with timing
TranscriptWord      Word with timing and confidence
TranscriptMetadata  Provider and options provenance
TranscriptOptions   Transcription options record

Functions

Function                                                   Description
NewTranscript(result, provider, model, audioFile, config)  Create Transcript from TranscriptionResult
LoadTranscript(filePath)                                   Load Transcript from JSON file
(t *Transcript) ToJSON()                                   Serialize to JSON bytes
(t *Transcript) SaveJSON(filePath)                         Save to JSON file
(t *Transcript) TotalDuration()                            Get duration as time.Duration
(s *TranscriptSegment) SegmentDuration()                   Get segment duration
(w *TranscriptWord) WordDuration()                         Get word duration

Constants

const TranscriptFormatVersion = "1.0"
const TranscriptSchemaURL = "https://omnivoice.dev/schema/transcript-v1.json"
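
One possible use of these constants is a cheap header check before handing a document to a full schema validator. The sketch below is an assumption about how a consumer might do this, not library behavior: it redeclares the two values locally so it runs on its own, and header and checkHeader are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the values listed above, so the sketch runs without the
// library. Real code would reference the library constants instead.
const (
	formatVersion = "1.0"
	schemaURL     = "https://omnivoice.dev/schema/transcript-v1.json"
)

// header holds only the two identifying fields of a transcript document.
type header struct {
	Schema  string `json:"$schema"`
	Version string `json:"version"`
}

// checkHeader returns an error when the document does not declare the
// expected schema URL and format version.
func checkHeader(data []byte) error {
	var h header
	if err := json.Unmarshal(data, &h); err != nil {
		return fmt.Errorf("decode header: %w", err)
	}
	if h.Schema != schemaURL {
		return fmt.Errorf("unexpected $schema %q", h.Schema)
	}
	if h.Version != formatVersion {
		return fmt.Errorf("unsupported version %q", h.Version)
	}
	return nil
}

func main() {
	doc := []byte(`{"$schema": "https://omnivoice.dev/schema/transcript-v1.json", "version": "1.0"}`)
	if err := checkHeader(doc); err != nil {
		panic(err)
	}
	fmt.Println("header ok")
}
```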

Installation

go get github.com/plexusone/omnivoice-core@v0.9.0

Migration Guide

From v0.8.0

No breaking changes. To use the new Transcript format:

  1. Update dependency:
     go get github.com/plexusone/omnivoice-core@v0.9.0
  2. Convert transcription results to canonical format:
     transcript := stt.NewTranscript(result, "provider", "model", "file.mp3", config)
  3. Use JSON serialization for storage or interop:
     // Save
     err := transcript.SaveJSON("transcript.json")

     // Load
     loaded, err := stt.LoadTranscript("transcript.json")

Full Changelog

See CHANGELOG.md for the complete list of changes.