Speech Generation API

Beta Access: The Speech API is currently in beta and available only to whitelisted users. Please contact support to request access.

The Demeterics Speech API provides a unified Text-to-Speech (TTS) interface across multiple providers. Convert text to natural-sounding audio through a single API that automatically tracks usage and costs and stores the generated audio for analysis.

Overview

Base URL: https://api.demeterics.com/tts/v1

Features:

  • Unified API: Single endpoint for OpenAI, ElevenLabs, Google Cloud TTS, and Murf.ai
  • Auto-tracking: Every request logged to BigQuery with full observability
  • Audio Storage: Generated audio stored in GCS with 15-minute signed URLs
  • BYOK Support: Use your own provider API keys with dual-key authentication
  • Cost Control: Automatic credit billing with a 15% service fee for managed keys or 10% for BYOK

Authentication

Managed Keys (Default)

Use only your Demeterics API key:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{...}'

Bring Your Own Key (BYOK)

Use the dual-key format to provide your own provider API key:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key;sk-your_openai_key" \
  -H "Content-Type: application/json" \
  -d '{...}'

The format is: [demeterics_api_key];[provider_api_key]

BYOK Benefits:

  • 10% service fee instead of 15%
  • Use your own rate limits and quotas
  • Provider costs billed directly to your account
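
The two authentication modes differ only in the token that follows "Bearer". As a minimal sketch, the helper below builds the header for either mode. The function name build_auth_header and its validation rules are illustrative assumptions, not part of the Demeterics API.

```python
from typing import Dict, Optional


def build_auth_header(demeterics_key: str, provider_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for managed or BYOK requests.

    Hypothetical helper: only the "[demeterics_api_key];[provider_api_key]"
    format itself comes from the documentation above.
    """
    if not demeterics_key:
        raise ValueError("demeterics_key is required")
    # ';' separates the two keys, so neither key may contain it.
    if ";" in demeterics_key or (provider_key and ";" in provider_key):
        raise ValueError("keys must not contain ';'")
    token = f"{demeterics_key};{provider_key}" if provider_key else demeterics_key
    return {"Authorization": f"Bearer {token}"}
```

Calling build_auth_header("dmt_your_api_key") yields the managed-key header, while passing a second argument such as "sk-your_openai_key" yields the dual-key form.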

Endpoints

Generate Speech

POST /tts/v1/generate

Convert text to speech audio.

Request Body:

Field     Type    Required  Description
provider  string  Yes       Target provider: openai, elevenlabs, google, murf
model     string  No        TTS model (provider-specific)
voice     string  No        Voice identifier
input     string  Yes       Text to convert (maximum length varies by provider)
format    string  No        Output format, e.g. mp3, wav, opus, flac (supported formats vary by provider)
speed     float   No        Playback speed: 0.25-4.0 (default: 1.0)
language  string  No        Language code (ISO 639-1)
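
A client can check the documented constraints before sending a request. The sketch below assembles a request body from the fields in the table and leaves out optional fields that are not set. The function name build_generate_payload and the client-side checks are assumptions for illustration; the server remains the authority on what is valid.

```python
from typing import Any, Dict, Optional

PROVIDERS = {"openai", "elevenlabs", "google", "murf"}


def build_generate_payload(
    provider: str,
    text: str,
    model: Optional[str] = None,
    voice: Optional[str] = None,
    audio_format: Optional[str] = None,
    speed: Optional[float] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /tts/v1/generate (hypothetical helper)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not text:
        raise ValueError("input text must not be empty")
    # The documented speed range is 0.25-4.0.
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    payload: Dict[str, Any] = {"provider": provider, "input": text}
    optional = {
        "model": model,
        "voice": voice,
        "format": audio_format,
        "speed": speed,
        "language": language,
    }
    # Send only the optional fields the caller actually set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```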

Example Request:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hello, welcome to Demeterics!",
    "format": "mp3"
  }'

Response:

{
  "id": "01JARV4HZ6XPQMWVCS9N1GKEFD",
  "provider": "openai",
  "model": "tts-1",
  "voice": "alloy",
  "audio_url": "https://storage.googleapis.com/demeterics-data/tts/...",
  "duration_seconds": 2.3,
  "cost_usd": 0.00023,
  "usage": {
    "input_chars": 31
  },
  "metadata": {
    "format": "mp3",
    "sample_rate": 24000,
    "channels": 1,
    "generation_ms": 450
  }
}
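
To work with the response in typed code, the fields of interest can be copied into a small record. This is a sketch only: SpeechResult and parse_generate_response are hypothetical names, and it assumes every field shown in the sample response is present.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SpeechResult:
    """Subset of the generate response (illustrative, not an official type)."""
    id: str
    audio_url: str
    duration_seconds: float
    cost_usd: float
    input_chars: int
    audio_format: str


def parse_generate_response(body: Dict[str, Any]) -> SpeechResult:
    """Extract commonly used fields; raises KeyError if one is missing."""
    return SpeechResult(
        id=body["id"],
        audio_url=body["audio_url"],
        duration_seconds=float(body["duration_seconds"]),
        cost_usd=float(body["cost_usd"]),
        input_chars=int(body["usage"]["input_chars"]),    # nested under "usage"
        audio_format=body["metadata"]["format"],          # nested under "metadata"
    )
```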

List Voices

GET /tts/v1/voices?provider={provider}

List available voices for a provider.

Query Parameters:

Parameter  Type    Required  Description
provider   string  Yes       Provider: openai, elevenlabs, google, murf

Example Request:

curl -X GET "https://api.demeterics.com/tts/v1/voices?provider=openai" \
  -H "Authorization: Bearer dmt_your_api_key"

Response:

{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Neutral and balanced",
      "gender": "neutral"
    },
    {
      "id": "echo",
      "name": "Echo",
      "description": "Clear and articulate",
      "gender": "male"
    }
  ]
}
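
The voice list can be filtered on the client to pick a suitable voice. The sketch below assumes the response shape shown above; find_voices is a hypothetical helper, and the gender and description fields may not be populated for every provider.

```python
from typing import Any, Dict, List, Optional


def find_voices(
    voices_response: Dict[str, Any],
    gender: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[str]:
    """Return the ids of voices matching an optional gender and description keyword."""
    matches = []
    for voice in voices_response.get("voices", []):
        if gender and voice.get("gender") != gender:
            continue
        # Case-insensitive substring match on the description.
        if keyword and keyword.lower() not in voice.get("description", "").lower():
            continue
        matches.append(voice["id"])
    return matches
```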

Providers

OpenAI

Models:

  • gpt-4o-mini-tts - Latest model with better steerability (~85% cheaper than ElevenLabs)
  • tts-1 - Fast and efficient (legacy)
  • tts-1-hd - Higher quality (legacy)

Voices:

  • alloy - Neutral and balanced
  • ash - Warm and conversational
  • ballad - Soft and melodic
  • coral - Friendly and approachable
  • echo - Clear and articulate
  • fable - Expressive and dynamic
  • onyx - Deep and authoritative
  • nova - Friendly and warm
  • sage - Calm and measured
  • shimmer - Bright and optimistic
  • verse - Dynamic and engaging

Supported Formats: mp3, opus, aac, flac, wav, pcm

Max Characters: 4,096

ElevenLabs

Models:

  • eleven_multilingual_v2 - Best quality, 29 languages
  • eleven_turbo_v2_5 - Fast, low latency, multilingual
  • eleven_turbo_v2 - Previous fast model, English only
  • eleven_monolingual_v1 - English only

Voices: Over 100 pre-made voices plus custom voice cloning

Supported Formats: mp3, pcm, ulaw

Max Characters: 5,000

Google Cloud TTS

Models:

  • standard - Basic quality
  • neural2 - Neural network based
  • wavenet - High quality WaveNet
  • journey - Conversational style
  • studio - Professional quality

Voices: 220+ voices across 40+ languages

Supported Formats: mp3, wav, ogg

Max Characters: 5,000

Murf.ai

Models:

  • GEN2 - Latest generation, highest quality
  • FALCON - Fast streaming model

Voices: 120+ voices across 20+ languages including:

  • en-US-natalie - Natalie (US English, female)
  • en-US-miles - Miles (US English, male)
  • en-US-julia - Julia (US English, female)
  • en-UK-iris - Iris (UK English, female)
  • es-ES-elena - Elena (Spanish, female)
  • fr-FR-claire - Claire (French, female)
  • de-DE-anna - Anna (German, female)

Supported Formats: mp3, wav, flac, ogg, pcm, alaw, ulaw

Max Characters: 10,000

Features:

  • Voice styles (conversational, newscast, etc.)
  • Speed and pitch control
  • Multi-language support with native locales
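
Text that exceeds a provider's character limit has to be split across several requests. The sketch below breaks text on sentence boundaries using the limits listed in this section. The function name split_for_provider and the sentence-splitting rule are assumptions; a real application may prefer paragraph-aware or language-aware splitting.

```python
import re
from typing import List

# Maximum characters per request, as listed for each provider above.
MAX_CHARS = {"openai": 4096, "elevenlabs": 5000, "google": 5000, "murf": 10000}


def split_for_provider(text: str, provider: str) -> List[str]:
    """Split text into chunks that each fit the provider's character limit."""
    limit = MAX_CHARS[provider]
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the input of its own generate request, and the resulting audio files joined in order.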

Pricing

Managed Keys

Character-based pricing with 15% service fee:

Provider    Model                   Cost per 1M chars
OpenAI      gpt-4o-mini-tts         $0.69
OpenAI      tts-1                   $17.25
OpenAI      tts-1-hd                $34.50
ElevenLabs  eleven_multilingual_v2  $345.00
ElevenLabs  eleven_turbo_v2_5       $86.25
Google      wavenet                 $18.40
Google      neural2                 $18.40
Google      standard                $4.60
Murf        GEN2                    $27.60
Murf        FALCON                  $23.00

BYOK

10% service fee on top of provider costs. Provider costs billed directly to your account.
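
A rough cost estimate follows directly from the character count. The sketch below uses the managed rates from the table above. The BYOK estimate rests on an assumption that the documentation does not state outright: that each managed rate equals the provider cost plus the 15% fee (for example, $17.25 = $15.00 × 1.15). Treat the BYOK figure as an approximation.

```python
# Managed price in USD per 1M characters, copied from the pricing table.
MANAGED_PRICE_PER_1M = {
    ("openai", "gpt-4o-mini-tts"): 0.69,
    ("openai", "tts-1"): 17.25,
    ("openai", "tts-1-hd"): 34.50,
    ("elevenlabs", "eleven_multilingual_v2"): 345.00,
    ("elevenlabs", "eleven_turbo_v2_5"): 86.25,
    ("google", "wavenet"): 18.40,
    ("google", "neural2"): 18.40,
    ("google", "standard"): 4.60,
    ("murf", "GEN2"): 27.60,
    ("murf", "FALCON"): 23.00,
}
MANAGED_FEE = 0.15
BYOK_FEE = 0.10


def estimate_managed_cost(provider: str, model: str, chars: int) -> float:
    """Estimated USD charge for `chars` characters with managed keys."""
    return MANAGED_PRICE_PER_1M[(provider, model)] * chars / 1_000_000


def estimate_byok_fee(provider: str, model: str, chars: int) -> float:
    """Estimated Demeterics service fee (USD) under BYOK.

    Assumption: managed price = provider cost * (1 + 15%). The provider
    itself bills the underlying cost separately.
    """
    provider_cost = estimate_managed_cost(provider, model, chars) / (1 + MANAGED_FEE)
    return provider_cost * BYOK_FEE
```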

Error Handling

Error Response Format:

{
  "error": {
    "type": "invalid_request",
    "message": "Input text exceeds maximum length",
    "code": "text_too_long"
  }
}

Common Error Codes:

Code                  HTTP Status  Description
invalid_provider      400          Unknown provider specified
invalid_voice         400          Voice not available for provider
text_too_long         400          Input exceeds provider limit
insufficient_credits  402          Not enough credits
provider_error        502          Provider API failed
rate_limited          429          Too many requests
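
Client code can turn the error body into an exception and decide whether a retry makes sense. In the sketch below, SpeechAPIError and error_from_response are hypothetical names. Treating rate_limited and provider_error as retryable is an assumption based on their descriptions, not documented behavior.

```python
from typing import Any, Dict

# Assumption: only these codes describe transient failures worth retrying.
RETRYABLE_CODES = {"rate_limited", "provider_error"}


class SpeechAPIError(Exception):
    """Error parsed from the documented error response format."""

    def __init__(self, status: int, error_type: str, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.error_type = error_type
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def error_from_response(status: int, body: Dict[str, Any]) -> SpeechAPIError:
    """Build a SpeechAPIError from an HTTP status and a decoded JSON body."""
    err = body.get("error", {})
    return SpeechAPIError(
        status,
        err.get("type", "unknown"),
        err.get("code", "unknown"),
        err.get("message", ""),
    )
```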

Data Tracking

Every speech generation is automatically tracked in BigQuery with:

  • Transaction ID (ULID)
  • User and API key identifiers
  • Provider, model, and voice used
  • Input character count and text hash (privacy-safe)
  • Audio duration and format
  • GCS storage path
  • Cost breakdown (provider cost, service fee, total)
  • Latency metrics
  • Error information (if failed)

Query your speech generations:

SELECT
  transaction_id,
  provider,
  model,
  tts.voice,
  tts.input_chars,
  tts.duration_sec,
  total_cost
FROM `demeterics.demeterics.interactions`
WHERE interaction_type = 'tts'
  AND user_id = @user_id
  AND timing.question_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY timing.question_time DESC

SDK Support

Python

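import requests

# Request speech generation using a managed Demeterics key.
response = requests.post(
    "https://api.demeterics.com/tts/v1/generate",
    headers={"Authorization": "Bearer dmt_your_api_key"},
    json={
        "provider": "openai",
        "voice": "alloy",
        "input": "Hello, world!",
        "format": "mp3"
    },
    timeout=30,  # avoid waiting indefinitely on network problems
)
response.raise_for_status()  # raise on 4xx/5xx responses

# Signed URL for the generated audio (valid for 15 minutes)
audio_url = response.json()["audio_url"]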

Node.js

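// Uses the built-in fetch (Node.js 18+); top-level await requires an ES module.
const response = await fetch("https://api.demeterics.com/tts/v1/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer dmt_your_api_key",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    provider: "openai",
    voice: "alloy",
    input: "Hello, world!",
    format: "mp3"
  })
});

// fetch does not reject on HTTP errors, so check the status explicitly.
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

// Signed URL for the generated audio (valid for 15 minutes)
const { audio_url } = await response.json();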

Best Practices

  1. Choose the right provider: OpenAI for speed, ElevenLabs for quality, Google for language coverage
  2. Cache audio: Store frequently used audio locally to reduce API calls; signed URLs expire after 15 minutes, so download any audio you need to keep
  3. Use appropriate formats: MP3 for web, WAV for editing, Opus for streaming
  4. Monitor costs: Track usage in your Demeterics dashboard
  5. Handle errors gracefully: Implement retry logic with exponential backoff
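
The retry advice in item 5 can be sketched as follows. The names call_with_backoff and RetryableError, and the default delays, are illustrative assumptions. The caller decides which failures to wrap in RetryableError, for example HTTP 429 or 502 responses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by the operation for failures that are worth retrying."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on RetryableError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter spreads out retries from many clients.
            sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("unreachable")
```

The sleep parameter exists so that tests can substitute a function that records delays instead of waiting.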