Audio Ingestion Overview

Veronese accepts audio through four channels. Each channel results in the same outcome: a new episode in your library, queued for transcription.

Channels at a glance

Channel	Best for
Web upload	Quick one-off files from your computer
YouTube / URL	Public videos, podcasts, web audio
Email	Mobile recordings, quick captures from any device
Telegram	Voice notes from mobile

The ingestion pipeline

Every episode goes through the same pipeline regardless of channel:

1. Source resolution: The remote file is downloaded, or the uploaded attachment is retrieved.
2. Normalization: FFmpeg converts the audio to a clean WAV at a fixed sample rate.
3. Duration probe: ffprobe measures the audio length and stores it on the episode.
4. Billing check: Your available credit balance must be at least 300 seconds to proceed.
5. Transcription queue: A transcription job is enqueued for asynchronous processing.
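
As a rough illustration, the normalization, duration probe, and billing check steps could be sketched in Python as shown here. The function names, the 16 kHz mono target, and the file paths are assumptions made for this example; the documentation only states that the sample rate is fixed and that the minimum balance is 300 seconds. This is not Veronese's actual implementation.

```python
import subprocess
from pathlib import Path

MIN_CREDIT_SECONDS = 300      # threshold stated in the billing check step
TARGET_SAMPLE_RATE = 16_000   # assumed value; the docs only say "fixed sample rate"


def normalize_cmd(src: Path, dst: Path, sample_rate: int = TARGET_SAMPLE_RATE) -> list[str]:
    """Build an ffmpeg command that writes a WAV at the given sample rate."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", str(src),
        "-vn",                     # drop any video stream
        "-ac", "1",                # mono output (an assumption)
        "-ar", str(sample_rate),   # resample to the fixed rate
        str(dst),
    ]


def probe_cmd(path: Path) -> list[str]:
    """Build an ffprobe command that prints only the duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(path),
    ]


def probe_duration(path: Path) -> float:
    """Run ffprobe and parse the duration; requires ffprobe on PATH."""
    result = subprocess.run(probe_cmd(path), check=True, capture_output=True, text=True)
    return float(result.stdout.strip())


def passes_billing_check(credit_seconds: float) -> bool:
    """Return True when the balance meets the 300-second minimum."""
    return credit_seconds >= MIN_CREDIT_SECONDS
```

Keeping the command construction separate from execution makes the arguments easy to inspect without FFmpeg installed.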

Episode states

draft
  ↓
ingesting                ← source audio being downloaded/normalized
  ↓
ready_for_transcription
  ↓
transcribing             ← AI model processing the audio
  ↓
ready                    ← transcript available for editing

If any step fails, the episode moves to the failed state and a notification is sent.
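
The state flow above can be modeled as a small state machine. The Python sketch below is illustrative only: the EpisodeState enum and the advance and fail helpers are hypothetical names, and it assumes that an episode that has already reached ready or failed cannot move to failed, which the documentation does not confirm.

```python
from enum import Enum


class EpisodeState(str, Enum):
    DRAFT = "draft"
    INGESTING = "ingesting"
    READY_FOR_TRANSCRIPTION = "ready_for_transcription"
    TRANSCRIBING = "transcribing"
    READY = "ready"
    FAILED = "failed"


# Happy-path transitions, in the order shown in the diagram.
_NEXT = {
    EpisodeState.DRAFT: EpisodeState.INGESTING,
    EpisodeState.INGESTING: EpisodeState.READY_FOR_TRANSCRIPTION,
    EpisodeState.READY_FOR_TRANSCRIPTION: EpisodeState.TRANSCRIBING,
    EpisodeState.TRANSCRIBING: EpisodeState.READY,
}


def advance(state: EpisodeState) -> EpisodeState:
    """Move to the next happy-path state; terminal states raise ValueError."""
    if state not in _NEXT:
        raise ValueError(f"no forward transition from {state.value}")
    return _NEXT[state]


def fail(state: EpisodeState) -> EpisodeState:
    """Move an in-progress episode to failed (assumed: terminal states cannot fail)."""
    if state in (EpisodeState.READY, EpisodeState.FAILED):
        raise ValueError(f"cannot fail an episode in state {state.value}")
    return EpisodeState.FAILED
```

For example, calling advance four times starting from EpisodeState.DRAFT walks the episode through to EpisodeState.READY.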

Supported audio formats

Veronese accepts any audio format that FFmpeg can decode, including:

MP3, M4A, AAC, OGG, FLAC, WAV, AIFF, OPUS
Video with audio tracks (MP4, MOV, WebM) — audio is extracted automatically
URLs pointing directly to audio files
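
If you want to check locally whether a file contains an audio track before sending it, one possible approach is to inspect it with ffprobe. The Python sketch below is an assumption-laden example, not part of Veronese: the helper names are hypothetical, and it only checks that ffprobe reports at least one audio stream, not that the file will transcribe successfully.

```python
import json


def stream_probe_cmd(path: str) -> list[str]:
    """Build an ffprobe command that lists a file's streams as JSON."""
    return ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path]


def has_audio_stream(ffprobe_json: str) -> bool:
    """Return True if the ffprobe JSON output contains an audio stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return any(s.get("codec_type") == "audio" for s in streams)
```

To use it, run the command from stream_probe_cmd (for instance with subprocess.run and capture_output=True) and pass the captured standard output to has_audio_stream.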