Audio Ingestion Overview

Veronese accepts audio through four channels. Each channel results in the same outcome: a new episode in your library, queued for transcription.

Channel          Best for
Web upload       Quick one-off files from your computer
YouTube / URL    Public videos, podcasts, web audio
Email            Mobile recordings, quick captures from any device
Telegram         Voice notes from mobile

Every episode goes through the same pipeline regardless of channel:

  1. Source resolution — Download the remote file or retrieve the uploaded attachment.
  2. Normalization — FFmpeg converts the audio to a clean WAV at a fixed sample rate.
  3. Duration probe — ffprobe measures the audio length and stores it on the episode.
  4. Billing check — Your available credit balance must be ≥ 300 seconds to proceed.
  5. Transcription queue — A transcription job is enqueued for async processing.
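
Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not Veronese's actual implementation: the function names are hypothetical, and the 16 kHz mono target is an assumption, since the pipeline only states "a fixed sample rate". The FFmpeg and ffprobe flags are standard options of those tools; the 300-second threshold comes from step 4.

```python
import subprocess

# Assumption: the docs say "a fixed sample rate" without naming it; 16 kHz mono is a guess.
ASSUMED_SAMPLE_RATE = 16_000
# Threshold stated in step 4 of the pipeline.
MIN_CREDIT_SECONDS = 300


def normalize_command(src: str, dst: str, sample_rate: int = ASSUMED_SAMPLE_RATE) -> list[str]:
    """Build an ffmpeg argv that drops any video stream and writes 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", src,
        "-vn",                   # ignore video, keep audio only
        "-ac", "1",              # downmix to mono (assumption)
        "-ar", str(sample_rate), # resample to the fixed rate
        "-c:a", "pcm_s16le",     # uncompressed 16-bit PCM
        dst,
    ]


def probe_command(path: str) -> list[str]:
    """Build an ffprobe argv that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration(ffprobe_stdout: str) -> float:
    """Convert ffprobe's bare numeric output into seconds."""
    return float(ffprobe_stdout.strip())


def passes_billing_check(balance_seconds: float) -> bool:
    """Step 4: the available credit balance must be at least 300 seconds."""
    return balance_seconds >= MIN_CREDIT_SECONDS


def normalize_and_probe(src: str, dst: str) -> float:
    """Run steps 2 and 3 locally; requires ffmpeg and ffprobe on PATH."""
    subprocess.run(normalize_command(src, dst), check=True)
    result = subprocess.run(probe_command(dst), check=True, capture_output=True, text=True)
    return parse_duration(result.stdout)
```

Keeping the command builders separate from the `subprocess` calls makes the sketch easy to inspect or test without FFmpeg installed.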

As it moves through the pipeline, an episode passes through these statuses in order:

draft
ingesting                 ← source audio being downloaded/normalized
ready_for_transcription
transcribing              ← AI model processing the audio
ready                     ← transcript available for editing

If anything goes wrong, the episode moves to failed and a notification is sent.
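
The status flow above can be modeled as a small state machine. The sketch below is illustrative only: the `EpisodeStatus` enum and `can_transition` helper are hypothetical names, and it assumes that every status except `ready` and `failed` can move to `failed`, and that `ready` and `failed` have no outgoing transitions. The documentation does not confirm either assumption.

```python
from enum import Enum


class EpisodeStatus(str, Enum):
    DRAFT = "draft"
    INGESTING = "ingesting"
    READY_FOR_TRANSCRIPTION = "ready_for_transcription"
    TRANSCRIBING = "transcribing"
    READY = "ready"
    FAILED = "failed"


# The normal progression, in the order listed in the documentation.
HAPPY_PATH = [
    EpisodeStatus.DRAFT,
    EpisodeStatus.INGESTING,
    EpisodeStatus.READY_FOR_TRANSCRIPTION,
    EpisodeStatus.TRANSCRIBING,
    EpisodeStatus.READY,
]

# Assumption: these two statuses have no outgoing transitions in this sketch.
TERMINAL = {EpisodeStatus.READY, EpisodeStatus.FAILED}


def can_transition(current: EpisodeStatus, target: EpisodeStatus) -> bool:
    """Return True if moving from `current` to `target` follows the documented flow."""
    if current in TERMINAL:
        return False
    if target is EpisodeStatus.FAILED:
        return True  # any in-progress status may fail (assumption)
    next_status = HAPPY_PATH[HAPPY_PATH.index(current) + 1]
    return target is next_status
```

For example, `can_transition(EpisodeStatus.INGESTING, EpisodeStatus.READY_FOR_TRANSCRIPTION)` is allowed, while skipping straight from `draft` to `transcribing` is not.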

Veronese accepts any audio format that FFmpeg can decode, including:

  • MP3, M4A, AAC, OGG, FLAC, WAV, AIFF, OPUS
  • Video with audio tracks (MP4, MOV, WebM) — audio is extracted automatically
  • URLs pointing directly to audio files
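
To check locally whether a file is likely to be accepted, you can ask ffprobe whether it contains an audio stream. The sketch below is a local pre-check under that assumption, not a description of how Veronese validates uploads; the function names are hypothetical. It relies only on ffprobe's standard JSON output, in which each stream carries a `codec_type` field.

```python
import json
import subprocess


def stream_probe_command(path: str) -> list[str]:
    """Build an ffprobe argv that lists all streams as JSON."""
    return ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path]


def has_audio_stream(ffprobe_json: str) -> bool:
    """Return True if the ffprobe JSON output lists at least one audio stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return any(stream.get("codec_type") == "audio" for stream in streams)


def file_has_audio(path: str) -> bool:
    """Run ffprobe on a local file; requires ffprobe on PATH."""
    result = subprocess.run(
        stream_probe_command(path), check=True, capture_output=True, text=True
    )
    return has_audio_stream(result.stdout)
```

A video file with an audio track reports both a `video` and an `audio` stream, so it passes this check; a silent screen recording does not.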