Farsi Transcriber

Live Farsi Audio Transcription Tool

A web app that transcribes Farsi (Persian) audio files to text. It is deployed live and handles real-world file sizes by splitting audio into 200-second chunks before transcription.


Farsi Transcriber was built to solve a practical problem: accurately converting spoken Farsi audio into clean, readable text. The app uses OpenAI's gpt-4o-transcribe model, one of the most capable multilingual speech recognition models available, and wraps it in a simple Streamlit interface that requires no technical knowledge to use.

A core challenge with audio transcription is handling large files reliably. Rather than attempting to upload multi-hundred-megabyte files directly to the API, the app uses pydub to split audio into 200-second chunks and transcribes them sequentially, showing live progress as each segment completes. This keeps each request small and makes the tool robust even for long recordings such as lectures or interviews.
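The splitting step can be sketched as follows. This is a minimal illustration, not the app's actual code: chunk_spans and split_with_pydub are hypothetical names, and only the 200-second chunk length comes from the description above. It relies on pydub measuring and slicing AudioSegment objects in milliseconds.

```python
from typing import Iterator, Tuple

CHUNK_SECONDS = 200  # chunk length stated in the description above


def chunk_spans(duration_ms: int,
                chunk_ms: int = CHUNK_SECONDS * 1000) -> Iterator[Tuple[int, int]]:
    """Yield (start_ms, end_ms) pairs that cover the recording with no gaps or overlap."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    for start in range(0, duration_ms, chunk_ms):
        # The final span is shortened so it never runs past the end of the audio.
        yield start, min(start + chunk_ms, duration_ms)


def split_with_pydub(path: str) -> list:
    """Hypothetical helper: load a file and slice it into AudioSegment chunks."""
    # Imported lazily so chunk_spans stays usable without pydub installed.
    # Decoding non-WAV input requires FFmpeg on the PATH.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(path)
    # len(audio) is the duration in milliseconds; slices are also in milliseconds.
    return [audio[start:end] for start, end in chunk_spans(len(audio))]
```

For a 450-second recording, chunk_spans produces two full 200-second spans followed by one 50-second remainder.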


Technical Implementation

Architecture

  • UI Layer: Streamlit app managing multi-step session state (API setup → upload → transcription → results)
  • Core Logic: Dedicated api/transcriber.py module handling chunking and OpenAI calls
  • Audio Processing: pydub + FFmpeg for format conversion and segmentation
  • Deployment: Containerized via Docker, live at farsi-transcriber.jonahsaidian.com
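
One way the multi-step flow in the UI layer could be tracked is with a single step field in st.session_state. The sketch below is an assumption about structure, not the app's code: STEPS, init_state, and advance are hypothetical names. The functions accept any dict-like object, so they can be exercised without Streamlit.

```python
# Step names mirror the flow listed above; the identifiers themselves are assumed.
STEPS = ("api_setup", "upload", "transcription", "results")


def init_state(state) -> None:
    """Set the first step once; later script reruns keep the existing value."""
    if "step" not in state:
        state["step"] = STEPS[0]


def advance(state) -> str:
    """Move to the next step, staying on the last one if already there."""
    index = STEPS.index(state["step"])
    state["step"] = STEPS[min(index + 1, len(STEPS) - 1)]
    return state["step"]
```

In a Streamlit script, st.session_state would be passed as state, and each rerun would render only the widgets for the current step.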

Processing Pipeline

  • User uploads audio file via drag-and-drop
  • File loaded in a background thread with progress estimation
  • chunk_audio() splits into 200-second WAV segments
  • Each segment sent to OpenAI with language="fa"
  • Transcribed text accumulates live in the UI
  • Temp files cleaned up after each chunk
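
The last three steps above might look roughly like the loop below. It is a sketch under stated assumptions: transcribe_chunks and openai_transcriber are hypothetical names, the chunks are assumed to be pydub AudioSegment objects (or anything with a compatible export method), and the client is assumed to be an openai.OpenAI instance. The model name and language="fa" come from the description above.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator


def transcribe_chunks(chunks: Iterable,
                      transcribe_file: Callable[[str], str]) -> Iterator[str]:
    """Export each chunk to a temporary WAV file, transcribe it, then delete the file."""
    for chunk in chunks:
        fd, path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        try:
            # pydub's export returns an open file object; close it so the
            # temp file can be removed on every platform.
            chunk.export(path, format="wav").close()
            text = transcribe_file(path)
        finally:
            os.remove(path)  # clean up even if the API call fails
        # Yield after cleanup so the caller can display text as it arrives.
        yield text


def openai_transcriber(client) -> Callable[[str], str]:
    """Build a transcribe_file callable around an assumed openai.OpenAI client."""
    def transcribe_file(path: str) -> str:
        with open(path, "rb") as audio_file:
            result = client.audio.transcriptions.create(
                model="gpt-4o-transcribe",
                file=audio_file,
                language="fa",
            )
        return result.text
    return transcribe_file
```

Because transcribe_chunks is a generator, a UI loop can append each yielded string to the display and update a progress bar before the next chunk is sent.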

Features

  • File Upload

    Drag-and-drop support for MP3, MP4, MPEG, MPGA, M4A, WAV, and WEBM formats.

  • Real-Time Progress

    A live progress bar with time estimates, plus transcribed text that appears as each chunk completes.

  • Text Export

    Download the full transcription as a .txt file with one click.

  • Chunked Processing

    Handles large audio files reliably by processing them in 200-second segments, so no single API request has to carry the whole file.

  • Auto API Key

    Reads OPENAI_API_KEY from the environment or a .env file, with a UI input field as a fallback.

  • Clean Reset

    One-click restart clears all state and the file uploader, ready for a new transcription.
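
Two of the features above, the API key lookup and the reset, can be sketched in a few lines. This is an illustration under assumptions rather than the app's implementation: resolve_api_key, load_env_file, reset_state, and the uploader_key field are hypothetical names, and the .env loading assumes the python-dotenv package. Changing a widget's key is a common way to make Streamlit recreate a file uploader empty.

```python
import os
from typing import Mapping, MutableMapping, Optional


def load_env_file() -> None:
    """Copy variables from a local .env file into os.environ (assumes python-dotenv)."""
    from dotenv import load_dotenv  # imported lazily; optional dependency

    load_dotenv()  # does not override variables that are already set


def resolve_api_key(ui_value: str = "",
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer OPENAI_API_KEY from the environment, else the value typed in the UI."""
    key = environ.get("OPENAI_API_KEY", "").strip()
    return key or ui_value.strip() or None


def reset_state(state: MutableMapping) -> None:
    """Clear all stored state and bump the counter used in the uploader's widget key."""
    next_key = (state["uploader_key"] if "uploader_key" in state else 0) + 1
    for name in list(state.keys()):
        del state[name]
    state["uploader_key"] = next_key
```

In a Streamlit script, reset_state(st.session_state) would run when the restart button is clicked, and the uploader would be created with a key built from st.session_state["uploader_key"] so that it comes back empty on the next rerun.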