Podcast Maker Implementation Overview

Podcast Maker orchestrates a multi-stage content pipeline: project configuration, research grounding, script composition, media rendering, and publish-state tracking.

Architecture & Data Flow

flowchart LR
    UI[Podcast Maker UI]
    API[Podcast API Router]
    PROJ[Project Service]
    RESEARCH[Research Handler]
    SCRIPT[Script Handler]
    RENDER[Audio/Video Render Handlers]
    STORE[(Podcast Tables)]
    JOBS[(Render Queue)]

    UI --> API
    API --> PROJ
    API --> RESEARCH
    API --> SCRIPT
    API --> RENDER

    PROJ --> STORE
    RESEARCH --> STORE
    SCRIPT --> STORE
    RENDER --> JOBS
    RENDER --> STORE

    JOBS --> UI
    STORE --> UI

Podcast Maker is split into:

  • Frontend orchestration service: frontend/src/services/podcastApi.ts
      • Coordinates step flow (analysis → research → script → audio/video)
      • Runs preflight checks before expensive calls
      • Maps API payloads into UI-friendly objects
  • Backend podcast handlers: backend/api/podcast/handlers/*.py
      • Route-level APIs for analysis, research, script, media, and projects
      • Authenticated operations with user-scoped media/project data
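
The step flow and preflight behaviour described above can be sketched roughly as follows. This is an illustrative model only, not the code in podcastApi.ts: the step order comes from this page, while the `Pipeline` class, `PreflightError`, the per-step credit costs, and the handler callables are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Step order taken from the overview; everything else here is hypothetical.
STEPS = ["analysis", "research", "script", "audio", "video"]


class PreflightError(Exception):
    """Raised when a step is blocked before any expensive call is made."""


@dataclass
class Pipeline:
    # Hypothetical per-step credit costs and a hypothetical credit balance.
    costs: Dict[str, int]
    credits: int
    completed: List[str] = field(default_factory=list)

    def preflight(self, step: str) -> None:
        """Reject the step if the remaining credits cannot cover its cost."""
        cost = self.costs.get(step, 0)
        if cost > self.credits:
            raise PreflightError(f"{step} needs {cost} credits, {self.credits} left")

    def run(self, handlers: Dict[str, Callable[[], None]]) -> List[str]:
        """Run every step in order, checking constraints before each call."""
        for step in STEPS:
            self.preflight(step)   # fail fast before the expensive call
            handlers[step]()       # stand-in for the real API request
            self.credits -= self.costs.get(step, 0)
            self.completed.append(step)
        return self.completed
```

In this sketch a run that cannot afford the video step still completes the four cheaper steps and then stops without ever issuing the video request, which is the point of checking before the call rather than handling the failure afterwards.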

Frontend orchestration responsibilities

Primary responsibilities in podcastApi.ts:

  • Create project analysis payloads and map the response into Podcast Analysis UI data.
  • Build and validate research query payloads for the Exa research route.
  • Generate script scenes and normalize scene/line structure for editor state.
  • Render per-scene audio and combine scenes into final audio.
  • Trigger scene image and video generation workflows.
  • Persist project state via project CRUD endpoints.
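
As one illustration of the "normalize scene/line structure" responsibility, the sketch below fills in defaults so that every scene has the same shape before it reaches editor state. It is written in Python for consistency with the backend modules, although the real normalization lives in the TypeScript service. The key names (`id`, `duration`, `emotion`, `lines`, `speaker`, `text`) and the default values are assumptions, not the actual API contract.

```python
from typing import Any, Dict, List


def normalize_scene(raw: Dict[str, Any], index: int) -> Dict[str, Any]:
    """Return a scene dict with every expected key present, filling defaults."""
    lines = []
    for line in raw.get("lines") or []:
        text = str(line.get("text", "")).strip()
        if not text:
            continue  # drop empty lines rather than render blank rows
        # "Host" is a hypothetical fallback speaker name.
        lines.append({"speaker": line.get("speaker") or "Host", "text": text})
    return {
        "id": str(raw.get("id") or f"scene-{index + 1}"),
        "duration": float(raw.get("duration") or 0),
        "emotion": raw.get("emotion") or "neutral",
        "lines": lines,
    }


def normalize_script(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Normalize every scene in a (hypothetical) script response payload."""
    scenes = payload.get("scenes") or []
    return [normalize_scene(scene, i) for i, scene in enumerate(scenes)]
```

Normalizing once at the service boundary means editor components can read `scene["lines"]` or `scene["emotion"]` without repeating null checks.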

Backend handler modules

  • analysis.py: idea enhancement, analysis, regenerate-queries.
  • research.py: Exa research endpoint.
  • script.py: script generation and scene approval.
  • audio.py: audio upload, generation, combine, serving audio files.
  • images.py: scene image generation and image serving.
  • video.py: scene video generation, video listing/serving, combine videos.
  • avatar.py: avatar upload, avatar generation, avatar cleanup/presentability.
  • projects.py: create, get, update, list, delete, favorite project records.
  • dubbing.py: dubbing/voice clone lifecycle endpoints (currently available on the backend).

Data models (functional view)

At feature level, the flow revolves around:

  • Project metadata: project_id, idea, duration, speakers, budget and status fields.
  • Analysis output: audience, content type, keywords, outlines, title suggestions.
  • Research output: source list, summarized insights, fact cards for script grounding.
  • Script output: scenes with IDs, durations, emotions, and speaker lines.
  • Media output: audio files, scene images, scene videos, combined episode artifacts.
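
The functional view above could be modelled with plain dataclasses along these lines. The field names `project_id`, `idea`, `duration`, and `speakers` come from the list above; the types, units, the `"draft"` status value, and the `scripted_seconds` helper are assumptions added for illustration and may not match the actual tables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Line:
    speaker: str
    text: str


@dataclass
class Scene:
    scene_id: str
    duration: float           # seconds (the unit is an assumption)
    emotion: str
    lines: List[Line] = field(default_factory=list)


@dataclass
class Project:
    project_id: str
    idea: str
    duration: int             # target episode length; unit is an assumption
    speakers: int
    status: str = "draft"     # hypothetical status value
    scenes: List[Scene] = field(default_factory=list)

    def scripted_seconds(self) -> float:
        """Total duration of all scenes currently in the script."""
        return sum(scene.duration for scene in self.scenes)
```

A helper such as `scripted_seconds` makes it easy to compare the generated script length against the requested project duration before any audio is rendered.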

Operational notes

  • Preflight checks fail fast on plan or credit constraints, before any expensive call is made.
  • Analysis, script, audio, and image operations are synchronous; video generation runs as an asynchronous task.
  • The client polls task status for long-running jobs.
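
A polling loop for the asynchronous video path might look roughly like the sketch below. The `fetch_status` callable, the `"completed"` and `"failed"` status strings, and the interval and timeout defaults are all assumptions; the real task-status route and its response shape are not specified on this page.

```python
import time
from typing import Callable, Dict


def poll_task(fetch_status: Callable[[], Dict[str, str]],
              interval: float = 2.0,
              timeout: float = 600.0,
              sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll until the task reports a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status()  # stand-in for a task-status request
        if status.get("status") in ("completed", "failed"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"task still {status.get('status')!r} after {waited}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A caller would typically inspect the returned status and surface a failure to the user instead of retrying indefinitely.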

Engineering references

  • docs/Podcast_maker/AI_PODCAST_BACKEND_REFERENCE.md
  • docs/Podcast_maker/PODCAST_API_CALL_ANALYSIS.md
  • docs/Podcast_maker/PODCAST_PLAN_COMPLETION_STATUS.md