Video Pipeline API - ezForge AI

The Ember video pipeline turns a documentation page into a narrated screen-recording video. It chains four stages — script generation, HeyGen avatar rendering, Playwright screen recording, and FFmpeg compositing — and exposes them through two modules:

Module	Role
`compositor.ts`	FFmpeg wrapper that overlays an avatar as picture-in-picture on a screen recording and concatenates intro/outro clips
`orchestrator.ts`	End-to-end stage coordinator with retry, resume, and progress callbacks

Compositor

`CompositeOptions`

Passed to VideoCompositor.composite() to define a single compositing job.

Field	Type	Required	Default	Description
`avatarVideoPath`	`string`	Yes	—	Local file path or HTTP URL of the HeyGen-generated avatar MP4. FFmpeg handles both natively; no pre-download is needed.
`screenRecordingPath`	`string`	Yes	—	Absolute path to the Playwright-generated screen recording (video-only MP4).
`outputPath`	`string`	No	Auto-generated in `outputDir`	Absolute path for the composited output file. A UUID-based name is used when omitted.
`avatarPosition`	`'bottom-right' | 'bottom-left' | 'top-right' | 'top-left'`	No	`'bottom-right'`	Which corner of the frame the avatar picture-in-picture occupies.
`avatarScale`	`number`	No	`0.25`	Avatar overlay size as a fraction of the screen width (greater than `0`, up to `1`). For example, `0.25` makes the avatar 25% as wide as the screen.
`introPath`	`string`	No	—	Path to an MP4 clip to prepend to the composited body. See Generating intro/outro clips.
`outroPath`	`string`	No	—	Path to an MP4 clip to append to the composited body. See Generating intro/outro clips.

Example:

import { VideoCompositor } from './video-pipeline/compositor.js';

const compositor = new VideoCompositor({ outputDir: '/tmp/my-videos' });

const result = await compositor.composite({
  avatarVideoPath: 'https://cdn.heygen.com/videos/abc123.mp4',
  screenRecordingPath: '/tmp/recordings/demo.mp4',
  avatarPosition: 'bottom-right',
  avatarScale: 0.25,
  introPath: '/assets/video-templates/intro.mp4',
  outroPath: '/assets/video-templates/outro.mp4',
});

console.log(result.outputPath);       // absolute path to the composited MP4
console.log(result.durationSeconds);  // total runtime in seconds
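
To make the effect of avatarPosition and avatarScale more concrete, here is a minimal sketch of how those two options could translate into an FFmpeg filter graph. It is not the actual code in compositor.ts: the helper name buildPipFilter, the 20-pixel margin, and the stream labels are all assumptions. Only the scale and overlay filters and their main_w / overlay_w variables come from FFmpeg itself.

```typescript
type AvatarPosition = 'bottom-right' | 'bottom-left' | 'top-right' | 'top-left';

// Hypothetical helper, not part of compositor.ts.
function buildPipFilter(
  screenWidth: number,
  avatarPosition: AvatarPosition = 'bottom-right',
  avatarScale = 0.25,
  margin = 20, // assumed padding in pixels between the avatar and the frame edge
): string {
  if (!(avatarScale > 0 && avatarScale <= 1)) {
    throw new RangeError('avatarScale must be greater than 0 and at most 1');
  }
  // Avatar width as a fraction of the screen width. A height of -2 asks
  // FFmpeg to keep the aspect ratio and pick a height divisible by 2.
  const avatarWidth = Math.round(screenWidth * avatarScale);
  const x = avatarPosition.endsWith('right') ? `main_w-overlay_w-${margin}` : `${margin}`;
  const y = avatarPosition.startsWith('bottom') ? `main_h-overlay_h-${margin}` : `${margin}`;
  // Input 0 is the screen recording, input 1 is the avatar video.
  return `[1:v]scale=${avatarWidth}:-2[pip];[0:v][pip]overlay=${x}:${y}[v]`;
}
```

With a 1280-pixel-wide recording and the default options, the avatar would be scaled to 320 pixels wide and pinned 20 pixels in from the bottom-right corner.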

`CompositorConfig`

Passed to the VideoCompositor constructor to configure FFmpeg options and output location.

Field	Type	Required	Default
`outputDir`	`string`	Yes	—
`ffmpegPath`	`string`	No	`'ffmpeg'` (must be on `PATH`)
`ffprobePath`	`string`	No	`'ffprobe'` (must be on `PATH`)
`videoCodec`	`string`	No	`'libx264'`
`audioCodec`	`string`	No	`'aac'`
`crf`	`number`	No	`23`

Use createVideoCompositor() as a convenience factory; it reads FFMPEG_PATH and FFPROBE_PATH from the environment and defaults outputDir to '/tmp/ezforge-compositor' when omitted.
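
The defaulting behaviour described above can be sketched roughly as follows. This is an illustration, not the real createVideoCompositor(): the function name resolveCompositorConfig is hypothetical, and the precedence order (explicit option first, then environment variable, then built-in default) is an assumption that the source does not spell out.

```typescript
interface CompositorConfig {
  outputDir: string;
  ffmpegPath?: string;
  ffprobePath?: string;
  videoCodec?: string;
  audioCodec?: string;
  crf?: number;
}

// Hypothetical helper that mirrors the documented defaults.
function resolveCompositorConfig(
  overrides: Partial<CompositorConfig> = {},
  env: Record<string, string | undefined> = {},
): Required<CompositorConfig> {
  return {
    outputDir: overrides.outputDir ?? '/tmp/ezforge-compositor',
    // Assumed precedence: explicit option, then environment, then PATH lookup.
    ffmpegPath: overrides.ffmpegPath ?? env['FFMPEG_PATH'] ?? 'ffmpeg',
    ffprobePath: overrides.ffprobePath ?? env['FFPROBE_PATH'] ?? 'ffprobe',
    videoCodec: overrides.videoCodec ?? 'libx264',
    audioCodec: overrides.audioCodec ?? 'aac',
    crf: overrides.crf ?? 23,
  };
}
```

In a Node process the second argument would typically be process.env.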

Orchestrator

`PipelineInput`

Passed to PipelineOrchestrator.run() to kick off a full pipeline run.

Field	Type	Required	Default	Description
`title`	`string`	Yes	—	Documentation page title. Used as the video title and passed to the script generator.
`content`	`string`	Yes	—	Documentation page body (Markdown or plain text). The script generator uses this to write the narration.
`recordingSteps`	`RecordingStep[]`	Yes	—	Playwright recording steps for the screen recording stage. See the screen recorder docs for the step schema.
`avatarId`	`string`	No	`OrchestratorConfig.defaultAvatarId`	HeyGen avatar ID to use. Required if not set in `OrchestratorConfig`.
`voiceId`	`string`	No	Avatar default	HeyGen voice ID. Omit to use the avatar’s built-in voice.
`targetDurationSeconds`	`number`	No	`120`	Hint to the script generator for how long the narration should run.
`audience`	`string`	No	—	Free-form audience description passed to the script generator (e.g., `'senior backend engineers'`).
`introPath`	`string`	No	—	Forwarded directly to `CompositeOptions.introPath`.
`outroPath`	`string`	No	—	Forwarded directly to `CompositeOptions.outroPath`.

`PipelineResult`

Returned by PipelineOrchestrator.run() on success.

Field	Type	Description
`pipelineId`	`string`	Unique ID for this pipeline run.
`outputVideoPath`	`string`	Absolute path to the final composited MP4.
`durationSeconds`	`number`	Total video runtime in seconds.
`script`	`GeneratedScript`	The script object produced by the script-generation stage.
`screenRecordingPath`	`string`	Absolute path to the raw screen recording.
`avatarVideoUrl`	`string`	HeyGen CDN URL for the avatar-only video.

`stateListener` callback

The stateListener field in OrchestratorConfig is an optional callback that fires on every stage transition — pending → running → completed/failed. Use it to stream progress updates to a client, log pipeline state, or persist the PipelineState for crash-recovery.

import * as fs from 'fs';
import {
  createPipelineOrchestrator,
  PipelineState,
} from './video-pipeline/orchestrator.js';

const orchestrator = await createPipelineOrchestrator({
  pipeline: { /* ... */ },
  stateListener: (state: PipelineState) => {
    // `state` is a deep clone — safe to mutate or serialise
    const { stages } = state;
    console.log(
      `[${state.id}] script=${stages.script.status}` +
      ` avatar=${stages.avatar.status}` +
      ` screen=${stages.screen.status}` +
      ` composite=${stages.composite.status}`,
    );

    // Persist for crash-recovery (PipelineState is plain JSON)
    fs.writeFileSync(`/tmp/pipeline-${state.id}.json`, JSON.stringify(state));
  },
});

Stage lifecycle:

pending → running → completed
                 └→ failed  (retried up to maxAttempts times)

Each StageState object carries:

Field	Type	Description
`status`	`StageStatus`	`'pending'`, `'running'`, `'completed'`, or `'failed'`
`result`	`T | undefined`	Stage output (populated on `completed`)
`error`	`string | undefined`	Last error message (populated on `failed`)
`attempts`	`number`	Number of attempts made so far
`startedAt`	`string | undefined`	ISO-8601 timestamp when the current attempt started
`completedAt`	`string | undefined`	ISO-8601 timestamp when the stage completed successfully
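
The lifecycle and the StageState fields can be tied together with a small retry loop. The sketch below is an assumption about how such a runner might look, not the orchestrator's implementation: runStage and its onChange parameter are hypothetical names, and the real orchestrator may add back-off or other behaviour not shown here.

```typescript
type StageStatus = 'pending' | 'running' | 'completed' | 'failed';

interface StageState<T> {
  status: StageStatus;
  result?: T;
  error?: string;
  attempts: number;
  startedAt?: string;
  completedAt?: string;
}

// Hypothetical runner: retries `work` until it succeeds or maxAttempts is
// reached, and reports a copy of the state after every transition.
async function runStage<T>(
  work: () => Promise<T>,
  maxAttempts: number,
  onChange: (state: StageState<T>) => void = () => {},
): Promise<StageState<T>> {
  const state: StageState<T> = { status: 'pending', attempts: 0 };
  while (state.attempts < maxAttempts) {
    state.status = 'running';
    state.attempts += 1;
    state.startedAt = new Date().toISOString();
    onChange({ ...state });
    try {
      state.result = await work();
      state.status = 'completed';
      state.completedAt = new Date().toISOString();
      delete state.error; // clear the message left by an earlier failed attempt
      onChange({ ...state });
      return state;
    } catch (err) {
      state.status = 'failed';
      state.error = err instanceof Error ? err.message : String(err);
      onChange({ ...state });
    }
  }
  return state; // still 'failed' once every attempt is used up
}
```

A stage that fails twice and then succeeds would produce the transitions running, failed, running, failed, running, completed, and finish with attempts equal to 3.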

Resuming a failed run

PipelineState is deliberately plain JSON. Serialise it to disk or a database, then pass it back as the second argument to run() to skip already-completed stages:

let savedState: PipelineState | undefined;

const orchestrator = await createPipelineOrchestrator({
  pipeline: { /* ... */ },
  stateListener: (state: PipelineState) => { savedState = state; },
});

// First run (may crash mid-pipeline)
await orchestrator.run(input).catch(() => {});

// Resume — completed stages are skipped automatically
const result = await orchestrator.run(input, savedState);
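
As a rough picture of what "skipped automatically" means, the sketch below works out which stages would still need to run for a saved state. It assumes the stage names shown in the stateListener example (script, avatar, screen, composite) and the stage order from the overview. The names stagesToRun and PipelineSnapshot are hypothetical, and the orchestrator's real skip logic may differ.

```typescript
type StageName = 'script' | 'avatar' | 'screen' | 'composite';

interface StageSnapshot {
  status: 'pending' | 'running' | 'completed' | 'failed';
  attempts: number;
}

interface PipelineSnapshot {
  id: string;
  stages: Record<StageName, StageSnapshot>;
}

// Assumed execution order, taken from the four stages listed in the overview.
const STAGE_ORDER: StageName[] = ['script', 'avatar', 'screen', 'composite'];

// Hypothetical helper: every stage that is not yet completed must run again.
function stagesToRun(saved?: PipelineSnapshot): StageName[] {
  if (!saved) return [...STAGE_ORDER];
  return STAGE_ORDER.filter((name) => saved.stages[name].status !== 'completed');
}
```

Because the state is plain JSON, a snapshot read back with JSON.parse can be passed to a helper like this without any revival step.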

Generating intro/outro clips

The assets/video-templates/ directory ships two FFmpeg filter templates for building standardised intro and outro clips from static image assets.

File	Effect	Duration
`intro-filter.txt`	Fade in from black (0.5 s), hold, fade to black (0.5 s)	3 s
`outro-filter.txt`	Fade in from black (0.5 s), hold, fade to black (1 s)	4 s

Generating the intro clip

Requires a title_card.png at 1280×720 resolution.

ffmpeg -loop 1 -t 3 -i assets/video-templates/title_card.png \
       -f lavfi -t 3 -i "anullsrc=r=44100:cl=stereo" \
       -filter_complex "$(cat assets/video-templates/intro-filter.txt)" \
       -map "[v_intro]" -map "[a_intro]" \
       -c:v libx264 -c:a aac -pix_fmt yuv420p \
       assets/video-templates/intro.mp4

Generating the outro clip

Requires an end_card.png at 1280×720 resolution.

ffmpeg -loop 1 -t 4 -i assets/video-templates/end_card.png \
       -f lavfi -t 4 -i "anullsrc=r=44100:cl=stereo" \
       -filter_complex "$(cat assets/video-templates/outro-filter.txt)" \
       -map "[v_outro]" -map "[a_outro]" \
       -c:v libx264 -c:a aac -pix_fmt yuv420p \
       assets/video-templates/outro.mp4

Once generated, pass the output paths to PipelineInput.introPath / PipelineInput.outroPath (or directly to CompositeOptions.introPath / CompositeOptions.outroPath).

The assets/video-templates/ directory ships only the filter templates, not pre-built MP4 clips. You must run the FFmpeg commands above (substituting your own image assets) before the intro/outro feature can be used in a pipeline run.
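
If the clips are generated from a build script rather than by hand, the two commands above can be expressed as one argument builder. This is a sketch only: buildCardClipArgs is a hypothetical helper, and it assumes the caller has already read the filter template file into a string and trimmed the trailing newline, which is what the shell's $(cat ...) does implicitly.

```typescript
// Hypothetical helper that reproduces the documented FFmpeg arguments.
function buildCardClipArgs(
  kind: 'intro' | 'outro',
  imagePath: string,
  filterGraph: string, // contents of intro-filter.txt or outro-filter.txt
  outputPath: string,
): string[] {
  // Durations from the template table: 3 s for the intro, 4 s for the outro.
  const seconds = kind === 'intro' ? '3' : '4';
  return [
    '-loop', '1', '-t', seconds, '-i', imagePath,
    // Silent stereo track so the clip carries an audio stream.
    '-f', 'lavfi', '-t', seconds, '-i', 'anullsrc=r=44100:cl=stereo',
    '-filter_complex', filterGraph,
    '-map', `[v_${kind}]`, '-map', `[a_${kind}]`,
    '-c:v', 'libx264', '-c:a', 'aac', '-pix_fmt', 'yuv420p',
    outputPath,
  ];
}
```

The resulting array can be handed to Node's child_process.spawn('ffmpeg', args). Passing arguments as an array avoids shell quoting problems with the filter graph.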

Scene Timeline Protocol

The Scene Timeline Protocol (Phase 2) lets the script generator produce a multi-scene narration that drives both the audio synthesis and screen-recording stages. Instead of a hardcoded sequence of recording steps, the generator embeds [[scene ...]] markers directly in the narration text. For the full specification, see executive/specifications/video-pipeline-scene-markers.md.

Marker grammar

Markers are inline in the narration. Canonical form:

[[scene id="hero" src="web:https://ezforge.ai/" duration="15s"]]
Narration text for the hero scene goes here.
[[scene id="features" src="web:https://ezforge.ai/platform" wordBudget="60"]]
Narration for the features scene...
[[scene id="closeout" src="image:mintlify-docs/images/cta.png" duration="5s"]]
Final words.

Required attributes

Attribute	Description
`id`	Kebab-case, unique within the timeline (e.g. `hero`, `feature-overview`). Used in logs and resume keys.
`src`	SceneSource URI of the form `<adapter>:<target>`. See Adapters below.

Optional attributes

Attribute	Description
`duration`	Explicit duration. Accepts `30s`, `1500ms`, `1m30s`. Wins over `wordBudget` when both are present.
`wordBudget`	Integer word count expected during this scene. Resolver uses it when `duration` is absent.
`transition`	One of `cut` (default), `crossfade`, `fade-to-black`.
`hold`	Unquoted boolean. Declares the source is expected to finish before the scene duration ends (e.g. a fast navigation). Suppresses the under-run warning; rendering behaviour is unchanged.

Each marker begins a new scene. The narration that follows belongs to that scene until the next marker or end of script. At least one of duration or wordBudget must be present on every scene.
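
The grammar above is small enough to illustrate with a regular-expression parser. The sketch below is not the real parseTimeline(). The function name parseScenes and the returned shape are hypothetical, and treating both a bare hold and hold=true as true is an assumption, so the linked specification remains the authority on edge cases.

```typescript
interface SceneMarker {
  id: string;
  src: string;
  duration: string | undefined;
  wordBudget: number | undefined;
  transition: string;
  hold: boolean;
  narration: string;
}

// Sketch parser for [[scene ...]] markers; hypothetical, see lead-in.
function parseScenes(script: string): SceneMarker[] {
  const marker = /\[\[scene\s+([^\]]*)\]\]/g;
  const found: { body: string; start: number; end: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = marker.exec(script)) !== null) {
    found.push({ body: m[1] ?? '', start: m.index, end: marker.lastIndex });
  }

  const seenIds = new Set<string>();
  return found.map((f, i) => {
    // Collect key="value" pairs plus unquoted flags such as `hold`.
    const attrs = new Map<string, string>();
    const attr = /([A-Za-z]+)(?:=(?:"([^"]*)"|(\S+)))?/g;
    let a: RegExpExecArray | null;
    while ((a = attr.exec(f.body)) !== null) {
      attrs.set(a[1] ?? '', a[2] ?? a[3] ?? 'true');
    }

    const id = attrs.get('id');
    const src = attrs.get('src');
    if (!id || !src) throw new Error(`scene ${i + 1}: id and src are required`);
    if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(id)) throw new Error(`scene id "${id}" is not kebab-case`);
    if (seenIds.has(id)) throw new Error(`duplicate scene id "${id}"`);
    seenIds.add(id);

    const duration = attrs.get('duration');
    const budgetText = attrs.get('wordBudget');
    const wordBudget = budgetText === undefined ? undefined : Number.parseInt(budgetText, 10);
    if (duration === undefined && wordBudget === undefined) {
      throw new Error(`scene "${id}" needs duration or wordBudget`);
    }

    // Narration runs from the end of this marker to the start of the next one.
    const next = found[i + 1];
    const narration = script.slice(f.end, next ? next.start : script.length).trim();
    return {
      id, src, duration, wordBudget,
      transition: attrs.get('transition') ?? 'cut',
      hold: attrs.get('hold') === 'true',
      narration,
    };
  });
}
```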

Adapters

Adapter id	URI form	Notes
`web`	`web:<absolute-url>`	Playwright browser recording navigating to the given URL.
`image`	`image:<local-path>`	Recorder holds the last displayed frame for the scene duration (wait-only step). `ImageSceneSource.render()` is Phase 3 infrastructure.

The orchestrator converts `web:` scenes to [navigate, wait] recording steps for the existing ScreenRecorder, and `image:` scenes to wait-only steps, so the recorder holds the last displayed frame for the scene duration. ImageSceneSource.render(), which uses FFmpeg, is Phase 3 infrastructure and is not yet wired into the orchestrator. Both adapters pass the shared contract test.
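
The conversion from a SceneSource URI to recording steps might look something like the sketch below. The real RecordingStep schema lives in the screen recorder docs and is not reproduced here, so the SketchStep shape (action, url, ms) is purely illustrative, as are the function names parseSceneSource and toRecordingSteps.

```typescript
type AdapterId = 'web' | 'image';

interface ParsedSource {
  adapter: AdapterId;
  target: string;
}

// Illustrative step shape only; NOT the real RecordingStep schema.
type SketchStep =
  | { action: 'navigate'; url: string }
  | { action: 'wait'; ms: number };

// Split "<adapter>:<target>" on the first colon so URLs keep their own colons.
function parseSceneSource(src: string): ParsedSource {
  const colon = src.indexOf(':');
  if (colon <= 0) throw new Error(`"${src}" is not of the form <adapter>:<target>`);
  const adapter = src.slice(0, colon);
  const target = src.slice(colon + 1);
  if (adapter !== 'web' && adapter !== 'image') throw new Error(`unknown adapter "${adapter}"`);
  if (target.length === 0) throw new Error(`"${src}" has an empty target`);
  if (adapter === 'web' && !/^https?:\/\//.test(target)) {
    throw new Error('web: sources need an absolute URL');
  }
  return { adapter, target };
}

function toRecordingSteps(src: string, durationMs: number): SketchStep[] {
  const { adapter, target } = parseSceneSource(src);
  return adapter === 'web'
    ? [{ action: 'navigate', url: target }, { action: 'wait', ms: durationMs }]
    : [{ action: 'wait', ms: durationMs }]; // image: hold the last frame
}
```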

Timeline lifecycle

LLM output (narration with [[scene ...]] markers)
    │
    ▼ parseTimeline()
Timeline { scenes: Scene[], narration: string }
    │
    ▼ validateTimeline()
    │
    ▼ resolveTimeline()
ResolvedTimeline { scenes: ResolvedScene[] }  ← each scene has explicit durationMs
    │
    ▼ stripMarkers()
cleanNarration ──► HeyGen audio synthesis
    │
    └──► resolved scenes ──► ScreenRecorder / ImageSceneSource ──► compositor

Stripping always happens before the narration reaches HeyGen. If any `[[scene` substring remains after stripping, the orchestrator throws and refuses to send the narration (§1.7 hard gate).
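
A minimal sketch of the strip-then-gate step, assuming a simple regular expression is enough to remove well-formed markers. The function names mirror the lifecycle diagram, but the bodies are illustrative and assertNoMarkers is a hypothetical name for the hard gate.

```typescript
// Remove well-formed [[scene ...]] markers and drop the blank lines they leave.
function stripMarkers(narration: string): string {
  return narration
    .replace(/\[\[scene\s+[^\]]*\]\]/g, '')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// Hard gate: refuse to pass on anything that still contains marker residue,
// for example a marker the model never closed.
function assertNoMarkers(cleanNarration: string): string {
  if (cleanNarration.indexOf('[[scene') !== -1) {
    throw new Error('narration still contains a [[scene marker; refusing to send');
  }
  return cleanNarration;
}
```

Keeping the gate separate from the stripper means a malformed marker that the regular expression misses is still caught before synthesis.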

Back-compatibility

If the LLM emits no markers (or parsing fails), GeneratedScript.timeline is undefined and the orchestrator falls back to PipelineInput.recordingSteps — byte-identical to pre-Phase-2 behaviour. MARKETING_INTRO_RECORDING_STEPS and the isIntroDoc branch in run-pipeline.ts are retained as a known-good fallback for the introduction video specifically.

Duration resolver

Each scene’s duration in milliseconds is computed as follows:

if scene.duration is set:
    durationMs = parseDurationString(scene.duration)
else:
    durationMs = round(wordBudget / 130 * 60_000)   // 130 wpm default

speechMs = round(actualWords / 130 * 60_000)
if speechMs > durationMs:
    durationMs = speechMs    // never truncate speech
    overrunFlag = true

The resolver is pure (no I/O) and is the single source of truth for scene durations.
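
The pseudocode above translates fairly directly into TypeScript. The version below is a sketch under a few assumptions: words are counted by splitting on whitespace, parseDurationString accepts only the three documented forms (`30s`, `1500ms`, `1m30s`), and the names SceneTiming and resolveDuration are hypothetical.

```typescript
const WORDS_PER_MINUTE = 130; // documented default speaking rate

// Parse "30s", "1500ms" or "1m30s" into milliseconds.
function parseDurationString(text: string): number {
  const match = /^(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)ms)?$/.exec(text);
  if (!match || (match[1] === undefined && match[2] === undefined && match[3] === undefined)) {
    throw new Error(`unrecognised duration "${text}"`);
  }
  const minutes = Number(match[1] ?? 0);
  const seconds = Number(match[2] ?? 0);
  const millis = Number(match[3] ?? 0);
  return minutes * 60_000 + seconds * 1_000 + millis;
}

interface SceneTiming {
  duration?: string;
  wordBudget?: number;
  narration: string;
}

function resolveDuration(scene: SceneTiming): { durationMs: number; overrun: boolean } {
  let durationMs: number;
  if (scene.duration !== undefined) {
    durationMs = parseDurationString(scene.duration); // duration wins over wordBudget
  } else if (scene.wordBudget !== undefined) {
    durationMs = Math.round((scene.wordBudget / WORDS_PER_MINUTE) * 60_000);
  } else {
    throw new Error('scene needs duration or wordBudget');
  }
  // Assumed word count: whitespace-separated tokens in the scene narration.
  const actualWords = scene.narration.split(/\s+/).filter((w) => w.length > 0).length;
  const speechMs = Math.round((actualWords / WORDS_PER_MINUTE) * 60_000);
  const overrun = speechMs > durationMs;
  // Never truncate speech: stretch the scene when the narration runs long.
  return { durationMs: overrun ? speechMs : durationMs, overrun };
}
```

For example, a scene with wordBudget="60" and no explicit duration resolves to 27,692 ms, and a 13-word narration in a scene declared as 1s is stretched to 6,000 ms with the overrun flag set.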

Adapter contract

Every SceneSource implementation must satisfy the following interface:

interface SceneSource {
  readonly id: string;
  supports(uri: string): boolean;
  prepare(scene: ResolvedScene, ctx: RenderContext): Promise<PreparedScene>;
  render(prepared: PreparedScene, sink: FrameSink): Promise<RenderResult>;
  dispose(prepared: PreparedScene): Promise<void>;
}

The shared contract test in tests/video-pipeline/scene-sources/contract.test.ts is instantiated for each adapter and verifies the prepare → render → dispose lifecycle.
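
To show the lifecycle the contract test exercises, here is a toy adapter and a driver that calls it in order. The supporting types (ResolvedScene, RenderContext, PreparedScene, FrameSink, RenderResult) are not defined in this page, so the minimal shapes below are placeholders invented for the sketch, as are StubSceneSource, renderScene, and the 30 fps frame rate.

```typescript
// Placeholder shapes; the real types are defined elsewhere in the pipeline.
interface ResolvedScene { id: string; src: string; durationMs: number }
interface RenderContext { workDir: string }
interface PreparedScene { scene: ResolvedScene }
interface FrameSink { write(frameIndex: number): void }
interface RenderResult { framesWritten: number }

interface SceneSource {
  readonly id: string;
  supports(uri: string): boolean;
  prepare(scene: ResolvedScene, ctx: RenderContext): Promise<PreparedScene>;
  render(prepared: PreparedScene, sink: FrameSink): Promise<RenderResult>;
  dispose(prepared: PreparedScene): Promise<void>;
}

const FPS = 30; // assumed frame rate for the toy adapter

// Toy adapter that records the order in which its methods are called.
class StubSceneSource implements SceneSource {
  readonly id = 'stub';
  readonly calls: string[] = [];
  supports(uri: string): boolean { return uri.indexOf('stub:') === 0; }
  async prepare(scene: ResolvedScene, _ctx: RenderContext): Promise<PreparedScene> {
    this.calls.push('prepare');
    return { scene };
  }
  async render(prepared: PreparedScene, sink: FrameSink): Promise<RenderResult> {
    this.calls.push('render');
    const frames = Math.ceil((prepared.scene.durationMs / 1000) * FPS);
    for (let i = 0; i < frames; i += 1) sink.write(i);
    return { framesWritten: frames };
  }
  async dispose(_prepared: PreparedScene): Promise<void> {
    this.calls.push('dispose');
  }
}

// Driver: prepare, then render, and always dispose, even if render throws.
async function renderScene(
  source: SceneSource, scene: ResolvedScene, ctx: RenderContext, sink: FrameSink,
): Promise<RenderResult> {
  if (!source.supports(scene.src)) throw new Error(`${source.id} cannot render ${scene.src}`);
  const prepared = await source.prepare(scene, ctx);
  try {
    return await source.render(prepared, sink);
  } finally {
    await source.dispose(prepared);
  }
}
```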
