The Ember video pipeline turns a documentation page into a narrated screen-recording video.
It chains four stages — script generation, HeyGen avatar rendering, Playwright screen recording,
and FFmpeg compositing — and exposes them through two modules:

| Module | Role |
|---|---|
| `compositor.ts` | FFmpeg wrapper that overlays an avatar as picture-in-picture on a screen recording and concatenates intro/outro clips |
| `orchestrator.ts` | End-to-end stage coordinator with retry, resume, and progress callbacks |

## Compositor

### CompositeOptions

Passed to `VideoCompositor.composite()` to define a single compositing job.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `avatarVideoPath` | `string` | Yes | — | Local file path or HTTP URL of the HeyGen-generated avatar MP4. FFmpeg handles both natively; no pre-download is needed. |
| `screenRecordingPath` | `string` | Yes | — | Absolute path to the Playwright-generated screen recording (video-only MP4). |
| `outputPath` | `string` | No | Auto-generated in `outputDir` | Absolute path for the composited output file. A UUID-based name is used when omitted. |
| `avatarPosition` | `'bottom-right' \| 'bottom-left' \| 'top-right' \| 'top-left'` | No | `'bottom-right'` | Which corner of the frame the avatar picture-in-picture occupies. |
| `avatarScale` | `number` | No | `0.25` | Avatar overlay width as a fraction of the screen width (greater than 0, up to 1). For example, `0.25` makes the avatar 25% as wide as the screen. |
| `introPath` | `string` | No | — | Path to an MP4 clip to prepend to the composited body. See Generating intro/outro clips. |
| `outroPath` | `string` | No | — | Path to an MP4 clip to append to the composited body. See Generating intro/outro clips. |

Example:

```typescript
import { VideoCompositor } from './video-pipeline/compositor.js';

const compositor = new VideoCompositor({ outputDir: '/tmp/my-videos' });

const result = await compositor.composite({
  avatarVideoPath: 'https://cdn.heygen.com/videos/abc123.mp4',
  screenRecordingPath: '/tmp/recordings/demo.mp4',
  avatarPosition: 'bottom-right',
  avatarScale: 0.25,
  introPath: '/assets/video-templates/intro.mp4',
  outroPath: '/assets/video-templates/outro.mp4',
});

console.log(result.outputPath);      // absolute path to the composited MP4
console.log(result.durationSeconds); // total runtime in seconds
```
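
Picture-in-picture placement of this kind is usually expressed as an FFmpeg `scale` + `overlay` filter graph. The sketch below shows one plausible mapping from `avatarPosition` and `avatarScale`; it is not the actual `compositor.ts` code. `buildPipFilter`, the 20 px margin, and the `[pip]`/`[v]` labels are hypothetical, and the screen width is assumed to be known already.

```typescript
type AvatarPosition = 'bottom-right' | 'bottom-left' | 'top-right' | 'top-left';

// Hypothetical helper (not part of compositor.ts): builds an FFmpeg
// filter_complex graph for a picture-in-picture overlay.
// Input 0 is the screen recording, input 1 is the avatar video.
function buildPipFilter(
  screenWidth: number,
  position: AvatarPosition,
  scale = 0.25,
  margin = 20,
): string {
  if (!(scale > 0 && scale <= 1)) {
    throw new RangeError(`avatarScale must be in (0, 1], got ${scale}`);
  }
  // Round to an even width: yuv420p output requires even dimensions.
  const avatarWidth = Math.max(2, 2 * Math.round((screenWidth * scale) / 2));
  const [vertical, horizontal] = position.split('-');
  // In the overlay filter, W/H are the main (screen) size and w/h the overlay size.
  const x = horizontal === 'right' ? `W-w-${margin}` : `${margin}`;
  const y = vertical === 'bottom' ? `H-h-${margin}` : `${margin}`;
  // scale=<width>:-2 keeps the avatar's aspect ratio with an even height.
  return `[1:v]scale=${avatarWidth}:-2[pip];[0:v][pip]overlay=x=${x}:y=${y}[v]`;
}
```

For a 1280-px-wide recording with the defaults this yields `[1:v]scale=320:-2[pip];[0:v][pip]overlay=x=W-w-20:y=H-h-20[v]`. The width itself could be read with `ffprobe -v error -select_streams v:0 -show_entries stream=width -of csv=p=0 demo.mp4`.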

### CompositorConfig

Passed to the `VideoCompositor` constructor to configure FFmpeg options and output location.

| Field | Type | Required | Default |
|---|---|---|---|
| `outputDir` | `string` | Yes | — |
| `ffmpegPath` | `string` | No | `'ffmpeg'` (must be on `PATH`) |
| `ffprobePath` | `string` | No | `'ffprobe'` (must be on `PATH`) |
| `videoCodec` | `string` | No | `'libx264'` |
| `audioCodec` | `string` | No | `'aac'` |
| `crf` | `number` | No | `23` |

Use `createVideoCompositor()` as a convenience factory; it reads `FFMPEG_PATH` and `FFPROBE_PATH` from the environment and defaults `outputDir` to `'/tmp/ezforge-compositor'` when omitted.
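
The defaults above, together with the factory's environment lookup, can be summarised as a small resolution function. This is a sketch, not the real `createVideoCompositor()`: `resolveCompositorConfig` and `encoderArgs` are hypothetical names, and the rule that explicit options override `FFMPEG_PATH`/`FFPROBE_PATH` is an assumption.

```typescript
interface ResolvedCompositorConfig {
  outputDir: string;
  ffmpegPath: string;
  ffprobePath: string;
  videoCodec: string;
  audioCodec: string;
  crf: number;
}

// Hypothetical: mirrors the defaults in the table above. Explicit options are
// ASSUMED to take precedence over FFMPEG_PATH / FFPROBE_PATH.
function resolveCompositorConfig(
  options: Partial<ResolvedCompositorConfig>,
  env: Record<string, string | undefined>,
): ResolvedCompositorConfig {
  return {
    outputDir: options.outputDir ?? '/tmp/ezforge-compositor',
    ffmpegPath: options.ffmpegPath ?? env.FFMPEG_PATH ?? 'ffmpeg',
    ffprobePath: options.ffprobePath ?? env.FFPROBE_PATH ?? 'ffprobe',
    videoCodec: options.videoCodec ?? 'libx264',
    audioCodec: options.audioCodec ?? 'aac',
    crf: options.crf ?? 23,
  };
}

// Encoder flags the codec settings would typically translate into.
function encoderArgs(config: ResolvedCompositorConfig): string[] {
  return ['-c:v', config.videoCodec, '-crf', String(config.crf), '-c:a', config.audioCodec];
}
```

In Node, pass `process.env` as the second argument. `-crf` is libx264's constant-rate-factor setting: lower values mean higher quality and larger files.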

## Orchestrator

### PipelineInput

Passed to `PipelineOrchestrator.run()` to kick off a full pipeline run.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `title` | `string` | Yes | — | Documentation page title. Used as the video title and passed to the script generator. |
| `content` | `string` | Yes | — | Documentation page body (Markdown or plain text). The script generator uses this to write the narration. |
| `recordingSteps` | `RecordingStep[]` | Yes | — | Playwright recording steps for the screen recording stage. See the screen recorder docs for the step schema. |
| `avatarId` | `string` | No | `OrchestratorConfig.defaultAvatarId` | HeyGen avatar ID to use. Required if `defaultAvatarId` is not set in `OrchestratorConfig`. |
| `voiceId` | `string` | No | Avatar's default voice | HeyGen voice ID. Omit to use the avatar's built-in voice. |
| `targetDurationSeconds` | `number` | No | `120` | Hint to the script generator for how long the narration should run. |
| `audience` | `string` | No | — | Free-form audience description passed to the script generator (e.g. `'senior backend engineers'`). |
| `introPath` | `string` | No | — | Forwarded directly to `CompositeOptions.introPath`. |
| `outroPath` | `string` | No | — | Forwarded directly to `CompositeOptions.outroPath`. |
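
The `avatarId`, `voiceId`, and `targetDurationSeconds` defaults interact in a way that is easy to get wrong, so here is a sketch of the precedence the table describes. `resolveAvatarSettings`, `AvatarInput`, and `AvatarDefaults` are hypothetical stand-ins; where and when the real orchestrator performs this check is not specified on this page.

```typescript
// Minimal stand-ins: only the fields this sketch reads.
interface AvatarInput {
  avatarId?: string;
  voiceId?: string;
  targetDurationSeconds?: number;
}
interface AvatarDefaults {
  defaultAvatarId?: string;
}

// Hypothetical helper applying the precedence rules from the table.
function resolveAvatarSettings(input: AvatarInput, config: AvatarDefaults) {
  const avatarId = input.avatarId ?? config.defaultAvatarId;
  if (!avatarId) {
    throw new Error('avatarId is required when OrchestratorConfig.defaultAvatarId is not set');
  }
  return {
    avatarId,
    voiceId: input.voiceId, // undefined means "use the avatar's built-in voice"
    targetDurationSeconds: input.targetDurationSeconds ?? 120,
  };
}
```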

### PipelineResult

Returned by `PipelineOrchestrator.run()` on success.

| Field | Type | Description |
|---|---|---|
| `pipelineId` | `string` | Unique ID for this pipeline run. |
| `outputVideoPath` | `string` | Absolute path to the final composited MP4. |
| `durationSeconds` | `number` | Total video runtime in seconds. |
| `script` | `GeneratedScript` | The script object produced by the script-generation stage. |
| `screenRecordingPath` | `string` | Absolute path to the raw screen recording. |
| `avatarVideoUrl` | `string` | HeyGen CDN URL for the avatar-only video. |

### stateListener callback

The `stateListener` field in `OrchestratorConfig` is an optional callback that fires on every stage transition (`pending` → `running` → `completed` or `failed`). Use it to stream progress updates to a client, log pipeline state, or persist the `PipelineState` for crash recovery.

```typescript
import * as fs from 'fs';
import {
  createPipelineOrchestrator,
  PipelineState,
} from './video-pipeline/orchestrator.js';

const orchestrator = await createPipelineOrchestrator({
  pipeline: { /* ... */ },
  stateListener: (state: PipelineState) => {
    // `state` is a deep clone, so it is safe to mutate or serialise
    const { stages } = state;
    console.log(
      `[${state.id}] script=${stages.script.status}` +
        ` avatar=${stages.avatar.status}` +
        ` screen=${stages.screen.status}` +
        ` composite=${stages.composite.status}`,
    );
    // Persist for crash recovery (PipelineState is plain JSON)
    fs.writeFileSync(`/tmp/pipeline-${state.id}.json`, JSON.stringify(state));
  },
});
```

Stage lifecycle:

```text
pending → running ─┬→ completed
                   └→ failed (retried up to maxAttempts times)
```

Each `StageState` object carries:

| Field | Type | Description |
|---|---|---|
| `status` | `StageStatus` | `'pending'`, `'running'`, `'completed'`, or `'failed'` |
| `result` | `T \| undefined` | Stage output (populated on `completed`) |
| `error` | `string \| undefined` | Last error message (populated on `failed`) |
| `attempts` | `number` | Number of attempts made so far |
| `startedAt` | `string \| undefined` | ISO-8601 timestamp when the current attempt started |
| `completedAt` | `string \| undefined` | ISO-8601 timestamp when the stage completed successfully |
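
The lifecycle and `StageState` fields above suggest a retry loop along these lines. This is a minimal sketch under stated assumptions: `runStage` and its signature are hypothetical, snapshots are deep-cloned via JSON (consistent with `PipelineState` being plain JSON), and a stage already marked `completed` is assumed to return its stored `result` without re-running.

```typescript
type StageStatus = 'pending' | 'running' | 'completed' | 'failed';

interface StageState<T> {
  status: StageStatus;
  result?: T;
  error?: string;
  attempts: number;
  startedAt?: string;
  completedAt?: string;
}

// Hypothetical retry loop: drives one stage through the lifecycle and emits a
// JSON deep clone of the state on every transition.
async function runStage<T>(
  stage: StageState<T>,
  work: () => Promise<T>,
  maxAttempts: number,
  emit: (snapshot: StageState<T>) => void,
): Promise<T> {
  if (stage.status === 'completed') {
    return stage.result as T; // resumed run: reuse the stored output
  }
  while (stage.attempts < maxAttempts) {
    stage.attempts += 1;
    stage.status = 'running';
    stage.startedAt = new Date().toISOString();
    emit(JSON.parse(JSON.stringify(stage)));
    try {
      const result = await work();
      stage.status = 'completed';
      stage.result = result;
      stage.completedAt = new Date().toISOString();
      emit(JSON.parse(JSON.stringify(stage)));
      return result;
    } catch (err) {
      stage.status = 'failed';
      stage.error = err instanceof Error ? err.message : String(err);
      emit(JSON.parse(JSON.stringify(stage)));
    }
  }
  throw new Error(`stage failed after ${stage.attempts} attempts: ${stage.error}`);
}
```

A stage that fails once and then succeeds emits `running`, `failed`, `running`, `completed` and ends with `attempts === 2`.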

### Resuming a failed run

`PipelineState` is deliberately plain JSON. Serialise it to disk or a database, then pass it back as the second argument to `run()`; stages already marked `completed` are skipped:

```typescript
let savedState: PipelineState | undefined;

const orchestrator = await createPipelineOrchestrator({
  pipeline: { /* ... */ },
  stateListener: (state: PipelineState) => { savedState = state; },
});

// `input` is the PipelineInput for this run
// First run (may crash mid-pipeline)
await orchestrator.run(input).catch(() => {});

// Resume: completed stages are skipped automatically
const result = await orchestrator.run(input, savedState);
```
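
Holding `savedState` in a variable only helps if the process itself survives the failure. To resume after a real crash or restart, the snapshot has to be persisted and reloaded. A minimal sketch using Node's `fs` module; `saveState`, `loadState`, `SavedPipelineState`, and the `pipeline-<id>.json` file name are assumptions (the name mirrors the `stateListener` example above).

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Minimal stand-in for PipelineState: only the fields this sketch touches.
interface SavedPipelineState {
  id: string;
  stages: Record<string, { status: string }>;
}

// Write to a temp file, then rename, so a crash mid-write
// leaves the previous snapshot intact instead of truncated JSON.
function saveState(state: SavedPipelineState, dir: string = os.tmpdir()): string {
  const file = path.join(dir, `pipeline-${state.id}.json`);
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state));
  fs.renameSync(tmp, file);
  return file;
}

// Returns undefined when no snapshot exists, i.e. start a fresh run.
function loadState(file: string): SavedPipelineState | undefined {
  if (!fs.existsSync(file)) return undefined;
  return JSON.parse(fs.readFileSync(file, 'utf8')) as SavedPipelineState;
}
```

Call `saveState(state)` from `stateListener`; on restart, pass `loadState(file)` (cast to `PipelineState`) as the second argument to `orchestrator.run()`.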

## Generating intro/outro clips

The `assets/video-templates/` directory ships two FFmpeg filter templates for building standardised intro and outro clips from static image assets.

| File | Effect | Duration |
|---|---|---|
| `intro-filter.txt` | Fade in from black (0.5 s), hold, fade to black (0.5 s) | 3 s |
| `outro-filter.txt` | Fade in from black (0.5 s), hold, fade to black (1 s) | 4 s |
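
The contents of the template files are not reproduced here. As a rough illustration only, a filter graph with the effects listed in the table could be generated as follows. `buildFadeFilter` is hypothetical and the shipped `intro-filter.txt`/`outro-filter.txt` may differ; the sketch assumes, as the commands below do, that input 0 is the looped PNG and input 1 is the silent `anullsrc` track, and it emits the `[v_intro]`/`[a_intro]` (or `[v_outro]`/`[a_outro]`) labels those commands map.

```typescript
type ClipKind = 'intro' | 'outro';

// Hypothetical generator for a fade-in / hold / fade-out filter graph.
function buildFadeFilter(
  kind: ClipKind,
  totalSeconds: number,
  fadeInSeconds: number,
  fadeOutSeconds: number,
): string {
  const fadeOutStart = totalSeconds - fadeOutSeconds;
  if (fadeOutStart < fadeInSeconds) {
    throw new RangeError('fade-in and fade-out overlap; lengthen the clip');
  }
  return (
    // Video: fade in from black, hold, then fade out to black.
    `[0:v]fade=t=in:st=0:d=${fadeInSeconds},` +
    `fade=t=out:st=${fadeOutStart}:d=${fadeOutSeconds}[v_${kind}];` +
    // Audio: the anullsrc input is already silent, so pass it through unchanged.
    `[1:a]anull[a_${kind}]`
  );
}
```

For example, `buildFadeFilter('intro', 3, 0.5, 0.5)` produces `[0:v]fade=t=in:st=0:d=0.5,fade=t=out:st=2.5:d=0.5[v_intro];[1:a]anull[a_intro]`.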

### Generating the intro clip

Requires a 1280×720 `title_card.png` in `assets/video-templates/`.

```bash
ffmpeg -loop 1 -t 3 -i assets/video-templates/title_card.png \
  -f lavfi -t 3 -i "anullsrc=r=44100:cl=stereo" \
  -filter_complex "$(cat assets/video-templates/intro-filter.txt)" \
  -map "[v_intro]" -map "[a_intro]" \
  -c:v libx264 -c:a aac -pix_fmt yuv420p \
  assets/video-templates/intro.mp4
```

### Generating the outro clip

Requires a 1280×720 `end_card.png` in `assets/video-templates/`.

```bash
ffmpeg -loop 1 -t 4 -i assets/video-templates/end_card.png \
  -f lavfi -t 4 -i "anullsrc=r=44100:cl=stereo" \
  -filter_complex "$(cat assets/video-templates/outro-filter.txt)" \
  -map "[v_outro]" -map "[a_outro]" \
  -c:v libx264 -c:a aac -pix_fmt yuv420p \
  assets/video-templates/outro.mp4
```

Once generated, pass the output paths to `PipelineInput.introPath` / `PipelineInput.outroPath` (or directly to `CompositeOptions.introPath` / `CompositeOptions.outroPath`).

Only the filter templates are shipped; no pre-built MP4 clips are included. Run the FFmpeg commands above (substituting your own image assets) before using the intro/outro feature in a pipeline run.

## Scene Timeline Protocol

The Scene Timeline Protocol (Phase 2) lets the script generator produce a multi-scene narration that drives both the audio synthesis and screen-recording stages. Instead of a hardcoded sequence of recording steps, the generator embeds `[[scene ...]]` markers directly in the narration text. For the full specification, see `executive/specifications/video-pipeline-scene-markers.md`.

### Marker grammar

Markers are inline in the narration. Canonical form:

```text
[[scene id="hero" src="web:https://ezforge.ai/" duration="15s"]]
Narration text for the hero scene goes here.

[[scene id="features" src="web:https://ezforge.ai/platform" wordBudget="60"]]
Narration for the features scene...

[[scene id="closeout" src="image:mintlify-docs/images/cta.png" duration="5s"]]
Final words.
```

#### Required attributes

| Attribute | Description |
|---|---|
| `id` | Kebab-case, unique within the timeline (e.g. `hero`, `feature-overview`). Used in logs and resume keys. |
| `src` | `SceneSource` URI of the form `<adapter>:<target>`. See Adapters below. |

#### Optional attributes

| Attribute | Description |
|---|---|
| `duration` | Explicit duration. Accepts `30s`, `1500ms`, `1m30s`. Wins over `wordBudget` when both are present. |
| `wordBudget` | Integer word count expected during this scene. The resolver uses it when `duration` is absent. |
| `transition` | One of `cut` (default), `crossfade`, `fade-to-black`. |
| `hold` | Unquoted boolean. Declares that the source is expected to finish before the scene duration ends (e.g. a fast navigation). Suppresses the under-run warning; rendering behaviour is unchanged. |

Each marker begins a new scene. The narration that follows belongs to that scene until the next marker or the end of the script. At least one of `duration` or `wordBudget` must be present on every scene.
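
To make the grammar concrete, here is a minimal parser sketch. It is not the real `parseTimeline()`: `parseScenesSketch` and `SketchScene` are hypothetical, attribute values are kept as raw strings, text before the first marker is ignored, and only the required-attribute and `duration`/`wordBudget` rules above are checked.

```typescript
interface SketchScene {
  attrs: Record<string, string>;
  narration: string;
}

// Hypothetical marker parser (not the real parseTimeline()).
function parseScenesSketch(script: string): SketchScene[] {
  const marker = /\[\[scene\b([^\]]*)\]\]/g;
  const hits: { start: number; end: number; body: string }[] = [];
  let m: RegExpExecArray | null;
  while ((m = marker.exec(script)) !== null) {
    hits.push({ start: m.index, end: m.index + m[0].length, body: m[1] });
  }

  return hits.map((hit, i) => {
    // key="quoted value" or key=unquoted (e.g. the hold boolean).
    const attrPattern = /([A-Za-z]+)=(?:"([^"]*)"|([^\s"]+))/g;
    const attrs: Record<string, string> = {};
    let a: RegExpExecArray | null;
    while ((a = attrPattern.exec(hit.body)) !== null) {
      attrs[a[1]] = a[2] !== undefined ? a[2] : a[3];
    }
    if (!attrs.id || !attrs.src) {
      throw new Error(`scene ${i + 1}: id and src are required`);
    }
    if (!attrs.duration && !attrs.wordBudget) {
      throw new Error(`scene "${attrs.id}": needs duration or wordBudget`);
    }
    // Narration runs until the next marker or the end of the script.
    const next = i + 1 < hits.length ? hits[i + 1].start : script.length;
    return { attrs, narration: script.slice(hit.end, next).trim() };
  });
}
```

Checks such as kebab-case ids, id uniqueness, and duration syntax are left out here; they correspond to the separate `validateTimeline()` step in the timeline lifecycle.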

### Adapters

| Adapter id | URI form | Notes |
|---|---|---|
| `web` | `web:<absolute-url>` | Playwright browser recording navigating to the given URL. |
| `image` | `image:<local-path>` | Recorder holds the last displayed frame for the scene duration (wait-only step). `ImageSceneSource.render()` is Phase 3 infrastructure. |

The orchestrator converts `web:` scenes to `[navigate, wait]` recording steps for the existing `ScreenRecorder`. `image:` scenes produce wait-only recording steps: the recorder holds the last displayed frame for the scene duration. `ImageSceneSource.render()` (FFmpeg-based) is Phase 3 infrastructure not yet wired into the orchestrator. Both adapters pass the shared contract test.
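
The conversion rules above can be sketched as follows. `scenesToSteps`, `SketchStep`, and `SketchResolvedScene` are hypothetical placeholders; the real `RecordingStep` schema is defined by the screen recorder and may use different field names.

```typescript
// Placeholder shapes; the real RecordingStep schema lives in the screen recorder.
type SketchStep =
  | { action: 'navigate'; url: string }
  | { action: 'wait'; ms: number };

interface SketchResolvedScene {
  id: string;
  src: string; // "<adapter>:<target>"
  durationMs: number;
}

// Hypothetical conversion mirroring the rules above:
// web: -> [navigate, wait], image: -> [wait] (hold the last frame).
function scenesToSteps(scenes: SketchResolvedScene[]): SketchStep[] {
  const steps: SketchStep[] = [];
  for (const scene of scenes) {
    const colon = scene.src.indexOf(':');
    if (colon <= 0) {
      throw new Error(`scene "${scene.id}": src must look like <adapter>:<target>`);
    }
    const adapter = scene.src.slice(0, colon);
    const target = scene.src.slice(colon + 1); // split on the first colon keeps "https://..." intact
    if (adapter === 'web') {
      steps.push({ action: 'navigate', url: target }, { action: 'wait', ms: scene.durationMs });
    } else if (adapter === 'image') {
      steps.push({ action: 'wait', ms: scene.durationMs });
    } else {
      throw new Error(`scene "${scene.id}": unknown adapter "${adapter}"`);
    }
  }
  return steps;
}
```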

### Timeline lifecycle

```text
LLM output (narration with [[scene ...]] markers)
  │
  ▼ parseTimeline()
Timeline { scenes: Scene[], narration: string }
  │
  ▼ validateTimeline()
  │
  ▼ resolveTimeline()
ResolvedTimeline { scenes: ResolvedScene[] }   ← each scene has explicit durationMs
  │
  ▼ stripMarkers()
cleanNarration ──► HeyGen audio synthesis
  │
  └──► resolved scenes ──► ScreenRecorder / ImageSceneSource ──► compositor
```

Stripping always happens before the narration reaches HeyGen. If any `[[scene` substring remains after stripping, the orchestrator throws and refuses to send the narration (§1.7 hard gate).
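
A sketch of the strip-then-verify pattern, assuming markers are well-formed `[[scene ...]]` spans. `stripMarkersSketch` and `assertSafeForTts` are hypothetical names. The second check is what catches a malformed marker (for example one missing its closing `]]`) that the stripping regex does not match and that would otherwise be read aloud.

```typescript
// Hypothetical version of stripMarkers(): removes every well-formed marker.
function stripMarkersSketch(narration: string): string {
  return narration
    .replace(/\[\[scene\b[^\]]*\]\]/g, ' ') // drop each [[scene ...]] span
    .replace(/\s+/g, ' ') // collapse the whitespace left behind
    .trim();
}

// Hypothetical hard gate: never let marker text reach speech synthesis.
function assertSafeForTts(text: string): string {
  if (text.includes('[[scene')) {
    throw new Error('Refusing to send narration to HeyGen: a [[scene marker survived stripping');
  }
  return text;
}
```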

### Back-compatibility

If the LLM emits no markers (or parsing fails), `GeneratedScript.timeline` is `undefined` and the orchestrator falls back to `PipelineInput.recordingSteps`, reproducing pre-Phase-2 behaviour byte for byte. `MARKETING_INTRO_RECORDING_STEPS` and the `isIntroDoc` branch in `run-pipeline.ts` are retained as a known-good fallback for the introduction video specifically.

### Duration resolver

Each scene's duration in milliseconds is computed as follows:

```text
if scene.duration is set:
    durationMs = parseDurationString(scene.duration)
else:
    durationMs = round(wordBudget / 130 * 60_000)   // 130 wpm default

speechMs = round(actualWords / 130 * 60_000)
if speechMs > durationMs:
    durationMs = speechMs                           // never truncate speech
    overrunFlag = true
```
The resolver is pure (no I/O) and is the single source of truth for scene durations.
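
A TypeScript rendering of the resolver pseudocode, assuming the 130 wpm default and the three duration forms listed under Optional attributes. `parseDurationString` here is a hypothetical reimplementation, and `resolveSceneDuration` and `DurationHints` are illustrative names.

```typescript
const WORDS_PER_MINUTE = 130;

// Hypothetical parser for the documented duration forms: 30s, 1500ms, 1m30s.
function parseDurationString(input: string): number {
  const text = input.trim();
  // (?!s) stops "1500ms" from being read as 1500 minutes.
  const m = /^(?:(\d+)m(?!s))?(?:(\d+)s)?(?:(\d+)ms)?$/.exec(text);
  if (!m || text === '') {
    throw new Error(`invalid duration: "${input}"`);
  }
  const [, minutes = '0', seconds = '0', millis = '0'] = m;
  return Number(minutes) * 60_000 + Number(seconds) * 1_000 + Number(millis);
}

interface DurationHints {
  duration?: string;
  wordBudget?: number;
}

// Pure function: explicit duration wins, else wordBudget at 130 wpm,
// and the result is never shorter than the scene's actual speech.
function resolveSceneDuration(scene: DurationHints, actualWords: number) {
  let durationMs: number;
  if (scene.duration !== undefined) {
    durationMs = parseDurationString(scene.duration);
  } else if (scene.wordBudget !== undefined) {
    durationMs = Math.round((scene.wordBudget / WORDS_PER_MINUTE) * 60_000);
  } else {
    throw new Error('scene needs duration or wordBudget');
  }
  const speechMs = Math.round((actualWords / WORDS_PER_MINUTE) * 60_000);
  let overrunFlag = false;
  if (speechMs > durationMs) {
    durationMs = speechMs; // never truncate speech
    overrunFlag = true;
  }
  return { durationMs, overrunFlag };
}
```

For example, a scene with `wordBudget="60"` whose narration actually contains 65 words resolves to 30 000 ms with `overrunFlag` set, because 65 words at 130 wpm take exactly half a minute.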

### Adapter contract

Every `SceneSource` implementation must satisfy the following interface:

```typescript
interface SceneSource {
  readonly id: string;
  supports(uri: string): boolean;
  prepare(scene: ResolvedScene, ctx: RenderContext): Promise<PreparedScene>;
  render(prepared: PreparedScene, sink: FrameSink): Promise<RenderResult>;
  dispose(prepared: PreparedScene): Promise<void>;
}
```

The shared contract test in `tests/video-pipeline/scene-sources/contract.test.ts` is instantiated for each adapter and verifies the `prepare` → `render` → `dispose` lifecycle.
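
The contract implies a fixed calling order with guaranteed cleanup. The sketch below shows a driver that enforces it plus a trivial adapter that satisfies the interface. `ResolvedScene`, `RenderContext`, `PreparedScene`, `FrameSink`, and `RenderResult` are given placeholder shapes because their real definitions are not documented here; `runSceneLifecycle` and `StubImageSource` are hypothetical.

```typescript
// Placeholder shapes: the real types are not documented on this page.
interface ResolvedScene { id: string; src: string; durationMs: number; }
interface RenderContext { workDir: string; }
interface PreparedScene { scene: ResolvedScene; handle: string; }
interface FrameSink { write(label: string): void; }
interface RenderResult { durationMs: number; }

interface SceneSource {
  readonly id: string;
  supports(uri: string): boolean;
  prepare(scene: ResolvedScene, ctx: RenderContext): Promise<PreparedScene>;
  render(prepared: PreparedScene, sink: FrameSink): Promise<RenderResult>;
  dispose(prepared: PreparedScene): Promise<void>;
}

// Hypothetical driver: prepare -> render -> dispose, with dispose always run.
async function runSceneLifecycle(
  source: SceneSource, scene: ResolvedScene, ctx: RenderContext, sink: FrameSink,
): Promise<RenderResult> {
  if (!source.supports(scene.src)) {
    throw new Error(`${source.id} cannot render ${scene.src}`);
  }
  const prepared = await source.prepare(scene, ctx);
  try {
    return await source.render(prepared, sink);
  } finally {
    await source.dispose(prepared); // release resources even if render() throws
  }
}

// Hypothetical stub adapter, just enough to exercise the contract.
class StubImageSource implements SceneSource {
  readonly id = 'image';
  disposeCount = 0;
  supports(uri: string): boolean {
    return uri.indexOf('image:') === 0;
  }
  async prepare(scene: ResolvedScene, ctx: RenderContext): Promise<PreparedScene> {
    return { scene, handle: `${ctx.workDir}/${scene.id}` };
  }
  async render(prepared: PreparedScene, sink: FrameSink): Promise<RenderResult> {
    sink.write(prepared.handle);
    return { durationMs: prepared.scene.durationMs };
  }
  async dispose(_prepared: PreparedScene): Promise<void> {
    this.disposeCount += 1;
  }
}
```

Putting `dispose()` in a `finally` block is the design choice that matters: a browser page or temporary file opened in `prepare()` is released whether or not `render()` succeeds.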