Automated Captioning
Overview
The Agent Board supports sophisticated automated captioning workflows, from importing industry-standard subtitle files to generating dynamic, word-highlighted "TikTok-style" overlays. This is powered by the @remotion/captions package and standard Remotion composition patterns.
Installation
To use captioning features, you must first install the @remotion/captions package:
npx remotion add @remotion/captions
1. Sourcing Captions
Captions must be converted into an array of Caption objects. You can achieve this via file import or audio transcription.
Importing SRT Files
If you already have a .srt file, use the utility provided by @remotion/captions to parse it into a compatible format:
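import { parseSrt } from '@remotion/captions';
import { staticFile } from 'remotion';
const srtUrl = staticFile('subtitles.srt');
// Fetch the SRT file and read it as plain text (run this inside an async function)
const response = await fetch(srtUrl);
const srtContent = await response.text();
// parseSrt takes an options object and returns an object holding the captions array
const { captions } = parseSrt({ input: srtContent });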
Audio Transcription
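For automated workflows, audio files can be transcribed to generate the initial Caption objects. Ensure your transcription output matches the Caption interface, which requires text, startMs, endMs, timestampMs, and confidence for every token or sentence (timestampMs and confidence may be null). The fromMs and toMs fields belong to the tokens that createTikTokStyleCaptions produces later, not to Caption itself.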
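As a rough illustration of that mapping, the sketch below converts word-level transcription output into Caption-shaped objects. The TranscribedWord shape (times in seconds) and the toCaptions helper are hypothetical, and the Caption type is redeclared locally so the snippet runs on its own; in a real project you would import the type from @remotion/captions instead.

```typescript
// Local mirror of the Caption shape so this sketch has no dependencies.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical transcriber output: one entry per word, times in seconds.
type TranscribedWord = {
  word: string;
  start: number;
  end: number;
  confidence?: number;
};

function toCaptions(words: TranscribedWord[]): Caption[] {
  return words.map((w, i) => {
    const startMs = Math.round(w.start * 1000);
    const endMs = Math.round(w.end * 1000);
    return {
      // Every word except the first carries a leading space, so that
      // whitespace can act as the delimiter between words.
      text: i === 0 ? w.word : ` ${w.word}`,
      startMs,
      endMs,
      // Assumption of this sketch: use the midpoint as the representative timestamp.
      timestampMs: Math.round((startMs + endMs) / 2),
      confidence: w.confidence ?? null,
    };
  });
}

const sample = toCaptions([
  { word: 'Hello', start: 0, end: 0.4, confidence: 0.98 },
  { word: 'world', start: 0.45, end: 0.9 },
]);
```

If your transcriber already reports milliseconds, drop the multiplication by 1000 and keep the rest of the mapping.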
2. Processing for TikTok-Style Display
To create fast-paced, modern captions where only a few words appear at a time, use the createTikTokStyleCaptions utility. This groups individual tokens into "pages" based on a time threshold.
import { useMemo } from 'react';
import { createTikTokStyleCaptions } from '@remotion/captions';
const SWITCH_MS = 1200; // Adjust to control words-per-page
// Call useMemo inside a React component; myCaptions is your Caption[] array
const { pages } = useMemo(() => {
  return createTikTokStyleCaptions({
    captions: myCaptions,
    combineTokensWithinMilliseconds: SWITCH_MS,
  });
}, [myCaptions]);
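To build intuition for how the threshold changes page size, here is a simplified stand-in for time-window grouping. It is not the library's implementation and its results may differ from createTikTokStyleCaptions; the Token and Page types and the groupIntoPages helper are hypothetical names used only for this sketch.

```typescript
type Token = { text: string; fromMs: number; toMs: number };
type Page = { startMs: number; tokens: Token[] };

// Simplified rule: a token joins the current page while it starts within
// windowMs of that page's start; otherwise it opens a new page.
function groupIntoPages(tokens: Token[], windowMs: number): Page[] {
  const pages: Page[] = [];
  for (const token of tokens) {
    const current = pages[pages.length - 1];
    if (!current || token.fromMs - current.startMs >= windowMs) {
      pages.push({ startMs: token.fromMs, tokens: [token] });
    } else {
      current.tokens.push(token);
    }
  }
  return pages;
}

const tokens: Token[] = [
  { text: 'Fast', fromMs: 0, toMs: 300 },
  { text: ' paced', fromMs: 300, toMs: 700 },
  { text: ' captions', fromMs: 700, toMs: 1300 },
  { text: ' here', fromMs: 1300, toMs: 1600 },
];

const narrow = groupIntoPages(tokens, 500); // smaller window, more pages
const wide = groupIntoPages(tokens, 1200); // larger window, fewer pages
```

With this sample data the 500 ms window yields three pages and the 1200 ms window yields two, which mirrors the general trade-off: a lower threshold shows fewer words at a time.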
3. Rendering Captions with Sequences
To ensure captions are perfectly synced with the video timeline and performant during render, map the processed pages to Remotion <Sequence> components.
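import React from 'react';
import { Sequence, useVideoConfig, AbsoluteFill } from 'remotion';
import type { TikTokPage } from '@remotion/captions';
// pages is the output of createTikTokStyleCaptions, passed in as a prop
export const CaptionedVideo: React.FC<{ pages: TikTokPage[] }> = ({ pages }) => {
  const { fps } = useVideoConfig();
  return (
    <AbsoluteFill>
      {pages.map((page, index) => {
        // A page carries startMs and tokens but no endMs, so end it with its last token
        const lastToken = page.tokens[page.tokens.length - 1];
        const endMs = lastToken ? lastToken.toMs : page.startMs;
        // Round to whole frames
        const startFrame = Math.round((page.startMs / 1000) * fps);
        const durationInFrames = Math.round(((endMs - page.startMs) / 1000) * fps);
        // Skip pages that would produce an empty or negative duration
        if (durationInFrames <= 0) {
          return null;
        }
        return (
          <Sequence
            key={index}
            from={startFrame}
            durationInFrames={durationInFrames}
          >
            <CaptionPage page={page} />
          </Sequence>
        );
      })}
    </AbsoluteFill>
  );
};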
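The millisecond-to-frame arithmetic can be checked in isolation. The following is a minimal sketch that assumes you already know each page's start and end time in milliseconds; PageTiming, SequenceTiming, msToFrames, and toSequenceTiming are hypothetical names for illustration, not part of Remotion.

```typescript
type PageTiming = { startMs: number; endMs: number };
type SequenceTiming = { from: number; durationInFrames: number };

// Convert a millisecond timestamp to the nearest whole frame.
function msToFrames(ms: number, fps: number): number {
  return Math.round((ms / 1000) * fps);
}

function toSequenceTiming(page: PageTiming, fps: number): SequenceTiming {
  const from = msToFrames(page.startMs, fps);
  const end = msToFrames(page.endMs, fps);
  // Keep at least one frame so the duration is never zero or negative.
  return { from, durationInFrames: Math.max(1, end - from) };
}

const timing30 = toSequenceTiming({ startMs: 1200, endMs: 2400 }, 30);
const timing60 = toSequenceTiming({ startMs: 1200, endMs: 2400 }, 60);
```

Rounding the start and end frames separately and subtracting them, rather than rounding the duration on its own, keeps adjacent pages from overlapping or leaving a one-frame gap when their timestamps touch.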
4. Word-Level Highlighting
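Inside the CaptionPage component, you can iterate through the tokens of a page to highlight the specific word currently being spoken. Because CaptionPage renders inside a <Sequence>, useCurrentFrame() returns the frame relative to the start of that sequence. Convert it to milliseconds, add page.startMs to get the absolute time, and compare the result against each token's fromMs and toMs.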
import React from 'react';
import { useCurrentFrame, useVideoConfig } from 'remotion';
import type { TikTokPage } from '@remotion/captions';
const HIGHLIGHT_COLOR = '#39E508';
const CaptionPage: React.FC<{ page: TikTokPage }> = ({ page }) => {
  // Frame is relative to the enclosing <Sequence>
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();
  const currentTimeMs = (frame / fps) * 1000;
  // Token timestamps are absolute, so offset by the page start
  const absoluteTimeMs = page.startMs + currentTimeMs;
  return (
    <div style={{ fontSize: 80, fontWeight: 'bold', textAlign: 'center' }}>
      {page.tokens.map((token, i) => {
        const isActive =
          absoluteTimeMs >= token.fromMs &&
          absoluteTimeMs < token.toMs;
        return (
          <span
            key={i}
            style={{ color: isActive ? HIGHLIGHT_COLOR : 'white' }}
          >
            {token.text}{' '}
          </span>
        );
      })}
    </div>
  );
};
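The highlight check can also be pulled out into a pure function, which makes it easy to test without rendering anything. This is a sketch under the same assumptions as the component above (sequence-relative frames, absolute token timestamps); findActiveTokenIndex and the local Token type are hypothetical names.

```typescript
type Token = { text: string; fromMs: number; toMs: number };

// Returns the index of the token being spoken at the given
// sequence-relative frame, or -1 if no token is active.
function findActiveTokenIndex(
  tokens: Token[],
  pageStartMs: number,
  frame: number,
  fps: number,
): number {
  const absoluteTimeMs = pageStartMs + (frame / fps) * 1000;
  return tokens.findIndex(
    (t) => absoluteTimeMs >= t.fromMs && absoluteTimeMs < t.toMs,
  );
}

const pageTokens: Token[] = [
  { text: 'Hello', fromMs: 2000, toMs: 2400 },
  { text: ' world', fromMs: 2500, toMs: 3000 },
];

const atStart = findActiveTokenIndex(pageTokens, 2000, 0, 30); // first token
const inGap = findActiveTokenIndex(pageTokens, 2000, 13, 30); // pause between words
const onSecond = findActiveTokenIndex(pageTokens, 2000, 15, 30); // second token
```

A result of -1 during pauses means no word is highlighted; if you would rather keep the last spoken word lit, one option is to fall back to the latest token whose fromMs has already passed.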
Best Practices
- FPS Synchronization: Always use the fps from useVideoConfig() when converting milliseconds to frames to avoid sync drift.
- Memoization: Use useMemo for parsing SRTs or creating caption pages, especially if your caption file is large, to prevent expensive re-calculations on every frame.
- Layout: Set whiteSpace: 'pre-wrap' or similar CSS on your caption container to ensure tokens wrap correctly within the video frame.
- Visual Polish: Consider using spring or interpolate from the Remotion core to animate the scale or position of the "active" token for a more dynamic look.