Automated Captioning

Overview

The Agent Board supports automated captioning workflows, from importing standard SubRip (.srt) subtitle files to generating word-highlighted "TikTok-style" overlays. These workflows build on the @remotion/captions package and standard Remotion composition patterns.

Installation

To use captioning features, you must first install the @remotion/captions package:

npx remotion add @remotion/captions

1. Sourcing Captions

Captions must be converted into an array of Caption objects. You can achieve this via file import or audio transcription.

Importing SRT Files

If you already have a .srt file, use the utility provided by @remotion/captions to parse it into a compatible format:

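import { parseSrt, type Caption } from '@remotion/captions';
import { staticFile } from 'remotion';

// Fetch the SRT file from the public/ folder and parse it into Caption objects
const loadCaptions = async (): Promise<Caption[]> => {
  const response = await fetch(staticFile('subtitles.srt'));
  const srtContent = await response.text();
  // parseSrt takes the raw file contents as `input` and returns { captions }
  const { captions } = parseSrt({ input: srtContent });
  return captions;
};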
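
To make the target shape concrete, the sketch below converts an SRT timing line into the millisecond values that end up in a caption's start and end fields. The helpers srtTimestampToMs and srtRangeToMs are hypothetical names used only for illustration; in practice parseSrt does this work for you.

```typescript
// Hypothetical helpers for illustration; not part of @remotion/captions.

// Converts an SRT timestamp such as "00:01:02,500" to milliseconds.
function srtTimestampToMs(timestamp: string): number {
  const match = /^(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})$/.exec(timestamp.trim());
  if (!match) {
    throw new Error(`Invalid SRT timestamp: ${timestamp}`);
  }
  const [, hours, minutes, seconds, millis] = match;
  return (
    Number(hours) * 3_600_000 +
    Number(minutes) * 60_000 +
    Number(seconds) * 1_000 +
    Number(millis)
  );
}

// Splits a cue timing line such as "00:00:01,000 --> 00:00:02,500".
function srtRangeToMs(line: string): { startMs: number; endMs: number } {
  const [start, end] = line.split('-->');
  if (start === undefined || end === undefined) {
    throw new Error(`Invalid SRT timing line: ${line}`);
  }
  return { startMs: srtTimestampToMs(start), endMs: srtTimestampToMs(end) };
}
```

Working in milliseconds throughout keeps the data independent of the composition's frame rate; the conversion to frames happens only at render time.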

Audio Transcription

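For automated workflows, audio files can be transcribed to generate the initial Caption entries. Ensure your transcription output matches the Caption interface: every word or sentence needs text, startMs, and endMs, plus timestampMs and confidence, which may both be null. The fromMs and toMs fields belong to the page tokens produced in the next step, not to Caption.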
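
As a rough illustration, the sketch below maps word-level transcription output onto caption objects. The TranscribedWord shape and the wordsToCaptions helper are hypothetical; adapt them to whatever your transcription tool actually returns. The local Caption type is assumed to mirror the one exported by @remotion/captions (text, startMs, endMs, timestampMs, confidence) and is redeclared here only to keep the example self-contained.

```typescript
// Local copy of the Caption shape so the sketch is self-contained;
// in a real project, import the type from '@remotion/captions' instead.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical transcription output: one entry per word, times in seconds.
type TranscribedWord = {
  word: string;
  startSec: number;
  endSec: number;
  probability?: number;
};

function wordsToCaptions(words: TranscribedWord[]): Caption[] {
  return words.map((w, i) => ({
    // Assumption: every word after the first carries a leading space,
    // so word boundaries survive when token texts are concatenated.
    text: i === 0 ? w.word.trim() : ` ${w.word.trim()}`,
    startMs: Math.round(w.startSec * 1000),
    endMs: Math.round(w.endSec * 1000),
    timestampMs: null,
    confidence: w.probability ?? null,
  }));
}
```

Rounding to whole milliseconds avoids floating-point noise from second-based timestamps.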

2. Processing for TikTok-Style Display

To create fast-paced, modern captions where only a few words appear at a time, use the createTikTokStyleCaptions utility. This groups individual tokens into "pages" based on a time threshold.

import { useMemo } from 'react';
import { createTikTokStyleCaptions } from '@remotion/captions';

const SWITCH_MS = 1200; // Higher values put more words on each page

// useMemo is a hook, so call this inside a React component
const { pages } = useMemo(() => {
  return createTikTokStyleCaptions({
    captions: myCaptions,
    combineTokensWithinMilliseconds: SWITCH_MS,
  });
}, [myCaptions]);
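
To build intuition for what a "page" is, here is a deliberately simplified grouping function. It is a conceptual sketch only: createTikTokStyleCaptions applies its own rules, and groupIntoPages, Token, and Page are hypothetical names that merely resemble the library's output shape.

```typescript
// Conceptual sketch only; not the library's actual grouping algorithm.
type Token = { text: string; fromMs: number; toMs: number };
type Page = { text: string; startMs: number; tokens: Token[] };

function groupIntoPages(tokens: Token[], windowMs: number): Page[] {
  const pages: Page[] = [];
  for (const token of tokens) {
    const current = pages[pages.length - 1];
    // Keep adding to the current page while the token starts inside its window.
    if (current && token.fromMs - current.startMs < windowMs) {
      current.tokens.push(token);
      current.text += token.text;
    } else {
      // Otherwise start a new page at this token's start time.
      pages.push({ text: token.text, startMs: token.fromMs, tokens: [token] });
    }
  }
  return pages;
}
```

With a 1200 ms window, tokens starting at 0, 300, 1300, and 1600 ms would fall into two pages of two tokens each, which is why a larger window yields more words on screen at once.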

3. Rendering Captions with Sequences

To ensure captions are perfectly synced with the video timeline and performant during render, map the processed pages to Remotion <Sequence> components.

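import { AbsoluteFill, Sequence, useVideoConfig } from 'remotion';
import type { TikTokPage } from '@remotion/captions';

// `pages` comes from createTikTokStyleCaptions; SWITCH_MS is the constant from step 2
export const CaptionedVideo: React.FC<{ pages: TikTokPage[] }> = ({ pages }) => {
  const { fps } = useVideoConfig();

  return (
    <AbsoluteFill>
      {pages.map((page, index) => {
        // TikTokPage has startMs but no endMs: end each page where the
        // next one starts, and never show it longer than SWITCH_MS
        const nextPage = pages[index + 1] ?? null;
        const startFrame = (page.startMs / 1000) * fps;
        const endFrame = Math.min(
          nextPage ? (nextPage.startMs / 1000) * fps : Infinity,
          startFrame + (SWITCH_MS / 1000) * fps,
        );
        const durationInFrames = endFrame - startFrame;
        // <Sequence> requires a positive duration
        if (durationInFrames <= 0) {
          return null;
        }

        return (
          <Sequence key={index} from={startFrame} durationInFrames={durationInFrames}>
            <CaptionPage page={page} />
          </Sequence>
        );
      })}
    </AbsoluteFill>
  );
};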
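
The timing arithmetic can also be pulled into a pure function, which makes it easy to unit-test outside of a render. The sketch below ends each page where the next one begins, caps it at a maximum length, and rounds to whole frames. The name pageFrameRange and its signature are hypothetical, not part of Remotion.

```typescript
// Hypothetical helper; the name and signature are illustrative.
function pageFrameRange(
  pageStartMs: number,
  nextPageStartMs: number | null,
  fps: number,
  maxPageMs: number,
): { from: number; durationInFrames: number } | null {
  const from = Math.round((pageStartMs / 1000) * fps);
  // Latest frame at which this page may still be visible.
  const cappedEnd = Math.round(((pageStartMs + maxPageMs) / 1000) * fps);
  const end =
    nextPageStartMs === null
      ? cappedEnd
      : Math.min(Math.round((nextPageStartMs / 1000) * fps), cappedEnd);
  const durationInFrames = end - from;
  // A Sequence needs a positive duration, so skip pages that collapse to zero frames.
  return durationInFrames > 0 ? { from, durationInFrames } : null;
}
```

A caller would spread the returned object onto a Sequence and skip pages for which the function returns null.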

4. Word-Level Highlighting

Inside the CaptionPage component, you can iterate through the tokens of a page to highlight the word currently being spoken. Because useCurrentFrame() inside a <Sequence> counts from the start of that sequence, convert the frame to milliseconds and add page.startMs before comparing the result against each token's fromMs and toMs.

import { useCurrentFrame, useVideoConfig } from 'remotion';
import type { TikTokPage } from '@remotion/captions';

const HIGHLIGHT_COLOR = '#39E508';

const CaptionPage: React.FC<{ page: TikTokPage }> = ({ page }) => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();

  const currentTimeMs = (frame / fps) * 1000;
  const absoluteTimeMs = page.startMs + currentTimeMs;

  return (
    <div style={{ fontSize: 80, fontWeight: 'bold', textAlign: 'center' }}>
      {page.tokens.map((token, i) => {
        const isActive = 
          absoluteTimeMs >= token.fromMs && 
          absoluteTimeMs < token.toMs;

        return (
          <span
            key={i}
            style={{ color: isActive ? HIGHLIGHT_COLOR : 'white' }}
          >
            {token.text}{' '}
          </span>
        );
      })}
    </div>
  );
};
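
The highlight logic can be checked in isolation from React. The following sketch extracts it into two pure functions; frameToAbsoluteMs and activeTokenIndex are hypothetical names, and the Token type is redeclared locally to match the text, fromMs, and toMs fields used above.

```typescript
type Token = { text: string; fromMs: number; toMs: number };

// Converts a Sequence-relative frame into an absolute timestamp in milliseconds.
function frameToAbsoluteMs(frame: number, fps: number, pageStartMs: number): number {
  return pageStartMs + (frame / fps) * 1000;
}

// Returns the index of the token being spoken at timeMs, or -1 if none is.
// The interval is half-open [fromMs, toMs), so back-to-back tokens never both match.
function activeTokenIndex(tokens: Token[], timeMs: number): number {
  return tokens.findIndex((t) => timeMs >= t.fromMs && timeMs < t.toMs);
}
```

The half-open interval matters at token boundaries: when one token ends at the same millisecond the next begins, only the later token is highlighted.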

Best Practices

FPS Synchronization: Always use the fps from useVideoConfig() when converting milliseconds to frames to avoid sync drift.
Memoization: Use useMemo for parsing SRTs or creating caption pages, especially if your caption file is large, to prevent expensive re-calculations on every frame.
Layout: Set whiteSpace: 'pre-wrap' on your caption container so that spaces carried in the token text are preserved and lines still wrap within the video frame.
Visual Polish: Consider using spring or interpolate from the Remotion core to animate the scale or position of the "active" token for a more dynamic look.