The requestVideoFrameCallback API

There's a new Web API on the block, defined in the HTMLVideoElement.requestVideoFrameCallback() specification. The requestVideoFrameCallback() method allows web authors to register a callback, which runs in the rendering steps when a new video frame 🎞 is sent to the compositor. This is intended to allow developers to perform efficient per-video-frame operations on video, such as video processing and painting to a canvas, video analysis, or synchronization with external audio sources.
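
In its most basic form, usage looks like the following minimal sketch (assuming a <video> element is present in the page). One important detail: just like requestAnimationFrame(), the callback only fires once, so it needs to re-register itself to run for every frame.

const video = document.querySelector('video');

const onFrame = (now, metadata) => {
  // `now` is when the callback ran; `metadata` describes the presented frame,
  // for example its presentation timestamp in the media timeline.
  console.log(`Presented frame at media time ${metadata.mediaTime}s.`);
  // The callback fires only once, so re-register it for the next frame.
  video.requestVideoFrameCallback(onFrame);
};

video.requestVideoFrameCallback(onFrame);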

Operations like drawing a video frame to a canvas via drawImage() made through this API will be synchronized as a best effort with the video playing on screen. Unlike window.requestAnimationFrame(), which usually fires about 60 times per second, requestVideoFrameCallback() is bound to the actual video frame rate, with an important exception:

The effective rate at which callbacks are run is the lesser rate between the video's rate and the browser's rate. This means a 25fps video playing in a browser that paints at 60Hz would fire callbacks at 25Hz. A 120fps video in that same 60Hz browser would fire callbacks at 60Hz.
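
The frame metadata makes this observable: its presentedFrames field is a running counter of frames submitted for composition, so a gap larger than one between consecutive callbacks means frames were presented without a callback firing for them. A rough sketch, reusing the video reference from above:

let lastPresentedFrames = 0;

const countSkippedFrames = (now, metadata) => {
  if (lastPresentedFrames > 0) {
    // E.g., a 120fps video in a 60Hz browser skips roughly every other frame.
    const skipped = metadata.presentedFrames - lastPresentedFrames - 1;
    if (skipped > 0) {
      console.warn(`${skipped} frame(s) presented without a callback.`);
    }
  }
  lastPresentedFrames = metadata.presentedFrames;
  video.requestVideoFrameCallback(countSkippedFrames);
};

video.requestVideoFrameCallback(countSkippedFrames);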

Due to its similarity with window.requestAnimationFrame(), the API initially was proposed as video.requestAnimationFrame(), but I'm happy the new name requestVideoFrameCallback() was agreed on after a lengthy discussion. Yay, bikeshedding for the win 🙌!

The API is implemented in Chromium already, and Mozilla folks like it. For what it's worth, I have just filed a WebKit bug asking for it. Feature detection of the API works like this:

if ('requestVideoFrameCallback' in HTMLVideoElement.prototype) {
  // The API is supported! 🎉
}
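
Where the API is not supported, one possible fallback, sketched below with a made-up helper name, is to poll video.currentTime from requestAnimationFrame() and fire the callback whenever it advances, losing frame accuracy and the metadata in the process:

// Hypothetical fallback helper: approximates per-frame callbacks by polling
// video.currentTime on every animation frame. Only an approximated mediaTime
// is available; the rest of the frame metadata is not.
const requestVideoFrameShim = (video, callback) => {
  let lastTime = -1;
  const poll = (now) => {
    if (video.currentTime !== lastTime) {
      lastTime = video.currentTime;
      callback(now, { mediaTime: video.currentTime });
    }
    requestAnimationFrame(poll);
  };
  requestAnimationFrame(poll);
};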

I have created a small demo on Glitch that shows how frames are drawn on a canvas at exactly the frame rate of the video, and where the frame metadata is logged for debugging purposes. The core logic is just a couple of lines of JavaScript. As a developer, the API's look and feel does indeed remind one of requestAnimationFrame(), but as outlined above, it is still different in what it actually does.

// Grab references to the elements the demo works with. The selectors are
// assumptions based on the demo's markup: a <video>, a <canvas>, and two
// elements for displaying the frame rate and the frame metadata.
const video = document.querySelector('video');
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');
const fpsInfo = document.querySelector('#fps-info');
const metadataInfo = document.querySelector('#metadata-info');

let paintCount = 0;
let startTime = 0.0;

const updateCanvas = (now, metadata) => {
  if (startTime === 0.0) {
    startTime = now;
  }

  // Paint the current video frame onto the canvas.
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

  // Derive the effective callback rate from the elapsed time.
  const elapsed = (now - startTime) / 1000.0;
  const fps = (++paintCount / elapsed).toFixed(3);
  fpsInfo.innerText = `video fps: ${fps}`;
  metadataInfo.innerText = JSON.stringify(metadata, null, 2);

  // Re-register the callback to run on the next video frame.
  video.requestVideoFrameCallback(updateCanvas);
};

video.requestVideoFrameCallback(updateCanvas);
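
Once a frame is painted onto the canvas, per-frame analysis is a getImageData() call away. As a sketch of what the video analysis use case can look like, here is a helper, building on the ctx and canvas references above, that computes a frame's average brightness, a typical building block for shot boundary detection:

// Sketch: average luminance of the current canvas contents. Large jumps
// between consecutive frames are a common signal for shot boundaries.
const averageLuminance = () => {
  const { data } = ctx.getImageData(0, 0, canvas.width, canvas.height);
  let sum = 0;
  // The data is RGBA; walk pixel by pixel and apply the Rec. 601 luma weights.
  for (let i = 0; i < data.length; i += 4) {
    sum += 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
  }
  return sum / (data.length / 4);
};

Calling this from within updateCanvas() yields one luminance value per presented frame.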

I have been doing frame-level processing for a long time, without access to the actual frames, by approximating them through video.currentTime. At one point during my PhD research, I implemented video shot segmentation in JavaScript, and the demo is still up. This work was the topic of a research paper that was presented in the Developers Track of the World Wide Web Conference 2012 in Lyon, France. Had requestVideoFrameCallback() existed back then, my life would have been much simpler…