.. SPDX-License-Identifier: CC-BY-SA-4.0

Introduction
============

.. note::

   This document assumes that the reader is familiar with the concepts of the
   traditional `V4L2 API`_, excluding the Media Controller extensions.

.. _V4L2 API: https://linuxtv.org/downloads/v4l-dvb-apis/userspace-api/v4l/v4l2.html

The V4L2 history
----------------

When the original Video4Linux (V4L) API was created in 1999, the video capture
devices available for Linux were mostly analog TV capture cards and early
webcams (the first widespread USB webcam hit the market about a year later).
From the point of view of the operating system, devices provided streams of
frames ready to be consumed by applications, with a small set of high-level
parameters to control the frame size or modify the image brightness and
contrast.

Those devices have shaped the API design. As they are fairly monolithic, in
the sense that they appear to the operating system as a black box with
relatively high-level controls, the V4L API exposed a device to userspace as
one video device node in ``/dev`` with a set of ioctls to handle buffer
management, format selection, stream control and access to parameters. Many
mistakes in the original design were fixed in Video4Linux2 (V4L2), released in
2002. The original V4L API was deprecated in 2006 and removed from the Linux
kernel in 2010.

.. note::

   While the V4L2 API supports both video capture and video output, this
   document mostly focusses on the former.

V4L2 covers a wide range of features for both analog and digital video
devices, including tuner and audio control, and has grown over time to
accommodate more features as video capture devices became more complex. It can
enumerate the device capabilities and parameters (supported video and audio
inputs, formats, frame sizes and frame rates, cropping and composing, analog
video standards and digital video timings, and control parameters), expose
them to applications (with get, try and set access, and a negotiation
mechanism), manage buffers (allocate, queue, dequeue and free them, with the
ability to share buffers with other devices for zero-copy operation through
dmabuf), start and stop video streams, and report various conditions to
applications through an event mechanism.
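As a brief illustration of the enumeration mechanism, the minimal sketch below
(assuming a capture device is available as ``/dev/video0``) queries the device
capabilities with ``VIDIOC_QUERYCAP`` and lists the pixel formats supported
for video capture with ``VIDIOC_ENUM_FMT``.

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   #include <linux/videodev2.h>

   int main(void)
   {
           struct v4l2_capability cap;
           struct v4l2_fmtdesc fmt;
           int fd;

           /* Assumption: the capture device is exposed as /dev/video0. */
           fd = open("/dev/video0", O_RDWR);
           if (fd < 0) {
                   perror("open");
                   return 1;
           }

           /* Query the driver name, card name and capability flags. */
           memset(&cap, 0, sizeof(cap));
           if (ioctl(fd, VIDIOC_QUERYCAP, &cap) < 0) {
                   perror("VIDIOC_QUERYCAP");
                   close(fd);
                   return 1;
           }

           printf("driver '%s', card '%s'\n", cap.driver, cap.card);

           /* Enumerate the pixel formats supported for video capture. */
           memset(&fmt, 0, sizeof(fmt));
           fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

           while (ioctl(fd, VIDIOC_ENUM_FMT, &fmt) == 0) {
                   printf("format %u: %s\n", fmt.index, fmt.description);
                   fmt.index++;
           }

           close(fd);
           return 0;
   }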
The V4L2 API has proven to be extensible (compared to the 51 ioctls present in
version 2.6.12 of the kernel in 2005, 2 have been removed and 33 added as of
version 6.0 in 2022), but it still retains the same monolithic device model as
its predecessor.

Modularity with V4L2 subdevices
-------------------------------

As Linux moved towards the embedded space, video capture devices started
exposing the multiple hardware components they contained (such as camera
sensors, TV tuners, video encoders and decoders, image signal processors, ...)
to the operating system instead of hiding them in a black box. The same camera
sensor or TV tuner could be used on different systems with different SoCs,
calling for a different architecture inside the kernel that would enable code
reuse.

In 2008, the Linux media subsystem gained support for a modular model of video
capture drivers. A new V4L2 subdevice object (``struct v4l2_subdev``) was
created to model external hardware components and expose them to the rest of
the kernel through an abstract API (``struct v4l2_subdev_ops``).

The main driver, also called the bridge driver as it controls the components
that bridge external devices to system memory, still creates the video devices
(``struct video_device``) that are exposed to userspace, but translates the
API calls from applications and delegates them to the appropriate subdevices.
For instance, when an application sets a V4L2 control on the video device, the
bridge driver will locate the subdevice that implements that control and
forward the set control call to it (the sketch at the end of this section
illustrates this from the application's point of view). The bridge driver also
creates a top-level V4L2 device (``struct v4l2_device``) and registers it with
the V4L2 framework core, to bind together the subdevices and video devices
inside the kernel. This new model enabled code reuse and modularity inside the
kernel.

.. figure:: subdev.svg

   Modularity with V4L2 subdevices

The new model only addressed in-kernel issues and kept the monolithic V4L2
userspace API untouched. The relief it brought was short-lived, as development
of the first Linux kernel driver for an image signal processor (the TI OMAP3
ISP) showed a need for lower-level control of device internals from
applications.

An ISP is a complex piece of hardware made of multiple processing blocks.
Those blocks are assembled into image processing pipelines, and in many
devices data routing within pipelines is configurable. Inline pipelines
connect a video source (usually a raw Bayer camera sensor) to the ISP and
process frames on the fly, writing fully processed images to memory. Offline
pipelines first capture raw images to memory and process them in
memory-to-memory mode. Hybrid architectures are also possible, and the same
device may be configurable in different modes depending on the use case. With
different devices having different processing blocks and different routing
options, applications need to control data routing within the device.

Furthermore, similar operations can often be performed in different places in
the pipeline. For instance, both camera sensors and ISPs are able to scale
down images, with the former usually offering lower-quality scaling than the
latter, but with the ability to achieve higher frame rates. Digital gains and
colour gains are also often found in both camera sensors and ISPs. As the
choice of where to apply a given image processing operation depends on the use
case, a bridge driver can't correctly decide how to delegate V4L2 API calls
from applications to the appropriate V4L2 subdevice without hardcoding and
restricting the possible use cases.

The OMAP3 ISP driver reached the limits of the monolithic V4L2 API. Two years
of development were needed to fix this problem and finally merge the Media
Controller API into the kernel at the beginning of 2011.
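From the application's point of view, the delegation performed by the bridge
driver is invisible: a control is still set on the single video device node,
as in the minimal sketch below (assuming a device at ``/dev/video0`` that
exposes the brightness control). The application has no way to tell, or to
choose, whether the adjustment is applied in the camera sensor or in the ISP,
which is precisely the limitation described above.

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   #include <linux/videodev2.h>

   int main(void)
   {
           struct v4l2_control ctrl;
           int fd;

           /* Assumption: the capture device is exposed as /dev/video0. */
           fd = open("/dev/video0", O_RDWR);
           if (fd < 0) {
                   perror("open");
                   return 1;
           }

           /*
            * The control is set on the video device node. The bridge driver
            * forwards it to whichever subdevice implements it; the
            * application cannot select where in the pipeline the operation
            * is performed.
            */
           memset(&ctrl, 0, sizeof(ctrl));
           ctrl.id = V4L2_CID_BRIGHTNESS;
           /*
            * Arbitrary value; a real application would first query the
            * control's range with VIDIOC_QUERYCTRL.
            */
           ctrl.value = 128;

           if (ioctl(fd, VIDIOC_S_CTRL, &ctrl) < 0)
                   perror("VIDIOC_S_CTRL");

           close(fd);
           return 0;
   }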