.. SPDX-License-Identifier: CC-BY-SA-4.0

Media Controller and V4L2 Subdevice APIs
========================================

The term `Media Controller` usually covers two distinct APIs:

- The Media Controller (MC) API itself, whose task it is to expose the
  internal topology of the device to applications.

- The V4L2 subdevice userspace API, which exposes low-level control of
  individual subdevices to applications.

Collectively, and in collaboration with the V4L2 API, these offer the features
needed by applications to control complex video capture devices.

The Media Controller API
------------------------

.. _media-ctl: https://git.linuxtv.org/v4l-utils.git/tree/utils/media-ctl

The Media Controller kernel framework and userspace API model devices as a
directed acyclic graph of `entities`. Each entity represents a hardware block,
which can be an external on-board component, an IP core in the SoC, or a piece
of either of those. The API doesn't precisely define how a device should be
split into entities. Individual drivers decide on the exact model they want to
expose, to allow fine-grained control of the hardware blocks while minimizing
the number of entities to avoid unnecessary complexity.

Entities include `pads`, which model input and output ports through which
entities receive or produce data. Data inputs are called `sinks`, and data
outputs `sources`. The data flow through the graph is modelled by `links` that
connect sources to sinks. Each link connects one source pad of an entity to a
sink pad of another entity. Cycles in the graph are not allowed.

.. note::

   The Media Controller API is not limited to video capture devices and has
   been designed to model any type of data flow in a media device. This
   includes, for instance, audio and display devices. However, as of version
   6.0 of the kernel, the API is only used in the Linux media subsystem, by
   V4L2 and DVB drivers, and hasn't made its way to the ALSA and DRM/KMS
   subsystems.
When used in a V4L2 driver, an entity models either a video device (``struct
video_device``) or a subdevice (``struct v4l2_subdevice``). For video capture
devices, subdevices represent video sources (camera sensors, input connectors,
...) or processing elements, and video devices represent the connection to
system memory at the end of a pipeline (typically a DMA engine, but it can
also be a USB connection for USB webcams). The entity type is exposed to
applications as an entity `function`, for instance

- ``MEDIA_ENT_F_CAM_SENSOR`` for a camera sensor
- ``MEDIA_ENT_F_PROC_VIDEO_SCALER`` for a video scaler
- ``MEDIA_ENT_F_IO_V4L`` for a connection to system memory through a V4L2
  video device

The kernel media controller device (``struct media_device``) is exposed to
userspace through a media device node, typically named ``/dev/media[0-9]+``.
The `media-ctl`_ tool can query the topology of a Media Controller device and
display it in either plain text (``--print-topology`` or ``-p``) or DOT format
(``--print-dot``).

.. code-block:: sh

   $ media-ctl -d /dev/media0 --print-dot | dot -Tsvg > omap3isp.svg

:numref:`media-graph-ti-omap3-isp` represents the TI OMAP3 ISP, with entities
corresponding to subdevices in green and entities corresponding to video
devices in yellow.

.. _media-graph-ti-omap3-isp:
.. graphviz:: omap3isp.dot
   :caption: Media graph of the TI OMAP3 ISP

The ``mt9p031 2-0048`` entity on the top row is a camera sensor; all other
entities are internal to the OMAP3 SoC and part of the ISP.

Entities, their pads, and the links between them are intrinsic properties of
the device. They are created by the driver at initialization time to model the
hardware topology. Unless parts of the device are hot-pluggable, no entities
or links are created or removed after initialization. Only their properties
can be modified by applications. Data flow routing is controlled by enabling
or disabling links, using the ``MEDIA_LNK_FL_ENABLED`` link flag.
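As a sketch, a configurable link can be toggled from the command line with the
`media-ctl`_ ``-l`` option. The entity names and pad numbers below are
placeholders, not taken from a real device; the actual values must first be
queried with ``media-ctl -p``:

.. code-block:: sh

   $ media-ctl -d /dev/media0 -l '"sensor":0 -> "csi-receiver":0 [1]'
   $ media-ctl -d /dev/media0 -l '"sensor":0 -> "csi-receiver":0 [0]'

The trailing flag enables (``[1]``) or disables (``[0]``) the link from the
source pad on the left of the arrow to the sink pad on its right.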
Links that model immutable connections at the hardware level are displayed as
a thick plain line in the media graph. They have the
``MEDIA_LNK_FL_IMMUTABLE`` and ``MEDIA_LNK_FL_ENABLED`` flags set and can't be
modified. Links that model configurable routing options can be controlled, and
are displayed as a dotted line if they are disabled or as a thin plain line if
they are enabled.

As the Media Controller API can model any type of data flow, it doesn't expose
any property specific to a particular device type, such as, for instance,
pixel formats or frame rates. This is left to other, device-specific APIs.

The V4L2 Subdevice Userspace API
--------------------------------

.. _V4L2 Subdevice Userspace API: https://linuxtv.org/downloads/v4l-dvb-apis/userspace-api/v4l/dev-subdev.html
.. _V4L2 controls ioctls: https://linuxtv.org/downloads/v4l-dvb-apis/userspace-api/v4l/vidioc-g-ext-ctrls.html
.. _v4l2-ctl: https://git.linuxtv.org/v4l-utils.git/tree/utils/v4l2-ctl

The `V4L2 Subdevice Userspace API`_ (often shortened to just V4L2 Subdevice
API when this doesn't cause any ambiguity with the in-kernel V4L2 subdevice
operations) has been developed alongside the Media Controller API to expose to
applications the properties of entities corresponding to V4L2 subdevices. It
allows accessing V4L2 controls directly on a subdevice, as well as formats and
selection rectangles on the subdevice pads.

Subdevices are exposed to userspace through V4L2 subdevice nodes, typically
named ``/dev/v4l-subdev[0-9]+``. They are controlled using ioctls in a similar
fashion as the V4L2 video devices. The `v4l2-ctl`_ tool supports a wide range
of subdevice-specific options to access subdevices from the command line (see
``v4l2-ctl --help-subdev`` for a detailed list).

The rest of this document will use the NXP i.MX8MP ISP as an example. Its
media graph is shown in :numref:`media-graph-nxp-imx8mp`.

.. _media-graph-nxp-imx8mp:
.. graphviz:: imx8mp-isp.dot
   :caption: Media graph of the NXP i.MX8MP

It contains the following V4L2 subdevices:

- A raw camera sensor (``imx290 2-001a``), with a single source pad connected
  to the SoC through a MIPI CSI-2 link.

- A MIPI CSI-2 receiver (``csis-32e40000.csi``), internal to the SoC, that
  receives data from the sensor on its sink pad and provides it to the ISP on
  its source pad.

- An ISP (``rkisp1_isp``), with two sink pads that receive image data and
  processing parameters (0 and 1 respectively) and two source pads that
  output image data and statistics (2 and 3 respectively).

- A scaler (``rkisp1_resizer_mainpath``) that can scale the frames up or
  down.

It also contains the following video devices:

- A capture device that writes video frames to memory (``rkisp1_mainpath``).

- A capture device that writes statistics to memory (``rkisp1_stats``).

- An output device that reads ISP parameters from memory (``rkisp1_params``).

V4L2 Subdevice Controls
~~~~~~~~~~~~~~~~~~~~~~~

Subdevice controls are accessed using the `V4L2 controls ioctls`_ in exactly
the same way as for video devices, except that the ioctls should be issued on
the subdevice node. Tools that access controls on video devices can usually
be used unmodified on subdevices. For instance, to list the controls
supported by the IMX290 camera sensor subdevice,

.. code-block:: none

   $ v4l2-ctl -d /dev/v4l-subdev3 -l

   User Controls

                         exposure 0x00980911 (int)    : min=1 max=1123 step=1 default=1123 value=1123

   Camera Controls

               camera_orientation 0x009a0922 (menu)   : min=0 max=2 default=0 value=0 (Front) flags=read-only
           camera_sensor_rotation 0x009a0923 (int)    : min=0 max=0 step=1 default=0 value=0 flags=read-only

   Image Source Controls

                vertical_blanking 0x009e0901 (int)    : min=45 max=45 step=1 default=45 value=45 flags=read-only
              horizontal_blanking 0x009e0902 (int)    : min=280 max=280 step=1 default=280 value=280 flags=read-only
                    analogue_gain 0x009e0903 (int)    : min=0 max=240 step=1 default=0 value=0

   Image Processing Controls

                   link_frequency 0x009f0901 (intmenu): min=0 max=1 default=0 value=0 (222750000 0xd46e530) flags=read-only
                       pixel_rate 0x009f0902 (int64)  : min=1 max=2147483647 step=1 default=178200000 value=178200000 flags=read-only
                     test_pattern 0x009f0903 (menu)   : min=0 max=7 default=0 value=0 (Disabled)

By accessing controls on subdevices, applications can control the behaviour
of each subdevice independently. If multiple subdevices in the graph
implement the same control (such as a digital gain), those controls can be
set individually. This wouldn't be possible using the traditional V4L2 API on
video devices, as the identical controls from two different subdevices would
conflict.

.. _v4l2-subdevice-formats:

V4L2 Subdevice Formats
~~~~~~~~~~~~~~~~~~~~~~

Where video devices expose only the format of the frames being captured to
memory, subdevices allow fine-grained configuration of formats on every pad
in the pipeline. This enables setting up pipelines with different internal
configurations to match precise use cases. To understand why this is needed,
let's consider the simplified example in :numref:`scaling-pipeline`, where a
12MP camera sensor (IMX477) is connected to an SoC that includes an ISP and a
scaler.

.. _scaling-pipeline:
.. graphviz:: scaler.dot
   :caption: Scaling pipeline

All three components can affect the image size:

- The camera sensor can subsample the image through mechanisms such as
  binning and skipping.

- The ISP can subsample the image horizontally through averaging.

- The scaler uses a polyphase filter for high quality scaling.

All these components can further crop the image if desired. Different use
cases will call for cropping and resizing the image in different ways through
the pipeline. Let's assume that, in all cases, we want to capture 1.5MP
images from the 12MP native sensor resolution.

When frame rate is more important than quality, the sensor will typically
subsample the image to comply with the bandwidth limitations of the ISP. As
the subsampling factor is restricted to powers of two, the scaler is further
used to achieve the exact desired size (:numref:`scaling-pipeline-fast`).

.. _scaling-pipeline-fast:
.. graphviz:: scaler-fast.dot
   :caption: Fast scaling

On the other hand, when capturing still images, the full image should be
processed through the pipeline and resized at the very end using the higher
quality scaler (:numref:`scaling-pipeline-hq`).

.. _scaling-pipeline-hq:
.. graphviz:: scaler-hq.dot
   :caption: High quality scaling

Using the traditional V4L2 API on video nodes, the bridge driver configures
the internal pipeline based on the desired capture format. As the use cases
above produce the same format at the output of the pipeline, the bridge
driver won't be able to differentiate between them and configure the pipeline
appropriately for each use case. To solve this problem, the V4L2 subdevice
userspace API lets applications access formats on pads directly.

Formats on subdevice pads are called `media bus formats`. They are described
by the ``v4l2_mbus_framefmt`` structure:

.. code-block:: c

   struct v4l2_mbus_framefmt {
           __u32 width;
           __u32 height;
           __u32 code;
           __u32 field;
           __u32 colorspace;
           union {
                   __u16 ycbcr_enc;
                   __u16 hsv_enc;
           };
           __u16 quantization;
           __u16 xfer_func;
           __u16 flags;
           __u16 reserved[10];
   };

Unlike the pixel formats used on video devices, which describe how image data
is stored in memory (using the ``v4l2_pix_format`` and
``v4l2_pix_format_mplane`` structures), media bus formats describe how image
data is transmitted on buses between subdevices. The ``bytesperline`` and
``sizeimage`` fields of the pixel format are thus not found in the media bus
formats, as they refer to memory sizes.

This difference between the two concepts causes a second difference between
the media bus and pixel format structures. The FourCC values used to describe
pixel formats are not applicable to bus formats, as they also describe data
organization in memory. Media bus formats instead use `format codes` that
describe how individual bits are organized and transferred on a bus. The
format codes are 32-bit numerical values defined by the ``MEDIA_BUS_FMT_*``
macros and are documented in the `Media Bus Formats`_ section of the V4L2 API
documentation.

.. _Media Bus Formats: https://linuxtv.org/downloads/v4l-dvb-apis/userspace-api/v4l/subdev-formats.html

.. note::

   In the remainder of this document, the terms `media bus format`, `bus
   format` or `format`, when applied to subdevice pads, refer to the
   combination of all fields of the ``v4l2_mbus_framefmt`` structure. To
   refer to the media bus format code specifically, the terms `media bus
   code`, `format code` or `code` will be used.

In general, there is no 1:1 universal mapping between pixel formats and media
bus formats. To understand this, let's consider the
``MEDIA_BUS_FMT_UYVY8_1X16`` media bus code that describes one common way to
transmit YUV 4:2:2 data on a 16-bit parallel bus.
When the image data reaches the DMA engine at the end of the pipeline and is
written to memory, it can be rearranged in different ways, producing for
instance the ``V4L2_PIX_FMT_UYVY`` packed pixel format that seems to be a
direct match, but also the semi-planar ``V4L2_PIX_FMT_NV16`` format by
writing the luma and chroma data to separate memory planes. How a media bus
code is translated to pixel formats depends on the capabilities of the DMA
engine, and is thus device-specific.
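To make the distinction concrete, the following sketch requests
``MEDIA_BUS_FMT_UYVY8_1X16`` on a subdevice pad with the
``VIDIOC_SUBDEV_S_FMT`` ioctl, which wraps ``v4l2_mbus_framefmt`` in a
``struct v4l2_subdev_format``, then lists two pixel formats a hypothetical
DMA engine could produce from that bus code. The device node, pad number and
mapping table are illustrative assumptions, not taken from a real driver:

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   #include <linux/media-bus-format.h>
   #include <linux/v4l2-subdev.h>
   #include <linux/videodev2.h>

   static void print_fourcc(unsigned int fourcc)
   {
           printf("%c%c%c%c", fourcc & 0xff, (fourcc >> 8) & 0xff,
                  (fourcc >> 16) & 0xff, (fourcc >> 24) & 0xff);
   }

   int main(void)
   {
           /* Candidate pixel formats for UYVY8_1X16; device-specific. */
           static const unsigned int pixfmts[] = {
                   V4L2_PIX_FMT_UYVY,      /* Packed, matches the bus order */
                   V4L2_PIX_FMT_NV16,      /* Semi-planar luma and chroma */
           };
           struct v4l2_subdev_format fmt;
           unsigned int i;
           int fd;

           memset(&fmt, 0, sizeof(fmt));
           fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;
           fmt.pad = 0;                    /* Hypothetical pad number */
           fmt.format.width = 1920;
           fmt.format.height = 1080;
           fmt.format.code = MEDIA_BUS_FMT_UYVY8_1X16;
           fmt.format.field = V4L2_FIELD_NONE;

           /* Hypothetical subdevice node; skip the ioctl if absent. */
           fd = open("/dev/v4l-subdev0", O_RDWR);
           if (fd >= 0) {
                   /* The driver may adjust the requested format and
                    * return the result in the same structure. */
                   if (ioctl(fd, VIDIOC_SUBDEV_S_FMT, &fmt) < 0)
                           perror("VIDIOC_SUBDEV_S_FMT");
                   close(fd);
           }

           printf("bus code 0x%04x maps to:", fmt.format.code);
           for (i = 0; i < sizeof(pixfmts) / sizeof(pixfmts[0]); i++) {
                   printf(" ");
                   print_fourcc(pixfmts[i]);
           }
           printf("\n");
           return 0;
   }

A real driver derives such a mapping from its DMA engine capabilities;
applications discover the supported pixel formats through ``VIDIOC_ENUM_FMT``
on the video device at the end of the pipeline.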