.. SPDX-License-Identifier: CC-BY-SA-4.0

Media Controller and V4L2 Subdevice APIs
========================================

The term `Media Controller` usually covers two distinct APIs:

- The Media Controller (MC) API itself, whose task it is to expose the internal
  topology of the device to applications.

- The V4L2 subdevice userspace API, which exposes low-level control of
  individual subdevices to applications.

Collectively, and in collaboration with the V4L2 API, these offer the features
needed by applications to control complex video capture devices.


The Media Controller API
------------------------

.. _media-ctl: https://git.linuxtv.org/v4l-utils.git/tree/utils/media-ctl

The Media Controller kernel framework and userspace API model devices as a
directed acyclic graph of `entities`. Each entity represents a hardware block,
which can be an external on-board component, an IP core in the SoC, or a piece
of either of those. The API doesn't precisely define how a device should be
split into entities. Individual drivers decide on the exact model they want to
expose, to allow fine-grained control of the hardware blocks while minimizing
the number of entities to avoid unnecessary complexity.

Entities include `pads`, which model input and output ports through which
entities receive or produce data. Data inputs are called `sinks`, and data
outputs `sources`. The data flow through the graph is modelled by `links` that
connect sources to sinks. Each link connects one source pad of an entity to a
sink pad of another entity. Cycles in the graph are not allowed.

.. note::

   The Media Controller API is not limited to video capture devices and has
   been designed to model any type of data flow in a media device. This
   includes, for instance, audio and display devices. However, as of version
   6.0 of the kernel, the API is only used in the Linux media subsystem, by
   V4L2 and DVB drivers, and hasn't made its way to the ALSA and DRM/KMS
   subsystems.

When used in a V4L2 driver, an entity models either a video device (``struct
video_device``) or a subdevice (``struct v4l2_subdevice``). For video capture
devices, subdevices represent video sources (camera sensors, input connectors,
...) or processing elements, and video devices represent the connection to
system memory at the end of a pipeline (typically a DMA engine, but it can also
be a USB connection for USB webcams). The entity type is exposed to
applications as an entity `function`, for instance:

- ``MEDIA_ENT_F_CAM_SENSOR`` for a camera sensor
- ``MEDIA_ENT_F_PROC_VIDEO_SCALER`` for a video scaler
- ``MEDIA_ENT_F_IO_V4L`` for a connection to system memory through a V4L2 video
  device

The kernel media controller device (``struct media_device``) is exposed to
userspace through a media device node, typically named ``/dev/media[0-9]+``.
The `media-ctl`_ tool can query the topology of a Media Controller device and
display it in either plain text (``--print-topology`` or ``-p``) or `DOT format
<https://graphviz.org/doc/info/lang.html>`_ (``--print-dot``).

.. code-block:: sh

   $ media-ctl -d /dev/media0 --print-dot | dot -Tsvg > omap3isp.svg

:numref:`media-graph-ti-omap3-isp` represents the TI OMAP3 ISP, with entities
corresponding to subdevices in green and entities corresponding to video
devices in yellow.

.. _media-graph-ti-omap3-isp:

.. graphviz:: omap3isp.dot
   :caption: Media graph of the TI OMAP3 ISP

The ``mt9p031 2-0048`` entity on the top row is a camera sensor; all other
entities are internal to the OMAP3 SoC and part of the ISP.

Entities, their pads, and the links are intrinsic properties of the device.
They are created by the driver at initialization time to model the hardware
topology. Unless parts of the device are hot-pluggable, no entities or links are
created or removed after initialization. Only their properties can be modified
by applications.
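
The same information is available programmatically through the
``MEDIA_IOC_ENUM_ENTITIES`` ioctl. The following minimal sketch, which assumes
a media device node at ``/dev/media0``, walks the graph and prints the name,
function and pad count of every entity.

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   #include <linux/media.h>

   int main(void)
   {
           struct media_entity_desc entity;
           int fd;

           fd = open("/dev/media0", O_RDONLY);
           if (fd < 0)
                   return 1;

           /*
            * MEDIA_ENT_ID_FLAG_NEXT asks for the entity with the smallest
            * ID strictly larger than the one passed in.
            */
           memset(&entity, 0, sizeof(entity));
           entity.id = MEDIA_ENT_ID_FLAG_NEXT;

           while (ioctl(fd, MEDIA_IOC_ENUM_ENTITIES, &entity) == 0) {
                   /* The type field holds the MEDIA_ENT_F_* function. */
                   printf("entity %u: %s (function 0x%08x, %u pads)\n",
                          entity.id, entity.name, entity.type, entity.pads);
                   entity.id |= MEDIA_ENT_ID_FLAG_NEXT;
           }

           close(fd);
           return 0;
   }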

Data flow routing is controlled by enabling or disabling links, using the
``MEDIA_LNK_FL_ENABLED`` link flag. Links that model immutable connections at
the hardware level are displayed as a thick plain line in the media graph. They
have the ``MEDIA_LNK_FL_IMMUTABLE`` and ``MEDIA_LNK_FL_ENABLED`` flags set and
can't be modified. Links that model configurable routing options can be
controlled, and are displayed as a dotted line if they are disabled or as a
thin plain line if they are enabled.
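
Applications enable or disable such links with the ``MEDIA_IOC_SETUP_LINK``
ioctl. The sketch below assumes the entity IDs and pad indexes have been
discovered beforehand (for instance with ``MEDIA_IOC_ENUM_ENTITIES``); writing
back ``link.flags`` without ``MEDIA_LNK_FL_ENABLED`` disables the link
instead.

.. code-block:: c

   #include <string.h>
   #include <sys/ioctl.h>

   #include <linux/media.h>

   /*
    * Enable the link from a source pad to a sink pad. The entity IDs and
    * pad indexes are assumed to have been discovered beforehand.
    */
   static int enable_link(int media_fd, __u32 src_entity, __u16 src_pad,
                          __u32 sink_entity, __u16 sink_pad)
   {
           struct media_link_desc link;

           memset(&link, 0, sizeof(link));
           link.source.entity = src_entity;
           link.source.index = src_pad;
           link.sink.entity = sink_entity;
           link.sink.index = sink_pad;
           link.flags = MEDIA_LNK_FL_ENABLED;

           return ioctl(media_fd, MEDIA_IOC_SETUP_LINK, &link);
   }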

As the Media Controller API can model any type of data flow, it doesn't expose
any property specific to a particular device type, such as, for instance, pixel
formats or frame rates. This is left to other, device-specific APIs.


The V4L2 Subdevice Userspace API
--------------------------------

.. _V4L2 Subdevice Userspace API: https://linuxtv.org/downloads/v4l-dvb-apis/userspace-api/v4l/dev-subdev.html
.. _V4L2 controls ioctls: https://linuxtv.org/downloads/v4l-dvb-apis/userspace-api/v4l/vidioc-g-ext-ctrls.html
.. _v4l2-ctl: https://git.linuxtv.org/v4l-utils.git/tree/utils/v4l2-ctl

The `V4L2 Subdevice Userspace API`_ (often shortened to just V4L2 Subdevice API
when this doesn't cause any ambiguity with the in-kernel V4L2 subdevice
operations) has been developed alongside the Media Controller API to expose to
applications the properties of entities corresponding to V4L2 subdevices. It
allows accessing V4L2 controls directly on a subdevice, as well as formats and
selection rectangles on the subdevice pads.

Subdevices are exposed to userspace through V4L2 subdevice nodes, typically
named ``/dev/v4l-subdev[0-9]+``. They are controlled using ioctls in a similar
fashion to V4L2 video devices. The `v4l2-ctl`_ tool supports a wide range
of subdevice-specific options to access subdevices from the command line (see
``v4l2-ctl --help-subdev`` for a detailed list).
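
As a small illustration of the ioctl interface, the sketch below opens a
subdevice node, assumed here to be ``/dev/v4l-subdev0``, and queries its
capabilities with ``VIDIOC_SUBDEV_QUERYCAP`` (available since kernel 5.10).

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   #include <linux/v4l2-subdev.h>

   int main(void)
   {
           struct v4l2_subdev_capability cap;
           int fd;

           fd = open("/dev/v4l-subdev0", O_RDWR);
           if (fd < 0)
                   return 1;

           memset(&cap, 0, sizeof(cap));

           /* The version field uses the KERNEL_VERSION() encoding. */
           if (ioctl(fd, VIDIOC_SUBDEV_QUERYCAP, &cap) == 0)
                   printf("version %u.%u.%u, capabilities 0x%08x\n",
                          cap.version >> 16, (cap.version >> 8) & 0xff,
                          cap.version & 0xff, cap.capabilities);

           close(fd);
           return 0;
   }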

The rest of this document will use the NXP i.MX8MP ISP as an example. Its media
graph is shown in :numref:`media-graph-nxp-imx8mp`.

.. _media-graph-nxp-imx8mp:

.. graphviz:: imx8mp-isp.dot
   :caption: Media graph of the NXP i.MX8MP

It contains the following V4L2 subdevices:

- A raw camera sensor (``imx290 2-001a``), with a single source pad connected
  to the SoC through a MIPI CSI-2 link.
- A MIPI CSI-2 receiver (``csis-32e40000.csi``), internal to the SoC, that
  receives data from the sensor on its sink pad and provides it to the ISP on
  its source pad.
- An ISP (``rkisp1_isp``), with two sink pads that receive image data and
  processing parameters (0 and 1 respectively) and two source pads that output
  image data and statistics (2 and 3 respectively).
- A scaler (``rkisp1_resizer_mainpath``) that can scale the frames up or down.

It also contains the following video devices:

- A capture device that writes video frames to memory (``rkisp1_mainpath``).
- A capture device that writes statistics to memory (``rkisp1_stats``).
- An output device that reads ISP parameters from memory (``rkisp1_params``).


V4L2 Subdevice Controls
~~~~~~~~~~~~~~~~~~~~~~~

Subdevice controls are accessed using the `V4L2 controls ioctls`_ in exactly
the same way as for video devices, except that the ioctls should be issued on
the subdevice node. Tools that access controls on video devices can usually be
used unmodified on subdevices. For instance, to list the controls supported by
the IMX290 camera sensor subdevice,

.. code-block:: none

   $ v4l2-ctl -d /dev/v4l-subdev3 -l

   User Controls

                          exposure 0x00980911 (int)    : min=1 max=1123 step=1 default=1123 value=1123

   Camera Controls

                camera_orientation 0x009a0922 (menu)   : min=0 max=2 default=0 value=0 (Front) flags=read-only
            camera_sensor_rotation 0x009a0923 (int)    : min=0 max=0 step=1 default=0 value=0 flags=read-only

   Image Source Controls

                 vertical_blanking 0x009e0901 (int)    : min=45 max=45 step=1 default=45 value=45 flags=read-only
               horizontal_blanking 0x009e0902 (int)    : min=280 max=280 step=1 default=280 value=280 flags=read-only
                     analogue_gain 0x009e0903 (int)    : min=0 max=240 step=1 default=0 value=0

   Image Processing Controls

                    link_frequency 0x009f0901 (intmenu): min=0 max=1 default=0 value=0 (222750000 0xd46e530) flags=read-only
                        pixel_rate 0x009f0902 (int64)  : min=1 max=2147483647 step=1 default=178200000 value=178200000 flags=read-only
                      test_pattern 0x009f0903 (menu)   : min=0 max=7 default=0 value=0 (Disabled)

By accessing controls on subdevices, applications can control the behaviour of
each subdevice independently. If multiple subdevices in the graph implement the
same control (such as a digital gain), those controls can be set individually.
This wouldn't be possible with the traditional V4L2 API on video devices,
as the identical controls from two different subdevices would conflict.
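
For instance, setting the analogue gain of the IMX290 sensor from an
application could look like the following sketch, where ``subdev_fd`` is
assumed to be an open file descriptor to the sensor's subdevice node.

.. code-block:: c

   #include <sys/ioctl.h>

   #include <linux/videodev2.h>

   /* Set the analogue gain control on an open subdevice node. */
   static int set_analogue_gain(int subdev_fd, __s32 gain)
   {
           struct v4l2_ext_control ctrl = {
                   .id = V4L2_CID_ANALOGUE_GAIN,
                   .value = gain,
           };
           struct v4l2_ext_controls ctrls = {
                   .which = V4L2_CTRL_WHICH_CUR_VAL,
                   .count = 1,
                   .controls = &ctrl,
           };

           return ioctl(subdev_fd, VIDIOC_S_EXT_CTRLS, &ctrls);
   }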


.. _v4l2-subdevice-formats:

V4L2 Subdevice Formats
~~~~~~~~~~~~~~~~~~~~~~

Where video devices expose only the format of the frames being captured to
memory, subdevices allow fine-grained configuration of formats on every pad in
the pipeline. This enables setting up pipelines with different internal
configurations to match precise use cases. To understand why this is needed,
let's consider the simplified example in :numref:`scaling-pipeline`, where a
12MP camera sensor (IMX477) is connected to an SoC that includes an ISP and a
scaler.

.. _scaling-pipeline:

.. graphviz:: scaler.dot
   :caption: Scaling pipeline

All three components can affect the image size:

- The camera sensor can subsample the image through mechanisms such as binning
  and skipping.
- The ISP can subsample the image horizontally through averaging.
- The scaler uses a polyphase filter for high quality scaling.

All these components can further crop the image if desired.

Different use cases will call for cropping and resizing the image in different
ways through the pipeline. Let's assume that, in all cases, we want to capture
1.5MP images from the 12MP native sensor resolution. When frame rate is more
important than quality, the sensor will typically subsample the image to comply
with the bandwidth limitations of the ISP. As the subsampling factor is
restricted to powers of two, the scaler is further used to achieve the exact
desired size (:numref:`scaling-pipeline-fast`).

.. _scaling-pipeline-fast:

.. graphviz:: scaler-fast.dot
   :caption: Fast scaling

On the other hand, when capturing still images, the full image should be
processed through the pipeline and resized at the very end using the higher
quality scaler (:numref:`scaling-pipeline-hq`).

.. _scaling-pipeline-hq:

.. graphviz:: scaler-hq.dot
   :caption: High quality scaling

Using the traditional V4L2 API on video nodes, the bridge driver configures the
internal pipeline based on the desired capture format. As the use cases above
produce the same format at the output of the pipeline, the bridge driver won't
be able to differentiate between them and configure the pipeline appropriately
for each use case. To solve this problem, the V4L2 subdevice userspace API
lets applications access formats on pads directly.

Formats on subdevice pads are called `media bus formats`. They are described by
the ``v4l2_mbus_framefmt`` structure:

.. code-block:: c

   struct v4l2_mbus_framefmt {
   	__u32			width;
   	__u32			height;
   	__u32			code;
   	__u32			field;
   	__u32			colorspace;
   	union {
   		__u16			ycbcr_enc;
   		__u16			hsv_enc;
   	};
   	__u16			quantization;
   	__u16			xfer_func;
   	__u16			flags;
   	__u16			reserved[10];
   };
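
Per-pad formats are read and written with the ``VIDIOC_SUBDEV_G_FMT`` and
``VIDIOC_SUBDEV_S_FMT`` ioctls, which wrap this structure in a ``struct
v4l2_subdev_format`` that also identifies the pad. As a minimal sketch,
reading the active format on pad 0 of an already opened subdevice could look
as follows.

.. code-block:: c

   #include <stdio.h>
   #include <string.h>
   #include <sys/ioctl.h>

   #include <linux/v4l2-subdev.h>

   /* Print the active media bus format on pad 0 of an open subdevice. */
   static int print_pad_format(int subdev_fd)
   {
           struct v4l2_subdev_format fmt;

           memset(&fmt, 0, sizeof(fmt));
           fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;
           fmt.pad = 0;

           if (ioctl(subdev_fd, VIDIOC_SUBDEV_G_FMT, &fmt) < 0)
                   return -1;

           printf("pad 0: %ux%u code 0x%04x\n",
                  fmt.format.width, fmt.format.height, fmt.format.code);

           return 0;
   }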

Unlike the pixel formats used on video devices, which describe how image data
is stored in memory (using the ``v4l2_pix_format`` and
``v4l2_pix_format_mplane`` structures), media bus formats describe how image
data is transmitted on buses between subdevices.  The ``bytesperline`` and
``sizeimage`` fields of the pixel format are thus not found in the media bus
formats, as they refer to memory sizes.

This difference between the two concepts causes a second difference between the
media bus and pixel format structures. The FourCC values used to describe
pixel formats are not applicable to bus formats, as they also describe data
organization in memory.  Media bus formats instead use `format codes` that
describe how individual bits are organized and transferred on a bus. The format
codes are 32-bit numerical values defined by the ``MEDIA_BUS_FMT_*`` macros and
are documented in the `Media Bus Formats`_ section of the V4L2 API
documentation.

.. _Media Bus Formats: https://linuxtv.org/downloads/v4l-dvb-apis/userspace-api/v4l/subdev-formats.html

.. note::

   In the remainder of this document, the terms `media bus format`, `bus
   format` or `format`, when applied to subdevice pads, refer to the
   combination of all fields of the ``v4l2_mbus_framefmt`` structure. To refer
   to the media bus format code specifically, the terms `media bus code`,
   `format code` or `code` will be used.

In general, there is no 1:1 universal mapping between pixel formats and media
bus formats. To understand this, let's consider the
``MEDIA_BUS_FMT_UYVY8_1X16`` media bus code that describes one common way to
transmit YUV 4:2:2 data on a 16-bit parallel bus. When the image data reaches
the DMA engine at the end of the pipeline and is written to memory, it can be
rearranged in different ways, producing for instance the ``V4L2_PIX_FMT_UYVY``
packed pixel format that seems to be a direct match, but also the semi-planar
``V4L2_PIX_FMT_NV16`` format by writing the luma and chroma data to separate
memory planes. How a media bus code is translated to pixel formats depends on
the capabilities of the DMA engine, and is thus device-specific.
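
As an illustration, the sketch below configures a hypothetical capture
pipeline for this case: it sets ``MEDIA_BUS_FMT_UYVY8_1X16`` on the subdevice
source pad that feeds the DMA engine, then requests the semi-planar
``V4L2_PIX_FMT_NV16`` layout in memory. Whether this particular combination is
supported depends on the device.

.. code-block:: c

   #include <string.h>
   #include <sys/ioctl.h>

   #include <linux/media-bus-format.h>
   #include <linux/v4l2-subdev.h>
   #include <linux/videodev2.h>

   /*
    * Set UYVY on a 16-bit bus on the subdevice source pad that feeds the
    * DMA engine, then capture it to memory as semi-planar NV16. The file
    * descriptors and the pad index are assumed to be known.
    */
   static int configure_capture(int subdev_fd, unsigned int source_pad,
                                int video_fd)
   {
           struct v4l2_subdev_format fmt;
           struct v4l2_format pix;

           memset(&fmt, 0, sizeof(fmt));
           fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;
           fmt.pad = source_pad;
           fmt.format.width = 1920;
           fmt.format.height = 1080;
           fmt.format.code = MEDIA_BUS_FMT_UYVY8_1X16;
           fmt.format.field = V4L2_FIELD_NONE;

           if (ioctl(subdev_fd, VIDIOC_SUBDEV_S_FMT, &fmt) < 0)
                   return -1;

           memset(&pix, 0, sizeof(pix));
           pix.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
           pix.fmt.pix.width = 1920;
           pix.fmt.pix.height = 1080;
           pix.fmt.pix.pixelformat = V4L2_PIX_FMT_NV16;
           pix.fmt.pix.field = V4L2_FIELD_NONE;

           return ioctl(video_fd, VIDIOC_S_FMT, &pix);
   }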