Comment 10 for bug 2076361

Gustavo Niemeyer (niemeyer) wrote :

> 100% usage of Render/3D means copy, vpp or display are stuck,

The issue is certainly related to the 100% render/compute, but this looks like a symptom and a consequence rather than a cause.

> since multiple inputs are set, could it work when encoding a single input each time?

Yeah, there are several ways to work around it. But to be clear, the concern here is not that ffmpeg is slow, but rather that whatever is happening takes the entire system down instead of degrading gracefully. Even keyboard input stops working.

> Maybe there is an unsupported video format input by Intel GPU decoder/encoder.

No, the input/output encoding/decoding is fine. It's using H265/HEVC, well supported by the chip.

I've done some more testing and bisected the problem to the following, with left.mp4 and right.mp4 being 4K videos.

This works fine, with Video fluctuating between 20 and 30%, Render close to zero:

```
ffmpeg -y \
        -vaapi_device /dev/dri/renderD128 \
        -hwaccel vaapi -hwaccel_output_format vaapi -i left.mp4 \
        -hwaccel vaapi -hwaccel_output_format vaapi -i right.mp4 \
        -filter_complex "
                [0:v]hwdownload,format=nv12,copy[v0];
                [1:v]hwdownload,format=nv12,copy[v1];
                [v0][v1]hstack=shortest=true,format=nv12,hwupload[out];
                " \
        -c:v hevc_vaapi -map "[out]" slowtest.mp4
```

This causes the issue, with Render at 100%, Video at 0%, system unresponsive:

```
ffmpeg -y \
        -vaapi_device /dev/dri/renderD128 \
        -hwaccel vaapi -hwaccel_output_format vaapi -i left.mp4 \
        -hwaccel vaapi -hwaccel_output_format vaapi -i right.mp4 \
        -filter_complex "
                [0:v]hwdownload,format=nv12,copy,format=nv12,hwupload[v0];
                [1:v]hwdownload,format=nv12,copy,format=nv12,hwupload[v1];
                [v0][v1]hstack_vaapi=shortest=true[out];
                " \
        -c:v hevc_vaapi -map "[out]" out.mp4
```

The key difference between these two pipelines is whether hstack happens in the GPU or the CPU. When it works, two frames are downloaded from the GPU, stacked together, and the merged frame is then re-uploaded for encoding. When it breaks, the two frames are uploaded individually and stacked in the GPU instead.

The problem may be related to the amount of memory used in the GPU, since when the stacking happens in the GPU there is twice as much data. Even then, instead of degrading, the system ground to a halt with the GPU misbehaving, keyboard interrupts going wild, etc.
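
As a rough sanity check on the memory hypothesis, here is a small sketch of the raw NV12 frame sizes involved. It assumes "4K" means 3840x2160 (the exact input resolution is not stated above), and `nv12_bytes` is just a helper name made up for this illustration. The real footprint will be some multiple of these numbers, since frame pools hold several surfaces at once.

```shell
# Bytes needed for one NV12 frame: a full-resolution luma plane plus
# a half-size interleaved chroma plane, i.e. width * height * 3 / 2.
nv12_bytes() {
    echo $(( $1 * $2 * 3 / 2 ))
}

# Assumption: inputs are 3840x2160; adjust to the real resolution.
single=$(nv12_bytes 3840 2160)    # one decoded input frame
stacked=$(nv12_bytes 7680 2160)   # hstack output, two frames side by side

echo "single input frame: $single bytes"
echo "stacked frame:      $stacked bytes"
```

These are only per-frame figures for uncompressed surfaces; they say nothing about how many surfaces the driver keeps alive at a time.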

Anything else I can do to help diagnose why that's the case or whether we can address it?