kylemanna
8/26/2017 - 5:28 AM

ffmpeg-hwdec.md

MPV Hardware Acceleration Benchmarks

Test System

  • Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz

  • Intel Corporation HD Graphics 630 (rev 04)

  • Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/580] (rev cf)

  • Software version:

    linux 4.12.8-1
    mpv 1:0.26.0-3
    libva 1.8.3-1
    libva-vdpau-driver 0.7.4-2
    libvdpau 1.1.1-2
    mesa-vdpau 17.1.6-1
    

Output of vainfo

libva info: VA-API version 0.40.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_0_40
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.40 (libva )
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc

Output of vdpauinfo

display: :1   screen: 0                                                                
API version: 1                             
Information string: G3DVL VDPAU Driver Shared Library version 1.0                      

Video surface:                             

name   width height types                                                              
-------------------------------------------                                            
420    16384 16384  NV12 YV12              
422    16384 16384  UYVY YUYV              
444    16384 16384  Y8U8V8A8 V8U8Y8A8      

Decoder capabilities:                      

name                        level macbs width height                                   
----------------------------------------------------                                   
MPEG1                          --- not supported ---                                   
MPEG2_SIMPLE                    3 65536  4096  4096                                    
MPEG2_MAIN                      3 65536  4096  4096                                    
H264_BASELINE                  52 65536  4096  4096                                    
H264_MAIN                      52 65536  4096  4096                                    
H264_HIGH                      52 65536  4096  4096                                    
VC1_SIMPLE                      1 65536  4096  4096                                    
VC1_MAIN                        2 65536  4096  4096                                    
VC1_ADVANCED                    4 65536  4096  4096                                    
MPEG4_PART2_SP                  3 65536  4096  4096                                    
MPEG4_PART2_ASP                 5 65536  4096  4096                                    
DIVX4_QMOBILE                  --- not supported ---                                   
DIVX4_MOBILE                   --- not supported ---                                   
DIVX4_HOME_THEATER             --- not supported ---                                   
DIVX4_HD_1080P                 --- not supported ---                                   
DIVX5_QMOBILE                  --- not supported ---                                   
DIVX5_MOBILE                   --- not supported ---                                   
DIVX5_HOME_THEATER             --- not supported ---                                   
DIVX5_HD_1080P                 --- not supported ---                                   
H264_CONSTRAINED_BASELINE       0 65536  4096  4096                                    
H264_EXTENDED                  --- not supported ---                                   
H264_PROGRESSIVE_HIGH          --- not supported ---                                   
H264_CONSTRAINED_HIGH          --- not supported ---                                   
H264_HIGH_444_PREDICTIVE       --- not supported ---                                   
HEVC_MAIN                      186 65536  4096  4096                                   
HEVC_MAIN_10                   186 65536  4096  4096                                   
HEVC_MAIN_STILL                --- not supported ---                                   
HEVC_MAIN_12                   --- not supported ---                                   
HEVC_MAIN_444                  --- not supported ---                                   

Output surface:                            

name              width height nat types   
----------------------------------------------------                                   
B8G8R8A8         16384 16384    y  NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8 A8I8 I8A8     
R8G8B8A8         16384 16384    y  NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8 A8I8 I8A8     
R10G10B10A2      16384 16384    y  NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8 A8I8 I8A8     
B10G10R10A2      16384 16384    y  NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8 A8I8 I8A8     

Bitmap surface:                            

name              width height             
------------------------------             
B8G8R8A8         16384 16384               
R8G8B8A8         16384 16384               
R10G10B10A2      16384 16384               
B10G10R10A2      16384 16384               
A8               16384 16384               

Video mixer:                               

feature name                    sup        
------------------------------------       
DEINTERLACE_TEMPORAL             y         
DEINTERLACE_TEMPORAL_SPATIAL     -         
INVERSE_TELECINE                 -         
NOISE_REDUCTION                  y         
SHARPNESS                        y         
LUMA_KEY                         y         
HIGH QUALITY SCALING - L1        y         
HIGH QUALITY SCALING - L2        -         
HIGH QUALITY SCALING - L3        -         
HIGH QUALITY SCALING - L4        -         
HIGH QUALITY SCALING - L5        -         
HIGH QUALITY SCALING - L6        -         
HIGH QUALITY SCALING - L7        -         
HIGH QUALITY SCALING - L8        -         
HIGH QUALITY SCALING - L9        -         

parameter name                  sup      min      max                                  
-----------------------------------------------------                                  
VIDEO_SURFACE_WIDTH              y        48     4096                                  
VIDEO_SURFACE_HEIGHT             y        48     4096                                  
CHROMA_TYPE                      y         
LAYERS                           y         0        4                                  

attribute name                  sup      min      max                                  
-----------------------------------------------------                                  
BACKGROUND_COLOR                 y         
CSC_MATRIX                       y         
NOISE_REDUCTION_LEVEL            y      0.00     1.00                                  
SHARPNESS_LEVEL                  y     -1.00     1.00                                  
LUMA_KEY_MIN_LUMA                y         
LUMA_KEY_MAX_LUMA                y         

Source Video

ffmpeg -i "$IN"

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '$IN':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.40.101
  Duration: 00:10:00.03, start: 0.000000, bitrate: 2078 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1967 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 101 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

Benchmark

Defaults

/usr/bin/time -v mpv -fs --length 10 "$IN"
VO: [opengl] 1280x720 yuv420p

User time (seconds): 3.82
System time (seconds): 0.13
Percent of CPU this job got: 38%

User time (seconds): 3.26
System time (seconds): 0.18
Percent of CPU this job got: 33%

User time (seconds): 3.86
System time (seconds): 0.18
Percent of CPU this job got: 39%
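
The three lines listed per run are a small subset of what /usr/bin/time -v prints. A minimal sketch of how they could be filtered out, assuming GNU time's field labels; time_summary is a hypothetical helper name and was not part of the original runs:

```shell
#!/bin/sh
# Keep only the three GNU time fields quoted in these notes and strip
# the leading tab that `time -v` puts in front of each field.
# Reads `/usr/bin/time -v` output on stdin.
time_summary() {
    sed -n -E 's/^[[:space:]]*((User time|System time|Percent of CPU).*)$/\1/p'
}

# Hypothetical usage (needs mpv and a video path in $IN); GNU time
# writes its report to stderr, hence the 2>&1:
#   for run in 1 2 3; do
#       /usr/bin/time -v mpv -fs --length 10 "$IN" 2>&1 | time_summary
#   done
```

Running each configuration three times this way gives the same kind of triplets listed in each section.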

VA-API without HW Decoder support

/usr/bin/time -v mpv -fs --length 10 -vo=vaapi "$IN"
VO: [vaapi] 1280x720 yuv420p

User time (seconds): 4.02
System time (seconds): 0.17
Percent of CPU this job got: 41%

User time (seconds): 3.50
System time (seconds): 0.19
Percent of CPU this job got: 36%

User time (seconds): 4.11
System time (seconds): 0.12
Percent of CPU this job got: 41%

VA-API with HW Decoder Auto support

/usr/bin/time -v mpv -fs --length 10 -vo=vaapi --hwdec=auto "${IN}"
Using hardware decoding (vdpau-copy).
VO: [vaapi] 1280x720 nv12

User time (seconds): 1.14
System time (seconds): 0.33
Percent of CPU this job got: 14%

User time (seconds): 0.94
System time (seconds): 0.34
Percent of CPU this job got: 12%

User time (seconds): 1.02
System time (seconds): 0.32
Percent of CPU this job got: 13%

VA-API with HW Decoder vaapi support

/usr/bin/time -v mpv -fs --length 10 -vo=vaapi --hwdec=vaapi "${IN}"
Using hardware decoding (vaapi).
VO: [vaapi] 1280x720 vaapi[nv12]

User time (seconds): 0.51
System time (seconds): 0.25
Percent of CPU this job got: 7%

User time (seconds): 0.53
System time (seconds): 0.22
Percent of CPU this job got: 7%

User time (seconds): 0.60
System time (seconds): 0.25
Percent of CPU this job got: 8%

VDPAU without HW Decoder

/usr/bin/time -v mpv -fs --length 10 -vo=vdpau "${IN}"
VO: [vdpau] 1280x720 yuv420p
[vo/vdpau] Compositing window manager detected. Assuming timing info is inaccurate.

User time (seconds): 3.96
System time (seconds): 0.12
Percent of CPU this job got: 40%

User time (seconds): 4.44
System time (seconds): 0.21
Percent of CPU this job got: 45%

User time (seconds): 4.71
System time (seconds): 0.22
Percent of CPU this job got: 48%

VDPAU with HW Decoder Auto support

/usr/bin/time -v mpv -fs --length 10 -vo=vdpau --hwdec=auto "${IN}"
Using hardware decoding (vdpau).
VO: [vdpau] 1280x720 vdpau[yuv420p]
[vo/vdpau] Compositing window manager detected. Assuming timing info is inaccurate.

User time (seconds): 0.55
System time (seconds): 0.21
Percent of CPU this job got: 7%

User time (seconds): 0.67
System time (seconds): 0.27
Percent of CPU this job got: 9%

User time (seconds): 0.56
System time (seconds): 0.25
Percent of CPU this job got: 8%

Conclusion

Selecting vdpau or vaapi hardware decoding reduces CPU load by about 5x on supported codecs (roughly 37% down to 7-9% in the runs above).
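
As a rough check of that figure, the "Percent of CPU" values from the runs above can be averaged and compared. This is only a sketch: mean is a hypothetical helper, and three 10-second runs per configuration is a small sample.

```shell
#!/bin/sh
# Mean of the numbers passed as arguments, printed with two decimals.
mean() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# "Percent of CPU this job got" values copied from the runs above.
sw=$(mean 38 33 39)     # defaults, software decoding
vaapi=$(mean 7 7 8)     # -vo=vaapi --hwdec=vaapi
vdpau=$(mean 7 9 8)     # -vo=vdpau --hwdec=auto

# Ratio of software-decoding CPU use to hardware-decoding CPU use.
awk -v sw="$sw" -v va="$vaapi" -v vd="$vdpau" 'BEGIN {
    printf "vaapi: %.1fx less CPU\n", sw / va
    printf "vdpau: %.1fx less CPU\n", sw / vd
}'
```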

Resulting config ~/.config/mpv/mpv.conf:

hwdec=auto
vo=vdpau

FFMPEG HEVC + Quick Sync Notes

Tested encoding the following video, which was previously encoded to VP9:

Duration: 00:57:26.89, start: 0.000000, bitrate: 3359 kb/s
  Stream #0:0(eng): Video: vp9 (Profile 0), yuv420p(tv, progressive), 1280x720, SAR 1:1 DAR 16:9, 29.97 fps, 29.97 tbr, 1k tbn, 1k tbc (default)

Invocation

HW Encode

/usr/bin/time -v ffmpeg -hwaccel vaapi -i "${src}" \
    -vaapi_device /dev/dri/renderD129  -vf 'format=nv12,hwupload' -vcodec hevc_vaapi \
    -pass 1 -crf ${crf} -threads 8 -an -y -f matroska "/dev/null"

/usr/bin/time -v ffmpeg -hwaccel vaapi -i "${src}" \
    -vaapi_device /dev/dri/renderD129  -vf 'format=nv12,hwupload' -vcodec hevc_vaapi \
    -pass 2 -acodec copy -crf ${crf} -threads 8 -y -f matroska "${src}.hevc.hw.mkv"

Each pass took about 14 minutes.

SW Encode

/usr/bin/time -v ffmpeg -i "${src}" \
    -vcodec hevc \
    -pass 1 -crf ${crf} -threads 8 -an -y -f matroska "/dev/null"

/usr/bin/time -v ffmpeg -i "${src}" \
    -vcodec hevc \
    -pass 2 -acodec copy -crf ${crf} -threads 8 -y -f matroska "${src}.hevc.sw.mkv"

Each pass took about 38 minutes.

Simultaneous MPV playback

mkfifo fifo1 fifo2
mpv --msg-level=vd=debug --input-file=fifo1 input1.mkv --pause --start=10:00
mpv --msg-level=vd=debug --input-file=fifo2 input2.mkv --pause --start=10:00
echo pause | tee fifo1 fifo2

Results

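| Description | Size | Encode Time | Subjective Quality |
|-------------|------|-------------|--------------------|
| VP9 Source  | 1.4G | n/a         | Good               |
| HEVC SW     | 1G   | 2x 38m      | Good, same as src  |
| HEVC HW     | 1.3G | 2x 14m      | Ok                 |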

In conclusion, the Intel HEVC Quick Sync encoder is 2.5-3x faster than libx265, but its output is of slightly lower quality and approximately 30% bigger.
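
The two ratios can be reproduced from the Results table. A minimal sketch, assuming the per-pass times and sizes in the table are accurate to the precision shown; ratio is a hypothetical helper name:

```shell
#!/bin/sh
# Print a / b with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 38 14      # minutes per pass: software (libx265) / hardware (Quick Sync)
ratio 1.3 1.0    # output size in G: hardware (Quick Sync) / software (libx265)
```

The first value is the speedup of the hardware encoder, the second is how much larger its output is relative to the software encode.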

HEVC (H.265) is favored over VP9 for hardware acceleration because decoding support is readily available (AMD RX 470 GPU).

The libx265 software encoder delivers quality similar to the VP9 source, and its output is about 23% smaller than the Intel Quick Sync output (1G vs 1.3G).

The end result was to use HEVC for re-encoding some videos due to readily available decoders and encoders. I will revisit VP9 when I buy my next graphics card in a few years; VP9 has preferable licensing and is standardized for WebM containers.

The libx265 software encoder delivers smaller files with higher visual quality at the expense of CPU encoding time. Because storage space and visual quality are the prime concerns, software encoding is used. Intel Quick Sync is the clear winner for high-speed encoding, but seems more difficult to configure for optimal visual quality and file size.

Hardware Rambling

The AMD RX 470 hardware can support 10-bit VP9 decoding with UVD 6.3, but software support for the complete UVD 6.3 feature set is missing in the Mesa VA driver. The GPU can also encode HEVC, but support for VCE 3.4 is likewise missing in the Mesa VA driver. I suspect HEVC encoding on the GPU will deliver results similar to Quick Sync: faster runtime at the expense of file size and visual quality, in which case I'll still prefer libx265.
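
One way to notice when a driver update adds these features would be to look for the matching profile and entrypoint in the vainfo output. A sketch only: has_entrypoint is a hypothetical helper, and using VAProfileVP9Profile2 as the 10-bit VP9 profile name is my assumption about how libva would list it.

```shell
#!/bin/sh
# Succeeds if the vainfo output on stdin lists profile $1 with entrypoint $2.
# The profile name must match exactly, so VAProfileHEVCMain does not
# match a VAProfileHEVCMain10 line.
has_entrypoint() {
    grep -Eq "^[[:space:]]*$1[[:space:]]*:[[:space:]]*$2[[:space:]]*\$"
}

# Demo on two lines copied from the vainfo output earlier in these notes.
sample='      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointVLD'

if printf '%s\n' "$sample" | has_entrypoint VAProfileHEVCMain VAEntrypointEncSlice; then
    echo "HEVC encode listed"
else
    echo "HEVC encode not listed"
fi

# Hypothetical usage on a live system:
#   vainfo 2>/dev/null | has_entrypoint VAProfileVP9Profile2 VAEntrypointVLD \
#       && echo "10-bit VP9 decode listed"
```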

Test VP9 Hardware Decoding using Intel Quick Sync via VA-API

    Command being timed: "ffmpeg -t 90 -i casual-test.webm -vcodec hevc -crf 22 -acodec copy -threads 8 -y -f matroska -benchmark casual-test.webm.sw-dec.90.mkv"
    User time (seconds): 470.27
    System time (seconds): 0.86
    Percent of CPU this job got: 739%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:03.72

    Command being timed: "../transcode.sh casual-test.webm"
    User time (seconds): 939.82
    System time (seconds): 1.63
    Percent of CPU this job got: 745%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 2:06.34

Add -t 90 -vaapi_device /dev/dri/renderD129 -hwaccel vaapi before input file.

Entire encoding for first 90 seconds:

    Command being timed: "ffmpeg -t 90 -vaapi_device /dev/dri/renderD129 -hwaccel vaapi -i casual-test.webm -vcodec hevc -crf 22 -acodec copy -threads 8 -y -f matroska -benchmark casual-test.webm.hw-dec.90.mkv"
    User time (seconds): 451.98
    System time (seconds): 2.30
    Percent of CPU this job got: 737%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:01.62

    Command being timed: "../transcode.sh casual-test.webm"
    User time (seconds): 904.02
    System time (seconds): 5.10
    Percent of CPU this job got: 737%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 2:03.31

Conclusion

File sizes are identical, so I'm assuming hw/sw decoding is deterministic? In which case Intel Quick Sync decoding is about 3% faster in wall-clock time, is essentially free, and leaves slightly more CPU for encoding, but encoding dominates the process.
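
The wall-clock difference can be worked out from the "Elapsed" fields above. A minimal sketch, assuming the m:ss.ss format that GNU time printed for these runs; to_seconds and percent_faster are hypothetical helper names:

```shell
#!/bin/sh
# Convert a GNU time wall-clock field ("m:ss.ss" or "h:mm:ss") to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Percentage by which the second run is faster than the first.
percent_faster() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

percent_faster 1:03.72 1:01.62   # 90-second clip: software vs hardware decode
percent_faster 2:06.34 2:03.31   # transcode.sh run: software vs hardware decode
```

Both comparisons use single runs, so the difference may be within run-to-run noise.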