FMADD and FMSUB instructions perform at least as well as the indexed
(by-element) forms of FMLA and FMLS.
For example, the Arm Cortex-A55 Software Optimization Guide lists "FP
multiply accumulate" FMADD, FMSUB instructions with a throughput of 2
IPC, whereas it lists "ASIMD FP multiply accumulate, by element" FMLA,
FMLS with a throughput of 1 IPC.
The Arm Cortex-A77 Software Optimization Guide, however, does not
separately list "by element" variants of the "ASIMD FP multiply
accumulate" instructions, which are listed with the same throughput of 2
IPC as "FP multiply accumulate" instructions.
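
For illustration only, here is a minimal sketch of the kind of source that
can reach this choice, assuming the affected case is a scalar fused
multiply-add whose multiplicand is read from a vector lane. The type
`v4f` and the function `fma_lane` are hypothetical names, the vector type
relies on the GCC/Clang `vector_size` extension, and the instruction
actually selected depends on the compiler version and target CPU.

```cpp
#include <cmath>

// Hypothetical 4 x float vector type (GCC/Clang vector extension).
typedef float v4f __attribute__((vector_size(16)));

// Scalar fused multiply-add: acc + x * v[0].
// Lane 0 is used because, on AArch64, the low 32 bits of a vector
// register are also addressable as a scalar S register, so no separate
// lane move is needed. A backend may lower this either to an indexed
// FMLA or to a scalar FMADD; which one is chosen is not guaranteed here.
float fma_lane(float acc, float x, v4f v) {
    return std::fma(x, v[0], acc);
}
```

Compiling this for an AArch64 target at -O2 and inspecting the assembly is
one way to see which form a given compiler picks.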
The result type of (EXTRACT_SUBREG ...) here seems to be deduced as i16, which triggers an assertion failure after this rule is applied.
Specifying the result type explicitly fixes the crashes in these tests: