Tim remarked that the added patterns produce wrong code when the fsub instruction has a multiplication as its first operand, i.e., in all the FMLSv*_OP1 patterns:

define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
; CHECK-LABEL: test_FMLSv8f16_OP1:
; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
entry:
  %mul = fmul fast <8 x half> %c, %b
  %sub = fsub fast <8 x half> %mul, %a
  ret <8 x half> %sub
}

This doesn't look right to me. The exact instruction produced is "fmls v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the IR is calculating "v2*v1 - v0". The equivalent <4 x float> code also doesn't emit an fmls.

This patch instead generates an fmla and negates the second operand of the fsub, so that "mul - a" is computed as "(-a) + mul".
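
The sign problem and the fix can be checked with a small scalar model. This is only an illustrative sketch: the helper names (fmla, fmls, ir_result, wrong_lowering, fixed_lowering, check_example) are hypothetical and not part of the patch, and the model treats a single lane with ordinary double arithmetic, ignoring the fused rounding of the real instructions.

```cpp
#include <cassert>

// Assumed per-lane semantics, modelled on one scalar lane:
//   fmla acc, n, m  ->  acc + n * m
//   fmls acc, n, m  ->  acc - n * m
double fmla(double acc, double n, double m) { return acc + n * m; }
double fmls(double acc, double n, double m) { return acc - n * m; }

// What the IR in the test computes: %sub = (%c * %b) - %a
double ir_result(double a, double b, double c) { return c * b - a; }

// The reported miscompile: fmls with %a as the accumulator gives a - c*b.
double wrong_lowering(double a, double b, double c) { return fmls(a, c, b); }

// The fix described above: negate %a, then accumulate with fmla.
double fixed_lowering(double a, double b, double c) { return fmla(-a, c, b); }

// Small integer-valued inputs keep the arithmetic exact.
void check_example() {
  const double a = 1.0, b = 2.0, c = 3.0;
  assert(ir_result(a, b, c) == 5.0);        // 3*2 - 1
  assert(wrong_lowering(a, b, c) == -5.0);  // 1 - 3*2, sign flipped
  assert(fixed_lowering(a, b, c) == 5.0);   // (-1) + 3*2, matches the IR
}
```

Calling check_example() passes for these inputs, which shows only that the fmls form has the wrong sign and the negate-plus-fmla form agrees with the IR; it says nothing about rounding or the half-precision vector types.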

While inspecting the pattern match, I found another mistake in the opcode to be selected: matching FMULv4*16 should generate FMLSv4*16 and not FMLSv2*32.

Tested on aarch64-linux with make check-all.