This is an archive of the discontinued LLVM Phabricator instance.

[aarch64] fix generation of fp16 fmls
ClosedPublic

Authored by sebpop on Sep 24 2019, 3:10 PM.

Details

Summary

Tim remarked that the added patterns produce wrong code in case the fsub
instruction has a multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:

define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
; CHECK-LABEL: test_FMLSv8f16_OP1:
; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
entry:

%mul = fmul fast <8 x half> %c, %b
%sub = fsub fast <8 x half> %mul, %a
ret <8 x half> %sub

}

This doesn't look right to me. The exact instruction produced is "fmls
v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
doesn't emit an fmls.

This patch generates an fmla and negates the value of the operand2 of the fsub.
Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.

Tested on aarch64-linux with make check-all.

Diff Detail

Event Timeline

sebpop created this revision.Sep 24 2019, 3:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 24 2019, 3:10 PM
SjoerdMeijer added inline comments.Oct 7 2019, 8:28 AM
llvm/test/CodeGen/AArch64/fp16-fmla.ll
163

Why are we not generating a fmls?

And a nit, but perhaps actually just using registers v0, v1, and v2 here makes things clearer?

sebpop marked an inline comment as done.Oct 7 2019, 11:24 AM
sebpop added inline comments.
llvm/test/CodeGen/AArch64/fp16-fmla.ll
163

That is part of the problem that Tim pointed out: when the multiply is the first operand of fsub, i.e.,

%sub = fsub fast <8 x half> %mul, %a

that should not generate a fused multiply sub.
With this patch, for b * c - a we negate the value of a and generate a fused multiply add -a + b * c.

SjoerdMeijer accepted this revision.Oct 8 2019, 1:33 AM

Cheers, lgtm

llvm/test/CodeGen/AArch64/fp16-fmla.ll
163

Thanks, I just got myself confused here.

This revision is now accepted and ready to land.Oct 8 2019, 1:33 AM
This revision was automatically updated to reflect the committed changes.