This is an archive of the discontinued LLVM Phabricator instance.

Implement aarch64 neon instruction class AdvSIMD (by element) - Clang
ClosedPublic

Authored by Jiangning on Sep 25 2013, 2:01 AM.

Details

Reviewers
t.p.northover

Diff Detail

Event Timeline

Jiangning updated this revision to Unknown Object (????). Sep 27 2013, 2:50 AM

Hi Jiangning,

Mostly looks reasonable. I've just got one question.

utils/TableGen/NeonEmitter.cpp
1571–1572

Why are these distinct? As far as I can see they are treated in exactly the same way by TableGen. Or am I missing something not shown by a simple grep; some endsWith('Q') weirdness or something?

Jiangning updated this revision to Unknown Object (????). Sep 29 2013, 8:25 PM

Hi Tim,

I removed all newly added xxx_LNQ enums.

Thanks,
-Jiangning

utils/TableGen/NeonEmitter.cpp
1571–1572

Yeah! Kevin also mentioned this problem to me. I uploaded a new version and removed all newly added xxx_LNQ enums.

t.p.northover added inline comments. Sep 30 2013, 2:31 AM
utils/TableGen/NeonEmitter.cpp
1553–1554

Sorry, I should have spotted this earlier, but I think that an fma is semantically distinct from the C expression "a+b*c".

The latter we *can* transform into an fma in -ffast-math mode, but not generally. So we probably need an intrinsic to make sure that if the user asks for a fused multiply, that's the operation they get.

Actually, there's already a generic LLVM intrinsic to handle this: "@llvm.fma.*"; we should probably use that rather than an ARM/AArch64-specific one.

Jiangning added inline comments. Sep 30 2013, 4:21 AM
utils/TableGen/NeonEmitter.cpp
1553–1554

Tim,

I see your point, so do you mean we must generate fmla instruction for intrinsic function vfma_lane_f32(), no matter if it is in -ffast-math mode or not? Then I think we have to generate fmls for intrinsic function vfms_lane_f32() as well. I don't see LLVM IR has @llvm.fms.* defined, so we have to define an aarch64 specific LLVM intrinsic, or we can use an expression containing llvm.fma.* to represent it? Since it's a fused calculation, I doubt we should do the latter. I want to confirm with you before modifying the code.

Thanks,
-Jiangning

Hi Jiangning,

I see your point, so do you mean we must generate fmla instruction for intrinsic function vfma_lane_f32(), no matter if it is in -ffast-math mode or not? Then I think we have to generate fmls for intrinsic function vfms_lane_f32() as well.

I believe so.

I don't see LLVM IR has @llvm.fms.* defined, so we have to define an aarch64 specific LLVM intrinsic, or we can use an expression containing llvm.fma.* to represent it?

I think I worked out that it was equivalent to @llvm.fma(-x, y, z)
(and @llvm.fma(x, -y, z)). The negation is exact, and the fusing works
out to be the same for "z + (-x)*y" as for "z - x*y".

By the way, be wary of the operand order. @llvm.fma(x,y,z) calculates
"x*y+z", but "fmla x, y, z" calculates x + y*z. I *think* both me and
Ana got that wrong at least once. I know I did.

Cheers.

Tim.

Jiangning updated this revision to Unknown Object (????). Oct 1 2013, 6:42 AM

Tim,

I modified the ACLE intrinsics for fma/fms to use @llvm.fma.*.

Thanks,
-Jiangning

test/CodeGen/aarch64-neon-2velem.c
5

Add the test without -fp-contract=fast.

t.p.northover accepted this revision. Apr 3 2014, 4:48 AM
Eugene.Zelenko closed this revision. Oct 4 2016, 6:48 PM
Eugene.Zelenko added a subscriber: Eugene.Zelenko.

Committed in rL191945.