Page MenuHomePhabricator

[AArch64] Add ARMv8.2-A FP16 scalar intrinsics
ClosedPublic

Authored by az on Jan 5 2018, 4:41 PM.

Details

Reviewers
SjoerdMeijer
Summary

ARMv8.2-A introduces half-precision floating point data processing. This patch adds the fp16 scalar intrinsics for this architecture as described in the ARM ACLE document. Only the front-end intrinsic work is done here. Some backend work related to instruction selection still needs to be done.

This work is a continuation of https://reviews.llvm.org/D32511 which addressed ARMv8.2-A vector intrinsics.

Diff Detail

Event Timeline

az created this revision.Jan 5 2018, 4:41 PM
fhahn added a subscriber: fhahn.Jan 6 2018, 1:00 PM

Thanks for working on this!
Some comments inline.

clang/include/clang/Basic/arm_fp16.td
20

nit: trailing whitespace.

59

There's a little bit of duplication here: the definitions above are the same as in arm_neon.td. Would it be easy to share this, with e.g. an include?

80

trailing whitespace

89

trailing whitespace

90

Nit: for the definitions below, indentation is sometimes a bit off. I.e. some defs have 1 space after the semicolon others have 2.

clang/lib/CodeGen/CGBuiltin.cpp
4104

Looks like a few intrinsic descriptions are missing here. For example, the first 2-operand intrinsic vaddh_f16 is missing, but there are also more. Is this intentional, or might they have slipped through the cracks (or am I missing something)?

llvm/include/llvm/IR/IntrinsicsAArch64.td
149

This and the other changes in this file are changes to LLVM. Do we need these changes for this patch? It doesn't look like it.

360

Forgot to remove this?

az updated this revision to Diff 129360.Jan 10 2018, 4:06 PM
az marked 8 inline comments as done.Jan 10 2018, 4:54 PM
az added inline comments.
clang/include/clang/Basic/arm_fp16.td
59

The duplication is small compared to the overall infrastructure/data structure needed to automatically generate the intrinsics. There are 3 ways to do this: 1) copy only the needed data structure in arm_fp16.td (this is what was done in original review) 2) put all data structure in a newly created file and include it in arm_neon.td and arm_fp16.td (done here). 3) put only the duplication in a new file and include it. I did not go for this one given that we create a new file for the only purpose of avoiding a small duplication but I am fine of going with 3 too. Note that some of the duplicated structure in the original arm_fp16.td was a stripped down version of the copied one.

clang/lib/CodeGen/CGBuiltin.cpp
4104

I agree that this is confusing. For the intrinsics listed in this table, code generation happens in a generic way based on the info in the table. The ones not listed in this table are addressed in a more specific way below in a the function called EmitAArch64BuiltinExpr. While I do not like how few things were implemented in generating the intrinsics, I am in general following the approach taken for arm_neon instead of introducing a new approach.

llvm/include/llvm/IR/IntrinsicsAArch64.td
149

Some tests in aarch64-v8.2a-fp16-intrinsics.c will fail for me without these changes. In clang/lib/CodeGen/BackendUtil.cpp, there is code there that includes llvm files and header files. It fails there if I do not fix IntrinsicAArch64.td. If you know of a better way to test differently without the need for llvm, then let me know. For example, if I remove the option flag -S from testing (i.e from aarch64-v8.2a-fp16-intrinsics.c), then there is no need to llvm but I won't be able to compare results.

SjoerdMeijer added inline comments.Jan 11 2018, 1:35 AM
clang/include/clang/Basic/arm_fp16.td
59

Given that the duplication is tiny, I don't have strong opinions to be honest. Would be nice to share these definitions if that's easy to do, otherwise we can perfectly live with this I think.

clang/lib/CodeGen/CGBuiltin.cpp
4104

Ah, I see, I somehow missed that. Fair enough.

6511

nit: indentation seems off by 1 for these case statements.

6531

nit: indentation seems off by 1

clang/utils/TableGen/NeonEmitter.cpp
2464

more nits: I see this is copied from above, but I think this and the next line can be on the same line, just increasing readability a tiny bit.

2468

same here

2470

and same here

llvm/include/llvm/IR/IntrinsicsAArch64.td
250

There's a scalar and vector variant of FMAX and thus I am wondering if we don't need two definitions here: one using AdvSIMD_2FloatArg_Intrinsic and the other AdvSIMD_2VectorArg_Intrinsic?

262

Same here for FMIN

az updated this revision to Diff 129513.Jan 11 2018, 1:50 PM
az marked 3 inline comments as done.
az marked 6 inline comments as done.Jan 11 2018, 2:26 PM
az added inline comments.
clang/include/clang/Basic/arm_fp16.td
59

So, let's keep the current version for now which is: all generic stuff goes into the file called arm_neon_incl.td. All specific code that creates and generates the intrinsic goes into specific files arm_neon.td and arm_fp16.td which include the generic file. This will work well when we create new .td file for future features if any.

llvm/include/llvm/IR/IntrinsicsAArch64.td
250

Maybe we can do that but there are many instances where vector and scalar share the same intrinsic name ( note that the type such as f16 or v4f32 is appended to that name). I have not checked carefully if what you propose already exists or not but it does seem less common from first look.

This revision is now accepted and ready to land.Jan 17 2018, 12:39 AM
SjoerdMeijer closed this revision.Feb 8 2018, 7:15 AM

Committed as r323005