This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add clang builtins support for gfx90a
ClosedPublic

Authored by gandhi21299 on Jul 27 2021, 12:55 PM.

Details

Summary

Implement target builtins for gfx90a including fadd64, fadd32, add2h, max and min on various global, flat and ds address spaces for which intrinsics are already implemented.

@rampitec Compiler recommended me to add global-noret target feature after setting it in BuiltinsAMDGPU.def. I am not sure what that means outside of the BuiltinsAMDGPU.def so I have changed it back to gfx90a-insts.

Diff Detail

Event Timeline

gandhi21299 created this revision.Jul 27 2021, 12:55 PM
gandhi21299 requested review of this revision.Jul 27 2021, 12:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2021, 12:55 PM
  • added more tests with supported combinations of address spaces
  • minor nits
gandhi21299 updated this revision to Diff 362487.EditedJul 28 2021, 11:54 AM
  • added scope argument
  • removed irrelevant tests and updated them
  • replaced constant by generic qualifier for two of the test functions
  • removed kernel from functions taking in __generic qualified addr
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2021, 1:42 PM
  • eliminated scope argument, seemed irrelevant

fixed tests by replacing checks for the functions with generic-qualified arguments.

Missing sema test for rejecting the builtins on targets that don't support this

clang/include/clang/Basic/BuiltinsAMDGPU.def
201

"_2f16" looks weird to me. The instruction names call it "pk"

clang/lib/CodeGen/CGBuiltin.cpp
16247

Initializing this here is strange, sink down to the double case

clang/test/CodeGenOpenCL/builtins-fp-atomics.cl
112 ↗(On Diff #362804)

If you're going to bother testing the ISA, is it worth testing rtn and no rtn versions?

yaxunl added inline comments.Jul 30 2021, 6:47 AM
clang/include/clang/Basic/BuiltinsAMDGPU.def
201

This is to have a consistent postfix naming convention, since the stem part here are the same. the postfix is for the argument type of the builtin function.

gandhi21299 added inline comments.Jul 30 2021, 8:30 AM
clang/lib/CodeGen/CGBuiltin.cpp
16247

You mean push it down after line 16119?

clang/test/CodeGenOpenCL/builtins-fp-atomics.cl
112 ↗(On Diff #362804)

Sorry, what do you mean by rtn version?

arsenm added inline comments.Jul 30 2021, 9:02 AM
clang/include/clang/Basic/BuiltinsAMDGPU.def
201

Just a plain 2 isn't consistent either. The llvm type naming convention would add a v prefix, but the builtins should probably follow the instructions

clang/lib/CodeGen/CGBuiltin.cpp
16247

Yes

clang/test/CodeGenOpenCL/builtins-fp-atomics.cl
112 ↗(On Diff #362804)

Most atomics can be optimized if they don't return the in memory value if the value is unused

gandhi21299 marked 7 inline comments as done.Jul 30 2021, 11:10 AM
gandhi21299 added inline comments.
clang/include/clang/Basic/BuiltinsAMDGPU.def
201

Yea, v2f16 looks reasonable.

gandhi21299 marked an inline comment as done.

addressed reviewers' feedback:

  • changed a builtin name,
  • corrected tests,
  • minor formatting nits

refreshing patch

rampitec requested changes to this revision.Aug 2 2021, 2:53 PM

Needs an IR test, a test for different supported targets, and a negative test for unsupported features.

clang/include/clang/Basic/BuiltinsAMDGPU.def
199

Correct attribute for this one in atomic-fadd-insts. In particular it was first added in gfx908 and you would need to test it too.

205

Flat address space is 0.

210

This is available since gfx8. Attribute gfx8-insts.

clang/lib/CodeGen/CGBuiltin.cpp
16208

You do not need any of that code. You can directly map a builtin to intrinsic in the IntrinsicsAMDGPU.td.

clang/test/CodeGenOpenCL/builtins-fp-atomics.cl
112 ↗(On Diff #362804)

Certainly yes, because global_atomic_add_f32 did not have return version on gfx908.

This revision now requires changes to proceed.Aug 2 2021, 2:53 PM
rampitec added a reviewer: t-tye.
gandhi21299 marked 3 inline comments as done.Aug 2 2021, 3:45 PM
gandhi21299 added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
16208

Sorry, I looked around for several days but I could not figure this out. Is there a concrete example?

rampitec added inline comments.Aug 2 2021, 3:46 PM
clang/lib/CodeGen/CGBuiltin.cpp
16208

Every instantiation of GCCBuiltin in the IntrinsicsAMDGPU.td.

arsenm added inline comments.Aug 2 2021, 3:47 PM
clang/lib/CodeGen/CGBuiltin.cpp
16208

This is not true if the intrinsic requires type mangling. GCCBuiltin is too simple to handle it

rampitec added inline comments.Aug 2 2021, 3:49 PM
clang/lib/CodeGen/CGBuiltin.cpp
16208

Yes, but these do not need it. All of these builtins are specific.

arsenm added inline comments.Aug 2 2021, 3:50 PM
clang/lib/CodeGen/CGBuiltin.cpp
16208

These intrinsics are all mangled based on the FP type

rampitec added inline comments.Aug 2 2021, 3:52 PM
clang/lib/CodeGen/CGBuiltin.cpp
16208

Ah, right. Intrinsics are mangled, builtins not. True. OK, this shall be code then.

rampitec added inline comments.Aug 2 2021, 3:54 PM
clang/lib/CodeGen/CGBuiltin.cpp
16266

Should we map flags since we already have them?

gandhi21299 marked 5 inline comments as done.Aug 3 2021, 8:31 AM
gandhi21299 added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
16266

Do you mean the memory order flag?

gandhi21299 marked an inline comment as done.Aug 3 2021, 8:37 AM

@rampitec how do I handle the following?

builtins-fp-atomics.cl:38:10: error: '__builtin_amdgcn_global_atomic_fadd_f64' needs target feature atomic-fadd-insts
  *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x, memory_order_relaxed);
         ^

@rampitec how do I handle the following?

builtins-fp-atomics.cl:38:10: error: '__builtin_amdgcn_global_atomic_fadd_f64' needs target feature atomic-fadd-insts
  *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x, memory_order_relaxed);
         ^

It is f64, it needs gfx90a-insts. atomic-fadd-insts is for global f32.

clang/lib/CodeGen/CGBuiltin.cpp
16266

All 3: ordering, scope and volatile.

@rampitec what should I be testing exactly in the IR test?

@rampitec what should I be testing exactly in the IR test?

Produced call to the intrinsic. All of these tests there doing that.

gandhi21299 added inline comments.Aug 3 2021, 3:52 PM
clang/lib/CodeGen/CGBuiltin.cpp
16266

Following the discussion, what change is required here?

rampitec added inline comments.Aug 3 2021, 3:53 PM
clang/lib/CodeGen/CGBuiltin.cpp
16266

Keep zeroes, drop immediate argument of the builtins.

updating test

gandhi21299 added inline comments.Aug 4 2021, 9:13 AM
clang/include/clang/Basic/BuiltinsAMDGPU.def
201

I tried add target feature gfx908-insts for this builtin but the frontend complains that it should have target feature gfx90a-insts.

foad removed a subscriber: foad.Aug 4 2021, 9:17 AM
rampitec added inline comments.Aug 4 2021, 10:11 AM
clang/include/clang/Basic/BuiltinsAMDGPU.def
201

That was for global_atomic_fadd_f32, but as per discussion we are going to use builtin only starting from gfx90a because of the noret problem. Comments in the review are off their positions after multiple patch updates.

210

This needs tests with a gfx8 target and a negative test with gfx7.

clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
14

Use _f64 or _double in the test name.

32

Same here and in other double tests, use a suffix f64 or double.

68

'constant' is wrong. It is flat. Here and everywhere.

clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx908.cl
7 ↗(On Diff #364152)

Need to check all other buintins too.

gandhi21299 marked 9 inline comments as done.Aug 4 2021, 1:54 PM
gandhi21299 marked an inline comment as done.
  • added more negative tests
  • fixed some tests
rampitec added inline comments.Aug 4 2021, 2:40 PM
clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx7.cl
8 ↗(On Diff #364242)

Add new line.

clang/test/CodeGenOpenCL/unsupported-fadd2f16-gfx908.cl
1 ↗(On Diff #364242)

Combine all of these gfx908 error tests into a single file. For example like in the builtins-amdgcn-dl-insts-err.cl. It is also better to rename these test filenames to follow the existing pattern: builtins-amdgcn-*-err.cl

gandhi21299 marked 2 inline comments as done.Aug 4 2021, 3:13 PM
  • combined tests into a single file
  • renamed tests for consistency
rampitec accepted this revision.Aug 4 2021, 3:17 PM

LGTM

This revision is now accepted and ready to land.Aug 4 2021, 3:17 PM

Thanks a lot for the review, I will merge this patch in :)

I will merge this patch in as soon as the builds are successful.

This revision was automatically updated to reflect the committed changes.