This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Export target workitem related builtins
ClosedPublic

Authored by jvesely on May 16 2016, 1:33 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

jvesely updated this revision to Diff 57391.May 16 2016, 1:33 PM
jvesely retitled this revision from to AMDGPU: Export target workitem related builtins.
jvesely updated this object.
jvesely added a reviewer: tstellarAMD.
jvesely set the repository for this revision to rL LLVM.
jvesely added subscribers: arsenm, llvm-commits.

Comment messages LANGBUILTIN but not used.

We don't need builtins for the ones that just read off an implicit offset from the kern arg pointer.

I also had a partial patch for this that added custom codegen which added !range metadata for the maximum group sizes on the call sites. Do you want to try adding that?

arsenm added inline comments.May 16 2016, 1:46 PM
include/clang/Basic/BuiltinsAMDGPU.def
16–18

I would move these after the amdgcn and fix this to be r600-NI

jvesely updated this revision to Diff 57512.May 17 2016, 1:06 PM
jvesely removed rL LLVM as the repository for this revision.

use LANDGBUILTIN
drop amdgcn intrinsics that are just reading off the parameter vector

jvesely marked an inline comment as done.May 17 2016, 1:06 PM
jvesely updated this revision to Diff 57947.May 20 2016, 10:06 AM
jvesely updated this object.
jvesely set the repository for this revision to rL LLVM.

add kernarg segment intrinsic

jvesely updated this revision to Diff 57950.May 20 2016, 10:14 AM

kernarg segment return const as pointer

arsenm added inline comments.May 20 2016, 10:35 AM
include/clang/Basic/BuiltinsAMDGPU.def
88

These shouldn't be lang builtin, I just meant the commit message said so

test/CodeGenOpenCL/builtins-amdgcn.cl
295

range metadata check?

jvesely updated this revision to Diff 57960.May 20 2016, 11:28 AM

revert back to BUILTIN

jvesely marked an inline comment as done.May 22 2016, 4:47 PM
jvesely added inline comments.
include/clang/Basic/BuiltinsAMDGPU.def
88

sorry, it was kind of hard to parse, I mentioned LANGBUILTIN as a possible alternative.
another alternative is to have clc specific builtins, e.g

__builtin_clc_get_work_dim()
__builtin_clc_get_global_offset(int)
...

and implement them for every target

test/CodeGenOpenCL/builtins-amdgcn.cl
295

range metadata is not generated.
if I understand the code correctly, we'd need to drop __builtin names from

include/llvm/IR/IntrinsicsAMDGPU.td

and emit each intrinsic manually in

lib/CodeGen/CGBuiltin.cpp

with added range metadata.

I can take a look but I think it should be a separate patch

tstellarAMD accepted this revision.May 24 2016, 5:41 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.May 24 2016, 5:41 PM
jvesely planned changes to this revision.May 26 2016, 7:57 AM
jvesely marked an inline comment as done.

I'll add range metadata to this patch. It should be easier to do it right away than to synch changes with llvm to avoid test failures.

jvesely updated this revision to Diff 58669.May 26 2016, 12:52 PM
jvesely edited edge metadata.

emit intrinsics manually and add range metadata.
not sure where to get authoritative source on global size/wg_id limits.
clover reports 256, but that seems bogus.

This revision is now accepted and ready to land.May 26 2016, 12:52 PM
jvesely requested a review of this revision.May 26 2016, 12:53 PM
jvesely edited edge metadata.

it'd be nice if the accepted state reset after updating the diff...

I've been using 2048. That is the theoretical limit on SI. The limit might have decreased to 1024 on newer hardware

AMD OpenCL has only ever exposed 256 though. Supposedly there was no benefit to larger groups, and I think there might be some hardware bugs at 1024

AMD OpenCL has only ever exposed 256 though. Supposedly there was no benefit to larger groups, and I think there might be some hardware bugs at 1024

OK, let's stick with 256

jvesely updated this revision to Diff 58685.May 26 2016, 1:34 PM
jvesely edited edge metadata.

only local size and wi id are range restricted

jvesely marked 2 inline comments as done.May 26 2016, 1:35 PM
jvesely updated this revision to Diff 58717.May 26 2016, 4:13 PM

ids are [0,256), size is [1,257)

tstellarAMD accepted this revision.May 27 2016, 4:54 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.May 27 2016, 4:54 PM
jvesely updated this revision to Diff 59005.May 30 2016, 4:24 PM
jvesely edited edge metadata.

drop "segment" from the implicitarg.ptr builtin

jvesely updated this revision to Diff 59081.May 31 2016, 8:57 AM

fixup test after renaming implicitarg builtin

jvesely updated this revision to Diff 62536.Jul 1 2016, 2:01 PM

switch r600 to implicitarg.ptr

jvesely requested a review of this revision.Jul 1 2016, 2:01 PM
jvesely edited edge metadata.
tstellarAMD added inline comments.Jul 4 2016, 5:29 PM
lib/CodeGen/CGBuiltin.cpp
7648–7661

Are you sure 256 is the upper bounds for these?

jvesely added inline comments.Jul 4 2016, 8:09 PM
lib/CodeGen/CGBuiltin.cpp
7648–7661

I'm pretty sure it's not. There was a short discussion earlier in this revision.
OpenGL requires at least 1024x1024x64 (1024 total) for compute shaders, so I'd say hw supports at least those sizes.
EG/CM ISA specs don't say.
SI/CI/VI ISA specs say 1024.
Mesa exposes either 256 or 2048.

Larger sets can be faked using GDS, but since there is no lower bound in OpenCL it'd be nice to have (efficient) hw limits here

tstellarAMD added inline comments.Jul 5 2016, 4:55 AM
lib/CodeGen/CGBuiltin.cpp
7648–7661

I think we should use 1024 here, Using a number too low will generate incorrect code. It's also probably same to assume the 1024 is limit for EG/CM too.

A follow up improvement might be to check for OpenCL related function attributes, like reqd_work_group_size and use that to emit a smaller range.

jvesely updated this revision to Diff 62767.Jul 5 2016, 9:55 AM
jvesely edited edge metadata.

change limit to 1024

jvesely marked 3 inline comments as done.Jul 5 2016, 9:55 AM
tstellarAMD accepted this revision.Jul 8 2016, 5:51 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Jul 8 2016, 5:51 PM
This revision was automatically updated to reflect the committed changes.