Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Comment messages LANGBUILTIN but not used.
We don't need builtins for the ones that just read off an implicit offset from the kern arg pointer.
I also had a partial patch for this that added custom codegen which added !range metadata for the maximum group sizes on the call sites. Do you want to try adding that?
include/clang/Basic/BuiltinsAMDGPU.def | ||
---|---|---|
16–18 | I would move these after the amdgcn and fix this to be r600-NI |
use LANDGBUILTIN
drop amdgcn intrinsics that are just reading off the parameter vector
include/clang/Basic/BuiltinsAMDGPU.def | ||
---|---|---|
88 | sorry, it was kind of hard to parse, I mentioned LANGBUILTIN as a possible alternative. __builtin_clc_get_work_dim() __builtin_clc_get_global_offset(int) ... and implement them for every target | |
test/CodeGenOpenCL/builtins-amdgcn.cl | ||
295 | range metadata is not generated.
and emit each intrinsic manually in
with added range metadata. I can take a look but I think it should be a separate patch |
I'll add range metadata to this patch. It should be easier to do it right away than to synch changes with llvm to avoid test failures.
emit intrinsics manually and add range metadata.
not sure where to get authoritative source on global size/wg_id limits.
clover reports 256, but that seems bogus.
I've been using 2048. That is the theoretical limit on SI. The limit might have decreased to 1024 on newer hardware
AMD OpenCL has only ever exposed 256 though. Supposedly there was no benefit to larger groups, and I think there might be some hardware bugs at 1024
lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
7648–7661 | Are you sure 256 is the upper bounds for these? |
lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
7648–7661 | I'm pretty sure it's not. There was a short discussion earlier in this revision. Larger sets can be faked using GDS, but since there is no lower bound in OpenCL it'd be nice to have (efficient) hw limits here |
lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
7648–7661 | I think we should use 1024 here, Using a number too low will generate incorrect code. It's also probably same to assume the 1024 is limit for EG/CM too. A follow up improvement might be to check for OpenCL related function attributes, like reqd_work_group_size and use that to emit a smaller range. |
I would move these after the amdgcn and fix this to be r600-NI