Page MenuHomePhabricator

AMDGPU: Fix supported CL features

Authored by jvesely on May 18 2016, 2:46 PM.

Diff Detail


Event Timeline

jvesely updated this revision to Diff 57682.May 18 2016, 2:46 PM
jvesely retitled this revision from to AMDGPU: Fix supported CL features.
jvesely updated this object.
jvesely added reviewers: arsenm, tstellarAMD.
jvesely set the repository for this revision to rL LLVM.
jvesely added subscribers: llvm-commits, Anastasia, bader, yaxunl.
jvesely updated this object.May 18 2016, 2:56 PM
jvesely updated this revision to Diff 57692.May 18 2016, 3:42 PM

add tests
EG support the same feature set as NI

yaxunl added inline comments.May 19 2016, 7:18 AM
5 ↗(On Diff #57692)
#pragma OPENCL EXTENSION cl_xxx : enable
jvesely updated this revision to Diff 57829.May 19 2016, 11:35 AM

remove subgroups support.
better tests.

note: I had to add OPENCL_VERSION defines manually, these can be removed once clang correctly reports the version

jvesely updated this revision to Diff 57845.May 19 2016, 12:57 PM

even better tests
use existing OPENCL_C_VERSION
use -cl-std=CL for explicit OpenCL1.0

arsenm added inline comments.May 25 2016, 11:48 AM
2111 ↗(On Diff #57845)

What is the current problem with subgroups? There's no reason we can't support this

jvesely planned changes to this revision.May 25 2016, 2:51 PM
jvesely marked an inline comment as done.

I'll update this patch based on the outcome of D20447

2111 ↗(On Diff #57845)

no problem (other than not implementing any of the required builtins, or exposing cl2.0), since having 1 subgroup per threadgroup is perfectly OK (dunno why this is even an extension).
The bigger question is whether we want to export all potentially available extensions, or only those that are actually implemented.

jvesely updated this revision to Diff 58850.May 27 2016, 3:38 PM
jvesely updated this object.
jvesely marked an inline comment as done.
jvesely added a subscriber: cfe-commits.

report only exported extensions.

int64 atomics, fp16, and 3d image writes are actually not currently exported, but should be more or less done for SI+

jvesely marked an inline comment as done.May 27 2016, 3:38 PM
arsenm added inline comments.May 27 2016, 4:06 PM
4 ↗(On Diff #58850)

We should probably add builtin device macros for these

jvesely added inline comments.May 29 2016, 10:44 AM
4 ↗(On Diff #58850)

When we ran into similar problem while adding fma. We opted for feature macro(HAS_FMAF) instead of device macro. I suppose I can add HAS_FP64, but it won't have many uses since everyone checks for cl_khr_fp64.

jvesely updated this revision to Diff 59080.May 31 2016, 8:55 AM
jvesely updated this object.

add has_fp64 macro

jvesely marked 2 inline comments as done.May 31 2016, 8:56 AM
arsenm added inline comments.May 31 2016, 10:20 AM
2024–2025 ↗(On Diff #59080)

I don't think we need this. I want device macros for other tuning and intrinsic availability reasons. Right now there are builtins that only work on some subtargets but no way to test for that

jvesely added inline comments.May 31 2016, 11:50 AM
2024–2025 ↗(On Diff #59080)

Why not have macro per feature that determines intrinsic/optimization availability?
it was preferred last year [0]
at least on r600 it works nicer than separating EG/EG+FP64/NI_but_eg_isa/NI+FP64+CM_ISA
and the feature selection is done in two places (llvm+clang) instead of every piece of compiled code.


Anastasia added inline comments.Jun 1 2016, 8:33 AM
1 ↗(On Diff #59080)

Negative testing?

jvesely updated this revision to Diff 60259.Jun 9 2016, 3:20 PM

tests all extensions against expected outcome (add negative tests)
enable cl_khr_icd (works ok with mesa)

jvesely marked an inline comment as done.Jun 9 2016, 3:20 PM
arsenm accepted this revision.Jun 16 2016, 5:12 PM
arsenm edited edge metadata.


This revision is now accepted and ready to land.Jun 16 2016, 5:12 PM
This revision was automatically updated to reflect the committed changes.