When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.
Details
- Reviewers
jlebar echristo - Commits
- rG24e8a680e545: [NVPTX, CUDA] Improved feature constraints on NVPTX target builtins.
rG2f8efcf3ca05: [NVPTX] Removed 'satom' feature which is no longer used.
rL329830: [NVPTX] Removed 'satom' feature which is no longer used.
rC329830: [NVPTX] Removed 'satom' feature which is no longer used.
rL329829: [NVPTX, CUDA] Improved feature constraints on NVPTX target builtins.
rC329829: [NVPTX, CUDA] Improved feature constraints on NVPTX target builtins.
Diff Detail
- Build Status
Buildable 16950 Build 16950: arc lint + arc unit
Event Timeline
Run the tests with -target-cpu sm_61 to make sure intrinsics that require sm_60 are still enabled.
clang/include/clang/Basic/Cuda.h | ||
---|---|---|
55 ↗ | (On Diff #140334) | Why 'static'? |
59 ↗ | (On Diff #140334) | Could we instead rely on the fact that the CudaArch enum is dense and goes from UNKNOWN to LAST? One less bug to make... |
clang/lib/Basic/Targets/NVPTX.cpp | ||
176 | Nit, if you spell it /*FeatureMap*/ here, it'll be clearer that you're intentionally ignoring that arg. |
clang/include/clang/Basic/Cuda.h | ||
---|---|---|
55 ↗ | (On Diff #140334) | Old habits. Just 'inline' should do. |
59 ↗ | (On Diff #140334) | Actually this function is not needed. It's a leftover from an old iteration of the code. I'll remove it. |
The llvm change and corresponding switch from satom->sm_60 in the front end is fine.
Let's talk about the rest of it more. I'm not sure I'm seeing the need here rather than the annotations that are already here. Can you elaborate more here on why we need an additional method when you've already got subtarget features for each of the ptx versions anyhow?
Thanks.
-eric
The patch intends to address two issues:
- mismatch on constraints between llvm and clang. E.g. hasPTX60() on LLVM side means "ptx60 or newer". "ptx60" in TARGET_BUILTIN on clang side means "ptx60" *only*. It is possible to address this within existing implementation by enumerating all PTX versions that are newer. It works OK for ptx60 as we only need to write "ptx60|ptx61". It gets more interesting for older GPUs E.g "ptx31+" would have to become "ptx31|ptx32|ptx40|ptx41|ptx42|ptx43|ptx50|ptx60|ptx61". Similar enumeration will need to happen for GPU version, which brings us to the next point
- NVIDIA keeps growing PTX versions and GPU variants. Recently they've changed CUDA release frequency to ~1/quarter and they tend to add minor variants of PTX and GPU versions fairly frequently. It's going to be an unnecessary maintenance headache as after every new introduced variant I'll need to go and update all NVPTX builtins that have nothing to do with the CUDA changes. Granted, it's not a showstopper, but it is an annoyance that is guaranteed to stay.
With this patch, TARGET_BUILTIN constraints become semantically identical to LLVM and we no longer need to chase every bump in PTX version or GPU variant.
@echristo convinced me that this functionality can be implemented without growing a target-specific hook for custom interpretation of constraints used in TARGET_BUILTIN. Instead, we can hide unwieldy feature lists behind a macro.
Guessing that SM_60 (etc) are defines?
Anyhow LGTM. I'm not sure you can split up the satom part of the patch, but if you can that'd be great.
-eric
Nit, if you spell it /*FeatureMap*/ here, it'll be clearer that you're intentionally ignoring that arg.