
Add isGPU() to Triple

Authored by gregrodgers on Dec 19 2016, 9:52 AM.



For common GPU processing, including OpenMP, gpucc (Google's open CUDA compiler), and OpenCL, a triple.isGPU() method is better than writing (triple.isNVPTX() || triple.getArch() == Triple::amdgcn).

Diff Detail

Event Timeline

gregrodgers retitled this revision from to Add isGPU() to Triple.
gregrodgers updated this object.
gregrodgers set the repository for this revision to rL LLVM.
gregrodgers added a project: Restricted Project.
gregrodgers added a subscriber: gregrodgers.
jlebar accepted this revision. Dec 19 2016, 10:51 AM
jlebar edited edge metadata.

For common GPU processing including OpenMP, gpucc (Google's open CUDA), and OpenCL

FYI Google's open CUDA compiler is clang. gpucc does not exist anymore. Google uses clang to compile CUDA code, full stop.

Please wrap lines to 80 characters. (I can never remember to do this myself, so I have a script that runs git-clang-format over all my patches before uploading them with arc diff.)

I presume you're adding this because you want to use it in some other patch? Could you link that here? Like, this seems sane, but I'd rather not add it if we're not going to use it anywhere.

This revision is now accepted and ready to land. Dec 19 2016, 10:51 AM
gregrodgers edited edge metadata.
gregrodgers removed rL LLVM as the repository for this revision.

Formatted to 80 characters.

jlebar added inline comments. Dec 19 2016, 11:02 AM

Missing period.

Thank you, Justin. Yes, I plan to use this extensively in clang for common OpenMP code generation, but I don't have those patches ready yet.
isGPU() may also be used for compilation of CUDA code to LLVM IR as an alternative to isNVPTX(). I will discuss with the Google authors first.
I formatted to 80 characters. Thank you for your patience with a new contributor.

I will discuss with the Google authors first.

You're speaking with one, hi. :) I looked through all uses of isNVPTX in LLVM and clang, and I don't believe any of them should be switched over to isGPU().

I would love to see some of the patches that you're preparing before we submit this, so we can confirm that it's actually useful. Hopefully we can do that without blocking you?

gregrodgers added a comment (edited). Dec 19 2016, 2:22 PM

I can email you a bigger patch from our development tree. I would rather not post it in public yet because it still needs some work. Here are two examples from this patch.

diff -Naur ibmv2trees/coral-clang/lib/CodeGen/CGBuiltin.cpp amdv2trees/coral-clang/lib/CodeGen/CGBuiltin.cpp
--- ibmv2trees/coral-clang/lib/CodeGen/CGBuiltin.cpp    2016-12-14 12:02:20.407997118 -0600
+++ amdv2trees/coral-clang/lib/CodeGen/CGBuiltin.cpp    2016-12-14 12:28:50.328752681 -0600
@@ -2455,9 +2455,7 @@
   case Builtin::BIprintf:
     if ((getLangOpts().CUDA && getLangOpts().CUDAIsDevice) ||
-        (getLangOpts().OpenMPIsDevice &&
-         (getTarget().getTriple().getArch() == llvm::Triple::nvptx ||
-          getTarget().getTriple().getArch() == llvm::Triple::nvptx64)))
+        (getLangOpts().OpenMPIsDevice && getTarget().getTriple().isGPU()))
       return EmitCUDADevicePrintfCallExpr(E, ReturnValue);
   case Builtin::BI__builtin_canonicalize:
@@ -2591,10 +2589,10 @@
   case llvm::Triple::ppc64le:
     return CGF->EmitPPCBuiltinExpr(BuiltinID, E);
   case llvm::Triple::r600:
-  case llvm::Triple::amdgcn:
     return CGF->EmitAMDGPUBuiltinExpr(BuiltinID, E);
   case llvm::Triple::systemz:
     return CGF->EmitSystemZBuiltinExpr(BuiltinID, E);
+  case llvm::Triple::amdgcn:
   case llvm::Triple::nvptx:
   case llvm::Triple::nvptx64:
     return CGF->EmitNVPTXBuiltinExpr(BuiltinID, E);

diff -Naur ibmv2trees/coral-clang/lib/Sema/SemaOpenMP.cpp amdv2trees/coral-clang/lib/Sema/SemaOpenMP.cpp
--- ibmv2trees/coral-clang/lib/Sema/SemaOpenMP.cpp      2016-12-14 12:02:20.459997521 -0600
+++ amdv2trees/coral-clang/lib/Sema/SemaOpenMP.cpp      2016-12-14 12:18:08.223518169 -0600
@@ -6827,7 +6827,7 @@
   // Build iteration variable initializer for simd loop on nvptx.
   bool OutlinedSimd = DKind == OMPD_simd &&
                       SemaRef.getLangOpts().OpenMPIsDevice &&
-                      SemaRef.Context.getTargetInfo().getTriple().isNVPTX();
+                      SemaRef.Context.getTargetInfo().getTriple().isGPU();
   ExprResult LaneInit;
   ExprResult NumLanes;
   if (OutlinedSimd) {
@@ -7132,7 +7132,7 @@
   bool CoalescedSchedule = false;
   bool CoalescedDistSchedule = false;
   if (SemaRef.getLangOpts().OpenMPIsDevice &&
-      SemaRef.Context.getTargetInfo().getTriple().isNVPTX()) {
+      SemaRef.Context.getTargetInfo().getTriple().isGPU()) {
     auto ScheduleClause =
     bool ChunkSizeOne = false;
jlebar requested changes to this revision. Dec 19 2016, 2:38 PM
jlebar edited edge metadata.

Here are two examples from this patch.


The reason I asked is that we should not assume "isGPU" <--> "isNVPTX || isAMDGCN". So I wanted to see whether there is actually a valid use case for this function in clang.

The change in CGBuiltin looks exactly like the kind of change I am concerned about. It is assuming that if we are compiling for *any* GPU, we want to lower printf using EmitCUDADevicePrintfCallExpr. I don't think we should do that. (In fact, maybe that function should be called EmitNVPTXDevicePrintfCallExpr.)

The OpenMP code doesn't yet exist in tree, so I am not sure whether or not that's a good idea.

Given that the patch posted here introduces a bug into CGBuiltin (or, it does if we ever add a third GPU target), I think we should proceed carefully here: I think we should wait to consider this patch until you have a complete patch to show that uses this function.

This revision now requires changes to proceed. Dec 19 2016, 2:38 PM

Justin, the commonality between nvptx and amdgcn LLVM IR is exactly why I would like isGPU(). I actually do want to assume that "isGPU" <--> "isNVPTX || isAMDGCN".

The logic in EmitNVPTXDevicePrintfCallExpr sets up a call to a GPU device library routine called vprintf. For amdgcn, we also need to call a device library function. I could copy EmitNVPTXDevicePrintfCallExpr to a new EmitAMDGPUDevicePrintfCallExpr. However, would it be better to reuse the logic and maintain this in one source file (CGCUDABuiltin.cpp in this case)? OK, I agree that using the NVPTX names feels weird. Maybe we should eventually change this to EmitGPUDevicePrintfCallExpr.

We are going to see many cases where the LLVM IR for ptx is quite similar to the LLVM IR for amdgcn. For example, in the generation of calls to grid intrinsics and builtins, it is possible to use the same ptx names and convert them to gcn names with inline library routines. GPUs really like inline functions anyway. I am working on a library called libcuda2gcn.bc that will convert cuda builtins to amdgcn builtins and/or amdgcn intrinsics when that library is llvm-linked to the generated amdgcn LLVM IR. This will allow clang codegen logic to be shared between ptx and gcn.

For now, I will expand isGPU() to isNVPTX() || (getArch() == amdgcn) and add a comment to change it to isGPU() if that ever becomes available.

Thank you.

I actually do want to assume that "isGPU" <--> "isNVPTX || isAMDGCN".

OK, I really do not think we should have this function if we're going to use it this way.

If LLVM ever grows support for another GPU architecture that (say) doesn't have a printf function, it will break this code in clang. We should not write code in clang that is fragile in this way.

For a similar reason, I don't think a comment saying "change to isGPU() if available" is called for here.

gregrodgers abandoned this revision. May 14 2020, 4:06 PM