This is an archive of the discontinued LLVM Phabricator instance.

Add isGPU() to Triple
AbandonedPublic

Authored by gregrodgers on Dec 19 2016, 9:52 AM.

Download Raw Diff

Details

Reviewers

chandlerc
carlo.bertolli
jlebar
jdoerfert

Summary

For common GPU processing including OpenMP, gpucc (google's open cuda), and OpenCL, this triple.isGPU() method is better than (triple.isNVPTX() || triple.getArch == Triple::amdgcn).

Diff Detail

Event Timeline

gregrodgers updated this revision to Diff 81968.Dec 19 2016, 9:52 AM

gregrodgers retitled this revision from to Add isGPU() to Triple.

gregrodgers updated this object.

gregrodgers added reviewers: jlebar, carlo.bertolli, chandlerc.

gregrodgers set the repository for this revision to rL LLVM.

gregrodgers added a project: Restricted Project.

gregrodgers added a subscriber: gregrodgers.

For common GPU processing including OpenMP, gpucc (google's open cuda), and OpenCL

FYI Google's open CUDA compiler is clang. gpucc does not exist anymore. Google uses clang to compile CUDA code, full stop. http://llvm.org/docs/CompileCudaWithLLVM.html#publication

Please wrap lines to 80 characters. (I can never remember to do this myself, so I have a script that runs git-clang-format over all my patches before uploading them with arc diff.)

I presume you're adding this because you want to use it in some other patch? Could you link that here? Like, this seems sane, but I'd rather not add it if we're not going to use it anywhere.

This revision is now accepted and ready to land.Dec 19 2016, 10:51 AM

Formatted to 80 characters.

jlebar added inline comments.Dec 19 2016, 11:02 AM

include/llvm/ADT/Triple.h
607	Missing period.

Thank you Justin, Yes, I plan to use this extensively in clang for common OpenMP code generation. But I don't have those patches ready yet.
isGPU() may also be used for compilation of cuda code to LLVM IR as alternative to isNVPTX(). I will discuss with google authors first.
I formatted to 80 lines. Thank you for your patience with a new contributor.

I will discuss with google authors first.

You're speaking with one, hi. :) I looked through all uses of isNVPTX in LLVM and clang, and I don't believe any of them should be switched over to isGPU().

I would love to see some of the patches that you're preparing before we submit this, so we can confirm that it's actually useful. Hopefully we can do that without blocking you?

I can email you a bigger patch from our development tree. I would rather not post it in public yet because it still needs some work. Here are two examples from this patch.

diff -Naur ibmv2trees/coral-clang/lib/CodeGen/CGBuiltin.cpp amdv2trees/coral-clang/lib/CodeGen/CGBuiltin.cpp
--- ibmv2trees/coral-clang/lib/CodeGen/CGBuiltin.cpp    2016-12-14 12:02:20.407997118 -0600
+++ amdv2trees/coral-clang/lib/CodeGen/CGBuiltin.cpp    2016-12-14 12:28:50.328752681 -0600
@@ -2455,9 +2455,7 @@
   }
   case Builtin::BIprintf:
     if ((getLangOpts().CUDA && getLangOpts().CUDAIsDevice) ||
-        (getLangOpts().OpenMPIsDevice &&
-         (getTarget().getTriple().getArch() == llvm::Triple::nvptx ||
-          getTarget().getTriple().getArch() == llvm::Triple::nvptx64)))
+        (getLangOpts().OpenMPIsDevice && getTarget().getTriple().isGPU()))
       return EmitCUDADevicePrintfCallExpr(E, ReturnValue);
     break;
   case Builtin::BI__builtin_canonicalize:
@@ -2591,10 +2589,10 @@
   case llvm::Triple::ppc64le:
     return CGF->EmitPPCBuiltinExpr(BuiltinID, E);
   case llvm::Triple::r600:
-  case llvm::Triple::amdgcn:
     return CGF->EmitAMDGPUBuiltinExpr(BuiltinID, E);
   case llvm::Triple::systemz:
     return CGF->EmitSystemZBuiltinExpr(BuiltinID, E);
+  case llvm::Triple::amdgcn:
   case llvm::Triple::nvptx:
   case llvm::Triple::nvptx64:
     return CGF->EmitNVPTXBuiltinExpr(BuiltinID, E);

diff -Naur ibmv2trees/coral-clang/lib/Sema/SemaOpenMP.cpp amdv2trees/coral-clang/lib/Sema/SemaOpenMP.cpp
--- ibmv2trees/coral-clang/lib/Sema/SemaOpenMP.cpp      2016-12-14 12:02:20.459997521 -0600
+++ amdv2trees/coral-clang/lib/Sema/SemaOpenMP.cpp      2016-12-14 12:18:08.223518169 -0600
@@ -6827,7 +6827,7 @@
   // Build iteration variable initializer for simd loop on nvptx.
   bool OutlinedSimd = DKind == OMPD_simd &&
                       SemaRef.getLangOpts().OpenMPIsDevice &&
-                      SemaRef.Context.getTargetInfo().getTriple().isNVPTX();
+                      SemaRef.Context.getTargetInfo().getTriple().isGPU();
   ExprResult LaneInit;
   ExprResult NumLanes;
   if (OutlinedSimd) {
@@ -7132,7 +7132,7 @@
   bool CoalescedSchedule = false;
   bool CoalescedDistSchedule = false;
   if (SemaRef.getLangOpts().OpenMPIsDevice &&
-      SemaRef.Context.getTargetInfo().getTriple().isNVPTX()) {
+      SemaRef.Context.getTargetInfo().getTriple().isGPU()) {
     auto ScheduleClause =
         OMPExecutableDirective::getClausesOfKind<OMPScheduleClause>(Clauses);
     bool ChunkSizeOne = false;

Here are two examples from this patch.

Thanks.

The reason I asked is because we should not assume "isGPU" <--> "isNVPTX || isAMDGCN". So I wanted to see whether there is actually a valid use-case for this function in clang.

The change in CGBuiltin looks exactly like the kind of change I am concerned about. It is assuming that if we are compiling for *any* GPU, we want to lower printf using EmitCUDADevicePrintfCallExpr. I don't think we should do that. (In fact, maybe that function should be called EmitNVPTXDevicePrintfCallExpr.)

The openmp code doesn't yet exist in tree, so I am not sure whether or not that's a good idea.

Given that the patch posted here introduces a bug into CGBuiltin (or, it does if we ever add a third GPU target), I think we should proceed carefully here: I think we should wait to consider this patch until you have a complete patch to show that uses this function.

This revision now requires changes to proceed.Dec 19 2016, 2:38 PM

Justin, the commonality between nvptx and amdgcn LLVM IR is exactly why I would like isGPU(). I actually do want to assume that "isGPU" <--> "isNVPTX || isAMDGCN".

The logic in EmitNVPTXDevicePrintfCallExpr sets up a call to a GPU device library routine called vprintf. For amdgcn, we also need to call a device library function. I could copy EmitNVPTXDevicePrintfCallExpr to a new EmitAMDGPUDevicePrintfCallExpr. However, would it be better to reuse the logic and maintain this in one source file (CGCUDABuiltin.cpp in this case)? OK, I agree that using the names NVPTX feels weird. Maybe we should eventually change this to EmitGPUDevicePrintfCallExpr. We are going to see many cases where the LLVM IR for ptx is quite similar to LLVM IR for amdgcn. For example, in the generation of calls to grid intrinsics and builtins, it is possible to use the same ptx names and convert them to gcn names with inline library routines. GPUs really like inline functions anyway. I am working on a library called libcuda2gcn.bc that will convert cuda builtins to amdgcn builtins and/or amdgcn intrinsics when that library is llvm-linked to the generated amdgcn LLVM IR. This will allow clang codegen logic to be shared between ptx and gcn.

For now, I will expand isGPU() to isNVPTX() || (getArch() ==amdgcn) and comment to change to isGPU() if ever available.

Thank you.

I actually do want to assume that "isGPU" <--> "isNVPTX || isAMDGCN".

OK, I really do not think we should have this function if we're going to use it this way.

If LLVM ever grows support for another GPU architecture that (say) doesn't have a printf function, it will break this code in clang. We should not write code in clang that is fragile in this way.

For a similar reason, I don't think a comment saying "change to isGPU() if available" would be called for here.

gregrodgers abandoned this revision.May 14 2020, 4:06 PM

Herald added a reviewer: jdoerfert. · View Herald TranscriptMay 14 2020, 4:06 PM

Herald added subscribers: dexonsmith, Anastasia. · View Herald Transcript

Revision Contents

Path

Size

include/

llvm/

ADT/

Triple.h

6 lines

Diff 81974

include/llvm/ADT/Triple.h

Show First 20 Lines • Show All 598 Lines • ▼ Show 20 Lines	return getEnvironment() == Triple::Musl \|\|
getEnvironment() == Triple::MuslEABIHF;		getEnvironment() == Triple::MuslEABIHF;
}		}

/// Tests whether the target is NVPTX (32- or 64-bit).		/// Tests whether the target is NVPTX (32- or 64-bit).
bool isNVPTX() const {		bool isNVPTX() const {
return getArch() == Triple::nvptx \|\| getArch() == Triple::nvptx64;		return getArch() == Triple::nvptx \|\| getArch() == Triple::nvptx64;
}		}

		/// Tests whether the target is a GPU
		jlebarUnsubmitted Not Done Reply Inline Actions Missing period. jlebar: Missing period.
		bool isGPU() const {
		return getArch() == Triple::nvptx \|\| getArch() == Triple::nvptx64 \|\|
		getArch() == Triple::amdgcn;
		}

/// Tests wether the target supports comdat		/// Tests wether the target supports comdat
bool supportsCOMDAT() const { return !isOSBinFormatMachO(); }		bool supportsCOMDAT() const { return !isOSBinFormatMachO(); }

/// @}		/// @}
/// @name Mutators		/// @name Mutators
/// @{		/// @{

/// setArch - Set the architecture (first) component of the triple		/// setArch - Set the architecture (first) component of the triple
▲ Show 20 Lines • Show All 128 Lines • Show Last 20 Lines