This is an archive of the discontinued LLVM Phabricator instance.

That seems important. What was the symptom of failing to set these? We may now be redundantly setting some, e.g. I think convergent is set somewhere else before this patch.

Added a few people who may be able to run the patch against nvptx to check for regressions. If there isn't already an nvptx attribute test, we should commit one before this patch so we can see the change. If there is already one, then we haven't changed it here and all is good.

Edit: (Looked for an nvptx equivalent and can't see one, but that doesn't mean it's not already there somewhere)

In D113538#3121062, @JonChesterfield wrote:

That seems important. What was the symptom of failing to set these? We may now be redundantly setting some, e.g. I think convergent is set somewhere else before this patch.

A bunch of missing attributes on the kernel. The one I noticed was not setting amdgpu-implicitarg-num-bytes (although D112488 avoids needing to do that), but we have a few other attributes that simply wouldn't be set. I'm fighting with some divergence between upstream and the internal branches with these attributes. In particular, the internal branch is hacking on the generic attributes for openmp, and also redundantly (and incorrectly) setting amdgpu-flat-work-group-size to the invalid range 257,257.

convergent isn't a target attribute and isn't the target's responsibility to add.

Also test non-kernel

Harbormaster completed remote builds in B133466: Diff 386137. Nov 10 2021, 6:45 AM

arsenm mentioned this in D112488: AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default. Nov 10 2021, 5:20 PM

ping

Apologies, I thought I had already accepted this. Thanks for the patch!

This revision is now accepted and ready to land. Nov 16 2021, 7:02 AM

Is the behaviour change in the above comments intentional? Pointed out by @estewart08

clang/lib/CodeGen/TargetInfo.cpp
Line 9208: Here, we skip the amdgpu-implicitarg-num-bytes and uniform-work-group-size assignments if FD is nullptr.
Line 9288: Here, we do the amdgpu-implicitarg-num-bytes and uniform-work-group-size assignments regardless of whether FD is null or not.

JonChesterfield added inline comments. Nov 18 2021, 7:28 AM

clang/lib/CodeGen/TargetInfo.cpp
Line 9288: Cancel that, there's an `IsOpenMP = ... && !FD` here. Failed to follow the control flow.
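The condition under discussion can be modelled outside of clang. The following is a minimal sketch, not the patch itself: `LangOpts` and `FunctionDeclModel` are hypothetical stand-ins for the clang types, and `wantsImplicitArgBytes` is an illustrative name. It reproduces only the boolean logic visible in the diff, where a null FD combined with OpenMP is treated as a generated kernel.

```cpp
#include <cassert>

// Hypothetical stand-ins for clang's language options and FunctionDecl.
// Only the fields read by the condition are modelled.
struct LangOpts {
  bool OpenCL = false;
  bool HIP = false;
  bool OpenMP = false;
};

struct FunctionDeclModel {
  bool HasOpenCLKernelAttr = false;
  bool HasCUDAGlobalAttr = false;
};

// Returns true when "amdgpu-implicitarg-num-bytes" would be added.
// FD may be null: generated OpenMP kernels are passed a null declaration.
bool wantsImplicitArgBytes(const LangOpts &LO, const FunctionDeclModel *FD,
                           bool IsAMDHSA) {
  const bool IsOpenCLKernel = LO.OpenCL && FD && FD->HasOpenCLKernelAttr;
  const bool IsHIPKernel = LO.HIP && FD && FD->HasCUDAGlobalAttr;
  const bool IsOpenMP = LO.OpenMP && !FD;
  return (IsOpenCLKernel || IsHIPKernel || IsOpenMP) && IsAMDHSA;
}
```

In this model an OpenMP compile with a null FD yields true, while an OpenMP compile with a non-null FD (an ordinary function such as `callable` in the test) yields false, which is the control flow the comment above arrives at.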

JonChesterfield added inline comments. Nov 18 2021, 7:42 AM

clang/lib/CodeGen/TargetInfo.cpp
Line 9288: Are we looking for `if (AMDGPU::isKernel(function.getCallingConv()))` instead of looking at the function attributes? I don't think we want to annotate non-kernels with these things.

arsenm added inline comments. Nov 18 2021, 7:52 AM

clang/lib/CodeGen/TargetInfo.cpp
Line 9288: !FD seems to always be true for OpenMP kernels because there's no associated function. This check doesn't look at the IR function's calling convention; I thought about switching to that instead, but that's a separate change.
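For comparison, the alternative raised in this thread (keying the decision on the IR function's calling convention rather than on the declaration) might look roughly like the sketch below. Everything here is an assumption made for illustration: `CallConv`, `IRFunctionModel`, `isKernelCC` and `wantsImplicitArgBytesByCC` are hypothetical names, and the enum is reduced to two values. It is not what this revision implements.

```cpp
#include <cassert>

// Hypothetical, reduced calling-convention enum; the real IR has many more.
enum class CallConv { C, AMDGPUKernel };

// Hypothetical stand-in for an IR function; only the calling convention
// is modelled.
struct IRFunctionModel {
  CallConv CC = CallConv::C;
};

// Kernel detection based purely on the calling convention.
bool isKernelCC(CallConv CC) { return CC == CallConv::AMDGPUKernel; }

// Returns true when the kernel-only attribute would be added under this
// alternative scheme. No declaration is consulted at all.
bool wantsImplicitArgBytesByCC(const IRFunctionModel &F, bool IsAMDHSA) {
  return isKernelCC(F.CC) && IsAMDHSA;
}
```

Under this scheme the source language and the presence of a declaration no longer matter: a function with the kernel calling convention is annotated and a plain function is not.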

JonChesterfield mentioned this in D114274: [openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments. Nov 19 2021, 12:52 PM

JonChesterfield mentioned this in rGae5348a38eb1: [openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments. Nov 22 2021, 3:01 PM

Committed as 6c27d389c8a00040aad998fe959f38ba709a8750, recommitted as 2f0a5714184cca9325004506a22a2a3193c825aa

Revision Contents

Path	Size
clang/lib/CodeGen/CGOpenMPRuntime.cpp	3 lines
clang/lib/CodeGen/TargetInfo.cpp	73 lines
clang/test/OpenMP/amdgcn-attributes.cpp	43 lines

Diff 386137

clang/lib/CodeGen/CGOpenMPRuntime.cpp


[...]
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGOpenMPRuntime.h"		#include "CGOpenMPRuntime.h"
#include "CGCXXABI.h"		#include "CGCXXABI.h"
#include "CGCleanup.h"		#include "CGCleanup.h"
#include "CGRecordLayout.h"		#include "CGRecordLayout.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
		#include "TargetInfo.h"
#include "clang/AST/APValue.h"		#include "clang/AST/APValue.h"
#include "clang/AST/Attr.h"		#include "clang/AST/Attr.h"
#include "clang/AST/Decl.h"		#include "clang/AST/Decl.h"
#include "clang/AST/OpenMPClause.h"		#include "clang/AST/OpenMPClause.h"
#include "clang/AST/StmtOpenMP.h"		#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/BitmaskEnum.h"		#include "clang/Basic/BitmaskEnum.h"
#include "clang/Basic/FileManager.h"		#include "clang/Basic/FileManager.h"
[...]	OutlinedFn->addFnAttr("omp_target_num_teams",
std::to_string(DefaultValTeams));		std::to_string(DefaultValTeams));
}		}
int32_t DefaultValThreads = -1;		int32_t DefaultValThreads = -1;
getNumThreadsExprForTargetDirective(CGF, D, DefaultValThreads);		getNumThreadsExprForTargetDirective(CGF, D, DefaultValThreads);
if (DefaultValThreads > 0) {		if (DefaultValThreads > 0) {
OutlinedFn->addFnAttr("omp_target_thread_limit",		OutlinedFn->addFnAttr("omp_target_thread_limit",
std::to_string(DefaultValThreads));		std::to_string(DefaultValThreads));
}		}

		CGM.getTargetCodeGenInfo().setTargetAttributes(nullptr, OutlinedFn, CGM);
}		}

/// Checks if the expression is constant or does not have non-trivial function		/// Checks if the expression is constant or does not have non-trivial function
/// calls.		/// calls.
static bool isTrivial(ASTContext &Ctx, const Expr * E) {		static bool isTrivial(ASTContext &Ctx, const Expr * E) {
// We can skip constant expressions.		// We can skip constant expressions.
// We can skip expressions with trivial calls or simple expressions.		// We can skip expressions with trivial calls or simple expressions.
return (E->isEvaluatable(Ctx, Expr::SE_AllowUndefinedBehavior) ||		return (E->isEvaluatable(Ctx, Expr::SE_AllowUndefinedBehavior) ||
[...]

clang/lib/CodeGen/TargetInfo.cpp


[...]	ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty,

return ArgInfo;		return ArgInfo;
}		}

class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {		class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
public:		public:
AMDGPUTargetCodeGenInfo(CodeGenTypes &CGT)		AMDGPUTargetCodeGenInfo(CodeGenTypes &CGT)
: TargetCodeGenInfo(std::make_unique<AMDGPUABIInfo>(CGT)) {}		: TargetCodeGenInfo(std::make_unique<AMDGPUABIInfo>(CGT)) {}

		void setFunctionDeclAttributes(const FunctionDecl *FD, llvm::Function *F,
		CodeGenModule &CGM) const;

void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,		void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
CodeGen::CodeGenModule &M) const override;		CodeGen::CodeGenModule &M) const override;
unsigned getOpenCLKernelCallingConv() const override;		unsigned getOpenCLKernelCallingConv() const override;

llvm::Constant *getNullPointer(const CodeGen::CodeGenModule &CGM,		llvm::Constant *getNullPointer(const CodeGen::CodeGenModule &CGM,
llvm::PointerType *T, QualType QT) const override;		llvm::PointerType *T, QualType QT) const override;

LangAS getASTAllocaAddressSpace() const override {		LangAS getASTAllocaAddressSpace() const override {
[...]	static bool requiresAMDGPUProtectedVisibility(const Decl *D,
return D->hasAttr<OpenCLKernelAttr>() ||		return D->hasAttr<OpenCLKernelAttr>() ||
(isa<FunctionDecl>(D) && D->hasAttr<CUDAGlobalAttr>()) ||		(isa<FunctionDecl>(D) && D->hasAttr<CUDAGlobalAttr>()) ||
(isa<VarDecl>(D) &&		(isa<VarDecl>(D) &&
(D->hasAttr<CUDADeviceAttr>() || D->hasAttr<CUDAConstantAttr>() ||		(D->hasAttr<CUDADeviceAttr>() || D->hasAttr<CUDAConstantAttr>() ||
cast<VarDecl>(D)->getType()->isCUDADeviceBuiltinSurfaceType() ||		cast<VarDecl>(D)->getType()->isCUDADeviceBuiltinSurfaceType() ||
cast<VarDecl>(D)->getType()->isCUDADeviceBuiltinTextureType()));		cast<VarDecl>(D)->getType()->isCUDADeviceBuiltinTextureType()));
}		}

void AMDGPUTargetCodeGenInfo::setTargetAttributes(		void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {		const FunctionDecl *FD, llvm::Function *F, CodeGenModule &M) const {
if (requiresAMDGPUProtectedVisibility(D, GV)) {		const auto *ReqdWGS =
GV->setVisibility(llvm::GlobalValue::ProtectedVisibility);		M.getLangOpts().OpenCL ? FD->getAttr<ReqdWorkGroupSizeAttr>() : nullptr;
GV->setDSOLocal(true);		const bool IsOpenCLKernel =
}		M.getLangOpts().OpenCL && FD->hasAttr<OpenCLKernelAttr>();
		const bool IsHIPKernel = M.getLangOpts().HIP && FD->hasAttr<CUDAGlobalAttr>();
if (GV->isDeclaration())
return;
const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D);
if (!FD)
return;

llvm::Function *F = cast<llvm::Function>(GV);

const auto *ReqdWGS = M.getLangOpts().OpenCL ?
FD->getAttr<ReqdWorkGroupSizeAttr>() : nullptr;


const bool IsOpenCLKernel = M.getLangOpts().OpenCL &&
FD->hasAttr<OpenCLKernelAttr>();
const bool IsHIPKernel = M.getLangOpts().HIP &&
FD->hasAttr<CUDAGlobalAttr>();
if ((IsOpenCLKernel || IsHIPKernel) &&
(M.getTriple().getOS() == llvm::Triple::AMDHSA))
F->addFnAttr("amdgpu-implicitarg-num-bytes", "56");

if (IsHIPKernel)
F->addFnAttr("uniform-work-group-size", "true");


const auto *FlatWGS = FD->getAttr<AMDGPUFlatWorkGroupSizeAttr>();		const auto *FlatWGS = FD->getAttr<AMDGPUFlatWorkGroupSizeAttr>();
if (ReqdWGS || FlatWGS) {		if (ReqdWGS || FlatWGS) {
unsigned Min = 0;		unsigned Min = 0;
unsigned Max = 0;		unsigned Max = 0;
if (FlatWGS) {		if (FlatWGS) {
Min = FlatWGS->getMin()		Min = FlatWGS->getMin()
->EvaluateKnownConstInt(M.getContext())		->EvaluateKnownConstInt(M.getContext())
[...]	void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
}		}

if (const auto *Attr = FD->getAttr<AMDGPUNumVGPRAttr>()) {		if (const auto *Attr = FD->getAttr<AMDGPUNumVGPRAttr>()) {
uint32_t NumVGPR = Attr->getNumVGPR();		uint32_t NumVGPR = Attr->getNumVGPR();

if (NumVGPR != 0)		if (NumVGPR != 0)
F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));		F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));
}		}
		}

		void AMDGPUTargetCodeGenInfo::setTargetAttributes(
		const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
		if (requiresAMDGPUProtectedVisibility(D, GV)) {
		GV->setVisibility(llvm::GlobalValue::ProtectedVisibility);
		GV->setDSOLocal(true);
		}

		if (GV->isDeclaration())
		return;

		llvm::Function *F = dyn_cast<llvm::Function>(GV);
		if (!F)
		return;

		const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D);
		if (FD)
		setFunctionDeclAttributes(FD, F, M);

		const bool IsOpenCLKernel =
		M.getLangOpts().OpenCL && FD && FD->hasAttr<OpenCLKernelAttr>();
		const bool IsHIPKernel =
		M.getLangOpts().HIP && FD && FD->hasAttr<CUDAGlobalAttr>();

		const bool IsOpenMP = M.getLangOpts().OpenMP && !FD;
		if ((IsOpenCLKernel || IsHIPKernel || IsOpenMP) &&
		(M.getTriple().getOS() == llvm::Triple::AMDHSA))
		F->addFnAttr("amdgpu-implicitarg-num-bytes", "56");

		if (IsHIPKernel)
		F->addFnAttr("uniform-work-group-size", "true");

if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics())		if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics())
F->addFnAttr("amdgpu-unsafe-fp-atomics", "true");		F->addFnAttr("amdgpu-unsafe-fp-atomics", "true");

if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts)		if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts)
F->addFnAttr("amdgpu-ieee", "false");		F->addFnAttr("amdgpu-ieee", "false");
}		}

[...]

clang/test/OpenMP/amdgcn-attributes.cpp

This file was added.

				// REQUIRES: amdgpu-registered-target

				// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
				// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=DEFAULT,ALL %s
				// RUN: %clang_cc1 -target-cpu gfx900 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=CPU,ALL %s

				// RUN: %clang_cc1 -menable-no-nans -mno-amdgpu-ieee -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=NOIEEE,ALL %s
				// RUN: %clang_cc1 -munsafe-fp-atomics -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=UNSAFEATOMIC,ALL %s

				// expected-no-diagnostics

				#define N 100

				int callable(int);

				// Check that the target attributes are set on the generated kernel
				int func() {
				// ALL-LABEL: amdgpu_kernel void @__omp_offloading{{.*}} #0

				int arr[N];

				#pragma omp target
				for (int i = 0; i < N; i++) {
				arr[i] = callable(arr[i]);
				}

				return arr[0];
				}

				int callable(int x) {
				// ALL-LABEL: @_Z8callablei(i32 %x) #1
				return x + 1;
				}

				// DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-implicitarg-num-bytes"="56" "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
				// CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-implicitarg-num-bytes"="56" "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" }
				// NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-ieee"="false" "amdgpu-implicitarg-num-bytes"="56" "frame-pointer"="none" "min-legal-vector-width"="0" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
				// UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-implicitarg-num-bytes"="56" "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }

				// DEFAULT: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
				// CPU: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" }
				// NOIEEE: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "min-legal-vector-width"="0" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
				// UNSAFEATOMIC: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }


OpenMP: Start calling setTargetAttributes for generated kernels (Closed, Public)
