This is an archive of the discontinued LLVM Phabricator instance.


Differential D135374

[OpenMP][AMDGPU] Add 'uniform-work-group' attribute to OpenMP kernels
Closed, Public

Authored by jhuber6 on Oct 6 2022, 9:21 AM.


Details

Reviewers

jdoerfert
JonChesterfield
ronlieb
yaxunl
arsenm

Commits

rG4aa87a131f93: [OpenMP][AMDGPU] Add 'uniform-work-group' attribute to OpenMP kernels

Summary

The cl-uniform-work-group attribute (emitted as "uniform-work-group-size"
in this patch) asserts that the global work size is a multiple of the
specified work-group size, so every work-group is full. This should
enable additional optimizations. The attribute is already set by default
in the AMD compiler and for HIP kernels, so it should be safe to enable
it by default for OpenMP kernels as well.
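
To make the assertion concrete, here is a minimal C++ sketch of the arithmetic behind a "uniform" launch. The helper names (numGroups, isUniformLaunch, lastGroupSize) are hypothetical and exist only for illustration; they are not part of Clang, LLVM, or any runtime.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, for illustration only.

// Number of work-groups needed to cover globalSize work-items (rounds up).
inline std::uint64_t numGroups(std::uint64_t globalSize,
                               std::uint64_t groupSize) {
  assert(groupSize != 0 && "work-group size must be nonzero");
  return (globalSize + groupSize - 1) / groupSize;
}

// True when the global size is an exact multiple of the work-group size,
// i.e. the property the attribute asserts.
inline bool isUniformLaunch(std::uint64_t globalSize,
                            std::uint64_t groupSize) {
  assert(groupSize != 0 && "work-group size must be nonzero");
  return globalSize % groupSize == 0;
}

// Size of the final work-group, assuming globalSize > 0. It equals
// groupSize for a uniform launch and is smaller otherwise.
inline std::uint64_t lastGroupSize(std::uint64_t globalSize,
                                   std::uint64_t groupSize) {
  assert(groupSize != 0 && "work-group size must be nonzero");
  const std::uint64_t Rem = globalSize % groupSize;
  return Rem == 0 ? groupSize : Rem;
}
```

With 1024 work-items and a work-group size of 256, all four groups are full. With 1000 work-items, the fourth group would hold only 232 work-items, which is the case the attribute rules out.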

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Oct 6 2022, 9:21 AM

Herald added a project: Restricted Project. Oct 6 2022, 9:21 AM

Herald added subscribers: kosarev, guansong, t-tye and 4 others.

jhuber6 requested review of this revision. Oct 6 2022, 9:21 AM

Herald added subscribers: cfe-commits, sstefan1, wdng.

arsenm added inline comments. Oct 6 2022, 9:24 AM

clang/lib/CodeGen/TargetInfo.cpp
9424–9432	Can we move this whole thing out of amdgpu code and into language code?

jhuber6 added inline comments. Oct 6 2022, 9:29 AM

clang/lib/CodeGen/TargetInfo.cpp
9424–9432	Do you mean moving this into each language's respective code generation / sema? This seemed like the most straightforward place to put it as it's where we attach other target specific attributes.

arsenm added inline comments. Oct 6 2022, 9:30 AM

clang/lib/CodeGen/TargetInfo.cpp
9424–9432	But it's not actually a target specific attribute, it's a language property

jhuber6 added inline comments. Oct 6 2022, 9:35 AM

clang/lib/CodeGen/TargetInfo.cpp
9424–9432	Sure, I just figured this was the easier option since it already existed here for HIP. Which file do you suggest this go in? Should we just do this specifically in HIP / OpenMP codegen?

Harbormaster completed remote builds in B190753: Diff 465769. Oct 6 2022, 9:49 AM

arsenm added inline comments. Oct 6 2022, 10:34 AM

clang/lib/CodeGen/TargetInfo.cpp
9424–9432	Putting it here is a fine first step, it's just always bothered me that it's here. I don't know clang enough to know where it belongs. OpenCL defined this in the first place and changed the default behavior in CL2.0

LG, add a TODO to move the code.

This revision is now accepted and ready to land. Oct 6 2022, 2:18 PM

Closed by commit rG4aa87a131f93: [OpenMP][AMDGPU] Add 'uniform-work-group' attribute to OpenMP kernels (authored by jhuber6). Oct 6 2022, 4:22 PM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG4aa87a131f93: [OpenMP][AMDGPU] Add 'uniform-work-group' attribute to OpenMP kernels.

jdoerfert mentioned this in D135444: [OpenMP] Utilize the "non-uniform-workgroup" to simplify DeviceRTL. Oct 7 2022, 6:36 AM

jdoerfert mentioned this in rGd0f9ddde9986: [OpenMP] Utilize the "non-uniform-workgroup" to simplify DeviceRTL. Nov 1 2022, 8:38 PM

Revision Contents

Path	Size
clang/lib/CodeGen/TargetInfo.cpp	6 lines
clang/test/OpenMP/amdgcn-attributes.cpp	8 lines

Diff 465923

clang/lib/CodeGen/TargetInfo.cpp


@@ ... @@ void AMDGPUTargetCodeGenInfo::setTargetAttributes(
   llvm::Function *F = dyn_cast<llvm::Function>(GV);
   if (!F)
     return;

   const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D);
   if (FD)
     setFunctionDeclAttributes(FD, F, M);

   const bool IsHIPKernel =
       M.getLangOpts().HIP && FD && FD->hasAttr<CUDAGlobalAttr>();
+  const bool IsOpenMPkernel =
+      M.getLangOpts().OpenMPIsDevice &&
+      (F->getCallingConv() == llvm::CallingConv::AMDGPU_KERNEL);

-  if (IsHIPKernel)
+  // TODO: This should be moved to language specific attributes instead.
+  if (IsHIPKernel || IsOpenMPkernel)
     F->addFnAttr("uniform-work-group-size", "true");

   if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics())
     F->addFnAttr("amdgpu-unsafe-fp-atomics", "true");

   if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts)
     F->addFnAttr("amdgpu-ieee", "false");
 }
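
The condition this patch introduces can be read as a pure predicate over the language options and the function. The following standalone C++ sketch models that predicate outside of Clang. Every type and function name in it (LangFlags, FunctionModel, CallingConv, wantsUniformWorkGroupSize, setUniformWorkGroupAttr) is a hypothetical stand-in rather than a Clang or LLVM API, and the FD null check from the real code is folded into a single boolean.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT Clang/LLVM types.
enum class CallingConv { C, AMDGPUKernel };

struct LangFlags {
  bool HIP = false;            // models M.getLangOpts().HIP
  bool OpenMPIsDevice = false; // models M.getLangOpts().OpenMPIsDevice
};

struct FunctionModel {
  // Models "FD && FD->hasAttr<CUDAGlobalAttr>()" as one flag.
  bool HasCUDAGlobalAttr = false;
  CallingConv CC = CallingConv::C;
  std::map<std::string, std::string> Attrs; // string function attributes
};

// Mirrors the patched condition: HIP kernels, or OpenMP device functions
// that use the AMDGPU kernel calling convention.
inline bool wantsUniformWorkGroupSize(const LangFlags &L,
                                      const FunctionModel &F) {
  const bool IsHIPKernel = L.HIP && F.HasCUDAGlobalAttr;
  const bool IsOpenMPKernel =
      L.OpenMPIsDevice && F.CC == CallingConv::AMDGPUKernel;
  return IsHIPKernel || IsOpenMPKernel;
}

// Attaches the attribute only when the predicate holds.
inline void setUniformWorkGroupAttr(const LangFlags &L, FunctionModel &F) {
  if (wantsUniformWorkGroupSize(L, F))
    F.Attrs["uniform-work-group-size"] = "true";
}
```

Keeping the decision in one predicate is one possible way to make the later move into language-specific code (the TODO above) a local change; the review itself does not prescribe where that code should live.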

clang/test/OpenMP/amdgcn-attributes.cpp

@@ ... @@ #pragma omp target
   return arr[0];
 }

 int callable(int x) {
 // ALL-LABEL: @_Z8callablei(i32 noundef %x) #1
   return x + 1;
 }

-// DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
-// CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" }
-// NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
-// UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" "uniform-work-group-size"="true" }
+// NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }

 // DEFAULT: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // CPU: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" }
 // NOIEEE: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "min-legal-vector-width"="0" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // UNSAFEATOMIC: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }