This is an archive of the discontinued LLVM Phabricator instance.


Differential D158603

[AMDGPU][TargetMachine] Handle case when +extended-image-insts is set, and the user forces +wave64
Needs Revision · Public

Authored by jmmartinez on Aug 23 2023, 5:35 AM.


Details

Reviewers

arsenm
Pierre-vh

Summary

Some functions from device_libs have the attribute
"target-features"="+extended-image-insts".

On targets that default to wave32, if wave64 is forced by the user,
the wave64 feature is dropped when initializing the subtarget because
the "target-features" attribute is already set.

This results in functions marked with "target-features"="+extended-image-insts"
being compiled as wave32, although wave64 was requested.

This patch is a workaround for this issue.

If "target-features" is equal to "+extended-image-insts", the global and
function features are concatenated.
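A minimal standalone sketch of the special case described above. It assumes `std::optional<std::string_view>` in place of LLVM's `Attribute`/`StringRef`, and the name `effectiveFeatureString` is hypothetical. Unlike the patch, it omits the separating comma when the global feature string is empty.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical standalone model of the workaround; not the patch's code.
//   functionFS: the function's "target-features" value, or std::nullopt when
//               the attribute is absent (models Attribute::isValid()).
//   targetFS:   the global feature string, e.g. "+wavefrontsize64".
std::string effectiveFeatureString(std::optional<std::string_view> functionFS,
                                   std::string_view targetFS) {
  // No attribute: fall back to the global features.
  if (!functionFS)
    return std::string(targetFS);

  // Special case: the attribute is exactly "+extended-image-insts" and the
  // user did not explicitly disable that feature globally.
  bool appendGlobal =
      *functionFS == "+extended-image-insts" &&
      targetFS.find("-extended-image-insts") == std::string_view::npos;
  if (appendGlobal && !targetFS.empty())
    return std::string(*functionFS) + "," + std::string(targetFS);

  // Any other attribute value replaces the global features entirely.
  return std::string(*functionFS);
}
```

With a global `+wavefrontsize64`, a function carrying only `+extended-image-insts` ends up with both features. Any other per-function string still replaces the global one, and that asymmetry is exactly the special-casing questioned in review.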

In the general case, we cannot just concatenate the global and function
features since they may be incompatible: the feature string
"+wavefrontsize32,+wavefrontsize64" results in a wavefront size of 64.
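The following sketch shows why plain concatenation is unsafe. It is a simplified model of applying a feature string (an assumption for illustration, not LLVM's `SubtargetFeatures` implementation): entries are read left to right, and a later `+name`/`-name` overrides an earlier entry for the same name. Because `wavefrontsize32` and `wavefrontsize64` are different names, concatenating them enables both rather than letting one override the other. The hypothetical `wavefrontSize` helper then encodes the outcome reported above, where 64 wins.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Simplified model: map each feature name to enabled/disabled, applying the
// comma-separated entries left to right so later entries win.
std::map<std::string, bool> parseFeatures(const std::string &fs) {
  std::map<std::string, bool> state;
  std::stringstream in(fs);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item.size() < 2 || (item[0] != '+' && item[0] != '-'))
      continue;  // skip empty or malformed entries in this sketch
    state[item.substr(1)] = (item[0] == '+');
  }
  return state;
}

// Hypothetical wave size resolution mirroring the behaviour described above:
// if wavefrontsize64 is enabled the result is 64, even when wavefrontsize32
// is enabled too; otherwise this sketch assumes a wave32-default GPU.
unsigned wavefrontSize(const std::map<std::string, bool> &state) {
  auto it = state.find("wavefrontsize64");
  return (it != state.end() && it->second) ? 64 : 32;
}
```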

Related to SWDEV-410182.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jmmartinez created this revision. Aug 23 2023, 5:35 AM

Herald added a project: Restricted Project. Aug 23 2023, 5:35 AM

Herald added subscribers: foad, kerbowa, hiraditya and 5 others.

jmmartinez requested review of this revision. Aug 23 2023, 5:35 AM

Herald added a project: Restricted Project. Aug 23 2023, 5:35 AM

Herald added subscribers: llvm-commits, wdng.

jmmartinez added inline comments. Aug 23 2023, 5:39 AM

llvm/test/CodeGen/AMDGPU/extended-image-insts-wave32-wave64.ll
Line 24: Simplify the test case, I'm pretty sure it doesn't need this.

arsenm requested changes to this revision.

I think changing the behavior of the global subtarget is a bad idea. We shouldn't be relying on the global subtarget in the first place, much less selectively changing the interpretation from "set default" to "append".

Using one set of IR functions and pretending they work for both wave sizes is a bad idea and simply does not work. You're running into the consequences of several layers of hacks in device libs, comgr and clang to pretend this works. The wave size behaves more like a different target-cpu than like a feature you can simply union. I increasingly think we should move the wavefront size out of target-features and into a separate attribute (or introduce a set of wave32 calling conventions).

There may be several bugs here, and I don't think any of them should be addressed in the backend. First, last I knew comgr was not using clang/-mlink-builtin-bitcode (although maybe this was fixed?). It doesn't matter because I just performed an experiment and found that's also broken:

// builtin.cl
// clang -O3 -target amdgcn-amd-amdhsa -c -emit-llvm -o builtin.bc builtin.cl

__attribute__((target("extended-image-insts")))
void uses_images(void) {

}

void uses_nothing(void) {

}

// builtin-user.cl
// clang -target amdgcn-amd-amdhsa -mcpu=gfx1031 -mwavefrontsize64 -S -emit-llvm -Xclang -mlink-builtin-bitcode -Xclang builtin.bc -o - builtin-user.cl -O0
extern void uses_images(void);
extern void uses_nothing(void);

void user_images(void) {
    uses_images();
}

void user_nothing(void) {
    uses_nothing();
}

This is broken, because +wavefrontsize64 was not appended to the target-features of the imported builtin uses_images:

define internal void @uses_images() #1 {
  ret void
}

; Function Attrs: convergent mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define internal void @uses_nothing() #2 {
  ret void
}

attributes #1 = { convergent mustprogress nofree norecurse nosync nounwind willreturn memory(none) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx1031" "target-features"="+extended-image-insts" }
attributes #2 = { convergent mustprogress nofree norecurse nosync nounwind willreturn memory(none) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx1031" }

@uses_nothing isn't even getting the +wavefrontsize64, which is doubly broken. I believe this was working correctly at one point, so did this regress? clang is broken here, so that should be fixed first. If comgr is still not correctly using -mlink-builtin-bitcode, it should implement an approximately equivalent hack to append the features if there are still blockers to doing so.
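The following sketches the "append the features" approach suggested above, applied to the two builtins from the experiment. `BuiltinFn` and `appendDefaultFeatures` are hypothetical names, not clang or comgr APIs. The default features would come from the compilation (here `+wavefrontsize64`, from `-mwavefrontsize64`). As the summary notes, appending unconditionally is only safe when the builtin's own features are compatible with the defaults.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a linked-in builtin: its name and its
// "target-features" attribute (std::nullopt when the attribute is absent).
struct BuiltinFn {
  std::string name;
  std::optional<std::string> targetFeatures;
};

// Append the compilation's default features to each builtin instead of
// leaving its attribute untouched or replacing it. Illustrative only.
void appendDefaultFeatures(std::vector<BuiltinFn> &fns,
                           const std::string &defaults) {
  if (defaults.empty())
    return;
  for (BuiltinFn &fn : fns) {
    if (!fn.targetFeatures || fn.targetFeatures->empty())
      fn.targetFeatures = defaults;         // e.g. uses_nothing
    else
      *fn.targetFeatures += "," + defaults;  // e.g. uses_images
  }
}
```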

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Lines 585–587: There's nothing special about extended-image-insts to warrant special casing it here. Don't really understand how it connects to the original failure.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h
Line 34: Feature string shouldn't require adjustment.
llvm/test/CodeGen/AMDGPU/extended-image-insts-wave32-wave64.ll
Line 2: Don't need quotes or -verify-machineinstrs.

This revision now requires changes to proceed. Aug 23 2023, 6:33 AM

Harbormaster completed remote builds in B254317: Diff 552677. Aug 23 2023, 6:38 AM

In D158603#4609902, @arsenm wrote:

First, last I knew comgr was not using clang/-mlink-builtin-bitcode (although maybe this was fixed?).
[...]
@uses_nothing isn't even getting the +wavefrontsize64, which is doubly broken. I believe this was working correctly at one point, so did this regress? clang is broken here, so that should be fixed first. If comgr is still not correctly using -mlink-builtin-bitcode, it should implement an approximately equivalent hack to append the features if there are still blockers to doing so.

Thanks for pointing this out. Comgr uses the Linker::linkInModule function directly. I wasn't aware that there was a difference, but I see now that -mlink-builtin-bitcode calls Gen->CGM().addDefaultFunctionDefinitionAttributes(F) and internalizes every function in the builtin bitcode. Linker::linkInModule only keeps the attributes of the definition coming from the builtin bitcode and ignores those on the declaration.

However, there is a TODO in that function...

void CodeGenModule::addDefaultFunctionDefinitionAttributes(llvm::Function &F) {
  llvm::AttrBuilder FuncAttrs(F.getContext());
  getDefaultFunctionAttributes(F.getName(), F.hasOptNone(),
                               /* AttrOnCallSite = */ false, FuncAttrs);
  // TODO: call GetCPUAndFeaturesAttributes?
  F.addFnAttrs(FuncAttrs);
}

If I add a call to GetCPUAndFeaturesAttributes(GlobalDecl(), F), any target-features present in the definition are overridden (including +extended-image-insts).
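The two behaviours contrasted above can be modelled with a plain name-to-value map. The hypothetical `addDefaults` helper writes each default into the function's attributes and replaces any existing value with the same name (an assumption about how the merged attributes combine). When the defaults carry no `target-features`, as in the current TODO, the builtin keeps `+extended-image-insts`. Once they do, that value is replaced, which matches the override observed above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Function attributes modelled as name -> value (e.g. "target-features").
using FnAttrs = std::map<std::string, std::string>;

// Hypothetical: apply default definition attributes to a linked-in builtin.
// A default with the same name replaces the builtin's own value.
FnAttrs addDefaults(FnAttrs fn, const FnAttrs &defaults) {
  for (const auto &[name, value] : defaults)
    fn[name] = value;
  return fn;
}
```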

I have one question, though: why does device_libs have that target-specific attribute? Shouldn't it be deduced later in the optimization pipeline, since the function calls image intrinsics?

In D158603#4613311, @jmmartinez wrote:

I have one question, though: why does device_libs have that target-specific attribute? Shouldn't it be deduced later in the optimization pipeline, since the function calls image intrinsics?

This isn't an optimization. It cannot be inferred. This is supposed to be a restriction. Programs trying to use this on targets without the underlying instructions should be rejected.
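One concrete reading of "restriction" is this: a function's required features must already be enabled on the target, and nothing is inferred from the function body. The sketch below is a hypothetical check over feature strings. It assumes entries are applied left to right, and it is not LLVM's actual compatibility logic.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: the set of features left enabled after applying a
// comma-separated feature string left to right.
std::set<std::string> enabledFeatures(const std::string &fs) {
  std::set<std::string> on;
  std::stringstream in(fs);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item.size() < 2)
      continue;
    if (item[0] == '+')
      on.insert(item.substr(1));
    else if (item[0] == '-')
      on.erase(item.substr(1));
  }
  return on;
}

// The function's features act as requirements: reject it unless the target
// already provides every one of them.
bool targetSatisfies(const std::string &required, const std::string &target) {
  const std::set<std::string> have = enabledFeatures(target);
  for (const std::string &f : enabledFeatures(required))
    if (have.count(f) == 0)
      return false;
  return true;
}
```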

Adding Pierre as he worked on this recently as well.

Revision Contents

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h (2 lines)
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (24 lines)
llvm/lib/Target/AMDGPU/R600TargetMachine.cpp (2 lines)
llvm/test/CodeGen/AMDGPU/extended-image-insts-wave32-wave64.ll (26 lines)

Diff 552677

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h

@@ ... @@
 // AMDGPU Target Machine (R600+)
 //===----------------------------------------------------------------------===//

 class AMDGPUTargetMachine : public LLVMTargetMachine {
 protected:
   std::unique_ptr<TargetLoweringObjectFile> TLOF;

   StringRef getGPUName(const Function &F) const;
-  StringRef getFeatureString(const Function &F) const;
+  std::string getFeatureString(const Function &F) const;

 public:
   static bool EnableLateStructurizeCFG;
   static bool EnableFunctionCalls;
   static bool EnableLowerModuleLDS;

   AMDGPUTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
                       StringRef FS, TargetOptions Options,

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

@@ ... @@

 AMDGPUTargetMachine::~AMDGPUTargetMachine() = default;

 StringRef AMDGPUTargetMachine::getGPUName(const Function &F) const {
   Attribute GPUAttr = F.getFnAttribute("target-cpu");
   return GPUAttr.isValid() ? GPUAttr.getValueAsString() : getTargetCPU();
 }

-StringRef AMDGPUTargetMachine::getFeatureString(const Function &F) const {
+std::string AMDGPUTargetMachine::getFeatureString(const Function &F) const {
   Attribute FSAttr = F.getFnAttribute("target-features");
+  StringRef TargetFS = getTargetFeatureString();

-  return FSAttr.isValid() ? FSAttr.getValueAsString()
-                          : getTargetFeatureString();
+  if (FSAttr.isValid()) {
+    StringRef FunctionFS = FSAttr.getValueAsString();
+
+    // Functions from extended-image-intrinsics.ll from device_libs have the
+    // attribute "target-features"="+extended-image-insts" When compiling in
+    // wave64 on a gpu that defaults to wave32, dropping the TargetFS string
+    // makes those functions be compiled in wave32.
+    bool EnableExtendedImageInstsForFunction =
+        FunctionFS == "+extended-image-insts" &&
+        !TargetFS.contains("-extended-image-insts");
+    if (EnableExtendedImageInstsForFunction) {
+      return (FunctionFS + "," + TargetFS).str();
+    }
+    return FunctionFS.str();
+  }
+
+  return TargetFS.str();
 }

 /// Predicate for Internalize pass.
 static bool mustPreserveGV(const GlobalValue &GV) {
   if (const Function *F = dyn_cast<Function>(&GV))
     return F->isDeclaration() || F->getName().startswith("__asan_") ||
            F->getName().startswith("__sanitizer_") ||
            AMDGPU::isEntryFunctionCC(F->getCallingConv());
@@ ... @@ GCNTargetMachine::GCNTargetMachine(const Target &T, const Triple &TT,
                                    std::optional<Reloc::Model> RM,
                                    std::optional<CodeModel::Model> CM,
                                    CodeGenOpt::Level OL, bool JIT)
     : AMDGPUTargetMachine(T, TT, CPU, FS, Options, RM, CM, OL) {}

 const TargetSubtargetInfo *
 GCNTargetMachine::getSubtargetImpl(const Function &F) const {
   StringRef GPU = getGPUName(F);
-  StringRef FS = getFeatureString(F);
+  auto FS = getFeatureString(F);

   SmallString<128> SubtargetKey(GPU);
   SubtargetKey.append(FS);

   auto &I = SubtargetMap[SubtargetKey];
   if (!I) {
     // This needs to be done before we create a new subtarget since any
     // creation will depend on the TM and the code generation flags on the

llvm/lib/Target/AMDGPU/R600TargetMachine.cpp

@@ ... @@ R600TargetMachine::R600TargetMachine(const Target &T, const Triple &TT,
   if (EnableFunctionCalls &&
       EnableAMDGPUFunctionCallsOpt.getNumOccurrences() == 0)
     EnableFunctionCalls = false;
 }

 const TargetSubtargetInfo *
 R600TargetMachine::getSubtargetImpl(const Function &F) const {
   StringRef GPU = getGPUName(F);
-  StringRef FS = getFeatureString(F);
+  auto FS = getFeatureString(F);

   SmallString<128> SubtargetKey(GPU);
   SubtargetKey.append(FS);

   auto &I = SubtargetMap[SubtargetKey];
   if (!I) {
     // This needs to be done before we create a new subtarget since any
     // creation will depend on the TM and the code generation flags on the
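Both `getSubtargetImpl` implementations cache one subtarget per key, built from the GPU name followed by the feature string. Whatever `getFeatureString` returns therefore decides which subtarget a function gets. A minimal model of that cache follows, with a hypothetical `Subtarget` stand-in for the real subtarget classes:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the real subtarget classes.
struct Subtarget {
  std::string gpu;
  std::string features;
};

// Model of the per-function subtarget cache: two functions share a subtarget
// only if both the GPU name and the feature string match.
class SubtargetCache {
  std::map<std::string, std::unique_ptr<Subtarget>> map_;

public:
  const Subtarget &get(const std::string &gpu, const std::string &fs) {
    std::string key = gpu + fs;  // mirrors SubtargetKey(GPU) + append(FS)
    std::unique_ptr<Subtarget> &slot = map_[key];
    if (!slot)
      slot = std::make_unique<Subtarget>(Subtarget{gpu, fs});
    return *slot;
  }
  std::size_t size() const { return map_.size(); }
};
```

Because the merged string for a device_libs function differs from the plain global string, such a function would get its own cache entry.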

llvm/test/CodeGen/AMDGPU/extended-image-insts-wave32-wave64.ll

This file was added.

; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,WAVE32 %s
; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -mattr="+wavefrontsize32" -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,WAVE32 %s
; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -mattr="+wavefrontsize64" -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,WAVE64 %s

; GCN-LABEL: has_extended_image_insts
; WAVE32: s_and_b32 vcc_lo, exec_lo, {{.*}}
; WAVE64: s_and_b64 vcc, exec, {{.*}}
; WAVE32: amdhsa_wavefront_size32 1
; WAVE64: amdhsa_wavefront_size32 0

define amdgpu_kernel void @has_extended_image_insts(float %arg10) #0 {
.entry:
  %tmp100 = fcmp ogt float %arg10, 0.25
  br i1 %tmp100, label %if, label %endif
if:
  %tmp101 = fadd float %arg10, 0.125
  br label %endif
endif:
  %tmp102 = phi float [ %arg10, %.entry ], [ %tmp101, %if ]
  call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %tmp102, float %tmp102, float %tmp102, float %tmp102, i1 true, i1 true)
  ret void
}

declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1)

attributes #0 = { nounwind "target-features"="+extended-image-insts" }