This is an archive of the discontinued LLVM Phabricator instance.

Differential D139000

[AMDGPU] Remove function with incompatible features
ClosedPublic

Authored by Pierre-vh on Nov 30 2022, 3:55 AM.

Details

Reviewers

arsenm
foad
Joe_Nash
b-sumner
nhaehnle
Leonc
bcahoon

Commits

rG8e68c1204580: [AMDGPU] Remove function with incompatible features

Summary

Adds a new pass that removes functions
if they use features that are not supported on the current GPU.

This change is aimed at preventing crashes when building code at O0 that
uses idioms such as if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();
where ISA_VERSION is not constexpr, and intrinsic_a is not selectable
on older targets.
This is a pattern that's used all over the ROCm device libs. The main
motive behind this change is to allow code using ROCm device libs
to be built at O0.

Note: the feature checking logic is done ad-hoc in the pass. There is no other
pass that needs (or will need in the foreseeable future) to do similar
feature-checking logic so I did not see a need to generalize the feature
checking logic yet. It can (and should probably) be generalized later and
moved to a TargetInfo-like class or helper file.
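
As a rough illustration of the idiom described in the summary, here is a standalone C++ sketch. The function names and the `dispatch` helper are hypothetical stand-ins, not code from the ROCm device libs. Because `ISA_VERSION` is an ordinary variable rather than a `constexpr`, an unoptimized build keeps both branches, so the branch calling the newer intrinsic still reaches instruction selection on an older target.

```cpp
#include <string>

// Hypothetical stand-ins for two target-specific intrinsics.
std::string intrinsic_a() { return "intrinsic_a"; } // newer GPUs only
std::string intrinsic_b() { return "intrinsic_b"; } // fallback

// Deliberately not constexpr: without optimization the compiler does not
// fold the comparison below, so both branches survive.
int ISA_VERSION = 8;

// Mirrors: if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();
std::string dispatch(int N) {
  if (ISA_VERSION >= N)
    return intrinsic_a();
  return intrinsic_b();
}
```

With `ISA_VERSION` set to 8, `dispatch(9)` takes the fallback path at run time, yet the call to `intrinsic_a` is still present in the compiled function.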

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Pierre-vh created this revision. Nov 30 2022, 3:55 AM

Herald added subscribers: kosarev, kerbowa, hiraditya and 5 others. Nov 30 2022, 3:55 AM

Herald added a project: Restricted Project. Nov 30 2022, 3:55 AM

Pierre-vh requested review of this revision. Nov 30 2022, 3:55 AM

Herald added a project: Restricted Project. Nov 30 2022, 3:55 AM

Herald added subscribers: llvm-commits, wdng.

foad added inline comments. Nov 30 2022, 4:14 AM

llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
81	`isDeclaration()` is a bit more self-documenting than `empty()`.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
217	Can you make the name of the option match the name of the pass? It seems like you have three variations in this patch: "clear incompatible functions", "clear incompatible functions bodies", and "incompatible features clear fns".

Comments

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
217	Matching the name of the pass is not possible because the pass already generates a CL Option I think, but I tried to make it more consistent.

foad added inline comments. Nov 30 2022, 4:55 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
217	Maybe use -amdgpu-enable-* instead of just -amdgpu-* then? I see a few examples like that already.

Comments

Harbormaster completed remote builds in B200243: Diff 478904. Nov 30 2022, 5:52 AM

Overall this looks pretty good. As you say, the feature checking logic is quite limited, but that's not a problem.
I think after this patch lands https://reviews.llvm.org/D123693 can be reverted. Can you try to revert that with this patch and check if device libs can be built correctly at -O0?

In D139000#3960912, @Joe_Nash wrote:

Overall this looks pretty good. As you say, the feature checking logic is quite limited, but that's not a problem.
I think after this patch lands https://reviews.llvm.org/D123693 can be reverted. Can you try to revert that with this patch and check if device libs can be built correctly at -O0?

I did a quick test where I passed all .bc files from the device libs to Clang for fiji (gfx8) and it still doesn't build even with this patch. The pass kicks in a few times but there's some issues with "dot" instructions.
Not sure how to address those - should it be done in this pass? For instance device libs has a few places where it uses target("dot8-insts"), which allows selection to work (because that only checks the feature) but then it fails because there is no "real" instruction for GFX8 dot8, only GFX11 (it uses the generation).

Do I just go "whack a mole" and try to build, add more checks, try to build again, etc?
I'm worried about complexity exploding if the checks need to be more intricate. e.g. I see that dot instructions have been introduced in the middle of the GFX9 generation (GFX908?) so I'd already need to change the pass completely to check for GFX908

In D139000#3962770, @Pierre-vh wrote:

In D139000#3960912, @Joe_Nash wrote:

Overall this looks pretty good. As you say, the feature checking logic is quite limited, but that's not a problem.
I think after this patch lands https://reviews.llvm.org/D123693 can be reverted. Can you try to revert that with this patch and check if device libs can be built correctly at -O0?

I did a quick test where I passed all .bc files from the device libs to Clang for fiji (gfx8) and it still doesn't build even with this patch. The pass kicks in a few times but there's some issues with "dot" instructions.
Not sure how to address those - should it be done in this pass? For instance device libs has a few places where it uses target("dot8-insts"), which allows selection to work (because that only checks the feature) but then it fails because there is no "real" instruction for GFX8 dot8, only GFX11 (it uses the generation).

Do I just go "whack a mole" and try to build, add more checks, try to build again, etc?
I'm worried about complexity exploding if the checks need to be more intricate. e.g. I see that dot instructions have been introduced in the middle of the GFX9 generation (GFX908?) so I'd already need to change the pass completely to check for GFX908

Thanks for looking deeper into this. I believe it is the right functional test for this patch if it can correctly build device libs with D123693 reverted. So I would recommend continuing the implementation till that is true.

I would recommend one level of abstraction on the attribute checks so it is easier to extend. So something like this pseudocode.

foreach ( attribute in function)
    if (isIllegal(attribute, function.subtarget))
        remove = true;

isIllegal can be similar to your current FeatureAndMinGen, since you mentioned before there is no existing api for determining if an attribute is legal on a subtarget (though I'm surprised by that and think if you build it the users will come).
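
The pseudocode above could be fleshed out roughly as follows. This is a minimal standalone sketch, not the code from the patch: the `Subtarget` and `Function` structs are hypothetical models, and `FeatureAndMinGen` is only loosely based on the name mentioned in the comment.

```cpp
#include <string>
#include <vector>

// Hypothetical model: a feature attribute plus the first GPU generation on
// which it is legal.
struct FeatureAndMinGen {
  std::string Feature;
  unsigned MinGen;
};

// Hypothetical subtarget description: just a generation number.
struct Subtarget {
  unsigned Generation;
};

// Hypothetical function model: its feature attributes and its subtarget.
struct Function {
  std::vector<FeatureAndMinGen> Attributes;
  Subtarget ST;
};

// Single point of extension: all legality rules live here.
bool isIllegal(const FeatureAndMinGen &Attr, const Subtarget &ST) {
  return ST.Generation < Attr.MinGen;
}

// Mirrors the pseudocode: flag the function if any attribute is illegal.
bool shouldRemove(const Function &F) {
  bool Remove = false;
  for (const FeatureAndMinGen &Attr : F.Attributes)
    if (isIllegal(Attr, F.ST))
      Remove = true;
  return Remove;
}
```

Keeping the rule inside `isIllegal` means a finer-grained check (for example, one that distinguishes GPUs within a generation) would only change that one function.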

Is there a way to detect when new feature attributes are added and raise a warning if this pass hasn't been updated?

Reworked the feature compatibility checking logic to use TableGen data.
I think this is a lot more robust. I only check the features we're interested in though - I tried checking all of them and there's too many edge cases to handle them all so it's better to do this on an "opt-in" basis IMO.

With this, device libs _mostly_ builds at O0 on fiji. There's just some small codegen issue with fsin but I will try to look into it (even if it's not a known issue)
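
The "opt-in" check described above can be sketched in isolation like this. It is an illustrative model using `std::bitset`, with made-up feature indices; the real pass works on LLVM's `FeatureBitset` and TableGen-generated data.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t NumFeatures = 4;
using FeatureBits = std::bitset<NumFeatures>;

// Hypothetical feature indices, for illustration only.
enum Feature { GFX11Insts = 0, Dot8Insts = 1, DPP = 2, Unrelated = 3 };

// Only these features are checked; everything else is ignored by design.
const Feature FeaturesToCheck[] = {GFX11Insts, Dot8Insts, DPP};

// Returns the checked features that the function enables but the GPU lacks.
std::vector<Feature> incompatibleFeatures(const FeatureBits &FunctionFeatures,
                                          const FeatureBits &GPUFeatures) {
  std::vector<Feature> Bad;
  for (Feature F : FeaturesToCheck)
    if (FunctionFeatures.test(F) && !GPUFeatures.test(F))
      Bad.push_back(F);
  return Bad;
}
```

A feature outside `FeaturesToCheck` never triggers removal, which is how the opt-in list sidesteps the edge cases mentioned in the comment.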

Comments

Rebase

Harbormaster completed remote builds in B200756: Diff 479600. Dec 2 2022, 7:01 AM

Overall this is looking quite good.

do this on an "opt-in" basis

Makes sense to me

llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
87	Does this need to reach a fixed point over features implying each other? Maybe that has already been done in the Implies data structure.

Pierre-vh marked an inline comment as done. Dec 6 2022, 1:00 AM

Pierre-vh added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
87	It just needs to expand everything, not sure how the Implies data structure works exactly but I _think_ you can have implied feature that imply other feature, so there's a need to be recursive. This function is inspired by `SetImpliedBits` in `MCSubtargetInfo.cpp` (which is private)
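
To make the recursion concrete, here is a small standalone model of expanding implied features. The feature indices and the `Implies` table are invented for illustration and use `std::bitset` rather than LLVM's `FeatureBitset`; the shape of the loop follows `ExpandImpliedFeatures` in the diff.

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t NumFeatures = 4;
using FeatureBits = std::bitset<NumFeatures>;

// Hypothetical feature indices, for illustration only.
enum Feature { GFX90A = 0, GFX9 = 1, DPP = 2, Insts16Bit = 3 };

// Implies[i] holds the features directly implied by feature i.
const FeatureBits Implies[NumFeatures] = {
    FeatureBits(1u << GFX9),                       // GFX90A -> GFX9
    FeatureBits((1u << DPP) | (1u << Insts16Bit)), // GFX9 -> DPP, 16-bit
    FeatureBits(),                                 // DPP implies nothing
    FeatureBits(),                                 // 16-bit implies nothing
};

// Recursively add everything implied by the bits already set.
// Assumes the implication graph is acyclic; a cycle would recurse forever.
FeatureBits expandImplied(const FeatureBits &Features) {
  FeatureBits Result = Features;
  for (std::size_t I = 0; I < NumFeatures; ++I)
    if (Features.test(I) && Implies[I].any())
      Result |= expandImplied(Implies[I]);
  return Result;
}
```

Starting from only `GFX90A`, the expansion picks up `GFX9` in the first step and then `DPP` and `Insts16Bit` through the nested call, which is the transitive behaviour the reply describes.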

ping

The last functional status I see here is.

With this, device libs _mostly_ builds at O0 on fiji.

If this is building device libs correctly on all targets with D123693 reverted then it is good by me.

OutOfCache added a subscriber: OutOfCache. Dec 29 2022, 3:29 AM

OutOfCache removed a subscriber: OutOfCache.

Last time I checked (before the holidays), it builds fine without the patch reverted. I can try reverting the patch later when this lands perhaps? Or should it be done at the same time?
For reference the only build issues I experienced were with hard f64 functions like llvm.sin.f64, which are apparently not (well) supported if I understand correctly

Rebase

Harbormaster completed remote builds in B205621: Diff 486187. Jan 4 2023, 2:04 AM

In D139000#4025199, @Pierre-vh wrote:

Last time I checked (before the holidays), it builds fine without the patch reverted. I can try reverting the patch later when this lands perhaps? Or should it be done at the same time?
For reference the only build issues I experienced were with hard f64 functions like llvm.sin.f64, which are apparently not (well) supported if I understand correctly

I'm fine with you landing this patch then reverting the V_ILLEGAL patch later.

LGTM! Thanks

This revision is now accepted and ready to land. Jan 4 2023, 6:26 AM

I think this is too much magic to apply based on target-features, and is too specific for the library usage. I think this needs to be an explicit opt-in behavior for the function, maybe a form of linkage (target_weak)? I also think the library use case is insufficient to add such a thing. It's like 1 medium-small file that requires duplication, plus maybe 5 or so microscopic functions

This revision now requires changes to proceed. Jan 4 2023, 6:55 AM

In D139000#4026031, @arsenm wrote:

I think this is too much magic to apply based on target-features, and is too specific for the library usage. I think this needs to be an explicit opt-in behavior for the function, maybe a form of linkage (target_weak)? I also think the library use case is insufficient to add such a thing. It's like 1 medium-small file that requires duplication, plus maybe 5 or so microscopic functions

I thought we agreed this was the solution we wanted, at least for now? I remember there was quite a bit of discussion but I ended up with a green light to work on this pass based on the current use-case (device libs)

I agree this should probably be opt-in though, but adding a linkage type seems overkill no? Could we do something simpler like a target-feature perhaps? Or just make the CL option off by default?

Rebase & refactor according to offline discussion

Herald added subscribers: nlopes, StephenFan. Feb 10 2023, 12:57 AM

Harbormaster completed remote builds in B212973: Diff 496365. Feb 10 2023, 2:27 AM

Pierre-vh retitled this revision from [AMDGPU] Clear bodies of function with incompatible features to [AMDGPU] Remove function with incompatible features. Feb 10 2023, 6:32 AM

Pierre-vh edited the summary of this revision.

Joe_Nash added inline comments. Feb 10 2023, 6:38 AM

llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp
10 ↗	(On Diff #496365)	typo: functions repeated
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1069	For the sake of compile time, can we avoid running this pass on O2 and O3? We know dead code elimination will remove the functions anyway.

foad added inline comments. Feb 10 2023, 7:05 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1069	We don't know that. DCE will only remove a function if the compiler can easily prove that it is never called.
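
One way to picture this point is a function that is never called directly but whose address escapes. The sketch below is a hypothetical standalone example, not taken from the patch or its tests: `usesNewIntrinsic` has no direct call site, yet it is reachable through a table of function pointers, so it cannot be discarded just because no direct call to it exists.

```cpp
// Hypothetical functions; the names are illustrative only.
int usesNewIntrinsic() { return 1; }
int fallback() { return 0; }

using Fn = int (*)();

// The address of usesNewIntrinsic escapes through this table, so the
// function is still reachable even though nothing calls it by name.
Fn Table[] = {&fallback, &usesNewIntrinsic};

int callByIndex(unsigned I) { return Table[I](); }
```

Here the index is only known at run time, so either entry of the table may end up being called.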

LGTM besides my previously noted typo, but please wait for @arsenm

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1069	Ok, you're right. I was thinking of device libs only, but this pass can be used more generally.

Fix typo

Harbormaster completed remote builds in B213369: Diff 496883. Feb 13 2023, 2:59 AM

Ping (@arsenm) :)

arsenm added inline comments. Feb 20 2023, 5:43 AM

llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp
56 ↗	(On Diff #496883)	I didn't know there was a skipModule, is this the same as skipFunction in that we shouldn't be doing it for required passes?
66 ↗	(On Diff #496883)	Should replace with ConstantPointerNull, not undef
76 ↗	(On Diff #496883)	Start with lowercase
105 ↗	(On Diff #496883)	Braces
118 ↗	(On Diff #496883)	Don't need to do this per function, can just not run the pass in the first place
153–155 ↗	(On Diff #496883)	You can build this twine directly to pass to DiagnosticInfo, you don't need to build a std::string first
156 ↗	(On Diff #496883)	I'm not sure this really belongs under DiagnosticInfoUnsupported, seems like its own category (also, it's not a warning if it's supposed to happen)
llvm/test/CodeGen/AMDGPU/remove-incompatible-functions.ll
293 ↗	(On Diff #496883)	Use opaque pointers and named values
845 ↗	(On Diff #496883)	Also should include IR check lines
856 ↗	(On Diff #496883)	Missing uses for the functions? In addition to some direct callers, I would like to see some address captures and constantexpr uses

Comments

llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp
56 ↗	(On Diff #496883)	It's similar I think so I removed it
llvm/test/CodeGen/AMDGPU/remove-incompatible-functions.ll
845 ↗	(On Diff #496883)	Now that I think about it it's probably better to just check the IR and nothing else. I initially checked the asm because I used unreachable and wanted to check that it was correctly emitted but now it doesn't really matter.

Harbormaster completed remote builds in B214769: Diff 498856. Feb 20 2023, 8:55 AM

Not my favorite but I guess we can do this for now

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1068	Extra whitespace change
llvm/test/CodeGen/AMDGPU/remove-incompatible-functions.ll
845 ↗	(On Diff #496883)	There'd be some value in testing codegen if these also were testing the failing intrinsics in the bodies

This revision is now accepted and ready to land. Feb 20 2023, 9:19 AM

Comments

Pierre-vh marked an inline comment as done. Feb 21 2023, 1:39 AM

This revision was landed with ongoing or failed builds. Feb 21 2023, 1:42 AM

Closed by commit rG8e68c1204580: [AMDGPU] Remove function with incompatible features (authored by Pierre-vh).

This revision was automatically updated to reflect the committed changes.

Pierre-vh added a commit: rG8e68c1204580: [AMDGPU] Remove function with incompatible features.

Harbormaster completed remote builds in B214956: Diff 499077. Feb 21 2023, 3:14 AM

foad mentioned this in D148127: [AMDGPU] Don't transform illegal intrinsics to V_ILLEGAL. Apr 12 2023, 6:29 AM

foad mentioned this in rGbf4dc4381e30: [AMDGPU] Don't transform illegal intrinsics to V_ILLEGAL. Apr 19 2023, 2:00 AM

foad added inline comments. Apr 19 2023, 8:14 AM

llvm/test/CodeGen/AMDGPU/remove-incompatible-functions.ll
372 ↗	(On Diff #499078)	Typo GFX111
378 ↗	(On Diff #499078)	Typo GFX111

Revision Contents

Path	Size
llvm/include/llvm/MC/MCSubtargetInfo.h	4 lines
llvm/lib/Target/AMDGPU/AMDGPU.h	4 lines
llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp	169 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	12 lines
llvm/lib/Target/AMDGPU/CMakeLists.txt	1 line
llvm/test/CodeGen/AMDGPU/GlobalISel/dummy-target.ll	2 lines
llvm/test/CodeGen/AMDGPU/clear-incompatible-functions.ll	1025 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll	5 lines

Diff 479600

llvm/include/llvm/MC/MCSubtargetInfo.h

@@ public:
   }

   /// Check whether the CPU string is valid.
   bool isCPUStringValid(StringRef CPU) const {
     auto Found = llvm::lower_bound(ProcDesc, CPU);
     return Found != ProcDesc.end() && StringRef(Found->Key) == CPU;
   }

+  ArrayRef<SubtargetSubTypeKV> getAllProcessorDescriptions() const {
+    return ProcDesc;
+  }
+
   virtual unsigned getHwMode() const { return 0; }

   /// Return the cache size in bytes for the given level of cache.
   /// Level is zero-based, so a value of zero means the first level of
   /// cache.
   ///
   virtual Optional<unsigned> getCacheSize(unsigned Level) const;

llvm/lib/Target/AMDGPU/AMDGPU.h

@@
 FunctionPass *createSIMemoryLegalizerPass();
 FunctionPass *createSIInsertWaitcntsPass();
 FunctionPass *createSIPreAllocateWWMRegsPass();
 FunctionPass *createSIFormMemoryClausesPass();

 FunctionPass *createSIPostRABundlerPass();
 FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetMachine *);
 FunctionPass *createAMDGPUUseNativeCallsPass();
+FunctionPass *createAMDGPUClearIncompatibleFunctionsPass(const TargetMachine *);
 FunctionPass *createAMDGPUCodeGenPreparePass();
 FunctionPass *createAMDGPULateCodeGenPreparePass();
 FunctionPass *createAMDGPUMachineCFGStructurizerPass();
 FunctionPass *createAMDGPUPropagateAttributesEarlyPass(const TargetMachine *);
 ModulePass *createAMDGPUPropagateAttributesLatePass(const TargetMachine *);
 FunctionPass *createAMDGPURewriteOutArgumentsPass();
 ModulePass *createAMDGPUReplaceLDSUseWithPointerPass();
 ModulePass *createAMDGPULowerModuleLDSPass();
@@
 extern char &SIOptimizeVGPRLiveRangeID;

 void initializeAMDGPUAnnotateUniformValuesPass(PassRegistry&);
 extern char &AMDGPUAnnotateUniformValuesPassID;

 void initializeAMDGPUCodeGenPreparePass(PassRegistry&);
 extern char &AMDGPUCodeGenPrepareID;

+void initializeAMDGPUClearIncompatibleFunctionsPass(PassRegistry &);
+extern char &AMDGPUClearIncompatibleFunctionsID;
+
 void initializeAMDGPULateCodeGenPreparePass(PassRegistry &);
 extern char &AMDGPULateCodeGenPrepareID;

 FunctionPass *createAMDGPURewriteUndefForPHIPass();
 void initializeAMDGPURewriteUndefForPHIPass(PassRegistry &);
 extern char &AMDGPURewriteUndefForPHIPassID;

 void initializeSIAnnotateControlFlowPass(PassRegistry&);

llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp

This file was added.

				//===-- AMDGPUClearIncompatibleFunctions.cpp ------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// This pass replaces the bodies of functions that use GPU features
				/// incompatible with the current GPU with trap/unreachable.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPU.h"
				#include "GCNSubtarget.h"
				#include "llvm/IR/DiagnosticInfo.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Pass.h"
				#include "llvm/Target/TargetMachine.h"

				#define DEBUG_TYPE "amdgpu-clear-incompatible-functions"

				using namespace llvm;

				namespace llvm {
				extern const SubtargetFeatureKV
				AMDGPUFeatureKV[AMDGPU::NumSubtargetFeatures - 1];
				}

				namespace {

				using Generation = AMDGPUSubtarget::Generation;

				class AMDGPUClearIncompatibleFunctions : public FunctionPass {
				public:
				static char ID;

				AMDGPUClearIncompatibleFunctions(const TargetMachine *TM = nullptr)
				: FunctionPass(ID), TM(TM) {
				assert(TM && "No TargetMachine!");
				}

				StringRef getPassName() const override {
				return "AMDGPU Clear Incompatible Functions Bodies";
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				// If changes are made, no analyses are preserved.
				}

				bool runOnFunction(Function &F) override;

				private:
				const TargetMachine *TM = nullptr;
				};

				StringRef GetFeatureName(unsigned Feature) {
				for (const SubtargetFeatureKV &KV : AMDGPUFeatureKV)
				if (Feature == KV.Value)
				return KV.Key;

				llvm_unreachable("Unknown Target feature");
				}

				const SubtargetSubTypeKV *GetGPUInfo(const GCNSubtarget &ST,
				StringRef GPUName) {
				for (const SubtargetSubTypeKV &KV : ST.getAllProcessorDescriptions())
				if (StringRef(KV.Key) == GPUName)
				return &KV;

				return nullptr;
				}

				constexpr unsigned FeaturesToCheck[] = {
				AMDGPU::FeatureGFX11Insts, AMDGPU::FeatureGFX10Insts,
				AMDGPU::FeatureGFX9Insts, AMDGPU::FeatureGFX8Insts,
				AMDGPU::FeatureDPP, AMDGPU::Feature16BitInsts,
				AMDGPU::FeatureDot1Insts, AMDGPU::FeatureDot2Insts,
				AMDGPU::FeatureDot3Insts, AMDGPU::FeatureDot4Insts,
				AMDGPU::FeatureDot5Insts, AMDGPU::FeatureDot6Insts,
				AMDGPU::FeatureDot7Insts, AMDGPU::FeatureDot8Insts,
				};

				FeatureBitset ExpandImpliedFeatures(const FeatureBitset &Features) {
				FeatureBitset Result = Features;
				for (const SubtargetFeatureKV &FE : AMDGPUFeatureKV)
				if (Features.test(FE.Value) && FE.Implies.any())
				Result |= ExpandImpliedFeatures(FE.Implies.getAsBitset());
				return Result;
				}

				} // end anonymous namespace

				bool AMDGPUClearIncompatibleFunctions::runOnFunction(Function &F) {
				if (skipFunction(F) || F.isDeclaration())
				return false;

				// This pass is primarily intended for GCN, so check we have a GCN GPU.
				if (!TM->getTargetTriple().isAMDGCN())
				return false;

				const GCNSubtarget *ST =
				static_cast<const GCNSubtarget *>(TM->getSubtargetImpl(F));

				// Additionally check our GPU isn't the generic one. The generic one is used
				// for testing only and we don't want this pass to interfere with it.
				StringRef GPUName = ST->getCPU();
				if (GPUName.empty() || GPUName.contains("generic"))
				return false;

				// Try to fetch the GPU's info. If we can't, it's likely an unknown processor
				// so just bail out.
				const SubtargetSubTypeKV *GPUInfo = GetGPUInfo(*ST, GPUName);
				if (!GPUInfo)
				return false;

				LLVMContext &Ctx = F.getContext();

				// Get all the features implied by the current GPU, and recursively expand
				// the features that imply other features.
				//
				// e.g. GFX90A implies FeatureGFX9, and FeatureGFX9 implies a whole set of
				// other features.
				const FeatureBitset GPUFeatureBits =
				ExpandImpliedFeatures(GPUInfo->Implies.getAsBitset());

				// Now that we have a FeatureBitset containing all possible features for
				// the chosen GPU, check our list of "suspicious" features.

				// Check that the user didn't enable any features that aren't part of that
				// GPU's feature set. We only check a predetermined set of features.
				bool Remove = false;
				for (unsigned Feature : FeaturesToCheck) {
				if (ST->hasFeature(Feature) && !GPUFeatureBits.test(Feature)) {
				Remove = true;
				std::string Msg =
				"+" + GetFeatureName(Feature).str() +
				" is not supported on the current target. Deleting function body.";
				DiagnosticInfoUnsupported DiagInfo(F, Msg, DiagnosticLocation(),
				DS_Warning);
				Ctx.diagnose(DiagInfo);
				}
				}

				if (!Remove)
				return false;

				F.dropAllReferences();
				assert(F.empty());

				BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", &F);
				IRBuilder<> Builder(Entry);
				Builder.CreateIntrinsic(Intrinsic::trap, {}, {});
				Builder.CreateUnreachable();
				return true;
				}

				INITIALIZE_PASS(AMDGPUClearIncompatibleFunctions, DEBUG_TYPE,
				"AMDGPU Clear Incompatible Functions Bodies", false, false)

				char AMDGPUClearIncompatibleFunctions::ID = 0;

				FunctionPass *
				llvm::createAMDGPUClearIncompatibleFunctionsPass(const TargetMachine *TM) {
				return new AMDGPUClearIncompatibleFunctions(TM);
				}

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

@@
 // Option to inline all early.
 static cl::opt<bool> EarlyInlineAll(
   "amdgpu-early-inline-all",
   cl::desc("Inline all functions early"),
   cl::init(false),
   cl::Hidden);

+static cl::opt<bool> ClearIncompatibleFunctionsBodies(
+    "amdgpu-enable-clear-incompatible-functions", cl::Hidden,
+    cl::desc("Enable deletion of function bodies when they"
+             "use features not supported by the target GPU"),
+    cl::init(true));
+
 static cl::opt<bool> EnableSDWAPeephole(
   "amdgpu-sdwa-peephole",
   cl::desc("Enable SDWA peepholer"),
   cl::init(true));

 static cl::opt<bool> EnableDPPCombine(
   "amdgpu-dpp-combine",
   cl::desc("Enable DPP combiner"),
@@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeAMDGPUPreLegalizerCombinerPass(*PR);
   initializeAMDGPURegBankCombinerPass(*PR);
   initializeAMDGPUPromoteAllocaPass(*PR);
   initializeAMDGPUPromoteAllocaToVectorPass(*PR);
   initializeAMDGPUCodeGenPreparePass(*PR);
   initializeAMDGPULateCodeGenPreparePass(*PR);
   initializeAMDGPUPropagateAttributesEarlyPass(*PR);
   initializeAMDGPUPropagateAttributesLatePass(*PR);
+  initializeAMDGPUClearIncompatibleFunctionsPass(*PR);
   initializeAMDGPUReplaceLDSUseWithPointerPass(*PR);
   initializeAMDGPULowerModuleLDSPass(*PR);
   initializeAMDGPURewriteOutArgumentsPass(*PR);
   initializeAMDGPURewriteUndefForPHIPass(*PR);
   initializeAMDGPUUnifyMetadataPass(*PR);
   initializeSIAnnotateControlFlowPass(*PR);
   initializeAMDGPUReleaseVGPRsPass(*PR);
   initializeAMDGPUInsertDelayAluPass(*PR);
@@ void AMDGPUPassConfig::addCodeGenPrepare() {
   // here seems better that these blocks would get cleaned up by
   // UnreachableBlockElim inserted next in the pass flow.
   addPass(createLowerSwitchPass());
 }

 bool AMDGPUPassConfig::addPreISel() {
   if (TM->getOptLevel() > CodeGenOpt::None)
     addPass(createFlattenCFGPass());

+  if (ClearIncompatibleFunctionsBodies)
+    addPass(
+        createAMDGPUClearIncompatibleFunctionsPass(&getAMDGPUTargetMachine()));
+
   return false;
 }

 bool AMDGPUPassConfig::addInstSelector() {
   addPass(createAMDGPUISelDag(&getAMDGPUTargetMachine(), getOptLevel()));
   return false;
 }

llvm/lib/Target/AMDGPU/CMakeLists.txt

add_llvm_target(AMDGPUCodeGen
AMDGPUAlwaysInlinePass.cpp		AMDGPUAlwaysInlinePass.cpp
AMDGPUAnnotateKernelFeatures.cpp		AMDGPUAnnotateKernelFeatures.cpp
AMDGPUAnnotateUniformValues.cpp		AMDGPUAnnotateUniformValues.cpp
AMDGPUArgumentUsageInfo.cpp		AMDGPUArgumentUsageInfo.cpp
AMDGPUAsmPrinter.cpp		AMDGPUAsmPrinter.cpp
AMDGPUAtomicOptimizer.cpp		AMDGPUAtomicOptimizer.cpp
AMDGPUAttributor.cpp		AMDGPUAttributor.cpp
AMDGPUCallLowering.cpp		AMDGPUCallLowering.cpp
		AMDGPUClearIncompatibleFunctions.cpp
AMDGPUCodeGenPrepare.cpp		AMDGPUCodeGenPrepare.cpp
AMDGPUCombinerHelper.cpp		AMDGPUCombinerHelper.cpp
AMDGPUCtorDtorLowering.cpp		AMDGPUCtorDtorLowering.cpp
AMDGPUExportClustering.cpp		AMDGPUExportClustering.cpp
AMDGPUFrameLowering.cpp		AMDGPUFrameLowering.cpp
AMDGPUGlobalISelUtils.cpp		AMDGPUGlobalISelUtils.cpp
AMDGPUHSAMetadataStreamer.cpp		AMDGPUHSAMetadataStreamer.cpp
AMDGPUInsertDelayAlu.cpp		AMDGPUInsertDelayAlu.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/dummy-target.ll

	; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -stop-after=legalizer -o - %s | FileCheck %s			; RUN: llc -global-isel -amdgpu-enable-clear-incompatible-functions=0 -mtriple=amdgcn-amd-amdhsa -stop-after=legalizer -o - %s | FileCheck %s

	; Make sure legalizer info doesn't assert on dummy targets			; Make sure legalizer info doesn't assert on dummy targets

	define i16 @vop3p_add_i16(i16 %arg0) #0 {			define i16 @vop3p_add_i16(i16 %arg0) #0 {
	; CHECK-LABEL: name: vop3p_add_i16			; CHECK-LABEL: name: vop3p_add_i16
	; CHECK: bb.1 (%ir-block.0):			; CHECK: bb.1 (%ir-block.0):
	; CHECK-NEXT: liveins: $vgpr0			; CHECK-NEXT: liveins: $vgpr0
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: {{ $}}

llvm/test/CodeGen/AMDGPU/clear-incompatible-functions.ll

This file was added.
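
The test that follows checks, per target, which `needs_*` functions trigger a warning ending in "is not supported on the current target. Deleting function body." The following is a minimal C++17 sketch of that kind of check, assuming a plain set-membership test. The helper names (`firstUnsupported`, `makeWarning`, `demoFeatureCheck`) and the `name: ` prefix of the message are hypothetical, since the test only matches the text after the function name. The gfx906 feature set is inferred from the WARN-GFX906 lines, not taken from the pass itself.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

using FeatureSet = std::set<std::string>;

// Returns the first feature in `required` that `supported` lacks, if any.
std::optional<std::string>
firstUnsupported(const std::vector<std::string> &required,
                 const FeatureSet &supported) {
  for (const std::string &feature : required)
    if (supported.count(feature) == 0)
      return feature;
  return std::nullopt;
}

// Builds a warning whose tail matches the WARN-* check lines. The
// "<function>: " prefix is an assumption made for this sketch.
std::string makeWarning(const std::string &function,
                        const std::string &feature) {
  return function + ": +" + feature +
         " is not supported on the current target. Deleting function body.";
}

// Small self-check using the features that produce no warning on gfx906.
void demoFeatureCheck() {
  const FeatureSet gfx906 = {"dpp",        "16-bit-insts", "gfx8-insts",
                             "gfx9-insts", "dot1-insts",   "dot2-insts",
                             "dot7-insts"};
  // needs_dot2_insts is kept on gfx906.
  assert(!firstUnsupported({"dot2-insts"}, gfx906).has_value());
  // A function that also needs dot3-insts would have its body deleted.
  assert(firstUnsupported({"gfx9-insts", "dot3-insts"}, gfx906) ==
         std::optional<std::string>("dot3-insts"));
}
```

A function for which `firstUnsupported` returns a value corresponds to the cases below where the expected assembly is just `s_waitcnt` followed by `s_endpgm`.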

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s 2>%t | FileCheck -check-prefix=GFX7 %s
				; RUN: FileCheck --check-prefix=WARN-GFX7 %s < %t

				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s 2>%t | FileCheck -check-prefix=GFX8 %s
				; RUN: FileCheck --check-prefix=WARN-GFX8 %s < %t

				; RUN: llc -march=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s 2>%t | FileCheck -check-prefixes=GFX9,GFX906 %s
				; RUN: FileCheck --check-prefix=WARN-GFX906 %s < %t

				; RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs < %s 2>%t | FileCheck -check-prefixes=GFX9,GFX90A %s
				; RUN: FileCheck --check-prefix=WARN-GFX90A %s < %t

				; RUN: llc -march=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s 2>%t | FileCheck -check-prefix=GFX10 %s
				; RUN: FileCheck --check-prefix=WARN-GFX10 %s < %t

				; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s 2>%t | FileCheck -check-prefix=GFX11 %s
				; RUN: FileCheck --check-prefix=WARN-GFX11 %s < %t

				; WARN-GFX7: needs_dpp {{.*}} +dpp is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_16bit_insts {{.*}} +16-bit-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_gfx8_insts {{.*}} +gfx8-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_gfx9_insts {{.*}} +gfx9-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_gfx10_insts {{.*}} +gfx10-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_gfx11_insts {{.*}} +gfx11-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_dot1_insts {{.*}} +dot1-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_dot2_insts {{.*}} +dot2-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_dot3_insts {{.*}} +dot3-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_dot4_insts {{.*}} +dot4-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_dot5_insts {{.*}} +dot5-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_dot6_insts {{.*}} +dot6-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_dot7_insts {{.*}} +dot7-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7: needs_dot8_insts {{.*}} +dot8-insts is not supported on the current target. Deleting function body.
				; WARN-GFX7-NOT: not supported

				; WARN-GFX8: needs_gfx9_insts {{.*}} +gfx9-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_gfx10_insts {{.*}} +gfx10-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_gfx11_insts {{.*}} +gfx11-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_dot1_insts {{.*}} +dot1-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_dot2_insts {{.*}} +dot2-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_dot3_insts {{.*}} +dot3-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_dot4_insts {{.*}} +dot4-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_dot5_insts {{.*}} +dot5-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_dot6_insts {{.*}} +dot6-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_dot7_insts {{.*}} +dot7-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8: needs_dot8_insts {{.*}} +dot8-insts is not supported on the current target. Deleting function body.
				; WARN-GFX8-NOT: not supported

				; WARN-GFX906: needs_gfx10_insts {{.*}} +gfx10-insts is not supported on the current target. Deleting function body.
				; WARN-GFX906: needs_gfx11_insts {{.*}} +gfx11-insts is not supported on the current target. Deleting function body.
				; WARN-GFX906: needs_dot3_insts {{.*}} +dot3-insts is not supported on the current target. Deleting function body.
				; WARN-GFX906: needs_dot4_insts {{.*}} +dot4-insts is not supported on the current target. Deleting function body.
				; WARN-GFX906: needs_dot5_insts {{.*}} +dot5-insts is not supported on the current target. Deleting function body.
				; WARN-GFX906: needs_dot6_insts {{.*}} +dot6-insts is not supported on the current target. Deleting function body.
				; WARN-GFX906: needs_dot8_insts {{.*}} +dot8-insts is not supported on the current target. Deleting function body.
				; WARN-GFX906-NOT: not supported

				; WARN-GFX90A: needs_gfx10_insts {{.*}} +gfx10-insts is not supported on the current target. Deleting function body.
				; WARN-GFX90A: needs_gfx11_insts {{.*}} +gfx11-insts is not supported on the current target. Deleting function body.
				; WARN-GFX90A: needs_dot8_insts {{.*}} +dot8-insts is not supported on the current target. Deleting function body.
				; WARN-GFX90A-NOT: not supported

				; WARN-GFX10: needs_gfx11_insts {{.*}} +gfx11-insts is not supported on the current target. Deleting function body.
				; WARN-GFX10: needs_dot3_insts {{.*}} +dot3-insts is not supported on the current target. Deleting function body.
				; WARN-GFX10: needs_dot4_insts {{.*}} +dot4-insts is not supported on the current target. Deleting function body.
				; WARN-GFX10: needs_dot8_insts {{.*}} +dot8-insts is not supported on the current target. Deleting function body.
				; WARN-GFX10-NOT: not supported

				; WARN-GFX11: needs_dot1_insts {{.*}} +dot1-insts is not supported on the current target. Deleting function body.
				; WARN-GFX11: needs_dot2_insts {{.*}} +dot2-insts is not supported on the current target. Deleting function body.
				; WARN-GFX11: needs_dot3_insts {{.*}} +dot3-insts is not supported on the current target. Deleting function body.
				; WARN-GFX11: needs_dot4_insts {{.*}} +dot4-insts is not supported on the current target. Deleting function body.
				; WARN-GFX11: needs_dot6_insts {{.*}} +dot6-insts is not supported on the current target. Deleting function body.
				; WARN-GFX11-NOT: not supported

				define void @needs_dpp(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #0 {
				; GFX7-LABEL: needs_dpp:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dpp:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
				; GFX8-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX8-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
				; GFX8-NEXT: ; %bb.1: ; %else
				; GFX8-NEXT: v_add_u32_e32 v8, vcc, v4, v6
				; GFX8-NEXT: v_addc_u32_e32 v9, vcc, v5, v7, vcc
				; GFX8-NEXT: ; implicit-def: $vgpr2
				; GFX8-NEXT: ; %bb.2: ; %Flow
				; GFX8-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; GFX8-NEXT: s_cbranch_execz .LBB0_4
				; GFX8-NEXT: ; %bb.3: ; %if
				; GFX8-NEXT: flat_load_dwordx2 v[8:9], v[2:3]
				; GFX8-NEXT: .LBB0_4: ; %endif
				; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX8-NEXT: s_waitcnt vmcnt(0)
				; GFX8-NEXT: flat_store_dwordx2 v[0:1], v[8:9]
				; GFX8-NEXT: s_waitcnt vmcnt(0)
				; GFX8-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-LABEL: needs_dpp:
				; GFX9: ; %bb.0: ; %entry
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
				; GFX9-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
				; GFX9-NEXT: ; %bb.1: ; %else
				; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, v4, v6
				; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, v5, v7, vcc
				; GFX9-NEXT: ; implicit-def: $vgpr2
				; GFX9-NEXT: ; %bb.2: ; %Flow
				; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; GFX9-NEXT: s_cbranch_execz .LBB0_4
				; GFX9-NEXT: ; %bb.3: ; %if
				; GFX9-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX9-NEXT: .LBB0_4: ; %endif
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_dpp:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_cmp_ne_u64_e32 vcc_lo, 0, v[4:5]
				; GFX10-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
				; GFX10-NEXT: s_xor_b32 s4, exec_lo, s4
				; GFX10-NEXT: ; %bb.1: ; %else
				; GFX10-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: ; implicit-def: $vgpr2
				; GFX10-NEXT: ; %bb.2: ; %Flow
				; GFX10-NEXT: s_andn2_saveexec_b32 s4, s4
				; GFX10-NEXT: s_cbranch_execz .LBB0_4
				; GFX10-NEXT: ; %bb.3: ; %if
				; GFX10-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX10-NEXT: .LBB0_4: ; %endif
				; GFX10-NEXT: s_waitcnt_depctr 0xffe3
				; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_dpp:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_mov_b32 s0, exec_lo
				; GFX11-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX11-NEXT: v_cmpx_ne_u64_e32 0, v[4:5]
				; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
				; GFX11-NEXT: ; %bb.1: ; %else
				; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: ; implicit-def: $vgpr2
				; GFX11-NEXT: ; %bb.2: ; %Flow
				; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
				; GFX11-NEXT: s_cbranch_execz .LBB0_4
				; GFX11-NEXT: ; %bb.3: ; %if
				; GFX11-NEXT: global_load_b64 v[8:9], v[2:3], off
				; GFX11-NEXT: .LBB0_4: ; %endif
				; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
				; GFX11-NEXT: s_waitcnt vmcnt(0)
				; GFX11-NEXT: global_store_b64 v[0:1], v[8:9], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				entry:
				%0 = icmp eq i64 %a, 0
				br i1 %0, label %if, label %else

				if:
				%1 = load i64, i64 addrspace(1)* %in
				br label %endif

				else:
				%2 = add i64 %a, %b
				br label %endif

				endif:
				%3 = phi i64 [%1, %if], [%2, %else]
				store i64 %3, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_16bit_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #1 {
				; GFX7-LABEL: needs_16bit_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_16bit_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
				; GFX8-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX8-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
				; GFX8-NEXT: ; %bb.1: ; %else
				; GFX8-NEXT: v_add_u32_e32 v8, vcc, v4, v6
				; GFX8-NEXT: v_addc_u32_e32 v9, vcc, v5, v7, vcc
				; GFX8-NEXT: ; implicit-def: $vgpr2
				; GFX8-NEXT: ; %bb.2: ; %Flow
				; GFX8-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; GFX8-NEXT: s_cbranch_execz .LBB1_4
				; GFX8-NEXT: ; %bb.3: ; %if
				; GFX8-NEXT: flat_load_dwordx2 v[8:9], v[2:3]
				; GFX8-NEXT: .LBB1_4: ; %endif
				; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX8-NEXT: s_waitcnt vmcnt(0)
				; GFX8-NEXT: flat_store_dwordx2 v[0:1], v[8:9]
				; GFX8-NEXT: s_waitcnt vmcnt(0)
				; GFX8-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-LABEL: needs_16bit_insts:
				; GFX9: ; %bb.0: ; %entry
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
				; GFX9-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
				; GFX9-NEXT: ; %bb.1: ; %else
				; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, v4, v6
				; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, v5, v7, vcc
				; GFX9-NEXT: ; implicit-def: $vgpr2
				; GFX9-NEXT: ; %bb.2: ; %Flow
				; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; GFX9-NEXT: s_cbranch_execz .LBB1_4
				; GFX9-NEXT: ; %bb.3: ; %if
				; GFX9-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX9-NEXT: .LBB1_4: ; %endif
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_16bit_insts:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_cmp_ne_u64_e32 vcc_lo, 0, v[4:5]
				; GFX10-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
				; GFX10-NEXT: s_xor_b32 s4, exec_lo, s4
				; GFX10-NEXT: ; %bb.1: ; %else
				; GFX10-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: ; implicit-def: $vgpr2
				; GFX10-NEXT: ; %bb.2: ; %Flow
				; GFX10-NEXT: s_andn2_saveexec_b32 s4, s4
				; GFX10-NEXT: s_cbranch_execz .LBB1_4
				; GFX10-NEXT: ; %bb.3: ; %if
				; GFX10-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX10-NEXT: .LBB1_4: ; %endif
				; GFX10-NEXT: s_waitcnt_depctr 0xffe3
				; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_16bit_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_mov_b32 s0, exec_lo
				; GFX11-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX11-NEXT: v_cmpx_ne_u64_e32 0, v[4:5]
				; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
				; GFX11-NEXT: ; %bb.1: ; %else
				; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: ; implicit-def: $vgpr2
				; GFX11-NEXT: ; %bb.2: ; %Flow
				; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
				; GFX11-NEXT: s_cbranch_execz .LBB1_4
				; GFX11-NEXT: ; %bb.3: ; %if
				; GFX11-NEXT: global_load_b64 v[8:9], v[2:3], off
				; GFX11-NEXT: .LBB1_4: ; %endif
				; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
				; GFX11-NEXT: s_waitcnt vmcnt(0)
				; GFX11-NEXT: global_store_b64 v[0:1], v[8:9], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				entry:
				%0 = icmp eq i64 %a, 0
				br i1 %0, label %if, label %else

				if:
				%1 = load i64, i64 addrspace(1)* %in
				br label %endif

				else:
				%2 = add i64 %a, %b
				br label %endif

				endif:
				%3 = phi i64 [%1, %if], [%2, %else]
				store i64 %3, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_gfx8_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #2 {
				; GFX7-LABEL: needs_gfx8_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_gfx8_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
				; GFX8-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX8-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
				; GFX8-NEXT: ; %bb.1: ; %else
				; GFX8-NEXT: v_add_u32_e32 v8, vcc, v4, v6
				; GFX8-NEXT: v_addc_u32_e32 v9, vcc, v5, v7, vcc
				; GFX8-NEXT: ; implicit-def: $vgpr2
				; GFX8-NEXT: ; %bb.2: ; %Flow
				; GFX8-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; GFX8-NEXT: s_cbranch_execz .LBB2_4
				; GFX8-NEXT: ; %bb.3: ; %if
				; GFX8-NEXT: flat_load_dwordx2 v[8:9], v[2:3]
				; GFX8-NEXT: .LBB2_4: ; %endif
				; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX8-NEXT: s_waitcnt vmcnt(0)
				; GFX8-NEXT: flat_store_dwordx2 v[0:1], v[8:9]
				; GFX8-NEXT: s_waitcnt vmcnt(0)
				; GFX8-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-LABEL: needs_gfx8_insts:
				; GFX9: ; %bb.0: ; %entry
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
				; GFX9-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
				; GFX9-NEXT: ; %bb.1: ; %else
				; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, v4, v6
				; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, v5, v7, vcc
				; GFX9-NEXT: ; implicit-def: $vgpr2
				; GFX9-NEXT: ; %bb.2: ; %Flow
				; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; GFX9-NEXT: s_cbranch_execz .LBB2_4
				; GFX9-NEXT: ; %bb.3: ; %if
				; GFX9-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX9-NEXT: .LBB2_4: ; %endif
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_gfx8_insts:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_cmp_ne_u64_e32 vcc_lo, 0, v[4:5]
				; GFX10-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
				; GFX10-NEXT: s_xor_b32 s4, exec_lo, s4
				; GFX10-NEXT: ; %bb.1: ; %else
				; GFX10-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: ; implicit-def: $vgpr2
				; GFX10-NEXT: ; %bb.2: ; %Flow
				; GFX10-NEXT: s_andn2_saveexec_b32 s4, s4
				; GFX10-NEXT: s_cbranch_execz .LBB2_4
				; GFX10-NEXT: ; %bb.3: ; %if
				; GFX10-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX10-NEXT: .LBB2_4: ; %endif
				; GFX10-NEXT: s_waitcnt_depctr 0xffe3
				; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_gfx8_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_mov_b32 s0, exec_lo
				; GFX11-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX11-NEXT: v_cmpx_ne_u64_e32 0, v[4:5]
				; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
				; GFX11-NEXT: ; %bb.1: ; %else
				; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: ; implicit-def: $vgpr2
				; GFX11-NEXT: ; %bb.2: ; %Flow
				; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
				; GFX11-NEXT: s_cbranch_execz .LBB2_4
				; GFX11-NEXT: ; %bb.3: ; %if
				; GFX11-NEXT: global_load_b64 v[8:9], v[2:3], off
				; GFX11-NEXT: .LBB2_4: ; %endif
				; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
				; GFX11-NEXT: s_waitcnt vmcnt(0)
				; GFX11-NEXT: global_store_b64 v[0:1], v[8:9], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				entry:
				%0 = icmp eq i64 %a, 0
				br i1 %0, label %if, label %else

				if:
				%1 = load i64, i64 addrspace(1)* %in
				br label %endif

				else:
				%2 = add i64 %a, %b
				br label %endif

				endif:
				%3 = phi i64 [%1, %if], [%2, %else]
				store i64 %3, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_gfx9_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #3 {
				; GFX7-LABEL: needs_gfx9_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_gfx9_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX9-LABEL: needs_gfx9_insts:
				; GFX9: ; %bb.0: ; %entry
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
				; GFX9-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX9-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
				; GFX9-NEXT: ; %bb.1: ; %else
				; GFX9-NEXT: v_add_co_u32_e32 v8, vcc, v4, v6
				; GFX9-NEXT: v_addc_co_u32_e32 v9, vcc, v5, v7, vcc
				; GFX9-NEXT: ; implicit-def: $vgpr2
				; GFX9-NEXT: ; %bb.2: ; %Flow
				; GFX9-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; GFX9-NEXT: s_cbranch_execz .LBB3_4
				; GFX9-NEXT: ; %bb.3: ; %if
				; GFX9-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX9-NEXT: .LBB3_4: ; %endif
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_gfx9_insts:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_cmp_ne_u64_e32 vcc_lo, 0, v[4:5]
				; GFX10-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
				; GFX10-NEXT: s_xor_b32 s4, exec_lo, s4
				; GFX10-NEXT: ; %bb.1: ; %else
				; GFX10-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: ; implicit-def: $vgpr2
				; GFX10-NEXT: ; %bb.2: ; %Flow
				; GFX10-NEXT: s_andn2_saveexec_b32 s4, s4
				; GFX10-NEXT: s_cbranch_execz .LBB3_4
				; GFX10-NEXT: ; %bb.3: ; %if
				; GFX10-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX10-NEXT: .LBB3_4: ; %endif
				; GFX10-NEXT: s_waitcnt_depctr 0xffe3
				; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_gfx9_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_mov_b32 s0, exec_lo
				; GFX11-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX11-NEXT: v_cmpx_ne_u64_e32 0, v[4:5]
				; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
				; GFX11-NEXT: ; %bb.1: ; %else
				; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: ; implicit-def: $vgpr2
				; GFX11-NEXT: ; %bb.2: ; %Flow
				; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
				; GFX11-NEXT: s_cbranch_execz .LBB3_4
				; GFX11-NEXT: ; %bb.3: ; %if
				; GFX11-NEXT: global_load_b64 v[8:9], v[2:3], off
				; GFX11-NEXT: .LBB3_4: ; %endif
				; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
				; GFX11-NEXT: s_waitcnt vmcnt(0)
				; GFX11-NEXT: global_store_b64 v[0:1], v[8:9], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				entry:
				%0 = icmp eq i64 %a, 0
				br i1 %0, label %if, label %else

				if:
				%1 = load i64, i64 addrspace(1)* %in
				br label %endif

				else:
				%2 = add i64 %a, %b
				br label %endif

				endif:
				%3 = phi i64 [%1, %if], [%2, %else]
				store i64 %3, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_gfx10_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #4 {
				; GFX7-LABEL: needs_gfx10_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_gfx10_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX9-LABEL: needs_gfx10_insts:
				; GFX9: ; %bb.0: ; %entry
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: needs_gfx10_insts:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_cmp_ne_u64_e32 vcc_lo, 0, v[4:5]
				; GFX10-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
				; GFX10-NEXT: s_xor_b32 s4, exec_lo, s4
				; GFX10-NEXT: ; %bb.1: ; %else
				; GFX10-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: ; implicit-def: $vgpr2
				; GFX10-NEXT: ; %bb.2: ; %Flow
				; GFX10-NEXT: s_andn2_saveexec_b32 s4, s4
				; GFX10-NEXT: s_cbranch_execz .LBB4_4
				; GFX10-NEXT: ; %bb.3: ; %if
				; GFX10-NEXT: global_load_dwordx2 v[8:9], v[2:3], off
				; GFX10-NEXT: .LBB4_4: ; %endif
				; GFX10-NEXT: s_waitcnt_depctr 0xffe3
				; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[8:9], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_gfx10_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_mov_b32 s0, exec_lo
				; GFX11-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX11-NEXT: v_cmpx_ne_u64_e32 0, v[4:5]
				; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
				; GFX11-NEXT: ; %bb.1: ; %else
				; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: ; implicit-def: $vgpr2
				; GFX11-NEXT: ; %bb.2: ; %Flow
				; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
				; GFX11-NEXT: s_cbranch_execz .LBB4_4
				; GFX11-NEXT: ; %bb.3: ; %if
				; GFX11-NEXT: global_load_b64 v[8:9], v[2:3], off
				; GFX11-NEXT: .LBB4_4: ; %endif
				; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
				; GFX11-NEXT: s_waitcnt vmcnt(0)
				; GFX11-NEXT: global_store_b64 v[0:1], v[8:9], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				entry:
				%0 = icmp eq i64 %a, 0
				br i1 %0, label %if, label %else

				if:
				%1 = load i64, i64 addrspace(1)* %in
				br label %endif

				else:
				%2 = add i64 %a, %b
				br label %endif

				endif:
				%3 = phi i64 [%1, %if], [%2, %else]
				store i64 %3, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_gfx11_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #5 {
				; GFX7-LABEL: needs_gfx11_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_gfx11_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX9-LABEL: needs_gfx11_insts:
				; GFX9: ; %bb.0: ; %entry
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: needs_gfx11_insts:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_endpgm
				;
				; GFX11-LABEL: needs_gfx11_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_mov_b32 s0, exec_lo
				; GFX11-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX11-NEXT: v_cmpx_ne_u64_e32 0, v[4:5]
				; GFX11-NEXT: s_xor_b32 s0, exec_lo, s0
				; GFX11-NEXT: ; %bb.1: ; %else
				; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: ; implicit-def: $vgpr2
				; GFX11-NEXT: ; %bb.2: ; %Flow
				; GFX11-NEXT: s_and_not1_saveexec_b32 s0, s0
				; GFX11-NEXT: s_cbranch_execz .LBB5_4
				; GFX11-NEXT: ; %bb.3: ; %if
				; GFX11-NEXT: global_load_b64 v[8:9], v[2:3], off
				; GFX11-NEXT: .LBB5_4: ; %endif
				; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s0
				; GFX11-NEXT: s_waitcnt vmcnt(0)
				; GFX11-NEXT: global_store_b64 v[0:1], v[8:9], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				entry:
				%0 = icmp eq i64 %a, 0
				br i1 %0, label %if, label %else

				if:
				%1 = load i64, i64 addrspace(1)* %in
				br label %endif

				else:
				%2 = add i64 %a, %b
				br label %endif

				endif:
				%3 = phi i64 [%1, %if], [%2, %else]
				store i64 %3, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_dot1_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #6 {
				; GFX7-LABEL: needs_dot1_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dot1_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX9-LABEL: needs_dot1_insts:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
				; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
				; GFX9-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_dot1_insts:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_add_co_u32 v2, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_dot1_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_endpgm
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_dot2_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #7 {
				; GFX7-LABEL: needs_dot2_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dot2_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX9-LABEL: needs_dot2_insts:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
				; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
				; GFX9-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_dot2_insts:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_add_co_u32 v2, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_dot2_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_endpgm
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_dot3_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #8 {
				; GFX7-LABEL: needs_dot3_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dot3_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX906-LABEL: needs_dot3_insts:
				; GFX906: ; %bb.0: ; %entry
				; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX906-NEXT: s_endpgm
				;
				; GFX90A-LABEL: needs_dot3_insts:
				; GFX90A: ; %bb.0:
				; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX90A-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
				; GFX90A-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
				; GFX90A-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX90A-NEXT: s_waitcnt vmcnt(0)
				; GFX90A-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_dot3_insts:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_endpgm
				;
				; GFX11-LABEL: needs_dot3_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_endpgm
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}


				define void @needs_dot4_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #9 {
				; GFX7-LABEL: needs_dot4_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dot4_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX906-LABEL: needs_dot4_insts:
				; GFX906: ; %bb.0: ; %entry
				; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX906-NEXT: s_endpgm
				;
				; GFX90A-LABEL: needs_dot4_insts:
				; GFX90A: ; %bb.0:
				; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX90A-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
				; GFX90A-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
				; GFX90A-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX90A-NEXT: s_waitcnt vmcnt(0)
				; GFX90A-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_dot4_insts:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_endpgm
				;
				; GFX11-LABEL: needs_dot4_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_endpgm
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_dot5_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #10 {
				; GFX7-LABEL: needs_dot5_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dot5_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX906-LABEL: needs_dot5_insts:
				; GFX906: ; %bb.0: ; %entry
				; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX906-NEXT: s_endpgm
				;
				; GFX90A-LABEL: needs_dot5_insts:
				; GFX90A: ; %bb.0:
				; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX90A-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
				; GFX90A-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
				; GFX90A-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX90A-NEXT: s_waitcnt vmcnt(0)
				; GFX90A-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_dot5_insts:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_add_co_u32 v2, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_dot5_insts:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: global_store_b64 v[0:1], v[2:3], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_dot6_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #11 {
				; GFX7-LABEL: needs_dot6_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dot6_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX906-LABEL: needs_dot6_insts:
				; GFX906: ; %bb.0: ; %entry
				; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX906-NEXT: s_endpgm
				;
				; GFX90A-LABEL: needs_dot6_insts:
				; GFX90A: ; %bb.0:
				; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX90A-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
				; GFX90A-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
				; GFX90A-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX90A-NEXT: s_waitcnt vmcnt(0)
				; GFX90A-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_dot6_insts:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_add_co_u32 v2, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_dot6_insts:
				; GFX11: ; %bb.0: ; %entry
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_endpgm
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_dot7_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #12 {
				; GFX7-LABEL: needs_dot7_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dot7_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX9-LABEL: needs_dot7_insts:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
				; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
				; GFX9-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: needs_dot7_insts:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_add_co_u32 v2, vcc_lo, v4, v6
				; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v5, v7, vcc_lo
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: needs_dot7_insts:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: global_store_b64 v[0:1], v[2:3], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}

				define void @needs_dot8_insts(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) #13 {
				; GFX7-LABEL: needs_dot8_insts:
				; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-NEXT: s_endpgm
				;
				; GFX8-LABEL: needs_dot8_insts:
				; GFX8: ; %bb.0: ; %entry
				; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX8-NEXT: s_endpgm
				;
				; GFX9-LABEL: needs_dot8_insts:
				; GFX9: ; %bb.0: ; %entry
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: needs_dot8_insts:
				; GFX10: ; %bb.0: ; %entry
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_endpgm
				;
				; GFX11-LABEL: needs_dot8_insts:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v4, v6
				; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v5, v7, vcc_lo
				; GFX11-NEXT: global_store_b64 v[0:1], v[2:3], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}

				attributes #0 = { "target-features"="+dpp" }
				attributes #1 = { "target-features"="+16-bit-insts" }
				attributes #2 = { "target-features"="+gfx8-insts" }
				attributes #3 = { "target-features"="+gfx9-insts" }
				attributes #4 = { "target-features"="+gfx10-insts" }
				attributes #5 = { "target-features"="+gfx11-insts" }
				attributes #6 = { "target-features"="+dot1-insts" }
				attributes #7 = { "target-features"="+dot2-insts" }
				attributes #8 = { "target-features"="+dot3-insts" }
				attributes #9 = { "target-features"="+dot4-insts" }
				attributes #10 = { "target-features"="+dot5-insts" }
				attributes #11 = { "target-features"="+dot6-insts" }
				attributes #12 = { "target-features"="+dot7-insts" }
				attributes #13 = { "target-features"="+dot8-insts" }
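
Reading the CHECK lines above: when a function's "target-features" attribute requests a feature that the selected subtarget lacks, the expected output is little more than s_endpgm; otherwise the usual add/store body is emitted. The following is a minimal C++ sketch of that decision, not the pass's actual implementation. `FeatureSet`, `requestedFeatures`, `shouldClearBody` and `demo` are hypothetical names, and the feature set used in `demo` is made up for illustration rather than taken from any real GPU.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of a subtarget: just the set of feature names it supports.
using FeatureSet = std::set<std::string>;

// Split a "target-features" string such as "+dot1-insts,+dpp" and return the
// names that are explicitly enabled with a leading '+'.
std::vector<std::string> requestedFeatures(const std::string &attr) {
  std::vector<std::string> out;
  std::stringstream ss(attr);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item.size() > 1 && item[0] == '+')
      out.push_back(item.substr(1));
  }
  return out;
}

// Return true if at least one requested feature is missing from the
// supported set, i.e. the function body would have to be cleared.
bool shouldClearBody(const std::string &attr, const FeatureSet &supported) {
  for (const std::string &feature : requestedFeatures(attr)) {
    if (supported.count(feature) == 0)
      return true;
  }
  return false;
}

// Small usage check with a made-up feature set (an assumption, not real
// hardware data).
void demo() {
  const FeatureSet gpu = {"dpp", "dot1-insts"};
  assert(!shouldClearBody("+dot1-insts", gpu)); // supported: keep the body
  assert(shouldClearBody("+dot8-insts", gpu));  // unsupported: clear the body
}
```

In this sketch a function with no "+feature" entries is never cleared, which is consistent with functions that carry no attribute group being compiled normally.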

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

	; GCN-O0-NEXT: Call Graph SCC Pass Manager
	; GCN-O0-NEXT: AMDGPU Annotate Kernel Features
	; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: AMDGPU Lower Kernel Arguments
	; GCN-O0-NEXT: Lazy Value Information Analysis
	; GCN-O0-NEXT: Lower SwitchInst's to branches
	; GCN-O0-NEXT: Lower invoke and unwind, for unwindless code generators
	; GCN-O0-NEXT: Remove unreachable blocks from the CFG
				; GCN-O0-NEXT: AMDGPU Clear Incompatible Functions Bodies
	; GCN-O0-NEXT: Post-Dominator Tree Construction
	; GCN-O0-NEXT: Dominator Tree Construction
	; GCN-O0-NEXT: Natural Loop Information
	; GCN-O0-NEXT: Legacy Divergence Analysis
	; GCN-O0-NEXT: Unify divergent function exit nodes
	; GCN-O0-NEXT: Lazy Value Information Analysis
	; GCN-O0-NEXT: Lower SwitchInst's to branches
	; GCN-O0-NEXT: Dominator Tree Construction
	; GCN-O1-NEXT: Lazy Value Information Analysis
	; GCN-O1-NEXT: Lower SwitchInst's to branches
	; GCN-O1-NEXT: Lower invoke and unwind, for unwindless code generators
	; GCN-O1-NEXT: Remove unreachable blocks from the CFG
	; GCN-O1-NEXT: Dominator Tree Construction
	; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O1-NEXT: Function Alias Analysis Results
	; GCN-O1-NEXT: Flatten the CFG
				; GCN-O1-NEXT: AMDGPU Clear Incompatible Functions Bodies
	; GCN-O1-NEXT: Dominator Tree Construction
	; GCN-O1-NEXT: Post-Dominator Tree Construction
	; GCN-O1-NEXT: Natural Loop Information
	; GCN-O1-NEXT: Legacy Divergence Analysis
	; GCN-O1-NEXT: AMDGPU IR late optimizations
	; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O1-NEXT: Function Alias Analysis Results
	; GCN-O1-NEXT: Code sinking
	; GCN-O1-OPTS-NEXT: Lazy Value Information Analysis
	; GCN-O1-OPTS-NEXT: Lower SwitchInst's to branches
	; GCN-O1-OPTS-NEXT: Lower invoke and unwind, for unwindless code generators
	; GCN-O1-OPTS-NEXT: Remove unreachable blocks from the CFG
	; GCN-O1-OPTS-NEXT: Dominator Tree Construction
	; GCN-O1-OPTS-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O1-OPTS-NEXT: Function Alias Analysis Results
	; GCN-O1-OPTS-NEXT: Flatten the CFG
				; GCN-O1-OPTS-NEXT: AMDGPU Clear Incompatible Functions Bodies
	; GCN-O1-OPTS-NEXT: Dominator Tree Construction
	; GCN-O1-OPTS-NEXT: Post-Dominator Tree Construction
	; GCN-O1-OPTS-NEXT: Natural Loop Information
	; GCN-O1-OPTS-NEXT: Legacy Divergence Analysis
	; GCN-O1-OPTS-NEXT: AMDGPU IR late optimizations
	; GCN-O1-OPTS-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O1-OPTS-NEXT: Function Alias Analysis Results
	; GCN-O1-OPTS-NEXT: Code sinking
	; GCN-O2-NEXT: Lazy Value Information Analysis
	; GCN-O2-NEXT: Lower SwitchInst's to branches
	; GCN-O2-NEXT: Lower invoke and unwind, for unwindless code generators
	; GCN-O2-NEXT: Remove unreachable blocks from the CFG
	; GCN-O2-NEXT: Dominator Tree Construction
	; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O2-NEXT: Function Alias Analysis Results
	; GCN-O2-NEXT: Flatten the CFG
				; GCN-O2-NEXT: AMDGPU Clear Incompatible Functions Bodies
	; GCN-O2-NEXT: Dominator Tree Construction
	; GCN-O2-NEXT: Post-Dominator Tree Construction
	; GCN-O2-NEXT: Natural Loop Information
	; GCN-O2-NEXT: Legacy Divergence Analysis
	; GCN-O2-NEXT: AMDGPU IR late optimizations
	; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O2-NEXT: Function Alias Analysis Results
	; GCN-O2-NEXT: Code sinking
	; GCN-O3-NEXT: Lazy Value Information Analysis
	; GCN-O3-NEXT: Lower SwitchInst's to branches
	; GCN-O3-NEXT: Lower invoke and unwind, for unwindless code generators
	; GCN-O3-NEXT: Remove unreachable blocks from the CFG
	; GCN-O3-NEXT: Dominator Tree Construction
	; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O3-NEXT: Function Alias Analysis Results
	; GCN-O3-NEXT: Flatten the CFG
				; GCN-O3-NEXT: AMDGPU Clear Incompatible Functions Bodies
	; GCN-O3-NEXT: Dominator Tree Construction
	; GCN-O3-NEXT: Post-Dominator Tree Construction
	; GCN-O3-NEXT: Natural Loop Information
	; GCN-O3-NEXT: Legacy Divergence Analysis
	; GCN-O3-NEXT: AMDGPU IR late optimizations
	; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O3-NEXT: Function Alias Analysis Results
	; GCN-O3-NEXT: Code sinking

Diff 479600

llvm/include/llvm/MC/MCSubtargetInfo.h

llvm/lib/Target/AMDGPU/AMDGPU.h

llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/CMakeLists.txt

llvm/test/CodeGen/AMDGPU/GlobalISel/dummy-target.ll

llvm/test/CodeGen/AMDGPU/clear-incompatible-functions.ll

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

[AMDGPU] Remove function with incompatible features
ClosedPublic