This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Deduce attributes with the Attributor
ClosedPublic

Authored by kuter on Jun 27 2021, 1:38 PM.

Details

Summary

This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Diff Detail

Event Timeline

kuter created this revision.Jun 27 2021, 1:38 PM
kuter requested review of this revision.Jun 27 2021, 1:38 PM
Herald added a reviewer: sstefan1.
Herald added a reviewer: baziotis.
Herald added a project: Restricted Project.
kuter updated this revision to Diff 354769.Jun 27 2021, 1:40 PM

License header.

jdoerfert added inline comments.Jun 27 2021, 2:50 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
129
167

Why not just go over CalleeAttributes here?

192

might be more interesting to print them all out.

223

It is odd that we initialize with a call graph. I'm not sure we need this at all. The TM can be checked in runOnModule, and we might not want to support the old PM right away.

kuter updated this revision to Diff 354775.Jun 27 2021, 3:00 PM

Implement more functionality.
Support more tests.

kuter marked an inline comment as done.Jun 27 2021, 3:02 PM
jdoerfert added inline comments.Jun 27 2021, 3:11 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
134

There should be at least a TODO. We should actually look at all call sites, and if that succeeds we can propagate information just fine. Being address-taken doesn't need to be a bad thing per se.

207

I imagine we need to walk all instructions and look at all operands here, no?

kuter added inline comments.Jun 27 2021, 3:47 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
134

I agree, but simple-indirect-call.ll depends on this behavior.

207

Yes, we do.

jdoerfert added inline comments.Jun 27 2021, 3:49 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
134

Add a FIXME for now, then.

207

Or, you start with all globals in the interesting address spaces and make your way down the use chains. Probably cheaper.
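
For context, the "start from the globals" direction could look roughly like the sketch below. This is only an illustration, not code from the patch; the helper name and the choice of AMDGPUAS::LOCAL_ADDRESS as the interesting address space are assumptions.

static void collectLDSUsingFunctions(Module &M,
                                     SmallPtrSetImpl<Function *> &Fns) {
  SmallVector<User *, 16> Worklist;
  SmallPtrSet<User *, 16> Visited;
  // Seed with every user of a global in the LDS address space.
  for (GlobalVariable &GV : M.globals())
    if (GV.getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS)
      for (User *U : GV.users())
        Worklist.push_back(U);
  // Walk the use chains downward; constant expressions forward to their users.
  while (!Worklist.empty()) {
    User *U = Worklist.pop_back_val();
    if (!Visited.insert(U).second)
      continue;
    if (auto *I = dyn_cast<Instruction>(U)) {
      Fns.insert(I->getFunction()); // this function touches an LDS global
      continue;
    }
    for (User *UU : U->users())
      Worklist.push_back(UU);
  }
}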

jdoerfert added inline comments.Jun 27 2021, 10:55 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
207

Or, you ask AAMemoryLocation for all globals that can be accessed. The downside is (potentially) that it won't track non-access uses, e.g. ptr = &shared_mem;. Unsure if that is needed.

arsenm added inline comments.Jun 28 2021, 6:26 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
28–32

These are the attributes as they exist now, but I do think they need to be inverted to be more sound. Assuming they are present is the conservative direction, so ideally we would operate on a no-* basis.

192

I do not like this attribute and don't believe it's very sound, but I guess continuing with it doesn't make things worse.

200–201

This also depends on the subtarget, since we don't need this on newer ones for addrspacecast.

@arsenm I'd suggest we get to feature parity with the existing pass, then add more features as needed, then modify the names/polarity as we want them to be. Doing that as part of this patch will cause too much complexity.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
28–32

I'd recommend doing that in a follow-up so this is tested on its own first, wdyt?

@kuter Are there other tests we haven't ported to this yet? If there are things we don't do yet, just add FIXMEs for now.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
185–186
arsenm added inline comments.Jun 28 2021, 1:34 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
28–32

Sure, but I would assume inference works naturally in the negative direction?

jdoerfert added inline comments.Jun 29 2021, 7:58 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
28–32

Yes, it should. I'm not sure I would say the direction of the inference is different, but the lattice we use for the state is reversed; we are still pulling information into kernels (transitively).

ormris removed a subscriber: ormris.Jun 29 2021, 10:13 AM
kuter updated this revision to Diff 355657.Jun 30 2021, 12:20 PM
  • Added Uniform Work Group deduction.
  • Addressed review.
jdoerfert added inline comments.Jul 3 2021, 9:41 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
126
276–283

Don't do this. Use the regular checkForAllCallSites mechanism instead. It will detect non-call-site uses just fine but also allow callbacks and implicit calls (soon).

kuter marked an inline comment as done.Jul 3 2021, 2:04 PM
kuter added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
276–283

The flow of the deduction here is callee -> caller, and I don't really understand why this needs to be done in the first place.
I just assumed that there is a special reason it is needed, since simple-indirect-call.ll explicitly checks for this.

kuter updated this revision to Diff 356365.Jul 3 2021, 8:47 PM
  • Support more tests.
  • Fix the problem with existing tests.
  • Simplify logic.
  • Misc changes.

Everything except the constant-expression address space cast tests is supported with the Attributor now.

I will add the missing feature soon.

Can you add a test:

kernel() { // uniform-work-group-size = true
  weak();
}

weak() {  // weak linkage
  internal();
}

int G;
internal() {  // internal linkage
  G = 0;
}

Here internal should have uniform-work-group-size = true.

Also, add:

kernel() {  // uniform-work-group-size = true
  internal1();
}
internal1() {  // internal linkage, same below.
  internal2();
}
int G;
internal2() {
  if (G == 0) {
    internal3();
    internal2();
  }
}
internal3() {
  G = 1;
}

Everything should have uniform-work-group-size = true.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
171–176

As I said before, this is not needed.

196

You need to check the return value here, I think.

234

Maybe you just use the BooleanState instead? Having the additional boolean here doesn't seem to bring any benefit. You can probably just expose clampStateAndIndicateChange in Attributor.h (AA namespace) and then use it in the CheckCallSite callback. There should not be anything to do but ask for the callee AA and clamp. This should remove 20 or so lines here.
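
For reference, the suggested shape would look roughly like the sketch below. This is an illustration only, not code from the patch; the AA class name AAAMDWorkGroupSize is an assumption, and the exact checkForAllCallSites/getAAFor signatures vary between LLVM versions.

ChangeStatus updateImpl(Attributor &A) override {
  ChangeStatus Change = ChangeStatus::UNCHANGED;
  auto CheckCallSite = [&](AbstractCallSite CS) {
    Function *Caller = CS.getInstruction()->getFunction();
    // Ask for the caller's AA and clamp our state against it; nothing else
    // is needed to pull the kernel's value down to this function.
    const auto &CallerAA = A.getAAFor<AAAMDWorkGroupSize>(
        *this, IRPosition::function(*Caller), DepClassTy::REQUIRED);
    Change = Change | clampStateAndIndicateChange(getState(), CallerAA.getState());
    return true;
  };
  bool AllCallSitesKnown = true;
  if (!A.checkForAllCallSites(CheckCallSite, *this,
                              /*RequireAllCallSites=*/true, AllCallSitesKnown))
    return indicatePessimisticFixpoint();
  return Change;
}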

357

I don't think the address space stuff is good. We should use a dedicated AA to track global uses, or reuse AAMemoryLocation to ask what globals are accessed. The latter doesn't capture non-access uses though, so maybe the former is needed.

kuter updated this revision to Diff 357635.Jul 9 2021, 2:14 PM
kuter marked 2 inline comments as done.
  • Add support for constant exploration
  • All tests are supported now
  • Bug fix

There are some slight differences in the deduction of uniform-work-group-size:
in some cases the Attributor sets uniform-work-group-size to false where
AMDGPUAnnotateKernelFeatures.cpp does not add any attribute.

It is not possible to match the behaviour exactly without some really hacky code.
I will address some of the reviews now.

kuter added inline comments.Jul 10 2021, 3:15 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
171–176

uniform-work-group-propagate-attribute.ll specifically checks for this on @weak_func.

234

I don't think this would work. In some cases we need to be able to turn the uniform-work-group-size attribute to false even if it is initialized with true.

In uniform-work-group-attribute-missing.ll, @foo gets initialized with uniform-work-group-size=true, but we turn it to false because the kernel doesn't have the attribute (for kernels we assume it is false if it is not present).

If we were to use the BooleanState, @foo would automatically be at a fixpoint on initialization (since it is set to true), and we wouldn't be able to turn it to false.

kuter added inline comments.Jul 10 2021, 3:21 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
234

Also, @jdoerfert, there are now several attributes that use BooleanState as a "placeholder" state.
I think it would be great if we had something like a VoidState to avoid confusion,
even if it was as simple as using VoidState = BooleanState;

kuter updated this revision to Diff 357760.Jul 10 2021, 3:33 PM
  • Address review.
  • Small naming change.

Did you add the tests I suggested?

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
234

Hm, but we use the boolean in those cases, don't we? To distinguish "good" and "bad". We can talk offline.

234

I don't think this would work. ..

It does. If you only finalize kernels in the initialization, everything else should not even read the attribute from the IR but just start optimistic and then go on.
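
Concretely, the idea is something like the following initialize(); this is an illustrative sketch rather than the patch's code, and the method/member names are assumptions.

void initialize(Attributor &A) override {
  Function *F = getAssociatedFunction();
  if (F->getCallingConv() != CallingConv::AMDGPU_KERNEL)
    return; // non-kernels start optimistic and are clamped via their call sites

  // Kernels are finalized from the IR right away; a missing attribute means
  // "false", so their value is known either way.
  bool IsUniform =
      F->getFnAttribute("uniform-work-group-size").getValueAsString() == "true";
  if (IsUniform)
    indicateOptimisticFixpoint();
  else
    indicatePessimisticFixpoint();
}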

llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll
63

It doesn't?

llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll
50

weak_func can be annotated with "uniform-work-group-size"="true" because there is no way it is called from a kernel with "uniform-work-group-size"="false". The test is just too conservative here. Please don't keep the linkage check as it is not needed.

kuter updated this revision to Diff 357888 (Edited).Jul 12 2021, 3:55 AM
  • Added requested test.
  • Removed the check for the linkage type.
kuter updated this revision to Diff 358497.Jul 13 2021, 7:45 PM

Make the uniform-work-group-size deduction use the BooleanState.
Rebase.

kuter updated this revision to Diff 358498.Jul 13 2021, 7:50 PM

Small change.

kuter updated this revision to Diff 358834.Jul 14 2021, 8:41 PM

Inline assembly call sites are no longer treated as unknown callees.
This fixes some differences in deduction.

I will double-check everything, but the Attributor's deduction of attributes
should now be equivalent to that of the annotate-kernel-features pass.

kuter updated this revision to Diff 359413.Jul 16 2021, 12:24 PM
kuter marked an inline comment as done.

clang-format (sorry that I forgot)
Remove the WIP tag.

kuter retitled this revision from [WIP][AMDGPU] Deduce attributes with the Attributor to [AMDGPU] Deduce attributes with the Attributor.Jul 16 2021, 12:24 PM
arsenm added inline comments.Jul 16 2021, 1:06 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
124

I don't know why you are tracking this here. This isn't entirely true anymore, plus there's a dedicated pass for this?

227

Braces

kuter added inline comments.Jul 16 2021, 8:45 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
124

This is used for the deduction of the amdgpu-queue-ptr attribute. For non-entry functions, if the function uses a DS global (even transitively), it gets the amdgpu-queue-ptr attribute.
I tried to replicate what AnnotateKernelFeatures.cpp does, since there isn't any documentation available about the attribute (as far as I know).
How do you think we should proceed?

arsenm accepted this revision.Jul 19 2021, 4:19 PM

LGTM with a nit. I still think we should invert these attributes, though.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
124

Oh right, this is in case we were to insert a trap while using the legacy trap ABI, which required the queue pointer. We can now handle some cases of DS instructions in functions without emitting traps, but I guess preserving this for now is fine.

This revision is now accepted and ready to land.Jul 19 2021, 4:19 PM
jdoerfert accepted this revision.Jul 19 2021, 9:45 PM

LG

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
160

Nit: Some doxygen documentation for these methods and members would be good.

This revision was landed with ongoing or failed builds.Jul 23 2021, 8:07 PM
This revision was automatically updated to reflect the committed changes.