This is an archive of the discontinued LLVM Phabricator instance.


Differential D38113

OpenCL: Assume functions are convergent
Closed, Public

Authored by arsenm on Sep 20 2017, 5:32 PM.

Details

Reviewers

yaxunl
jlebar
Anastasia

Summary

This was done for CUDA functions in r261779, and for the same
reason this also needs to be done for OpenCL. An arbitrary
function could have a barrier() call in it, which in turn
requires the calling function to be convergent.

Event Timeline

arsenm created this revision. Sep 20 2017, 5:32 PM

Herald added a subscriber: wdng. Sep 20 2017, 5:32 PM

LGTM for the changes other than the test (I don't read opencl).

Missed test update

Herald added a subscriber: nhaehnle. Sep 20 2017, 5:37 PM

The problem with adding this attribute conservatively to all functions is that it prevents some optimizations from happening. I agree to commit this as a temporary fix to guarantee the correctness of the generated code. But if we ask for the convergent attribute to be added to the spec, could we avoid doing this in the compiler?

The problem with adding this attribute conservatively to all functions is that it prevents some optimizations from happening.

function-attrs removes the convergent attribute from anything it can prove does not call a convergent function.

I agree this is a nonoptimal solution. A better way would be to assume that any cuda/opencl function is convergent and then figure out what isn't. This would let you generate correct cuda/opencl code in a front-end without worrying about this attribute.

One problem with this approach is, suppose you call an external function, whose body llvm cannot see. We need some way to mark this function as not-convergent, so that its callers can also be inferred to be not convergent. LLVM currently only has a "convergent" attribute. In the absence of a new "not-convergent" attribute, the only way we can tell LLVM that this external function is not convergent is to leave off the attribute. But then this means we assume all functions without the convergent attribute are not convergent, and thus we have to add the attribute everywhere, as this patch does.

OTOH if we added a not-convergent attribute, we'd have to have rules about what happens if both attributes are on a function, and everywhere that checked whether a function was convergent would become significantly more complicated. I'm not sure that's worthwhile.
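
The inference described in this comment can be sketched as a small fixpoint computation. This is a simplified, self-contained model and not LLVM's implementation: `Func`, `Module`, and `inferNonConvergent` are hypothetical names, only direct calls are modeled, and a callee that is missing from the module is conservatively treated as convergent.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of a function in a module.
struct Func {
  bool hasBody;                      // false for external declarations
  bool convergent;                   // attribute as emitted by the front-end
  std::vector<std::string> callees;  // direct calls only
};

using Module = std::map<std::string, Func>;

// Drop `convergent` from every defined function that cannot reach a
// convergent callee. Declarations are never changed: without a body
// there is nothing to prove, so their attribute is taken at face value.
inline void inferNonConvergent(Module &m) {
  // Optimistically assume every defined function can lose the attribute.
  std::set<std::string> removable;
  for (const auto &entry : m)
    if (entry.second.hasBody)
      removable.insert(entry.first);

  // Discard candidates that call something still considered convergent,
  // and repeat until nothing changes.
  bool changed = true;
  while (changed) {
    changed = false;
    for (auto it = removable.begin(); it != removable.end();) {
      bool blocked = false;
      for (const std::string &callee : m.at(*it).callees) {
        if (removable.count(callee))
          continue;  // still assumed non-convergent
        auto found = m.find(callee);
        if (found == m.end() || found->second.convergent) {
          blocked = true;
          break;
        }
      }
      if (blocked) {
        it = removable.erase(it);
        changed = true;
      } else {
        ++it;
      }
    }
  }

  for (const std::string &name : removable)
    m[name].convergent = false;
}
```

In this model an external declaration keeps whatever attribute the front-end gave it, which is why the front-end has to mark unknown functions convergent up front if the result is to be conservative.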

In D38113#877906, @jlebar wrote:

The problem with adding this attribute conservatively to all functions is that it prevents some optimizations from happening.

function-attrs removes the convergent attribute from anything it can prove does not call a convergent function.

I agree this is a nonoptimal solution. A better way would be to assume that any cuda/opencl function is convergent and then figure out what isn't. This would let you generate correct cuda/opencl code in a front-end without worrying about this attribute.

One problem with this approach is, suppose you call an external function, whose body llvm cannot see. We need some way to mark this function as not-convergent, so that its callers can also be inferred to be not convergent. LLVM currently only has a "convergent" attribute. In the absence of a new "not-convergent" attribute, the only way we can tell LLVM that this external function is not convergent is to leave off the attribute. But then this means we assume all functions without the convergent attribute are not convergent, and thus we have to add the attribute everywhere, as this patch does.

OTOH if we added a not-convergent attribute, we'd have to have rules about what happens if both attributes are on a function, and everywhere that checked whether a function was convergent would become significantly more complicated. I'm not sure that's worthwhile.

Yes, that's why, if it were the responsibility of the kernel developer to specify this explicitly, we could avoid these complications in the compiler. But if we add it to the language now, we still need to preserve correctness for code written against the earlier standards. It also adds complexity for the programmer, who has to make sure it's specified correctly. But I think it is still worth discussing with the spec committee.

The deduction of convergent is indeed tricky. If any function in the CFG path is marked as convergent (or "non-convergent"), this will have to be propagated back to the callers, unless we force it to be specified explicitly, but I guess that would be too error-prone for kernel writers. Btw, what is the advantage of having "non-convergent" instead, and why is the deduction of the convergent property more complicated with it?

Anastasia added inline comments. Sep 22 2017, 4:29 AM

test/CodeGenOpenCL/convergent.cl
130	We won't have noduplicate any more?

Yes, that's why, if it were the responsibility of the kernel developer to specify this explicitly, we could avoid these complications in the compiler. But if we add it to the language now, we still need to preserve correctness for code written against the earlier standards. It also adds complexity for the programmer, who has to make sure it's specified correctly. But I think it is still worth discussing with the spec committee.

To me this seems like a small complication in the compiler to avoid an extremely easy bug for users to write. But, not my language. :)

The deduction of convergent is indeed tricky. If any function in the CFG path is marked as convergent (or "non-convergent"), this will have to be propagated back to the callers, unless we force it to be specified explicitly, but I guess that would be too error-prone for kernel writers.

This probably isn't the right forum to discuss proposals to change the LLVM IR spec. But if you want to propose something like this, please cc me on the thread, I probably have opinions. :)

Btw, what is the advantage of having "non-convergent" instead, and why is the deduction of the convergent property more complicated with it?

The advantage of switching LLVM IR to non-convergent would be that front-ends wouldn't have the bug that arsenm is fixing here. "Unadorned" IR would be correct. And, in the absence of external or unanalyzable indirect calls, you'd get the same performance as we get today even if you had no annotations.

The complexity I was referring to occurs if you add the non-convergent attribute and keep the convergent attr. I don't think we want that.

But I'm not really proposing a change to the convergent attribute in LLVM IR -- it's probably better to leave it as-is, since we all understand how it works, it ain't broke.
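
The difference between the two schemes discussed here comes down to how a function with no annotation is read. A minimal sketch follows, assuming each scheme has exactly one attribute; `Scheme`, `Decl`, and `treatedAsConvergent` are hypothetical names used only for illustration.

```cpp
// Hypothetical: which single attribute the IR offers.
enum class Scheme {
  ConvergentAttr,     // the existing scheme: non-convergent unless marked
  NonConvergentAttr,  // the alternative: convergent unless marked otherwise
};

// Hypothetical: a function that either carries the scheme's attribute or not.
struct Decl {
  bool hasAttr;
};

// How an optimizer would have to treat the function under each scheme.
inline bool treatedAsConvergent(const Decl &d, Scheme s) {
  return s == Scheme::ConvergentAttr ? d.hasAttr : !d.hasAttr;
}
```

Under `NonConvergentAttr`, a front-end that emits no annotation at all still gets conservative behavior. Under `ConvergentAttr`, the same unadorned function is treated as safe to optimize around, which is the situation this patch fixes for OpenCL.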

In D38113#877874, @Anastasia wrote:

The problem with adding this attribute conservatively to all functions is that it prevents some optimizations from happening. I agree to commit this as a temporary fix to guarantee the correctness of the generated code.

This is one of those unfortunate things we had to do for correctness in CUDA, and the situation seems the same here. When we're not doing separate compilation (which I imagine we're also generally not doing for OpenCL compilation), I'm under the impression that the attribute removal is fairly effective.

But if we ask for the convergent attribute to be added to the spec, could we avoid doing this in the compiler?

But even if you do that, would that not be in a future version of OpenCL? If so, for code complying with current standards, you'd need this behavior.

Do we need an option to disable this, in case it causes regressions in some applications and users want to turn it off? At least for debugging.

test/CodeGenOpenCL/convergent.cl
73	check the attribute has convergent
95	need to check the attribute is convergent
118	need to check the attribute has noduplicate
127	check the attribute has noduplicate

In D38113#878852, @hfinkel wrote:

In D38113#877874, @Anastasia wrote:

The problem with adding this attribute conservatively to all functions is that it prevents some optimizations from happening. I agree to commit this as a temporary fix to guarantee the correctness of the generated code.

This is one of those unfortunate things we had to do for correctness in CUDA, and the situation seems the same here. When we're not doing separate compilation (which I imagine we're also generally not doing for OpenCL compilation), I'm under the impression that the attribute removal is fairly effective.

I agree both communities would benefit so it feels like it might be worth the effort.

But if we ask for the convergent attribute to be added to the spec, could we avoid doing this in the compiler?

But even if you do that, would that not be in a future version of OpenCL? If so, for code complying with current standards, you'd need this behavior.

Yes, the fix is needed anyway to provide backwards compatibility.

In D38113#878840, @jlebar wrote:

Yes, that's why, if it were the responsibility of the kernel developer to specify this explicitly, we could avoid these complications in the compiler. But if we add it to the language now, we still need to preserve correctness for code written against the earlier standards. It also adds complexity for the programmer, who has to make sure it's specified correctly. But I think it is still worth discussing with the spec committee.

To me this seems like a small complication in the compiler to avoid an extremely easy bug for users to write. But, not my language. :)

Yes, I think I would perhaps argue for inclusion of non-convergent instead since this option seems to make more sense.

The deduction of convergent is indeed tricky. If any function in the CFG path is marked as convergent (or "non-convergent"), this will have to be propagated back to the callers, unless we force it to be specified explicitly, but I guess that would be too error-prone for kernel writers.

This probably isn't the right forum to discuss proposals to change the LLVM IR spec. But if you want to propose something like this, please cc me on the thread, I probably have opinions. :)

Will do! If we have a bigger use case, it would be easier to get it accepted. I will check with the OpenCL community first and see if there is agreement internally.

Btw, what is the advantage of having "non-convergent" instead, and why is the deduction of the convergent property more complicated with it?

The advantage of switching LLVM IR to non-convergent would be that front-ends wouldn't have the bug that arsenm is fixing here. "Unadorned" IR would be correct. And, in the absence of external or unanalyzable indirect calls, you'd get the same performance as we get today even if you had no annotations.

Yes, I see, this sounds more reasonable indeed. Btw, can LLVM currently remove convergent in some cases to recover the performance loss?

The complexity I was referring to occurs if you add the non-convergent attribute and keep the convergent attr. I don't think we want that.

I think keeping both would add more confusion and hence result in even more errors/complications.

But I'm not really proposing a change to the convergent attribute in LLVM IR -- it's probably better to leave it as-is, since we all understand how it works, it ain't broke.

But at the same time, since we already know what we need, a redesign should be easier. An alternative would be to add convergent by default during IR generation and emit no attribute where non-convergent is set. This way we at least give the programmer a way to achieve higher performance. But of course it wouldn't be ideal to be inconsistent with the IR. As far as I can see, there is currently little use for convergent in the frontend, since we set it for all functions anyway. The problem is that it wouldn't be possible to remove it immediately. But we can at least deprecate it for a start.

In D38113#882187, @Anastasia wrote:

In D38113#878840, @jlebar wrote:

...

Yes, I see, this sounds more reasonable indeed. Btw, can LLVM currently remove convergent in some cases to recover the performance loss?

Yes. See removeConvergentAttrs in lib/Transforms/IPO/FunctionAttrs.cpp.

...

Check noduplicate

LGTM!

This revision is now accepted and ready to land. Oct 6 2017, 6:04 AM

r315094

test/CodeGenOpenCL/convergent.cl
130	noduplicate is problematic for the same reason that an unknown call could have noduplicate. We should probably just remove noduplicate entirely.

Revision Contents

Path	Size
include/clang/Basic/LangOptions.h	4 lines
lib/CodeGen/CGCall.cpp	13 lines
test/CodeGenOpenCL/amdgpu-attrs.cl	50 lines
test/CodeGenOpenCL/convergent.cl	72 lines

Diff 117855

include/clang/Basic/LangOptions.h

[191 lines collapsed; context: #include "clang/Basic/LangOptions.def"]
   /// \brief Is this a libc/libm function that is no longer recognized as a
   /// builtin because a -fno-builtin-* option has been specified?
   bool isNoBuiltinFunc(StringRef Name) const;

   /// \brief True if any ObjC types may have non-trivial lifetime qualifiers.
   bool allowsNonTrivialObjCLifetimeQualifiers() const {
     return ObjCAutoRefCount || ObjCWeak;
   }
+
+  bool assumeFunctionsAreConvergent() const {
+    return (CUDA && CUDAIsDevice) || OpenCL;
+  }
 };

 /// \brief Floating point control options
 class FPOptions {
 public:
   FPOptions() : fp_contract(LangOptions::FPC_Off) {}

   // Used for serializing.
[42 lines collapsed]

lib/CodeGen/CGCall.cpp

[1,744 lines collapsed; context: if (!Recips.empty())]
         llvm::join(Recips.begin(), Recips.end(), ","));

     if (CodeGenOpts.StackRealignment)
       FuncAttrs.addAttribute("stackrealign");
     if (CodeGenOpts.Backchain)
       FuncAttrs.addAttribute("backchain");
   }

-  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
-    // Conservatively, mark all functions and calls in CUDA as convergent
-    // (meaning, they may call an intrinsically convergent op, such as
-    // __syncthreads(), and so can't have certain optimizations applied around
-    // them). LLVM will remove this attribute where it safely can.
+  if (getLangOpts().assumeFunctionsAreConvergent()) {
+    // Conservatively, mark all functions and calls in CUDA and OpenCL as
+    // convergent (meaning, they may call an intrinsically convergent op, such
+    // as __syncthreads() / barrier(), and so can't have certain optimizations
+    // applied around them). LLVM will remove this attribute where it safely
+    // can.
     FuncAttrs.addAttribute(llvm::Attribute::Convergent);
+  }

+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
     // Exceptions aren't supported in CUDA device code.
     FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);

     // Respect -fcuda-flush-denormals-to-zero.
     if (getLangOpts().CUDADeviceFlushDenormalsToZero)
       FuncAttrs.addAttribute("nvptx-f32ftz", "true");
   }
 }
[2,622 lines collapsed]
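
The effect of the two hunks above can be modeled in isolation. The sketch below is a standalone approximation, not clang code: `LangFlags` and `defaultDeviceAttrs` are hypothetical stand-ins, attributes are plain strings, and only the two attributes touched by this change are modeled (real OpenCL functions pick up other attributes elsewhere).

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the few clang::LangOptions flags involved.
struct LangFlags {
  bool CUDA = false;
  bool CUDAIsDevice = false;
  bool OpenCL = false;

  // Same condition as the predicate added in LangOptions.h.
  bool assumeFunctionsAreConvergent() const {
    return (CUDA && CUDAIsDevice) || OpenCL;
  }
};

// Hypothetical model of the attribute decisions in the CGCall.cpp hunk.
inline std::set<std::string> defaultDeviceAttrs(const LangFlags &opts) {
  std::set<std::string> attrs;
  // After the patch, CUDA device code and OpenCL share this branch.
  if (opts.assumeFunctionsAreConvergent())
    attrs.insert("convergent");
  // This part stays specific to CUDA device code.
  if (opts.CUDA && opts.CUDAIsDevice)
    attrs.insert("nounwind");
  return attrs;
}
```

Splitting the original `if` in two is what lets OpenCL pick up `convergent` without also inheriting the CUDA-only handling that follows it.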

test/CodeGenOpenCL/amdgpu-attrs.cl

	[145 lines collapsed]
	// X86-NOT: "amdgpu-num-sgpr"			// X86-NOT: "amdgpu-num-sgpr"

	// CHECK-NOT: "amdgpu-flat-work-group-size"="0,0"			// CHECK-NOT: "amdgpu-flat-work-group-size"="0,0"
	// CHECK-NOT: "amdgpu-waves-per-eu"="0"			// CHECK-NOT: "amdgpu-waves-per-eu"="0"
	// CHECK-NOT: "amdgpu-waves-per-eu"="0,0"			// CHECK-NOT: "amdgpu-waves-per-eu"="0,0"
	// CHECK-NOT: "amdgpu-num-sgpr"="0"			// CHECK-NOT: "amdgpu-num-sgpr"="0"
	// CHECK-NOT: "amdgpu-num-vgpr"="0"			// CHECK-NOT: "amdgpu-num-vgpr"="0"

	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128"
	// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { noinline nounwind optnone "amdgpu-waves-per-eu"="2"			// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-waves-per-eu"="2"
	// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { noinline nounwind optnone "amdgpu-waves-per-eu"="2,4"			// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-waves-per-eu"="2,4"
	// CHECK-DAG: attributes [[NUM_SGPR_32]] = { noinline nounwind optnone "amdgpu-num-sgpr"="32"			// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-num-sgpr"="32"
	// CHECK-DAG: attributes [[NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-num-vgpr"="64"			// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-num-vgpr"="64"

	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-waves-per-eu"="2"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-waves-per-eu"="2"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-waves-per-eu"="2,4"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-waves-per-eu"="2,4"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_SGPR_32]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-vgpr"="64"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-vgpr"="64"
	// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_SGPR_32]] = { noinline nounwind optnone "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"			// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"
	// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"			// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
	// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_SGPR_32]] = { noinline nounwind optnone "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2,4"			// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2,4"
	// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"			// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"
	// CHECK-DAG: attributes [[NUM_SGPR_32_NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-num-sgpr"="32" "amdgpu-num-vgpr"="64"			// CHECK-DAG: attributes [[NUM_SGPR_32_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-num-sgpr"="32" "amdgpu-num-vgpr"="64"

	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2,4"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2,4"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"

	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32_NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
	// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32_NUM_VGPR_64]] = { noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"			// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-num-sgpr"="32" "amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"

test/CodeGenOpenCL/convergent.cl

	// RUN: %clang_cc1 -triple spir-unknown-unknown -emit-llvm %s -o - | opt -instnamer -S | FileCheck %s			// RUN: %clang_cc1 -triple spir-unknown-unknown -emit-llvm %s -o - | opt -instnamer -S | FileCheck -enable-var-scope %s

				// This is initially assumed convergent, but can be deduced to not require it.

				// CHECK-LABEL: define spir_func void @non_convfun() local_unnamed_addr #0
				// CHECK: ret void
				__attribute__((noinline))
				void non_convfun(void) {
				volatile int* p;
				*p = 0;
				}

	void convfun(void) __attribute__((convergent));			void convfun(void) __attribute__((convergent));
	void non_convfun(void);
	void nodupfun(void) __attribute__((noduplicate));			void nodupfun(void) __attribute__((noduplicate));

				// External functions should be assumed convergent.
	void f(void);			void f(void);
	void g(void);			void g(void);

	// Test two if's are merged and non_convfun duplicated.			// Test two if's are merged and non_convfun duplicated.
	// The LLVM IR is equivalent to:			// The LLVM IR is equivalent to:
	// if (a) {			// if (a) {
	// f();			// f();
	// non_convfun();			// non_convfun();
	// g();			// g();
	// } else {			// } else {
	// non_convfun();			// non_convfun();
	// }			// }
	//			//
	// CHECK: define spir_func void @test_merge_if(i32 %[[a:.+]])			// CHECK-LABEL: define spir_func void @test_merge_if(i32 %a) local_unnamed_addr #1 {
	// CHECK: %[[tobool:.+]] = icmp eq i32 %[[a]], 0			// CHECK: %[[tobool:.+]] = icmp eq i32 %a, 0
	// CHECK: br i1 %[[tobool]], label %[[if_end3_critedge:.+]], label %[[if_then:.+]]			// CHECK: br i1 %[[tobool]], label %[[if_end3_critedge:.+]], label %[[if_then:.+]]

	// CHECK: [[if_then]]:			// CHECK: [[if_then]]:
	// CHECK: tail call spir_func void @f()			// CHECK: tail call spir_func void @f()
	// CHECK: tail call spir_func void @non_convfun()			// CHECK: tail call spir_func void @non_convfun()
	// CHECK: tail call spir_func void @g()			// CHECK: tail call spir_func void @g()

	// CHECK: br label %[[if_end3:.+]]			// CHECK: br label %[[if_end3:.+]]

	// CHECK: [[if_end3_critedge]]:			// CHECK: [[if_end3_critedge]]:
	// CHECK: tail call spir_func void @non_convfun()			// CHECK: tail call spir_func void @non_convfun()
	// CHECK: br label %[[if_end3]]			// CHECK: br label %[[if_end3]]

	// CHECK: [[if_end3]]:			// CHECK: [[if_end3]]:
	// CHECK-LABEL: ret void			// CHECK: ret void

	void test_merge_if(int a) {			void test_merge_if(int a) {
	if (a) {			if (a) {
	f();			f();
	}			}
	non_convfun();			non_convfun();
	if (a) {			if (a) {
	g();			g();
	}			}
	}			}

	// CHECK-DAG: declare spir_func void @f()			// CHECK-DAG: declare spir_func void @f() local_unnamed_addr #2
	// CHECK-DAG: declare spir_func void @non_convfun()			// CHECK-DAG: declare spir_func void @g() local_unnamed_addr #2
	// CHECK-DAG: declare spir_func void @g()

	// Test two if's are not merged.			// Test two if's are not merged.
	// CHECK: define spir_func void @test_no_merge_if(i32 %[[a:.+]])			// CHECK-LABEL: define spir_func void @test_no_merge_if(i32 %a) local_unnamed_addr #1
	// CHECK: %[[tobool:.+]] = icmp eq i32 %[[a]], 0			// CHECK: %[[tobool:.+]] = icmp eq i32 %a, 0
	// CHECK: br i1 %[[tobool]], label %[[if_end:.+]], label %[[if_then:.+]]			// CHECK: br i1 %[[tobool]], label %[[if_end:.+]], label %[[if_then:.+]]
	// CHECK: [[if_then]]:			// CHECK: [[if_then]]:
	// CHECK: tail call spir_func void @f()			// CHECK: tail call spir_func void @f()
	// CHECK-NOT: call spir_func void @convfun()			// CHECK-NOT: call spir_func void @convfun()
	// CHECK-NOT: call spir_func void @g()			// CHECK-NOT: call spir_func void @g()
	// CHECK: br label %[[if_end]]			// CHECK: br label %[[if_end]]
	// CHECK: [[if_end]]:			// CHECK: [[if_end]]:
	// CHECK: %[[tobool_pr:.+]] = phi i1 [ true, %[[if_then]] ], [ false, %{{.+}} ]			// CHECK: %[[tobool_pr:.+]] = phi i1 [ true, %[[if_then]] ], [ false, %{{.+}} ]
	// CHECK: tail call spir_func void @convfun() #[[attr5:.+]]			// CHECK: tail call spir_func void @convfun() #[[attr4:.+]]
				yaxunl: check the attribute has convergent
	// CHECK: br i1 %[[tobool_pr]], label %[[if_then2:.+]], label %[[if_end3:.+]]			// CHECK: br i1 %[[tobool_pr]], label %[[if_then2:.+]], label %[[if_end3:.+]]
	// CHECK: [[if_then2]]:			// CHECK: [[if_then2]]:
	// CHECK: tail call spir_func void @g()			// CHECK: tail call spir_func void @g()
	// CHECK: br label %[[if_end3:.+]]			// CHECK: br label %[[if_end3:.+]]
	// CHECK: [[if_end3]]:			// CHECK: [[if_end3]]:
	// CHECK-LABEL: ret void			// CHECK-LABEL: ret void

	void test_no_merge_if(int a) {			void test_no_merge_if(int a) {
	if (a) {			if (a) {
	f();			f();
	}			}
	convfun();			convfun();
	if(a) {			if(a) {
	g();			g();
	}			}
	}			}

	// CHECK: declare spir_func void @convfun(){{[^#]*}} #[[attr2:[0-9]+]]			// CHECK: declare spir_func void @convfun(){{[^#]*}} #2

	// Test loop is unrolled for convergent function.			// Test loop is unrolled for convergent function.
	// CHECK-LABEL: define spir_func void @test_unroll()			// CHECK-LABEL: define spir_func void @test_unroll() local_unnamed_addr #1
	// CHECK: tail call spir_func void @convfun() #[[attr5:[0-9]+]]			// CHECK: tail call spir_func void @convfun() #[[attr4:[0-9]+]]
				yaxunl: need to check the attribute is convergent
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK: tail call spir_func void @convfun() #[[attr5]]			// CHECK: tail call spir_func void @convfun() #[[attr4]]
	// CHECK-LABEL: ret void			// CHECK-LABEL: ret void

	void test_unroll() {			void test_unroll() {
	for (int i = 0; i < 10; i++)			for (int i = 0; i < 10; i++)
	convfun();			convfun();
	}			}

	// Test loop is not unrolled for noduplicate function.			// Test loop is not unrolled for noduplicate function.
	// CHECK-LABEL: define spir_func void @test_not_unroll()			// CHECK-LABEL: define spir_func void @test_not_unroll()
	// CHECK: br label %[[for_body:.+]]			// CHECK: br label %[[for_body:.+]]
	// CHECK: [[for_cond_cleanup:.+]]:			// CHECK: [[for_cond_cleanup:.+]]:
	// CHECK: ret void			// CHECK: ret void
	// CHECK: [[for_body]]:			// CHECK: [[for_body]]:
	// CHECK: tail call spir_func void @nodupfun() #[[attr6:[0-9]+]]			// CHECK: tail call spir_func void @nodupfun() #[[attr5:[0-9]+]]
				yaxunl: need to check the attribute has noduplicate
	// CHECK-NOT: call spir_func void @nodupfun()			// CHECK-NOT: call spir_func void @nodupfun()
	// CHECK: br i1 %{{.+}}, label %[[for_body]], label %[[for_cond_cleanup]]			// CHECK: br i1 %{{.+}}, label %[[for_body]], label %[[for_cond_cleanup]]

	void test_not_unroll() {			void test_not_unroll() {
	for (int i = 0; i < 10; i++)			for (int i = 0; i < 10; i++)
	nodupfun();			nodupfun();
	}			}

	// CHECK: declare spir_func void @nodupfun(){{[^#]*}} #[[attr3:[0-9]+]]			// CHECK: declare spir_func void @nodupfun(){{[^#]*}} #[[attr3:[0-9]+]]
				yaxunl: check the attribute has noduplicate

	// CHECK-DAG: attributes #[[attr2]] = { {{[^}]}}convergent{{[^}]}} }			// CHECK: attributes #0 = { noinline norecurse nounwind "
	// CHECK-DAG: attributes #[[attr3]] = { {{[^}]}}noduplicate{{[^}]}} }			// CHECK: attributes #1 = { {{[^}]}}convergent{{[^}]}} }
				Anastasia: We won't have noduplicate any more?
				arsenm (author): noduplicate is problematic for the same reason that an unknown call could have noduplicate. We should probably just remove noduplicate entirely.
	// CHECK-DAG: attributes #[[attr5]] = { {{[^}]}}convergent{{[^}]}} }			// CHECK: attributes #2 = { {{[^}]}}convergent{{[^}]}} }
	// CHECK-DAG: attributes #[[attr6]] = { {{[^}]}}noduplicate{{[^}]}} }			// CHECK: attributes #3 = { {{[^}]}}convergent noduplicate{{[^}]}} }
				// CHECK: attributes #4 = { {{[^}]}}convergent{{[^}]}} }
				// CHECK: attributes #5 = { {{[^}]}}convergent noduplicate{{[^}]}} }