
Details

Reviewers

arsenm
nhaehnle
critson

Commits

rG5f6fec2404c5: AMDGPU: Fix handling of infinite loops in fragment shaders
rG87d98c149504: AMDGPU: Fix handling of infinite loops in fragment shaders
rG0994c485e613: AMDGPU: Fix handling of infinite loops in fragment shaders

Summary

Because kill is just a normal intrinsic, even though it's supposed to
terminate the thread, we can end up with provably infinite loops that
are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes the
loop return immediately, which skips the exports that are supposed to
happen at the end and hangs the GPU if all the threads end up being killed.

While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.
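
To make the mechanism concrete, here is a minimal sketch of the idea on a toy CFG. It is not the pass itself: `Block`, `canReachReturn`, and `breakInfiniteLoops` are hypothetical names, the real pass works on post-dominator tree roots rather than on every trapped block, and it places the null export in the unified return block. The sketch only shows that a provably infinite loop gets an edge to a new return block, and that this block carries a null export when the function is a void pixel shader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG node; not an LLVM type.
struct Block {
  std::vector<std::size_t> succs; // indices of successor blocks
  bool returns = false;           // block ends in a return
  bool nullExport = false;        // block does a null export before returning
};

// Marks every block from which some returning block is reachable.
inline std::vector<bool> canReachReturn(const std::vector<Block> &cfg) {
  std::vector<bool> reach(cfg.size(), false);
  bool changed = true;
  while (changed) { // iterate to a fixed point
    changed = false;
    for (std::size_t i = 0; i < cfg.size(); ++i) {
      if (reach[i])
        continue;
      bool r = cfg[i].returns;
      for (std::size_t s : cfg[i].succs) {
        assert(s < cfg.size() && "successor index out of range");
        r = r || reach[s];
      }
      if (r) {
        reach[i] = true;
        changed = true;
      }
    }
  }
  return reach;
}

// Adds a dummy return block and an edge to it from every block that is
// trapped in an infinite loop. Returns true if the CFG was changed.
inline bool breakInfiniteLoops(std::vector<Block> &cfg, bool isVoidPixelShader) {
  const std::vector<bool> reach = canReachReturn(cfg);
  const std::size_t n = cfg.size();
  bool trapped = false;
  for (std::size_t i = 0; i < n; ++i)
    trapped = trapped || !reach[i];
  if (!trapped)
    return false;

  Block dummy;
  dummy.returns = true;
  // With a void return there is no epilog to export, so export here.
  dummy.nullExport = isVoidPixelShader;
  cfg.push_back(dummy);
  for (std::size_t i = 0; i < n; ++i)
    if (!reach[i])
      cfg[i].succs.push_back(n); // n is the index of the dummy block
  return true;
}
```

Running this on a three-block CFG shaped like the `return_void` test (an entry block that branches to either a self-looping block or a returning block) adds a fourth block and gives the looping block a second successor.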

This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cwabbott created this revision. Nov 27 2019, 6:21 AM

Herald added a project: Restricted Project. Nov 27 2019, 6:21 AM

Herald added subscribers: llvm-commits, hiraditya, t-tye and 6 others.

Harbormaster completed remote builds in B41553: Diff 231238. Nov 27 2019, 6:27 AM

Fix leftover extraneous change from an earlier version.

Harbormaster completed remote builds in B41554: Diff 231241. Nov 27 2019, 6:45 AM

dstuttard added a reviewer: critson. Dec 2 2019, 1:47 AM

I think there are two potential problems with this change (with the caveat that I have a heavy cold so I might not be thinking clearly):

The extra export is not great for performance as it introduces an unnecessary stall at the end of the shader.
The extra export overwrites the set of active lanes set by a correct export done earlier in the shader.

This seems fine to me, although I know nothing about exports, and I would prefer if we could eventually fix the kill intrinsic design

In D70781#1764809, @critson wrote:

> The extra export is not great for performance as it introduces an unnecessary stall at the end of the shader.

I don't expect this specific case (an otherwise-infinite loop with a discard) to happen often enough with "real" shaders for performance to matter. After all, this went completely unnoticed until it showed up in a CTS test that was created by a fuzzer.

> The extra export overwrites the set of active lanes set by a correct export done earlier in the shader.

I think this isn't an issue because the exec mask at the end of the program is going to be the same as the mask when the last "real" export happens. The way kill is implemented means that control flow never reconverges for a killed thread, so it stays dead until the very end. I've done some manual tests with the aforementioned CTS test, and it does seem to be properly discarding the right pixels too. But that's a good question :)

In D70781#1765237, @arsenm wrote:

> This seems fine to me, although I know nothing about exports, and I would prefer if we could eventually fix the kill intrinsic design

That would be nice. I've thought about it a little, and it's tricky. Your first instinct might be to try to model the actual thread-level semantics, and have this new kill intrinsic do something like "jump here (which is an empty block with just a return) if this thread is killed but some threads are still live, otherwise jump to a block that just does a null export and then returns." So you'd have three possible exits, the normal one, the "I'm killed but some threads are still live" one, and "all threads are killed." However that's problematic for a number of reasons:

You'd have to either worry about handling cases where it doesn't jump to a block containing just a return or add some validation that this is always the case. The former case adds unnecessary complexity, the latter case makes the intrinsic still quite "magic," just in a different way.
The exec mask has to be zero when executing the null export in the all-threads-are-killed case. This sort of breaks the idea that this is "just a normal jump".
radeonsi actually relies on the exec mask not containing killed threads when returning, because the exports actually happen in an epilog which is a separate function whose code is pasted after the main shader. In other words, for threads that are killed, reconvergence actually doesn't conceptually happen until *after* the epilog. So the semantics here are still misleading, and you have to rely on the "optimization" of making it just turn off the bits in the exec mask for correctness.

I think that more doable would be an intrinsic that would do something like "jump to this block that does a null export if all threads are killed, otherwise jump to this block with the killed threads turned off in exec," and then in the frontend you'd make it jump to the next "normal" block in the case where not everything is killed. So now there are only two branch destinations. This is a little less honest, since if a thread is killed but some remain, then the thread that's killed doesn't actually continue on to the next normal block. However it has fewer problems.

One remaining problem, which is also a problem with the other idea, is that now you somehow have to get a mask of threads that haven't been killed or returned yet, and computing it is going to involve inserting instructions that are largely redundant with the control-flow instructions inserted by SILowerControlFlow, but there's currently no good way to get what we want directly from SILowerControlFlow. With some work you'll probably be able to write optimizations to remove these instructions in the simplest of cases, but that still leaves more complex cases with redundant instructions and no easy way to fix it. And of course, because the final null-export block has to be done with an exec mask of 0, you still can't fit it directly into the control-flow machinery without some special cases.
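
The two-destination idea can be sketched as a plain function over lane masks. This is only an illustration under stated assumptions: `KillTarget`, `KillResult`, and `killAndBranch` are hypothetical names, no such intrinsic exists in the patch, and the `live` argument stands in for the "threads that haven't been killed or returned yet" mask that the comment says is hard to obtain in practice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical branch targets of the proposed two-destination kill.
enum class KillTarget { NullExportBlock, NextBlock };

struct KillResult {
  KillTarget target;  // block the wave continues in
  std::uint64_t exec; // exec mask on entry to that block
};

// live:   lanes that have not been killed or returned yet
// killed: lanes this kill switches off
inline KillResult killAndBranch(std::uint64_t live, std::uint64_t killed) {
  const std::uint64_t remaining = live & ~killed;
  if (remaining == 0) {
    // Every lane is gone: the null export block runs with an exec mask of 0.
    return {KillTarget::NullExportBlock, 0};
  }
  assert((remaining & killed) == 0 && "killed lanes must not continue");
  // Some lanes survive: continue in the next normal block without the
  // killed lanes.
  return {KillTarget::NextBlock, remaining};
}
```

For example, killing lanes `0x3` out of live lanes `0xF` continues in the next block with exec `0xC`, while killing all of `0xF` goes to the null export block with exec `0`.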

In D70781#1765291, @cwabbott wrote:

> In D70781#1764809, @critson wrote:

>> The extra export is not great for performance as it introduces an unnecessary stall at the end of the shader.

> I don't expect this specific case (an otherwise-infinite loop with a discard) to happen often enough with "real" shaders for performance to matter. After all, this went completely unnoticed until it showed up in a CTS test that was created by a fuzzer.

Thanks for debugging my thoughts here. I missed that this is only triggered in the dummy return block case. I agree this is totally reasonable.

>> The extra export overwrites the set of active lanes set by a correct export done earlier in the shader.

> I think this isn't an issue because the exec mask at the end of the program is going to be the same as the mask when the last "real" export happens. The way kill is implemented means that control flow never reconverges for a killed thread, so it stays dead until the very end. I've done some manual tests with the aforementioned CTS test, and it does seem to be properly discarding the right pixels too. But that's a good question :)

I agree in the general case that should hold; however, if there was a discard after the true export done then it wouldn't. I don't have a good use case for why such a thing would happen -- exports should in the main be the last thing a (PS) shader does. So if we consider kill after export done semantically invalid then this is fine.
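
The exec-mask argument and its caveat can be checked with a toy model. The sketch below is not hardware or LLVM code: `ExportMasks` and `simulate` are hypothetical names, a wave is reduced to a 64-bit lane mask, and the model assumes (as argued above) that a lane can leave the discard loop only by being killed and that a killed lane never becomes active again.

```cpp
#include <cassert>
#include <cstdint>

struct ExportMasks {
  std::uint64_t realExport; // lanes active at the last "real" export
  std::uint64_t nullExport; // lanes active at the added null export
};

// entered:   lanes active at shader entry
// loopLanes: lanes that branch into the discard loop and are killed there
// lateKills: lanes killed after the real export (assumed not to happen)
inline ExportMasks simulate(std::uint64_t entered, std::uint64_t loopLanes,
                            std::uint64_t lateKills) {
  assert((loopLanes & ~entered) == 0 && "loop lanes must be active lanes");
  std::uint64_t live = entered;

  // Lanes in the loop can only leave it by being killed.
  live &= ~loopLanes;

  // The remaining lanes reach the real export.
  const std::uint64_t real = live;

  // A kill after the real export would shrink the mask further.
  live &= ~lateKills;

  // The null export runs last; killed lanes never rejoin.
  return {real, live};
}
```

With no late kills the two masks are equal, so the null export rewrites the valid mask with the same value. With a non-empty `lateKills` the masks differ, which is the case the thread agrees to treat as invalid.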

In D70781#1766515, @critson wrote:

> In D70781#1765291, @cwabbott wrote:

>> I think this isn't an issue because the exec mask at the end of the program is going to be the same as the mask when the last "real" export happens. The way kill is implemented means that control flow never reconverges for a killed thread, so it stays dead until the very end. I've done some manual tests with the aforementioned CTS test, and it does seem to be properly discarding the right pixels too. But that's a good question :)

> I agree in the general case that should hold; however, if there was a discard after the true export done then it wouldn't. I don't have a good use case for why such a thing would happen -- exports should in the main be the last thing a (PS) shader does. So if we consider kill after export done semantically invalid then this is fine.

Yeah, we definitely don't ever do such a thing in Mesa, presumably not in AMDVLK/PAL either, which covers all the frontends, and I can't think of a reason for doing that either. So I don't think it's a concern. Should I add a comment to explain this?

In D70781#1766679, @cwabbott wrote:

> In D70781#1766515, @critson wrote:

>> In D70781#1765291, @cwabbott wrote:

>>> I think this isn't an issue because the exec mask at the end of the program is going to be the same as the mask when the last "real" export happens. The way kill is implemented means that control flow never reconverges for a killed thread, so it stays dead until the very end. I've done some manual tests with the aforementioned CTS test, and it does seem to be properly discarding the right pixels too. But that's a good question :)

>> I agree in the general case that should hold; however, if there was a discard after the true export done then it wouldn't. I don't have a good use case for why such a thing would happen -- exports should in the main be the last thing a (PS) shader does. So if we consider kill after export done semantically invalid then this is fine.

> Yeah, we definitely don't ever do such a thing in Mesa, presumably not in AMDVLK/PAL either, which covers all the frontends, and I can't think of a reason for doing that either. So I don't think it's a concern. Should I add a comment to explain this?

A brief comment seems reasonable in case someone runs into it in future.

Update comment to explain why this works even when only some threads are killed.

Harbormaster completed remote builds in B42021: Diff 232593. Dec 6 2019, 9:44 AM

This mostly looks good, except I strongly suspect that all other export intrinsics should have their done bit set to 0 in this case.

If two exports with done bit are executed, I suspect we could enter race conditions where the export allocation is freed after the first export with done, and then another wave gets the same export memory spot and its data could be overwritten by the second, newly introduced, export with done bit. Or enter some hang condition in the hardware. Who knows.

Fix wrong indentation and missing "immarg" on intrinsic declaration
Make sure that we remove the done bit from existing exports, and test it

Harbormaster completed remote builds in B42097: Diff 232786. Dec 9 2019, 2:39 AM

In D70781#1774320, @nhaehnle wrote:

> If two exports with done bit are executed, I suspect we could enter race conditions where the export allocation is freed after the first export with done, and then another wave gets the same export memory spot and its data could be overwritten by the second, newly introduced, export with done bit. Or enter some hang condition in the hardware. Who knows.

Thanks for mentioning this. I wasn't entirely sure if what I was doing is kosher, since the public documentation on exports is a little sparse, even though it worked for the test. I've implemented clearing the done bit like you've suggested... it shouldn't hurt at least.
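
The resulting invariant, exactly one export with the done bit, can be sketched on a toy list of exports. This is a simplified model, not the code from the diff: `Export`, `countDone`, and `appendFinalNullExport` are hypothetical names, and only the target value 9 (`SQ_EXP_NULL`) is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an export instruction.
struct Export {
  int target;     // export target, e.g. 0 for mrt0
  bool done;      // "done" bit
  bool validMask; // "vm" bit
};

constexpr int kNullTarget = 9; // SQ_EXP_NULL, as in the diff

// Counts the exports that still have their done bit set.
inline std::size_t countDone(const std::vector<Export> &exports) {
  std::size_t n = 0;
  for (const Export &e : exports)
    n += e.done ? 1 : 0;
  return n;
}

// Clears the done bit on every existing export, then appends a null export
// that carries the only remaining done bit.
inline void appendFinalNullExport(std::vector<Export> &exports) {
  for (Export &e : exports)
    e.done = false;
  exports.push_back(Export{kNullTarget, /*done=*/true, /*validMask=*/true});
  assert(countDone(exports) == 1 && "exactly one done export expected");
}
```

Applied to a shader with a single `mrt0` export marked done, this leaves the `mrt0` export with only its valid-mask bit and makes the trailing null export the sole done export, which matches what the updated test checks.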

cwabbott added a child revision: D71192: AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns. Dec 9 2019, 3:14 AM

Thanks, LGTM

This revision is now accepted and ready to land. Dec 9 2019, 11:33 PM

Can someone else please commit this? @nhaehnle? It's been almost a month, excluding Christmas, and my commit access situation still hasn't been resolved. This fix is necessary for radv to pass VK 1.2 conformance, and we'd like to cherry-pick it to LLVM 10 before it's released.

Herald added a subscriber: kerbowa. Jan 29 2020, 5:48 AM

Closed by commit rG0994c485e613: AMDGPU: Fix handling of infinite loops in fragment shaders (authored by cwabbott). Jan 29 2020, 6:35 AM

This revision was automatically updated to reflect the committed changes.

cwabbott reopened this revision. Jan 29 2020, 9:03 AM

This revision is now accepted and ready to land. Jan 29 2020, 9:03 AM

cwabbott closed this revision. Jan 29 2020, 9:08 AM

ruiling mentioned this in D105608: [NFC][AMDGPU] autogenerate kill-infinite-loop.ll checks. Jul 7 2021, 11:30 PM

ruiling mentioned this in D105610: [AMDGPU] Don't handle export done when unify exit nodes. Jul 7 2021, 11:36 PM

ruiling mentioned this in rGd9b9fdd91bb4: [AMDGPU] Don't handle export done when unify exit nodes. Jul 14 2021, 12:07 AM

Diff 241137

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp

(28 lines not shown)
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils.h"		#include "llvm/Transforms/Utils.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"

(67 lines not shown)	for (BasicBlock *Pred : predecessors(Top)) {
if (Visited.insert(Pred).second)		if (Visited.insert(Pred).second)
Stack.push_back(Pred);		Stack.push_back(Pred);
}		}
}		}

return true;		return true;
}		}

		static void removeDoneExport(Function &F) {
		ConstantInt *BoolFalse = ConstantInt::getFalse(F.getContext());
		for (BasicBlock &BB : F) {
		for (Instruction &I : BB) {
		if (IntrinsicInst *Intrin = llvm::dyn_cast<IntrinsicInst>(&I)) {
		if (Intrin->getIntrinsicID() == Intrinsic::amdgcn_exp) {
		Intrin->setArgOperand(6, BoolFalse); // done
		} else if (Intrin->getIntrinsicID() == Intrinsic::amdgcn_exp_compr) {
		Intrin->setArgOperand(4, BoolFalse); // done
		}
		}
		}
		}
		}

static BasicBlock *unifyReturnBlockSet(Function &F,		static BasicBlock *unifyReturnBlockSet(Function &F,
ArrayRef<BasicBlock *> ReturningBlocks,		ArrayRef<BasicBlock *> ReturningBlocks,
		bool InsertExport,
const TargetTransformInfo &TTI,		const TargetTransformInfo &TTI,
StringRef Name) {		StringRef Name) {
// Otherwise, we need to insert a new basic block into the function, add a PHI		// Otherwise, we need to insert a new basic block into the function, add a PHI
// nodes (if the function returns values), and convert all of the return		// nodes (if the function returns values), and convert all of the return
// instructions into unconditional branches.		// instructions into unconditional branches.
BasicBlock *NewRetBlock = BasicBlock::Create(F.getContext(), Name, &F);		BasicBlock *NewRetBlock = BasicBlock::Create(F.getContext(), Name, &F);
		IRBuilder<> B(NewRetBlock);

		if (InsertExport) {
		// Ensure that there's only one "done" export in the shader by removing the
		// "done" bit set on the original final export. More than one "done" export
		// can lead to undefined behavior.
		removeDoneExport(F);

		Value *Undef = UndefValue::get(B.getFloatTy());
		B.CreateIntrinsic(Intrinsic::amdgcn_exp, { B.getFloatTy() },
		{
		B.getInt32(9), // target, SQ_EXP_NULL
		B.getInt32(0), // enabled channels
		Undef, Undef, Undef, Undef, // values
		B.getTrue(), // done
		B.getTrue(), // valid mask
		});
		}

PHINode *PN = nullptr;		PHINode *PN = nullptr;
if (F.getReturnType()->isVoidTy()) {		if (F.getReturnType()->isVoidTy()) {
ReturnInst::Create(F.getContext(), nullptr, NewRetBlock);		B.CreateRetVoid();
} else {		} else {
// If the function doesn't return void... add a PHI node to the block...		// If the function doesn't return void... add a PHI node to the block...
PN = PHINode::Create(F.getReturnType(), ReturningBlocks.size(),		PN = B.CreatePHI(F.getReturnType(), ReturningBlocks.size(),
"UnifiedRetVal");		"UnifiedRetVal");
NewRetBlock->getInstList().push_back(PN);		assert(!InsertExport);
ReturnInst::Create(F.getContext(), PN, NewRetBlock);		B.CreateRet(PN);
}		}

// Loop over all of the blocks, replacing the return instruction with an		// Loop over all of the blocks, replacing the return instruction with an
// unconditional branch.		// unconditional branch.
for (BasicBlock *BB : ReturningBlocks) {		for (BasicBlock *BB : ReturningBlocks) {
// Add an incoming element to the PHI node for every return instruction that		// Add an incoming element to the PHI node for every return instruction that
// is merging into this new block...		// is merging into this new block...
if (PN)		if (PN)
(22 lines not shown)	bool AMDGPUUnifyDivergentExitNodes::runOnFunction(Function &F) {
// Loop over all of the blocks in a function, tracking all of the blocks that		// Loop over all of the blocks in a function, tracking all of the blocks that
// return.		// return.
SmallVector<BasicBlock *, 4> ReturningBlocks;		SmallVector<BasicBlock *, 4> ReturningBlocks;
SmallVector<BasicBlock *, 4> UnreachableBlocks;		SmallVector<BasicBlock *, 4> UnreachableBlocks;

// Dummy return block for infinite loop.		// Dummy return block for infinite loop.
BasicBlock *DummyReturnBB = nullptr;		BasicBlock *DummyReturnBB = nullptr;

		bool InsertExport = false;

for (BasicBlock *BB : PDT.getRoots()) {		for (BasicBlock *BB : PDT.getRoots()) {
if (isa<ReturnInst>(BB->getTerminator())) {		if (isa<ReturnInst>(BB->getTerminator())) {
if (!isUniformlyReached(DA, *BB))		if (!isUniformlyReached(DA, *BB))
ReturningBlocks.push_back(BB);		ReturningBlocks.push_back(BB);
} else if (isa<UnreachableInst>(BB->getTerminator())) {		} else if (isa<UnreachableInst>(BB->getTerminator())) {
if (!isUniformlyReached(DA, *BB))		if (!isUniformlyReached(DA, *BB))
UnreachableBlocks.push_back(BB);		UnreachableBlocks.push_back(BB);
} else if (BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator())) {		} else if (BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator())) {

ConstantInt *BoolTrue = ConstantInt::getTrue(F.getContext());		ConstantInt *BoolTrue = ConstantInt::getTrue(F.getContext());
if (DummyReturnBB == nullptr) {		if (DummyReturnBB == nullptr) {
DummyReturnBB = BasicBlock::Create(F.getContext(),		DummyReturnBB = BasicBlock::Create(F.getContext(),
"DummyReturnBlock", &F);		"DummyReturnBlock", &F);
Type *RetTy = F.getReturnType();		Type *RetTy = F.getReturnType();
Value *RetVal = RetTy->isVoidTy() ? nullptr : UndefValue::get(RetTy);		Value *RetVal = RetTy->isVoidTy() ? nullptr : UndefValue::get(RetTy);

		// For pixel shaders, the producer guarantees that an export is
		// executed before each return instruction. However, if there is an
		// infinite loop and we insert a return ourselves, we need to uphold
		// that guarantee by inserting a null export. This can happen e.g. in
		// an infinite loop with kill instructions, which is supposed to
		// terminate. However, we don't need to do this if there is a non-void
		// return value, since then there is an epilog afterwards which will
		// still export.
		//
		// Note: In the case where only some threads enter the infinite loop,
		// this can result in the null export happening redundantly after the
		// original exports. However, the last "real" export happens after all
		// the threads that didn't enter an infinite loop converged, which
		// means that the only extra threads to execute the null export are
		// threads that entered the infinite loop, and they only could've
		// exited through being killed which sets their exec bit to 0.
		// Therefore, unless there's an actual infinite loop, which can have
		// invalid results, or there's a kill after the last export, which we
		// assume the frontend won't do, this export will have the same exec
		// mask as the last "real" export, and therefore the valid mask will be
		// overwritten with the same value and will still be correct. Also,
		// even though this forces an extra unnecessary export wait, we assume
		// that this happens rarely enough in practice that we don't have to
		// worry about performance.
		if (F.getCallingConv() == CallingConv::AMDGPU_PS &&
		RetTy->isVoidTy()) {
		InsertExport = true;
		}

ReturnInst::Create(F.getContext(), RetVal, DummyReturnBB);		ReturnInst::Create(F.getContext(), RetVal, DummyReturnBB);
ReturningBlocks.push_back(DummyReturnBB);		ReturningBlocks.push_back(DummyReturnBB);
}		}

if (BI->isUnconditional()) {		if (BI->isUnconditional()) {
BasicBlock *LoopHeaderBB = BI->getSuccessor(0);		BasicBlock *LoopHeaderBB = BI->getSuccessor(0);
BI->eraseFromParent(); // Delete the unconditional branch.		BI->eraseFromParent(); // Delete the unconditional branch.
// Add a new conditional branch with a dummy edge to the return block.		// Add a new conditional branch with a dummy edge to the return block.
(56 lines not shown)	if (ReturningBlocks.empty())
return false; // No blocks return		return false; // No blocks return

if (ReturningBlocks.size() == 1)		if (ReturningBlocks.size() == 1)
return false; // Already has a single return block		return false; // Already has a single return block

const TargetTransformInfo &TTI		const TargetTransformInfo &TTI
= getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);		= getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

unifyReturnBlockSet(F, ReturningBlocks, TTI, "UnifiedReturnBlock");		unifyReturnBlockSet(F, ReturningBlocks, InsertExport, TTI, "UnifiedReturnBlock");
return true;		return true;
}		}

llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll

This file was added.

				; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -enable-var-scope %s
				; Although it's modeled without any control flow in order to get better code
				; out of the structurizer, @llvm.amdgcn.kill actually ends the thread that calls
				; it with "true". In case it's called in a provably infinite loop, we still
				; need to successfully exit and export something, even if we can't know where
				; to jump to in the LLVM IR. Therefore we insert a null export ourselves in
				; this case right before the s_endpgm to avoid GPU hangs, which is what this
				; tests.

				; CHECK-LABEL: return_void
				; Make sure that we remove the done bit from the original export
				; CHECK: exp mrt0 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}} vm
				; CHECK: exp null off, off, off, off done vm
				; CHECK-NEXT: s_endpgm
				define amdgpu_ps void @return_void(float %0) #0 {
				main_body:
				%cmp = fcmp olt float %0, 1.000000e+01
				br i1 %cmp, label %end, label %loop

				loop:
				call void @llvm.amdgcn.kill(i1 false) #3
				br label %loop

				end:
				call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float 0., float 0., float 0., float 1., i1 true, i1 true) #3
				ret void
				}

				; Check that we also remove the done bit from compressed exports correctly.
				; CHECK-LABEL: return_void_compr
				; CHECK: exp mrt0 v{{[0-9]+}}, off, v{{[0-9]+}}, off compr vm
				; CHECK: exp null off, off, off, off done vm
				; CHECK-NEXT: s_endpgm
				define amdgpu_ps void @return_void_compr(float %0) #0 {
				main_body:
				%cmp = fcmp olt float %0, 1.000000e+01
				br i1 %cmp, label %end, label %loop

				loop:
				call void @llvm.amdgcn.kill(i1 false) #3
				br label %loop

				end:
				call void @llvm.amdgcn.exp.compr.v2i16(i32 0, i32 5, <2 x i16> < i16 0, i16 0 >, <2 x i16> < i16 0, i16 0 >, i1 true, i1 true) #3
				ret void
				}

				; In case there's an epilog, we shouldn't have to do this.
				; CHECK-LABEL: return_nonvoid
				; CHECK-NOT: exp null off, off, off, off done vm
				define amdgpu_ps float @return_nonvoid(float %0) #0 {
				main_body:
				%cmp = fcmp olt float %0, 1.000000e+01
				br i1 %cmp, label %end, label %loop

				loop:
				call void @llvm.amdgcn.kill(i1 false) #3
				br label %loop

				end:
				ret float 0.
				}

				declare void @llvm.amdgcn.kill(i1) #0
				declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg) #0
				declare void @llvm.amdgcn.exp.compr.v2i16(i32 immarg, i32 immarg, <2 x i16>, <2 x i16>, i1 immarg, i1 immarg) #0

				attributes #0 = { nounwind }

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix handling of infinite loops in fragment shaders
ClosedPublic

