Details

Reviewers

arsenm
dstuttard
nhaehnle
tpr
chandlerc

Summary

This fixes an issue where a value is uniform inside a loop but its uses outside the loop can be divergent.

Change-Id: I94d3d2e30cc2a6ae8d59e92cadf6f1b6cb7e708b

Diff Detail

Repository

rL LLVM

Build Status

Buildable 30885
Build 30884: arc lint + arc unit

Event Timeline

rtaylor created this revision. Apr 17 2019, 12:53 PM

Herald added a project: Restricted Project. Apr 17 2019, 12:53 PM

Herald added subscribers: llvm-commits, t-tye, tpr and 7 others.

Harbormaster completed remote builds in B30702: Diff 195621. Apr 17 2019, 12:56 PM

This is a workaround. The structurizer / annotator must be correct without relying on another pass to hide situations they don't handle correctly

This revision now requires changes to proceed. Apr 17 2019, 12:58 PM

We have a test case in which a value that is uniform inside the loop is used outside the loop, where threads may have diverged.

For example:

define amdgpu_ps void @_amdgpu_ps_main(<4 x i32> inreg %desc, float %divergent, <2 x i32> %ptrish) {
bb59:
  br label %.preheader

.preheader:
  %tmp62 = phi i32 [ %tmp105, %bb104 ], [ 0, %bb59 ]
  ; cmp and branch here

bb104:
  %tmp105 = add nuw nsw i32 %tmp62, 1
  ; cmp and branch here

.loopexit:
  %load2 = tail call i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32> %desc, i32 %tmp62, i32 0)

}

This patch runs LCSSA after StructurizeCFG. LCSSA inserts PHI nodes into the loop exit block for this kind of value, which lets DA analyze the value correctly after the loop.
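The failure mode can be made concrete with a small, hypothetical C++ model of a wave executing the loop above in lockstep. This is not LLVM code. `runLockstep` and the per-lane `tripCount` are illustrative assumptions that stand in for the elided divergent compare. In every iteration, all still-active lanes agree on %tmp62. The values the lanes carry out of the loop still differ, and those values are what the s.buffer.load in .loopexit sees.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy SIMT model (hypothetical, not LLVM code). Every lane runs
//   tmp62 = 0; while (tmp62 < tripCount[lane]) ++tmp62;
// in lockstep; tripCount plays the role of the divergent exit condition.
struct LoopTrace {
  // Per iteration: the set of tmp62 values seen by the lanes still active.
  std::vector<std::set<unsigned>> perIterationValues;
  // Per lane: the tmp62 value that lane carries out of the loop.
  std::vector<unsigned> exitValues;
};

LoopTrace runLockstep(const std::vector<unsigned> &tripCount) {
  LoopTrace trace;
  const std::size_t lanes = tripCount.size();
  std::vector<unsigned> tmp62(lanes, 0);
  std::vector<bool> active(lanes, true);
  trace.exitValues.assign(lanes, 0);
  bool anyActive = true;
  while (anyActive) {
    std::set<unsigned> seen;
    anyActive = false;
    for (std::size_t l = 0; l < lanes; ++l) {
      if (!active[l])
        continue;
      seen.insert(tmp62[l]);
      if (tmp62[l] >= tripCount[l]) { // divergent loop exit for this lane
        active[l] = false;
        trace.exitValues[l] = tmp62[l];
      } else {
        ++tmp62[l];
        anyActive = true;
      }
    }
    trace.perIterationValues.push_back(seen);
  }
  return trace;
}

// A set of observed values is uniform when all lanes saw the same value.
bool isUniform(const std::set<unsigned> &values) { return values.size() <= 1; }
```

With trip counts {1, 3, 2}, every per-iteration set holds a single value, while the exit values are 1, 3 and 2. The value is uniform at each definition but divergent where it is used after the loop.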

In D60834#1470734, @arsenm wrote:

This is a workaround. The structurizer / annotator must be correct without relying on another pass to hide situations they don't handle correctly

We discussed moving to the new DA, which may track these properly, but I haven't yet found out whether that is the case. Some of the group thought it was worthwhile to upstream this for now; I've added you to an email chain about this issue.

rtaylor added reviewers: dstuttard, nhaehnle, tpr. Apr 17 2019, 1:18 PM

It sort of intuitively makes sense to me that the control flow lowering would like LCSSA. However, this should not be handled by adding it directly to the pass pipeline. You can add this as a dependency, e.g. AU.addRequiredID(LCSSAID);

I would also like to see an IR->IR testcase showing that LCSSA was implicitly run.
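A toy scheduler can illustrate the difference between declaring a requirement and hand-editing the pipeline. This is a hypothetical model, not LLVM's legacy pass manager; `ToyPass`, `ToyScheduler` and the pass names are illustrative. Each pass lists the passes it requires, which is the role `AU.addRequiredID(LCSSAID)` plays in a real `getAnalysisUsage`. The scheduler then runs those passes first, so the requirement travels with the pass and does not depend on every pipeline remembering to add it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy pass description: a name plus the passes it requires.
struct ToyPass {
  std::string name;
  std::vector<std::string> required;
};

// Expands a requested pipeline so that every required pass runs before the
// pass that needs it. Requirements are resolved recursively and a pass that
// already ran is not scheduled again. Assumes the requirement graph is
// acyclic and, unlike a real pass manager, ignores invalidation.
class ToyScheduler {
public:
  void registerPass(const ToyPass &p) { passes_[p.name] = p; }

  std::vector<std::string> schedule(const std::vector<std::string> &requested) {
    std::vector<std::string> order;
    std::set<std::string> done;
    for (const auto &name : requested)
      visit(name, order, done);
    return order;
  }

private:
  void visit(const std::string &name, std::vector<std::string> &order,
             std::set<std::string> &done) {
    if (done.count(name))
      return;
    auto it = passes_.find(name);
    if (it != passes_.end())
      for (const auto &dep : it->second.required)
        visit(dep, order, done); // schedule requirements first
    done.insert(name);
    order.push_back(name);
  }

  std::map<std::string, ToyPass> passes_;
};
```

Suppose `amdgpu-isel` declares a requirement on `lcssa`. The requested pipeline `structurizecfg, sink, amdgpu-isel` then expands to `structurizecfg, sink, lcssa, amdgpu-isel` without anyone listing `lcssa` explicitly.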

In D60834#1470763, @arsenm wrote:

It sort of intuitively makes sense to me that the control flow lowering would like LCSSA. However, this should not be handled by adding it directly to the pass pipeline. You can add this as a dependency, e.g. AU.addRequiredID(LCSSAID);

I would also like to see an IR->IR testcase showing that LCSSA was implicitly run.

Actually, what really requires LCSSA? Is it DivergenceAnalysis or StructurizeCFG directly?

In D60834#1470764, @arsenm wrote:

Actually, what really requires LCSSA? Is it DivergenceAnalysis or StructurizeCFG directly?

As I understand it, the problem is that DA does not look across blocks and therefore won't see that %tmp62 is actually divergent at the loop exit (though it is uniform inside the loop). LCSSA provides a PHI node in the loop exit block (where the value is divergent), which allows DA to mark it divergent so that the s_buffer_load can be lowered to a buffer_load.
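The mechanism can be sketched with a toy model. Everything below is hypothetical and is not LLVM's LCSSA pass or DivergenceAnalysis. It assumes a single loop with a single exit block, and a divergence query that looks only at a value's own definition. The LCSSA-style rewrite routes the out-of-loop use of %tmp62 through a new PHI in the exit block. That PHI sits at a divergent loop exit, so a purely definition-based query can mark it divergent.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR (not LLVM's): just enough structure to show why an
// LCSSA PHI gives a definition-based divergence query something to mark.
struct Value {
  std::string name;
  std::string defBlock;
  bool isExitPhi;      // PHI placed in the loop exit block by formLCSSA
  bool divergentAtDef; // divergent because of its operands
};

struct Use {
  std::string userBlock;
  int value; // index into Function::values
};

struct Loop {
  std::set<std::string> blocks;
  std::string exitBlock; // simplification: exactly one exit block
  bool divergentExit;    // lanes may leave the loop on different iterations
};

struct Function {
  std::vector<Value> values;
  std::vector<Use> uses;
};

// LCSSA-style rewrite: each use outside the loop of a value defined inside it
// is redirected to one PHI per value, placed in the exit block.
void formLCSSA(Function &f, const Loop &l) {
  std::vector<int> phiFor(f.values.size(), -1);
  for (Use &u : f.uses) {
    bool defInLoop = l.blocks.count(f.values[u.value].defBlock) != 0;
    bool useInLoop = l.blocks.count(u.userBlock) != 0;
    if (!defInLoop || useInLoop)
      continue;
    if (phiFor[u.value] < 0) {
      std::string phiName = f.values[u.value].name + ".lcssa";
      phiFor[u.value] = static_cast<int>(f.values.size());
      // Its only incoming value is uniform, so it is not divergent at def.
      f.values.push_back({phiName, l.exitBlock, true, false});
    }
    u.value = phiFor[u.value];
  }
}

// Toy query that only considers a value's own definition: its operands, or
// whether it is a PHI at the exit of a loop with a divergent exit.
bool isDivergent(const Value &v, const Loop &l) {
  if (v.divergentAtDef)
    return true;
  return v.isExitPhi && v.defBlock == l.exitBlock && l.divergentExit;
}
```

Before the rewrite, the use in `.loopexit` refers to `tmp62` itself, and the query reports it as uniform. After the rewrite, the same use refers to `tmp62.lcssa`, which the query reports as divergent.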

In D60834#1470916, @rtaylor wrote:

As I understand it, the problem is that DA does not look across blocks and therefore won't see that %tmp62 is actually divergent at the loop exit (though it is uniform inside the loop). LCSSA provides a PHI node in the loop exit block (where the value is divergent), which allows DA to mark it divergent so that the s_buffer_load can be lowered to a buffer_load.

OK, so it seems to me that the dependency should be on DivergenceAnalysis. If you just patch it here, the same problem will occur for the handful of other passes that optionally use it.

I think there are some misunderstandings here. None of the IR passes require LCSSA. The problem is in getting the divergence data into the SelectionDAG.

Specifically, you can have, in a weird mixture of IR and SelectionDAG:

loop:
  ...
  %uni = ...                             ; Value is uniform here
  ...
  br i1 %div, label %loop, label %next   ; Divergent loop exit

next:
  %0 = CopyFromReg N(corresponding to %uni)
  use %0

In this case, %0 must be labeled divergent. However, %0 does not exist at an IR level, and so the code in isSDNodeSourceOfDivergence can only query for the divergence of %uni. However, %uni itself is uniform.

One way to look at the problem is that DivergenceAnalysis::isDivergent is really "isDivergentAtDefinition", and what we need is a query "isDivergentAtUse". Implementing that query isn't entirely trivial, and LCSSA is effectively an alternative way of making the right query.
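The missing query can be sketched in hypothetical C++. `ToyLoop` and `isDivergentAtUse` are illustrative names, not LLVM's DivergenceAnalysis API. The sketch is deliberately simplified: it models only loop membership and whether a loop's exit is divergent, and ignores the complications that make the real query non-trivial. Applied to the example above, %uni is uniform where it is used inside `loop` but divergent at its use in `next`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical loop description for the sketch (not an LLVM type).
struct ToyLoop {
  std::set<std::string> blocks; // blocks belonging to the loop
  bool divergentExit;           // lanes may leave on different iterations
};

// divergentAtDef is what an "isDivergentAtDefinition" query would return.
// A use can still be divergent when it sits outside a loop that contains the
// definition and that loop has a divergent exit: lanes then read values
// produced in different iterations ("temporal" divergence).
bool isDivergentAtUse(bool divergentAtDef, const std::string &defBlock,
                      const std::string &useBlock,
                      const std::vector<ToyLoop> &loops) {
  if (divergentAtDef)
    return true;
  for (const ToyLoop &l : loops) {
    bool defInside = l.blocks.count(defBlock) != 0;
    bool useInside = l.blocks.count(useBlock) != 0;
    if (defInside && !useInside && l.divergentExit)
      return true;
  }
  return false;
}
```

LCSSA avoids implementing this query by materializing each such out-of-loop use as a PHI. A query that only looks at definitions can then give the right answer for that PHI.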

In D60834#1471277, @nhaehnle wrote:
I think there are some misunderstandings here. None of the IR passes require LCSSA. The problem is in getting the divergence data into the SelectionDAG.

Specifically, you can have, in a weird mixture of IR and SelectionDAG:

loop:
  ...
  %uni = ...                             ; Value is uniform here
  ...
  br i1 %div, label %loop, label %next   ; Divergent loop exit

next:
  %0 = CopyFromReg N(corresponding to %uni)
  use %0

In this case, %0 must be labeled divergent. However, %0 does not exist at an IR level, and so the code in isSDNodeSourceOfDivergence can only query for the divergence of %uni. However, %uni itself is uniform.

One way to look at the problem is that DivergenceAnalysis::isDivergent is really "isDivergentAtDefinition", and what we need is a query "isDivergentAtUse". Implementing that query isn't entirely trivial, and LCSSA is effectively an alternative way of making the right query.

So really amdgpu-isel is dependent on LCSSA

I am hitting this assert in LuxMark with this patch:
Assertion failed: (IncomingDef->isPHI()), function lowerPhis, file ../lib/Target/AMDGPU/SILowerI1Copies.cpp, line 534.
Stack dump:
0. Program arguments: /Users/matt/src/llvm/build_debug/bin/clang -cc1 -triple amdgcn-amd-amdhsa -emit-obj -disable-free -main-file-name t_9348_21.bc -mrelocation-model pic -pic-level 1 -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -fvisibility hidden -fapply-global-visibility-to-externs -target-cpu gfx900 -target-feature -wavefrontsize16 -target-feature -wavefrontsize32 -target-feature +wavefrontsize64 -target-feature -sram-ecc -target-feature -code-object-v3 -target-feature +cumode -dwarf-column-info -debugger-tuning=gdb -resource-dir /home/marsenau/builds/opencl_amdgpu_scratch/bin/lib/clang/8.0 -O3 -fdebug-compilation-dir /home/marsenau/src/LuxMark-3.1 -ferror-limit 19 -fmessage-length 201 -cl-kernel-arg-info -fobjc-runtime=gcc -fdiagnostics-show-option -vectorize-loops -vectorize-slp -mllvm -amdgpu-internalize-symbols -mllvm -amdgpu-early-inline-all -o /tmp/t_9348_21-9f1436.o -x ir AMD_9348_7/t_9348_21.bc -faddrsig

Code generation
Running pass 'CallGraph Pass Manager' on module 'AMD_9348_7/t_9348_21.bc'.
Running pass 'SI Lower i1 Copies' on function '@scheduler'

0 clang 0x0000000109a1f8bc llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 60
1 clang 0x0000000109a1fe79 PrintStackTraceSignalHandler(void*) + 25
2 clang 0x0000000109a1db36 llvm::sys::RunSignalHandlers() + 118
3 clang 0x0000000109a23a42 SignalHandler(int) + 210
4 libsystem_platform.dylib 0x00007fff7ddd1b5d _sigtramp + 29
5 libsystem_platform.dylib 0x000000012a983938 _sigtramp + 2897944056
6 libsystem_c.dylib 0x00007fff7dc916a6 abort + 127
7 libsystem_c.dylib 0x00007fff7dc5a20d basename_r + 0
8 clang 0x0000000106af4af5 (anonymous namespace)::SILowerI1Copies::lowerPhis() + 1141
9 clang 0x0000000106af40ba (anonymous namespace)::SILowerI1Copies::runOnMachineFunction(llvm::MachineFunction&) + 186
10 clang 0x000000010860f8de llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 542
11 clang 0x0000000108bfab35 llvm::FPPassManager::runOnFunction(llvm::Function&) + 613
12 clang 0x0000000107f5c8ad (anonymous namespace)::CGPassManager::RunPassOnSCC(llvm::Pass*, llvm::CallGraphSCC&, llvm::CallGraph&, bool&, bool&) + 925
13 clang 0x0000000107f596ed (anonymous namespace)::CGPassManager::RunAllPassesOnSCC(llvm::CallGraphSCC&, llvm::CallGraph&, bool&) + 541
14 clang 0x0000000107f58ec1 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) + 433
15 clang 0x0000000108bfb8d5 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 789

I'm working on reducing it

Moved LCSSA call to after the sinking pass.

Harbormaster completed remote builds in B30885: Diff 196238. Apr 23 2019, 6:44 AM

This should really be expressed as a pass dependency, not by explicitly adding the pass to the pipeline.

This revision now requires changes to proceed. Apr 23 2019, 7:03 AM

Also needs a comment explaining why LCSSA is needed

In D60834#1475522, @arsenm wrote:

Also needs a comment explaining why LCSSA is needed

So are you looking for a dependency and a comment or just a comment?

I was misunderstanding the dependency issues; this should fix it.

Harbormaster completed remote builds in B31289: Diff 197792. May 2 2019, 8:16 AM

Scratch that last commit; this is still broken.

arsenm added inline comments. May 2 2019, 8:18 AM

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Line 815: You should remove the explicit pass since it's a dependency now.

Preserving StackProtector in LCSSA to avoid pass scheduling conflict
Removed explicit call of LCSSA in AMDGPUTargetMachine

Harbormaster completed remote builds in B31295: Diff 197814. May 2 2019, 10:01 AM

rtaylor added a reviewer: chandlerc. May 2 2019, 10:05 AM

arsenm added inline comments. May 2 2019, 10:38 AM

lib/Transforms/Utils/LCSSA.cpp
Line 450 (on Diff #197814): If you use the ID form, you should be able to avoid the include.

rtaylor marked an inline comment as done. May 2 2019, 11:24 AM

rtaylor added inline comments.

lib/Transforms/Utils/LCSSA.cpp
Line 450 (on Diff #197814): I don't think there is one for StackProtector.

arsenm added inline comments. May 2 2019, 12:34 PM

lib/Transforms/Utils/LCSSA.cpp
Line 450 (on Diff #197814): You can add that.

rtaylor marked an inline comment as done. May 2 2019, 3:36 PM

rtaylor added inline comments.

lib/Transforms/Utils/LCSSA.cpp
Line 450 (on Diff #197814): I could, but depending on where it goes, it might still need a header if it's not already included.

Added StackProtectorID and changed LCSSA to use it instead of StackProtector directly.
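Some background on the "ID form": LLVM's legacy pass manager identifies a pass by the address of its `static char ID` member. Headers export references such as `LCSSAID`, so `AU.addRequiredID(...)` or `AU.addPreservedID(...)` can name a pass without including its class definition. The sketch below imitates that convention in plain C++. `ToyStackProtector`, `ToyStackProtectorID` and `ToyAnalysisUsage` are hypothetical stand-ins, not the real LLVM types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// ---- What would live in the pass's own .cpp file (hypothetical names). ----
namespace {
struct ToyStackProtector {
  static char ID; // only the address is used, never the value
};
} // namespace
char ToyStackProtector::ID = 0;

// Exported handle; a header would only need to declare it as
//   extern char &ToyStackProtectorID;
char &ToyStackProtectorID = ToyStackProtector::ID;

// ---- What a client pass sees: no ToyStackProtector definition needed. ----
extern char &ToyStackProtectorID;

// Toy stand-in for AnalysisUsage: records preserved passes by ID address.
struct ToyAnalysisUsage {
  std::vector<const char *> preserved;
  void addPreservedID(char &id) { preserved.push_back(&id); }
  bool isPreserved(const char &id) const {
    return std::find(preserved.begin(), preserved.end(), &id) !=
           preserved.end();
  }
};

// An LCSSA-like pass declaring that it preserves the toy stack protector.
void toyLCSSAGetAnalysisUsage(ToyAnalysisUsage &au) {
  au.addPreservedID(ToyStackProtectorID);
}
```

Identity is the address, so `isPreserved` compares pointers. Two distinct `char` objects that hold the same value still count as different passes.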

Harbormaster completed remote builds in B31457: Diff 198293. May 6 2019, 9:11 AM

arsenm added inline comments. May 6 2019, 1:42 PM

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
Line 92 (on Diff #198293): A little more explanation of why it doesn't get this information would be good.

More detailed explanation of why LCSSA is needed

Harbormaster completed remote builds in B31535: Diff 198471. May 7 2019, 7:55 AM

Ping - see D62614.

nhaehnle mentioned this in D62614: Fix for the OCL/LC to failure on some OCLPerf tests. May 31 2019, 12:30 AM

Assuming any linker errors with this are fixed, I think this is ready to go in.

Okay, so the linker errors haven't been resolved. If issues related to this problem are burning people urgently, I think a reasonable quick fix would be to add LCSSA not as a dependency but as a codegen pass. It'd be great if we could avoid that, but as hacks go, that still seems cleaner to me than some of the alternatives.

nhaehnle mentioned this in D62802: [RFC][AMDGPU] Uniform values being used outside loop marked non-divergent. Jun 3 2019, 1:24 AM

See D62802 and http://lists.llvm.org/pipermail/llvm-dev/2019-June/132751.html for the linker dependency issue.

lebedev.ri mentioned this in D63489: [InstSimplify] LCSSA PHIs should not be simplified away. Jun 19 2019, 9:07 AM

This has been copied over and extended in https://reviews.llvm.org/D62802, so I'm abandoning this revision.

Diff 196238

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

 bool GCNPassConfig::addPreISel() {
 ...
   // Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit
   // regions formed by them.
   addPass(&AMDGPUUnifyDivergentExitNodesID);
   if (!LateCFGStructurize) {
     addPass(createStructurizeCFGPass(true)); // true -> SkipUniformRegions
   }
   addPass(createSinkingPass());
+  addPass(createLCSSAPass());
   addPass(createAMDGPUAnnotateUniformValues());
   if (!LateCFGStructurize) {
     addPass(createSIAnnotateControlFlowPass());
   }

   return false;
 }

test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll

 ...
 ; CHECK-LABEL: main:
 ; CHECK: ; %bb.0: ; %start
 ; CHECK-NEXT: v_readfirstlane_b32 s0, v0
 ; CHECK-NEXT: s_mov_b32 m0, s0
 ; CHECK-NEXT: s_mov_b64 s[4:5], 0
 ; CHECK-NEXT: v_interp_p1_f32_e32 v0, v1, attr0.x
 ; CHECK-NEXT: v_cmp_nlt_f32_e64 s[0:1], 0, v0
 ; CHECK-NEXT: v_mov_b32_e32 v1, 0
-; CHECK-NEXT: ; implicit-def: $sgpr2_sgpr3
+; CHECK-NEXT: ; implicit-def: $sgpr8_sgpr9
 ; CHECK-NEXT: ; implicit-def: $sgpr6_sgpr7
+; CHECK-NEXT: ; implicit-def: $sgpr2_sgpr3
 ; CHECK-NEXT: BB0_1: ; %loop
 ; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT: v_cmp_gt_u32_e32 vcc, 32, v1
 ; CHECK-NEXT: s_and_b64 vcc, exec, vcc
 ; CHECK-NEXT: s_or_b64 s[6:7], s[6:7], exec
-; CHECK-NEXT: s_or_b64 s[2:3], s[2:3], exec
+; CHECK-NEXT: s_or_b64 s[8:9], s[8:9], exec
 ; CHECK-NEXT: s_cbranch_vccz BB0_5
 ; CHECK-NEXT: ; %bb.2: ; %endif1
 ; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
 ; CHECK-NEXT: s_mov_b64 s[6:7], -1
 ; CHECK-NEXT: s_and_saveexec_b64 s[8:9], s[0:1]
 ; CHECK-NEXT: s_xor_b64 s[8:9], exec, s[8:9]
 ; CHECK-NEXT: ; mask branch BB0_4
 ; CHECK-NEXT: BB0_3: ; %endif2
 ; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
 ; CHECK-NEXT: v_add_u32_e32 v1, 1, v1
 ; CHECK-NEXT: s_xor_b64 s[6:7], exec, -1
 ; CHECK-NEXT: BB0_4: ; %Flow1
 ; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
 ; CHECK-NEXT: s_or_b64 exec, exec, s[8:9]
-; CHECK-NEXT: s_andn2_b64 s[2:3], s[2:3], exec
-; CHECK-NEXT: s_branch BB0_6
-; CHECK-NEXT: BB0_5: ; in Loop: Header=BB0_1 Depth=1
-; CHECK-NEXT: ; implicit-def: $vgpr1
-; CHECK-NEXT: BB0_6: ; %Flow
+; CHECK-NEXT: s_mov_b64 s[8:9], 0
+; CHECK-NEXT: BB0_5: ; %Flow
 ; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
-; CHECK-NEXT: s_and_b64 s[8:9], exec, s[6:7]
-; CHECK-NEXT: s_or_b64 s[8:9], s[8:9], s[4:5]
-; CHECK-NEXT: s_mov_b64 s[4:5], s[8:9]
-; CHECK-NEXT: s_andn2_b64 exec, exec, s[8:9]
+; CHECK-NEXT: s_and_b64 s[10:11], exec, s[6:7]
+; CHECK-NEXT: s_or_b64 s[10:11], s[10:11], s[4:5]
+; CHECK-NEXT: s_andn2_b64 s[2:3], s[2:3], exec
+; CHECK-NEXT: s_and_b64 s[4:5], s[8:9], exec
+; CHECK-NEXT: s_or_b64 s[2:3], s[2:3], s[4:5]
+; CHECK-NEXT: s_mov_b64 s[4:5], s[10:11]
+; CHECK-NEXT: s_andn2_b64 exec, exec, s[10:11]
 ; CHECK-NEXT: s_cbranch_execnz BB0_1
-; CHECK-NEXT: ; %bb.7: ; %Flow2
-; CHECK-NEXT: s_or_b64 exec, exec, s[8:9]
+; CHECK-NEXT: ; %bb.6: ; %Flow2
+; CHECK-NEXT: s_or_b64 exec, exec, s[10:11]
 ; CHECK-NEXT: v_mov_b32_e32 v1, 0
 ; this is the divergent branch with the condition not marked as divergent
 ; CHECK-NEXT: s_and_saveexec_b64 s[0:1], s[2:3]
-; CHECK-NEXT: ; mask branch BB0_9
-; CHECK-NEXT: BB0_8: ; %if1
+; CHECK-NEXT: ; mask branch BB0_8
+; CHECK-NEXT: BB0_7: ; %if1
 ; CHECK-NEXT: v_sqrt_f32_e32 v1, v0
-; CHECK-NEXT: BB0_9: ; %endloop
+; CHECK-NEXT: BB0_8: ; %endloop
 ; CHECK-NEXT: s_or_b64 exec, exec, s[0:1]
 ; CHECK-NEXT: exp mrt0 v1, v1, v1, v1 done vm
 ; CHECK-NEXT: s_endpgm
 start:
   %v0 = call float @llvm.amdgcn.interp.p1(float %1, i32 0, i32 0, i32 %0)
   br label %loop

 loop:
 ...

test/CodeGen/AMDGPU/multilevel-break.ll

 ...
 ; GCN-DAG: s_and_b64 [[TMP_NE:s\[[0-9]+:[0-9]+\]]], [[TMP51NEG]], exec
 ; GCN-DAG: s_or_b64 [[BREAK_OUTER]], [[BREAK_OUTER]], [[TMP_EQ]]
 ; GCN-DAG: s_or_b64 [[BREAK_INNER]], [[BREAK_INNER]], [[TMP_NE]]

 ; GCN: ; %Flow
 ; GCN: s_or_b64 exec, exec, [[SAVE_EXEC]]
 ; GCN: s_and_b64 [[TMP0:s\[[0-9]+:[0-9]+\]]], exec, [[BREAK_INNER]]
 ; GCN: s_or_b64 [[TMP0]], [[TMP0]], [[LEFT_INNER]]
+; GCN: s_andn2_b64 [[BREAK_OUTER2:s\[[0-9]+:[0-9]+\]]], [[BREAK_OUTER2]], exec
+; GCN: s_and_b64 [[LEFT_INNER]], [[BREAK_OUTER]], exec
+; GCN: s_or_b64 [[BREAK_OUTER2]], [[BREAK_OUTER2]], [[LEFT_INNER]]
 ; GCN: s_mov_b64 [[LEFT_INNER]], [[TMP0]]
 ; GCN: s_andn2_b64 exec, exec, [[TMP0]]
 ; GCN: s_cbranch_execnz [[INNER_LOOP]]

 ; GCN: ; %Flow2
 ; GCN: s_or_b64 exec, exec, [[TMP0]]
-; GCN: s_and_b64 [[TMP1:s\[[0-9]+:[0-9]+\]]], exec, [[BREAK_OUTER]]
+; GCN: s_and_b64 [[TMP1:s\[[0-9]+:[0-9]+\]]], exec, [[BREAK_OUTER2]]
 ; GCN: s_or_b64 [[TMP1]], [[TMP1]], [[LEFT_OUTER]]
 ; GCN: s_mov_b64 [[LEFT_OUTER]], [[TMP1]]
 ; GCN: s_andn2_b64 exec, exec, [[TMP1]]
 ; GCN: s_cbranch_execnz [[OUTER_LOOP]]

 ; GCN: ; %IF
 ; GCN-NEXT: s_endpgm
 define amdgpu_vs void @multi_else_break(<4 x float> %vec, i32 %ub, i32 %cont) {
 ...

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Uniform values being used outside loop marked non-divergent
Abandoned · Public
