This is an archive of the discontinued LLVM Phabricator instance.


Differential D136169

[AMDGPU] Avoid SCC clobbering before S_CSELECT_B32
ClosedPublic

Authored by alex-t on Oct 18 2022, 6:40 AM.


Details

Reviewers

rampitec
foad
arsenm

Commits

rG48ab3e75279a: [AMDGPU] Avoid SCC clobbering before S_CSELECT_B32

Summary

Frame lowering inserts a scalar addition to compute the offset to
stack objects. This instruction is inserted at an arbitrary place and may clobber
SCC between its definition and the S_CSELECT_B32 instruction that reads it. This change
works around this particular code pattern. It queries the scavenger for an SGPR and,
if one is available, saves SCC to it and restores its value after the frame lowering
code is inserted.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,040 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

alex-t created this revision.Oct 18 2022, 6:40 AM

Herald added a project: Restricted Project. Oct 18 2022, 6:40 AM

Herald added subscribers: kosarev, kerbowa, arphaman and 7 others.

alex-t requested review of this revision. Oct 18 2022, 6:40 AM

Herald added a project: Restricted Project. Oct 18 2022, 6:40 AM

Herald added a subscriber: wdng.

It queries the scavenger for SGPR and
if available saves SCC to it and restore its value after frame lowering code
insertion.

What if there is no free SGPR? I don't think it's acceptable to generate broken code in that case.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2245	Need braces and indentation for the body of this "if".

foad added inline comments.Oct 18 2022, 7:36 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
846	Weird indentation here. Can you run `git clang-format` on this change?
2246	Don't need the `!= AMDGPU::NoRegister` because Register converts to bool. Need braces around the multi-line BuildMI call.

Harbormaster completed remote builds in B192733: Diff 468522.Oct 18 2022, 7:47 AM

In D136169#3865217, @foad wrote:

It queries the scavenger for SGPR and
if available saves SCC to it and restore its value after frame lowering code
insertion.

What if there is no free SGPR? I don't think it's acceptable to generate broken code in that case.

I expected this question and deliberately left it for discussion. I think the proper behavior is to report an error and bail out.
None of the alternatives looks good to me.
Using V_ADD_* is unreliable because it requires scavenging a register again, which is also not guaranteed to succeed.
Moving the S_ADD_I32 insertion point may be unsolvable because of data dependencies.

arsenm added inline comments.Oct 18 2022, 10:11 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
843–846	This isn't harmful, but I also doubt it helps you. This is only used by LocalStackSlotAllocation and runs before dead flags should automatically be computed.
2246	So this just remains broken if the scavenge failed?
2247	Shouldn't special case s_cselect*
2375	These dead setting changes should be done separately

Implementing error reporting if no SGPR scavenged. Formatting.

alex-t marked 5 inline comments as done.Oct 18 2022, 10:59 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2247	I see, it looks like a dirty hack, and it really is one. Otherwise, we will have a huge number of SCC save/restore instructions throughout the code, most of them unnecessary. In fact, only s_cselect and the carry-out instructions really use SCC, but most SALU instructions read it. BTW, I am curious why we did not hit any errors related to SCC clobbering before the s_cselect patch.

alex-t marked an inline comment as done.Oct 18 2022, 11:02 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2246	For now, I would opt for reporting a fatal error here. We could try to use v_add on targets that allow v_add with an SGPR and a constant, provided the frame index user accepts a VGPR. This would help in some cases but not in general.

alex-t added inline comments.Oct 18 2022, 11:37 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2247	Oops, sorry, I was wrong. There is really no need to special-case s_cselect here. I checked the TD files, and we only have SCC uses where they are really necessary.

Harbormaster completed remote builds in B192796: Diff 468614.Oct 18 2022, 12:12 PM

SCC save/restore for all SCC users. Code which sets dead flag in SCC operands removed as it should be in separate change.

Herald added a subscriber: qcolombet. Oct 18 2022, 4:13 PM

alex-t marked 2 inline comments as done.Oct 18 2022, 4:15 PM

Harbormaster completed remote builds in B192873: Diff 468734.Oct 18 2022, 4:54 PM

foad added inline comments.Oct 19 2022, 2:12 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2231–2232	This still makes me nervous. For graphics we use LLVM as a JIT compiler, and we definitely have cases where SGPRs are spilled (so presumably there are no free SGPRs). What are we supposed to do if this scavenge fails?

alex-t added inline comments.Oct 19 2022, 5:25 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2231–2232	What were we supposed to do if the scavenge failed at line 2198 or line 2243? I have simply followed the error handling strategy already implemented for the same condition earlier in this file. I assumed that if it has worked so far, it should be okay going forward.

In general, no strategy exists that ensures we never run out of registers. We can try to use vector addition with an SGPR and a constant, but that is possible on a few targets only. Otherwise, we would need to scavenge a VGPR, but we have already failed to scavenge an SGPR, which means we were unable to spill one and probably have no VGPRs available. The trick of using FrameRegister as temporary room is already used by frame lowering itself; also, the SCC copy must be live from the point just before the S_ADD insertion to the point right after it.

Moving the insertion point before the SCC definition is restricted by the placement of the result register, as follows:

    (1) def R
    (2) def SCC
    (3) use R    <==== here we are about to insert R = S_ADD_I32 FR, Off
    (4) use SCC

We cannot move the insertion point over another use of R. In the general case, this is unsolvable.

My point is that a "failed to scavenge" recovery strategy is a complex problem that should be addressed separately. There is no general solution, as one can always invent an input on which the compiler runs out of registers. Please correct me if I am wrong.

alex-t marked an inline comment as done.Oct 19 2022, 6:21 AM

alex-t added inline comments.Oct 19 2022, 6:26 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2229	If we had failed to scavenge an SGPR for the SCC save/restore, we would never get here. But even if we get here, we may still fail to scavenge an SGPR for the frame offset computation.

A bit more compact code.

foad added inline comments.Oct 19 2022, 9:32 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	What is this loop for? Isn't the scavenger already supposed to know whether SCC is live?

foad added inline comments.Oct 19 2022, 9:38 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2263	None of this code should depend on wave size. You don't want getBoolRC here - that's for VCC which might be a register pair in wave64. You can always save SCC in a single 32-bit SGPR.

alex-t added inline comments.Oct 19 2022, 9:43 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	> What is this loop for? Isn't the scavenger already supposed to know whether SCC is live?

The scavenger's knowledge depends on accurate insertion of dead/kill flags. This loop decreases the number of false-positive "live" SCC states and, accordingly, the number of unnecessary SCC saves/restores. As Matt rightly pointed out, correct placement of dead/kill flags is outside the scope of this change. It is also going to be a complex task. Should I add a corresponding TODO here?

alex-t marked 2 inline comments as done.Oct 19 2022, 9:52 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2263	Since the copy back from the SGPR to SCC depends on EXEC/EXEC_LO, and I am using the same register for saving and restoring, I need a 64-bit register unless isWave32. I am not sure whether I can use S_AND_B32 reg, exec_lo in wave64 mode.

Harbormaster completed remote builds in B193025: Diff 468942.Oct 19 2022, 10:16 AM

foad added inline comments.Oct 19 2022, 11:43 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	Haven't dead/kill flags been automatically computed before this code runs? If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?
2263	You can just use "s_cmp_lg_u32 sgpr, 0" to copy back to SCC.
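
As an illustration of the save/restore pair discussed in this thread (S_CSELECT_B32 to copy SCC into an SGPR, S_CMP_LG_U32 against 0 to copy it back), here is a minimal C++ model. It is only a sketch: the struct and function names are hypothetical, only the SCC-related effects of the three instructions are modeled, and none of it is code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the scalar state involved (all names here are hypothetical).
struct ScalarState {
  bool scc;        // scalar condition code
  uint32_t saved;  // scavenged SGPR that holds the SCC copy
  uint32_t result; // destination of the frame-offset addition
};

// S_CSELECT_B32 saved, -1, 0: saved = SCC ? 0xFFFFFFFF : 0 (SCC is only read).
inline void saveSCC(ScalarState &s) { s.saved = s.scc ? 0xFFFFFFFFu : 0u; }

// S_ADD_I32 result, a, b: result = a + b, SCC = signed overflow.
inline void addI32(ScalarState &s, uint32_t a, uint32_t b) {
  uint32_t r = a + b;
  s.scc = ((~(a ^ b) & (a ^ r)) >> 31) != 0;
  s.result = r;
}

// S_CMP_LG_U32 saved, 0: SCC = (saved != 0). No dependence on EXEC or wave size.
inline void restoreSCC(ScalarState &s) { s.scc = (s.saved != 0); }

// Save SCC, perform the clobbering addition, then restore SCC.
inline ScalarState lowerWithSavedSCC(bool sccIn, uint32_t frameReg,
                                     uint32_t offset) {
  ScalarState s{sccIn, 0u, 0u};
  saveSCC(s);
  addI32(s, frameReg, offset);
  restoreSCC(s);
  return s;
}
```

In this model the addition is free to overwrite SCC, because the value that reaches later readers comes from the compare. The cost is the extra SGPR, which is exactly what the scavenger may fail to provide.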

alex-t marked an inline comment as done.Oct 19 2022, 1:10 PM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	Let me illustrate what I mean. For the code snippet below

    (1) $sgpr3 = S_ADD_I32 $sgpr33, 32, implicit-def $scc
    (2) $vgpr51 = V_ADD_U32_e64 killed $sgpr3, killed $sgpr43, 0, implicit $exec
    (3) $sgpr3 = S_ADD_I32 $sgpr33, 32, implicit-def $scc
    (4) $vgpr52 = V_ADD_U32_e64 killed $sgpr3, killed $sgpr44, 0, implicit $exec

the scavenger considers SCC "live" between (1) and (3) because its internal iterator position points to (2). Calling the "advance" method inside the target hook breaks the outer PEI logic: PEI does not expect the scavenger state to be changed inside the TargetRegisterInfo::LowerFrameIndex method. In this particular case, I can add the dead flag in place:

    auto Add = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), TmpSReg)
                   .addReg(FrameReg)
                   .addImm(Offset);
    Add->getOperand(3).setIsDead();

Unfortunately, we have plenty of places like that. And for cases that are not as trivial, we are going to end up with plenty of odd code.
2216–2224	> If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?

Good point, thanks.
2263	Given that this is a pure scalar context, yes.

alex-t added inline comments.Oct 19 2022, 2:13 PM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	> If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?

> Good point, thanks.

No, this does not really work. Given a BB with just one SCC def, following this idea we would insert

    Some SCC def...
    $sgpr38 = S_CSELECT_B32 -1, 0, implicit $scc
    $sgpr8 = S_ADD_I32 $sgpr32, 2960, implicit-def $scc
    S_CMP_LG_U32 $sgpr38, 0, implicit-def $scc

on each frame lowering S_ADD_I32, since each time we have an S_CMP_LG_U32 defining SCC and no other defs up to the end of the BB. So we would scavenge each time until we run out of registers.

alex-t added inline comments.Oct 19 2022, 2:21 PM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	> If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?

In fact, the loop aims to cut down the large number of false-positive SCC "live" slots by simulating the register scavenger's "forward" job. But if SCC is live-in to some BB, it is still in the scavenger's "usedregs".

SCC copy restore has been made wave size independent.

alex-t marked an inline comment as done.Oct 19 2022, 2:25 PM

Harbormaster completed remote builds in B193096: Diff 469045.Oct 19 2022, 3:12 PM

foad added inline comments.Oct 20 2022, 2:35 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	The loop still sounds like a hack to me that will have O(n^2) cost in the worst case. But I really don't know enough about PEI or the RegisterScavenger to review this properly.

alex-t marked an inline comment as done.Oct 20 2022, 3:16 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	> The loop still sounds like a hack to me that will have O(n^2) cost in the worst case. But I really don't know enough about PEI or the RegisterScavenger to review this properly.

I don't like this approach either and would like to look for a neater solution. This is proposed as a workaround to stop Vulkan RT from failing. Could you also clarify why you consider it O(n^2)? The worst case is when we have to walk to the end of the BB. This is just a one-level loop, and each instruction is visited once.

foad added inline comments.Oct 20 2022, 6:20 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	In a large BB with n instructions that need frame index elimination, you will walk to end of the BB (worst case) for each one of them. So that is O(n^2) overall cost to run this pass.
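
To make the quadratic bound concrete, the following sketch counts instruction visits in the worst case described above: each of the first `numFI` instructions of a block triggers a scan that runs to the end of the block. The function and parameter names are hypothetical and the model ignores everything except the visit count.

```cpp
#include <cassert>
#include <cstddef>

// Counts instruction visits when each of the first `numFI` instructions in a
// block of `blockSize` instructions scans forward to the end of the block.
inline std::size_t worstCaseVisits(std::size_t blockSize, std::size_t numFI) {
  std::size_t visits = 0;
  for (std::size_t i = 0; i < numFI && i < blockSize; ++i)
    visits += blockSize - i; // scan from instruction i to the block end
  return visits;
}
```

With `numFI == blockSize == n` the total is n(n+1)/2, so a single scan is linear while the pass as a whole is quadratic.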

alex-t marked an inline comment as done.Oct 20 2022, 9:49 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	Ok, you meant the whole pass, not the loop itself. I got it now.

alex-t marked an inline comment as done.Oct 20 2022, 10:17 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2216–2224	The PEI main loop processes instructions one by one, inserting the frame index lowering code before an instruction if it has a frame index operand. The register scavenger steps forward AFTER the current instruction has been processed. That means we have the register liveness information at the point right before the instruction being processed. If the preceding instruction uses SCC, the scavenger considers it "live" even though the current instruction kills it. Without the loop, we would have a lot of unnecessary SCC saves/restores and would soon run out of registers. Consider the following example:

    renamable $sgpr50 = S_CSELECT_B32 2924, killed renamable $sgpr50, implicit $scc    <-- RS is here and RS->isRegUsed(AMDGPU::SCC) is true
    renamable $sgpr40 = S_ADD_I32 killed renamable $sgpr40, %stack.0.__llpc_global_proxy_, implicit-def dead $scc

We do not need to save SCC here, but we cannot know that SCC will be dead at the next instruction without scanning forward. Typically we break from the loop at the first def or use of SCC. The worst case is when the instruction that uses the frame index is at the beginning of a large BB, this instruction also uses SCC, and no def/use of SCC appears before the end of the BB. Yes, in this case we scan the whole BB in the loop, and the whole pass is O(n^2).

In fact, to avoid this unwanted loop I can use another hack: advance the scavenger one position to look up the SCC liveness at the insertion slot, then immediately step it back to restore its position and avoid breaking the PEI logic.
This looks weird but avoids the loop.

Also, this approach still allows some false positives and, hence, some unnecessary SCC saves/restores. Given that we may already have high SGPR pressure, each false positive spends one more SGPR and increases the probability of running out of registers.

Why can't this approach be improved?
We cannot make a reasonable decision regarding SCC save/restore without scanning down to the nearest use or definition. The register scavenger retrieves liveness information by going forward one instruction at a time. So, at each point, it only knows that the register was defined somewhere before and has not yet been killed by another definition. It may become dead at the next instruction or stay live for an arbitrarily long distance.
To make a reasonable decision regarding SCC, we need to know whether SCC is "live" at the insertion point and whether it is used at some point before the next definition. Otherwise, we have to conservatively save/restore SCC whenever it is "live" at the point where we are about to insert a scalar instruction that clobbers it. In all cases where SCC is redefined before the use, such a save/restore is unnecessary but spends an SGPR.

To illustrate the above:

def SCC
some other instructions
use of the FI <-- We are going to insert "S_ADD_I32 FrameReg, Offset" right before it
                             SCC is "live" here
        ************
here is an arbitrarily long sequence of instructions
that neither use nor define SCC
        ************
Here is an instruction "I"  that defines or uses SCC

if "I" defines SCC (kills the previous definition) - no SCC save/restore is necessary at the insertion point.
if "I" uses SCC - it is necessary to save it before S_ADD_I32 insertion and restore it after.

We have to scan down until either "I" is found or the basic block ends.
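
The decision rule above can be sketched in a few lines of C++ over a toy instruction list. This is an illustration only: `ToyInstr` and `needPreserveSCC` are hypothetical names, the scan starts at the frame index user itself, and reaching the end of the block is treated as "no preserve needed", which is an assumption that the thread debated rather than settled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction; only SCC effects matter.
struct ToyInstr {
  bool readsSCC;
  bool definesSCC;
};

// Returns true if SCC must be preserved across code inserted right before
// block[insertIdx]. Assumes SCC is already known to be live at that point.
inline bool needPreserveSCC(const std::vector<ToyInstr> &block,
                            std::size_t insertIdx) {
  for (std::size_t i = insertIdx; i < block.size(); ++i) {
    if (block[i].readsSCC)
      return true;  // "I" uses SCC: the old value must survive
    if (block[i].definesSCC)
      return false; // "I" redefines SCC: the old value is dead
  }
  return false; // assumption: SCC is not live out of the block
}
```

Checking the read before the definition matters for instructions that do both, such as an add-with-carry: they still consume the old value.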

Am I correct to assume that the frame offset is always DWORD aligned? If so, we have a couple of spare bits:

S_ADDC_U32 $reg, offset
S_BITCMP1_B32 $reg, 0
S_BITSET0_B32 $reg, 0
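
A small C++ model may help show why this sequence preserves SCC without a spare register. It is a sketch under the stated assumption that both the register value and the offset are even, so bit 0 of the sum is free to carry the old SCC. The helper names are hypothetical, and only the arithmetic and SCC effects of the three instructions are modeled.

```cpp
#include <cassert>
#include <cstdint>

struct SALU { bool scc; }; // toy scalar ALU state (hypothetical model)

// S_ADDC_U32: dst = a + b + SCC; SCC = unsigned carry out.
inline uint32_t sAddcU32(SALU &s, uint32_t a, uint32_t b) {
  uint64_t wide = uint64_t(a) + uint64_t(b) + (s.scc ? 1u : 0u);
  s.scc = (wide >> 32) != 0;
  return uint32_t(wide);
}

// S_BITCMP1_B32: SCC = (bit `bit` of a is 1).
inline void sBitcmp1B32(SALU &s, uint32_t a, unsigned bit) {
  s.scc = ((a >> (bit & 31)) & 1u) != 0;
}

// S_BITSET0_B32: clear bit `bit` of dst.
inline uint32_t sBitset0B32(uint32_t dst, unsigned bit) {
  return dst & ~(1u << (bit & 31));
}

struct Lowered { uint32_t value; bool scc; };

// Adds `offset` to `reg` while preserving SCC. Both inputs must be even.
inline Lowered addOffsetPreservingSCC(uint32_t reg, uint32_t offset,
                                      bool sccIn) {
  SALU s{sccIn};
  uint32_t r = sAddcU32(s, reg, offset); // bit 0 of r now holds the old SCC
  sBitcmp1B32(s, r, 0);                  // SCC = old SCC again
  r = sBitset0B32(r, 0);                 // r = reg + offset
  return Lowered{r, s.scc};
}
```

Note that the carry out of the addition does overwrite SCC for a moment, but the bit test replaces it with the original value before anything else can read it.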

For the loop, which may potentially consume a lot of time, I'd suggest setting a small scan threshold and assuming scc is used otherwise. This relies on the possibility of restoring scc without scavenging, as in my previous comment.

Applied approach suggested by @rampitec

In D136169#3872475, @rampitec wrote:
Am I correct to assume that frame offset is always DWORD aligned? If so we have couple spare bits:
S_ADDC_U32 $reg, offset
S_BITCMP1_B32 $reg, 0
S_BITSET0_B32 $reg, 0

That's brilliant! Thanks a lot.

In D136169#3875712, @alex-t wrote:
In D136169#3872475, @rampitec wrote:
Am I correct to assume that frame offset is always DWORD aligned? If so we have couple spare bits:
S_ADDC_U32 $reg, offset
S_BITCMP1_B32 $reg, 0
S_BITSET0_B32 $reg, 0
That's brilliant! Thanks a lot.

Enjoy :)

Harbormaster completed remote builds in B193613: Diff 469731.Oct 21 2022, 2:06 PM

rampitec added inline comments.Oct 22 2022, 12:05 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2219	I think you need to skip debug instructions without counting so that debug build does not differ from release.
llvm/lib/Target/AMDGPU/SIRegisterInfo.h
96 ↗	(On Diff #469731)	100 seems too large a number to me. Compiling with spills is already sufficiently and painfully slow. I would not go above 10 here. In the worst case you are now burning just 2 instructions and probably 8 cycles. Besides, I do not think it deserves a special function; a constant threshold right inside the code is enough, because it is only used once.
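
The two suggestions above (a small scan threshold, and debug instructions that do not count toward it) can be combined as in the following sketch. It is not the patch's code: `ToyInstr`, `needPreserveSCCBounded`, and the `threshold` parameter are hypothetical, and treating the end of the block as "no preserve needed" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct ToyInstr {
  bool readsSCC;
  bool definesSCC;
  bool isDebug; // e.g. a DBG_VALUE-like instruction
};

// Bounded variant of the liveness scan. Debug instructions are skipped
// without counting, so builds with and without debug info decide the same way.
inline bool needPreserveSCCBounded(const std::vector<ToyInstr> &block,
                                   std::size_t insertIdx,
                                   std::size_t threshold) {
  std::size_t scanned = 0;
  for (std::size_t i = insertIdx; i < block.size(); ++i) {
    const ToyInstr &I = block[i];
    if (I.isDebug)
      continue;
    if (scanned++ == threshold)
      return true; // gave up scanning: conservatively assume SCC is used
    if (I.readsSCC)
      return true;
    if (I.definesSCC)
      return false;
  }
  return false; // assumption: SCC is not live out of the block
}
```

A small threshold trades a few redundant preserve sequences for a bounded scan, which is acceptable here only because the preserve sequence needs no extra register.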

rampitec added inline comments.Oct 22 2022, 12:19 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2213	"Save" is not the right word anymore. "Preserve" is.

foad added inline comments.Oct 24 2022, 3:57 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2258	`-Offset` here surely? Why aren't there any codegen tests for this?
2384–2385	Remove these changes from the patch.

Corrected wrong Frame Register restore code.
Several minor changes as requested.

alex-t marked 2 inline comments as done.Oct 24 2022, 9:20 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2384–2385	This is obviously a mistake. "-Offset" cannot work here. Given that we always have Reg and Offset aligned, hence even, we should just use S_SUBB_U32 here.
llvm/lib/Target/AMDGPU/SIRegisterInfo.h
96 ↗	(On Diff #469731)	> 100 seems too large a number to me. Compiling with spills is already sufficiently and painfully slow. I would not go above 10 here. In the worst case you are now burning just 2 instructions and probably 8 cycles. Besides, I do not think it deserves a special function; a constant threshold right inside the code is enough, because it is only used once.

I would not agree. There may really be tens of the odd SCC-preserving code sequences in a 100-line BB if it has frequent stack accesses. So this is not just 2 instructions but 2*N. I can show you an example of the assembly if necessary.

Harbormaster completed remote builds in B193960: Diff 470182.Oct 24 2022, 11:42 AM

alex-t marked an inline comment as done.Oct 24 2022, 12:52 PM

alex-t marked 2 inline comments as done.Oct 24 2022, 2:26 PM

Temporary workaround to avoid SCC liveness scanning loop and using manually set
"dead" flags to decrease the amount of unnecessary SCC preserving code.

Harbormaster completed remote builds in B196506: Diff 473701.Nov 7 2022, 11:11 AM

sebastian-ne added a subscriber: sebastian-ne.Nov 17 2022, 4:21 AM

sebastian-ne added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2247	Maybe it makes sense to add an assert, that the least significant bit of `Offset` is indeed 0? (here and for s_subb below)

foad added inline comments.Nov 17 2022, 6:58 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2258	The bitcmp1/bitset0 trick only works with addc. It doesn't work with subb, because if scc is set then the result will be 1 less than you wanted, so bitset0 will not correct it.
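
The difference can be checked with a short C++ model of the restore step. This is a sketch with hypothetical helper names; `reg` stands for the frame register after the offset was added, and both it and `offset` are assumed even.

```cpp
#include <cassert>
#include <cstdint>

// S_ADDC_U32: dst = a + b + SCC (the carry out is ignored here because the
// following S_BITCMP1_B32 overwrites SCC anyway).
inline uint32_t addc(uint32_t a, uint32_t b, bool scc) {
  return a + b + (scc ? 1u : 0u);
}

// S_SUBB_U32: dst = a - b - SCC.
inline uint32_t subb(uint32_t a, uint32_t b, bool scc) {
  return a - b - (scc ? 1u : 0u);
}

struct Restored { uint32_t value; bool scc; };

// Common tail: S_BITCMP1_B32 r, 0 followed by S_BITSET0_B32 r, 0.
inline Restored finish(uint32_t r) {
  return Restored{r & ~1u, (r & 1u) != 0};
}

// Undo the offset with an add-with-carry of the negated offset.
inline Restored restoreWithAddc(uint32_t reg, uint32_t offset, bool scc) {
  return finish(addc(reg, 0u - offset, scc));
}

// Undo the offset with a subtract-with-borrow.
inline Restored restoreWithSubb(uint32_t reg, uint32_t offset, bool scc) {
  return finish(subb(reg, offset, scc));
}
```

With SCC set, the borrow variant produces `reg - offset - 1`, which is odd, so clearing bit 0 lands on `reg - offset - 2`. The carry variant produces `reg - offset + 1`, and clearing bit 0 recovers the intended value.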

Since https://reviews.llvm.org/D137574 has landed, this review has been updated to use backward PEI.

Added a test for using and restoring the frame register for offset calculation.
S_SUBB_U32 changed to S_ADDC_U32 with "-Offset".

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2247	As far as I know, the flat scratch offset is always dword-aligned. If it is not we've got the UB anyway. So, the assert does not help.
2258	Oops. Good point. BTW, S_ADDC_U32 with "-Offset" may itself cause an overflow, so we would overwrite SCC again. Although Offset is a uint64_t, I guess the flat scratch offset value is always small enough that we never get into this trouble, simply because we subtract the same value that we added in the previous step. And yes, we have not had a problem with this so far because we had no tests for that case. I am going to add one.

foad added inline comments.Nov 21 2022, 10:53 PM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1637	Do you still need all these setIsDead changes? In any case can you remove them from this patch, so it just contains the essential changes? Maybe put them all in a separate patch, if there is still any need for them.

Harbormaster completed remote builds in B198864: Diff 477007.Nov 21 2022, 11:03 PM

removed unrelated changes

alex-t marked an inline comment as done.Nov 22 2022, 4:42 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1637	No. I don't need them anymore. Just my inaccuracy. Removed.

alex-t marked an inline comment as done.Nov 22 2022, 5:11 AM

There are still setIsDead changes and whitespace changes. Please try to strip them all out so we get a minimal patch to review.

Harbormaster completed remote builds in B198950: Diff 477139.Nov 22 2022, 6:10 AM

sebastian-ne added inline comments.Nov 22 2022, 6:56 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2247	Sure, an assert doesn’t help correctness. But if at some point in the future, we encounter a case where the offset is not aligned – maybe due to a bug elsewhere – having an assert makes it easy to find why some memory got corrupted. Without an assert, it will surely take hours to find out what is going on.

In D136169#3943669, @foad wrote:

There are still setIsDead changes and whitespace changes. Please try to strip them all out so we get a minimal patch to review.

Ok. I thought you wanted me to remove all the setIsDead calls since they are not needed anymore.

removed changes not relevant directly to the current revision
added assert to check if the flat scratch offset is aligned

LGTM, thanks!

In D136169#3944366, @alex-t wrote:

In D136169#3943669, @foad wrote:

There are still setIsDead changes and whitespace changes. Please try to strip them all out so we get a minimal patch to review.

Ok. I thought you wanted me to remove all the setIsDead calls since they are not needed anymore.

That would probably be fine, but it should be a separate patch.

This revision is now accepted and ready to land.Nov 22 2022, 10:00 AM

alex-t marked an inline comment as done.Nov 22 2022, 10:06 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2247	The assert that is currently added only ensures that the offset is even. This is enough to make sure the arithmetic here is correct. To ensure correct alignment, though, it should be !(Offset & flat_scratch_min_align). I am not sure if it makes sense, but maybe there should be another error message?
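
For comparison, the two checks being discussed can be written as below. This is a sketch with hypothetical helper names. The alignment value passed in is an assumption (4 bytes for DWORD alignment), and a power-of-two alignment test is usually written with `align - 1` as the mask.

```cpp
#include <cassert>
#include <cstdint>

// What the bit-0 trick needs: the least significant bit must be free.
inline bool isEven(int64_t offset) { return (offset & 1) == 0; }

// A stricter check for a power-of-two alignment such as 4 (DWORD).
inline bool isAlignedTo(int64_t offset, int64_t align) {
  return (offset & (align - 1)) == 0;
}
```

An offset of 2 passes the first check but not the second with `align == 4`, which is the gap the comment above points out.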

This revision was landed with ongoing or failed builds.Nov 22 2022, 10:08 AM

Closed by commit rG48ab3e75279a: [AMDGPU] Avoid SCC clobbering before S_CSELECT_B32 (authored by Alexander Timofeev <alexander.timofeev@amd.com>).

This revision was automatically updated to reflect the committed changes.

Alexander Timofeev <alexander.timofeev@amd.com> added a commit: rG48ab3e75279a: [AMDGPU] Avoid SCC clobbering before S_CSELECT_B32.

arsenm added inline comments.Nov 22 2022, 10:10 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2231–2232	I think we just need to start reserving an SGPR ahead of time in case we run into these scenarios, and un-reserve it after allocation if unused

arsenm added inline comments.Nov 22 2022, 10:12 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2246	Needs a comment explaining this trick

Harbormaster completed remote builds in B199012: Diff 477236.Nov 22 2022, 10:24 AM

alex-t marked 3 inline comments as done.Nov 22 2022, 10:47 AM

alex-t added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2231–2232	I guess it is for a separate change already.
2246	Okay, too late but could I just push yet another commit just adding the comment?

alex-t marked 3 inline comments as done.Nov 22 2022, 12:46 PM

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp	62 lines
llvm/test/CodeGen/AMDGPU/vgpr-spill-scc-clobber.mir	12 lines

Diff 468942

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

[833 lines hidden; context: Register FIReg = MRI.createVirtualRegister(]
       ST.enableFlatScratch() ? &AMDGPU::SReg_32_XM0RegClass
                              : &AMDGPU::VGPR_32RegClass);

   BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)
       .addImm(Offset);
   BuildMI(*MBB, Ins, DL, TII->get(MovOpc), FIReg)
       .addFrameIndex(FrameIdx);

-  if (ST.enableFlatScratch() ) {
+  if (ST.enableFlatScratch()) {
     BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_ADD_I32), BaseReg)
         .addReg(OffsetReg, RegState::Kill)
         .addReg(FIReg);
     return BaseReg;
   }

   TII->getAddNoCarry(*MBB, Ins, DL, BaseReg)
       .addReg(OffsetReg, RegState::Kill)
       .addReg(FIReg)
       .addImm(0); // clamp bit

   return BaseReg;
[774 lines hidden; context: void SIRegisterInfo::buildSpillLoadStore(]

   if (ScratchOffsetRegDelta != 0) {
     // Subtract the offset we added to the ScratchOffset register.
     BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), SOffset)
         .addReg(SOffset)
         .addImm(-ScratchOffsetRegDelta);
   }
 }

 void SIRegisterInfo::buildVGPRSpillLoadStore(SGPRSpillBuilder &SB, int Index,
                                              int Offset, bool IsLoad,
                                              bool IsKill) const {
   // Load/store VGPR
   MachineFrameInfo &FrameInfo = SB.MF.getFrameInfo();
   assert(FrameInfo.getStackID(Index) != TargetStackID::SGPRSpill);

   Register FrameReg =
[559 lines hidden; context: default: {]
       if (FrameReg)
         MIB.addReg(FrameReg);
       else
         MIB.addImm(Offset);

       return;
     }

+    bool NeedSaveSCC = false;
+    MachineBasicBlock::iterator I(MI);
+    if (RS->isRegUsed(AMDGPU::SCC)) {
+      while (I != MBB->end()) {
+        if (*I != MI && I->readsRegister(AMDGPU::SCC)) {
+          NeedSaveSCC = true;
+          break;
+        }
+        if (*I != MI && I->definesRegister(AMDGPU::SCC))
+          break;
+        I++;
+      }
		foadUnsubmitted Done Reply Inline Actions What is this loop for? Isn't the scavenger already supposed to know whether SCC is live? foad: What is this loop for? Isn't the scavenger already supposed to know whether SCC is live?
		alex-tAuthorUnsubmitted Done Reply Inline Actions What is this loop for? Isn't the scavenger already supposed to know whether SCC is live? The scavenger's knowledge depends on the accurate dead/kill flags insertion. This loop decreases the number of false-positive "live" SCC and accordingly unnecessary SCC saves/restores. As Matt rightly pointed out, the dead/kill correct placement is out of this change scope. It also is going to be a complex task. Should I add a corresponding TODO here? alex-t: > What is this loop for? Isn't the scavenger already supposed to know whether SCC is live? The…
		foadUnsubmitted Done Reply Inline Actions Haven't dead/kill flags been automatically computed before this code runs? If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block? foad: Haven't dead/kill flags been automatically computed before this code runs? If you do need this…
		alex-tAuthorUnsubmitted Done Reply Inline Actions Let me illustrate what I mean. For the code snipped below (1) $sgpr3 = S_ADD_I32 $sgpr33, 32, implicit-def $scc (2) $vgpr51 = V_ADD_U32_e64 killed $sgpr3, killed $sgpr43, 0, implicit $exec (3) $sgpr3 = S_ADD_I32 $sgpr33, 32, implicit-def $scc (4) $vgpr52 = V_ADD_U32_e64 killed $sgpr3, killed $sgpr44, 0, implicit $exec scavenger considers SCC "live" between (1) and (3) because its internal iterator position points to (2). Calling the "advance" method inside the target hook breaks the outer PEI logic. Literally, PEI does not expect the scavenger state to be changed inside the TargetRegisterInfo::LowerFrameIndex method. In this particular case, I can add the dead flag to the particular place: auto Add = BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), TmpSReg) .addReg(FrameReg) .addImm(Offset); Add->getOperand(3).setIsDead(); Unfortunately, we have plenty of places like that. And for cases that are not as trivial, we are going to have plenty of odd code. alex-t:* Let me illustrate what I mean. For the code snipped below ``` (1) $sgpr3 = S_ADD_I32 $sgpr33…
		alex-t (author):
		> If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?
		Good point, thanks.
		alex-t (author):
		> > If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?
		> Good point, thanks.
		No, this does not really work. Given a BB with just one SCC def, following that idea we would insert
		    Some SCC def...
		    $sgpr38 = S_CSELECT_B32 -1, 0, implicit $scc
		    $sgpr8 = S_ADD_I32 $sgpr32, 2960, implicit-def $scc
		    S_CMP_LG_U32 $sgpr38, 0, implicit-def $scc
		on each frame lowering S_ADD_I32, since each time we have S_CMP_LG_U32 defining SCC and no other defs up to the end of the BB. So we will scavenge each time until we are out of registers.
		alex-t (author):
		> If you do need this loop, shouldn't it conservatively assume that SCC is live at the end of the basic block?
		In fact, the loop aims to cut down the large number of false-positive SCC "live" slots by simulating the register scavenger's "forward" job. But if SCC is live-in to some BB, it is still in the scavenger's "usedregs".
		foad: The loop still sounds like a hack to me that will have O(n^2) cost in the worst case. But I really don't know enough about PEI or the RegisterScavenger to review this properly.
		alex-t (author):
		> The loop still sounds like a hack to me that will have O(n^2) cost in the worst case. But I really don't know enough about PEI or the RegisterScavenger to review this properly.
		I don't like this approach either and would like to look for a neat solution. This is proposed as a workaround to stop Vulkan RT from failing. Could you also clarify why you consider it O(n^2)? The worst case is when we have to walk to the end of the BB. This is just a one-level loop and each instruction is visited once.
		foad: In a large BB with n instructions that need frame index elimination, you will walk to the end of the BB (worst case) for each one of them. So that is O(n^2) overall cost to run this pass.
		alex-t (author): Ok, you meant the whole pass, not the loop itself. I got it now.
		alex-t (author): The PEI main loop processes instructions one by one, inserting the frame index lowering code before an instruction if it has a frame index operand. The register scavenger steps forward AFTER the current instruction has been processed. That means we have the register liveness information at the point right before the instruction being processed. If the preceding instruction uses SCC, the scavenger considers it "live" even though the current instruction kills it. Without the loop, we'd have a lot of unnecessary SCC saves/restores and would soon run out of registers. Consider the following example:
		    renamable $sgpr50 = S_CSELECT_B32 2924, killed renamable $sgpr50, implicit $scc   <-- RS is here and RS->isRegUsed(AMDGPU::SCC) is true
		    renamable $sgpr40 = S_ADD_I32 killed renamable $sgpr40, %stack.0.__llpc_global_proxy_, implicit-def dead $scc
		We do not need to save SCC here, but we cannot know that SCC will be dead at the next instruction without scanning forward. Typically we break out of the loop at the first def or use of SCC. The worst case is when the instruction that uses the frame index is at the beginning of a large BB, this instruction also uses SCC, and no def/use of SCC appears before the end of the BB. Yes, in this case we scan the whole BB in the loop and the whole pass is O(n^2).
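The forward scan discussed in this thread can be illustrated with a small standalone model. This is only a sketch: `ToyInst`, `isSCCLiveAt`, and the `Steps` counter are hypothetical names, not part of the patch or of LLVM, and each instruction is reduced to whether it reads or writes SCC. The `LiveOut` parameter stands for whatever assumption is made when the end of the block is reached, which is the point the reviewers disagree on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a machine instruction: only its effect on SCC is modeled.
struct ToyInst {
  bool ReadsSCC;
  bool WritesSCC;
};

// Returns true if the SCC value that is current just before Block[Start]
// may still be read at or after Block[Start].
// The scan stops at the first instruction that touches SCC. Reads are
// checked first because an instruction may both read and write SCC.
// If nothing touches SCC before the end of the block, LiveOut decides.
// Steps accumulates the number of instructions visited.
bool isSCCLiveAt(const std::vector<ToyInst> &Block, std::size_t Start,
                 bool LiveOut, std::size_t &Steps) {
  for (std::size_t I = Start; I < Block.size(); ++I) {
    ++Steps;
    if (Block[I].ReadsSCC)
      return true;  // the old value is consumed: it must be preserved
    if (Block[I].WritesSCC)
      return false; // the old value is overwritten before any read
  }
  return LiveOut;
}
```

Calling `isSCCLiveAt` once per instruction on a block of n instructions that never touch SCC visits n + (n - 1) + ... + 1 instructions in total, which is the quadratic worst case foad describes for the pass as a whole.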
		}

Register TmpSReg =		Register TmpSReg =
UseSGPR ? TmpReg		UseSGPR ? TmpReg
: RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0,		: RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0,
		alex-t (author): If we've failed to scavenge an SGPR for the SCC save/restore, we never get here. But even if we do get here, we may still fail to scavenge an SGPR for the frame offset computation.
!UseSGPR);		!UseSGPR);

// TODO: for flat scratch another attempt can be made with a VGPR index		// TODO: for flat scratch another attempt can be made with a VGPR index
		foad: This still makes me nervous. For graphics we use LLVM as a JIT compiler, and we definitely have cases where SGPRs are spilled (so presumably there are no free SGPRs). What are we supposed to do if this scavenge fails?
		alex-t (author): What were we supposed to do if the scavenge failed at line 2198 and line 2243? In fact, I have just followed the error handling strategy implemented for the same condition earlier in this file. I assumed that if it has worked so far, it should be okay going forward. In general, no strategy exists that ensures we never run out of registers. We can try to use a vector addition with an SGPR and a constant, but that is possible on a few targets only. Otherwise, we'd need to scavenge a VGPR, but we have already failed to scavenge an SGPR, which means that we were unable to spill one and probably have no VGPRs available. The trick of using the frame register as temporary storage is already used by frame lowering itself. Also, the SCC copy has to stay live from the point before the S_ADD insertion until the point right after it. Moving the insertion point before the SCC definition is restricted by the placement of the result register, as follows:
		    (1) def R
		    (2) def SCC
		    (3) use R   <==== here we're about to insert R = S_ADD_I32 FR, Off
		    (4) use SCC
		We cannot move the insertion point over another use of R. In the general case, this is unsolvable. My point is that a "failed to scavenge" recovery strategy is a complex problem that should be addressed separately. There is no general solution, as one can always invent an input on which the compiler runs out of registers. Please correct me if I am wrong.
		arsenm: I think we just need to start reserving an SGPR ahead of time in case we run into these scenarios, and un-reserve it after allocation if unused.
		alex-t (author): I guess that belongs in a separate change.
// if no SGPRs can be scavenged.		// if no SGPRs can be scavenged.
if ((!TmpSReg && !FrameReg) \|\| (!TmpReg && !UseSGPR))		if ((!TmpSReg && !FrameReg) \|\| (!TmpReg && !UseSGPR))
report_fatal_error("Cannot scavenge register in FI elimination!");		report_fatal_error("Cannot scavenge register in FI elimination!");

if (!TmpSReg) {		if (!TmpSReg) {
// Use frame register and restore it after.		// Use frame register and restore it after.
TmpSReg = FrameReg;		TmpSReg = FrameReg;
FIOp.setReg(FrameReg);		FIOp.setReg(FrameReg);
FIOp.setIsKill(false);		FIOp.setIsKill(false);
}		}

BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), TmpSReg)		I = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), TmpSReg)
.addReg(FrameReg)		.addReg(FrameReg)
		foad: Need braces and indentation for the body of this "if".
.addImm(Offset);		.addImm(Offset);
		foad: Don't need the `!= AMDGPU::NoRegister` because Register converts to bool. Need braces around the multi-line BuildMI call.
		arsenm: So this just remains broken if the scavenge failed?
		alex-t (author): For now, I would opt for reporting a fatal error here. We could try to use v_add on those targets that allow v_add with an SGPR and a constant, if the frame index user accepts a VGPR. This will give us a chance in some cases, but not in general.
		arsenm: Needs a comment explaining this trick.
		alex-t (author): Okay, it is too late now, but could I just push another commit that only adds the comment?
		MachineBasicBlock::iterator P = I;
		arsenm: Shouldn't special case s_cselect*
		alex-t (author): I see that it looks like a dirty hack, and it really is one. Otherwise, we would have a huge number of SCC save/restore instructions throughout the code, most of them unnecessary. In fact, only s_cselect and carry-out instructions really use SCC, but most of the SALU instructions read it. BTW, I am curious why we did not hit any errors related to SCC clobbering before the s_cselect patch.
		alex-t (author): Oops. Sorry, I was wrong. There is really no need to special-case s_cselect here. I checked the TD files and we only have SCC uses where they are really necessary.
		sebastian-ne: Maybe it makes sense to add an assert that the least significant bit of `Offset` is indeed 0? (here and for s_subb below)
		alex-t (author): As far as I know, the flat scratch offset is always dword-aligned. If it is not, we have UB anyway. So the assert does not help.
		sebastian-ne: Sure, an assert doesn't help correctness. But if at some point in the future we encounter a case where the offset is not aligned (maybe due to a bug elsewhere), having an assert makes it easy to find out why some memory got corrupted. Without an assert, it will surely take hours to find out what is going on.
		alex-t (author): The assert that is currently added only ensures that the offset is even. This is enough to make sure the arithmetic here is correct. To ensure correct alignment, though, it should be !(Offset & flat_scratch_min_align). I am not sure if it makes sense, but maybe there should be another error message?
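The two checks mentioned in this thread differ as follows. This is a minimal sketch with hypothetical helper names (`isOffsetEven`, `isOffsetAligned`), not code from the patch. It assumes the alignment is a power of two and uses the usual `Align - 1` mask, whereas the expression quoted in the comment above would test only a single bit if `flat_scratch_min_align` held the alignment value itself.

```cpp
#include <cassert>
#include <cstdint>

// What the assert in the patch is described as checking: bit 0 is clear.
constexpr bool isOffsetEven(uint64_t Offset) { return (Offset & 1) == 0; }

// A full alignment check. Align is assumed to be a power of two, so
// Align - 1 is a mask covering every bit below the alignment boundary.
constexpr bool isOffsetAligned(uint64_t Offset, uint64_t Align) {
  return (Offset & (Align - 1)) == 0;
}
```

For example, an offset of 2 passes the evenness check but is not dword-aligned, while the offset 8200 used in the tests passes both.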

if (!UseSGPR)		if (!UseSGPR)
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpReg)		I = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpReg)
.addReg(TmpSReg, RegState::Kill);		.addReg(TmpSReg, RegState::Kill);

if (TmpSReg == FrameReg) {		if (TmpSReg == FrameReg) {
// Undo frame register modification.		// Undo frame register modification.
BuildMI(*MBB, std::next(MI), DL, TII->get(AMDGPU::S_ADD_I32),		I = BuildMI(*MBB, std::next(MI), DL, TII->get(AMDGPU::S_ADD_I32),
FrameReg)		FrameReg)
.addReg(FrameReg)		.addReg(FrameReg)
.addImm(-Offset);		.addImm(-Offset);
		foad: `-Offset` here surely? Why aren't there any codegen tests for this?
		foad: The bitcmp1/bitset0 trick only works with addc. It doesn't work with subb, because if scc is set then the result will be 1 less than you wanted, so bitset0 will not correct it.
		alex-t (author): Oops. Good point. BTW, S_ADDC_U32 with "-Offset" may itself cause an overflow, so we would override SCC again. Although Offset is a uint64_t, I guess that the flat scratch offset value is always small enough to never get into this trouble, simply because we subtract the same value that we added in the previous step. And yes, we have had no problem with this so far because we had no tests for that case. I am going to add one.
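foad's point about addc versus subb can be checked with a small arithmetic model. The sketch below is not the patch's code: the function names are hypothetical, and the four scalar instructions are modeled only by their documented arithmetic (S_ADDC_U32 adds SCC as carry-in, S_SUBB_U32 subtracts SCC as borrow-in, S_BITCMP1_B32 copies the selected bit into SCC, S_BITSET0_B32 clears the selected bit). It assumes both the frame register and the offset are even.

```cpp
#include <cassert>
#include <cstdint>

struct ScalarState {
  uint32_t Reg;
  bool SCC;
};

// S_ADDC_U32 model: D = S0 + S1 + SCC, SCC = unsigned carry out.
ScalarState sAddcU32(uint32_t S0, uint32_t S1, bool SCC) {
  uint64_t Wide = uint64_t(S0) + S1 + (SCC ? 1 : 0);
  return {uint32_t(Wide), (Wide >> 32) != 0};
}

// S_SUBB_U32 model: D = S0 - S1 - SCC, SCC = unsigned borrow out.
ScalarState sSubbU32(uint32_t S0, uint32_t S1, bool SCC) {
  uint64_t Sub = uint64_t(S1) + (SCC ? 1 : 0);
  return {uint32_t(uint64_t(S0) - Sub), Sub > S0};
}

// S_BITCMP1_B32 Reg, 0 followed by S_BITSET0_B32 Reg, 0:
// move bit 0 into SCC, then clear bit 0.
ScalarState recoverSCCFromBit0(uint32_t Reg) {
  return {Reg & ~1u, (Reg & 1u) != 0};
}

// With even inputs the incoming SCC lands in bit 0 of the sum, so both
// the exact sum and the original SCC can be recovered afterwards.
ScalarState addOffsetKeepingSCC(uint32_t FrameReg, uint32_t Offset, bool SCC) {
  assert((FrameReg & 1u) == 0 && (Offset & 1u) == 0);
  return recoverSCCFromBit0(sAddcU32(FrameReg, Offset, SCC).Reg);
}

// The same recipe applied to subb: with SCC set the difference is one
// too small, and clearing bit 0 moves it further away instead of fixing it.
ScalarState subOffsetSameRecipe(uint32_t FrameReg, uint32_t Offset, bool SCC) {
  assert((FrameReg & 1u) == 0 && (Offset & 1u) == 0);
  return recoverSCCFromBit0(sSubbU32(FrameReg, Offset, SCC).Reg);
}
```

With FrameReg = 0x1000, Offset = 0x40 and SCC set, the addc path yields 0x1040 with SCC still set, while the subb path yields 0xFBE instead of the wanted 0xFC0.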
}		}

		if (NeedSaveSCC) {
		Register SCCCopy =
		RS->scavengeRegister(getBoolRC(), std::prev(P), 0, !UseSGPR);
		foad: None of this code should depend on wave size. You don't want getBoolRC here - that's for VCC, which might be a register pair in wave64. You can always save SCC in a single 32-bit SGPR.
		alex-t (author): Since the copy back from the SGPR to SCC depends on EXEC/EXEC_LO, and I am using the same register for saving and restoring, I need to use a 64-bit register unless isWave32. I am not sure whether I can use S_AND_B32 reg, exec_lo in wave64 mode?
		foad: You can just use "s_cmp_lg_u32 sgpr, 0" to copy back to SCC.
		alex-t (author): Given that this is a pure scalar context, yes.
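The wave-size-independent save/restore that foad suggests can be modeled in plain C++. This is a sketch under stated assumptions, not the committed code: the helper names are hypothetical, and each instruction is reduced to its scalar semantics (S_CSELECT_B32 picks an operand based on SCC, S_ADD_I32 sets SCC on signed overflow, S_CMP_LG_U32 sets SCC when its operands differ).

```cpp
#include <cassert>
#include <cstdint>

// S_CSELECT_B32 model: D = SCC ? S0 : S1.
uint32_t sCselectB32(uint32_t S0, uint32_t S1, bool SCC) {
  return SCC ? S0 : S1;
}

// S_CMP_LG_U32 model: SCC = (S0 != S1).
bool sCmpLgU32(uint32_t S0, uint32_t S1) { return S0 != S1; }

struct AddResult {
  uint32_t Value;
  bool SCC; // signed overflow flag produced by the add
};

// S_ADD_I32 model: D = S0 + S1, SCC = signed overflow.
AddResult sAddI32(uint32_t S0, uint32_t S1) {
  uint32_t Sum = S0 + S1;
  bool Overflow = ((~(S0 ^ S1) & (S0 ^ Sum)) >> 31) != 0;
  return {Sum, Overflow};
}

struct Lowered {
  uint32_t Address;
  bool SCC;
};

// Save SCC in one 32-bit value, do the clobbering add, then restore SCC
// by comparing the saved value with zero. EXEC is not involved.
Lowered addOffsetPreservingSCC(uint32_t FrameReg, uint32_t Offset,
                               bool SCCIn) {
  uint32_t Saved = sCselectB32(0xFFFFFFFFu, 0u, SCCIn); // -1 or 0
  AddResult Add = sAddI32(FrameReg, Offset);            // clobbers SCC
  bool Restored = sCmpLgU32(Saved, 0u);                 // SCC = Saved != 0
  return {Add.Value, Restored};
}
```

Because the restore is a compare against zero rather than an AND with EXEC, a single 32-bit SGPR is enough in both wave32 and wave64, which is the point of the suggestion.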
		if (!SCCCopy)
		report_fatal_error("Cannot scavenge register in FI elimination!");
		BuildMI(*MBB, P, DL,
		TII->get(ST.isWave32() ? AMDGPU::S_CSELECT_B32
		: AMDGPU::S_CSELECT_B64),
		SCCCopy)
		.addImm(-1)
		.addImm(0);
		unsigned Opcode =
		ST.isWave32() ? AMDGPU::S_AND_B32 : AMDGPU::S_AND_B64;
		Register Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
		BuildMI(*MBB, std::next(I), DL, TII->get(Opcode))
		.addReg(SCCCopy, getDefRegState(true))
		.addReg(SCCCopy)
		.addReg(Exec);
		}
return;		return;
}		}

bool IsMUBUF = TII->isMUBUF(*MI);		bool IsMUBUF = TII->isMUBUF(*MI);

if (!IsMUBUF && !MFI->isEntryFunction()) {		if (!IsMUBUF && !MFI->isEntryFunction()) {
// Convert to a swizzled stack address by scaling by the wave size.		// Convert to a swizzled stack address by scaling by the wave size.
// In an entry function/kernel the offset is already swizzled.		// In an entry function/kernel the offset is already swizzled.
▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	default: {
Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : FrameReg;		Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : FrameReg;

BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ScaledReg)
.addReg(FrameReg)		.addReg(FrameReg)
.addImm(ST.getWavefrontSizeLog2());		.addImm(ST.getWavefrontSizeLog2());
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), ScaledReg)
.addReg(ScaledReg, RegState::Kill)		.addReg(ScaledReg, RegState::Kill)
.addImm(Offset);		.addImm(Offset);
if (!IsSALU)		if (!IsSALU)
		arsenm: These dead setting changes should be done separately.
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), ResultReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), ResultReg)
.addReg(ScaledReg, RegState::Kill);		.addReg(ScaledReg, RegState::Kill);
else		else
ResultReg = ScaledReg;		ResultReg = ScaledReg;

// If there were truly no free SGPRs, we need to undo everything.		// If there were truly no free SGPRs, we need to undo everything.
if (!TmpScaledReg.isValid()) {		if (!TmpScaledReg.isValid()) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), ScaledReg)
.addReg(ScaledReg, RegState::Kill)		.addReg(ScaledReg, RegState::Kill)
.addImm(-Offset);		.addImm(-Offset);
		foad: Remove these changes from the patch.
		alex-t (author): This is obviously a mistake. "-Offset" cannot work here. Given that Reg and Offset are always aligned, and hence even, we should just use S_SUBB_U32 here.
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHL_B32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHL_B32), ScaledReg)
.addReg(FrameReg)		.addReg(FrameReg)
.addImm(ST.getWavefrontSizeLog2());		.addImm(ST.getWavefrontSizeLog2());
}		}
}		}
}		}

// Don't introduce an extra copy if we're just materializing in a mov.		// Don't introduce an extra copy if we're just materializing in a mov.
if (IsCopy)		if (IsCopy)
MI->eraseFromParent();		MI->eraseFromParent();
else		else
▲ Show 20 Lines • Show All 721 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/vgpr-spill-scc-clobber.mir

Show First 20 Lines • Show All 1,049 Lines • ▼ Show 20 Lines	body: \|
; MUBUF-NEXT: {{ $}}		; MUBUF-NEXT: {{ $}}
; MUBUF-NEXT: bb.2:		; MUBUF-NEXT: bb.2:
; MUBUF-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; MUBUF-NEXT: S_ENDPGM 0, amdgpu_allvgprs
; GFX9-FLATSCR-LABEL: name: mubuf_load_restore_clobber_scc		; GFX9-FLATSCR-LABEL: name: mubuf_load_restore_clobber_scc
; GFX9-FLATSCR: bb.0:		; GFX9-FLATSCR: bb.0:
; GFX9-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)		; GFX9-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc		; GFX9-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
		; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_CSELECT_B64 -1, 0, implicit $scc
; GFX9-FLATSCR-NEXT: $vcc_hi = S_ADD_I32 $sgpr32, 8200, implicit-def $scc		; GFX9-FLATSCR-NEXT: $vcc_hi = S_ADD_I32 $sgpr32, 8200, implicit-def $scc
; GFX9-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_hi, implicit $exec		; GFX9-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_hi, implicit $exec
		; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_AND_B64 $sgpr4_sgpr5, $exec, implicit-def $scc
; GFX9-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; GFX9-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GFX9-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc		; GFX9-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: bb.1:		; GFX9-FLATSCR-NEXT: bb.1:
; GFX9-FLATSCR-NEXT: successors: %bb.2(0x80000000)		; GFX9-FLATSCR-NEXT: successors: %bb.2(0x80000000)
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: S_NOP 0		; GFX9-FLATSCR-NEXT: S_NOP 0
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: bb.2:		; GFX9-FLATSCR-NEXT: bb.2:
; GFX9-FLATSCR-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX9-FLATSCR-NEXT: S_ENDPGM 0, amdgpu_allvgprs
; GFX10-FLATSCR-LABEL: name: mubuf_load_restore_clobber_scc		; GFX10-FLATSCR-LABEL: name: mubuf_load_restore_clobber_scc
; GFX10-FLATSCR: bb.0:		; GFX10-FLATSCR: bb.0:
; GFX10-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)		; GFX10-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc		; GFX10-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
		; GFX10-FLATSCR-NEXT: $sgpr4 = S_CSELECT_B32 -1, 0, implicit $scc
; GFX10-FLATSCR-NEXT: $vcc_lo = S_ADD_I32 $sgpr32, 8200, implicit-def $scc		; GFX10-FLATSCR-NEXT: $vcc_lo = S_ADD_I32 $sgpr32, 8200, implicit-def $scc
; GFX10-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_lo, implicit $exec		; GFX10-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_lo, implicit $exec
		; GFX10-FLATSCR-NEXT: $sgpr4 = S_AND_B32 $sgpr4, $exec_lo, implicit-def $scc
; GFX10-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; GFX10-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GFX10-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc		; GFX10-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: bb.1:		; GFX10-FLATSCR-NEXT: bb.1:
; GFX10-FLATSCR-NEXT: successors: %bb.2(0x80000000)		; GFX10-FLATSCR-NEXT: successors: %bb.2(0x80000000)
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: S_NOP 0		; GFX10-FLATSCR-NEXT: S_NOP 0
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	body: \|
; MUBUF-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; MUBUF-NEXT: S_ENDPGM 0, amdgpu_allvgprs
; GFX9-FLATSCR-LABEL: name: mubuf_load_restore_clobber_scc_no_vgprs_emergency_stack_slot		; GFX9-FLATSCR-LABEL: name: mubuf_load_restore_clobber_scc_no_vgprs_emergency_stack_slot
; GFX9-FLATSCR: bb.0:		; GFX9-FLATSCR: bb.0:
; GFX9-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)		; GFX9-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; GFX9-FLATSCR-NEXT: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, 
$vgpr252_vgpr253_vgpr254_vgpr255		; GFX9-FLATSCR-NEXT: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, 
$vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc		; GFX9-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
; GFX9-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR killed $vgpr1, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.2, addrspace 5)		; GFX9-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR killed $vgpr1, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.2, addrspace 5)
		; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_CSELECT_B64 -1, 0, implicit $scc
; GFX9-FLATSCR-NEXT: $vcc_hi = S_ADD_I32 $sgpr32, 8200, implicit-def $scc		; GFX9-FLATSCR-NEXT: $vcc_hi = S_ADD_I32 $sgpr32, 8200, implicit-def $scc
; GFX9-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_hi, implicit $exec		; GFX9-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_hi, implicit $exec
		; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_AND_B64 $sgpr4_sgpr5, $exec, implicit-def $scc
; GFX9-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; GFX9-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GFX9-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)		; GFX9-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX9-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc		; GFX9-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: bb.1:		; GFX9-FLATSCR-NEXT: bb.1:
; GFX9-FLATSCR-NEXT: successors: %bb.2(0x80000000)		; GFX9-FLATSCR-NEXT: successors: %bb.2(0x80000000)
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: S_NOP 0		; GFX9-FLATSCR-NEXT: S_NOP 0
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: bb.2:		; GFX9-FLATSCR-NEXT: bb.2:
; GFX9-FLATSCR-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX9-FLATSCR-NEXT: S_ENDPGM 0, amdgpu_allvgprs
; GFX10-FLATSCR-LABEL: name: mubuf_load_restore_clobber_scc_no_vgprs_emergency_stack_slot		; GFX10-FLATSCR-LABEL: name: mubuf_load_restore_clobber_scc_no_vgprs_emergency_stack_slot
; GFX10-FLATSCR: bb.0:		; GFX10-FLATSCR: bb.0:
; GFX10-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)		; GFX10-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; GFX10-FLATSCR-NEXT: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255		; GFX10-FLATSCR-NEXT: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc		; GFX10-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
; GFX10-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR killed $vgpr1, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.2, addrspace 5)		; GFX10-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR killed $vgpr1, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.2, addrspace 5)
		; GFX10-FLATSCR-NEXT: $sgpr4 = S_CSELECT_B32 -1, 0, implicit $scc
; GFX10-FLATSCR-NEXT: $vcc_lo = S_ADD_I32 $sgpr32, 8200, implicit-def $scc		; GFX10-FLATSCR-NEXT: $vcc_lo = S_ADD_I32 $sgpr32, 8200, implicit-def $scc
; GFX10-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_lo, implicit $exec		; GFX10-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_lo, implicit $exec
		; GFX10-FLATSCR-NEXT: $sgpr4 = S_AND_B32 $sgpr4, $exec_lo, implicit-def $scc
; GFX10-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; GFX10-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GFX10-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)		; GFX10-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX10-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc		; GFX10-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: bb.1:		; GFX10-FLATSCR-NEXT: bb.1:
; GFX10-FLATSCR-NEXT: successors: %bb.2(0x80000000)		; GFX10-FLATSCR-NEXT: successors: %bb.2(0x80000000)
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: S_NOP 0		; GFX10-FLATSCR-NEXT: S_NOP 0
... 66 lines not shown ...		body: |
; MUBUF-NEXT: liveins: $vgpr0		; MUBUF-NEXT: liveins: $vgpr0
; MUBUF-NEXT: {{ $}}		; MUBUF-NEXT: {{ $}}
; MUBUF-NEXT: S_ENDPGM 0, implicit $vgpr0		; MUBUF-NEXT: S_ENDPGM 0, implicit $vgpr0
; GFX9-FLATSCR-LABEL: name: v_mov_clobber_scc		; GFX9-FLATSCR-LABEL: name: v_mov_clobber_scc
; GFX9-FLATSCR: bb.0:		; GFX9-FLATSCR: bb.0:
; GFX9-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)		; GFX9-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc		; GFX9-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
		; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_CSELECT_B64 -1, 0, implicit $scc
; GFX9-FLATSCR-NEXT: $vcc_hi = S_ADD_I32 $sgpr32, 8200, implicit-def $scc		; GFX9-FLATSCR-NEXT: $vcc_hi = S_ADD_I32 $sgpr32, 8200, implicit-def $scc
		; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_AND_B64 $sgpr4_sgpr5, $exec, implicit-def $scc
; GFX9-FLATSCR-NEXT: $vgpr0 = V_MOV_B32_e32 killed $vcc_hi, implicit $exec		; GFX9-FLATSCR-NEXT: $vgpr0 = V_MOV_B32_e32 killed $vcc_hi, implicit $exec
; GFX9-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc		; GFX9-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: bb.1:		; GFX9-FLATSCR-NEXT: bb.1:
; GFX9-FLATSCR-NEXT: successors: %bb.2(0x80000000)		; GFX9-FLATSCR-NEXT: successors: %bb.2(0x80000000)
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: S_NOP 0		; GFX9-FLATSCR-NEXT: S_NOP 0
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: bb.2:		; GFX9-FLATSCR-NEXT: bb.2:
; GFX9-FLATSCR-NEXT: liveins: $vgpr0		; GFX9-FLATSCR-NEXT: liveins: $vgpr0
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: S_ENDPGM 0, implicit $vgpr0		; GFX9-FLATSCR-NEXT: S_ENDPGM 0, implicit $vgpr0
; GFX10-FLATSCR-LABEL: name: v_mov_clobber_scc		; GFX10-FLATSCR-LABEL: name: v_mov_clobber_scc
; GFX10-FLATSCR: bb.0:		; GFX10-FLATSCR: bb.0:
; GFX10-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)		; GFX10-FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc		; GFX10-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
		; GFX10-FLATSCR-NEXT: $sgpr4 = S_CSELECT_B32 -1, 0, implicit $scc
; GFX10-FLATSCR-NEXT: $vcc_lo = S_ADD_I32 $sgpr32, 8200, implicit-def $scc		; GFX10-FLATSCR-NEXT: $vcc_lo = S_ADD_I32 $sgpr32, 8200, implicit-def $scc
		; GFX10-FLATSCR-NEXT: $sgpr4 = S_AND_B32 $sgpr4, $exec_lo, implicit-def $scc
; GFX10-FLATSCR-NEXT: $vgpr0 = V_MOV_B32_e32 killed $vcc_lo, implicit $exec		; GFX10-FLATSCR-NEXT: $vgpr0 = V_MOV_B32_e32 killed $vcc_lo, implicit $exec
; GFX10-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc		; GFX10-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: bb.1:		; GFX10-FLATSCR-NEXT: bb.1:
; GFX10-FLATSCR-NEXT: successors: %bb.2(0x80000000)		; GFX10-FLATSCR-NEXT: successors: %bb.2(0x80000000)
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
; GFX10-FLATSCR-NEXT: S_NOP 0		; GFX10-FLATSCR-NEXT: S_NOP 0
; GFX10-FLATSCR-NEXT: {{ $}}		; GFX10-FLATSCR-NEXT: {{ $}}
... 36 lines not shown ...

Diff 468942

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

llvm/test/CodeGen/AMDGPU/vgpr-spill-scc-clobber.mir
