This is an archive of the discontinued LLVM Phabricator instance.

Differential D149775

[AMDGPU] Reserve SGPR pair when long branches are present
ClosedPublic

Authored by crobeck on May 3 2023, 11:15 AM.

Details

Reviewers

bcahoon
msearles
vpykhtin
arsenm
b-sumner
Pierre-vh

Group Reviewers

Restricted Project

Summary

After applying recent register allocation patches the compiler generates a register scavenging error. Branch Relaxation runs after RA, but RA doesn't allocate enough registers to plan for the case where an indirect branch ends up being needed during branch relaxation. This causes a “did not find scavenging index” assert to be hit from assignRegToScavengingIndex from within RegScavenger.

In this patch we estimate before RA whether an indirect branch is likely to be needed, and reserve 2 SGPRs if the branch distance is found to be above a threshold. This is difficult because, before register allocation, you often don't have an accurate idea of the code size and branch distance, and therefore of when you would need to reserve the registers. We therefore make the distance calculation a reduced-complexity approximation and add a tuning factor on the threshold through the -amdgpu-long-branch-factor command line argument.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

crobeck created this revision. May 3 2023, 11:15 AM

crobeck created this object with visibility "Restricted Project (Project)".

crobeck created this object with edit policy "Restricted Project (Project)".

Herald added a project: Restricted Project. May 3 2023, 11:15 AM

Herald added subscribers: kosarev, foad, kerbowa and 4 others.

crobeck requested review of this revision. May 3 2023, 11:15 AM

Herald added a project: Restricted Project. May 3 2023, 11:15 AM

crobeck updated this revision to Diff 519182. May 3 2023, 11:28 AM

crobeck updated this revision to Diff 519208. May 3 2023, 12:33 PM

I'm also wondering if reserving s[6:7] could have unintended side effects? Aren't those values used to pass parameters to the kernel?
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-sgpr-register-set-up-order-table

Maybe we need to use different, higher registers or something that depends on the CC/arguments of the function?

llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp
3	formatting
11	missing comment (or just remove this)
30	I'm not sure I understand, do you mean that the (estimated) branch offset gets multiplied by this value, and if the result doesn't fit in the branch instruction, we consider it a long branch?
69–70
94–97
171–175	If you're just interested in unconditional branches you can remove an indentation level
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2606–2609	If I understand correctly, `LongBranchReservedReg` is always reserved so could we just avoid running the scavenger when it's set and just use it directly? Otherwise it's just a "wasted" reserved register + nit: no `{}` for a one-line if body + very small nit: you could also just use `Register()` as the value for LongBranchReservedReg I think, it avoids the need to check against NoRegister. IMO it's more intuitive as you can just do `if(Register LongBranchReservedReg = MFI->getLongBranchReservedReg())` to check for its presence
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
384	Add a small doxygen comment on top
893	I'm not a fan of the default argument; I would just let the caller set it. Generally, if I see a parameterless setter such as `setLongBranchReservedReg()`, I would expect that `LongBranchReservedReg` is a bool that we're setting to true; for other types of value I'd expect to have to pass a parameter. Note that it's just my opinion: if a reviewer disagrees or if there is a precedent for this style then it's fine.
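
To illustrate the `Register()` suggestion in the SIInstrInfo.cpp comment above, here is a minimal standalone sketch. It does not use LLVM: `Register` and `FunctionInfo` below are simplified stand-ins written for this example, and `pickLongBranchReg` is a hypothetical helper. The point is only that a default-constructed "no register" value lets the caller test for presence and bind the value in a single `if`.

```cpp
#include <cassert>

// Stand-in for llvm::Register (assumption: 0 means "no register").
class Register {
  unsigned Reg = 0;

public:
  constexpr Register() = default;
  constexpr explicit Register(unsigned R) : Reg(R) {}
  constexpr bool isValid() const { return Reg != 0; }
  constexpr explicit operator bool() const { return isValid(); }
  constexpr unsigned id() const { return Reg; }
};

// Hypothetical slice of the per-function info that holds the reservation.
struct FunctionInfo {
  Register LongBranchReservedReg; // default-constructed: nothing reserved

  Register getLongBranchReservedReg() const { return LongBranchReservedReg; }
  // The setter takes an explicit argument, as the review comment prefers.
  void setLongBranchReservedReg(Register R) { LongBranchReservedReg = R; }
};

// Returns the id of the reserved register, or 0 when nothing was reserved
// and the caller would have to fall back to scavenging.
unsigned pickLongBranchReg(const FunctionInfo &MFI) {
  if (Register R = MFI.getLongBranchReservedReg())
    return R.id();
  return 0;
}
```

With this shape there is no separate comparison against a `NoRegister` constant at the call site.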

address some review comments

crobeck marked 7 inline comments as done. May 4 2023, 6:33 AM

crobeck added inline comments.

llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp
30	Right, it's essentially just an empirical tuning parameter on when we reserve the registers since estimating the code/branch size at this point, pre RA, is difficult.
171–175	I believe we just care about unconditional branches. But input on the logic here is of interest as well.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2606–2609	LongBranchReservedReg is only set if the PreRABranchDistance pass sets it because it finds a long branch. Otherwise it is just set to null. Formatting of LongBranchReservedReg != AMDGPU::NoRegister seemed consistent with the surrounding code but no strong feelings there. Open to changing.

crobeck marked 2 inline comments as done. May 4 2023, 6:40 AM

crobeck marked 2 inline comments as not done. May 4 2023, 7:09 AM

crobeck marked an inline comment as done. May 5 2023, 10:34 AM

address review comment

crobeck marked an inline comment as done. May 8 2023, 8:26 AM

Absolutely do not globally reserve s[6:7] for such things. Low SGPRs are very often given special meaning in function signatures.

Furthermore, is a global reservation even necessary? You could identify candidate long branches and give them a fake clobbered operand, so that the register allocator can just pre-choose a register pair, but not block it everywhere.

In D149775#4327638, @nhaehnle wrote:

Absolutely do not globally reserve s[6:7] for such things. Low SGPRs are very often given special meaning in function signatures.

Furthermore, is a global reservation even necessary? You could identify candidate long branches and give them a fake clobbered operand, so that the register allocator can just pre-choose a register pair, but not block it everywhere.

Sure. My thinking was to pick as low a SGPR pair as I could that wasn't already explicitly reserved. I thought s[6:7] were not used in the kernel args but I could be wrong. If there is another, higher, pair that makes more sense I'm OK with that.

I'll have to think about the fake clobbered operand idea. Would we be more likely to over-allocate SGPRs when we might not need them in that case?

vpykhtin added inline comments. May 12 2023, 7:40 AM

llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp
174	If you always need only the offset of the last instruction can you calculate it using the offset of the next MBB instead of summing every BB instruction's size?

I'm not sure why, but this review doesn't contain context (unchanged lines of code); maybe it's worth trying arc diff to generate review requests.

What if we always reserve a register for long jumps?

arsenm added inline comments. May 12 2023, 10:12 AM

llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp
37	This is all copied from BranchRelaxation? I think you're trying to be far too precise for tracking this. This is going to be far too fuzzy given you have no idea what spills are going to be inserted. I think expecting ~10 bytes per instruction, without bothering to compute actual instruction sizes, is enough; then see if that's in the neighborhood of the maximum branch distance.
179	This made a change and needs to return true. You also didn't actually modify the set of reserved registers
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
387	This is missing MIR serialization. Also don't need the initializer.
897	This shouldn't just hardcode to s[6:7]. Should pick the highest available and we can compact later (although I guess it's a bit awkward to move this one since relaxation runs after PEI where we handle all the other shifted reserved registers).
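
The "pick the highest available" suggestion can be sketched in isolation. The code below is not the LLVM implementation: `NumSGPRs`, the even-alignment requirement for a 64-bit pair, and the function name `findHighestFreeSGPRPair` are assumptions made for this example, and register usage is modeled as a plain bitset.

```cpp
#include <bitset>
#include <cassert>
#include <optional>

// Assumed size of the SGPR file, for illustration only.
constexpr unsigned NumSGPRs = 104;

// Scans from the top of the register file downward and returns the index of
// the low half of the highest free, even-aligned pair s[Lo:Lo+1], or nullopt
// when every pair has at least one register in use.
std::optional<unsigned>
findHighestFreeSGPRPair(const std::bitset<NumSGPRs> &Used) {
  for (unsigned Lo = NumSGPRs - 2;; Lo -= 2) {
    if (!Used[Lo] && !Used[Lo + 1])
      return Lo;
    if (Lo == 0)
      break; // checked s[0:1]; stop before the unsigned index wraps
  }
  return std::nullopt;
}
```

Searching from the top keeps low SGPRs, which the reviewers note often carry special meaning in function signatures, out of the reservation unless nothing else is free. The empty result corresponds to the "maxed out on SGPRs" case discussed later in the review.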

In D149775#4338257, @vpykhtin wrote:

What if we always reserve a register for long jumps?

That would really be more correct

rebase and address some review comments

Herald added subscribers: llvm-commits, hiraditya, wdng, jvesely. May 19 2023, 12:21 PM

crobeck marked 3 inline comments as done. May 19 2023, 12:24 PM

crobeck edited the summary of this revision. May 19 2023, 1:10 PM

Harbormaster completed remote builds in B233271: Diff 523895. May 19 2023, 3:25 PM

addressing review comments.

Herald added a subscriber: qcolombet. May 23 2023, 8:45 AM

In D149775#4338257, @vpykhtin wrote:

What if we always reserve a register for long jumps?

I think we still want the ability to not reserve them in cases where they might not actually be needed and would waste resources. The amdgpu-long-branch-factor default has been raised so that we lean toward always reserving them now. If this value is 0, the long branch registers are never reserved. As this value grows, so does the chance that the branch distance will fall within the threshold and the registers will be marked to be reserved.

crobeck added inline comments. May 23 2023, 9:05 AM

llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp
37	The distance calculation is mostly from BR. I'm not sure the actual method of determining what maximum branch distance to use as the threshold should matter too much, since it's just an empirical tuning factor. We lean toward always reserving the register by setting the default threshold somewhat high, and can tune with the CL argument for individual cases where we don't want to reserve them. However, you're right that this distance calculation is probably overkill and could be simpler.

Harbormaster completed remote builds in B233892: Diff 524741. May 23 2023, 10:33 AM

In D149775#4364811, @crobeck wrote:

In D149775#4338257, @vpykhtin wrote:

What if we always reserve a register for long jumps?

I think we still want the ability to not reserve them in cases where they might not actually be needed and would waste resources. The amdgpu-long-branch-factor default has been raised so that we lean toward always reserving them now. If this value is 0, the long branch registers are never reserved. As this value grows, so does the chance that the branch distance will fall within the threshold and the registers will be marked to be reserved.

Maybe we can reserve a register if the overall number of SGPRs is close to some threshold?

jrbyrnes added a subscriber: jrbyrnes. May 25 2023, 1:16 PM

jrbyrnes added inline comments.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1454	Will this ever be called directly after `freezeReservedRegs` for VGPRForAGPRCopy (i.e. on gfx908)? If so, in those cases, can we use `reserveReg` instead of the second `freezeReservedRegs`?

rebase and address review comments

crobeck marked an inline comment as done. May 31 2023, 10:52 AM

crobeck added inline comments.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1454	That's a good point. Updated.

crobeck marked an inline comment as done. May 31 2023, 11:47 AM

Harbormaster completed remote builds in B235621: Diff 527120. May 31 2023, 12:19 PM

formatting

Harbormaster completed remote builds in B235726: Diff 527263. May 31 2023, 8:47 PM

Pierre-vh added inline comments. May 31 2023, 11:53 PM

llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp
3	formatting
179–181	All of those are `unsigned` I think, so I would use it here too to be consistent. There is also a use of `uint64_t` somewhere else.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1453–1455	How does this work? Don't you need to check if `UnusedLowSGPR` returns something? Will it always return the pair we reserved earlier if nothing else is available? Would it be worth adding a test with `amdgpu-num-sgpr` to see what happens when 100% of SGPRs are used?
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg.ll
69	missing newline

fix some tests

crobeck marked an inline comment as done. Jun 1 2023, 12:00 PM

update to bypass register scavenger if we've previously reserved registers

crobeck marked an inline comment as done. Jun 1 2023, 1:32 PM

crobeck added inline comments.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2606–2609	Updated.

formatting

crobeck marked an inline comment as done. Jun 1 2023, 1:37 PM

update types in GCNPreRABranchDistance pass

crobeck marked an inline comment as done. Jun 1 2023, 1:45 PM

Harbormaster completed remote builds in B235970: Diff 527595. Jun 1 2023, 2:57 PM

crobeck updated this revision to Diff 527688. Jun 1 2023, 6:07 PM

This comment was removed by crobeck.

fix a type mismatch

Harbormaster completed remote builds in B236049: Diff 527695. Jun 1 2023, 7:20 PM

crobeck updated this revision to Diff 527892. Jun 2 2023, 10:07 AM

crobeck added inline comments.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1453–1455	Yes, you're right. I've made the checks more explicit.

crobeck added inline comments. Jun 2 2023, 10:13 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1453–1455	"Would it be worth adding a test with `amdgpu-num-sgpr` to see what happens when 100% of SGPRs are used?" That's a good idea. At the very least it's good to understand what happens when we're maxed out on SGPRs.

Harbormaster completed remote builds in B236204: Diff 527892. Jun 2 2023, 11:21 AM

crobeck updated this revision to Diff 527935. Jun 2 2023, 12:33 PM

Harbormaster completed remote builds in B236240: Diff 527935. Jun 2 2023, 1:11 PM

add test for when maxed out on SGPRs

crobeck added inline comments. Jun 9 2023, 1:24 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1453–1455	This ended up being kind of a mess trying to do it with the amdgpu-num-sgpr attribute so just used inline asm to use up all the registers

crobeck marked an inline comment as done. Jun 9 2023, 1:26 PM

Harbormaster completed remote builds in B237849: Diff 530066. Jun 9 2023, 2:17 PM

Any further comments on this? @arsenm Do you think we need to update the distance calculation, or are we OK to leave it as is?

arsenm added inline comments. Jun 16 2023, 3:55 PM

llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp
37	I think you should go for fuzzier. Drop the block map and all that from BranchRelaxation. Just do something dead simple like number of instructions * 8 * fuzz ~= branch distance

Reduce complexity of branch distance calculation substantially - we now just approximate branch size by assuming 8 bytes per instruction. Also updated pass name to be more representative of what the pass does.
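
A minimal sketch of the simplified estimate described above, written as standalone C++ rather than as the pass itself. The names `AssumedBytesPerInstr` and `needsLongBranchReg` and the `MaxBranchBytes` parameter are made up for this illustration; the only things taken from the review are the "instructions * 8 * fuzz" rule of thumb and the stated meaning of the factor (0 never reserves, larger values reserve more readily).

```cpp
#include <cassert>
#include <cstdint>

// Assumed average encoded size of one instruction, per the review discussion.
constexpr uint64_t AssumedBytesPerInstr = 8;

// Returns true when the scaled size estimate exceeds the reach of a direct
// branch, i.e. when a register pair should be kept for a long branch.
// MaxBranchBytes is a caller-supplied assumption about that reach.
bool needsLongBranchReg(uint64_t NumInstrs, double Factor,
                        uint64_t MaxBranchBytes) {
  const double Estimate =
      static_cast<double>(NumInstrs * AssumedBytesPerInstr) * Factor;
  return Estimate > static_cast<double>(MaxBranchBytes);
}
```

For example, 100 instructions give an 800-byte estimate at a factor of 1; against an illustrative 1000-byte limit that stays below the threshold, while a factor of 2 pushes it over.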

clean up unused header files and an unused function

Harbormaster completed remote builds in B239876: Diff 532741. Jun 19 2023, 1:28 PM

crobeck marked 2 inline comments as done. Jun 19 2023, 1:32 PM

crobeck edited the summary of this revision. Jun 20 2023, 8:27 AM

crobeck edited the summary of this revision.

crobeck edited the summary of this revision. Jun 20 2023, 8:29 AM

arsenm mentioned this in D143759: [AMDGPU] Implement whole wave register spill. Jun 21 2023, 6:53 AM

arsenm added inline comments. Jun 21 2023, 7:00 AM

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
45 ↗	(On Diff #532741)	uint64_ts?
77–78 ↗	(On Diff #532741)	Clear at exit instead?
85 ↗	(On Diff #532741)	Should skip debug and meta instructions, otherwise this violates the cardinal rule that debug info shouldn't change codegen
120 ↗	(On Diff #532741)	const
134–136 ↗	(On Diff #532741)	This should be moved into a TRI reserve registers helper like the others. Also, would be a bit safer to invert this by reserving by default and having this pass remove the reserved register
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1429	The reserved registers should already be frozen before PEI. I think the current freezeReservedRegs call here was just never updated to use the new incremental reserveReg
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2555	We should also fix this fixme at some point
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg.ll
67	Add a test to show no change for debug instructions
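
The request to skip debug and meta instructions can be illustrated with a small standalone sketch. `Instr` and its `IsMeta` flag are stand-ins invented for this example (in LLVM the corresponding query would be made on the machine instruction itself), and `countRealInstrs` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a machine instruction; IsMeta marks debug values and other
// pseudo instructions that emit no machine code.
struct Instr {
  bool IsMeta;
};

// Counts only instructions that contribute to code size, so that adding or
// removing debug info cannot change the estimate derived from the count.
uint64_t countRealInstrs(const std::vector<Instr> &Block) {
  return static_cast<uint64_t>(
      std::count_if(Block.begin(), Block.end(),
                    [](const Instr &I) { return !I.IsMeta; }));
}
```

A block compiled with and without debug info then yields the same count, which is the property the requested test is meant to demonstrate.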

arsenm added inline comments. Jun 21 2023, 7:01 AM

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
30 ↗	(On Diff #532741)	If you're going to use FP for this, might as well use double

cdevadas added a subscriber: cdevadas. Jun 21 2023, 7:20 AM

arsenm changed the edit policy from "Restricted Project (Project)" to "All Users". Jun 21 2023, 7:31 AM

Add a test for MIR serialization.

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
106 ↗	(On Diff #532741)	Capitalize.
117 ↗	(On Diff #532741)	Capitalize.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1445	Unify the `freezeReservedRegs` at the end of the function?
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2553	const

address review comments, add additional debug instruction test

crobeck marked an inline comment as done. Jun 21 2023, 1:45 PM

crobeck added inline comments.

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
134–136 ↗	(On Diff #532741)	We're not actually reserving the register here. We're just setting the flag that we need to reserve the register in TRI. But, I can move this if it makes more sense there. If we instead invert it we then need to always run through everything and prove there isn't a long branch as opposed to exiting once if we find a long branch.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1429	Just set both calls to reserveReg then?
1445	Pending updating reserveReg call above.

Harbormaster completed remote builds in B240339: Diff 533380. Jun 21 2023, 2:28 PM

Update reserveReg calls in SIFrameLowering

crobeck marked 2 inline comments as done. Jun 22 2023, 6:55 AM

crobeck added inline comments.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1429	I think I've convinced myself these can both just be reserveReg calls.

Harbormaster completed remote builds in B240494: Diff 533586. Jun 22 2023, 8:15 AM

Swap out reserve register routine for TRI helper function

crobeck added inline comments. Jun 22 2023, 9:52 AM

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
134–136 ↗	(On Diff #532741)	This should be cleaner now with a call into TRI instead of grabbing my own registers.

arsenm added inline comments. Jun 22 2023, 10:13 AM

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
86 ↗	(On Diff #533663)	Just isMetaInstruction is fine, it's a superset of isDebugInstr
102 ↗	(On Diff #533663)	Don't need a specific instruction offset, just the block distances are good enough

remove specific instruction offsets from distance calc

crobeck marked 2 inline comments as done. Jun 22 2023, 11:44 AM

variable, pointer/reference clean up

Harbormaster completed remote builds in B240587: Diff 533731. Jun 22 2023, 3:23 PM

invert to reserve register by default and then have pass remove the register if unneeded

crobeck marked an inline comment as done. Jun 23 2023, 7:38 AM

Harbormaster completed remote builds in B240759: Diff 533956. Jun 23 2023, 8:46 AM

arsenm added inline comments. Jun 23 2023, 3:04 PM

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
116–117 ↗	(On Diff #533956)	I don't see how this can fail unless you're using amdgpu-num-sgprs (and this is one of the reasons it should be removed)
119 ↗	(On Diff #533956)	I would assume this is already noregister if this failed. I think you misunderstood what I was saying about reserve first and release later. I meant in the compile pipeline, not in the pass itself. You can find the reserved register in finalizeLowering, and then have this pass unset and un-reserve. I don't know if we have an exposed helper to clear reserved registers. You can do that as a follow up.
120 ↗	(On Diff #533956)	If you actually did modify the MFI register this should be a return true
143 ↗	(On Diff #533956)	This should be return true if you modified MFI
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2624	Should also skip the enterBasicBlockEnd above

remove unnecessary set functions

crobeck marked 3 inline comments as done. Jun 23 2023, 4:00 PM

crobeck added inline comments.

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
116–117 ↗	(On Diff #533956)	It failed in some test cases where all the SGPRs were already used from inline asm
119 ↗	(On Diff #533956)	Yes, you're right I can remove that set function. Got it. OK, that makes sense.

skipping some of register scavenging if we already have a reserved register

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2624	There is some init of MBB in enterBasicBlockEnd that breaks things if I skip it entirely. But I think I can get away with skipping some of it.

Harbormaster completed remote builds in B240884: Diff 534113. Jun 23 2023, 4:49 PM

arsenm accepted this revision. Jun 27 2023, 3:14 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
116–117 ↗	(On Diff #533956)	oh right

This revision is now accepted and ready to land. Jun 27 2023, 3:14 PM

bcahoon mentioned this in rG853b2a84cb99: [AMDGPU] Reserve SGPR pair when long branches are present. Jun 29 2023, 2:53 PM

rG853b2a84cb9902725752e9603011041ebe33c7bf

Revision Contents

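Path	Size
llvm/lib/Target/AMDGPU/AMDGPU.h	3 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	9 lines
llvm/lib/Target/AMDGPU/CMakeLists.txt	1 line
llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp	193 lines
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp	18 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp	17 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h	12 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp	2 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp	4 lines
llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll	2 lines
llvm/test/CodeGen/AMDGPU/branch-relaxation.ll	4 lines
llvm/test/CodeGen/AMDGPU/literal-constant-like-operand-instruction-size.ll	2 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll	5 lines
llvm/test/CodeGen/AMDGPU/long-branch-reserve-register.ll	330 lines
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-after-pei.ll	1 line
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg.ll	68 lines
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir	4 lines
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll	4 lines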

Diff 527688

llvm/lib/Target/AMDGPU/AMDGPU.h

Context not available.
  void initializeGCNNSAReassignPass(PassRegistry &);
  extern char &GCNNSAReassignID;

+ void initializeGCNPreRABranchDistancePass(PassRegistry &);
+ extern char &GCNPreRABranchDistanceID;

  void initializeGCNPreRAOptimizationsPass(PassRegistry &);
  extern char &GCNPreRAOptimizationsID;

Context not available.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Context not available.
  initializeAMDGPUResourceUsageAnalysisPass(*PR);
  initializeGCNNSAReassignPass(*PR);
  initializeGCNPreRAOptimizationsPass(*PR);
+ initializeGCNPreRABranchDistancePass(*PR);
  initializeGCNRewritePartialRegUsesPass(*PR);
  }

Context not available.
  if (!usingDefaultRegAlloc())
    report_fatal_error(RegAllocOptNotSupportedMessage);

+ addPass(&GCNPreRABranchDistanceID);

  addPass(createSGPRAllocPass(false));

  // Equivalent of PEI for SGPRs.
Context not available.
  if (!usingDefaultRegAlloc())
    report_fatal_error(RegAllocOptNotSupportedMessage);

+ addPass(&GCNPreRABranchDistanceID);

  addPass(createSGPRAllocPass(true));

  // Commit allocated register changes. This is mostly necessary because too
Context not available.
  if (parseOptionalRegister(YamlMFI.VGPRForAGPRCopy, MFI->VGPRForAGPRCopy))
    return true;

+ if (parseOptionalRegister(YamlMFI.LongBranchReservedReg,
+                           MFI->LongBranchReservedReg))
+   return true;

  auto diagnoseRegisterClass = [&](const yaml::StringValue &RegName) {
    // Create a diagnostic for a the register string literal.
    const MemoryBuffer &Buffer =
Context not available.

llvm/lib/Target/AMDGPU/CMakeLists.txt

Context not available.
  GCNMinRegStrategy.cpp
  GCNNSAReassign.cpp
  GCNPreRAOptimizations.cpp
+ GCNPreRABranchDistance.cpp
  GCNRegPressure.cpp
  GCNRewritePartialRegUses.cpp
  GCNSchedStrategy.cpp
Context not available.

llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp

This file was added.

//===-- GCNPreRABranchDistance.cpp ----------------------------------------===//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

Inline comment, Pierre-vh (Unsubmitted, Done): formatting

Inline comment, Pierre-vh (Unsubmitted, Done): formatting

// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//===----------------------------------------------------------------------===//
/// \file
/// \brief Pass to estimate pre RA branch size and reserve a pair of SGPRs if
/// there is a long branch. Tuning of what is considered "long" is handled
/// through amdgpu-pre-ra-branch-distance cl argument that sets

Inline comment, Pierre-vh (Unsubmitted, Done): missing comment (or just remove this)

/// LongBranchFactor.
//===----------------------------------------------------------------------===//

#include "AMDGPU.h"
#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "SIMachineFunctionInfo.h"
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/InitializePasses.h"
#include <algorithm>
#include <cmath>

using namespace llvm;

#define DEBUG_TYPE "amdgpu-pre-ra-branch-distance"

namespace {

static cl::opt<float> LongBranchFactor(

Inline comment, Pierre-vh (Unsubmitted, Done): I'm not sure I understand, do you mean that the (estimated) branch offset gets multiplied by this value, and if the result doesn't fit in the branch instruction, we consider it a long branch?

Inline comment, crobeck (Author, Unsubmitted, Done): Right, it's essentially just an empirical tuning parameter on when we reserve the registers since estimating the code/branch size at this point, pre RA, is difficult.

    "amdgpu-long-branch-factor", cl::init(1000.0), cl::Hidden,
    cl::desc("Factor to apply to what qualifies as a long branch "
             "to reserve a pair of scalar registers. If this value "
             "is 0 the long branch registers are never reserved. As this "
             "value grows the greater chance the branch distance will fall "
             "within the threshold and the registers will be marked to be "
             "reserved. We set the value high to lean towards always reserving "

Inline comment, arsenm (Unsubmitted, Done): This is all copied from BranchRelaxation? I think you're trying to be far too precise for tracking this. This is going to be far too fuzzy given you have no idea what spills are going to be inserted. I think expecting ~10 bytes per instruction, without bothering to compute actual instruction sizes, is enough; then see if that's in the neighborhood of the maximum branch distance.

Inline comment, crobeck (Author, Unsubmitted, Done): The distance calculation is mostly from BR. I'm not sure the actual method of determining what maximum branch distance to use as the threshold should matter too much, since it's just an empirical tuning factor. We lean toward always reserving the register by setting the default threshold somewhat high, and can tune with the CL argument for individual cases where we don't want to reserve them. However, you're right that this distance calculation is probably overkill and could be simpler.

Inline comment, arsenm (Unsubmitted, Done): I think you should go for fuzzier. Drop the block map and all that from BranchRelaxation. Just do something dead simple like number of instructions * 8 * fuzz ~= branch distance

             "a register for long jumps"));

class GCNPreRABranchDistance : public MachineFunctionPass {
  /// BasicBlockInfo - Information about the offset and size of a single
  /// basic block.
  struct BasicBlockInfo {
    /// Offset - Distance from the beginning of the function to the beginning
    /// of this basic block.
    ///
    /// The offset is always aligned as required by the basic block.
    unsigned Offset = 0;

    /// Size - Size of the basic block in bytes. If the block contains
    /// inline assembly, this is a worst case estimate.
    ///
    /// The size does not include any alignment padding whether from the
    /// beginning of the block, or from an aligned jump table at the end.
    unsigned Size = 0;

    BasicBlockInfo() = default;

    /// Compute the offset immediately following this block. \p MBB is the next
    /// block.
    unsigned postOffset(const MachineBasicBlock &MBB) const {
      const unsigned PO = Offset + Size;
      const Align Alignment = MBB.getAlignment();
      const Align ParentAlign = MBB.getParent()->getAlignment();
      if (Alignment <= ParentAlign)
        return alignTo(PO, Alignment);

      // The alignment of this MBB is larger than the function's alignment, so
      // we can't tell whether or not it will insert nops. Assume that it will.
      return alignTo(PO, Alignment) + Alignment.value() - ParentAlign.value();

Inline comment, Pierre-vh (Unsubmitted, Done), suggested edit:
  SmallVector<BasicBlockInfo, 16> BlockInfo;
- MachineFunction *MF;
- const SIInstrInfo *TII;
+ MachineFunction *MF = nullptr;
+ const SIInstrInfo *TII = nullptr;
  void scanFunction();

    }
  };

  SmallVector<BasicBlockInfo, 16> BlockInfo;
  MachineFunction *MF = nullptr;
  const SIInstrInfo *TII = nullptr;

  void scanFunction();
  void adjustBlockOffsets(MachineBasicBlock &Start);
  uint64_t computeBlockSize(const MachineBasicBlock &MBB) const;
  unsigned getInstrOffset(const MachineInstr &MI) const;

public:
  static char ID;

  GCNPreRABranchDistance() : MachineFunctionPass(ID) {
    initializeGCNPreRABranchDistancePass(*PassRegistry::getPassRegistry());
  }

  void scanFunction(MachineFunction &MF);

  bool runOnMachineFunction(MachineFunction &MF) override;

  StringRef getPassName() const override {
    return "AMDGPU Pre-RA Branch Distance";
  }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.setPreservesAll();
    MachineFunctionPass::getAnalysisUsage(AU);
  }
};

} // End anonymous namespace.

char GCNPreRABranchDistance::ID = 0;

Inline comment, Pierre-vh (Unsubmitted, Done), suggested edit:
  char GCNPreRABranchDistance::ID = 0;
- INITIALIZE_PASS_BEGIN(GCNPreRABranchDistance, DEBUG_TYPE,
+ INITIALIZE_PASS(GCNPreRABranchDistance, DEBUG_TYPE,
  "AMDGPU Pre-RA Branch Distance", false, false)
- INITIALIZE_PASS_END(GCNPreRABranchDistance, DEBUG_TYPE,
- "Pre-RA Branch Distance", false, false)
  char &llvm::GCNPreRABranchDistanceID = GCNPreRABranchDistance::ID;

INITIALIZE_PASS(GCNPreRABranchDistance, DEBUG_TYPE,
                "AMDGPU Pre-RA Branch Distance", false, false)

char &llvm::GCNPreRABranchDistanceID = GCNPreRABranchDistance::ID;

/// scanFunction - Do the initial scan of the function, building up

/// information about each block.

void GCNPreRABranchDistance::scanFunction() {

BlockInfo.clear();

BlockInfo.resize(MF->getNumBlockIDs());

// First thing, compute the size of all basic blocks, and see if the function

// has any inline assembly in it. If so, we have to be conservative about

// alignment assumptions, as we don't know for sure the size of any

// instructions in the inline assembly.

for (MachineBasicBlock &MBB : *MF)

BlockInfo[MBB.getNumber()].Size = computeBlockSize(MBB);

// Compute block offsets and known bits.

adjustBlockOffsets(*MF->begin());

}

uint64_t

GCNPreRABranchDistance::computeBlockSize(const MachineBasicBlock &MBB) const {

uint64_t CodeSize = 0;

for (const MachineInstr &MI : MBB)

CodeSize += TII->getInstSizeInBytes(MI);

return CodeSize;

}

void GCNPreRABranchDistance::adjustBlockOffsets(MachineBasicBlock &Start) {

unsigned PrevNum = Start.getNumber();

for (auto &MBB :

make_range(std::next(MachineFunction::iterator(Start)), MF->end())) {

unsigned Num = MBB.getNumber();

// Get the offset and known bits at the end of the layout predecessor.

// Include the alignment of the current block.

BlockInfo[Num].Offset = BlockInfo[PrevNum].postOffset(MBB);

PrevNum = Num;

}

}

/// getInstrOffset - Return the current offset of the specified machine

/// instruction from the start of the function. This offset changes as stuff is

/// moved around inside the function.

unsigned GCNPreRABranchDistance::getInstrOffset(const MachineInstr &MI) const {

const MachineBasicBlock *MBB = MI.getParent();

// The offset is composed of two things: the sum of the sizes of all MBB's

// before this instruction's block, and the offset from the start of the block

// it is in.

unsigned Offset = BlockInfo[MBB->getNumber()].Offset;

// Sum instructions before MI in MBB.

for (MachineBasicBlock::const_iterator I = MBB->begin(); &*I != &MI; ++I) {

assert(I != MBB->end() && "Didn't find MI in its own basic block?");

Offset += TII->getInstSizeInBytes(*I);

}

return Offset;

}

bool GCNPreRABranchDistance::runOnMachineFunction(MachineFunction &Fn) {

MF = &Fn;

const GCNSubtarget &STM = MF->getSubtarget<GCNSubtarget>();

TII = STM.getInstrInfo();

SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();

// Do the initial scan of the function, building up information about the

// sizes of each block.

scanFunction();

for (MachineBasicBlock &MBB : *MF) {

MachineBasicBlock::iterator Last = MBB.getLastNonDebugInstr();

if (Last == MBB.end() || !Last->isUnconditionalBranch())

vpykhtin (Not Done):

If you always need only the offset of the last instruction, can you calculate it using the offset of the next MBB instead of summing every BB instruction's size?

continue;

Pierre-vh (Done):

MachineBasicBlock::iterator Last = MBB.getLastNonDebugInstr();

- if (Last == MBB.end())

+ if (Last == MBB.end() || !Last->isUnconditionalBranch())

continue;

- if (Last->isConditionalBranch())

- continue;

- if (Last->isUnconditionalBranch()) {

MachineBasicBlock *DestBB = TII->getBranchDestBlock(*Last);

If you're just interested in unconditional branches, you can remove an indentation level.

crobeck (Author, Done):

I believe we just care about unconditional branches, but input on the logic here is of interest as well.

MachineBasicBlock *DestBB = TII->getBranchDestBlock(*Last);

uint64_t DestOffset = BlockInfo[DestBB->getNumber()].Offset;

uint64_t SrcOffset = getInstrOffset(*Last);

uint64_t Offset =

arsenm (Done):

This made a change and needs to return true. You also didn't actually modify the set of reserved registers.

static_cast<uint64_t>(LongBranchFactor * (DestOffset - SrcOffset));

// We assume that if the branch offset falls out of range here

Pierre-vh (Done):

All of those are `unsigned`, I think, so I would use it here too to be consistent. There is also a use of `uint64_t` somewhere else.

// the branch is "long" and we need to reserve the register

if (!TII->isBranchOffsetInRange(Last->getOpcode(), Offset)) {

// For now, reserve highest available SGPR pair. After

// RA, shift down to a lower unused pair of SGPRs

Register Reg =

AMDGPU::SGPR_64RegClass.getRegister(STM.getMaxNumSGPRs(*MF) / 2 - 1);

MFI->setLongBranchReservedReg(Reg);

return true;

}

}

return false;

}
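
The distance estimate in the pass above, together with vpykhtin's suggestion to derive the last instruction's offset from the start of the next block, can be sketched outside of LLVM. This is only an illustration under stated assumptions: `Block`, `blockOffsets`, `lastInstrOffset`, and `scaledDistance` are hypothetical names, blocks are assumed to be laid out back to back with no alignment padding, every block is assumed non-empty, and the arithmetic is signed so that a backward branch yields a negative distance. The factor plays the role of `-amdgpu-long-branch-factor`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a machine basic block: one byte size per
// instruction.
using Block = std::vector<uint32_t>;

// Start offset of every block plus one trailing entry for the end of the
// function. Assumes back-to-back layout with no alignment padding.
std::vector<int64_t> blockOffsets(const std::vector<Block> &Blocks) {
  std::vector<int64_t> Offsets;
  Offsets.reserve(Blocks.size() + 1);
  int64_t Running = 0;
  for (const Block &B : Blocks) {
    Offsets.push_back(Running);
    for (uint32_t Size : B)
      Running += Size;
  }
  Offsets.push_back(Running);
  return Offsets;
}

// Offset of the last instruction in block Src, taken from the start of the
// next block instead of summing every instruction in Src.
int64_t lastInstrOffset(const std::vector<Block> &Blocks,
                        const std::vector<int64_t> &Offsets, std::size_t Src) {
  assert(Src < Blocks.size() && !Blocks[Src].empty());
  return Offsets[Src + 1] - static_cast<int64_t>(Blocks[Src].back());
}

// Signed distance from the branch ending block Src to the start of block
// Dest, scaled by a tuning factor. Negative values are backward branches.
int64_t scaledDistance(const std::vector<Block> &Blocks,
                       const std::vector<int64_t> &Offsets, std::size_t Src,
                       std::size_t Dest, double Factor) {
  int64_t Raw = Offsets[Dest] - lastInstrOffset(Blocks, Offsets, Src);
  return static_cast<int64_t>(Factor * static_cast<double>(Raw));
}
```

With three blocks of 12, 32, and 4 bytes, the branch at the end of the first block sits at offset 8 and the third block starts at offset 44, so the unscaled distance is 36 bytes; a factor of 2.0 doubles that estimate.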

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

Context not available.
  const SIRegisterInfo *TRI = ST.getRegisterInfo();
  MachineRegisterInfo &MRI = MF.getRegInfo();
  SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
+ bool RegsFrozen = false;

arsenm (Done): The reserved registers should already be frozen before PEI. I think the current freezeReservedRegs call here was just never updated to use the new incremental reserveReg.
crobeck (Author, Done): Just set both calls to reserveReg then?
crobeck (Author, Done): I think I've convinced myself these can both just be reserveReg calls.

  if (ST.hasMAIInsts() && !ST.hasGFX90AInsts()) {
    // On gfx908, we had initially reserved highest available VGPR for AGPR
Context not available.
      // identified VGPR (for AGPR copy).
      FuncInfo->setVGPRForAGPRCopy(UnusedLowVGPR);
      MRI.freezeReservedRegs(MF);

cdevadas (Done): Unify the `freezeReservedRegs` at the end of the function?
crobeck (Author, Done): Pending updating reserveReg call above.

+     RegsFrozen = true;
    }
  }

+ // We initially reserved the highest available SGPR pair for long branches
+ // now, after RA, we shift down to a lower unused one if one exists
+ if (FuncInfo->getLongBranchReservedReg()) {
+   Register UnusedLowSGPR =
+       TRI->findUnusedRegister(MRI, &AMDGPU::SGPR_64RegClass, MF);

jrbyrnes (Done): Will this ever be called directly after `freezeReservedRegs` for VGPRForAGPRCopy (i.e. on gfx908)? If so, in those cases, can we use `reserveReg` instead of the second `freezeReservedRegs`?
crobeck (Author, Done): That's a good point. Updated.

+   FuncInfo->setLongBranchReservedReg(UnusedLowSGPR);

Pierre-vh (Done): How does this work? Don't you need to check if `UnusedLowSGPR` returns something? Will it always return the pair we reserved earlier if nothing else is available? Would it be worth adding a test with `amdgpu-num-sgpr` to see what happens when 100% of SGPRs are used?
crobeck (Author, Done): Yes, you're right. I've made the checks more explicit.
crobeck (Author, Done): "Would it be worth adding a test with `amdgpu-num-sgpr` to see what happens when 100% of SGPRs are used?" That's a good idea. At the very least it is good to understand what happens when we're maxed out on SGPRs.
crobeck (Author, Done): This ended up being kind of a mess trying to do it with the amdgpu-num-sgpr attribute, so I just used inline asm to use up all the registers.

+   // Update reserved registers to include long branch ones
+   // if we've already called freezeReservedRegs above
+   // we can avoid recomputing the whole set of reserved regs and just call
+   // reserveReg instead
+   if (RegsFrozen)
+     MRI.reserveReg(UnusedLowSGPR, TRI);
+   else
+     MRI.freezeReservedRegs(MF);
+ }
  }

	// The special SGPR spills like the one needed for FP, BP or any reserved	// The special SGPR spills like the one needed for FP, BP or any reserved
Context not available.
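
The "reserve the highest pair before RA, then shift down after RA" idea discussed in this hunk can be modeled with plain indices. The sketch below is not the patch's code: `highestPairIndex`, `lowestUnusedPair`, and `shiftDown` are hypothetical helpers, registers are reduced to pair indices, and keeping the original pair when nothing lower is free is just one possible answer to Pierre-vh's question about the lookup coming back empty.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Index of the highest 64-bit pair for a budget of MaxSGPRs 32-bit registers,
// mirroring the `getMaxNumSGPRs(*MF) / 2 - 1` expression in the pass.
unsigned highestPairIndex(unsigned MaxSGPRs) {
  assert(MaxSGPRs >= 2);
  return MaxSGPRs / 2 - 1;
}

// Lowest pair index that is not marked as used, if there is one.
std::optional<unsigned> lowestUnusedPair(const std::vector<bool> &Used) {
  for (std::size_t I = 0; I < Used.size(); ++I)
    if (!Used[I])
      return static_cast<unsigned>(I);
  return std::nullopt;
}

// After allocation, move the reservation to a lower unused pair when one
// exists; otherwise keep the pair that was reserved up front (assumed policy).
unsigned shiftDown(unsigned Reserved, const std::vector<bool> &Used) {
  std::optional<unsigned> Low = lowestUnusedPair(Used);
  if (Low && *Low < Reserved)
    return *Low;
  return Reserved;
}
```

Checking the optional before using it is the point of the example: the fallback branch is what runs when every pair is occupied, which is the situation the inline-asm test mentioned above was written to exercise.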

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Context not available.

  MachineFunction *MF = MBB.getParent();
  MachineRegisterInfo &MRI = MF->getRegInfo();
+ SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();

  // FIXME: Virtual register workaround for RegScavenger not working with empty
  // blocks.

arsenm (Not Done): We should also fix this fixme at some point.
cdevadas (Done): const

Context not available.
  // buzz;

  RS->enterBasicBlockEnd(MBB);
- Register Scav = RS->scavengeRegisterBackwards(
-     AMDGPU::SReg_64RegClass, MachineBasicBlock::iterator(GetPC),
-     /* RestoreAfter */ false, 0, /* AllowSpill */ false);
+ Register LongBranchReservedReg = MFI->getLongBranchReservedReg();
+ Register Scav;

Pierre-vh (Done): If I understand correctly, `LongBranchReservedReg` is always reserved, so could we just avoid running the scavenger when it's set and just use it directly? Otherwise it's just a "wasted" reserved register. Nit: no `{}` for a one-line if body. Very small nit: you could also just use `Register()` as the value for LongBranchReservedReg, I think; it avoids the need to check against NoRegister. IMO it's more intuitive as you can just do `if (Register LongBranchReservedReg = MFI->getLongBranchReservedReg())` to check for its presence.
crobeck (Author, Done): LongBranchReservedReg is only set if the PreRABranchDistance pass sets it because it finds a long branch. Otherwise it is just set to null. Formatting of `LongBranchReservedReg != AMDGPU::NoRegister` seemed consistent with the surrounding code, but no strong feelings there. Open to changing.
crobeck (Author, Done): Updated.

+ // If we've previously reserved a register for long branches
+ // avoid running the scavenger and just use those registers
+ if (LongBranchReservedReg)
+   Scav = LongBranchReservedReg;
+ else
+   Scav = RS->scavengeRegisterBackwards(
+       AMDGPU::SReg_64RegClass, MachineBasicBlock::iterator(GetPC),
+       /* RestoreAfter */ false, 0, /* AllowSpill */ false);

  if (Scav) {
    RS->setRegUsed(Scav);
    MRI.replaceRegWith(PCReg, Scav);
Context not available.

arsenm (Not Done): Should also skip the enterBasicBlockEnd above.
crobeck (Author, Done): There is some init of MBB in enterBasicBlockEnd that breaks things if I skip it entirely. But I think I can get away with skipping some of it.
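
The selection logic in this hunk, prefer the pre-reserved register and only fall back to scavenging when none was reserved, can be shown in isolation. This is a minimal sketch with assumed names: `Reg` and `pickLongBranchReg` are hypothetical, register 0 stands for "no register" in the way a default-constructed `Register` does, and the scavenger is replaced by an arbitrary callback.

```cpp
#include <cassert>
#include <functional>

// Hypothetical register handle; 0 means "no register".
using Reg = unsigned;

// Return the reserved register when one exists. The scavenger callback is
// invoked only when nothing was reserved ahead of time.
Reg pickLongBranchReg(Reg Reserved, const std::function<Reg()> &Scavenge) {
  assert(static_cast<bool>(Scavenge));
  if (Reserved)
    return Reserved;
  return Scavenge();
}
```

Passing the scavenger as a callback makes the property under review easy to check: when a register was reserved, the fallback is never called.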

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

Context not available.
  SIMode Mode;
  std::optional<FrameIndex> ScavengeFI;
  StringValue VGPRForAGPRCopy;
+ StringValue LongBranchReservedReg;

  SIMachineFunctionInfo() = default;
  SIMachineFunctionInfo(const llvm::SIMachineFunctionInfo &,
Context not available.
    YamlIO.mapOptional("scavengeFI", MFI.ScavengeFI);
    YamlIO.mapOptional("vgprForAGPRCopy", MFI.VGPRForAGPRCopy,
                       StringValue()); // Don't print out when it's empty.
+   YamlIO.mapOptional("longBranchReservedReg", MFI.LongBranchReservedReg,
+                      StringValue());
  }
};

Context not available.
  // base to the beginning of the new function's frame.

Pierre-vh (Done): Add a small doxygen comment on top.

  Register StackPtrOffsetReg = AMDGPU::SP_REG;

+ // Registers that may be reserved when RA doesn't allocate enough
+ // registers to plan for the case where an indirect branch ends up
+ // being needed during branch relaxation.
+ Register LongBranchReservedReg;

arsenm (Done): This is missing MIR serialization. Also don't need the initializer.

  AMDGPUFunctionArgInfo ArgInfo;

  // Graphics info.

Pierre-vh (Done): I'm not a fan of the default argument; I would just let the caller set it. Generally, if I see a parameterless setter such as `setLongBranchReservedReg()`, I would expect that `LongBranchReservedReg` is a bool that we're setting to true. For other types of value I'd expect to have to pass a parameter. Note that it's just my opinion; if a reviewer disagrees or if there is a precedent for this style, then it's fine.
arsenm (Done): This shouldn't just hardcode to s[6:7]. Should pick the highest available and we can compact later (although I guess it's a bit awkward to move this one since relaxation runs after PEI, where we handle all the other shifted reserved registers).

Context not available.
    StackPtrOffsetReg = Reg;
  }

+ void setLongBranchReservedReg(Register Reg) { LongBranchReservedReg = Reg; }

  // Note the unset value for this is AMDGPU::SP_REG rather than
  // NoRegister. This is mostly a workaround for MIR tests where state that
  // can't be directly computed from the function is not preserved in serialized
Context not available.
    return StackPtrOffsetReg;
  }

+ Register getLongBranchReservedReg() const { return LongBranchReservedReg; }

  Register getQueuePtrUserSGPR() const {
    return ArgInfo.QueuePtr.getRegister();
  }
Context not available.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Context not available.
  for (Register Reg : MFI.getWWMReservedRegs())
    WWMReservedRegs.push_back(regToString(Reg, TRI));

+ if (MFI.getLongBranchReservedReg())
+   LongBranchReservedReg = regToString(MFI.getLongBranchReservedReg(), TRI);
  if (MFI.getVGPRForAGPRCopy())
    VGPRForAGPRCopy = regToString(MFI.getVGPRForAGPRCopy(), TRI);
  auto SFI = MFI.getOptionalScavengeFI();
Context not available.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Context not available.
    reserveRegisterTuples(Reserved, ScratchRSrcReg);
  }

+ Register LongBranchReservedReg = MFI->getLongBranchReservedReg();
+ if (LongBranchReservedReg)
+   reserveRegisterTuples(Reserved, LongBranchReservedReg);

  // We have to assume the SP is needed in case there are calls in the function,
  // which is detected after the function is lowered. If we aren't really going
  // to need SP, don't bother reserving it.
Context not available.

llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll

  ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
- ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -verify-machineinstrs -amdgpu-s-branch-bits=5 -o - %s | FileCheck %s
+ ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -verify-machineinstrs -amdgpu-s-branch-bits=5 -amdgpu-long-branch-factor=0 -o - %s | FileCheck %s

  define amdgpu_kernel void @spill(ptr addrspace(1) %arg, i32 %cnd) #0 {
  ; CHECK-LABEL: spill:

llvm/test/CodeGen/AMDGPU/branch-relaxation.ll

  ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
- ; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs -amdgpu-s-branch-bits=4 -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
+ ; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs -amdgpu-s-branch-bits=4 -simplifycfg-require-and-preserve-domtree=1 -amdgpu-long-branch-factor=0 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s


  ; FIXME: We should use llvm-mc for this, but we can't even parse our own output.
  ; See PR33579.
- ; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-s-branch-bits=4 -o %t.o -filetype=obj -simplifycfg-require-and-preserve-domtree=1 %s
+ ; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-s-branch-bits=4 -amdgpu-long-branch-factor=0 -o %t.o -filetype=obj -simplifycfg-require-and-preserve-domtree=1 %s
  ; RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=OBJ %s

  ; OBJ: Relocations [

llvm/test/CodeGen/AMDGPU/literal-constant-like-operand-instruction-size.ll

- ; RUN: llc -march=amdgcn -mcpu=gfx906 -verify-machineinstrs -amdgpu-s-branch-bits=6 < %s | FileCheck -check-prefix=GCN %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx906 -verify-machineinstrs -amdgpu-s-branch-bits=6 -amdgpu-long-branch-factor=0 < %s | FileCheck -check-prefix=GCN %s


  ; Restrict maximum branch to between +31 and -32 dwords
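
The comment in this test states that `-amdgpu-s-branch-bits=6` restricts branches to between +31 and -32 dwords, which is the range of a signed 6-bit value. The sketch below reproduces only that arithmetic; `DwordRange`, `branchRange`, and `fitsInBranchBits` are hypothetical names, and how the backend converts a byte distance into the dword offset it checks is not shown on this page and is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Inclusive range, in dwords, representable by a signed immediate.
struct DwordRange {
  int64_t Min;
  int64_t Max;
};

// Range of a signed two's-complement field that is Bits wide.
DwordRange branchRange(unsigned Bits) {
  assert(Bits >= 1 && Bits <= 63);
  int64_t Half = int64_t(1) << (Bits - 1);
  return {-Half, Half - 1};
}

// True when a dword offset can be encoded in a signed field of Bits bits.
bool fitsInBranchBits(int64_t Dwords, unsigned Bits) {
  DwordRange R = branchRange(Bits);
  return Dwords >= R.Min && Dwords <= R.Max;
}
```

For 6 bits this gives -32 through +31, matching the comment. Shrinking the field this way is what lets the tests trigger branch relaxation with only a handful of instructions between a branch and its target.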

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

Context not available.
  ; GCN-O0-NEXT: Virtual Register Map
  ; GCN-O0-NEXT: Live Register Matrix
  ; GCN-O0-NEXT: SI Pre-allocate WWM Registers
+ ; GCN-O0-NEXT: AMDGPU Pre-RA Branch Distance
  ; GCN-O0-NEXT: Fast Register Allocator
  ; GCN-O0-NEXT: SI lower SGPR spill instructions
  ; GCN-O0-NEXT: Fast Register Allocator
Context not available.
  ; GCN-O1-NEXT: Live Register Matrix
  ; GCN-O1-NEXT: SI Pre-allocate WWM Registers
  ; GCN-O1-NEXT: SI optimize exec mask operations pre-RA
+ ; GCN-O1-NEXT: AMDGPU Pre-RA Branch Distance
  ; GCN-O1-NEXT: Machine Natural Loop Construction
  ; GCN-O1-NEXT: Machine Block Frequency Analysis
  ; GCN-O1-NEXT: Debug Variable Analysis
Context not available.
  ; GCN-O1-OPTS-NEXT: Live Register Matrix
  ; GCN-O1-OPTS-NEXT: SI Pre-allocate WWM Registers
  ; GCN-O1-OPTS-NEXT: SI optimize exec mask operations pre-RA
+ ; GCN-O1-OPTS-NEXT: AMDGPU Pre-RA Branch Distance
  ; GCN-O1-OPTS-NEXT: Machine Natural Loop Construction
  ; GCN-O1-OPTS-NEXT: Machine Block Frequency Analysis
  ; GCN-O1-OPTS-NEXT: Debug Variable Analysis
Context not available.
  ; GCN-O2-NEXT: SI Pre-allocate WWM Registers
  ; GCN-O2-NEXT: SI optimize exec mask operations pre-RA
  ; GCN-O2-NEXT: SI Form memory clauses
+ ; GCN-O2-NEXT: AMDGPU Pre-RA Branch Distance
  ; GCN-O2-NEXT: Machine Natural Loop Construction
  ; GCN-O2-NEXT: Machine Block Frequency Analysis
  ; GCN-O2-NEXT: Debug Variable Analysis
Context not available.
  ; GCN-O3-NEXT: SI Pre-allocate WWM Registers
  ; GCN-O3-NEXT: SI optimize exec mask operations pre-RA
  ; GCN-O3-NEXT: SI Form memory clauses
+ ; GCN-O3-NEXT: AMDGPU Pre-RA Branch Distance
  ; GCN-O3-NEXT: Machine Natural Loop Construction
  ; GCN-O3-NEXT: Machine Block Frequency Analysis
  ; GCN-O3-NEXT: Debug Variable Analysis
Context not available.

llvm/test/CodeGen/AMDGPU/long-branch-reserve-register.ll

This file was added.

				; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-s-branch-bits=4 -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

				; OBJ: Relocations [
				; OBJ-NEXT: ]

				; Used to emit an always 4 byte instruction. Inline asm always assumes
				; each instruction is the maximum size.
				declare void @llvm.amdgcn.s.sleep(i32) #0

				declare i32 @llvm.amdgcn.workitem.id.x() #1


				define amdgpu_kernel void @uniform_conditional_max_short_forward_branch(ptr addrspace(1) %arg, i32 %cnd) #0 {
				; GCN-LABEL: uniform_conditional_max_short_forward_branch:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_load_dword s2, s[0:1], 0xb
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: s_cmp_eq_u32 s2, 0
				; GCN-NEXT: s_cbranch_scc1 .LBB0_2
				; GCN-NEXT: ; %bb.1: ; %bb2
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: s_sleep 0
				; GCN-NEXT: .LBB0_2: ; %bb3
				; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9
				; GCN-NEXT: s_mov_b32 s7, 0xf000
				; GCN-NEXT: s_mov_b32 s6, -1
				; GCN-NEXT: v_mov_b32_e32 v0, s2
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
				bb:
				%cmp = icmp eq i32 %cnd, 0
				br i1 %cmp, label %bb3, label %bb2 ; +8 dword branch

				bb2:
				; 24 bytes
				call void asm sideeffect
				"v_nop_e64
				v_nop_e64
				v_nop_e64", ""() #0
				call void @llvm.amdgcn.s.sleep(i32 0)
				br label %bb3

				bb3:
				store volatile i32 %cnd, ptr addrspace(1) %arg
				ret void
				}

				define amdgpu_kernel void @uniform_conditional_min_long_forward_branch(ptr addrspace(1) %arg, i32 %cnd) #0 {
				; GCN-LABEL: uniform_conditional_min_long_forward_branch:
				; GCN: ; %bb.0: ; %bb0
				; GCN-NEXT: s_load_dword s2, s[0:1], 0xb
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: s_cmp_eq_u32 s2, 0
				; GCN-NEXT: s_cbranch_scc0 .LBB1_1
				; GCN-NEXT: .LBB1_3: ; %bb0
				; GCN-NEXT: s_getpc_b64 s[8:9]
				; GCN-NEXT: .Lpost_getpc0:
				; GCN-NEXT: s_add_u32 s8, s8, (.LBB1_2-.Lpost_getpc0)&4294967295
				; GCN-NEXT: s_addc_u32 s9, s9, (.LBB1_2-.Lpost_getpc0)>>32
				; GCN-NEXT: s_setpc_b64 s[8:9]
				; GCN-NEXT: .LBB1_1: ; %bb2
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: .LBB1_2: ; %bb3
				; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9
				; GCN-NEXT: s_mov_b32 s7, 0xf000
				; GCN-NEXT: s_mov_b32 s6, -1
				; GCN-NEXT: v_mov_b32_e32 v0, s2
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
				bb0:
				%cmp = icmp eq i32 %cnd, 0
				br i1 %cmp, label %bb3, label %bb2 ; +9 dword branch

				bb2:
				; 32 bytes
				call void asm sideeffect
				"v_nop_e64
				v_nop_e64
				v_nop_e64
				v_nop_e64", ""() #0
				br label %bb3

				bb3:
				store volatile i32 %cnd, ptr addrspace(1) %arg
				ret void
				}

				define amdgpu_kernel void @uniform_conditional_min_long_forward_vcnd_branch(ptr addrspace(1) %arg, float %cnd) #0 {
				; GCN-LABEL: uniform_conditional_min_long_forward_vcnd_branch:
				; GCN: ; %bb.0: ; %bb0
				; GCN-NEXT: s_load_dword s2, s[0:1], 0xb
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: v_cmp_eq_f32_e64 s[4:5], s2, 0
				; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]
				; GCN-NEXT: s_cbranch_vccz .LBB2_1
				; GCN-NEXT: .LBB2_3: ; %bb0
				; GCN-NEXT: s_getpc_b64 s[8:9]
				; GCN-NEXT: .Lpost_getpc1:
				; GCN-NEXT: s_add_u32 s8, s8, (.LBB2_2-.Lpost_getpc1)&4294967295
				; GCN-NEXT: s_addc_u32 s9, s9, (.LBB2_2-.Lpost_getpc1)>>32
				; GCN-NEXT: s_setpc_b64 s[8:9]
				; GCN-NEXT: .LBB2_1: ; %bb2
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; 32 bytes
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: .LBB2_2: ; %bb3
				; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9
				; GCN-NEXT: s_mov_b32 s7, 0xf000
				; GCN-NEXT: s_mov_b32 s6, -1
				; GCN-NEXT: v_mov_b32_e32 v0, s2
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: buffer_store_dword v0, off, s[4:7], 0
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
				bb0:
				%cmp = fcmp oeq float %cnd, 0.0
				br i1 %cmp, label %bb3, label %bb2 ; + 8 dword branch

				bb2:
				call void asm sideeffect " ; 32 bytes
				v_nop_e64
				v_nop_e64
				v_nop_e64
				v_nop_e64", ""() #0
				br label %bb3

				bb3:
				store volatile float %cnd, ptr addrspace(1) %arg
				ret void
				}

				define amdgpu_kernel void @min_long_forward_vbranch(ptr addrspace(1) %arg) #0 {
				; GCN-LABEL: min_long_forward_vbranch:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GCN-NEXT: v_mov_b32_e32 v1, 0
				; GCN-NEXT: s_mov_b32 s3, 0xf000
				; GCN-NEXT: s_mov_b32 s2, 0
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: buffer_load_dword v2, v[0:1], s[0:3], 0 addr64 glc
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v1, s1
				; GCN-NEXT: v_add_i32_e32 v0, vcc, s0, v0
				; GCN-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
				; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v2
				; GCN-NEXT: s_and_saveexec_b64 s[0:1], vcc
				; GCN-NEXT: s_cbranch_execnz .LBB3_1
				; GCN-NEXT: .LBB3_3: ; %bb
				; GCN-NEXT: s_getpc_b64 s[4:5]
				; GCN-NEXT: .Lpost_getpc2:
				; GCN-NEXT: s_add_u32 s4, s4, (.LBB3_2-.Lpost_getpc2)&4294967295
				; GCN-NEXT: s_addc_u32 s5, s5, (.LBB3_2-.Lpost_getpc2)>>32
				; GCN-NEXT: s_setpc_b64 s[4:5]
				; GCN-NEXT: .LBB3_1: ; %bb2
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; 32 bytes
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: .LBB3_2: ; %bb3
				; GCN-NEXT: s_or_b64 exec, exec, s[0:1]
				; GCN-NEXT: s_mov_b32 s0, s2
				; GCN-NEXT: s_mov_b32 s1, s2
				; GCN-NEXT: buffer_store_dword v2, v[0:1], s[0:3], 0 addr64
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
				bb:
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = zext i32 %tid to i64
				%gep = getelementptr inbounds i32, ptr addrspace(1) %arg, i64 %tid.ext
				%load = load volatile i32, ptr addrspace(1) %gep
				%cmp = icmp eq i32 %load, 0
				br i1 %cmp, label %bb3, label %bb2 ; + 8 dword branch

				bb2:
				call void asm sideeffect " ; 32 bytes
				v_nop_e64
				v_nop_e64
				v_nop_e64
				v_nop_e64", ""() #0
				br label %bb3

				bb3:
				store volatile i32 %load, ptr addrspace(1) %gep
				ret void
				}

				define amdgpu_kernel void @long_backward_sbranch(ptr addrspace(1) %arg) #0 {
				; GCN-LABEL: long_backward_sbranch:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_mov_b32 s0, 0
				; GCN-NEXT: .LBB4_1: ; %bb2
				; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
				; GCN-NEXT: s_add_i32 s0, s0, 1
				; GCN-NEXT: s_cmp_lt_i32 s0, 10
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: s_cbranch_scc0 .LBB4_2
				; GCN-NEXT: .LBB4_3: ; %bb2
				; GCN-NEXT: ; in Loop: Header=BB4_1 Depth=1
				; GCN-NEXT: s_getpc_b64 s[2:3]
				; GCN-NEXT: .Lpost_getpc3:
				; GCN-NEXT: s_add_u32 s2, s2, (.LBB4_1-.Lpost_getpc3)&4294967295
				; GCN-NEXT: s_addc_u32 s3, s3, (.LBB4_1-.Lpost_getpc3)>>32
				; GCN-NEXT: s_setpc_b64 s[2:3]
				; GCN-NEXT: .LBB4_2: ; %bb3
				; GCN-NEXT: s_endpgm

				bb:
				br label %bb2

				bb2:
				%loop.idx = phi i32 [ 0, %bb ], [ %inc, %bb2 ]
				; 24 bytes
				call void asm sideeffect
				"v_nop_e64
				v_nop_e64
				v_nop_e64", ""() #0
				%inc = add nsw i32 %loop.idx, 1 ; add cost 4
				%cmp = icmp slt i32 %inc, 10 ; condition cost = 8
				br i1 %cmp, label %bb2, label %bb3 ; -

				bb3:
				ret void
				}

				; Requires expansion of unconditional branch from %bb2 to %bb4 (and
				; expansion of conditional branch from %bb to %bb3.

				define amdgpu_kernel void @uniform_unconditional_min_long_forward_branch(ptr addrspace(1) %arg, i32 %arg1) {
				; GCN-LABEL: uniform_unconditional_min_long_forward_branch:
				; GCN: ; %bb.0: ; %bb0
				; GCN-NEXT: s_load_dword s2, s[0:1], 0xb
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: s_cmp_eq_u32 s2, 0
				; GCN-NEXT: s_mov_b64 s[2:3], -1
				; GCN-NEXT: s_cbranch_scc0 .LBB5_1
				; GCN-NEXT: .LBB5_7: ; %bb0
				; GCN-NEXT: s_getpc_b64 s[4:5]
				; GCN-NEXT: .Lpost_getpc5:
				; GCN-NEXT: s_add_u32 s4, s4, (.LBB5_4-.Lpost_getpc5)&4294967295
				; GCN-NEXT: s_addc_u32 s5, s5, (.LBB5_4-.Lpost_getpc5)>>32
				; GCN-NEXT: s_setpc_b64 s[4:5]
				; GCN-NEXT: .LBB5_1: ; %Flow
				; GCN-NEXT: s_andn2_b64 vcc, exec, s[2:3]
				; GCN-NEXT: s_cbranch_vccnz .LBB5_3
				; GCN-NEXT: .LBB5_2: ; %bb2
				; GCN-NEXT: s_mov_b32 s3, 0xf000
				; GCN-NEXT: s_mov_b32 s2, -1
				; GCN-NEXT: v_mov_b32_e32 v0, 17
				; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: .LBB5_3: ; %bb4
				; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; GCN-NEXT: s_mov_b32 s3, 0xf000
				; GCN-NEXT: s_mov_b32 s2, -1
				; GCN-NEXT: s_waitcnt expcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v0, 63
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
				; GCN-NEXT: .LBB5_4: ; %bb3
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: s_mov_b64 vcc, exec
				; GCN-NEXT: s_cbranch_execnz .LBB5_5
				; GCN-NEXT: .LBB5_9: ; %bb3
				; GCN-NEXT: s_getpc_b64 s[4:5]
				; GCN-NEXT: .Lpost_getpc6:
				; GCN-NEXT: s_add_u32 s4, s4, (.LBB5_2-.Lpost_getpc6)&4294967295
				; GCN-NEXT: s_addc_u32 s5, s5, (.LBB5_2-.Lpost_getpc6)>>32
				; GCN-NEXT: s_setpc_b64 s[4:5]
				; GCN-NEXT: .LBB5_5: ; %bb3
				; GCN-NEXT: s_getpc_b64 s[4:5]
				; GCN-NEXT: .Lpost_getpc4:
				; GCN-NEXT: s_add_u32 s4, s4, (.LBB5_3-.Lpost_getpc4)&4294967295
				; GCN-NEXT: s_addc_u32 s5, s5, (.LBB5_3-.Lpost_getpc4)>>32
				; GCN-NEXT: s_setpc_b64 s[4:5]
				bb0:
				%tmp = icmp ne i32 %arg1, 0
				br i1 %tmp, label %bb2, label %bb3

				bb2:
				store volatile i32 17, ptr addrspace(1) undef
				br label %bb4

				bb3:
				; 32 byte asm
				call void asm sideeffect
				"v_nop_e64
				v_nop_e64
				v_nop_e64
				v_nop_e64", ""() #0
				br label %bb4

				bb4:
				store volatile i32 63, ptr addrspace(1) %arg
				ret void
				}

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }
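Note on the expected output above: each expanded long branch reads the PC into an SGPR pair with s_getpc_b64, adds a 64-bit label difference in two 32-bit halves (s_add_u32 for the low half, s_addc_u32 for the high half plus carry), and jumps with s_setpc_b64. A minimal C++ sketch of that arithmetic follows. It is an illustration only, not code from this patch; SgprPair and addOffsetToPc are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an SGPR pair such as s[2:3] holding a 64-bit PC.
struct SgprPair {
  uint32_t lo; // e.g. s2
  uint32_t hi; // e.g. s3
};

// Models the s_add_u32 / s_addc_u32 pair from the checks above.
// 'offset' stands for the label difference (.LBBx_y - .Lpost_getpcN).
SgprPair addOffsetToPc(SgprPair pc, int64_t offset) {
  uint64_t bits = static_cast<uint64_t>(offset);
  uint32_t offLo = static_cast<uint32_t>(bits & 4294967295ull); // (off)&4294967295
  uint32_t offHi = static_cast<uint32_t>(bits >> 32);           // (off)>>32

  // s_add_u32: 32-bit add of the low halves, producing a carry-out.
  uint64_t sumLo = static_cast<uint64_t>(pc.lo) + offLo;
  uint32_t carry = static_cast<uint32_t>(sumLo >> 32);

  SgprPair result;
  result.lo = static_cast<uint32_t>(sumLo);
  // s_addc_u32: 32-bit add of the high halves plus the carry-in.
  result.hi = pc.hi + offHi + carry;
  return result;
}
```

A backward branch works the same way: a negative offset has all-ones in its high half, so the high add wraps modulo 2^32 and the pair ends up holding the lower address.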

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-after-pei.ll

Context not available.
	; AFTER-PEI-NEXT: occupancy: 5
	; AFTER-PEI-NEXT: scavengeFI: '%fixed-stack.0'
	; AFTER-PEI-NEXT: vgprForAGPRCopy: ''
		; AFTER-PEI-NEXT: longBranchReservedReg: ''
	; AFTER-PEI-NEXT: body:
	define amdgpu_kernel void @scavenge_fi(ptr addrspace(1) %out, i32 %in) #0 {
	%wide.sgpr0 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0
Context not available.

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg.ll

This file was added.

				; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -amdgpu-s-branch-bits=4 -stop-after=branch-relaxation -verify-machineinstrs %s -o - | FileCheck %s

				; Test that long branch reserved register is serialized through
				; MIR.

				; CHECK-LABEL: {{^}}name: uniform_long_forward_branch
				; CHECK: machineFunctionInfo:
				; CHECK-NEXT: explicitKernArgSize: 12
				; CHECK-NEXT: maxKernArgAlign: 8
				; CHECK-NEXT: ldsSize: 0
				; CHECK-NEXT: gdsSize: 0
				; CHECK-NEXT: dynLDSAlign: 1
				; CHECK-NEXT: isEntryFunction: true
				; CHECK-NEXT: noSignedZerosFPMath: false
				; CHECK-NEXT: memoryBound: false
				; CHECK-NEXT: waveLimiter: false
				; CHECK-NEXT: hasSpilledSGPRs: false
				; CHECK-NEXT: hasSpilledVGPRs: false
				; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
				; CHECK-NEXT: frameOffsetReg: '$fp_reg'
				; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'
				; CHECK-NEXT: bytesInStackArgArea: 0
				; CHECK-NEXT: returnsVoid: true
				; CHECK-NEXT: argumentInfo:
				; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
				; CHECK-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
				; CHECK-NEXT: workGroupIDX: { reg: '$sgpr6' }
				; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }
				; CHECK-NEXT: workItemIDX: { reg: '$vgpr0' }
				; CHECK-NEXT: psInputAddr: 0
				; CHECK-NEXT: psInputEnable: 0
				; CHECK-NEXT: mode:
				; CHECK-NEXT: ieee: true
				; CHECK-NEXT: dx10-clamp: true
				; CHECK-NEXT: fp32-input-denormals: true
				; CHECK-NEXT: fp32-output-denormals: true
				; CHECK-NEXT: fp64-fp16-input-denormals: true
				; CHECK-NEXT: fp64-fp16-output-denormals: true
				; CHECK-NEXT: highBitsOf32BitAddress: 0
				; CHECK-NEXT: occupancy: 8
				; CHECK-NEXT: vgprForAGPRCopy: ''
				; CHECK-NEXT: longBranchReservedReg: '$sgpr2_sgpr3'
				; CHECK-NEXT: body:
				define amdgpu_kernel void @uniform_long_forward_branch(ptr addrspace(1) %arg, i32 %arg1) {
				bb0:
				%tmp = icmp ne i32 %arg1, 0
				br i1 %tmp, label %bb2, label %bb3

				bb2:
				store volatile i32 17, ptr addrspace(1) undef
				br label %bb4

				bb3:
				; 32 byte asm
				call void asm sideeffect
				"v_nop_e64
				v_nop_e64
				v_nop_e64
				v_nop_e64", ""() #0
				br label %bb4

				bb4:
				store volatile i32 63, ptr addrspace(1) %arg
				ret void
				}

				attributes #0 = { nounwind }
				arsenm: Add a test to show no change for debug instructions
				attributes #1 = { nounwind readnone }
				Pierre-vh: missing newline
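Note on the RUN line of this test: -amdgpu-s-branch-bits=4 narrows the branch offset field so that even a 32-byte inline asm block is enough to trigger relaxation. The sketch below shows the kind of range check involved, assuming the offset is encoded in dwords, is relative to the instruction after the branch, and must fit in a signed field of the given width. The function names are hypothetical and the code is a self-contained approximation, not the in-tree implementation.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for an "is N-bit signed integer" check; valid for 1 <= n <= 63.
bool fitsSignedBits(unsigned n, int64_t x) {
  int64_t lo = -(int64_t(1) << (n - 1));
  int64_t hi = (int64_t(1) << (n - 1)) - 1;
  return x >= lo && x <= hi;
}

// Hypothetical model of the branch range check under the assumptions
// stated above. 'byteOffset' is the distance to the branch target in bytes.
bool branchOffsetInRange(unsigned branchBits, int64_t byteOffset) {
  int64_t dwords = byteOffset / 4; // offsets are encoded in dwords
  dwords -= 1;                     // relative to the next instruction
  return fitsSignedBits(branchBits, dwords);
}
```

Under these assumptions a 4-bit field reaches at most 32 bytes forward, so a target any further away cannot be encoded directly and needs the s_getpc_b64/s_setpc_b64 sequence, which in turn needs the reserved SGPR pair.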

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir

Context not available.
	# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: occupancy: 8
	# FULL-NEXT: vgprForAGPRCopy: ''
		# FULL-NEXT: longBranchReservedReg: ''
	# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:
Context not available.
	# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: occupancy: 8
	# FULL-NEXT: vgprForAGPRCopy: ''
		# FULL-NEXT: longBranchReservedReg: ''
	# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:
Context not available.
	# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: occupancy: 8
	# FULL-NEXT: vgprForAGPRCopy: ''
		# FULL-NEXT: longBranchReservedReg: ''
	# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:
Context not available.
	# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: occupancy: 8
	# FULL-NEXT: vgprForAGPRCopy: ''
		# FULL-NEXT: longBranchReservedReg: ''
	# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:
Context not available.

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll

Context not available.
	; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: occupancy: 8
	; CHECK-NEXT: vgprForAGPRCopy: ''
		; CHECK-NEXT: longBranchReservedReg: ''
	; CHECK-NEXT: body:
	define amdgpu_kernel void @kernel(i32 %arg0, i64 %arg1, <16 x i32> %arg2) {
	%gep = getelementptr inbounds [512 x float], ptr addrspace(3) @lds, i32 0, i32 %arg0
Context not available.
	; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: occupancy: 10
	; CHECK-NEXT: vgprForAGPRCopy: ''
		; CHECK-NEXT: longBranchReservedReg: ''
	; CHECK-NEXT: body:
	define amdgpu_ps void @ps_shader(i32 %arg0, i32 inreg %arg1) {
	%gep = getelementptr inbounds [128 x i32], ptr addrspace(2) @gds, i32 0, i32 %arg0
Context not available.
	; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: occupancy: 8
	; CHECK-NEXT: vgprForAGPRCopy: ''
		; CHECK-NEXT: longBranchReservedReg: ''
	; CHECK-NEXT: body:
	define void @function() {
	ret void
Context not available.
	; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: occupancy: 8
	; CHECK-NEXT: vgprForAGPRCopy: ''
		; CHECK-NEXT: longBranchReservedReg: ''
	; CHECK-NEXT: body:
	define void @function_nsz() #0 {
	ret void
Context not available.

Diff 527688
