This is an archive of the discontinued LLVM Phabricator instance.

Differential D109300

[AMDGPU] Make vector superclasses allocatable
ClosedPublic

Authored by cdevadas on Sep 5 2021, 10:15 PM.

Details

Reviewers

arsenm
rampitec

Commits

rG654c89d85a51: [AMDGPU] Make vector superclasses allocatable

Summary

The combined vector register classes containing both
VGPRs and AGPRs are currently unallocatable.
This patch makes them allocatable as a
prerequisite for enabling copies between VGPR and
AGPR registers during regalloc.

It also adds the missing AV register classes from
192b to 1024b.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cdevadas created this revision. Sep 5 2021, 10:15 PM

Herald added subscribers: foad, kerbowa, hiraditya and 7 others. Sep 5 2021, 10:15 PM

cdevadas requested review of this revision. Sep 5 2021, 10:15 PM

Herald added a project: Restricted Project. Sep 5 2021, 10:15 PM

Herald added subscribers: llvm-commits, wdng.

cdevadas added a child revision: D109301: [AMDGPU] Enable copy between VGPR and AGPR classes during regalloc. Sep 5 2021, 10:19 PM

Harbormaster completed remote builds in B122715: Diff 370841. Sep 5 2021, 10:23 PM

+@critson for awareness.

Can you specifically mention VGPRs and AGPRs in the commit message/description?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11395	Can you shortcut the getRegClassForReg if not virtual?

Allow generating psets for the AV regclass.
Addressed the review comments.

Harbormaster completed remote builds in B123226: Diff 371594. Sep 9 2021, 8:44 AM

Redefined multiclass AVRegClass to take vregList and aregList separately and applied the decimate operation on them individually.
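
Illustration (not part of the patch): the standalone C++ sketch below shows how decimating a concatenated register list can differ from decimating the VGPR and AGPR lists individually. It is not TableGen; `decimate`, `concat` and `makeRegs` are hypothetical helpers, with `decimate` mimicking TableGen's "keep every N-th element, starting with the first" set operation. The register names and list lengths are made up, and this is only one possible reason for the change, not a statement of the author's motivation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using RegList = std::vector<std::string>;

// Hypothetical helper: keep every N-th element, starting with the first.
RegList decimate(const RegList &Regs, std::size_t N) {
  assert(N > 0 && "stride must be positive");
  RegList Out;
  for (std::size_t I = 0; I < Regs.size(); I += N)
    Out.push_back(Regs[I]);
  return Out;
}

// Hypothetical helper: append B to a copy of A.
RegList concat(RegList A, const RegList &B) {
  A.insert(A.end(), B.begin(), B.end());
  return A;
}

// Hypothetical helper: build the names Prefix0 .. Prefix(Count-1).
RegList makeRegs(const std::string &Prefix, std::size_t Count) {
  RegList Out;
  for (std::size_t I = 0; I < Count; ++I)
    Out.push_back(Prefix + std::to_string(I));
  return Out;
}
```

With an odd-length first list, `decimate(concat(V, A), 2)` continues its stride across the boundary and starts the second list at its second element, whereas `concat(decimate(V, 2), decimate(A, 2))` restarts at the first element of each list.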

Harbormaster completed remote builds in B123594: Diff 372125. Sep 12 2021, 10:26 AM

arsenm added inline comments. Sep 13 2021, 1:26 PM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2789	This doesn't flow logically. It shouldn't check hasVectorRegisters; it should just use isVGPR. You override this decision later with the AGPR class.

cdevadas added inline comments. Sep 13 2021, 7:45 PM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2789	That check was added to consider AV classes too. The opcode is later changed to V_ACCVGPR_WRITE_B32_e64 if it is an AGPR-only class.

Rebase
+ Ping

Harbormaster completed remote builds in B124605: Diff 373501. Sep 20 2021, 1:50 AM

Do you know how RA will choose registers for an AV operand? V, A, and AV seem to have the same AllocationPriority, so what exactly will RA be doing? At least on gfx908 we would want it to pick the least congested RC. Well, after allocating everything which truly requires either V or A.
In fact I believe it will do exactly the opposite: given that wider tuples have higher priority, RA will likely start with AV before any other allocation is considered.

llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
81	So for AV and VS it will always tell VGPR? It does not seem conceptually right.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
262	Now that AV is allocatable I suspect we will miss a lot of optimizations with multiple changes like this (e.g. hasAGPRs -> isAGPRClass) and maybe even make wrong and illegal decisions, because we will be unable to correctly identify whether a register is a VGPR or an AGPR when presented with an AV.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
895	What will happen if that is AV?
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2432	I do not think there can be a separate pressure on AV. What does that mean for AV pressure if you have pressure 256 on V and 256 on A? It cannot be increased or decreased separately.
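
Illustration (not part of the patch): the concern about `hasAGPRs` versus `isAGPRClass`, and about `getRegKind` reporting VGPR for AV and VS classes, can be seen in a toy model. Everything below is a simplified assumption: `ToyRegClass` only records which register files a class contains, and the predicates are modelled on the names used in this review rather than copied from SIRegisterInfo, whose real definitions may differ in detail.

```cpp
#include <cassert>

// Toy register class: records only which register files it contains.
struct ToyRegClass {
  bool HasSGPRs;
  bool HasVGPRs;
  bool HasAGPRs;
};

// "Contains at least one AGPR": true for both pure AGPR and combined AV classes.
bool hasAGPRs(const ToyRegClass &RC) { return RC.HasAGPRs; }

// "Contains only AGPRs": false for a combined AV class.
bool isAGPRClass(const ToyRegClass &RC) {
  return RC.HasAGPRs && !RC.HasVGPRs && !RC.HasSGPRs;
}

bool isSGPRClass(const ToyRegClass &RC) {
  return RC.HasSGPRs && !RC.HasVGPRs && !RC.HasAGPRs;
}

// Combined VGPR + AGPR class (AV_*).
bool isVectorSuperClass(const ToyRegClass &RC) {
  return RC.HasVGPRs && RC.HasAGPRs && !RC.HasSGPRs;
}

enum class RegKind { SGPR, VGPR, AGPR };

// Same shape as the ternary chain in getRegKind after the patch:
// anything that is neither pure SGPR nor pure AGPR is reported as VGPR.
RegKind getRegKind(const ToyRegClass &RC) {
  if (isSGPRClass(RC))
    return RegKind::SGPR;
  if (isAGPRClass(RC))
    return RegKind::AGPR;
  return RegKind::VGPR; // AV and VS classes end up here
}
```

In this model an AV class satisfies `hasAGPRs` but not `isAGPRClass`, so code switched from the former to the latter treats AV registers as VGPRs. That is harmless only if no live AV operands exist at that point.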

In D109300#3010576, @rampitec wrote:

Do you know how RA will choose registers for an AV operand? V, A, and AV seem to have the same AllocationPriority, so what exactly will RA be doing?

You can set an explicit allocation order for a class. I think what you get now is VGPRs are higher priority than AGPRs. The real point is that RA can decide to introduce RA temporary registers to relieve pressure

In D109300#3010586, @arsenm wrote:

You can set an explicit allocation order for a class. I think what you get now is VGPRs are higher priority than AGPRs. The real point is that RA can decide to introduce RA temporary registers to relieve pressure

An explicit order interleaving registers is probably needed, at least on gfx908. However, it does not solve all the issues. Imagine you have set it and allocated 3x32 VGPR tuples and 3x32 AGPR tuples as a result. Then RA starts allocating smaller registers and needs 64 more VGPRs. It will end up with 96 + 64 = 160 VGPRs and 96 AGPRs, where it could have gotten away with 128 of each class. This is a 2x performance difference.
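
Illustration (not part of the patch): the arithmetic in the comment above can be replayed with a small C++ sketch. The model is an assumption made for this example only: six 32-register tuples may go to either file, 64 further registers must be VGPRs, the larger of the two counts is what limits occupancy, and the number of waves is taken as the per-file budget divided by that count (allocation granularity and hardware wave caps are ignored). All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

struct VectorRegUsage {
  unsigned VGPRs;
  unsigned AGPRs;
};

// Assumption: the larger of the two counts limits occupancy.
unsigned limitingPressure(const VectorRegUsage &U) {
  return std::max(U.VGPRs, U.AGPRs);
}

// Place TuplesInVGPRs of the flexible tuples in VGPRs and the rest in AGPRs,
// on top of FixedVGPRs registers that must be VGPRs.
VectorRegUsage place(unsigned NumTuples, unsigned TupleSize,
                     unsigned TuplesInVGPRs, unsigned FixedVGPRs) {
  assert(TuplesInVGPRs <= NumTuples);
  return {FixedVGPRs + TuplesInVGPRs * TupleSize,
          (NumTuples - TuplesInVGPRs) * TupleSize};
}

// Try every split of the flexible tuples and keep the one with the lowest
// limiting pressure.
VectorRegUsage bestPlacement(unsigned NumTuples, unsigned TupleSize,
                             unsigned FixedVGPRs) {
  VectorRegUsage Best = place(NumTuples, TupleSize, 0, FixedVGPRs);
  for (unsigned K = 1; K <= NumTuples; ++K) {
    VectorRegUsage U = place(NumTuples, TupleSize, K, FixedVGPRs);
    if (limitingPressure(U) < limitingPressure(Best))
      Best = U;
  }
  return Best;
}

// Simplified occupancy: how many waves fit into a per-file register budget.
unsigned wavesThatFit(const VectorRegUsage &U, unsigned RegsPerFile) {
  assert(limitingPressure(U) > 0);
  return RegsPerFile / limitingPressure(U);
}
```

Under these assumptions `place(6, 32, 3, 64)` reproduces the 160 VGPR / 96 AGPR outcome and `bestPlacement(6, 32, 64)` finds the 128 / 128 split. With a budget of 256 registers per file that is one wave versus two, which matches the 2x figure quoted in the comment.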

Yet another question concerns gfx90a. Assume we are reading matrix C from memory into a register tuple. The mfma would need to use AGPR, but the load may use either AGPR or VGPR and has AV operand (same for a store). How likely will it happen that a VGPR tuple will be used for the load and then copied into AGPR? Can this happen?

In D109300#3010612, @rampitec wrote:

Yet another question concerns gfx90a. Assume we are reading matrix C from memory into a register tuple. The mfma would need to use AGPR, but the load may use either AGPR or VGPR and has AV operand (same for a store). How likely will it happen that a VGPR tuple will be used for the load and then copied into AGPR? Can this happen?

Ideally we would have the concrete classes chosen ahead of time (i.e. regbankselect analyses the uses and defs and sets the bank, resulting in the concrete class). The main reason to make this change is to allow the allocator to split ranges with cross class copies. We wouldn't generally want AV_* classes in the incoming MIR.

In D109300#3010650, @arsenm wrote:

Ideally we would have the concrete classes chosen ahead of time (i.e. regbankselect analyses the uses and defs and sets the bank, resulting in the concrete class). The main reason to make this change is to allow the allocator to split ranges with cross class copies. We wouldn't generally want AV_* classes in the incoming MIR.

In general the selector cannot do it. You need to know RP by class and choose an RC accordingly. Moreover, on gfx90a you would also need to choose an instruction accordingly.

In D109300#3010692, @rampitec wrote:

In general the selector cannot do it. You need to know RP by class and choose an RC accordingly. Moreover, on gfx90a you would also need to choose an instruction accordingly.

Isn't it possible to select legal register classes by having constrained register operands in the instruction definition itself?

llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
262	I agree we might miss out on certain optimizations. That should be handled separately. COPY is a special instruction: on AMDGPU the machine opcode changes based on the target register. I am forcing any pre-RA copy with an AV class to the VGPR class. But illegal decisions should be averted. If one happens, it mostly occurs at selection.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
895	I guess this function, `copyPhysReg` will only be called post-RA.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2432	The assert here https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/CodeGenRegisters.cpp#L2028 enforces at least one PSet when we make the AV classes allocatable. There should have been a flag to entirely avoid the psets for allocatable regclasses.

In D109300#3012033, @cdevadas wrote:

Isn't it possible to select legal register classes by having constrained register operands in the instruction definition itself?

You could do specific instructions. Moreover, gfx90a has introduced _acd and _vcd mfma variants because the register classes of SrcC and Dst are tied. For SrcA and SrcB we actually have the freedom to use either RC independently. So we need to decide which register class to use and then, consequently, which instruction to use. That decision cannot reasonably be made at selection time; in fact it can only be made during regalloc, when we know the actual pressure. Moreover, the decision will be different for gfx908 and gfx90a. For gfx908 you would want to allocate an equal amount of both types of registers, and for gfx90a you would want to stay with VGPRs as long as you can. Then changing an RC will also require replacing other instructions dealing with it: v_accvgpr_* will turn into v_mov_b32 and vice versa, some v_accvgpr_* instructions will be dropped, etc. And all of that is only possible inside the RA.

Right now we do not do anything like this: SrcC and Dst are always AGPRs, and SrcA and SrcB are VGPRs, I believe. It is a suboptimal but easy choice. But since you are enabling AV operands you are leaving this to RA to decide, and my question is how exactly it will decide. We could constrain the reg classes of mfma operands just like we do now. That is probably the easiest way to at least preserve the functionality.

One thing to keep in mind: an allocatable AV class directly affects the instructions we have to generate, not just select but also create in many different passes. I.e. if the specific class is AGPR we have to use v_accvgpr_*; if it is VGPR we have to use v_mov_b32. Then v_accvgpr_write_b32 does not work with immediates on gfx908, it does not work with SGPR sources, loads and stores do not work with AGPRs on gfx908, etc. I.e. that would be a huge and unsound codegen change across the BE.

With all of that I assume we have to always use a specific RC for everything we select and generate. We basically cannot use AV as an operand of any instruction we have produced. The only use for the allocatable AV shall be RA copying registers instead of spilling. I do not think that is what this change is doing now.
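
Illustration (not part of the patch): the dependency between the destination register class and the move opcode described above can be sketched as a small decision function. This only encodes the gfx908 constraints exactly as stated in the comment (VGPR destinations use v_mov_b32; AGPR destinations use v_accvgpr_write_b32, which per the comment accepts neither immediates nor SGPR sources). The enum and function names are hypothetical and do not correspond to backend APIs.

```cpp
#include <cassert>

enum class DstClass { VGPR, AGPR, AV };
enum class SrcKind { Immediate, SGPR, VGPR };

enum class MoveChoice {
  VMovB32,         // v_mov_b32
  AccVGPRWriteB32, // v_accvgpr_write_b32
  ViaTempVGPR,     // first materialize the source in a temporary VGPR
  Undecided        // class is AV: no opcode can be picked yet
};

// Pick a 32-bit move for gfx908, following the constraints quoted in the review.
MoveChoice chooseMoveGfx908(DstClass Dst, SrcKind Src) {
  switch (Dst) {
  case DstClass::VGPR:
    return MoveChoice::VMovB32;
  case DstClass::AGPR:
    // Per the comment, the AGPR write only works with a VGPR source here.
    return Src == SrcKind::VGPR ? MoveChoice::AccVGPRWriteB32
                                : MoveChoice::ViaTempVGPR;
  case DstClass::AV:
    return MoveChoice::Undecided;
  }
  return MoveChoice::Undecided;
}
```

The `Undecided` result for an AV destination is the point of the comment: as long as an operand is still AV, any pass that has to emit a concrete move cannot know which instruction is legal.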

llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
262	The point of this code is to avoid unneeded copies between VGPRs and AGPRs. Forcing it to VGPRs misses the point.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
895	Ah, right.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2432	There is GeneratePressureSet to avoid it, but the issue is conceptual. I do not understand a strategy of tracking pressure for a combined class.

rampitec added inline comments. Sep 21 2021, 12:48 PM

llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
81	Given the last comment, if we have no real AV operands in any instructions this would be fine.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
730	Then this shall not be needed because we should not have situations like this.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11396	And this shall remain as is to not allow any live AV operands.

I have just applied the patch and verified there are no live AV operands after selection and basically anywhere. So I guess that clears the concerns of how RA would select an actual RC - it would not. This is what we need I suppose.
I am still unclear about PSet though. Maybe it is better to suppress PSet generation?

rampitec added inline comments. Sep 21 2021, 3:17 PM

llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
262	Given no actual AV operand this is OK.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2789	You should not have any AV at this point; these are all resolved. It was already working with AGPRs, so this portion does not seem to be needed. However, reverting it revealed a problem with global isel (`llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx908 -o - -verify-machineinstrs < llvm/test/CodeGen/AMDGPU/ds_gws_align.ll`): `%13:av_32 = COPY %8:sreg_32` and `%14:av_32 = COPY %9:sreg_32`. You have worked around this by folding the immediate, but really global isel shall not produce AV operands.
4844	It is still isAGPR. No need to legalize VGPR.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2420	So again this is not that simple with a combined class. This is a lower bound, probably half of the real value. But I guess we cannot answer better.
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
1430	It can only be VGPR here.
1440	Ditto.

cdevadas added inline comments. Sep 22 2021, 5:59 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2432	GeneratePressureSet doesn't help to avoid the PSet entirely. As you can see, for all AV classes except AV_32, GeneratePressureSet is currently zero. The moment I reset this flag for AV_32, it asserts as I indicated earlier.

rampitec added inline comments. Sep 22 2021, 10:40 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2432	I did not see that assert because I was using an optimized tablegen. Shall we omit the assert if GeneratePressureSet = 0? In practice it passes testing with GeneratePressureSet = 0 and an optimized tablegen.

cdevadas added inline comments. Sep 22 2021, 11:01 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2432	Yes, it all worked well with no Psets generated for AV classes when I used the optimized tablegen. I can post a patch to remove that assertion.

rampitec added inline comments. Sep 22 2021, 11:11 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2432	Maybe just add `|| !GeneratePressureSet` to the assertion?
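
Illustration (not part of the patch): the suggested relaxation can be written as a predicate. This is a hypothetical model, not the TableGen source: `ToyRegClassInfo` and its fields are invented for the example, and the exact condition asserted in CodeGenRegisters.cpp is not reproduced here. It only encodes what the thread describes, namely that an allocatable class must belong to at least one pressure set unless it opted out with GeneratePressureSet = 0.

```cpp
#include <cassert>

// Hypothetical summary of what the check looks at for one register class.
struct ToyRegClassInfo {
  bool Allocatable;
  bool GeneratePressureSet;
  unsigned NumPressureSets;
};

// Strict form described in the thread: every allocatable class needs a pset.
bool strictInvariantHolds(const ToyRegClassInfo &RC) {
  return !RC.Allocatable || RC.NumPressureSets > 0;
}

// Relaxed form with the suggested extra disjunct: a class that opted out of
// pressure-set generation is allowed to have none.
bool relaxedInvariantHolds(const ToyRegClassInfo &RC) {
  return strictInvariantHolds(RC) || !RC.GeneratePressureSet;
}
```

An allocatable AV class with no pressure sets and GeneratePressureSet = 0 fails the strict form but passes the relaxed one, while a class that asked for a pressure set and got none still fails both.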

Removed psets entirely for AV classes + Addressed other comments.

cdevadas added a parent revision: D110305: [TableGen] Allow targets to entirely ignore Psets for registers. Sep 23 2021, 10:22 AM

Harbormaster completed remote builds in B125393: Diff 374601. Sep 23 2021, 10:54 AM

Thanks. FoldImmediate() and Global ISel seem to be the last issue to me.

In D109300#3018937, @rampitec wrote:

Thanks. FoldImmediate() and Global ISel seem to be the last issue to me.

The MAIInst uses AVSrc_32 for src0 & src1 operands.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/VOP3PInstructions.td#L378

I am not sure we can do something to prevent GIsel from choosing AV classes here.
Yes, the w/a in FoldImmediate was done to adjust the register class as per the opcode we choose.

In D109300#3019936, @cdevadas wrote:

The MAIInst uses AVSrc_32 for src0 & src1 operands.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/VOP3PInstructions.td#L378

I am not sure we can do something to prevent GIsel from choosing AV classes here.
Yes, the w/a in FoldImmediate was done to adjust the register class as per the opcode we choose.

Right, that's the only place where we currently have the freedom to choose either v or a, and it is supposed to be resolved at selection. The need to use the AV class here comes from MC support, so that asm is free to use any register.

I believe custom code in regbank select can refine the register class for these 2 operands.

I have to note that we cannot rely on an optimization for correctness: if FoldImmediate does not run or does not succeed we will have broken IR and a failure, so it has to be fixed in any case.

In D109300#3019936, @cdevadas wrote:

The MAIInst uses AVSrc_32 for src0 & src1 operands.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/VOP3PInstructions.td#L378

I am not sure we can do something to prevent GIsel from choosing AV classes here.
Yes, the w/a in FoldImmediate was done to adjust the register class as per the opcode we choose.

Regbankselect is only implemented in a way that should just work and is probably not optimal. In the future we could improve the selector to take the hint of the incoming bank to select the specific A/V class

In D109300#3021666, @arsenm wrote:

Regbankselect is only implemented in a way that should just work and is probably not optimal. In the future we could improve the selector to take the hint of the incoming bank to select the specific A/V class

The problem is exactly that it does not work if AV is selected. This needs to be resolved to V or A at selection; otherwise you get a compilation crash (and you actually cannot reasonably select related instructions, it will not work).

In D109300#3019936, @cdevadas wrote:

The MAIInst uses AVSrc_32 for src0 & src1 operands.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/VOP3PInstructions.td#L378

I am not sure we can do something to prevent GIsel from choosing AV classes here.
Yes, the w/a in FoldImmediate was done to adjust the register class as per the opcode we choose.

Something is missing code to constrain or insert a constraining copy when the instructions are initially emitted

In D109300#3021674, @arsenm wrote:

Something is missing code to constrain or insert a constraining copy when the instructions are initially emitted

During InstructionSelect, the MIOperands' regclasses are assigned for the first time with llvm::constrainOperandRegClass.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/GlobalISel/Utils.cpp#L104 indicates that the regclasses are derived from the instruction definition itself.
Only when the regclass is unallocatable is the selected regbank used to constrain the regclass for the MO (https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/GlobalISel/Utils.cpp#L114).
This is why SRC0 & SRC1 of MAIInst get an AV class even though regbankSelect has managed to give them both VRegBank.
Should we constrain the MIOperand regclasses based on the incoming regbank when the corresponding RC in the instruction definition is its superclass?
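
Illustration (not part of the patch): the behaviour described above and the proposed change can be contrasted in a toy model. This is a sketch under stated assumptions, not the code in GlobalISel/Utils.cpp: there are only three 32-bit classes, `ToyRC`, `ToyBank` and all function names are hypothetical, and inserting a constraining copy for incompatible classes is left out.

```cpp
#include <cassert>

enum class ToyRC { VGPR_32, AGPR_32, AV_32 };
enum class ToyBank { VGPR, AGPR };

// Concrete class that corresponds to a register bank in this toy model.
ToyRC classForBank(ToyBank B) {
  return B == ToyBank::VGPR ? ToyRC::VGPR_32 : ToyRC::AGPR_32;
}

// AV_32 contains both VGPR_32 and AGPR_32; every class contains itself.
bool isSuperClassOrSame(ToyRC Super, ToyRC Sub) {
  return Super == Sub || Super == ToyRC::AV_32;
}

// Behaviour as described in the comment: the class from the instruction
// definition wins, and the bank is consulted only if that class is
// unallocatable.
ToyRC constrainAsDescribed(ToyRC DefRC, bool DefRCAllocatable, ToyBank B) {
  if (!DefRCAllocatable)
    return classForBank(B);
  return DefRC;
}

// Proposed behaviour: if the definition's class is a superclass of the bank's
// class, narrow to the bank's class; otherwise keep the definition's class.
ToyRC constrainProposed(ToyRC DefRC, ToyBank B) {
  ToyRC BankRC = classForBank(B);
  if (isSuperClassOrSame(DefRC, BankRC))
    return BankRC;
  return DefRC;
}
```

In this model, once AV_32 becomes allocatable `constrainAsDescribed` leaves an operand in AV_32 regardless of its bank, while `constrainProposed` resolves it to VGPR_32 or AGPR_32 according to the bank.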

In D109300#3073129, @cdevadas wrote:

Should we constrain the MIOperand regclasses based on the incoming regbank when the corresponding RC in the instruction definition is its superclass?

Yes, I'm surprised this doesn't happen already. The instruction definition alone can be ambiguous.

cdevadas mentioned this in D112323: GlobalISel/Utils: Use incoming regbank while constraining the superclasses. Oct 22 2021, 9:09 AM

cdevadas added a parent revision: D112323: GlobalISel/Utils: Use incoming regbank while constraining the superclasses.

In D109300#3076671, @arsenm wrote:

Yes, I'm surprised this doesn't happen already. The instruction definition alone can be ambiguous.

Posted D112323.

cdevadas mentioned this in rGaa2d3b59ce75: GlobalISel/Utils: Use incoming regbank while constraining the superclasses. Oct 30 2021, 4:22 AM

Removed the workaround post-Selection that forced a Vreg class for the occurrences of AV classes. With D112323, we no longer choose AV superclasses during instruction selection. They either become a Vreg class or an Areg class.

Harbormaster completed remote builds in B131624: Diff 383645. Oct 31 2021, 3:50 AM

LGTM

This revision is now accepted and ready to land. Nov 1 2021, 12:01 PM

Closed by commit rG654c89d85a51: [AMDGPU] Make vector superclasses allocatable (authored by cdevadas). Nov 25 2021, 9:50 PM

This revision was automatically updated to reflect the committed changes.

cdevadas added a commit: rG654c89d85a51: [AMDGPU] Make vector superclasses allocatable.

cdevadas removed a child revision: D109301: [AMDGPU] Enable copy between VGPR and AGPR classes during regalloc. Nov 29 2021, 2:57 AM

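Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp	10 lines
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp	4 lines
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp	16 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp	8 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp	55 lines
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp	2 lines
llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp	2 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.h	9 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp	85 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.td	44 lines
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp	6 lines
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp	16 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll	2 lines
llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll	24 lines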

Diff 371594

llvm/lib/Target/AMDGPU/GCNRegPressure.cpp

	///////////////////////////////////////////////////////////////////////////////			///////////////////////////////////////////////////////////////////////////////
	// GCNRegPressure			// GCNRegPressure

	unsigned GCNRegPressure::getRegKind(Register Reg,			unsigned GCNRegPressure::getRegKind(Register Reg,
	const MachineRegisterInfo &MRI) {			const MachineRegisterInfo &MRI) {
	assert(Reg.isVirtual());			assert(Reg.isVirtual());
	const auto RC = MRI.getRegClass(Reg);			const auto RC = MRI.getRegClass(Reg);
	auto STI = static_cast<const SIRegisterInfo*>(MRI.getTargetRegisterInfo());			auto STI = static_cast<const SIRegisterInfo*>(MRI.getTargetRegisterInfo());
	return STI->isSGPRClass(RC) ?			return STI->isSGPRClass(RC)
	(STI->getRegSizeInBits(*RC) == 32 ? SGPR32 : SGPR_TUPLE) :			? (STI->getRegSizeInBits(*RC) == 32 ? SGPR32 : SGPR_TUPLE)
	STI->hasAGPRs(RC) ?			: STI->isAGPRClass(RC)
	(STI->getRegSizeInBits(*RC) == 32 ? AGPR32 : AGPR_TUPLE) :			? (STI->getRegSizeInBits(*RC) == 32 ? AGPR32 : AGPR_TUPLE)
	(STI->getRegSizeInBits(*RC) == 32 ? VGPR32 : VGPR_TUPLE);			: (STI->getRegSizeInBits(*RC) == 32 ? VGPR32 : VGPR_TUPLE);
	}			}

	void GCNRegPressure::inc(unsigned Reg,			void GCNRegPressure::inc(unsigned Reg,
	LaneBitmask PrevMask,			LaneBitmask PrevMask,
	LaneBitmask NewMask,			LaneBitmask NewMask,
	const MachineRegisterInfo &MRI) {			const MachineRegisterInfo &MRI) {
	if (SIRegisterInfo::getNumCoveredRegs(NewMask) ==			if (SIRegisterInfo::getNumCoveredRegs(NewMask) ==
	SIRegisterInfo::getNumCoveredRegs(PrevMask))			SIRegisterInfo::getNumCoveredRegs(PrevMask))

llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp

static bool foldVGPRCopyIntoRegSequence(MachineInstr &MI,
// SGPRy = REG_SEQUENCE SGPRx, sub0 ...		// SGPRy = REG_SEQUENCE SGPRx, sub0 ...
// VGPRz = COPY SGPRy		// VGPRz = COPY SGPRy

// =>		// =>
// VGPRx = COPY SGPRx		// VGPRx = COPY SGPRx
// VGPRz = REG_SEQUENCE VGPRx, sub0		// VGPRz = REG_SEQUENCE VGPRx, sub0

MI.getOperand(0).setReg(CopyUse.getOperand(0).getReg());		MI.getOperand(0).setReg(CopyUse.getOperand(0).getReg());
bool IsAGPR = TRI->hasAGPRs(DstRC);		bool IsAGPR = TRI->isAGPRClass(DstRC);

for (unsigned I = 1, N = MI.getNumOperands(); I != N; I += 2) {		for (unsigned I = 1, N = MI.getNumOperands(); I != N; I += 2) {
Register SrcReg = MI.getOperand(I).getReg();		Register SrcReg = MI.getOperand(I).getReg();
unsigned SrcSubReg = MI.getOperand(I).getSubReg();		unsigned SrcSubReg = MI.getOperand(I).getSubReg();

const TargetRegisterClass *SrcRC = MRI.getRegClass(SrcReg);		const TargetRegisterClass *SrcRC = MRI.getRegClass(SrcReg);
assert(TRI->isSGPRClass(SrcRC) &&		assert(TRI->isSGPRClass(SrcRC) &&
"Expected SGPR REG_SEQUENCE to only have SGPR inputs");		"Expected SGPR REG_SEQUENCE to only have SGPR inputs");

for (const auto &Use : MRI->use_operands(Reg)) {
OpRC != &AMDGPU::VS_64RegClass) {		OpRC != &AMDGPU::VS_64RegClass) {
numVGPRUses++;		numVGPRUses++;
}		}
}		}
}		}

Register PHIRes = MI.getOperand(0).getReg();		Register PHIRes = MI.getOperand(0).getReg();
const TargetRegisterClass *RC0 = MRI->getRegClass(PHIRes);		const TargetRegisterClass *RC0 = MRI->getRegClass(PHIRes);
if (AllAGPRUses && numVGPRUses && !TRI->hasAGPRs(RC0)) {		if (AllAGPRUses && numVGPRUses && !TRI->isAGPRClass(RC0)) {
LLVM_DEBUG(dbgs() << "Moving PHI to AGPR: " << MI);		LLVM_DEBUG(dbgs() << "Moving PHI to AGPR: " << MI);
MRI->setRegClass(PHIRes, TRI->getEquivalentAGPRClass(RC0));		MRI->setRegClass(PHIRes, TRI->getEquivalentAGPRClass(RC0));
for (unsigned I = 1, N = MI.getNumOperands(); I != N; I += 2) {		for (unsigned I = 1, N = MI.getNumOperands(); I != N; I += 2) {
MachineInstr *DefMI = MRI->getVRegDef(MI.getOperand(I).getReg());		MachineInstr *DefMI = MRI->getVRegDef(MI.getOperand(I).getReg());
if (DefMI && DefMI->isPHI())		if (DefMI && DefMI->isPHI())
PHIOperands.insert(DefMI);		PHIOperands.insert(DefMI);
}		}
}		}

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

if (FoldingImmLike && UseMI->isCopy()) {

// In order to fold immediates into copies, we need to change the		// In order to fold immediates into copies, we need to change the
// copy to a MOV.		// copy to a MOV.

unsigned MovOp = TII->getMovOpcode(DestRC);		unsigned MovOp = TII->getMovOpcode(DestRC);
if (MovOp == AMDGPU::COPY)		if (MovOp == AMDGPU::COPY)
return;		return;

		// Use VGPR regclass if it is an AV class.
		if (TRI->isVectorSuperClass(DestRC))
		MRI->setRegClass(DestReg, TRI->getEquivalentVGPRClass(DestRC));

UseMI->setDesc(TII->get(MovOp));		UseMI->setDesc(TII->get(MovOp));
MachineInstr::mop_iterator ImpOpI = UseMI->implicit_operands().begin();		MachineInstr::mop_iterator ImpOpI = UseMI->implicit_operands().begin();
MachineInstr::mop_iterator ImpOpE = UseMI->implicit_operands().end();		MachineInstr::mop_iterator ImpOpE = UseMI->implicit_operands().end();
while (ImpOpI != ImpOpE) {		while (ImpOpI != ImpOpE) {
MachineInstr::mop_iterator Tmp = ImpOpI;		MachineInstr::mop_iterator Tmp = ImpOpI;
ImpOpI++;		ImpOpI++;
UseMI->RemoveOperand(UseMI->getOperandNo(Tmp));		UseMI->RemoveOperand(UseMI->getOperandNo(Tmp));
}		}

while (UseMI->isCopy() && !Op->getSubReg()) {
UseMI = Op->getParent();		UseMI = Op->getParent();
}		}

if (Op->getSubReg())		if (Op->getSubReg())
return false;		return false;

unsigned OpIdx = Op - &UseMI->getOperand(0);		unsigned OpIdx = Op - &UseMI->getOperand(0);
const MCInstrDesc &InstDesc = UseMI->getDesc();		const MCInstrDesc &InstDesc = UseMI->getDesc();
const MCOperandInfo &OpInfo = InstDesc.OpInfo[OpIdx];		if (!TRI->isVectorSuperClass(
switch (OpInfo.RegClass) {		TRI->getRegClass(InstDesc.OpInfo[OpIdx].RegClass)))
case AMDGPU::AV_32RegClassID: LLVM_FALLTHROUGH;
case AMDGPU::AV_64RegClassID: LLVM_FALLTHROUGH;
case AMDGPU::AV_96RegClassID: LLVM_FALLTHROUGH;
case AMDGPU::AV_128RegClassID: LLVM_FALLTHROUGH;
case AMDGPU::AV_160RegClassID:
break;
default:
return false;		return false;
}

const auto *NewDstRC = TRI->getEquivalentAGPRClass(MRI->getRegClass(Reg));		const auto *NewDstRC = TRI->getEquivalentAGPRClass(MRI->getRegClass(Reg));
auto Dst = MRI->createVirtualRegister(NewDstRC);		auto Dst = MRI->createVirtualRegister(NewDstRC);
auto RS = BuildMI(*MI.getParent(), MI, MI.getDebugLoc(),		auto RS = BuildMI(*MI.getParent(), MI, MI.getDebugLoc(),
TII->get(AMDGPU::REG_SEQUENCE), Dst);		TII->get(AMDGPU::REG_SEQUENCE), Dst);

for (unsigned I = 0; I < Defs.size(); ++I) {		for (unsigned I = 0; I < Defs.size(); ++I) {
MachineOperand *Def = Defs[I].first;		MachineOperand *Def = Defs[I].first;

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

if (TII->isVOP3(MI.getOpcode())) {
if (const MCOperandInfo *OpInfo = MI.getDesc().OpInfo) {		if (const MCOperandInfo *OpInfo = MI.getDesc().OpInfo) {
unsigned Opc = MI.getOpcode();		unsigned Opc = MI.getOpcode();
const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();		const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
for (auto I : { AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src0),		for (auto I : { AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src0),
AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src1) }) {		AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src1) }) {
if (I == -1)		if (I == -1)
break;		break;
MachineOperand &Op = MI.getOperand(I);		MachineOperand &Op = MI.getOperand(I);
if ((OpInfo[I].RegClass != llvm::AMDGPU::AV_64RegClassID &&		if (!Op.isReg() || !Op.getReg().isVirtual())
OpInfo[I].RegClass != llvm::AMDGPU::AV_32RegClassID) ||		continue;
!Op.getReg().isVirtual() || !TRI->isAGPR(MRI, Op.getReg()))		auto *RC = TRI->getRegClassForReg(MRI, Op.getReg());
		arsenm: Can you shortcut the getRegClassForReg if not virtual?
		if (!TRI->hasAGPRs(RC))
		rampitec: And this shall remain as is to not allow any live AV operands.
continue;		continue;
auto *Src = MRI.getUniqueVRegDef(Op.getReg());		auto *Src = MRI.getUniqueVRegDef(Op.getReg());
if (!Src || !Src->isCopy() ||		if (!Src || !Src->isCopy() ||
!TRI->isSGPRReg(MRI, Src->getOperand(1).getReg()))		!TRI->isSGPRReg(MRI, Src->getOperand(1).getReg()))
continue;		continue;
auto *RC = TRI->getRegClassForReg(MRI, Op.getReg());
auto *NewRC = TRI->getEquivalentVGPRClass(RC);		auto *NewRC = TRI->getEquivalentVGPRClass(RC);
// All uses of agpr64 and agpr32 can also accept vgpr except for		// All uses of agpr64 and agpr32 can also accept vgpr except for
// v_accvgpr_read, but we do not produce agpr reads during selection,		// v_accvgpr_read, but we do not produce agpr reads during selection,
// so no use checks are needed.		// so no use checks are needed.
MRI.setRegClass(Op.getReg(), NewRC);		MRI.setRegClass(Op.getReg(), NewRC);
}		}
}		}

▲ Show 20 Lines • Show All 941 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Show First 20 Lines • Show All 886 Lines • ▼ Show 20 Lines	if (!RI.isSGPRClass(SrcRC)) {
return;		return;
}		}
expandSGPRCopy(*this, MBB, MI, DL, DestReg, SrcReg, KillSrc, RC, Forward);		expandSGPRCopy(*this, MBB, MI, DL, DestReg, SrcReg, KillSrc, RC, Forward);
return;		return;
}		}

unsigned EltSize = 4;		unsigned EltSize = 4;
unsigned Opcode = AMDGPU::V_MOV_B32_e32;		unsigned Opcode = AMDGPU::V_MOV_B32_e32;
if (RI.hasAGPRs(RC)) {		if (RI.isAGPRClass(RC)) {
		rampitec: What will happen if that is AV?
		cdevadas: I guess this function, `copyPhysReg`, will only be called post-RA.
		rampitec: Ah, right.
Opcode = (RI.hasVGPRs(SrcRC)) ?		Opcode = (RI.hasVGPRs(SrcRC)) ?
AMDGPU::V_ACCVGPR_WRITE_B32_e64 : AMDGPU::INSTRUCTION_LIST_END;		AMDGPU::V_ACCVGPR_WRITE_B32_e64 : AMDGPU::INSTRUCTION_LIST_END;
} else if (RI.hasVGPRs(RC) && RI.hasAGPRs(SrcRC)) {		} else if (RI.hasVGPRs(RC) && RI.isAGPRClass(SrcRC)) {
Opcode = AMDGPU::V_ACCVGPR_READ_B32_e64;		Opcode = AMDGPU::V_ACCVGPR_READ_B32_e64;
} else if ((Size % 64 == 0) && RI.hasVGPRs(RC) &&		} else if ((Size % 64 == 0) && RI.hasVGPRs(RC) &&
(RI.isProperlyAlignedRC(*RC) &&		(RI.isProperlyAlignedRC(*RC) &&
(SrcRC == RC || RI.isSGPRClass(SrcRC)))) {		(SrcRC == RC || RI.isSGPRClass(SrcRC)))) {
// TODO: In 96-bit case, could do a 64-bit mov and then a 32-bit mov.		// TODO: In 96-bit case, could do a 64-bit mov and then a 32-bit mov.
if (ST.hasPackedFP32Ops()) {		if (ST.hasPackedFP32Ops()) {
Opcode = AMDGPU::V_PK_MOV_B32;		Opcode = AMDGPU::V_PK_MOV_B32;
EltSize = 8;		EltSize = 8;
▲ Show 20 Lines • Show All 287 Lines • ▼ Show 20 Lines	BuildMI(*MBB, I, DL, get(AMDGPU::V_CMP_NE_I32_e64), Reg)
.addImm(Value)		.addImm(Value)
.addReg(SrcReg);		.addReg(SrcReg);

return Reg;		return Reg;
}		}

unsigned SIInstrInfo::getMovOpcode(const TargetRegisterClass *DstRC) const {		unsigned SIInstrInfo::getMovOpcode(const TargetRegisterClass *DstRC) const {

if (RI.hasAGPRs(DstRC))		if (RI.isAGPRClass(DstRC))
return AMDGPU::COPY;		return AMDGPU::COPY;
if (RI.getRegSizeInBits(*DstRC) == 32) {		if (RI.getRegSizeInBits(*DstRC) == 32) {
return RI.isSGPRClass(DstRC) ? AMDGPU::S_MOV_B32 : AMDGPU::V_MOV_B32_e32;		return RI.isSGPRClass(DstRC) ? AMDGPU::S_MOV_B32 : AMDGPU::V_MOV_B32_e32;
} else if (RI.getRegSizeInBits(*DstRC) == 64 && RI.isSGPRClass(DstRC)) {		} else if (RI.getRegSizeInBits(*DstRC) == 64 && RI.isSGPRClass(DstRC)) {
return AMDGPU::S_MOV_B64;		return AMDGPU::S_MOV_B64;
} else if (RI.getRegSizeInBits(*DstRC) == 64 && !RI.isSGPRClass(DstRC)) {		} else if (RI.getRegSizeInBits(*DstRC) == 64 && !RI.isSGPRClass(DstRC)) {
return AMDGPU::V_MOV_B64_PSEUDO;		return AMDGPU::V_MOV_B64_PSEUDO;
}		}
▲ Show 20 Lines • Show All 241 Lines • ▼ Show 20 Lines	BuildMI(MBB, MI, DL, OpDesc)
.addMemOperand(MMO)		.addMemOperand(MMO)
.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);		.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);

if (RI.spillSGPRToVGPR())		if (RI.spillSGPRToVGPR())
FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);		FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);
return;		return;
}		}

unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillSaveOpcode(SpillSize)		unsigned Opcode = RI.isAGPRClass(RC) ? getAGPRSpillSaveOpcode(SpillSize)
: getVGPRSpillSaveOpcode(SpillSize);		: getVGPRSpillSaveOpcode(SpillSize);
MFI->setHasSpilledVGPRs();		MFI->setHasSpilledVGPRs();

BuildMI(MBB, MI, DL, get(Opcode))		BuildMI(MBB, MI, DL, get(Opcode))
.addReg(SrcReg, getKillRegState(isKill)) // data		.addReg(SrcReg, getKillRegState(isKill)) // data
.addFrameIndex(FrameIndex) // addr		.addFrameIndex(FrameIndex) // addr
.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset		.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset
.addImm(0) // offset		.addImm(0) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
▲ Show 20 Lines • Show All 117 Lines • ▼ Show 20 Lines	if (RI.isSGPRClass(RC)) {
BuildMI(MBB, MI, DL, OpDesc, DestReg)		BuildMI(MBB, MI, DL, OpDesc, DestReg)
.addFrameIndex(FrameIndex) // addr		.addFrameIndex(FrameIndex) // addr
.addMemOperand(MMO)		.addMemOperand(MMO)
.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);		.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);

return;		return;
}		}

unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillRestoreOpcode(SpillSize)		unsigned Opcode = RI.isAGPRClass(RC) ? getAGPRSpillRestoreOpcode(SpillSize)
: getVGPRSpillRestoreOpcode(SpillSize);		: getVGPRSpillRestoreOpcode(SpillSize);
BuildMI(MBB, MI, DL, get(Opcode), DestReg)		BuildMI(MBB, MI, DL, get(Opcode), DestReg)
.addFrameIndex(FrameIndex) // vaddr		.addFrameIndex(FrameIndex) // vaddr
.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset		.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset
.addImm(0) // offset		.addImm(0) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
}		}

void SIInstrInfo::insertNoop(MachineBasicBlock &MBB,		void SIInstrInfo::insertNoop(MachineBasicBlock &MBB,
▲ Show 20 Lines • Show All 1,175 Lines • ▼ Show 20 Lines	bool SIInstrInfo::FoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
// FIXME: We could handle FrameIndex values here.		// FIXME: We could handle FrameIndex values here.
if (!ImmOp->isImm())		if (!ImmOp->isImm())
return false;		return false;

unsigned Opc = UseMI.getOpcode();		unsigned Opc = UseMI.getOpcode();
if (Opc == AMDGPU::COPY) {		if (Opc == AMDGPU::COPY) {
Register DstReg = UseMI.getOperand(0).getReg();		Register DstReg = UseMI.getOperand(0).getReg();
bool Is16Bit = getOpSize(UseMI, 0) == 2;		bool Is16Bit = getOpSize(UseMI, 0) == 2;
bool isVGPRCopy = RI.isVGPR(*MRI, DstReg);		const TargetRegisterClass *RC = RI.getRegClassForReg(*MRI, DstReg);
unsigned NewOpc = isVGPRCopy ? AMDGPU::V_MOV_B32_e32 : AMDGPU::S_MOV_B32;		bool IsVectorRegCopy = RI.hasVectorRegisters(RC);
		arsenm: This logically doesn't flow right. This shouldn't check hasVectorRegisters; just use isVGPR. You override this decision later with the AGPR class.
		cdevadas: That check was added to consider AV classes too. The opcode is later changed to V_ACCVGPR_WRITE_B32_e64 if it is an AGPR-only class.
		rampitec: You should not have any AV at this point; these are all resolved. It was already working with AGPRs, so this portion does not seem to be needed. However, reverting it revealed a problem with global isel (`llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx908 -o - -verify-machineinstrs < llvm/test/CodeGen/AMDGPU/ds_gws_align.ll`): `%13:av_32 = COPY %8:sreg_32`, `%14:av_32 = COPY %9:sreg_32`. You have worked around this by folding the immediate, but really global isel should not produce AV operands.
		unsigned NewOpc =
		IsVectorRegCopy ? AMDGPU::V_MOV_B32_e32 : AMDGPU::S_MOV_B32;
APInt Imm(32, ImmOp->getImm());		APInt Imm(32, ImmOp->getImm());

if (UseMI.getOperand(1).getSubReg() == AMDGPU::hi16)		if (UseMI.getOperand(1).getSubReg() == AMDGPU::hi16)
Imm = Imm.ashr(16);		Imm = Imm.ashr(16);

if (RI.isAGPR(*MRI, DstReg)) {		if (RI.isVectorSuperClass(RC)) {
		MRI->setRegClass(DstReg, &AMDGPU::VGPR_32RegClass);
		} else if (RI.isAGPRClass(RC)) {
if (!isInlineConstant(Imm))		if (!isInlineConstant(Imm))
return false;		return false;
NewOpc = AMDGPU::V_ACCVGPR_WRITE_B32_e64;		NewOpc = AMDGPU::V_ACCVGPR_WRITE_B32_e64;
}		}

if (Is16Bit) {		if (Is16Bit) {
if (isVGPRCopy)		if (RI.isVGPRClass(RC))
return false; // Do not clobber vgpr_hi16		return false; // Do not clobber vgpr_hi16

if (DstReg.isVirtual() &&		if (DstReg.isVirtual() && UseMI.getOperand(0).getSubReg() != AMDGPU::lo16)
UseMI.getOperand(0).getSubReg() != AMDGPU::lo16)
return false;		return false;

UseMI.getOperand(0).setSubReg(0);		UseMI.getOperand(0).setSubReg(0);
if (DstReg.isPhysical()) {		if (DstReg.isPhysical()) {
DstReg = RI.get32BitRegister(DstReg);		DstReg = RI.get32BitRegister(DstReg);
UseMI.getOperand(0).setReg(DstReg);		UseMI.getOperand(0).setReg(DstReg);
}		}
assert(UseMI.getOperand(1).getReg().isVirtual());		assert(UseMI.getOperand(1).getReg().isVirtual());
}		}
▲ Show 20 Lines • Show All 1,046 Lines • ▼ Show 20 Lines	if (!Reg)
continue;		continue;

// FIXME: Ideally we would have separate instruction definitions with the		// FIXME: Ideally we would have separate instruction definitions with the
// aligned register constraint.		// aligned register constraint.
// FIXME: We do not verify inline asm operands, but custom inline asm		// FIXME: We do not verify inline asm operands, but custom inline asm
// verification is broken anyway		// verification is broken anyway
if (ST.needsAlignedVGPRs()) {		if (ST.needsAlignedVGPRs()) {
const TargetRegisterClass *RC = RI.getRegClassForReg(MRI, Reg);		const TargetRegisterClass *RC = RI.getRegClassForReg(MRI, Reg);
const bool IsVGPR = RI.hasVGPRs(RC);		if (RI.hasVectorRegisters(RC) && MO.getSubReg()) {
const bool IsAGPR = !IsVGPR && RI.hasAGPRs(RC);
if ((IsVGPR || IsAGPR) && MO.getSubReg()) {
const TargetRegisterClass *SubRC =		const TargetRegisterClass *SubRC =
RI.getSubRegClass(RC, MO.getSubReg());		RI.getSubRegClass(RC, MO.getSubReg());
RC = RI.getCompatibleSubRegClass(RC, SubRC, MO.getSubReg());		RC = RI.getCompatibleSubRegClass(RC, SubRC, MO.getSubReg());
if (RC)		if (RC)
RC = SubRC;		RC = SubRC;
}		}

// Check that this is the aligned version of the class.		// Check that this is the aligned version of the class.
▲ Show 20 Lines • Show All 954 Lines • ▼ Show 20 Lines	if (Src1.isReg() && RI.isVGPR(MRI, Src1.getReg())) {
BuildMI(*MI.getParent(), MI, DL, get(AMDGPU::V_READFIRSTLANE_B32), Reg)		BuildMI(*MI.getParent(), MI, DL, get(AMDGPU::V_READFIRSTLANE_B32), Reg)
.add(Src1);		.add(Src1);
Src1.ChangeToRegister(Reg, false);		Src1.ChangeToRegister(Reg, false);
}		}
return;		return;
}		}

// No VOP2 instructions support AGPRs.		// No VOP2 instructions support AGPRs.
if (Src0.isReg() && RI.isAGPR(MRI, Src0.getReg()))		if (Src0.isReg() && RI.hasAGPRs(RI.getRegClassForReg(MRI, Src0.getReg())))
		rampitec: It is still isAGPR. No need to legalize VGPR.
legalizeOpWithMove(MI, Src0Idx);		legalizeOpWithMove(MI, Src0Idx);

if (Src1.isReg() && RI.isAGPR(MRI, Src1.getReg()))		if (Src1.isReg() && RI.hasAGPRs(RI.getRegClassForReg(MRI, Src1.getReg())))
legalizeOpWithMove(MI, Src1Idx);		legalizeOpWithMove(MI, Src1Idx);

// VOP2 src0 instructions support all operand types, so we don't need to check		// VOP2 src0 instructions support all operand types, so we don't need to check
// their legality. If src1 is already legal, we don't need to do anything.		// their legality. If src1 is already legal, we don't need to do anything.
if (isLegalRegOperand(MRI, InstrDesc.OpInfo[Src1Idx], Src1))		if (isLegalRegOperand(MRI, InstrDesc.OpInfo[Src1Idx], Src1))
return;		return;

// Special case: V_READLANE_B32 accepts only immediate or SGPR operands for		// Special case: V_READLANE_B32 accepts only immediate or SGPR operands for
▲ Show 20 Lines • Show All 634 Lines • ▼ Show 20 Lines	if (MI.getOpcode() == AMDGPU::PHI) {
// otherwise we will create illegal VGPR->SGPR copies when legalizing		// otherwise we will create illegal VGPR->SGPR copies when legalizing
// them.		// them.
if (VRC || !RI.isSGPRClass(getOpRegClass(MI, 0))) {		if (VRC || !RI.isSGPRClass(getOpRegClass(MI, 0))) {
if (!VRC) {		if (!VRC) {
assert(SRC);		assert(SRC);
if (getOpRegClass(MI, 0) == &AMDGPU::VReg_1RegClass) {		if (getOpRegClass(MI, 0) == &AMDGPU::VReg_1RegClass) {
VRC = &AMDGPU::VReg_1RegClass;		VRC = &AMDGPU::VReg_1RegClass;
} else		} else
VRC = RI.hasAGPRs(getOpRegClass(MI, 0))		VRC = RI.isAGPRClass(getOpRegClass(MI, 0))
? RI.getEquivalentAGPRClass(SRC)		? RI.getEquivalentAGPRClass(SRC)
: RI.getEquivalentVGPRClass(SRC);		: RI.getEquivalentVGPRClass(SRC);
} else {		} else {
VRC = RI.hasAGPRs(getOpRegClass(MI, 0))		VRC = RI.isAGPRClass(getOpRegClass(MI, 0))
? RI.getEquivalentAGPRClass(VRC)		? RI.getEquivalentAGPRClass(VRC)
: RI.getEquivalentVGPRClass(VRC);		: RI.getEquivalentVGPRClass(VRC);
}		}
RC = VRC;		RC = VRC;
} else {		} else {
RC = SRC;		RC = SRC;
}		}

// Update all the operands so they have the same type.		// Update all the operands so they have the same type.
for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {		for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
▲ Show 20 Lines • Show All 1,428 Lines • ▼ Show 20 Lines	const TargetRegisterClass *SIInstrInfo::getDestEquivalentVGPRClass(
case AMDGPU::PHI:		case AMDGPU::PHI:
case AMDGPU::REG_SEQUENCE:		case AMDGPU::REG_SEQUENCE:
case AMDGPU::INSERT_SUBREG:		case AMDGPU::INSERT_SUBREG:
case AMDGPU::WQM:		case AMDGPU::WQM:
case AMDGPU::SOFT_WQM:		case AMDGPU::SOFT_WQM:
case AMDGPU::STRICT_WWM:		case AMDGPU::STRICT_WWM:
case AMDGPU::STRICT_WQM: {		case AMDGPU::STRICT_WQM: {
const TargetRegisterClass *SrcRC = getOpRegClass(Inst, 1);		const TargetRegisterClass *SrcRC = getOpRegClass(Inst, 1);
if (RI.hasAGPRs(SrcRC)) {		if (RI.isAGPRClass(SrcRC)) {
if (RI.hasAGPRs(NewDstRC))		if (RI.isAGPRClass(NewDstRC))
return nullptr;		return nullptr;

switch (Inst.getOpcode()) {		switch (Inst.getOpcode()) {
case AMDGPU::PHI:		case AMDGPU::PHI:
case AMDGPU::REG_SEQUENCE:		case AMDGPU::REG_SEQUENCE:
case AMDGPU::INSERT_SUBREG:		case AMDGPU::INSERT_SUBREG:
NewDstRC = RI.getEquivalentAGPRClass(NewDstRC);		NewDstRC = RI.getEquivalentAGPRClass(NewDstRC);
break;		break;
default:		default:
NewDstRC = RI.getEquivalentVGPRClass(NewDstRC);		NewDstRC = RI.getEquivalentVGPRClass(NewDstRC);
}		}

if (!NewDstRC)		if (!NewDstRC)
return nullptr;		return nullptr;
} else {		} else {
if (RI.hasVGPRs(NewDstRC) || NewDstRC == &AMDGPU::VReg_1RegClass)		if (RI.isVGPRClass(NewDstRC) || NewDstRC == &AMDGPU::VReg_1RegClass)
return nullptr;		return nullptr;

NewDstRC = RI.getEquivalentVGPRClass(NewDstRC);		NewDstRC = RI.getEquivalentVGPRClass(NewDstRC);
if (!NewDstRC)		if (!NewDstRC)
return nullptr;		return nullptr;
}		}

return NewDstRC;		return NewDstRC;
▲ Show 20 Lines • Show All 969 Lines • Show Last 20 Lines
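
The `FoldImmediate` hunk in this file chooses the move opcode from the register class of the COPY destination. The following is a minimal, self-contained sketch of that decision as it appears on the new side of this revision. `RegBank`, `FoldChoice`, and `chooseFoldOpcode` are hypothetical names used only for illustration, opcodes are modeled as plain strings rather than LLVM enumerators, and the 16-bit handling is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the COPY destination's register class.
enum class RegBank { SGPR, VGPR, AGPR, AV };

// Outcome of the decision. Foldable is false where the real code would
// return false and give up on the fold.
struct FoldChoice {
  bool Foldable;
  std::string Opcode;
  bool ConstrainToVGPR; // destination class should be narrowed to VGPR_32
};

// Follows the order of checks on the new side of the hunk:
//   - any vector class defaults to V_MOV_B32_e32, a scalar one to S_MOV_B32;
//   - an AV (combined) destination is narrowed to a plain VGPR class;
//   - an AGPR-only destination requires an inline constant and switches
//     to V_ACCVGPR_WRITE_B32_e64.
FoldChoice chooseFoldOpcode(RegBank Bank, bool IsInlineConstant) {
  const bool IsVector = Bank != RegBank::SGPR;
  FoldChoice Choice{true, IsVector ? "V_MOV_B32_e32" : "S_MOV_B32", false};
  if (Bank == RegBank::AV) {
    Choice.ConstrainToVGPR = true;
  } else if (Bank == RegBank::AGPR) {
    if (!IsInlineConstant) {
      Choice.Foldable = false;
      return Choice;
    }
    Choice.Opcode = "V_ACCVGPR_WRITE_B32_e64";
  }
  return Choice;
}
```

Calling `chooseFoldOpcode(RegBank::AV, false)` in this model yields `V_MOV_B32_e32` with `ConstrainToVGPR` set, while an AGPR destination with a non-inline immediate is reported as not foldable. Note that the reviewers question whether the AV branch is needed at all, so this sketch describes the diff as posted, not necessarily the final design.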

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp

Show First 20 Lines • Show All 1,580 Lines • ▼ Show 20 Lines	if (CI.InstClass == S_BUFFER_LOAD_IMM) {
case 8:		case 8:
return &AMDGPU::SGPR_256RegClass;		return &AMDGPU::SGPR_256RegClass;
case 16:		case 16:
return &AMDGPU::SGPR_512RegClass;		return &AMDGPU::SGPR_512RegClass;
}		}
}		}

unsigned BitWidth = 32 * (CI.Width + Paired.Width);		unsigned BitWidth = 32 * (CI.Width + Paired.Width);
return TRI->hasAGPRs(getDataRegClass(*CI.I))		return TRI->isAGPRClass(getDataRegClass(*CI.I))
? TRI->getAGPRClassForBitWidth(BitWidth)		? TRI->getAGPRClassForBitWidth(BitWidth)
: TRI->getVGPRClassForBitWidth(BitWidth);		: TRI->getVGPRClassForBitWidth(BitWidth);
}		}

MachineBasicBlock::iterator SILoadStoreOptimizer::mergeBufferStorePair(		MachineBasicBlock::iterator SILoadStoreOptimizer::mergeBufferStorePair(
CombineInfo &CI, CombineInfo &Paired,		CombineInfo &CI, CombineInfo &Paired,
const SmallVectorImpl<MachineInstr *> &InstsToMove) {		const SmallVectorImpl<MachineInstr *> &InstsToMove) {
MachineBasicBlock *MBB = CI.I->getParent();		MachineBasicBlock *MBB = CI.I->getParent();
▲ Show 20 Lines • Show All 637 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp

Show First 20 Lines • Show All 1,164 Lines • ▼ Show 20 Lines	void SIPeepholeSDWA::legalizeScalarOperands(MachineInstr &MI,
const MCInstrDesc &Desc = TII->get(MI.getOpcode());		const MCInstrDesc &Desc = TII->get(MI.getOpcode());
unsigned ConstantBusCount = 0;		unsigned ConstantBusCount = 0;
for (MachineOperand &Op : MI.explicit_uses()) {		for (MachineOperand &Op : MI.explicit_uses()) {
if (!Op.isImm() && !(Op.isReg() && !TRI->isVGPR(*MRI, Op.getReg())))		if (!Op.isImm() && !(Op.isReg() && !TRI->isVGPR(*MRI, Op.getReg())))
continue;		continue;

unsigned I = MI.getOperandNo(&Op);		unsigned I = MI.getOperandNo(&Op);
if (Desc.OpInfo[I].RegClass == -1 ||		if (Desc.OpInfo[I].RegClass == -1 ||
!TRI->hasVGPRs(TRI->getRegClass(Desc.OpInfo[I].RegClass)))		!TRI->isVGPRClass(TRI->getRegClass(Desc.OpInfo[I].RegClass)))
continue;		continue;

if (ST.hasSDWAScalar() && ConstantBusCount == 0 && Op.isReg() &&		if (ST.hasSDWAScalar() && ConstantBusCount == 0 && Op.isReg() &&
TRI->isSGPRReg(*MRI, Op.getReg())) {		TRI->isSGPRReg(*MRI, Op.getReg())) {
++ConstantBusCount;		++ConstantBusCount;
continue;		continue;
}		}

▲ Show 20 Lines • Show All 72 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

Show First 20 Lines • Show All 144 Lines • ▼ Show 20 Lines	public:

LLVM_READONLY		LLVM_READONLY
const TargetRegisterClass *getVGPRClassForBitWidth(unsigned BitWidth) const;		const TargetRegisterClass *getVGPRClassForBitWidth(unsigned BitWidth) const;

LLVM_READONLY		LLVM_READONLY
const TargetRegisterClass *getAGPRClassForBitWidth(unsigned BitWidth) const;		const TargetRegisterClass *getAGPRClassForBitWidth(unsigned BitWidth) const;

LLVM_READONLY		LLVM_READONLY
		const TargetRegisterClass *
		getVectorSuperClassForBitWidth(unsigned BitWidth) const;

		LLVM_READONLY
static const TargetRegisterClass *getSGPRClassForBitWidth(unsigned BitWidth);		static const TargetRegisterClass *getSGPRClassForBitWidth(unsigned BitWidth);

/// Return the 'base' register class for this register.		/// Return the 'base' register class for this register.
/// e.g. SGPR0 => SReg_32, VGPR => VGPR_32 SGPR0_SGPR1 -> SReg_32, etc.		/// e.g. SGPR0 => SReg_32, VGPR => VGPR_32 SGPR0_SGPR1 -> SReg_32, etc.
const TargetRegisterClass *getPhysRegClass(MCRegister Reg) const;		const TargetRegisterClass *getPhysRegClass(MCRegister Reg) const;

/// \returns true if this class contains only SGPR registers		/// \returns true if this class contains only SGPR registers
bool isSGPRClass(const TargetRegisterClass *RC) const {		bool isSGPRClass(const TargetRegisterClass *RC) const {
Show All 12 Lines	bool isVGPRClass(const TargetRegisterClass *RC) const {
return hasVGPRs(RC) && !hasAGPRs(RC);		return hasVGPRs(RC) && !hasAGPRs(RC);
}		}

/// \returns true if this class contains only AGPR registers		/// \returns true if this class contains only AGPR registers
bool isAGPRClass(const TargetRegisterClass *RC) const {		bool isAGPRClass(const TargetRegisterClass *RC) const {
return hasAGPRs(RC) && !hasVGPRs(RC);		return hasAGPRs(RC) && !hasVGPRs(RC);
}		}

		/// \returns true only if this class contains both VGPR and AGPR registers
		bool isVectorSuperClass(const TargetRegisterClass *RC) const {
		return hasVGPRs(RC) && hasAGPRs(RC);
		}

/// \returns true if this class contains VGPR registers.		/// \returns true if this class contains VGPR registers.
bool hasVGPRs(const TargetRegisterClass *RC) const;		bool hasVGPRs(const TargetRegisterClass *RC) const;

/// \returns true if this class contains AGPR registers.		/// \returns true if this class contains AGPR registers.
bool hasAGPRs(const TargetRegisterClass *RC) const;		bool hasAGPRs(const TargetRegisterClass *RC) const;

/// \returns true if this class contains any vector registers.		/// \returns true if this class contains any vector registers.
bool hasVectorRegisters(const TargetRegisterClass *RC) const {		bool hasVectorRegisters(const TargetRegisterClass *RC) const {
▲ Show 20 Lines • Show All 185 Lines • Show Last 20 Lines
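
Much of this patch replaces `hasAGPRs`/`hasVGPRs` with the stricter `isAGPRClass`/`isVGPRClass`, because a combined AV class answers true to both of the `has*` queries. The sketch below models the predicates from this header on a toy type to show how the four kinds of class are told apart. `ToyRegClass` and the four sample constants are hypothetical, and the body of `hasVectorRegisters` is an assumption since the diff does not show it.

```cpp
#include <cassert>

// Toy stand-in for a register class: it only records which vector
// register files the class draws from.
struct ToyRegClass {
  bool HasVGPRs;
  bool HasAGPRs;
};

// Only VGPRs.
inline bool isVGPRClass(const ToyRegClass &RC) {
  return RC.HasVGPRs && !RC.HasAGPRs;
}

// Only AGPRs.
inline bool isAGPRClass(const ToyRegClass &RC) {
  return RC.HasAGPRs && !RC.HasVGPRs;
}

// Both VGPRs and AGPRs, i.e. an AV class.
inline bool isVectorSuperClass(const ToyRegClass &RC) {
  return RC.HasVGPRs && RC.HasAGPRs;
}

// Assumed definition: any vector register at all.
inline bool hasVectorRegisters(const ToyRegClass &RC) {
  return RC.HasVGPRs || RC.HasAGPRs;
}

// Sample classes for illustration.
const ToyRegClass SReg32{false, false};
const ToyRegClass VGPR32{true, false};
const ToyRegClass AGPR32{false, true};
const ToyRegClass AV32{true, true};
```

In this model `AV32` satisfies `isVectorSuperClass` and `hasVectorRegisters` but neither `isVGPRClass` nor `isAGPRClass`, which is why call sites that previously tested `hasAGPRs(RC)` to mean "AGPR only" need the stricter predicate once AV classes can reach them.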

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Show First 20 Lines • Show All 1,060 Lines • ▼ Show 20 Lines	void SIRegisterInfo::buildSpillLoadStore(
bool IsStore = Desc->mayStore();		bool IsStore = Desc->mayStore();
bool IsFlat = TII->isFLATScratch(LoadStoreOp);		bool IsFlat = TII->isFLATScratch(LoadStoreOp);

bool Scavenged = false;		bool Scavenged = false;
MCRegister SOffset = ScratchOffsetReg;		MCRegister SOffset = ScratchOffsetReg;

const TargetRegisterClass *RC = getRegClassForReg(MF->getRegInfo(), ValueReg);		const TargetRegisterClass *RC = getRegClassForReg(MF->getRegInfo(), ValueReg);
// On gfx90a+ AGPR is a regular VGPR acceptable for loads and stores.		// On gfx90a+ AGPR is a regular VGPR acceptable for loads and stores.
const bool IsAGPR = !ST.hasGFX90AInsts() && hasAGPRs(RC);		const bool IsAGPR = !ST.hasGFX90AInsts() && isAGPRClass(RC);
const unsigned RegWidth = AMDGPU::getRegBitWidth(RC->getID()) / 8;		const unsigned RegWidth = AMDGPU::getRegBitWidth(RC->getID()) / 8;

// Always use 4 byte operations for AGPRs because we need to scavenge		// Always use 4 byte operations for AGPRs because we need to scavenge
// a temporary VGPR.		// a temporary VGPR.
unsigned EltSize = (IsFlat && !IsAGPR) ? std::min(RegWidth, 16u) : 4u;		unsigned EltSize = (IsFlat && !IsAGPR) ? std::min(RegWidth, 16u) : 4u;
unsigned NumSubRegs = RegWidth / EltSize;		unsigned NumSubRegs = RegWidth / EltSize;
unsigned Size = NumSubRegs * EltSize;		unsigned Size = NumSubRegs * EltSize;
unsigned RemSize = RegWidth - Size;		unsigned RemSize = RegWidth - Size;
▲ Show 20 Lines • Show All 974 Lines • ▼ Show 20 Lines	SIRegisterInfo::getAGPRClassForBitWidth(unsigned BitWidth) const {
if (BitWidth <= 16)		if (BitWidth <= 16)
return &AMDGPU::AGPR_LO16RegClass;		return &AMDGPU::AGPR_LO16RegClass;
if (BitWidth <= 32)		if (BitWidth <= 32)
return &AMDGPU::AGPR_32RegClass;		return &AMDGPU::AGPR_32RegClass;
return ST.needsAlignedVGPRs() ? getAlignedAGPRClassForBitWidth(BitWidth)		return ST.needsAlignedVGPRs() ? getAlignedAGPRClassForBitWidth(BitWidth)
: getAnyAGPRClassForBitWidth(BitWidth);		: getAnyAGPRClassForBitWidth(BitWidth);
}		}

		static const TargetRegisterClass *
		getAnyVectorSuperClassForBitWidth(unsigned BitWidth) {
		if (BitWidth <= 64)
		return &AMDGPU::AV_64RegClass;
		if (BitWidth <= 96)
		return &AMDGPU::AV_96RegClass;
		if (BitWidth <= 128)
		return &AMDGPU::AV_128RegClass;
		if (BitWidth <= 160)
		return &AMDGPU::AV_160RegClass;
		if (BitWidth <= 192)
		return &AMDGPU::AV_192RegClass;
		if (BitWidth <= 224)
		return &AMDGPU::AV_224RegClass;
		if (BitWidth <= 256)
		return &AMDGPU::AV_256RegClass;
		if (BitWidth <= 512)
		return &AMDGPU::AV_512RegClass;
		if (BitWidth <= 1024)
		return &AMDGPU::AV_1024RegClass;

		return nullptr;
		}

		static const TargetRegisterClass *
		getAlignedVectorSuperClassForBitWidth(unsigned BitWidth) {
		if (BitWidth <= 64)
		return &AMDGPU::AV_64_Align2RegClass;
		if (BitWidth <= 96)
		return &AMDGPU::AV_96_Align2RegClass;
		if (BitWidth <= 128)
		return &AMDGPU::AV_128_Align2RegClass;
		if (BitWidth <= 160)
		return &AMDGPU::AV_160_Align2RegClass;
		if (BitWidth <= 192)
		return &AMDGPU::AV_192_Align2RegClass;
		if (BitWidth <= 224)
		return &AMDGPU::AV_224_Align2RegClass;
		if (BitWidth <= 256)
		return &AMDGPU::AV_256_Align2RegClass;
		if (BitWidth <= 512)
		return &AMDGPU::AV_512_Align2RegClass;
		if (BitWidth <= 1024)
		return &AMDGPU::AV_1024_Align2RegClass;

		return nullptr;
		}

		const TargetRegisterClass *
		SIRegisterInfo::getVectorSuperClassForBitWidth(unsigned BitWidth) const {
		if (BitWidth <= 16)
		return &AMDGPU::VGPR_LO16RegClass;
		if (BitWidth <= 32)
		return &AMDGPU::AV_32RegClass;
		return ST.needsAlignedVGPRs()
		? getAlignedVectorSuperClassForBitWidth(BitWidth)
		: getAnyVectorSuperClassForBitWidth(BitWidth);
		}

const TargetRegisterClass *		const TargetRegisterClass *
SIRegisterInfo::getSGPRClassForBitWidth(unsigned BitWidth) {		SIRegisterInfo::getSGPRClassForBitWidth(unsigned BitWidth) {
if (BitWidth <= 16)		if (BitWidth <= 16)
return &AMDGPU::SGPR_LO16RegClass;		return &AMDGPU::SGPR_LO16RegClass;
if (BitWidth <= 32)		if (BitWidth <= 32)
return &AMDGPU::SReg_32RegClass;		return &AMDGPU::SReg_32RegClass;
if (BitWidth <= 64)		if (BitWidth <= 64)
return &AMDGPU::SReg_64RegClass;		return &AMDGPU::SReg_64RegClass;
▲ Show 20 Lines • Show All 134 Lines • ▼ Show 20 Lines
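
The new `getVectorSuperClassForBitWidth` and its two static helpers form a ladder of `BitWidth <=` checks that returns the smallest AV class able to hold the requested width. A compact sketch of the same lookup follows, returning class names as strings instead of `TargetRegisterClass` pointers. The function names and the `AVWidths` table are hypothetical; an empty string stands in for the `nullptr` result.

```cpp
#include <cassert>
#include <string>

// Widths for which the diff returns a multi-register AV class, ascending.
static const unsigned AVWidths[] = {64, 96, 128, 160, 192, 224, 256, 512, 1024};

// Name of the smallest AV class that can hold BitWidth bits, or an empty
// string when the width exceeds 1024 (the diff returns nullptr there).
std::string anyVectorSuperClassName(unsigned BitWidth) {
  for (unsigned W : AVWidths)
    if (BitWidth <= W)
      return "AV_" + std::to_string(W);
  return "";
}

// Same dispatch as the member function in the diff: 16-bit and 32-bit
// widths are handled first, then the aligned or unaligned ladder is used
// depending on whether the subtarget needs aligned VGPRs.
std::string vectorSuperClassName(unsigned BitWidth, bool NeedsAlignedVGPRs) {
  if (BitWidth <= 16)
    return "VGPR_LO16";
  if (BitWidth <= 32)
    return "AV_32";
  std::string Name = anyVectorSuperClassName(BitWidth);
  if (!Name.empty() && NeedsAlignedVGPRs)
    Name += "_Align2";
  return Name;
}
```

For example, a 65-bit request on a subtarget that needs aligned VGPRs maps to `AV_96_Align2` in this model, and anything wider than 1024 bits has no class, which is the case the `assert(RC && "Invalid sub-register class size")` in `getSubRegClass` guards against.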

const TargetRegisterClass *SIRegisterInfo::getSubRegClass(		const TargetRegisterClass *SIRegisterInfo::getSubRegClass(
const TargetRegisterClass *RC, unsigned SubIdx) const {		const TargetRegisterClass *RC, unsigned SubIdx) const {
if (SubIdx == AMDGPU::NoSubRegister)		if (SubIdx == AMDGPU::NoSubRegister)
return RC;		return RC;

// We can assume that each lane corresponds to one 32-bit register.		// We can assume that each lane corresponds to one 32-bit register.
unsigned Size = getNumChannelsFromSubReg(SubIdx) * 32;		unsigned Size = getNumChannelsFromSubReg(SubIdx) * 32;
if (isSGPRClass(RC)) {		if (isAGPRClass(RC)) {
if (Size == 32)
RC = &AMDGPU::SGPR_32RegClass;
else
RC = getSGPRClassForBitWidth(Size);
} else if (hasAGPRs(RC)) {
RC = getAGPRClassForBitWidth(Size);		RC = getAGPRClassForBitWidth(Size);
} else {		} else if (isVGPRClass(RC)) {
RC = getVGPRClassForBitWidth(Size);		RC = getVGPRClassForBitWidth(Size);
		} else if (isVectorSuperClass(RC)) {
		RC = getVectorSuperClassForBitWidth(Size);
		} else {
		RC = getSGPRClassForBitWidth(Size);
}		}
assert(RC && "Invalid sub-register class size");		assert(RC && "Invalid sub-register class size");
return RC;		return RC;
}		}

const TargetRegisterClass *		const TargetRegisterClass *
SIRegisterInfo::getCompatibleSubRegClass(const TargetRegisterClass *SuperRC,		SIRegisterInfo::getCompatibleSubRegClass(const TargetRegisterClass *SuperRC,
const TargetRegisterClass *SubRC,		const TargetRegisterClass *SubRC,
▲ Show 20 Lines • Show All 127 Lines • ▼ Show 20 Lines	unsigned SIRegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
unsigned Occupancy = ST.getOccupancyWithLocalMemSize(MFI->getLDSSize(),		unsigned Occupancy = ST.getOccupancyWithLocalMemSize(MFI->getLDSSize(),
MF.getFunction());		MF.getFunction());
switch (RC->getID()) {		switch (RC->getID()) {
default:		default:
return AMDGPUGenRegisterInfo::getRegPressureLimit(RC, MF);		return AMDGPUGenRegisterInfo::getRegPressureLimit(RC, MF);
case AMDGPU::VGPR_32RegClassID:		case AMDGPU::VGPR_32RegClassID:
case AMDGPU::VGPR_LO16RegClassID:		case AMDGPU::VGPR_LO16RegClassID:
case AMDGPU::VGPR_HI16RegClassID:		case AMDGPU::VGPR_HI16RegClassID:
		case AMDGPU::AV_32RegClassID:
		rampitec: So again this is not that simple with a combined class. This is a lower bound, probably half of the real limit. But I guess we cannot answer better.
return std::min(ST.getMaxNumVGPRs(Occupancy), ST.getMaxNumVGPRs(MF));		return std::min(ST.getMaxNumVGPRs(Occupancy), ST.getMaxNumVGPRs(MF));
case AMDGPU::SGPR_32RegClassID:		case AMDGPU::SGPR_32RegClassID:
case AMDGPU::SGPR_LO16RegClassID:		case AMDGPU::SGPR_LO16RegClassID:
return std::min(ST.getMaxNumSGPRs(Occupancy, true), ST.getMaxNumSGPRs(MF));		return std::min(ST.getMaxNumSGPRs(Occupancy, true), ST.getMaxNumSGPRs(MF));
}		}
}		}

unsigned SIRegisterInfo::getRegPressureSetLimit(const MachineFunction &MF,		unsigned SIRegisterInfo::getRegPressureSetLimit(const MachineFunction &MF,
unsigned Idx) const {		unsigned Idx) const {
if (Idx == AMDGPU::RegisterPressureSets::VGPR_32 \|\|		if (Idx == AMDGPU::RegisterPressureSets::VGPR_32 \|\|
Idx == AMDGPU::RegisterPressureSets::AGPR_32)		Idx == AMDGPU::RegisterPressureSets::AGPR_32 \|\|
		Idx == AMDGPU::RegisterPressureSets::AV_32)
		rampitec: I do not think there can be a separate pressure on AV. What does that mean for AV pressure if you have pressure 256 on V and 256 on A? It cannot be increased or decreased separately.
		cdevadas (Author): The assert here https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/CodeGenRegisters.cpp#L2028 enforces at least one PSet when we make the AV classes allocatable. There should have been a flag to entirely avoid the PSets for allocatable regclasses.
		rampitec: There is GeneratePressureSet to avoid it, but the issue is conceptual. I do not understand a strategy of tracking pressure for a combined class.
		cdevadas (Author): GeneratePressureSet doesn't help to avoid the PSet entirely. As you can see, for all AV classes except AV_32, GeneratePressureSet is currently zero. The moment I reset this flag for AV_32, it asserts as I indicated earlier.
		rampitec: I did not see that assert because I was using optimized tablegen. Shall we omit the assert if GeneratePressureSet = 0? In practice it passes testing with GeneratePressureSet = 0 and optimized tablegen.
		cdevadas (Author): Yes, it all worked well with no PSets generated for AV classes when I used the optimized tablegen. I can post a patch to remove that assertion.
		rampitec: Maybe just add `|| !GeneratePressureSet` to the assertion?
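		The suggestion at the end of this thread can be pictured with a small toy model. This is only a sketch: `ToyRegClass`, `strictCheck` and `relaxedCheck` are hypothetical names invented here, and the real condition in CodeGenRegisters.cpp is not reproduced. It only shows how adding `|| !GeneratePressureSet` changes which classes pass.

```cpp
#include <vector>

// Toy stand-in for a register class record. The field names are hypothetical
// and only mirror the terms used in the review discussion.
struct ToyRegClass {
  bool Allocatable;
  bool GeneratePressureSet;
  std::vector<unsigned> PressureSets; // pressure sets assigned to the class
};

// Strict rule as described in the thread: an allocatable class must have at
// least one pressure set.
inline bool strictCheck(const ToyRegClass &RC) {
  return !RC.Allocatable || !RC.PressureSets.empty();
}

// Relaxed rule along the lines of the suggestion: a class that opts out with
// GeneratePressureSet = 0 may have no pressure set at all.
inline bool relaxedCheck(const ToyRegClass &RC) {
  return !RC.Allocatable || !RC.PressureSets.empty() ||
         !RC.GeneratePressureSet;
}
```

		Under this model, an allocatable class with `GeneratePressureSet = 0` and no pressure sets fails the strict rule but passes the relaxed one, which matches the behaviour both participants report seeing with an optimized (assert-free) tablegen.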
return getRegPressureLimit(&AMDGPU::VGPR_32RegClass,		return getRegPressureLimit(&AMDGPU::VGPR_32RegClass,
const_cast<MachineFunction &>(MF));		const_cast<MachineFunction &>(MF));

if (Idx == AMDGPU::RegisterPressureSets::SReg_32)		if (Idx == AMDGPU::RegisterPressureSets::SReg_32)
return getRegPressureLimit(&AMDGPU::SGPR_32RegClass,		return getRegPressureLimit(&AMDGPU::SGPR_32RegClass,
const_cast<MachineFunction &>(MF));		const_cast<MachineFunction &>(MF));

llvm_unreachable("Unexpected register pressure set!");		llvm_unreachable("Unexpected register pressure set!");
MCPhysReg SIRegisterInfo::get32BitRegister(MCPhysReg Reg) const {

return AMDGPU::NoRegister;		return AMDGPU::NoRegister;
}		}

bool SIRegisterInfo::isProperlyAlignedRC(const TargetRegisterClass &RC) const {		bool SIRegisterInfo::isProperlyAlignedRC(const TargetRegisterClass &RC) const {
if (!ST.needsAlignedVGPRs())		if (!ST.needsAlignedVGPRs())
return true;		return true;

if (hasVGPRs(&RC))		if (isVGPRClass(&RC))
return RC.hasSuperClassEq(getVGPRClassForBitWidth(getRegSizeInBits(RC)));		return RC.hasSuperClassEq(getVGPRClassForBitWidth(getRegSizeInBits(RC)));
if (hasAGPRs(&RC))		if (isAGPRClass(&RC))
return RC.hasSuperClassEq(getAGPRClassForBitWidth(getRegSizeInBits(RC)));		return RC.hasSuperClassEq(getAGPRClassForBitWidth(getRegSizeInBits(RC)));
		if (isVectorSuperClass(&RC))
		return RC.hasSuperClassEq(
		getVectorSuperClassForBitWidth(getRegSizeInBits(RC)));

return true;		return true;
}		}

bool SIRegisterInfo::isConstantPhysReg(MCRegister PhysReg) const {		bool SIRegisterInfo::isConstantPhysReg(MCRegister PhysReg) const {
switch (PhysReg) {		switch (PhysReg) {
case AMDGPU::SGPR_NULL:		case AMDGPU::SGPR_NULL:
case AMDGPU::SRC_SHARED_BASE:		case AMDGPU::SRC_SHARED_BASE:

llvm/lib/Target/AMDGPU/SIRegisterInfo.td

def VS_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16], 32,
let isAllocatable = 0;		let isAllocatable = 0;
let HasVGPR = 1;		let HasVGPR = 1;
}		}

def VS_64 : SIRegisterClass<"AMDGPU", [i64, f64, v2f32], 32, (add VReg_64, SReg_64)> {		def VS_64 : SIRegisterClass<"AMDGPU", [i64, f64, v2f32], 32, (add VReg_64, SReg_64)> {
let isAllocatable = 0;		let isAllocatable = 0;
let HasVGPR = 1;		let HasVGPR = 1;
}		}
		} // End GeneratePressureSet = 0

def AV_32 : SIRegisterClass<"AMDGPU", VGPR_32.RegTypes, 32,		def AV_32 : SIRegisterClass<"AMDGPU", VGPR_32.RegTypes, 32, (add VGPR_32, AGPR_32)> {
(add AGPR_32, VGPR_32)> {
let isAllocatable = 0;
let HasVGPR = 1;
let HasAGPR = 1;
}

def AV_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32,
(add AReg_64, VReg_64)> {
let isAllocatable = 0;
let HasVGPR = 1;		let HasVGPR = 1;
let HasAGPR = 1;		let HasAGPR = 1;
}		}
} // End GeneratePressureSet = 0

		// Define a register tuple class, along with one requiring an even
		// aligned base register.
		multiclass AVRegClass<int numRegs, list<ValueType> regTypes, dag regList> {
let HasVGPR = 1, HasAGPR = 1 in {		let HasVGPR = 1, HasAGPR = 1 in {
def AV_96 : SIRegisterClass<"AMDGPU", VReg_96.RegTypes, 32,		// Define the regular class.
(add AReg_96, VReg_96)> {		def "" : VRegClassBase<numRegs, regTypes, regList>;
let isAllocatable = 0;
}

def AV_128 : SIRegisterClass<"AMDGPU", VReg_128.RegTypes, 32,		// Define 2-aligned variant
(add AReg_128, VReg_128)> {		def _Align2 : VRegClassBase<numRegs, regTypes, (decimate regList, 2)>;
let isAllocatable = 0;
}		}

def AV_160 : SIRegisterClass<"AMDGPU", VReg_160.RegTypes, 32,
(add AReg_160, VReg_160)> {
let isAllocatable = 0;
}		}
} // End HasVGPR = 1, HasAGPR = 1
		defm AV_64 : AVRegClass<2, VReg_64.RegTypes, (add VGPR_64, AGPR_64)>;
		defm AV_96 : AVRegClass<3, VReg_96.RegTypes, (add VGPR_96, AGPR_96)>;
		defm AV_128 : AVRegClass<4, VReg_128.RegTypes, (add VGPR_128, AGPR_128)>;
		defm AV_160 : AVRegClass<5, VReg_160.RegTypes, (add VGPR_160, AGPR_160)>;
		defm AV_192 : AVRegClass<6, VReg_160.RegTypes, (add VGPR_192, AGPR_192)>;
		defm AV_224: AVRegClass<7, VReg_160.RegTypes, (add VGPR_224, AGPR_224)>;
		defm AV_256 : AVRegClass<8, VReg_160.RegTypes, (add VGPR_256, AGPR_256)>;
		defm AV_512 : AVRegClass<16, VReg_160.RegTypes, (add VGPR_512, AGPR_512)>;
		defm AV_1024 : AVRegClass<32, VReg_160.RegTypes, (add VGPR_1024, AGPR_1024)>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Register operands		// Register operands
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class RegImmMatcher<string name> : AsmOperandClass {		class RegImmMatcher<string name> : AsmOperandClass {
let Name = name;		let Name = name;
let RenderMethod = "addRegOrImmOperands";		let RenderMethod = "addRegOrImmOperands";
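For readers unfamiliar with TableGen's set operators: `(decimate regList, 2)` in the `AVRegClass` multiclass above keeps every second element of the list, starting with the first. The C++ sketch below only illustrates that selection on a made-up list of overlapping 64-bit tuples. `decimate` and `makeTuples64` are stand-in helpers written for this note, not LLVM code, and the tuple names are assumed for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-in for TableGen's (decimate S, N): keep every N'th
// element of S, starting with the first. Not part of LLVM.
template <typename T>
std::vector<T> decimate(const std::vector<T> &S, std::size_t N) {
  std::vector<T> Out;
  if (N == 0)
    return Out; // nothing sensible to select
  for (std::size_t I = 0; I < S.size(); I += N)
    Out.push_back(S[I]);
  return Out;
}

// Hypothetical overlapping 64-bit tuples: v0_v1, v1_v2, v2_v3, ...
inline std::vector<std::string> makeTuples64(unsigned Count) {
  std::vector<std::string> Tuples;
  for (unsigned I = 0; I < Count; ++I)
    Tuples.push_back("v" + std::to_string(I) + "_v" + std::to_string(I + 1));
  return Tuples;
}
```

With these helpers, `decimate(makeTuples64(6), 2)` keeps `v0_v1`, `v2_v3` and `v4_v5`, so only tuples with an even base register remain. That is the "even aligned base register" property the multiclass comment describes for the `_Align2` variant.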

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp


	void SIWholeQuadMode::lowerCopyInstrs() {			void SIWholeQuadMode::lowerCopyInstrs() {
	for (MachineInstr *MI : LowerToMovInstrs) {			for (MachineInstr *MI : LowerToMovInstrs) {
	assert(MI->getNumExplicitOperands() == 2);			assert(MI->getNumExplicitOperands() == 2);

	const Register Reg = MI->getOperand(0).getReg();			const Register Reg = MI->getOperand(0).getReg();
	const unsigned SubReg = MI->getOperand(0).getSubReg();			const unsigned SubReg = MI->getOperand(0).getSubReg();

	if (TRI->isVGPR(*MRI, Reg)) {			if (TRI->hasVGPRs(TRI->getRegClassForReg(*MRI, Reg))) {
				rampitec: It can only be VGPR here.
	const TargetRegisterClass *regClass =			const TargetRegisterClass *regClass =
	Reg.isVirtual() ? MRI->getRegClass(Reg) : TRI->getPhysRegClass(Reg);			Reg.isVirtual() ? MRI->getRegClass(Reg) : TRI->getPhysRegClass(Reg);
	if (SubReg)			if (SubReg)
	regClass = TRI->getSubRegClass(regClass, SubReg);			regClass = TRI->getSubRegClass(regClass, SubReg);

	const unsigned MovOp = TII->getMovOpcode(regClass);			const unsigned MovOp = TII->getMovOpcode(regClass);
	MI->setDesc(TII->get(MovOp));			MI->setDesc(TII->get(MovOp));

				// Use VGPR regclass if it is an AV class.
				if (Reg.isVirtual() && TRI->isVectorSuperClass(regClass))
				rampitec: Ditto.
				MRI->setRegClass(Reg, TRI->getEquivalentVGPRClass(regClass));

	// Check that it already implicitly depends on exec (like all VALU movs			// Check that it already implicitly depends on exec (like all VALU movs
	// should do).			// should do).
	assert(any_of(MI->implicit_operands(), [](const MachineOperand &MO) {			assert(any_of(MI->implicit_operands(), [](const MachineOperand &MO) {
	return MO.isUse() && MO.getReg() == AMDGPU::EXEC;			return MO.isUse() && MO.getReg() == AMDGPU::EXEC;
	}));			}));
	} else {			} else {
	// Remove early-clobber and exec dependency from simple SGPR copies.			// Remove early-clobber and exec dependency from simple SGPR copies.
	// This allows some to be eliminated during/post RA.			// This allows some to be eliminated during/post RA.

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

unsigned getRegBitWidth(unsigned RCID) {
case AMDGPU::VS_32RegClassID:		case AMDGPU::VS_32RegClassID:
case AMDGPU::AV_32RegClassID:		case AMDGPU::AV_32RegClassID:
case AMDGPU::SReg_32RegClassID:		case AMDGPU::SReg_32RegClassID:
case AMDGPU::SReg_32_XM0RegClassID:		case AMDGPU::SReg_32_XM0RegClassID:
case AMDGPU::SRegOrLds_32RegClassID:		case AMDGPU::SRegOrLds_32RegClassID:
return 32;		return 32;
case AMDGPU::SGPR_64RegClassID:		case AMDGPU::SGPR_64RegClassID:
case AMDGPU::VS_64RegClassID:		case AMDGPU::VS_64RegClassID:
case AMDGPU::AV_64RegClassID:
case AMDGPU::SReg_64RegClassID:		case AMDGPU::SReg_64RegClassID:
case AMDGPU::VReg_64RegClassID:		case AMDGPU::VReg_64RegClassID:
case AMDGPU::AReg_64RegClassID:		case AMDGPU::AReg_64RegClassID:
case AMDGPU::SReg_64_XEXECRegClassID:		case AMDGPU::SReg_64_XEXECRegClassID:
case AMDGPU::VReg_64_Align2RegClassID:		case AMDGPU::VReg_64_Align2RegClassID:
case AMDGPU::AReg_64_Align2RegClassID:		case AMDGPU::AReg_64_Align2RegClassID:
		case AMDGPU::AV_64RegClassID:
		case AMDGPU::AV_64_Align2RegClassID:
return 64;		return 64;
case AMDGPU::SGPR_96RegClassID:		case AMDGPU::SGPR_96RegClassID:
case AMDGPU::SReg_96RegClassID:		case AMDGPU::SReg_96RegClassID:
case AMDGPU::VReg_96RegClassID:		case AMDGPU::VReg_96RegClassID:
case AMDGPU::AReg_96RegClassID:		case AMDGPU::AReg_96RegClassID:
case AMDGPU::VReg_96_Align2RegClassID:		case AMDGPU::VReg_96_Align2RegClassID:
case AMDGPU::AReg_96_Align2RegClassID:		case AMDGPU::AReg_96_Align2RegClassID:
case AMDGPU::AV_96RegClassID:		case AMDGPU::AV_96RegClassID:
		case AMDGPU::AV_96_Align2RegClassID:
return 96;		return 96;
case AMDGPU::SGPR_128RegClassID:		case AMDGPU::SGPR_128RegClassID:
case AMDGPU::SReg_128RegClassID:		case AMDGPU::SReg_128RegClassID:
case AMDGPU::VReg_128RegClassID:		case AMDGPU::VReg_128RegClassID:
case AMDGPU::AReg_128RegClassID:		case AMDGPU::AReg_128RegClassID:
case AMDGPU::VReg_128_Align2RegClassID:		case AMDGPU::VReg_128_Align2RegClassID:
case AMDGPU::AReg_128_Align2RegClassID:		case AMDGPU::AReg_128_Align2RegClassID:
case AMDGPU::AV_128RegClassID:		case AMDGPU::AV_128RegClassID:
		case AMDGPU::AV_128_Align2RegClassID:
return 128;		return 128;
case AMDGPU::SGPR_160RegClassID:		case AMDGPU::SGPR_160RegClassID:
case AMDGPU::SReg_160RegClassID:		case AMDGPU::SReg_160RegClassID:
case AMDGPU::VReg_160RegClassID:		case AMDGPU::VReg_160RegClassID:
case AMDGPU::AReg_160RegClassID:		case AMDGPU::AReg_160RegClassID:
case AMDGPU::VReg_160_Align2RegClassID:		case AMDGPU::VReg_160_Align2RegClassID:
case AMDGPU::AReg_160_Align2RegClassID:		case AMDGPU::AReg_160_Align2RegClassID:
case AMDGPU::AV_160RegClassID:		case AMDGPU::AV_160RegClassID:
		case AMDGPU::AV_160_Align2RegClassID:
return 160;		return 160;
case AMDGPU::SGPR_192RegClassID:		case AMDGPU::SGPR_192RegClassID:
case AMDGPU::SReg_192RegClassID:		case AMDGPU::SReg_192RegClassID:
case AMDGPU::VReg_192RegClassID:		case AMDGPU::VReg_192RegClassID:
case AMDGPU::AReg_192RegClassID:		case AMDGPU::AReg_192RegClassID:
case AMDGPU::VReg_192_Align2RegClassID:		case AMDGPU::VReg_192_Align2RegClassID:
case AMDGPU::AReg_192_Align2RegClassID:		case AMDGPU::AReg_192_Align2RegClassID:
		case AMDGPU::AV_192RegClassID:
		case AMDGPU::AV_192_Align2RegClassID:
return 192;		return 192;
case AMDGPU::SGPR_224RegClassID:		case AMDGPU::SGPR_224RegClassID:
case AMDGPU::SReg_224RegClassID:		case AMDGPU::SReg_224RegClassID:
case AMDGPU::VReg_224RegClassID:		case AMDGPU::VReg_224RegClassID:
case AMDGPU::AReg_224RegClassID:		case AMDGPU::AReg_224RegClassID:
case AMDGPU::VReg_224_Align2RegClassID:		case AMDGPU::VReg_224_Align2RegClassID:
case AMDGPU::AReg_224_Align2RegClassID:		case AMDGPU::AReg_224_Align2RegClassID:
		case AMDGPU::AV_224RegClassID:
		case AMDGPU::AV_224_Align2RegClassID:
return 224;		return 224;
case AMDGPU::SGPR_256RegClassID:		case AMDGPU::SGPR_256RegClassID:
case AMDGPU::SReg_256RegClassID:		case AMDGPU::SReg_256RegClassID:
case AMDGPU::VReg_256RegClassID:		case AMDGPU::VReg_256RegClassID:
case AMDGPU::AReg_256RegClassID:		case AMDGPU::AReg_256RegClassID:
case AMDGPU::VReg_256_Align2RegClassID:		case AMDGPU::VReg_256_Align2RegClassID:
case AMDGPU::AReg_256_Align2RegClassID:		case AMDGPU::AReg_256_Align2RegClassID:
		case AMDGPU::AV_256RegClassID:
		case AMDGPU::AV_256_Align2RegClassID:
return 256;		return 256;
case AMDGPU::SGPR_512RegClassID:		case AMDGPU::SGPR_512RegClassID:
case AMDGPU::SReg_512RegClassID:		case AMDGPU::SReg_512RegClassID:
case AMDGPU::VReg_512RegClassID:		case AMDGPU::VReg_512RegClassID:
case AMDGPU::AReg_512RegClassID:		case AMDGPU::AReg_512RegClassID:
case AMDGPU::VReg_512_Align2RegClassID:		case AMDGPU::VReg_512_Align2RegClassID:
case AMDGPU::AReg_512_Align2RegClassID:		case AMDGPU::AReg_512_Align2RegClassID:
		case AMDGPU::AV_512RegClassID:
		case AMDGPU::AV_512_Align2RegClassID:
return 512;		return 512;
case AMDGPU::SGPR_1024RegClassID:		case AMDGPU::SGPR_1024RegClassID:
case AMDGPU::SReg_1024RegClassID:		case AMDGPU::SReg_1024RegClassID:
case AMDGPU::VReg_1024RegClassID:		case AMDGPU::VReg_1024RegClassID:
case AMDGPU::AReg_1024RegClassID:		case AMDGPU::AReg_1024RegClassID:
case AMDGPU::VReg_1024_Align2RegClassID:		case AMDGPU::VReg_1024_Align2RegClassID:
case AMDGPU::AReg_1024_Align2RegClassID:		case AMDGPU::AReg_1024_Align2RegClassID:
		case AMDGPU::AV_1024RegClassID:
		case AMDGPU::AV_1024_Align2RegClassID:
return 1024;		return 1024;
default:		default:
llvm_unreachable("Unexpected register class");		llvm_unreachable("Unexpected register class");
}		}
}		}

unsigned getRegBitWidth(const MCRegisterClass &RC) {		unsigned getRegBitWidth(const MCRegisterClass &RC) {
return getRegBitWidth(RC.getID());		return getRegBitWidth(RC.getID());

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll

	}			}

	; Check support for returning several floats			; Check support for returning several floats
	define double @test_multiple_register_outputs_mixed() #0 {			define double @test_multiple_register_outputs_mixed() #0 {
	; CHECK-LABEL: name: test_multiple_register_outputs_mixed			; CHECK-LABEL: name: test_multiple_register_outputs_mixed
	; CHECK: bb.1 (%ir-block.0):			; CHECK: bb.1 (%ir-block.0):
	; CHECK: liveins: $sgpr30_sgpr31			; CHECK: liveins: $sgpr30_sgpr31
	; CHECK: [[COPY:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31			; CHECK: [[COPY:%[0-9]+]]:sgpr_64 = COPY $sgpr30_sgpr31
	; CHECK: INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* attdialect */, 1835018 /* regdef:VGPR_32 */, def %1, 2883594 /* regdef:VReg_64 */, def %2			; CHECK: INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* attdialect */, 1835018 /* regdef:VGPR_32 */, def %1, 2949130 /* regdef:VReg_64 */, def %2
	; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY %1			; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY %1
	; CHECK: [[COPY2:%[0-9]+]]:_(s64) = COPY %2			; CHECK: [[COPY2:%[0-9]+]]:_(s64) = COPY %2
	; CHECK: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY2]](s64)			; CHECK: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY2]](s64)
	; CHECK: $vgpr0 = COPY [[UV]](s32)			; CHECK: $vgpr0 = COPY [[UV]](s32)
	; CHECK: $vgpr1 = COPY [[UV1]](s32)			; CHECK: $vgpr1 = COPY [[UV1]](s32)
	; CHECK: [[COPY3:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY]]			; CHECK: [[COPY3:%[0-9]+]]:ccr_sgpr_64 = COPY [[COPY]]
	; CHECK: S_SETPC_B64_return [[COPY3]], implicit $vgpr0, implicit $vgpr1			; CHECK: S_SETPC_B64_return [[COPY3]], implicit $vgpr0, implicit $vgpr1
	%1 = call { float, double } asm "v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", "=v,=v"()			%1 = call { float, double } asm "v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", "=v,=v"()

llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll

	; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -stop-after=finalize-isel -o - %s | FileCheck -check-prefix=GFX908 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -stop-after=finalize-isel -o - %s | FileCheck -check-prefix=GFX908 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -stop-after=finalize-isel -o - %s | FileCheck -check-prefix=GFX90A %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -stop-after=finalize-isel -o - %s | FileCheck -check-prefix=GFX90A %s

	; Make sure we only use one 128-bit register instead of 2 for i128 asm			; Make sure we only use one 128-bit register instead of 2 for i128 asm
	; constraints			; constraints

	define amdgpu_kernel void @s_input_output_i128() {			define amdgpu_kernel void @s_input_output_i128() {
	; GFX908-LABEL: name: s_input_output_i128			; GFX908-LABEL: name: s_input_output_i128
	; GFX908: bb.0 (%ir-block.0):			; GFX908: bb.0 (%ir-block.0):
	; GFX908: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5111818 /* regdef:SGPR_128 */, def %4			; GFX908: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5701642 /* regdef:SGPR_128 */, def %4
	; GFX908: [[COPY:%[0-9]+]]:sgpr_128 = COPY %4			; GFX908: [[COPY:%[0-9]+]]:sgpr_128 = COPY %4
	; GFX908: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 5111817 /* reguse:SGPR_128 */, [[COPY]]			; GFX908: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 5701641 /* reguse:SGPR_128 */, [[COPY]]
	; GFX908: S_ENDPGM 0			; GFX908: S_ENDPGM 0
	; GFX90A-LABEL: name: s_input_output_i128			; GFX90A-LABEL: name: s_input_output_i128
	; GFX90A: bb.0 (%ir-block.0):			; GFX90A: bb.0 (%ir-block.0):
	; GFX90A: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5111818 /* regdef:SGPR_128 */, def %4			; GFX90A: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5701642 /* regdef:SGPR_128 */, def %4
	; GFX90A: [[COPY:%[0-9]+]]:sgpr_128 = COPY %4			; GFX90A: [[COPY:%[0-9]+]]:sgpr_128 = COPY %4
	; GFX90A: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 5111817 /* reguse:SGPR_128 */, [[COPY]]			; GFX90A: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 5701641 /* reguse:SGPR_128 */, [[COPY]]
	; GFX90A: S_ENDPGM 0			; GFX90A: S_ENDPGM 0
	%val = tail call i128 asm sideeffect "; def $0", "=s"()			%val = tail call i128 asm sideeffect "; def $0", "=s"()
	call void asm sideeffect "; use $0", "s"(i128 %val)			call void asm sideeffect "; use $0", "s"(i128 %val)
	ret void			ret void
	}			}

	define amdgpu_kernel void @v_input_output_i128() {			define amdgpu_kernel void @v_input_output_i128() {
	; GFX908-LABEL: name: v_input_output_i128			; GFX908-LABEL: name: v_input_output_i128
	; GFX908: bb.0 (%ir-block.0):			; GFX908: bb.0 (%ir-block.0):
	; GFX908: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 4718602 /* regdef:VReg_128 */, def %4			; GFX908: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5242890 /* regdef:VReg_128 */, def %4
	; GFX908: [[COPY:%[0-9]+]]:vreg_128 = COPY %4			; GFX908: [[COPY:%[0-9]+]]:vreg_128 = COPY %4
	; GFX908: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 4718601 /* reguse:VReg_128 */, [[COPY]]			; GFX908: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 5242889 /* reguse:VReg_128 */, [[COPY]]
	; GFX908: S_ENDPGM 0			; GFX908: S_ENDPGM 0
	; GFX90A-LABEL: name: v_input_output_i128			; GFX90A-LABEL: name: v_input_output_i128
	; GFX90A: bb.0 (%ir-block.0):			; GFX90A: bb.0 (%ir-block.0):
	; GFX90A: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 4849674 /* regdef:VReg_128_Align2 */, def %4			; GFX90A: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5439498 /* regdef:VReg_128_Align2 */, def %4
	; GFX90A: [[COPY:%[0-9]+]]:vreg_128_align2 = COPY %4			; GFX90A: [[COPY:%[0-9]+]]:vreg_128_align2 = COPY %4
	; GFX90A: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 4849673 /* reguse:VReg_128_Align2 */, [[COPY]]			; GFX90A: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 5439497 /* reguse:VReg_128_Align2 */, [[COPY]]
	; GFX90A: S_ENDPGM 0			; GFX90A: S_ENDPGM 0
	%val = tail call i128 asm sideeffect "; def $0", "=v"()			%val = tail call i128 asm sideeffect "; def $0", "=v"()
	call void asm sideeffect "; use $0", "v"(i128 %val)			call void asm sideeffect "; use $0", "v"(i128 %val)
	ret void			ret void
	}			}

	define amdgpu_kernel void @a_input_output_i128() {			define amdgpu_kernel void @a_input_output_i128() {
	; GFX908-LABEL: name: a_input_output_i128			; GFX908-LABEL: name: a_input_output_i128
	; GFX908: bb.0 (%ir-block.0):			; GFX908: bb.0 (%ir-block.0):
	; GFX908: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 4653066 /* regdef:AReg_128 */, def %4			; GFX908: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5046282 /* regdef:AReg_128 */, def %4
	; GFX908: [[COPY:%[0-9]+]]:areg_128 = COPY %4			; GFX908: [[COPY:%[0-9]+]]:areg_128 = COPY %4
	; GFX908: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 4653065 /* reguse:AReg_128 */, [[COPY]]			; GFX908: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 5046281 /* reguse:AReg_128 */, [[COPY]]
	; GFX908: S_ENDPGM 0			; GFX908: S_ENDPGM 0
	; GFX90A-LABEL: name: a_input_output_i128			; GFX90A-LABEL: name: a_input_output_i128
	; GFX90A: bb.0 (%ir-block.0):			; GFX90A: bb.0 (%ir-block.0):
	; GFX90A: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 4784138 /* regdef:AReg_128_Align2 */, def %4			; GFX90A: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5373962 /* regdef:AReg_128_Align2 */, def %4
	; GFX90A: [[COPY:%[0-9]+]]:areg_128_align2 = COPY %4			; GFX90A: [[COPY:%[0-9]+]]:areg_128_align2 = COPY %4
	; GFX90A: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 4784137 /* reguse:AReg_128_Align2 */, [[COPY]]			; GFX90A: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 5373961 /* reguse:AReg_128_Align2 */, [[COPY]]
	; GFX90A: S_ENDPGM 0			; GFX90A: S_ENDPGM 0
	%val = call i128 asm sideeffect "; def $0", "=a"()			%val = call i128 asm sideeffect "; def $0", "=a"()
	call void asm sideeffect "; use $0", "a"(i128 %val)			call void asm sideeffect "; use $0", "a"(i128 %val)
	ret void			ret void
	}			}
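
The only change in these CHECK lines is the integer in front of each `regdef:`/`reguse:` comment. That integer is the INLINEASM operand flag word. Assuming the encoding used by LLVM's `InlineAsm` helpers at the time of this patch (operand kind in the low three bits, register count in the bits above it up to bit 15, register class ID plus one from bit 16 upward), the old and new values differ only in the class ID field, which is consistent with the added AV classes renumbering the generated register class enum. The sketch below decodes the values from this test; the struct and function names are invented for illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Fields of an INLINEASM operand flag word under the layout assumed above.
// Bit 31 (the matched-operand marker) is ignored in this sketch.
struct AsmFlag {
  uint32_t Kind;     // 1 = reguse, 2 = regdef
  uint32_t NumRegs;  // number of register operands that follow
  uint32_t RegClass; // register class ID; meaningful only if HasRegClass
  bool HasRegClass;
};

constexpr AsmFlag decodeAsmFlag(uint32_t Flag) {
  // The upper field stores (class ID + 1); zero means "no class constraint".
  return AsmFlag{Flag & 7u, (Flag & 0xffffu) >> 3,
                 ((Flag >> 16) & 0x7fffu) ? ((Flag >> 16) & 0x7fffu) - 1 : 0u,
                 ((Flag >> 16) & 0x7fffu) != 0};
}

// How far the register class ID moved between an old and a new CHECK value.
inline int regClassShift(uint32_t OldFlag, uint32_t NewFlag) {
  const AsmFlag Old = decodeAsmFlag(OldFlag);
  const AsmFlag New = decodeAsmFlag(NewFlag);
  assert(Old.Kind == New.Kind && Old.NumRegs == New.NumRegs &&
         "only the register class field is expected to differ");
  return static_cast<int>(New.RegClass) - static_cast<int>(Old.RegClass);
}

// Values taken from the SGPR_128 operands in the test above.
static_assert(decodeAsmFlag(5111818).Kind == 2, "regdef");
static_assert(decodeAsmFlag(5111817).Kind == 1, "reguse");
static_assert(decodeAsmFlag(5111818).NumRegs == 1, "single register operand");
static_assert(decodeAsmFlag(5111818).RegClass == 77, "old SGPR_128 class ID");
static_assert(decodeAsmFlag(5701642).RegClass == 86, "new SGPR_128 class ID");
```

Under that assumed layout, `regClassShift(5111818, 5701642)` is 9 for `SGPR_128`, while the `VReg_128` pair (4718602, 5242890) moves by 8. The shift differs per class because it depends on how many of the new AV classes sort ahead of each one in the generated enum.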

Revision Contents

Diff 371594