This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users.
ClosedPublic

Authored by cdevadas on Mar 2 2022, 6:51 AM.

Diff Detail

Event Timeline

cdevadas created this revision.Mar 2 2022, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 6:51 AM
cdevadas requested review of this revision.Mar 2 2022, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 6:51 AM
arsenm added a comment.Mar 2 2022, 6:56 AM

What's the reasoning for this? Won't this break chains of REG_SEQUENCE? In particular I do expect to see some REG_SEQUENCE (REG_SEQUENCE), (REG_SEQUENCE) patterns

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4758 ↗(On Diff #412398)

isUnknownRegClass?

foad added inline comments.Mar 2 2022, 6:59 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4761 ↗(On Diff #412398)

Why do you need to test isVariadic here()? Surely as long as you compare OpIdx against Desc.getNumOperands() (not MI.getNumOperands()) you're OK?

What's the reasoning for this? Won't this break chains of REG_SEQUENCE? In particular I do expect to see some REG_SEQUENCE (REG_SEQUENCE), (REG_SEQUENCE) patterns

Currently skipping it to avoid a crash that occurred with REG_SEQUENCE(REG_SEQUENCE) while trying to look for the regclass @OpIdx from the UseMI's MCInstrDesc.
These special cases involving target-independent and/or variadic instructions should be handled separately.
This folding will take place only when the regclass at OpIdx is a vectorSuperClass https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp#L1603
Currently, we don't select any AV operands and I guess it is ok to skip them.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4758 ↗(On Diff #412398)

Will use

4761 ↗(On Diff #412398)

No, OpIdx >= Desc.getNumOperands() didn't hold true when I checked the case REG_SEQUENCE (REG_SEQUENCE). Is it because the passed Op index is already adjusted from UseMI->getOperand(0)?

cdevadas updated this revision to Diff 412591.Mar 2 2022, 7:05 PM
cdevadas retitled this revision from AMDGPU: Skip tryFoldRegSequence if no valid regclasses for its users. to AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users..

Used a better name for the function.

rampitec added inline comments.Mar 3 2022, 9:47 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4761 ↗(On Diff #412398)

Even variadic instructions shall have proper operand description before the variadic part starts.

cdevadas updated this revision to Diff 412876.Mar 3 2022, 5:29 PM

Removed variadic instruction check.
Desc.OpInfo[OpIdx].RegClass == -1 will catch those invalid cases.

foad added inline comments.Mar 4 2022, 4:00 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4760 ↗(On Diff #412876)

Just use MI.getDesc()?

cdevadas updated this revision to Diff 412982.Mar 4 2022, 4:34 AM

Addressed Jay's suggestion.

foad added inline comments.Mar 4 2022, 6:04 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1602

You could just use TargetInstrInfo::getRegClass(InstDesc, OpIdx, TRI, MF) instead of TRI->getRegClass and it would do all these checks for you already.

cdevadas added inline comments.Mar 4 2022, 6:14 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1602

We can't use that equivalent of getRegClass here. The call to adjustAllocatableRegClass inside it would change AV classes to V thereby disabling the folding in almost all valid cases.

foad added inline comments.Mar 4 2022, 6:21 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1602

The call to adjustAllocatableRegClass is in the AMDGPU-specific getOpRegClass, not in the generic getRegClass.

cdevadas added inline comments.Mar 4 2022, 10:03 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1602

@foad thanks for pointing that generic function.

We have some variants in SIInstrInfo.cpp to get regclass value from InstDesc.
I initially tried to use SIInstrInfo::getOpRegClass(const MachineInstr &MI, unsigned OpNo) and found the adjustAllocatableRegClass calls inside it prevented the folding for some tests involving loads and stores.

But SIInstrInfo::getRegClass(InstDesc, OpIdx, TRI, MF) seems to be working. Though it also has a call to adjustAllocatableRegClass, it is more restrained and appeared to be not missing any folding opportunity for loads/stores that failed earlier with the other variant.
I am going to use this function. Will update the patch.

cdevadas updated this revision to Diff 413064.Mar 4 2022, 10:32 AM
cdevadas added a reviewer: foad.

Used existing 'SIInstrInfo::getRegClass' function to get the RC at OpIdx.

arsenm accepted this revision.Mar 7 2022, 10:01 AM
This revision is now accepted and ready to land.Mar 7 2022, 10:01 AM
This revision was landed with ongoing or failed builds.Mar 7 2022, 8:44 PM
This revision was automatically updated to reflect the committed changes.