This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Use GlobalPriority for largest register tuples
ClosedPublic

Authored by arsenm on Sep 13 2022, 5:22 AM.

Details

Reviewers
rampitec
foad
Group Reviewers
Restricted Project
Summary

Only do this for 16- and 32-register tuples, although we might want to
extend it to 8-register tuples.

It's incredibly expensive to spill these, and doing so majorly
interferes with the ability to allocate anything else in the function.

The lit tests show mostly sizeable improvements with a handful of tiny
regressions with large vectors.

Event Timeline

arsenm created this revision. Sep 13 2022, 5:22 AM
Herald added a project: Restricted Project. Sep 13 2022, 5:22 AM
arsenm requested review of this revision. Sep 13 2022, 5:22 AM
Herald added a subscriber: wdng.
This revision is now accepted and ready to land. Sep 13 2022, 12:41 PM
foad added inline comments. Sep 14 2022, 2:12 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td:796

Would it be cleaner to set this inside SRegClass based on !ge(numRegs, 16)? Or do it in a common base class of SRegClass/VRegClass/ARegClass?

arsenm added inline comments. Sep 15 2022, 7:03 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td:796

I don't think it would really be cleaner. The common base class would be all the way up at SIRegisterClass, which currently doesn't have a number-of-registers parameter. As is, we could move this to let blocks inside the multiclass definitions for each class, which I think is even uglier.
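
For reference, the alternative raised in the first inline comment could look roughly like the following TableGen sketch. The class and def names (ExampleTupleClass, ExampleTuple8, and so on) are hypothetical stand-ins, not the real SRegClass/VRegClass/ARegClass definitions in SIRegisterInfo.td; only the GlobalPriority field name and the !ge(numRegs, 16) test come from the discussion above.

```tablegen
// Hypothetical stand-in for a register-tuple class that knows its size.
// These are NOT the actual SIRegisterInfo.td definitions.
class ExampleTupleClass<int numRegs> {
  int NumRegs = numRegs;
  // Set only for tuples of 16 or more registers, following the
  // suggestion to key this off !ge(numRegs, 16).
  bit GlobalPriority = !ge(numRegs, 16);
}

def ExampleTuple8  : ExampleTupleClass<8>;   // GlobalPriority = 0
def ExampleTuple16 : ExampleTupleClass<16>;  // GlobalPriority = 1
def ExampleTuple32 : ExampleTupleClass<32>;  // GlobalPriority = 1
```

As the reply notes, the difficulty in the real file is that the shared base class, SIRegisterClass, currently takes no register-count parameter, so this sketch sidesteps the part that would make the change awkward in practice.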