This is an archive of the discontinued LLVM Phabricator instance.

RegAllocGreedy: Try local instruction splitting with subranges
ClosedPublic

Authored by arsenm on Jul 27 2022, 1:07 PM.

Details

Reviewers
qcolombet
MatzeB
kparzysz
uabelho
foad
Group Reviewers
Restricted Project
Summary

Local instruction splitting was previously only tried in order to relax
register class constraints, but it can also help if there are subranges
involved.

This solves a compilation failure for AMDGPU when there is high
pressure created by large register tuples. If one virtual register is
using most of the available budget, we need to be able to evict
subranges.

This solves the immediate failure, but this solution leaves a lot to
be desired. In the relevant testcases, we have 32-element tuples but
most of the uses are operations on 1 element subranges of it. What
we're now getting is a spill and restore of the full 1024 bits and an
extract of the used 32 bits. It would be far better if we introduced a
copy to a new virtual register with a smaller register class and used
narrower spills.
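
To make the cost described above concrete, here is a minimal sketch of the arithmetic. It is not code from this patch: `TupleLanes`, `laneFor`, `bitsCovered` and `readsStrictSubset` are hypothetical names, and the model simply assumes one mask bit per 32-bit element of a 32-element tuple.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical model: one bit per 32-bit element of a 32-element tuple.
// This is an illustration only, not LLVM's LaneBitmask.
using TupleLanes = std::uint32_t;
constexpr unsigned kElemBits = 32;
constexpr unsigned kNumElems = 32;

// Mask selecting a single element of the tuple.
inline TupleLanes laneFor(unsigned Elem) {
  assert(Elem < kNumElems && "element index out of range");
  return TupleLanes{1} << Elem;
}

// Number of register bits covered by a mask.
inline unsigned bitsCovered(TupleLanes M) {
  return static_cast<unsigned>(std::bitset<kNumElems>(M).count()) * kElemBits;
}

// True when some live lane is not read by the instruction, i.e. the
// instruction only needs part of what is live.
inline bool readsStrictSubset(TupleLanes Read, TupleLanes Live) {
  return (Live & ~Read) != 0;
}
```

Under these assumptions, a use of element 3 covers `bitsCovered(laneFor(3))` = 32 bits, while the whole tuple covers 1024 bits, which is the gap between the spill that is currently emitted and the narrower spill the summary asks for.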

Furthermore, we could probably do a better job if the allocator were
to introduce new subranges where none previously existed in the
highest pressure scenarios. The block and region splits should also
try to split specific subranges out.

The mve-vst3.ll test changes look like noise to me, but instruction
count increased by one. mve-vst4.ll looks like a solid improvement
with several 16-byte spills eliminated. splitkit-copy-live-lanes.mir
also shows a solid reduction in total spill count.

This could use more tests but it's pretty tiring to come up with cases
that fail on this.

Diff Detail

Event Timeline

arsenm created this revision. Jul 27 2022, 1:07 PM
Herald added a project: Restricted Project. Jul 27 2022, 1:07 PM
arsenm requested review of this revision. Jul 27 2022, 1:07 PM
Herald added a subscriber: wdng.
arsenm updated this revision to Diff 448223. Jul 27 2022, 6:06 PM

Add another small test that reduces spilling (although it should be possible to eliminate it entirely)

lkail added a subscriber: lkail. Jul 27 2022, 7:41 PM
qcolombet accepted this revision. Aug 25 2022, 11:28 AM

LGTM with nit.

llvm/lib/CodeGen/RegAllocGreedy.cpp, line 1257

I don't think we can early return here.
Technically different operands could be on different register classes and IIRC that may give different LaneBitmasks.
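
The code at that line is not reproduced in this archive, so the following is only a sketch of the general concern, under stated assumptions: `Operand`, `firstMatchMask` and `unionMask` are hypothetical stand-ins, not LLVM types or functions. It contrasts returning at the first operand that refers to a register with accumulating the lanes of every such operand.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not LLVM's MachineOperand or
// LaneBitmask.
using LaneMask = std::uint64_t;

struct Operand {
  unsigned Reg;   // register the operand refers to
  LaneMask Lanes; // lanes selected by the operand's subregister index
};

// Early return: only the first operand referring to Reg is considered.
inline LaneMask firstMatchMask(const std::vector<Operand> &Ops, unsigned Reg) {
  for (const Operand &Op : Ops)
    if (Op.Reg == Reg)
      return Op.Lanes;
  return 0;
}

// No early return: the lanes of every operand referring to Reg are merged.
inline LaneMask unionMask(const std::vector<Operand> &Ops, unsigned Reg) {
  LaneMask Mask = 0;
  for (const Operand &Op : Ops)
    if (Op.Reg == Reg)
      Mask |= Op.Lanes;
  return Mask;
}
```

If an instruction names the same register in two operands with different lanes, the early-return version reports only the first operand's lanes, whereas the accumulating version reports both.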

This revision is now accepted and ready to land. Aug 25 2022, 11:28 AM