This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Inefficient register allocation of ACC registers results in many copies.
ClosedPublic

Authored by stefanp on Jul 12 2021, 4:05 PM.

Details

Summary

ACC registers are a combination of four consecutive vector registers.
If the vector registers are assigned first this often forces a number
of copies to appear just before the ACC register is created. If the ACC
register is assigned first then fewer copies are generated when the vector
registers are assigned.

This patch tries to force the register allocator to assign the ACC registers first
and then the UACC registers and then the vector pair registers. It does this
by changing the priority of the register classes.

This patch also adds hints to help the register allocator assign UACC registers from
known ACC registers and vector pair registers from known UACC registers.

Diff Detail

Event Timeline

stefanp created this revision.Jul 12 2021, 4:05 PM
stefanp requested review of this revision.Jul 12 2021, 4:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2021, 4:05 PM
lkail added a subscriber: lkail.Jul 12 2021, 7:47 PM
nemanjai added inline comments.Jul 13 2021, 2:34 AM
llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
483

I am not 100% positive this is the intent here, but a comment should definitely be added (update my suggestion if it does not adequately capture the intent).

// Call the base implementation first to set any hints based on
// the usual heuristics and decide what the return value should
// be. We want to return the same value returned by the base
// implementation as we are not looking to force a specific allocation
// but rather to just provide a hint.
498

I think this comment is more appropriate on the call to the function from the base class (see above).

511

Is it possible for this to not be true? Should this maybe just be an assert instead?

llvm/lib/Target/PowerPC/PPCRegisterInfo.h
97
// Provide hints to the register allocator for allocating subregisters
// of primed and unprimed accumulators. For example, if accumulator
// ACC5 is assigned, we also want to assign UACC5 to the input.
// Similarly if UACC5 is assigned, we want to assign VSRp10, VSRp11
// to its inputs.
llvm/lib/Target/PowerPC/PPCRegisterInfo.td
452

Please add a comment regarding how the values were selected.

llvm/test/CodeGen/PowerPC/ppc64-acc-schedule.ll
1

I think that for this test case, it would be more useful to see how the code generation changes with this patch. Please pre-commit it with current code generation and then this review will show the difference.

stefanp retitled this revision from [PowerPC] Inefficient scheduling of ACC registers results in many copies. to [PowerPC] Inefficient register allocation of ACC registers results in many copies..Jul 13 2021, 6:18 AM
stefanp edited the summary of this revision. (Show Details)
stefanp added a reviewer: power-llvm-team.
stefanp updated this revision to Diff 358553.Jul 14 2021, 4:00 AM

Added a number of comments.
Changed if to assert where if should always be true.
Pre-committed the test so that changes can be seen in diff.

jsji added a reviewer: lkail.Jul 15 2021, 8:45 PM
nemanjai accepted this revision.Jul 16 2021, 2:39 AM

LGTM other than a couple of comment nits.
Also, since this affects target independent code in the register allocator, please hold off on committing in case @qcolombet has concerns regarding the changes to the register allocator.

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
492
// We are interested in instructions that copy values to ACC/UACC.
// The copy into UACC will be simply a COPY to a subreg so we
// want to allocate the corresponding physical subreg for the source.
// The copy into ACC will be a BUILD_UACC so we want to allocate
// the same number UACC for the source.
llvm/lib/Target/PowerPC/PPCRegisterInfo.td
455

s/advacent/adjacent

Also, V4-V7 are VMX registers. The ACC registers overlap the low VSX registers (the FPR's). So that should probably say VS4 - VS7 rather than V4, V5, V6, V7.

This revision is now accepted and ready to land.Jul 16 2021, 2:39 AM
stefanp updated this revision to Diff 360127.Jul 20 2021, 7:37 AM

Updated comments as per review comments.

This revision was landed with ongoing or failed builds.Jul 20 2021, 8:53 AM
This revision was automatically updated to reflect the committed changes.
arsenm added a subscriber: arsenm.Aug 3 2021, 4:20 PM
arsenm added inline comments.
llvm/include/llvm/CodeGen/TargetRegisterInfo.h
876

Why is this a target option? Why not just always apply the target defined priority? It seems like a bug to me that this wasn't added before

qiucf added a subscriber: qiucf.
qiucf added inline comments.
llvm/include/llvm/CodeGen/TargetRegisterInfo.h
876

I think it breaks lots of AMDGPU tests (30+) so it will go beyond this patch's scope. I'm not sure it's better or worse for AMDGPU, so I posted https://reviews.llvm.org/D108010.