This is an archive of the discontinued LLVM Phabricator instance.

Introduce target hook for optimizing register copies
ClosedPublic

Authored by arsenm on Sep 3 2015, 12:01 AM.

Details

Summary

Allow a target to do something other than search for copies
that will avoid cross register bank copies.

Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.

I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid kinds
of subregister copies on some targets.

I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.

The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.

Diff Detail

Event Timeline

arsenm updated this revision to Diff 33902.Sep 3 2015, 12:01 AM
arsenm retitled this revision from to PeepholeOptimizer: Look through simple extracts of reg_sequence.
arsenm updated this object.
arsenm added reviewers: MatzeB, tstellarAMD.
arsenm added a subscriber: llvm-commits.
arsenm updated this revision to Diff 33904.Sep 3 2015, 12:29 AM

Include ARM test updates. These all seem to be the same change where a different register is used for a load and the tests are too strict and test the exact register.

qcolombet requested changes to this revision.Sep 3 2015, 9:27 AM
qcolombet added a reviewer: qcolombet.
qcolombet added a subscriber: qcolombet.

Hi Matt,

The changes in the peephole optimizer and AMDGPU seem to be to separate patches to me.
Please stage them like that if this is indeed the case.

Also, the thing that you add to the rewriter does not fit with the intended design, see inline comments.

Cheers,
-Quentin

lib/CodeGen/PeepholeOptimizer.cpp
1226

The design of the copy rewriter is to find alternative sources within the value tracker.
This change does not fit that scheme and looks like an ad hoc solution.
I believe what it is missing is to teach the value tracker about sub register.
Could you look into that direction or explain why do you think we should do it like you propose in this patch?

This revision now requires changes to proceed.Sep 3 2015, 9:27 AM
arsenm added inline comments.Sep 3 2015, 4:27 PM
lib/CodeGen/PeepholeOptimizer.cpp
1226

This all seemed like it was solving a different problem at first.

This already does seem to track sub registers correctly, it just stops the search for a better source too early.

If in findNextSource I change

ShouldRewrite = shareSameRegisterFile(*TRI, DefRC, SubReg, SrcRC,

to only do this if the subregister index is the same, it fixes my cases. This however breaks a few AArch64 tests. The search stopping condition needs to change, but I'm not sure exactly what it should be. It's true in this case that shareSameRegisterFile, but we also want to reduce the register size by looking through the reg_sequence.

arsenm added inline comments.Sep 3 2015, 4:30 PM
lib/CodeGen/PeepholeOptimizer.cpp
1226

One option might be a target hook for whether a superclass should be considered sharing the same register file.

arsenm updated this revision to Diff 34115.Sep 5 2015, 4:52 PM
arsenm retitled this revision from PeepholeOptimizer: Look through simple extracts of reg_sequence to Introduce target hook for optimizing register copies .
arsenm updated this object.
arsenm edited edge metadata.

Replace with target hook instead of shareSameRegisterFile

Hi Matt,

I would have expected shareSameRegisterFile to be a win for every target, but you seem to drop that check.
Is that intentional?

I am mostly fine with the change, just not sure about what to do with shareSameRegisterFile. See my inline comment for further discussions.

Thanks,
-Quentin

include/llvm/Target/TargetRegisterInfo.h
512

Does it make sense to expose this API here?
I.e., do we expect more users of this?
If not, I would rather keep it where it was as a static function or at least being private.

523

Just return false, see further for the complete suggestion.

lib/CodeGen/PeepholeOptimizer.cpp
667

shareSameRegisterFile is supposed to produce register coalescer friendly copies and should benefit every target.
So I would have expect this check to be
shareSameRegisterFile || TRI->shouldRewriteCopySrc.

I may be wrong, but I want to be sure the change is intentional.

arsenm added inline comments.Sep 8 2015, 12:19 PM
include/llvm/Target/TargetRegisterInfo.h
512

This was to keep it as the default implementation of shouldRewriteCopySrc. I was thinking a target might want to insert an additional constraint on top of sharesSameRegisterFile in shouldRewriteCopySrc although I don't think I have a use for this.

lib/CodeGen/PeepholeOptimizer.cpp
667

This is intentional. The problem is that this returns true and not false in this case. For most every legal case on AMDGPU, shareSameRegisterFile is going to be true, so it decides to rewrite the copy earlier and doesn't continue up the use-def chain to the value inserted into the reg_sequence. Even for the cases where this would return false, we sometimes would often prefer a copy from what this would consider to be one register file to another. We don't have cross register file copies, because only one direction is ever legal and we don't have any type distinctions between register types.

qcolombet accepted this revision.Sep 8 2015, 1:41 PM
qcolombet edited edge metadata.

Thanks Matt for the clarifications.
LGTM.

if shareSameRegisterFile is not used, maybe sink its implementation in the generic shouldRewriteCopySrc.
Targets can still call the parent method when overriding.

This is just a suggestion though, I am fine with the patch as it is. Up to you.

Cheers,
-Quentin

This revision is now accepted and ready to land.Sep 8 2015, 1:41 PM
arsenm closed this revision.Sep 24 2015, 1:38 AM

r248478