Page MenuHomePhabricator

[SystemZ] Swap compare operands in foldMemoryOperandImpl() if possible.
Needs ReviewPublic

Authored by jonpa on Wed, Sep 11, 4:42 AM.

Details

Reviewers
uweigand
Summary

Swap the compare operands if LHS is spilled while updating the CCMask:s of the CC users.

This is relatively straight forward if the live-in lists for non-allocatable registers (CC) are assumed to be correct during register allocation. The experimental verifyLiveInLists_CC() has not detected any missing CC in any live-in lists so far, but this is still a missing piece to add if this is found useful enough.

On SPEC 2006, I see:

cg             :                10184                10861     +677
lg             :               373873               373204     -669
je             :               118876               119381     +505
cgrje          :                19726                19301     -425
c              :                17535                17941     +406
l              :                73704                73319     -385
jlh            :                61456                61810     +354
cgrjlh         :                11370                11189     -181
crjlh          :                 9502                 9353     -149
crje           :                 6488                 6406      -82
jle            :                12939                13000      +61
crjhe          :                 3843                 3787      -56
cgr            :                 1805                 1754      -51
crjle          :                 2508                 2462      -46
jhe            :                10908                10953      +45
lr             :                27011                26967      -44

I interpret this to mean that this patch handles some additional ~1000 cases. This seems to mean that the compare is now folded with the load load instead of with the jump, like 'lg; cgrje' => 'cg; je'. I am not sure this is really better, then? Would it be worth the trouble of adding some kind of live-in lists check before regalloc for non-allocatable phys reg(s)?

Diff Detail

Event Timeline

jonpa created this revision.Wed, Sep 11, 4:42 AM

I interpret this to mean that this patch handles some additional ~1000 cases. This seems to mean that the compare is now folded with the load load instead of with the jump, like 'lg; cgrje' => 'cg; je'. I am not sure this is really better, then? Would it be worth the trouble of adding some kind of live-in lists check before regalloc for non-allocatable phys reg(s)?

In general, 'lg; cgrje' should have the same performance as 'cg; je' (both have one single-issue and one dual-issue opcode on current micro-archs). However, the 'cg; je' version does not require a GPR to hold the value to be compared against, so it may be preferable due to register pressure concerns.

I'm not sure about the isCCLiveOut. In SystemZElimCompare we simply rely on this being correct. Why can't we rely on it here as well? Can we find out the specific rules that specify when the isLiveIn flags can be relied upon and when they cannot?

lib/Target/SystemZ/SystemZInstrInfo.cpp
1840

I'm not sure it is safe to assert here. If there's no CC users of a compare, then the compare is dead and will usually have been optimized away. But I'm not sure we can fully rely on that to have happened in all cases (e.g. what about -O0?).

1849

This logic duplicates the CC commutation operation done in SystemZISelLowering::combineCCMask, so it would be preferable to have a helper routine somewhere.

1871

Why do you need to modify the instruction here in the first place? The caller throws it away anyway and creates the memory variant. Should this just do the commutation in place there?

lib/Target/SystemZ/SystemZRegisterInfo.cpp
81

If we move it here, the copy in SystemZElimCompare is now no longer needed, right?

jonpa added a comment.Thu, Sep 12, 7:30 AM

In general, 'lg; cgrje' should have the same performance as 'cg; je' (both have one single-issue and one dual-issue opcode on current micro-archs). However, the 'cg; je' version does not require a GPR to hold the value to be compared against, so it may be preferable due to register pressure concerns.

That's a good point. I see "Spill/Reload" comments in output go up with 4, while Copies go -28. (206 files different). On SPEC 2017, I see ~2600 cases, with "Spill/Reload" comments in output go up with 127, and Copies go -71. (371 files different).

I don't know exactly why there is an *increase* in spill/reload instructions and a decrease in register move instructions, but it anyways seem to be from this viewpoint not that interesting of a change, or?

I'm not sure about the isCCLiveOut. In SystemZElimCompare we simply rely on this being correct. Why can't we rely on it here as well? Can we find out the specific rules that specify when the isLiveIn flags can be relied upon and when they cannot?

When I asked about this on the list in June, the answer is that the live-in lists "...were intended only for post-ra passes and are only setup during the final rewriting pass.". So it seems there is *not* some kind of general agreement that live-in lists need to be correct at all times pre-/during RA. However, that seems to refer to VirtRegMap adding the allocated physical registers to the live-in lists, while CC is non-allocatable.

I tried a simple .mir program where I made CC used by a branch in a basic block where the definition (compare) was in the predecessor block. The MachineVerifier caught this as a "Bad machine code: Using an undefined physical register". If I add the CC to the live-in list, it passes the initial verification. So in a way the MachineVerifier has at least in this example demanded CC to either be defined in the basic block or be in the live-in list at the point of use.

SystemZElimCompare is after regalloc, but I can see that trusting live-in lists there or anywhere else seems to have the same implications. Maybe I was thinking that regalloc would set up all the live in lists correctly after its analysis of the function.

I am still not sure if I should go ahead and spend more time on this, given the very small change in spill/copies..?