This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Fix handling of CC values
ClosedPublic

Authored by jonpa on Aug 28 2019, 3:31 AM.

Details

Reviewers
uweigand
Summary

It was recently discovered that the handling of CC values was actually broken since overflow was not properly handled.

Add and sub instructions have now a new target specific instruction flag called SystemZII::CCIfNoSWrapOnly. This means that the CC result can be used instead of a compare with 0 only if the instruction has the 'nsw' flag set. Without 'nsw', the compare can not be eliminated.

Load complement/negative/positive can also overflow which means they should as well not be used for compare-with-0 elimination.

Diff Detail

Event Timeline

jonpa created this revision.Aug 28 2019, 3:31 AM
jonpa updated this revision to Diff 218413.Sep 3 2019, 2:12 AM

Patch updated.

uweigand added inline comments.Sep 3 2019, 4:33 AM
lib/Target/SystemZ/SystemZInstrInfo.td
1020–1021

So strictly speaking, subtraction is different than addition in that there can never be any overflow with a zero result for a subtraction. For for sub, the CompareZeroCCMask = 0x8 setting is actually correct.

Of course, the CCIfNoSWrapOnly = 1 would also be correct. Is there a way to support both (if we have nsw, support any comparison, if we don't, only support comparison against 0)?

test/CodeGen/SystemZ/int-cmp-44.ll
8–9

This comment is actually wrong now. I believe this comment should state that the CC value of an addition can be used to omit a compare if we know no signed overflow happens.

In fact, I think it would be good to have two sets of test cases: one (probably the current one) that verifies that in the absence of the nsw flag, we get the extra compares, and another one that verifies that in the presence of nsw, we never have a compare.

121–122

Same comment as above applies.

jonpa updated this revision to Diff 218656.Sep 4 2019, 5:25 AM
jonpa marked 6 inline comments as done.

Updated per the review so far.

lib/Target/SystemZ/SystemZInstrInfo.td
1020–1021

As discussed this should work by just adding back the CompareZeroCCMask of 0x8 for the subs.

test/CodeGen/SystemZ/int-cmp-44.ll
8–9

Fixed comment.

As discussed, this is covered by the new int-cmp-56.ll test.

121–122

I added similar new tests for subtraction in int-cmp-45.ll.

jonpa marked an inline comment as done.Sep 4 2019, 5:27 AM
jonpa added inline comments.
test/CodeGen/SystemZ/int-cmp-44.ll
121–122

Sorry - they were added in int-cmp-56.ll.

With the current form of the patch, do we see any noticeable regression due to no additional compares after addition? If no, the patch would be OK; if yes, we'll probably have to add support for AL etc. first.

lib/Target/SystemZ/SystemZInstrFormats.td
97

I'd say something like "sets CC like a compare of the result against zero" instead of "according its CCValues".

test/CodeGen/SystemZ/loop-01.ll
97 ↗(On Diff #218656)

This is because of this, right?
http://lists.llvm.org/pipermail/llvm-dev/2019-September/134876.html

That should preferably be fixed first. (Or at least marked with a FIXME here.)

jonpa updated this revision to Diff 219318.Sep 9 2019, 4:20 AM
jonpa marked 4 inline comments as done.

Patch updated:

adjustCCMasksForInstr() extended to

  • recognize signed add with immediate and allow these cases by setting the overflow flag in CCMask as required. Now there are many fewer compares instead of many more on SPEC 2006.
  • allow logical instructions to replace a compare with 0 for EQ/NE. This only handled just a few more cases. Next step is to find the best way to emit more logical instructions or change opcode to in cases where it will eliminate the comparison.

CCMASK_LOGICAL_NONZERO should be: CCMASK_1 | CCMASK_3, right?

jonpa updated this revision to Diff 219508.Sep 10 2019, 4:01 AM
  • Added assert when detecting an add with immediate, that this is not an immediate offset of a memory instruction but indeed an immediate operand.
  • Removed check of SystemZII::getCompareZeroCCMask() when detecting add with immediate, since there are no sub with immediate with SystemZII::CCIfNoSignedWrap set.
  • int-cmp-56.ll test updated to reflect the use of add logical.
  • New method convertToLogical() in SystemZElimCompare that tries conversion to logical add instructions. This handles just some 130 more cases. Is opcode table ok in convertToLogical(), or should it be done next to the instruction definitions. I tried to do it in tablegen, but I couldn't find a neat way to do it - here is what I have so far: .

I tried to do this conversion also during instruction selection. I made an experimental patch that would select logical adds for just those cases where the instruction does not have "nsw" while it is being compared if equal to zero. I found drawbacks with this: 1. MachineCSE is less successful now since it cannot remove an ALR even if the operands are the same as of the AR. This seemed to be ~400 cases on SPEC 2006. 2. This does not handle the cases of spilled registers that have been folded into an add with memory instruction. 3. The method I had to write for this became quite cumbersome (the might be a simpler way...), and it still was not "perfect" - there is no simple way to check if CC is clobbered between the add and the compare, by a call for instance. This is my patch where I tried this:

In addition, I also tried to select add logical during instruction selection "always except in case of 'nsw'". This seems to give very similar results to convertToLogical(), except for all the add logical instructions everywhere. +50 tests fail with this.

  • With this patch there are now ~3000 less compares (including fused ones) than on master. This is due to the improved handling of add with immediate.
  • As seen in int-cmp-58.mir, the printed instruction names reflect CC as if it was set by a signed instruction: LOGICAL_ZERO becomes "locrhe", and LOGICAL_NONZERO "locrnhe", after a logical add. After a sub I see "locrh" / "locrnhe". Would it be feasible to support more reasonable mnemonics for these cases?

I am waiting for review OK for correctness before doing more extensive benchmarking.

lib/Target/SystemZ/SystemZInstrFormats.td
97

nice - changed it.

test/CodeGen/SystemZ/loop-01.ll
97 ↗(On Diff #218656)

yes, that's what I saw and asked about on the list.

I marked it as FIXME for now. It should get fixed once we handle add with immediate cases in the backend, regardless of LSR/SelectionDAGBuilder.

jonpa updated this revision to Diff 219674.Sep 11 2019, 2:09 AM

Remove the assert "A logical opcode was selected in presence of the nsw flag." This is too aggressive, as a zero extension can be folded into the instruction regardless of "nsw" (e.g. ALGFR).

See inline comments. As an aside, the whole adjustCCMasksForInstr routine is now rather complicated -- this could probably use a complete rewrite, but I'm not really sure what the best approach here would be ...

lib/Target/SystemZ/SystemZElimCompare.cpp
364

The comment should probably elaborate a bit, along the lines of: "If adding a positive immediate overflows, the result must be less than zero. If adding a negative immediate overflow, the result must be larger than zero (except in the special case of adding the minimum value of the result range, in which case we cannot predict whether the result is larger than or equal to zero)."

Note that this logic is only correct if the operation performs an addition. Given that you only set the CCIfNoSignedWrap for add and sub instructions, and there is no sub with immediate operand, this condition happens to be true here, but that's a bit implicit ... Maybe add some assert?

366

This should probably use int64_t to avoid creating a dependency on the size of the host "int" type.

392

Probably better to move that assert now up to the only branch where it is useful.

396

This now looks incorrect in either the IsLogical case or in the OFImplies case. In either of those cases, we must go through the loop below rewriting CCUsers.

Again, probably best to move the setting of MIEquivalentToCmp to the branches above.

437

The assert seems unnecessary -- any CCMask other than CCMASK_CMP_EQ can be correctly mapped to CCMASK_LOGICAL_NONZERO here if the loop above passed the OutMask check.

jonpa updated this revision to Diff 224967.Oct 15 2019, 1:47 AM
jonpa marked 9 inline comments as done.

Updated per review.

In addition, I removed

CCValues &= ~SystemZ::CCMASK_ARITH_OVERFLOW;

from the case where both MachineInstr::NoSWrap and SystemZII::CCIfNoSignedWrap are present. This seems mostly confusing to me know, as CCValues is supposed to simply be what CC values the instruction may set, so if the instruction can set the OF bit, CCValues should already reflect this (which it also currently does).

I also added an assert for the CC operands of the CCUsers.

All the changes of this update are NFC on SPEC 2006.

As an aside, the whole adjustCCMasksForInstr routine is now rather complicated -- this could probably use a complete rewrite, but I'm not really sure what the best approach here would be ...

I agree that there are too many assumptions and complications - I might give it a try to find a more simplistic approach.

lib/Target/SystemZ/SystemZElimCompare.cpp
364

Comment updated.

I added an assert with a check against known opcodes for add with immediate, since I could not find any MI flag to check against.

392

ah, yes.

396

Yes... It seems that those cases were currently not passing this test anyway, but I agree that it certainly makes sense to move this up.

The first case with MachineInstr::NoSWrap also should not set this as the CCValues will now include the OF bit and this should be updated in the CCUsers CCValid operands.

437

ok - removed it.

Thanks, this LGTM now.

I think we need to verify that benchmark results are good before committing this change, however.

jonpa added a comment.Oct 15 2019, 5:16 AM

Here are some minor NFC changes in an attempt to make things easier to follow. Is this better?

Yes, this does look a bit better. Maybe you can even avoid the IntCC variable by doing something like:

unsigned ReusableCCMask = CCValues;
if (CompareFlags & SystemZII::IsLogical)
  ReusableCCMask &= SystemZ::CCMASK_CMP_EQ;

before the various conditions, and then simply & new constraints into ReuableCCMask as required. For the CCIfNoSignedWrap case, this would be nothing, for the IsLogical case, this would be

ReusableCCMask &= SystemZ::CCMASK_CMP_EQ;

and for the CompareZeroCCMask case this would be

ReusableCCMask &= SystemZII::getCompareZeroCCMask(MIFlags);
jonpa updated this revision to Diff 225187.Oct 16 2019, 3:12 AM

Maybe you can even avoid the IntCC variable...

Yes :-)

I like the AND:ing away things from ReusableCCMask after starting out with CCValues, however the logical case cannot play along here since those instructions has CCValues in the logical domain. Locigal subtracts have 0x7, which does not include the SystemZ::CCMASK_CMP_EQ bit. So this really needs to set ReusableCCMask to SystemZ::CCMASK_CMP_EQ based on the assumption that all logical instructions that has any CCValues support a comparison to zero.

I kept the empty first if clause in lack of any better ideas.

Maybe we could remove

if (ReusableCCMask == 0)

return false;

since the loop will return immediately anyway?

I strengthened the assert further for "CC operands of CCUser", to also check that CCMask is non-zero and that it is not equal to CCValid.

Maybe you can even avoid the IntCC variable...

Yes :-)

I like the AND:ing away things from ReusableCCMask after starting out with CCValues, however the logical case cannot play along here since those instructions has CCValues in the logical domain. Locigal subtracts have 0x7, which does not include the SystemZ::CCMASK_CMP_EQ bit. So this really needs to set ReusableCCMask to SystemZ::CCMASK_CMP_EQ based on the assumption that all logical instructions that has any CCValues support a comparison to zero.

OK, I see.

I kept the empty first if clause in lack of any better ideas.

Fine with me. Maybe just a bit more comment than "done" :-)

Maybe we could remove

if (ReusableCCMask == 0)

return false;

since the loop will return immediately anyway?

Not sure this is true. In any case, having this as early exist makes sense to me ...

I strengthened the assert further for "CC operands of CCUser", to also check that CCMask is non-zero and that it is not equal to CCValid.

I'm not sure this is completely safe either. Yes, in theory we should never see always-true or always-false conditions, but I believe it is not completely impossible that those might occur depending on optimization levels etc. In any case, the rest of the code should actually handle those cases correctly, so I'm not sure we really need to prevent them here.

Otherwise, the patch looks good to me now, assuming we get encouraging performance results.

jonpa updated this revision to Diff 225387.Oct 17 2019, 3:29 AM

Updated per review: Comment improved, and extra assertions removed.

Not sure this is true. In any case, having this as early exist makes sense to me ...

The asserts passed with -O0/SPEC 2006, so I assumed this was guaranteed.

jonpa updated this revision to Diff 234337.Dec 17 2019, 10:22 AM

Patch rebased, no manual changes needed...

Herald added a project: Restricted Project. · View Herald TranscriptDec 17 2019, 10:22 AM
uweigand accepted this revision.Dec 20 2019, 7:13 AM

LGTM, thanks!

This revision is now accepted and ready to land.Dec 20 2019, 7:13 AM
jonpa closed this revision.Dec 20 2019, 10:24 AM

Thanks for review!

3174683e