This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][CodeGen] Fix of incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr
ClosedPublic

Authored by eastig on Apr 6 2016, 11:38 AM.

Details

Summary

AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code. Details can be found here: https://llvm.org/bugs/show_bug.cgi?id=27158

  1. Fix:
  • A condition code used after CmpInstr and before the next modification of NZCV is found. The optimization is not applied if different condition codes are used. It might be difficult to find a candidate for substitution to satisfy all of them. I think this case with multiple used condition codes does not happen often.
  • Then it’s checked in 'canInstrSubstituteCmpInstr' that the instruction which defines a register for CmpInstr can produce the needed condition code itself or its S variant. If it or its S variant can produce then CmpInstr is removed.
  1. A regression test is added.
  2. A new test to check that SUBS is replaced by SUB is added.

Diff Detail

Repository
rL LLVM

Event Timeline

eastig updated this revision to Diff 52827.Apr 6 2016, 11:38 AM
eastig retitled this revision from to [AArch64][CodeGen] Fix of incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr.
eastig updated this object.
eastig added a reviewer: jmolloy.
eastig added a subscriber: llvm-commits.
eastig updated this object.Apr 6 2016, 11:39 AM
t.p.northover added inline comments.
lib/Target/AArch64/AArch64InstrInfo.cpp
975–977 ↗(On Diff #52827)

I think the documentation should call out the fact that it really does check whether *all* uses are equivalent.

981 ↗(On Diff #52827)

Why only NE? At the very least NE/EQ are pretty equivalent.

985–988 ↗(On Diff #52827)

I don't follow this logic. A pure ADD should never produce any CondCode, and I'm not sure why an FI operand is involved at all.

1007–1033 ↗(On Diff #52827)

I think this is probably a bit too tricksy, and both less efficient and less clear than the naive loop.

I also think that looking at this from the position of CondCodes is probably the wrong level. The real distinction is which bits of NZCV are used (in particular, ADDS sets C & V differently from SUBS, as I recall, so mixing tests that only use Z & N is fine).

With both of those observations, I think this function reduces to something like:

for(MI= CmpInstr; MI != instr_end; ++MI) {
  if (MI->readsRegister(NZCV))
    NZCVUsed |= findNZCVUsedByInstr(MI);
  if (MI->modifiesRegister(NZCV)) {
    // Possibly return 0b1111 if MI also reads.
    return NZCVUsed;
  }
}
1034 ↗(On Diff #52827)

I think we should also check (or assert, if also checked before) that NZCV doesn't escape the basic block.

1043 ↗(On Diff #52827)

We definitely shouldn't be relying on specific ordering like this, even if it is alphabetical.

1068–1071 ↗(On Diff #52827)

These are unconditional (i.e. work on all possible AArch64CC) aren't they?

Anyway, I think (with the repurposing of findUsedCondCodeAfterCmp suggested above) that the NZCV part of this function is really:

if (sameSForm(..., isADDS) || sameSForm(..., isSUBS))
  return true;
auto NZCVUsed = findUsedNZCVAfterCMP(...);
return !NZCVUsed.C && !NZCVUsed.V;
1094 ↗(On Diff #52827)

This completely short-circuits the other walks you're adding doesn't it? (E.g. findUsedCondCodeAfterCmp will never find more than one use).

test/CodeGen/AArch64/arm64-regress-opt-cmp.ll
1 ↗(On Diff #52827)

I think these tests are rather weak given how much code is changing and the complexity of the result. I'd probably suggest a MIR test for this one; they can be a bit temperamental to get started, but allow you to exercise many more edge cases quickly.

eastig added a comment.Apr 7 2016, 3:48 PM

Hi Tim,

Thank you for comments. They are very useful.
See my answers.

lib/Target/AArch64/AArch64InstrInfo.cpp
975–977 ↗(On Diff #52827)

It seems this function name is too general and does not correspond to what the function does. It was based on tests:

  • CodeGen/AArch64/arm64-arm64-dead-def-elimination-flag.ll
  • CodeGen/AArch64/arm64-dead-def-frame-index.ll

The tests have a comparison of a result of 'alloca' with null. They expect a compare operation to be removed. A sequence of instruction is ADDX+SUBSX+CSINCW.

I overcomplicated things.

981 ↗(On Diff #52827)

You are right.

985–988 ↗(On Diff #52827)

The logic is wrong.

1007–1033 ↗(On Diff #52827)

Yes, you are right.

1034 ↗(On Diff #52827)

Do you mean to check if flags are alive in successors of the basic block? If yes, this is checked in substituteCmpInstr.

1043 ↗(On Diff #52827)

It's a pity. Maybe such functions already exist?

1068–1071 ↗(On Diff #52827)

You are right.

1094 ↗(On Diff #52827)

It checks accesses(read, write) before Cmp and after MI. It was in the original code. I think this was done in more strict way to make code simpler because such a situation is rare if it ever exists.

test/CodeGen/AArch64/arm64-regress-opt-cmp.ll
1 ↗(On Diff #52827)

I fully agree with you. I tried to write a simpler test but I failed to write it in IR. How can I write in MIR?

t.p.northover added inline comments.Apr 7 2016, 4:19 PM
lib/Target/AArch64/AArch64InstrInfo.cpp
1034 ↗(On Diff #52827)

Yep, that's what I meant (I noticed it later). I think an assertion is probably still a good idea somewhere in the function (as documentation that anyone reading it shouldn't bother worrying, basically).

1043 ↗(On Diff #52827)

Not as far as I'm aware, but I expect an explicit "Opcode == A || Opcode == B || ..." to be just as efficient when compiled. More verbose, but less worrying.

1094 ↗(On Diff #52827)

Yep, but to me it looks like your code already handles this properly, and would permit the optimization in more cases (albeit rare ones, as you say).

test/CodeGen/AArch64/arm64-regress-opt-cmp.ll
1 ↗(On Diff #52827)

The hardest part last time I did it was making sure the pass was registered properly for it (hint: make sure initializeXYZPass gets called). Fortunately for you, it looks like this is already done properly for the peephole optimizer so you'd use "llc -run-pass=peephole-opts /path/to/file.mir" in the RUN line.

Other than that, to get a basic MIR file to test you could either run "llc -stop-after=some-pass" (useful if you don't know quite how to write some construct) or just copy an existing one and modify it as needed. The format is less obvious and documented than LLVM IR so writing one from fresh is not a good idea (yet, at least).

eastig updated this revision to Diff 53407.Apr 12 2016, 8:35 AM

Updated the patch according to Tim's comments.

t.p.northover added inline comments.Apr 18 2016, 12:13 PM
lib/Target/AArch64/AArch64InstrInfo.cpp
993 ↗(On Diff #53407)

Can you explicitly annotate this fall-through?

1005–1006 ↗(On Diff #53407)

This shouldn't be a fallthrough if I'm reading the manual correctly (definition of ConditionHolds on page J1-5267 for example).

1044–1045 ↗(On Diff #53407)

Do we ever check that they're comparing agains the same value as MI? (Always 0 I believe).

1053–1055 ↗(On Diff #53407)

What's allowed between MI and CmpInstr depends on what MI is:

  • If it's an ADDS/SUBS already then any use of flags is permitted and only definitions are wrong.
  • If it's an ADD/SUB then neither uses nor defs are allowed (of any flags). Uses imply NZCV is already live and we're going to clobber it; defs imply it would be clobbered before it reached CmpInstr.
1083–1084 ↗(On Diff #53407)

I think this function should either return an NZCVUsed (to be checked by the caller) or check the forms of MI and CmpInstr itself. If they're SUBS/CMP or ADDS/CMN then the substitution is valid regardless of flags used.

test/CodeGen/AArch64/arm64-regress-opt-cmp.mir
1 ↗(On Diff #53407)

This test has lots of extra cruft and only ends up checking one instance anyway. The point of MIR tests is that we can exercise more aspects of the logic than from IR alone.

Hi Tim,

Thank you for comments. See my answers.
I am updating the patch. Need some time. Quite busy on armcc.

lib/Target/AArch64/AArch64InstrInfo.cpp
993 ↗(On Diff #53407)

I will add comments to each case to show which flags are used.

1005–1006 ↗(On Diff #53407)

Good catch. Thank you.
It's a bug. 'break' is missed.

1044–1045 ↗(On Diff #53407)

I will rename 'substituteCmpInstr' to 'substituteCmpToZero' to be clear what is the case. I don't think we need to check that MI and CmpInstr use the same value.
I added these checks because 'optimizeCompareInstr' is called for instructions which are supported by 'analyzeCompare'. Currently they are ADDS, SUBS and ANDS. We cannot substitute ANDS because 'ANDS vreg, 0' always produces 0. We can substitute 'ANDS vreg, -1' but it's not comparision with zero. Is this case worth?

1053–1055 ↗(On Diff #53407)

You are right.
I would rewrite this as follows:

  • If MI opcode is the S form there must be no defs of flags.
  • If MI opcode is not the S form there must be neither defs of flags nor uses of flags.
1083–1084 ↗(On Diff #53407)

We can only be sure for N and Z flags. SUBS/CMP or ADDS/CMN can produce different C and V flags, e.g.

%vr2  =  SUBS %vr1,  1 ; sets C to 0 when %vr1 == 0
%cmp = SUBS %vr2, 0  ; sets C to 1
test/CodeGen/AArch64/arm64-regress-opt-cmp.mir
1 ↗(On Diff #53407)

Yes, it looks cumbersome. I am reducing it as much as possible.

t.p.northover added inline comments.Apr 19 2016, 1:10 PM
lib/Target/AArch64/AArch64InstrInfo.cpp
1044–1045 ↗(On Diff #53407)

Sorry about that, I was talking nonsense. I forgot that SrcReg was the destination of the candidate instruction so comparison against 0 was automatic if flags were set. No need to support anything else.

1083–1084 ↗(On Diff #53407)

Oh, of course. Same flawed reasoning from me as above I think.

eastig updated this revision to Diff 54365.Apr 20 2016, 7:12 AM

Updated according to Tim's comments

t.p.northover accepted this revision.Apr 20 2016, 1:21 PM
t.p.northover added a reviewer: t.p.northover.

Thanks for the updates. This looks good to me now.

Tim.

lib/Target/AArch64/AArch64InstrInfo.cpp
1058–1059 ↗(On Diff #54365)

Nice implementation!

This revision is now accepted and ready to land.Apr 20 2016, 1:21 PM
This revision was automatically updated to reflect the committed changes.