Page MenuHomePhabricator

ARM: avoid infinite combining loop
ClosedPublic

Authored by t.p.northover on Oct 12 2018, 4:46 AM.

Details

Summary

One of the transformations done by PerformCMOVCombine increases DAG complexity so that one of the output values can be shared with the compare operation (written as a SUBS). Since this is the reverse of normal combining, it can be collapsed back later in the worklist, leading to an infinite loop.

I think there was a little existing logic to avoid this (the LHS != RHS check) , but it's fragile and nowhere near covers all possible combines; I could not extend it satisfactorily to cover this case and others.

So this patch changes the inserted ISD::SUB into a new ARMISD::OpaqueSUB with the same semantics but that doesn't undergo combining to avoid the problem.

Diff Detail

Repository
rL LLVM

Event Timeline

t.p.northover created this revision.Oct 12 2018, 4:46 AM
nhaehnle removed a subscriber: nhaehnle.Oct 16 2018, 9:20 AM

I guess you're mainly asking review on this to get opinions on the approach to introduce new pseudo instructions just to avoid cycles during DAG combining and/or lowering?
From the patch, I furthermore guess that introducing a new pseudo leads to quite a bit of code duplication/violations of the "don't-repeat-yourself" principle in the instruction matching patterns?

With that, I wonder:

  • It seems unlikely this is the only place where a problem with cycles during DAG combining/lowering pops up.
    • If it is the only place: what is so exceptional here that this really is the only place?
    • If it is not: what are the other approaches already taken elsewhere to avoid cycles? Why are alternative approaches not applicable here (or why are they an even worse solution)? Is there actually some guidance written up somewhere on which approaches exist and which to take to avoid cycles?
  • Instead of producing a whole new pseudo, I guess an alternative would be to introduce a way to flag on ISD nodes "do not combine this node any further" as a more generic mechanism. I guess for that to work, some bit somewhere on the ISD node would be needed to store that info. I wonder if you considered such an approach too? I guess it has the advantage that it doesn't require violating the DRY principle in the instruction patterns?

It seems unlikely this is the only place where a problem with cycles during DAG combining/lowering pops up.

I think it's pretty rare. Mostly combines genuinely do simplify the DAG, or replace the lot with a target node. I did consider the latter but decided getting equivalent CodeGen out of it would be just as bad as (or worse than) the duplicated patterns for the subtraction.

I guess an alternative would be to introduce a way to flag on ISD nodes "do not combine this node any further" as a more generic mechanism. I guess for that to work, some bit somewhere on the ISD node would be needed to store that info.

I hadn't considered an approach like that, but I think obeying such a flag would be pretty difficult. You don't just have to check the root of a combine, but every node involved. That probably means checks throughout the DAGCombiner.

I thought the normal way to stop combining was to return the original node. Could you not manually replace N with Res and then return N?

I thought the normal way to stop combining was to return the original node. Could you not manually replace N with Res and then return N?

You can sort of hide the new node from the worklist, but it's not usually a good idea: if the node gets back on the worklist some other way, you can end up with an infinite combining loop. The goal should be a "stable" DAG, which DAGCombine won't modify further.

Ok, fair point. If we are going to introduce a new node to fix this issue, could we have a SUBS node that can be glued to the CMOV?

Ok, fair point. If we are going to introduce a new node to fix this issue, could we have a SUBS node that can be glued to the CMOV?

I believe later passes already tidy things up enough that we don't need to embellish the DAG phase. For example if I compile:

define i32 @foo(i32 %lhs, i32 %rhs, i32 %in) {
  %tst = icmp eq i32 %lhs, %rhs
  %res = select i1 %tst, i32 0, i32 %in
  ret i32 %res
}

with this patch, I still get the optimum CodeGen of a single cmp and cmov.

My point was that, even though this extra work and as you mentioned in the comments, the test case isn't generating optimum code. We could introduce a subs node to remove the unnecessary cmp, making the sub opaque and improving codegen. Unless there's a reason why we couldn't do this?

Ah, I see what you mean. It's not pretty, but this updated patch seems to do the trick.

samparker accepted this revision.Nov 28 2018, 7:24 AM

Great to see those other test changes! LGTM with the few minor comments, no need to re-review. cheers!

llvm/lib/Target/ARM/ARMISelLowering.h
88

Maybe just SUBS now? And with an updated comment.

llvm/lib/Target/ARM/ARMInstrInfo.td
3626

whitespace.

llvm/lib/Target/ARM/ARMInstrThumb.td
1285

whitespace.

llvm/lib/Target/ARM/ARMInstrThumb2.td
2084

whitespace.

This revision is now accepted and ready to land.Nov 28 2018, 7:24 AM
t.p.northover closed this revision.Dec 3 2018, 3:19 AM

Great to see those other test changes!

Yes, definitely a pleasant surprise. Thanks for the reviews; I've committed it with your suggested changes as r348122.