This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] transform sub-of-shifted-signbit to add
ClosedPublic

Authored by spatel on Jul 27 2018, 11:49 AM.

Details

Summary

This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH

This is another step towards improving select-of-constants codegen (see D48970).

x86 is the motivating target, and those diffs all appear to be wins. PPC looks neutral. I'm not sure about AArch64.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'.

Note that we're also missing this canonicalization in IR, but I'm less sure which direction we should go in there. 'lshr' gives us better knownbits, but again the chance of subsequent folds seems more likely with 'add'. We should choose one form or the other.

Diff Detail

Event Timeline

spatel created this revision.Jul 27 2018, 11:49 AM
lebedev.ri added inline comments.Jul 27 2018, 12:29 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2742–2753
// sub N0, (lshr N10, width-1)  ->  add N0, (ashr N10, width-1)
spatel updated this revision to Diff 157784.Jul 27 2018, 3:06 PM
spatel marked an inline comment as done.

Patch updated:
Add a code comment to describe the transform and motivation.

lebedev.ri accepted this revision.Jul 30 2018, 4:50 AM

The code+x86 test change looks ok to me.

In aarch64 case, as far as i can tell, the main change is that we avoided having to materialize the immediate in register,
although i'm not sure why we no longer fuse the shift into Operand2 of add/sub, commutativity overlook?
https://godbolt.org/g/yAFJS4 - but i don't think i'm comparing them correctly, that syntax is rather alien to me.
So yeah, not sure about aarch64.

This revision is now accepted and ready to land.Jul 30 2018, 4:50 AM

The code+x86 test change looks ok to me.

Thanks!

In aarch64 case, as far as i can tell, the main change is that we avoided having to materialize the immediate in register,
although i'm not sure why we no longer fuse the shift into Operand2 of add/sub, commutativity overlook?
https://godbolt.org/g/yAFJS4 - but i don't think i'm comparing them correctly, that syntax is rather alien to me.
So yeah, not sure about aarch64.

Let me add some more ARM experts to see if we can get an answer on those test diffs; I don't know if we can trust llvm-mca for aarch yet.

Yes, the AArch64 issue is that we decide to use add-with-immediate rather than sub-with-shifted-operand. Which form is better probably depends on the CPU and the context: by itself, on an A57 you're trading an expensive instruction for cheaper instruction, but if the immediate can be hoisted out of a hot loop the expensive instruction might be cheaper than two cheap instructions. But I think the sub-with-shifted-operand might be cheaper on other CPUs?

Anyway, that's basically independent of this patch, so don't worry about it.

Yes, the AArch64 issue is that we decide to use add-with-immediate rather than sub-with-shifted-operand. Which form is better probably depends on the CPU and the context: by itself, on an A57 you're trading an expensive instruction for cheaper instruction, but if the immediate can be hoisted out of a hot loop the expensive instruction might be cheaper than two cheap instructions. But I think the sub-with-shifted-operand might be cheaper on other CPUs?

Anyway, that's basically independent of this patch, so don't worry about it.

Thanks!

This revision was automatically updated to reflect the committed changes.