This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] allow narrowing of add followed by truncate
ClosedPublic

Authored by spatel on Dec 18 2018, 4:38 PM.

Details

Summary

trunc (add X, C ) --> add (trunc X), C'

If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).

This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Dec 18 2018, 4:38 PM
andreadb added inline comments.Dec 19 2018, 3:24 AM
test/CodeGen/X86/pr32345.ll
122–126 ↗(On Diff #178804)

Not sure what is going on here..

We fail to realize that XOR is both commutative and associative.
We are essentially doing (%eax XOR %ecx) XOR %ecx.
That is equivalent to %eax XOR (%ecx XOR %ecx). The second XOR is a zero-idiom.
So, it becomes %eax XOR 0, which is %eax.

That entire computation could be folded away.

We should generate just this:

movzwl var_27, %ecx
movzwl var_22, %eax

As a side note: we even fail to realize that the last zero-extending move is redundant! The upper half of EAX is already zero, because it already comes from a zero-extending move. The XOR is done with ECX, which also is zero-extended! So, the upper bits cannot possibly be anything else other than zero.
So, the zero-extending move is completely redundant here, because the result is already zero.

From an x86 point of view, this change looks good.

At some point, we should address the poor codegen from test/CodeGen/X86/pr32345.ll. Maybe you could raise a bug for it.

From an x86 point of view, this change looks good.

At some point, we should address the poor codegen from test/CodeGen/X86/pr32345.ll. Maybe you could raise a bug for it.

I don't think it's worth the effort if we look at the background for this test; the intent was to check for a crash with *unoptimized* code:
https://bugs.llvm.org/show_bug.cgi?id=32345
rL298923
...so the test includes CSE values that the DAG never expects to encounter. I think the optimized RUN was just added for completeness/sanity. The whole thing is folded to 'unreachable' in IR.

From an x86 point of view, this change looks good.

At some point, we should address the poor codegen from test/CodeGen/X86/pr32345.ll. Maybe you could raise a bug for it.

I don't think it's worth the effort if we look at the background for this test; the intent was to check for a crash with *unoptimized* code:
https://bugs.llvm.org/show_bug.cgi?id=32345
rL298923
...so the test includes CSE values that the DAG never expects to encounter. I think the optimized RUN was just added for completeness/sanity. The whole thing is folded to 'unreachable' in IR.

Fair enough. I didn't check if it was folded by the optimizers at IR level.
Given that the whole sequence is optimized out in IR, then I agree with you: it is not not worthy to look into it.

andreadb accepted this revision.Dec 19 2018, 7:01 AM
This revision is now accepted and ready to land.Dec 19 2018, 7:01 AM
This revision was automatically updated to reflect the committed changes.