This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Teach DAGCombine to fold the aext + select pattern
ClosedPublic

Authored by steven.zhang on Jun 13 2019, 10:56 PM.

Details

Summary

Teach the DAGCombine to fold this pattern(c1 and c2 is constant).

// fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
// fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2)
// fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)

Sign extend the operands if it is any_extend, to keep the signess of the operands that, the other combine rule would apply. The any_extend is handled as zero extend for constants. i.e.

t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0>
t2: i64 = any_extend t1
 -->
t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0>
 -->
 t4: i64 = sign_extend_inreg t3

Diff Detail

Event Timeline

steven.zhang created this revision.Jun 13 2019, 10:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2019, 10:56 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
craig.topper added inline comments.Jun 14 2019, 12:13 AM
llvm/test/CodeGen/X86/cmov-promotion.ll
130

This is slightly worse. Maybe don't do this when zext is free?

steven.zhang marked an inline comment as done.Jun 14 2019, 3:50 AM
steven.zhang added inline comments.
llvm/test/CodeGen/X86/cmov-promotion.ll
130

I check the scheduling information for cmovneq and cmovnel, both latency are 1. I didn't catch the "slightly worse" you mean ... Could you explain it more, as I am new to X86 instr. Thank you!

craig.topper added inline comments.Jun 14 2019, 8:16 AM
llvm/test/CodeGen/X86/cmov-promotion.ll
130

Cmovneq’s encoding is 1 byte longer than cmovnel. The 64-bit size requires a REX prefix.

steven.zhang marked an inline comment as done.Jun 18 2019, 12:12 AM
steven.zhang added inline comments.
llvm/test/CodeGen/X86/cmov-promotion.ll
130

I wonder if we can do this inside X86 target, as it seems a valid improvement for x86. For cmovneq, if the high 32bit is zero, use cmovnel ?

spatel added inline comments.Jun 18 2019, 7:25 AM
llvm/test/CodeGen/X86/cmov-promotion.ll
130

The problem is larger than just this transform or x86. We do the same transform in instcombine, so we need to check constant values to reverse it.

But there's no reason to make the problem worse by not using the existing TLI hook suggested by Craig. If we just add one more clause to the 'if' check, we avoid this test diff without changing any others:

if (isa<ConstantSDNode>(Op1) && isa<ConstantSDNode>(Op2) &&
    (Opcode != ISD::ZERO_EXTEND || !TLI.isZExtFree(N0.getValueType(), VT))) {
steven.zhang marked an inline comment as done.Jun 18 2019, 9:52 PM
steven.zhang added inline comments.
llvm/test/CodeGen/X86/cmov-promotion.ll
130

Ah, sorry, I didn't get the point. Sure, it makes sense.

Add the isZextFree check.

This comment was removed by steven.zhang.
craig.topper added a comment.EditedJun 18 2019, 11:24 PM

X86 looks ok to me other than i8->i64 zext problem.

llvm/test/CodeGen/X86/cmov-promotion.ll
59

The zextisfree check isn't enough to fix this :( It's an i8->i64 zext which isn't free. I guess we'll have to handle this in the x86 backend.

steven.zhang marked an inline comment as done.Jun 19 2019, 12:35 AM
steven.zhang added inline comments.
llvm/test/CodeGen/X86/cmov-promotion.ll
59

Yes. The hook only check the type i32->i64. Do we need to pass the value instead of the type for the isZextFree to fix this issue ?

spatel added inline comments.Jun 19 2019, 6:58 AM
llvm/test/CodeGen/X86/cmov-promotion.ll
59

I don't think changing the TLI call will be enough to solve the general problem (the IR is likely already in the form that we're trying to avoid).

steven.zhang marked an inline comment as done.Jun 19 2019, 7:03 PM

As the X86 review is done (Another issue exposed by this patch, and x86 backend will work on that ?), let's continue the review for the PowerPC backend and DAGCombine logic.

llvm/test/CodeGen/X86/cmov-promotion.ll
59

ok. That sounds to be another issue.

spatel added inline comments.Jun 20 2019, 6:00 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8647–8648

This sentence is not clear to me. Is this better?
"For any_extend, choose sign extension of the constants to allow a possible further transform to sign_extend_inreg."

I'm not sure if using sign extension is the best choice in all cases, but if there are no visible test regressions, I guess that's ok for now.

Update the comments.

spatel accepted this revision.Jun 25 2019, 8:40 AM

LGTM

This revision is now accepted and ready to land.Jun 25 2019, 8:40 AM
This revision was automatically updated to reflect the committed changes.