Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code.
Not sure if the other targets here are good or bad. Or if there's some other changes that could be made to recover any loss.
Paths
| Differential D42679
[DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors. ClosedPublic Authored by craig.topper on Jan 30 2018, 1:16 AM.
Details Summary Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code. Not sure if the other targets here are good or bad. Or if there's some other changes that could be made to recover any loss.
Diff Detail
Event TimelineHerald added subscribers: javed.absar, nemanjai, jholewinski. · View Herald TranscriptJan 30 2018, 1:16 AM RKSimon added inline comments.
Comment Actions Rebase after D43063 The NVPTX test got weirder now. The setp.eq.b16instruction was always there, but not part of the CHECKs. The and.b32 is new. Comment Actions
That setp should not be there to start with. AFAICT it's result is not used by anything. I'll take a look at what's going on. Comment Actions
Some of it. Something weird is going on during lowering or i1 arguments in NVPTX and we end up with CopyToReg on a chain, but with no other uses. I've found references to what looks like a similar issue in PPC code (ANDIGlueBug), so it may not be a new issue. It certainly has nothing to do with your patch. LGTM as far as NVPTX is concerned. Comment Actions
That unused setp is an artifact of -O0. Without it (or with any other -O level) NVPTX generates more sensible code (with or without this patch): ld.param.u8 %r1, [test_i1_param_0]; and.b32 %r2, %r1, 1; st.param.b32 [func_retval0+0], %r2; ret; This revision is now accepted and ready to land.Mar 1 2018, 1:04 PM Closed by commit rL326500: [DAGCombiner] When combining zero_extend of a truncate, only mask before… (authored by ctopper). · Explain WhyMar 1 2018, 2:35 PM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 131932 lib/CodeGen/SelectionDAG/DAGCombiner.cpp
test/CodeGen/AArch64/arm64-aapcs.ll
test/CodeGen/AArch64/arm64-arith.ll
test/CodeGen/AArch64/bitfield.ll
test/CodeGen/NVPTX/param-load-store.ll
test/CodeGen/PowerPC/vec_extract_p9_2.ll
test/CodeGen/SystemZ/insert-05.ll
test/CodeGen/X86/3addr-or.ll
test/CodeGen/X86/avx512cd-intrinsics-upgrade.ll
test/CodeGen/X86/avx512cdvl-intrinsics-upgrade.ll
test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
test/CodeGen/X86/memset-nonzero.ll
test/CodeGen/X86/pr27591.ll
test/CodeGen/X86/pr35763.ll
test/CodeGen/X86/sext-i1.ll
test/CodeGen/X86/swift-return.ll
test/DebugInfo/X86/sdag-combine.ll
|
This looks like the only regression - I think cvt.u32.u16 effectively acts as "ZERO_EXTEND_INREG" in this case (which we don't have).
@jholewinski any thoughts?