bug fix for https://reviews.llvm.org/D64551
Prevent combining X86ISD::VZEXT_MOVL into wrong DAGs
https://reviews.llvm.org/D64551 has produced a bug in llvm:
For t.ll, llvm without this patch produces the correct asm while llvm with this patch produces bad asm:
llvm with this patch:
vmovd xmm0, dword ptr [rdi + 260] # xmm0 = mem[0],zero,zero,zero vmovd xmm1, dword ptr [rdi + 252] # xmm1 = mem[0],zero,zero,zero vpunpcklqdq xmm0, xmm1, xmm0 # xmm0 = xmm1[0],xmm0[0] vxorps xmm1, xmm1, xmm1 vinserti128 ymm0, ymm1, xmm0, 1 vmovdqa ymmword ptr [rsi + 672], ymm0
llvm without this patch:
vmovq xmm0, qword ptr [rdi + 252] # xmm0 = mem[0],zero movabs rax, offset .LCPI0_0 vmovdqa xmm1, xmmword ptr [rax] vpshufb xmm0, xmm0, xmm1 vmovd xmm1, dword ptr [rdi + 260] # xmm1 = mem[0],zero,zero,zero vpunpcklqdq xmm0, xmm0, xmm1 # xmm0 = xmm0[0],xmm1[0] vxorps xmm1, xmm1, xmm1 vinserti128 ymm0, ymm1, xmm0, 1 vmovdqa ymmword ptr [rsi + 672], ymm0
When I compared the ‘llc -debug’ output, I found the procedure of Combining: t53: v4i32 = X86ISD::VZEXT_MOVL t43 are different:
llvm without this patch:
Legalizing: t53: v4i32 = X86ISD::VZEXT_MOVL t43 Legal node: nothing to do Combining: t53: v4i32 = X86ISD::VZEXT_MOVL t43 Creating new node: t55: v2i64 = undef Creating constant: t56: i8 = Constant<4> Creating constant: t57: i8 = Constant<5> Creating constant: t58: i8 = Constant<6> Creating constant: t59: i8 = Constant<7> Creating constant: t60: i8 = Constant<-1>
llvm with this patch:
Legalizing: t53: v4i32 = X86ISD::VZEXT_MOVL t43 Legal node: nothing to do Combining: t53: v4i32 = X86ISD::VZEXT_MOVL t43
Then I used ‘dbg llc’ and found that EltsFromConsecutiveLoads will return SDValue() while llvm with this patch won’t.
The return SDValue() was deleted by your patch:
if (!ISD::isNON_EXTLoad(Elt.getNode())) return SDValue();
So, I added these two lines back to llvm with this patch, then output asm is correct and the same with llvm without this patch:
This misses the entire point of D64551 which adds support for drilling down through other ops to find a NON_EXTLoad with a byte offset.