This is an archive of the discontinued LLVM Phabricator instance.

[x86] fix miscompile in buildvector v16i8 lowering
Closed · Public

Authored by spatel on Jul 7 2020, 9:16 AM.


Details

Reviewers

craig.topper
RKSimon
lebedev.ri

Commits

rG642eed37134d: [x86] fix miscompile in buildvector v16i8 lowering

Summary

In the test based on PR46586:
https://bugs.llvm.org/show_bug.cgi?id=46586
...we are inserting 16 bits into the high 32-bit element of the vector (pinsrw $6 writes word 6 of xmm1), shuffling that element to element 0 (pshufd), and extracting 32 bits (movd). But xmm1 was never initialized, so without this patch the top 16 bits of the extracted value (word 7) are undef.

(It seems like we could do better than this by recognizing that we only demand a subsection of the build vector, but I want to make sure we fix the miscompile first.)

This path is only used for pre-SSE4.1 targets, and simpler patterns get folded away somewhere along the way, so the test still includes a 'urem', as in the original test from the bug report.
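To make the failure mode concrete, here is a small C++ model of the SSE2 sequence from the PR46586 test. It is a sketch, not LLVM code: the Xmm type and the helpers pxorSelf, pinsrw, pshufd, movd and pr46586Extract are hypothetical names, and an undefined 16-bit word is represented as std::nullopt. It shows that, without the pxor, the dword that movd reads after the pshufd takes its upper half from word 7, which nothing ever wrote.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model of an XMM register as eight 16-bit words; std::nullopt marks a
// word whose contents are undefined (e.g. a register that was never written).
using Xmm = std::array<std::optional<std::uint16_t>, 8>;

// pxor %xmm1, %xmm1: every word becomes a defined 0.
void pxorSelf(Xmm& r) { r.fill(std::uint16_t{0}); }

// pinsrw $imm, %eax, %xmm: overwrite word (imm & 7) with the low 16 bits of eax.
void pinsrw(Xmm& r, unsigned imm, std::uint32_t eax) {
  r[imm & 7] = static_cast<std::uint16_t>(eax & 0xFFFF);
}

// pshufd: destination dword d takes source dword sel[d]; dword d = words 2d, 2d+1.
Xmm pshufd(const Xmm& src, const std::array<unsigned, 4>& sel) {
  Xmm dst{};
  for (unsigned d = 0; d < 4; ++d) {
    dst[2 * d] = src[2 * sel[d]];
    dst[2 * d + 1] = src[2 * sel[d] + 1];
  }
  return dst;
}

// movd %xmm, %eax: read dword 0 (words 0 and 1). If either half is undefined,
// the 32-bit result is not fully defined, so the model returns std::nullopt.
std::optional<std::uint32_t> movd(const Xmm& r) {
  if (!r[0] || !r[1])
    return std::nullopt;
  return static_cast<std::uint32_t>(*r[0]) |
         (static_cast<std::uint32_t>(*r[1]) << 16);
}

// Sequence from the test: movzbl; [pxor]; pinsrw $6; pshufd [3,1,2,3]; movd.
std::optional<std::uint32_t> pr46586Extract(std::uint8_t loadedByte,
                                            bool zeroFirst) {
  Xmm xmm1{};                     // never initialized: all words undefined
  if (zeroFirst)
    pxorSelf(xmm1);               // the instruction the patch adds
  std::uint32_t eax = loadedByte; // movzbl zero-extends the loaded byte
  pinsrw(xmm1, 6, eax);
  xmm1 = pshufd(xmm1, {3, 1, 2, 3});
  return movd(xmm1);
}
```

With zeroFirst set, the extracted value is just the zero-extended loaded byte; without it, the model reports the result as undefined. On real hardware the upper half is whatever stale bits xmm1 happened to hold, which is why the bug shows up as a silent miscompile.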

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Jul 7 2020, 9:16 AM

Herald added a project: Restricted Project. Jul 7 2020, 9:16 AM

Herald added subscribers: hiraditya, mcrosier.

ouch! LGTM - cheers.

IIRC, I once tried adding a DAGCombine for ANY/ZERO_EXTEND_VECTOR_INREG(BUILD_VECTOR()) to address poor codegen similar to this, but I can't remember what problem I hit.

This revision is now accepted and ready to land. Jul 7 2020, 9:27 AM

LGTM

Closed by commit rG642eed37134d: [x86] fix miscompile in buildvector v16i8 lowering (authored by spatel). Jul 7 2020, 10:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                          Size
llvm/lib/Target/X86/X86ISelLowering.cpp       7 lines
llvm/test/CodeGen/X86/buildvec-insertvec.ll   3 lines

Diff 276114

llvm/lib/Target/X86/X86ISelLowering.cpp


     if (NextIsNonZero) {
       NextElt = DAG.getNode(ISD::SHL, dl, MVT::i32, NextElt,
                             DAG.getConstant(8, dl, MVT::i8));
       if (ThisIsNonZero)
         Elt = DAG.getNode(ISD::OR, dl, MVT::i32, NextElt, Elt);
       else
         Elt = NextElt;
     }

-    // If our first insertion is not the first index then insert into zero
-    // vector to break any register dependency else use SCALAR_TO_VECTOR.
+    // If our first insertion is not the first index or zeros are needed, then
+    // insert into zero vector. Otherwise, use SCALAR_TO_VECTOR (leaves high
+    // elements undefined).
     if (!V) {
-      if (i != 0)
+      if (i != 0 || NumZero)
         V = getZeroVector(MVT::v8i16, Subtarget, DAG, dl);
       else {
         V = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v4i32, Elt);
         V = DAG.getBitcast(MVT::v8i16, V);
         continue;
       }
     }
     Elt = DAG.getNode(ISD::TRUNCATE, dl, MVT::i16, Elt);
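The effect of the changed condition can be sketched in C++ as well. This is a simplified, hypothetical model of the byte-pair insertion path shown in the diff, not LLVM code: the Byte, Kind and V8i16 types and the lowerV16i8 and zerosAreDefined functions are illustrative names, undefined lanes are std::nullopt, and each byte pair is assumed to be zero-extended to 32 bits. Pairs that are entirely zero are never inserted explicitly, so the only thing that makes them zero is starting from a zero vector; SCALAR_TO_VECTOR leaves every word above the low 32 bits undefined.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// One operand of a v16i8 BUILD_VECTOR in this toy model.
enum class Kind { Zero, Undef, Value };
struct Byte {
  Kind kind;
  std::uint8_t value = 0; // only meaningful when kind == Kind::Value
};

// A v8i16 vector; std::nullopt marks a lane whose contents are undefined.
using V8i16 = std::array<std::optional<std::uint16_t>, 8>;

// Simplified model of the pre-SSE4.1 path: merge byte pairs into 16-bit values
// and insert them one word at a time. `patched` selects the committed
// condition (i != 0 || NumZero) instead of the old (i != 0).
V8i16 lowerV16i8(const std::array<Byte, 16>& ops, bool patched) {
  unsigned numZero = 0;
  for (const Byte& b : ops)
    if (b.kind == Kind::Zero)
      ++numZero;

  std::optional<V8i16> v;
  for (unsigned i = 0; i < 16; i += 2) {
    bool thisIsNonZero = ops[i].kind == Kind::Value;
    bool nextIsNonZero = ops[i + 1].kind == Kind::Value;
    if (!thisIsNonZero && !nextIsNonZero)
      continue; // zero/undef pairs are never inserted explicitly
    // The model zero-extends the pair, so a missing byte reads as 0.
    auto elt = static_cast<std::uint16_t>(
        (thisIsNonZero ? ops[i].value : 0) |
        (nextIsNonZero ? ops[i + 1].value << 8 : 0));

    if (!v) {
      bool useZeroVector = patched ? (i != 0 || numZero != 0) : (i != 0);
      v.emplace(); // all lanes start undefined
      if (useZeroVector) {
        v->fill(std::uint16_t{0}); // zero vector: every lane is a defined 0
      } else {
        // SCALAR_TO_VECTOR v4i32: only words 0 and 1 are defined; the model's
        // 32-bit element has a zero upper half.
        (*v)[0] = elt;
        (*v)[1] = std::uint16_t{0};
        continue;
      }
    }
    (*v)[i / 2] = elt; // PINSRW-style insert of the merged pair
  }
  if (!v) { // no non-zero pairs: this toy model just returns all zeros
    V8i16 zeros;
    zeros.fill(std::uint16_t{0});
    return zeros;
  }
  return *v;
}

// True if every operand that must be zero reads back as a defined 0 byte
// (little-endian: byte j lives in word j / 2).
bool zerosAreDefined(const std::array<Byte, 16>& ops, const V8i16& v) {
  for (unsigned j = 0; j < 16; ++j) {
    if (ops[j].kind != Kind::Zero)
      continue;
    const auto& lane = v[j / 2];
    if (!lane || ((*lane >> (8 * (j % 2))) & 0xFF) != 0)
      return false;
  }
  return true;
}
```

For a build vector with non-zero bytes only at indices 0 and 12 (all others zero), the old condition (i != 0) picks SCALAR_TO_VECTOR, leaving words 2 to 5 and word 7 undefined, so zerosAreDefined returns false; the patched condition (i != 0 || NumZero) starts from a zero vector and every required zero is defined.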

llvm/test/CodeGen/X86/buildvec-insertvec.ll

 ; CHECK-NEXT: retq
   %2 = extractelement <4 x i32> %1, i32 %a0
   %3 = extractelement <4 x i32> <i32 30, i32 53, i32 42, i32 12>, i32 %2
   %4 = extractelement <4 x i32> zeroinitializer, i32 %2
   %5 = insertelement <4 x i32> undef, i32 %3, i32 undef
   store i32 %4, i32* undef
   ret <4 x i32> %5
 }

-; FIXME: If we do not define all bytes that are extracted, this is a miscompile.
+; If we do not define all bytes that are extracted, this is a miscompile.

 define i32 @PR46586(i8* %p, <4 x i32> %v) {
 ; SSE2-LABEL: PR46586:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: movzbl 3(%rdi), %eax
+; SSE2-NEXT: pxor %xmm1, %xmm1
 ; SSE2-NEXT: pinsrw $6, %eax, %xmm1
 ; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[3,1,2,3]
 ; SSE2-NEXT: movd %xmm1, %eax
 ; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,3]
 ; SSE2-NEXT: movd %xmm0, %ecx
 ; SSE2-NEXT: xorl %edx, %edx
 ; SSE2-NEXT: divl %ecx
 ; SSE2-NEXT: movl %edx, %eax