llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py | ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py | ||||
; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s | ; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s | ||||
define void @_Z10fooConvertPDv4_xS0_S0_PKS_() { | define void @_Z10fooConvertPDv4_xS0_S0_PKS_() { | ||||
; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_( | ; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_( | ||||
; CHECK-NEXT: entry: | ; CHECK-NEXT: entry: | ||||
; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4 | ; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4 | ||||
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5 | ; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5 | ||||
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0 | ; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0 | ||||
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1 | ; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1 | ||||
; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float> | ; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float> | ||||
; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32> | ; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32> | ||||
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> | ; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> | ||||
; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> undef, <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7> | ; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef> | ||||
; CHECK-NEXT: [[TMP7:%.*]] = freeze <8 x i32> [[VECINS_I_5_I1]] | |||||
vdmitrie: I'm not sure I understand well why freeze is generated here. I likely missed something but the…
ABataev: It is going to be changed soon to be poisoned instead of undefined.
vdmitrie: Ah, okay. Hence there must be a discussion about that somewhere. Can you please point me to the discussion? Since this change is only anticipated so far, the patch description should definitely be more detailed.
ABataev: https://reviews.llvm.org/D93818
ABataev: Done
; CHECK-NEXT: ret void | ; CHECK-NEXT: ret void | ||||
; | ; | ||||
entry: | entry: | ||||
%0 = extractelement <16 x half> undef, i32 4 | %0 = extractelement <16 x half> undef, i32 4 | ||||
%conv.i.4.i = fpext half %0 to float | %conv.i.4.i = fpext half %0 to float | ||||
%1 = bitcast float %conv.i.4.i to i32 | %1 = bitcast float %conv.i.4.i to i32 | ||||
%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4 | %vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4 | ||||
%2 = extractelement <16 x half> undef, i32 5 | %2 = extractelement <16 x half> undef, i32 5 | ||||
%conv.i.5.i = fpext half %2 to float | %conv.i.5.i = fpext half %2 to float | ||||
%3 = bitcast float %conv.i.5.i to i32 | %3 = bitcast float %conv.i.5.i to i32 | ||||
%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5 | %vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5 | ||||
ret void | ret void | ||||
} | } |
I'm not sure I understand why a freeze is generated here. I have likely missed something, but the LangRef says:
"If the shuffle mask is undefined, the result vector is undefined. If the shuffle mask selects an undefined element from one of the input vectors, the resulting element is undefined. An undefined element in the mask vector specifies that the resulting element is undefined. An undefined element in the mask vector prevents a poisoned vector element from propagating."
So here two lanes of VECINS_I_5_I1 are taken from the first two lanes of TMP6, and all the rest are undef. Then for TMP6, the first two lanes are taken from TMP5 and all the rest are undef; TMP5 is a bitcast of TMP4, both lanes of which are normal values. It means that all lanes of VECINS_I_5_I1 are either normal values or undef. So why is the freeze here?
It would be nice if you extended the patch description with some information about what is deemed to be the canonical form of a shuffle.
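The lane-by-lane argument above can be checked with a small model. This is only a sketch in Python, not LLVM code: `shufflevector`, `UNDEF` and `POISON` are hypothetical stand-ins, and the model implements just the LangRef wording quoted in this comment (an undef mask element gives an undef lane and stops poison from propagating). The placeholders `"a"` and `"b"` stand for the two TMP5 lanes that the comment treats as normal values, and the masks are copied from the right-hand side of the diff.

```python
UNDEF = "undef"    # hypothetical marker for an undef lane or mask element
POISON = "poison"  # hypothetical marker for a poison lane


def shufflevector(v1, v2, mask):
    """Toy lane-wise model of shufflevector under the quoted LangRef wording."""
    assert len(v1) == len(v2), "both inputs must have the same lane count"
    combined = list(v1) + list(v2)
    result = []
    for index in mask:
        if index == UNDEF:
            # Quoted wording: an undef mask element yields an undef lane
            # and prevents a poisoned element from propagating.
            result.append(UNDEF)
        else:
            # Otherwise the lane is copied from the concatenated inputs.
            result.append(combined[index])
    return result


# TMP5: two lanes that the comment treats as normal values.
tmp5 = ["a", "b"]

# TMP6 = shufflevector TMP5, poison, <0, 1, undef x 6>
tmp6 = shufflevector(tmp5, [POISON] * 2, [0, 1] + [UNDEF] * 6)

# VECINS_I_5_I1 = shufflevector TMP6, poison, <undef x 4, 0, 1, undef x 2>
vecins = shufflevector(tmp6, [POISON] * 8, [UNDEF] * 4 + [0, 1] + [UNDEF] * 2)

print(vecins)
```

Under this model no lane of `vecins` is poison, which is the point of the question. The conclusion holds only for the quoted wording. If undef mask elements were instead defined to produce poison, as the reply in this thread anticipates, the same trace would give poison lanes.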