If the vectorized insertelements instructions form indentity subvector
(the subvector at the beginning of the long vector), it is just enough
to extend the vector itself, no need to generate inserting subvector
shuffle.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This appears to have injected a failure into Halide codegen; I'll be providing a repro case shortly.
Still trying to get something self-contained for you, but the difference in codegen is something like:
before:
%193 = shufflevector <2 x double> %179, <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> ... %200 = insertelement <4 x double> %193, double 1.000000e+02, i32 3 %201 = insertelement <4 x double> %200, double %184, i32 2
after:
%193 = shufflevector <2 x double> %179, <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> ... %200 = insertelement <4 x double> %193, double %184, i32 2
Hmm, very strange, the patch removes extra shuffle instruction, it has nothing to do with the insertelements. Waiting for the reproducer for investigation.
Some raw info from debugging one of our likely failure cases:
ShuffleOrOp = InsertElement
E->dump() is:
0. Operand 0: <4 x double> <double poison, double poison, double poison, double 1.000000e+02> %269 = insertelement <4 x double> <double poison, double poison, double poison, double 1.000000e+02>, double %218, i32 0 Operand 1: %218 = load double, double* %210, align 8, !tbaa !38 %215 = load double, double* %214, align 8, !tbaa !38 Scalars: %269 = insertelement <4 x double> <double poison, double poison, double poison, double 1.000000e+02>, double %218, i32 0 %270 = insertelement <4 x double> %269, double %215, i32 1 State: Vectorize MainOp: %269 = insertelement <4 x double> <double poison, double poison, double poison, double 1.000000e+02>, double %218, i32 0 AltOp: %269 = insertelement <4 x double> <double poison, double poison, double poison, double 1.000000e+02>, double %218, i32 0 VectorizedValue: NULL ReuseShuffleIndices: Empty ReorderIndices: UserTreeIndices:
VL0->dump() is:
%269 = insertelement <4 x double> <double poison, double poison, double poison, double 1.000000e+02>, double %218, i32 0
The initial value for V (i.e., vectorizeTree(E->getOperand(1))) is:
%217 = load <2 x double>, <2 x double>* %216, align 8, !tbaa !38
We calculate:
NumElts = 4, NumScalars = 2, IsIdentity = 1 , Offset = 0
Thus we replace V with Builder.CreateShuffleVector(V, Mask), yielding a new V:
%268 = shufflevector <2 x double> %217, <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
Since Offset = 0, we (now) skip inserting the extra shufflevector, but this is wrong, since the extra shufflevector would produce:
%269 = shufflevector <4 x double> <double poison, double poison, double poison, double 1.000000e+02>, <4 x double> %268, <4 x i32> <i32 4, i32 5, i32 2, i32 3>
It's not entirely clear to me how the intended logic of this patch is supposed to work, but skipping the extra shufflevector when Offset == 0 seems puzzling here; in this case, we are definitely producing a meaningfully different result in this case (essentially, inserting value of 100 at index 3). I'm not sure what the correct fix is here (still looking) but I think this should give you enough info to make progress. (If not, we should should revert this change pending a proper fix.)
Yes, you're right, need an extra check here for the FirstInsert->getOperand(0), will fix it.
As this change is causing miscompiles on halide, I think it wise to revert until fixed. Any objections?
To be fair, it's (potentially) causing miscompiles everywhere; Halide just exercises vectorization a lot more than most LLVM clients :-)
Thanks! Please cc me on the updated patch and I will test it locally to verify that Halide is happy with it.