Add support for negation of constant build vectors.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Comment Actions
I haven't enabled handling of undef elements in build vectors, but can anyone think of a reason that I shouldn't?
Yeah, I think undef should be handled. InstSimplify (and other IR passes IIRC) do it:
define <2 x float> @fsub_-0_-0_x_vec_undef_elts(<2 x float> %a) {
; CHECK-LABEL: @fsub_-0_-0_x_vec_undef_elts(
; CHECK-NEXT: ret <2 x float> [[A:%.*]]
;
%t1 = fsub <2 x float> <float undef, float -0.0>, %a
%ret = fsub <2 x float> <float -0.0, float undef>, %t1
ret <2 x float> %ret
}
Comment Actions
LGTM, although I'm not fluent in AMDGPU assembly. You may want to wait a little while to see if an expert comes along.
| lib/CodeGen/SelectionDAG/DAGCombiner.cpp | | |
|---|---|---|
| 913 | (On Diff #203570) | Nit: I don't anticipate any regressions from this change, but this could be split off into a separate patch, if we're being pedantic. |
Comment Actions
This breaks (at least) PowerPC with the typical DAG combine cycle (i.e. one combine undoes the other, and the two keep repeating). Here's a minimal test case to show this:
define dso_local <4 x double> @sub(double %b, double* nocapture readonly %ptr) local_unnamed_addr {
entry:
%arrayidx = getelementptr inbounds double, double* %ptr, i64 45320
%0 = load double, double* %arrayidx, align 4
%vecinit = insertelement <4 x double> undef, double %0, i32 0
%arrayidx1 = getelementptr inbounds double, double* %ptr, i64 176
%1 = load double, double* %arrayidx1, align 4
%vecinit2 = insertelement <4 x double> %vecinit, double %1, i32 1
%arrayidx3 = getelementptr inbounds double, double* %ptr, i64 2734
%2 = load double, double* %arrayidx3, align 4
%vecinit4 = insertelement <4 x double> %vecinit2, double %2, i32 2
%arrayidx5 = getelementptr inbounds double, double* %ptr, i64 7
%3 = load double, double* %arrayidx5, align 4
%vecinit6 = insertelement <4 x double> %vecinit4, double %3, i32 3
%splat.splatinsert = insertelement <4 x double> undef, double %b, i32 0
%splat.splat = shufflevector <4 x double> %splat.splatinsert, <4 x double> undef, <4 x i32> zeroinitializer
%div = fdiv fast <4 x double> %vecinit6, %splat.splat
%sub = fsub fast <4 x double> <double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00>, %div
ret <4 x double> %sub
}
Compile with llc -mtriple=powerpc64le-unknown-unknown
Comment Actions
Reduced:
define <4 x double> @sub(double %a0, <4 x double> %a1) {
entry:
%splat.splatinsert = insertelement <4 x double> undef, double %a0, i32 0
%splat.splat = shufflevector <4 x double> %splat.splatinsert, <4 x double> undef, <4 x i32> zeroinitializer
%div = fdiv fast <4 x double> %a1, %splat.splat
%sub = fsub fast <4 x double> <double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00>, %div
ret <4 x double> %sub
}
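For readers unfamiliar with this failure mode, the sketch below is a toy model of a combine cycle. It is not the DAGCombiner code, and it makes no claim about which two combines actually conflict on PowerPC: `Form`, `pushNegInward`, `pullNegOutward` and `reachesFixedPoint` are all hypothetical names. It only shows why two rewrites that are inverses of each other never settle on a fixed point.

```cpp
#include <optional>
#include <set>

// Two hypothetical shapes of the same value: the negation either wraps the
// division, or has been pushed into one of its operands.
enum class Form { NegOfDiv, DivOfNeg };

// Hypothetical combine 1: push the negation into the division's operand.
std::optional<Form> pushNegInward(Form f) {
  if (f == Form::NegOfDiv)
    return Form::DivOfNeg;
  return std::nullopt; // pattern does not match
}

// Hypothetical combine 2: pull the negation back out, undoing combine 1.
std::optional<Form> pullNegOutward(Form f) {
  if (f == Form::DivOfNeg)
    return Form::NegOfDiv;
  return std::nullopt; // pattern does not match
}

// Apply the combines until nothing matches. Returns true if a fixed point is
// reached, false if a previously seen form comes back (a combine cycle).
bool reachesFixedPoint(Form start, bool enablePullOutward) {
  std::set<Form> seen{start};
  Form current = start;
  while (true) {
    std::optional<Form> next = pushNegInward(current);
    if (!next && enablePullOutward)
      next = pullNegOutward(current);
    if (!next)
      return true; // no combine fired: fixed point
    if (!seen.insert(*next).second)
      return false; // form already visited: the combines are cycling
    current = *next;
  }
}
```

With only the first rewrite enabled, `reachesFixedPoint(Form::NegOfDiv, false)` terminates. With both enabled, the second rewrite restores the starting form and the function reports a cycle. A real combiner has no such visited-set check, so the equivalent situation there shows up as a hang.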