Add support for negation of constant build vectors.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I haven't enabled handling of undefs in build vectors, but can anyone think of a reason that I shouldn't?
Yeah, I think undef should be handled. InstSimplify (and other IR passes IIRC) do it:
define <2 x float> @fsub_-0_-0_x_vec_undef_elts(<2 x float> %a) {
; CHECK-LABEL: @fsub_-0_-0_x_vec_undef_elts(
; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
;
  %t1 = fsub <2 x float> <float undef, float -0.0>, %a
  %ret = fsub <2 x float> <float -0.0, float undef>, %t1
  ret <2 x float> %ret
}
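To make the undef question concrete, here is a minimal standalone C++ sketch of lane-wise negation of a constant vector. It does not use any LLVM API. The names `Lane`, `negateConstantVector`, and `allowUndef` are hypothetical, and treating a negated undef lane as still undef is an assumption of this model, not something taken from the patch.

```cpp
#include <optional>
#include <vector>

// One lane of a constant vector: a value, or std::nullopt for an undef lane.
// (Hypothetical model type, not an LLVM class.)
using Lane = std::optional<double>;

// Returns the lane-wise negation of a constant vector. If the vector has
// undef lanes and allowUndef is false, returns std::nullopt to signal that
// the fold was not performed.
std::optional<std::vector<Lane>> negateConstantVector(
    const std::vector<Lane> &lanes, bool allowUndef) {
  std::vector<Lane> result;
  result.reserve(lanes.size());
  for (const Lane &lane : lanes) {
    if (!lane) {
      if (!allowUndef)
        return std::nullopt;            // Bail out: undef lanes not handled.
      result.push_back(std::nullopt);   // Assumption: -undef stays undef.
      continue;
    }
    result.push_back(-*lane);           // Ordinary constant lane.
  }
  return result;
}
```

With `allowUndef` set to false the sketch mirrors the conservative behaviour described in the first comment. Setting it to true mirrors the suggestion to handle undef lanes.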
LGTM, although I'm not fluent in AMDGPU assembly. You may want to wait a little while to see if an expert comes along.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
913 | Nit: I don't anticipate any regressions from this change, but it could be split off into a separate patch, if we're being pedantic. |
This breaks (at least) PowerPC with the typical DAG combine cycle, i.e. two combines keep undoing each other. Here's a minimal test case that shows this:
define dso_local <4 x double> @sub(double %b, double* nocapture readonly %ptr) local_unnamed_addr {
entry:
  %arrayidx = getelementptr inbounds double, double* %ptr, i64 45320
  %0 = load double, double* %arrayidx, align 4
  %vecinit = insertelement <4 x double> undef, double %0, i32 0
  %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 176
  %1 = load double, double* %arrayidx1, align 4
  %vecinit2 = insertelement <4 x double> %vecinit, double %1, i32 1
  %arrayidx3 = getelementptr inbounds double, double* %ptr, i64 2734
  %2 = load double, double* %arrayidx3, align 4
  %vecinit4 = insertelement <4 x double> %vecinit2, double %2, i32 2
  %arrayidx5 = getelementptr inbounds double, double* %ptr, i64 7
  %3 = load double, double* %arrayidx5, align 4
  %vecinit6 = insertelement <4 x double> %vecinit4, double %3, i32 3
  %splat.splatinsert = insertelement <4 x double> undef, double %b, i32 0
  %splat.splat = shufflevector <4 x double> %splat.splatinsert, <4 x double> undef, <4 x i32> zeroinitializer
  %div = fdiv fast <4 x double> %vecinit6, %splat.splat
  %sub = fsub fast <4 x double> <double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00>, %div
  ret <4 x double> %sub
}
Compile with llc -mtriple=powerpc64le-unknown-unknown
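For readers unfamiliar with this failure mode, the following is a toy C++ sketch of a combine cycle: two rewrite rules that invert each other, driven to a fixpoint with a step cap. It is not the DAGCombiner's logic and does not model the PowerPC combines involved here. `Rule`, `runToFixpoint`, `foldAToB`, and `foldBToA` are all hypothetical names, and the expression forms "A" and "B" are placeholders.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// A rewrite rule returns the rewritten form, or std::nullopt if it does not
// apply. (Hypothetical model; expressions are just strings.)
using Rule = std::optional<std::string> (*)(const std::string &);

// Applies at most one rule per step. Returns true if a step finds no
// applicable rule (a fixpoint), false if maxSteps rewrites were performed
// without confirming a fixpoint.
bool runToFixpoint(std::string expr, const std::vector<Rule> &rules,
                   std::size_t maxSteps) {
  for (std::size_t step = 0; step < maxSteps; ++step) {
    bool changed = false;
    for (Rule rule : rules) {
      if (std::optional<std::string> next = rule(expr)) {
        expr = *next;
        changed = true;
        break;  // Revisit the rewritten expression from the first rule.
      }
    }
    if (!changed)
      return true;  // No rule fired: fixpoint reached.
  }
  return false;  // Step cap hit: the rules may be undoing each other.
}

// Hypothetical combine: rewrites form "A" into form "B".
std::optional<std::string> foldAToB(const std::string &e) {
  if (e == "A") return std::string("B");
  return std::nullopt;
}

// Hypothetical combine: rewrites form "B" back into form "A".
std::optional<std::string> foldBToA(const std::string &e) {
  if (e == "B") return std::string("A");
  return std::nullopt;
}
```

With both rules enabled the driver never settles and only stops at the cap. With either rule alone it reaches a fixpoint after one rewrite.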
Reduced:
define <4 x double> @sub(double %a0, <4 x double> %a1) {
entry:
  %splat.splatinsert = insertelement <4 x double> undef, double %a0, i32 0
  %splat.splat = shufflevector <4 x double> %splat.splatinsert, <4 x double> undef, <4 x i32> zeroinitializer
  %div = fdiv fast <4 x double> %a1, %splat.splat
  %sub = fsub fast <4 x double> <double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00>, %div
  ret <4 x double> %sub
}