This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] GetNegatedExpression - constant float vector support (PR42105)
ClosedPublic

Authored by RKSimon on Jun 6 2019, 8:50 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Jun 6 2019, 8:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2019, 8:50 AM

I haven't enabled handling of undefs in build vector - but can anyone think of a reason that I shouldn't?

Yeah, I think undef should be handled. InstSimplify (and other IR passes IIRC) do it:

define <2 x float> @fsub_-0_-0_x_vec_undef_elts(<2 x float> %a) {
; CHECK-LABEL: @fsub_-0_-0_x_vec_undef_elts(
; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
;
  %t1 = fsub <2 x float> <float undef, float -0.0>, %a
  %ret = fsub <2 x float> <float -0.0, float undef>, %t1
  ret <2 x float> %ret
}
RKSimon updated this revision to Diff 203570.Jun 7 2019, 9:16 AM
RKSimon edited the summary of this revision. (Show Details)

Added undef handling

cameron.mcinally accepted this revision.Jun 7 2019, 12:19 PM

LGTM, although I'm not fluent in AMDGPU assembly. You may want to wait a little while to see if an expert comes along.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
913 ↗(On Diff #203570)

Nit: I don't anticipate any regressions from this change, but this could be split-off to a separate patch -- if we're being pedantic.

This revision is now accepted and ready to land.Jun 7 2019, 12:19 PM

@arsenm Any comments regarding the amdgpu changes?

This revision was automatically updated to reflect the committed changes.

This breaks (at least) PowerPC with the typical DAG Combine cycle (i.e. one combine undoes the other in a cycle). Here's a minimal test case to show this:

define dso_local <4 x double> @sub(double %b, double* nocapture readonly %ptr) local_unnamed_addr {
entry:
  %arrayidx = getelementptr inbounds double, double* %ptr, i64 45320
  %0 = load double, double* %arrayidx, align 4
  %vecinit = insertelement <4 x double> undef, double %0, i32 0
  %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 176
  %1 = load double, double* %arrayidx1, align 4
  %vecinit2 = insertelement <4 x double> %vecinit, double %1, i32 1
  %arrayidx3 = getelementptr inbounds double, double* %ptr, i64 2734
  %2 = load double, double* %arrayidx3, align 4
  %vecinit4 = insertelement <4 x double> %vecinit2, double %2, i32 2
  %arrayidx5 = getelementptr inbounds double, double* %ptr, i64 7
  %3 = load double, double* %arrayidx5, align 4
  %vecinit6 = insertelement <4 x double> %vecinit4, double %3, i32 3
  %splat.splatinsert = insertelement <4 x double> undef, double %b, i32 0
  %splat.splat = shufflevector <4 x double> %splat.splatinsert, <4 x double> undef, <4 x i32> zeroinitializer
  %div = fdiv fast <4 x double> %vecinit6, %splat.splat
  %sub = fsub fast <4 x double> <double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00>, %div
  ret <4 x double> %sub
}

Compile with llc -mtriple=powerpc64le-unknown-unknown

Same issue happens with arm64, aarch64, nvptx triples.

Reduced:

define <4 x double> @sub(double %a0, <4 x double> %a1) {
entry:
  %splat.splatinsert = insertelement <4 x double> undef, double %a0, i32 0
  %splat.splat = shufflevector <4 x double> %splat.splatinsert, <4 x double> undef, <4 x i32> zeroinitializer
  %div = fdiv fast <4 x double> %a1, %splat.splat
  %sub = fsub fast <4 x double> <double 0.000000e+00, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00>, %div
  ret <4 x double> %sub
}