Ignore the multiple-use heuristics of the default
implementation and report cost based on inline immediates. This
is mostly interesting for -0 vs. 0 and gets a few small improvements.
fneg_fadd_0_f16 is a small regression; we could probably avoid it
if we handled folding fneg into div_fixup.
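As a rough illustration (a hypothetical example, not one of the patch's tests), the -0 vs. 0 distinction shows up in patterns like the following, where negating the fadd turns a free +0.0 inline immediate into a -0.0 literal:

define half @fneg_of_fadd_zero(half %x) {
  ; fneg(fadd(%x, +0.0)) can be rewritten as fadd(fneg(%x), -0.0);
  ; +0.0 is an inline immediate on AMDGPU but -0.0 requires a literal,
  ; so the negated form is not automatically the cheaper one.
  %add = fadd half %x, 0.0
  %neg = fneg half %add
  ret half %neg
}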
Reviewers: rampitec, foad, sebastian-ne, Pierre-vh
Group Reviewers: Restricted Project
Event Timeline
Heads up, this is causing an infinite loop in the DAG combiner. I'm working on reducing a test case.
llc -march=amdgcn -mcpu=gfx1030 hangs on this test case:
define float @f(float %arg) {
bb:
  %i = fmul float %arg, 0.0
  %i1 = fsub float 0.0, %i
  ret float %i1
}
Could you please fix or revert?
An excerpt of the debug output from the infinite loop:
Combining: t7: ch,glue = CopyToReg # D:1 t0, Register:f32 $vgpr0, t3103
Combining: t3103: f32 = fsub # D:1 ConstantFP:f32<0.000000e+00>, t3102
Creating fp constant: t3104: f32 = ConstantFP<-0.000000e+00>
Creating new node: t3105: f32 = fmul # D:1 t2, ConstantFP:f32<-0.000000e+00>
Creating new node: t3106: f32 = fadd # D:1 t3105, ConstantFP:f32<0.000000e+00>
 ... into: t3106: f32 = fadd # D:1 t3105, ConstantFP:f32<0.000000e+00>
Combining: t7: ch,glue = CopyToReg # D:1 t0, Register:f32 $vgpr0, t3106
Combining: t3106: f32 = fadd # D:1 t3105, ConstantFP:f32<0.000000e+00>
Creating new node: t3107: f32 = fmul # D:1 t2, ConstantFP:f32<0.000000e+00>
Creating new node: t3108: f32 = fsub # D:1 ConstantFP:f32<0.000000e+00>, t3107
 ... into: t3108: f32 = fsub # D:1 ConstantFP:f32<0.000000e+00>, t3107
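In other words, the combiner appears to be ping-ponging between two rewrites that undo each other (my paraphrase of the log above, in IR terms):

fsub 0.0, (fmul %x, 0.0)   -->  fadd (fmul %x, -0.0), 0.0
fadd (fmul %x, -0.0), 0.0  -->  fsub 0.0, (fmul %x, 0.0)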
I've reverted the patch and added this test case to test/CodeGen/AMDGPU/fneg-combines.new.ll