PTAL
Chad
Differential D32596
[DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B)). mcrosier on Apr 27 2017, 9:19 AM. Authored by
Details
Diff Detail Event TimelineComment Actions I would guess we should prefer fmul as the canonical form, for the same reason we prefer "shl %x, 1" over "add %x, %x": we optimize instructions where hasOneUse() is true more aggressively. I guess it doesn't make a big difference either way, though. Either way, please add a testcase for "fadd %x, %x", to confirm that we canonicalize both fadd and fmul to the same form. Comment Actions FWIW, I could also fix this in the DAG combiner. The particular case I care about looks like 'a * b - 2.0 * c', but the expression is transformed to 'a * c + -2.0 * c' before we hit the DAG Combine of interest (which only expects a +2.0). Comment Actions Another data-point: reassociate currently prefers to canonicalize "fadd %x, %x", to "fmul %x, 2". We don't want to fight back and forth in IR. Comment Actions Thanks, Eli. I'll work on extending the DAG combine in that case. Does that sound reasonable? Comment Actions Not sure if we need a target hook for this...? Replacing one instruction with two might not always be a good idea. Otherwise looks fine. Comment Actions This is worse for AMDGPU because it requires a larger instruction encoding for f16/f32. For f64 this is better
Comment Actions Address reviewers feedback by narrowing transform so that the negation is "free". @efriedma: This transform now replaces 2 instruction for 2 instruction in the worst case. For AArch64 it's actually replaces 3 for 2 because we now avoid materializing the -2.0. Probably true for other targets. Thank you everyone for your feedback. Very much appreciated.
Comment Actions My only comment is that you may be missing additional cases where the fneg could be folded away, but perhaps those can be fixed in a follow up change. Comment Actions Seems like isNegatibleForFree() would be the place to recognize patterns like this, and then we'd have a corresponding special case for -2.0 in GetNegatedExpression(). As written, this transform should be good for all x86 because it removes a constant load, so no objections, but... Some tests to think about below. We'll fold the first 3 in DAGCombiner after this patch (universally afaict), but InstCombine does nothing with those. Should InstCombine fold fnegs into constants and fsub -> fadd? The last case is transformed partially in InstCombine (div -> mul), but DAGCombiner does nothing with that. It's ok to not have a DAG fold for that because nothing this late is producing an fdiv? define float @add_mul_neg2(float %a, float %b) { %mul = fmul float %b, -2.0 %add = fadd float %a, %mul ret float %add } define float @sub_mul_neg2(float %a, float %b) { %mul = fmul float %b, -2.0 %sub = fsub float %a, %mul ret float %sub } define float @mul_mul_neg2(float %a, float %b) { %mul = fmul float %b, -2.0 %neg = fsub float -0.0, %a %mul2 = fmul float %neg, %mul ret float %mul2 } define float @sub_div_neghalf(float %a, float %b) { %div = fdiv float %b, -0.5 %sub = fsub float %a, %div ret float %sub } Comment Actions I'll investigate this suggestion. Thanks.
Okay, good.
As Eli pointed out, the reassociation pass prefers the fmul with constant (to expose additional factoring opportunities, IIRC). He also pointed out that we should prefer fmul X, 2.0 as the canonical form, for the same reason we prefer "shl %x, 1" over "add %x, %x": we optimize instructions where hasOneUse() is true more aggressively. Given these two implications I went ahead and implemented this as a DAG combine. However, I think it might make sense to have InstCombine canonicalize to the mul with constant form as well.
While InstCombine does nothing, the reassociation pass will canonicalize to (fsub A, (fmul/fdiv B, -C)) -> (fadd A, (fmul/fdiv B, C)) (fadd A, (fmul/fdiv B, -C)) -> (fsub A, (fmul/fdiv B, C)) where C is a constant and (fsub A, (fdiv b, -0.5)) -> (fadd A, (fmul b, 2.0)) for the last test. We could make InstCombine prefer these forms as well and that might make a difference, but it might not. My guess is it probably doesn't matter, but I'll experiment. Comment Actions Thanks for checking those out. I haven't looked at the reassociation pass very much. I'm surprised to see it flip a constant's sign and create an fsub rather than fadd. Any ideas why that is a good thing to do? |