This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] fix crash in getNegatedExpression() from altered number of uses
AbandonedPublic

Authored by spatel on Mar 19 2020, 11:23 AM.

Details

Summary

We've discussed this problem before in D70975. Copying directly from that description:
We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect that everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression().

The problem was similarly partly fixed in D75501, but we missed that the same behavior could originate from fmul/fdiv.

This is another incomplete fix for the whole problem, but I think that requires integrating getNegatibleCost() and getNegatedExpression(), so that we can't alter costs between analysis and rewrite.

Diff Detail

Event Timeline

spatel created this revision.Mar 19 2020, 11:23 AM

Shall we also need to update the cost evaluation for FMUL/FDIV, that is calculating both operands and select the cheapest cost of them ? And it seems that FADD also have such issues. -(A+B)->-A - B or -B-A
Technical speaking, we can't have the same expensive cost for both operands when negating the expression but now, we indeed have sometimes. And we choose one by chance which also would have problems.

I am not clear on the origin design but I wonder if we can remove the check of cost before calling the getNegateExpression, but call it directly, which allow to be null value to avoid such kind of out of sync between these two calls.

Shall we also need to update the cost evaluation for FMUL/FDIV, that is calculating both operands and select the cheapest cost of them ? And it seems that FADD also have such issues. -(A+B)->-A - B or -B-A
Technical speaking, we can't have the same expensive cost for both operands when negating the expression but now, we indeed have sometimes. And we choose one by chance which also would have problems.

Yes, this whole API is unsound - we need to rewrite it to be completely correct. But I think this change makes it at least a little bit safer and more similar to the FMA logic. I can add a FIXME comment for FADD, and try to come up with a similar failure case as this one.

I am not clear on the origin design but I wonder if we can remove the check of cost before calling the getNegateExpression, but call it directly, which allow to be null value to avoid such kind of out of sync between these two calls.

Yes, if I understand correctly, we will need to integrate the cost+rewrite functions, so they are always in sync.

steven.zhang added a comment.EditedMar 22 2020, 6:59 PM

We need to update the cost as they are pair. For now, it returns the first operand immediately if it is not expensive. But it could be neutral. And we will have problems if the cost of second operand is cheap. Because we should return cheap, but now, it is neutral. Does it make sense ?

We need to update the cost as they are pair. For now, it returns the first operand immediately if it is not expensive. But it could be neutral. And we will have problems if the cost of second operand is cheap. Because we should return cheap, but now, it is neutral. Does it make sense ?

Sorry, but I'm not understanding. With this patch we have fneg (fmul/fdiv X, Y):

  1. CostX == cheaper == 2; CostY == cheaper == 2 --> negate X
  2. CostX == cheaper == 2; CostY == neutral == 1 --> negate X
  3. CostX == neutral == 1; CostY == cheaper ==2 --> negate Y
  4. CostX == neutral == 1; CostY == neutral == 1 --> negate X

But I have an alternate idea to avoid the crash that is probably a better fix given the current logic:
D76638

We need to update the cost as they are pair. For now, it returns the first operand immediately if it is not expensive. But it could be neutral. And we will have problems if the cost of second operand is cheap. Because we should return cheap, but now, it is neutral. Does it make sense ?

Sorry, but I'm not understanding. With this patch we have fneg (fmul/fdiv X, Y):

  1. CostX == cheaper == 2; CostY == cheaper == 2 --> negate X
  2. CostX == cheaper == 2; CostY == neutral == 1 --> negate X
  3. CostX == neutral == 1; CostY == cheaper ==2 --> negate Y
  4. CostX == neutral == 1; CostY == neutral == 1 --> negate X

But I have an alternate idea to avoid the crash that is probably a better fix given the current logic:
D76638

I mean when evaluating the cost of fmul/fdiv X, Y, not the logic inside getNegatedExpression, but the logic inside getNegatibleCost. We have:

  • CostX == neutral == 1; CostY == cheaper --> Cost(fmul/fdiv) == neutral (Wrong)

This is the code.

case ISD::FMUL:
case ISD::FDIV: {
  // fold (fneg (fmul X, Y)) -> (fmul (fneg X), Y) or (fmul X, (fneg Y))
  NegatibleCost V0 = getNegatibleCost(Op.getOperand(0), DAG, LegalOperations,
                                      ForCodeSize, Depth + 1);
  if (V0 != NegatibleCost::Expensive)    // Steven: if it is neutral, we should not return immediately, we need to check the cost of operand 2 and compare them.
    return V0;

  // Ignore X * 2.0 because that is expected to be canonicalized to X + X.
  if (auto *C = isConstOrConstSplatFP(Op.getOperand(1)))
    if (C->isExactlyValue(2.0) && Op.getOpcode() == ISD::FMUL)
      return NegatibleCost::Expensive;

  return getNegatibleCost(Op.getOperand(1), DAG, LegalOperations, ForCodeSize,
                          Depth + 1);
spatel abandoned this revision.Apr 3 2020, 5:10 AM

Abandoning - D77319 will solve the larger problem.