Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.
Maybe add RISCV tests? That's another target that stores compare results as 0/1 instead of 0/-1 - could you also pre-commit the tests for RISCV/SystemZ with current codegen and rebase the patch to show the diff?
Unless I'm missing something trunk was already doing this fold. I added this as a test case because, without checking if we could fold away the not by converting back to a sign extend, we would have regressed here. After this patch the codegen of sext_of_not_cmp and dec_of_zexted_cmp should be identical. I can remove these if you think they're redundant.
I might have missed something - how are we guarding against these 2 transforms getting into an infinite loop?
Shouldn't that "zext" be "sext"? We are creating sext in this block.
We do not usually speculatively create nodes like that 'Not' end then not explicitly delete it if it wasn't necessary.
It's not clear to me from the description or code comments why we are doing this transform at all. This is the opposite of the patch title?
I don't believe this is a danger because going from (sext (not i1 x)) to (add (zext i1 x), -1) requires (not i1 x), and we only go back when we immediately fold away the created (not i1 x). We could instead decide not to do the initial fold if we have a (not i1 x), but if we fold to (add (zext i1 x), -1) first we have the opportunity to constant fold the -1 away, and additionally cover cases where we initially have the (add (zext i1 x), -1) version. (eg in the dec_of_zexted_cmp tests added to RISCV and SystemZ)
No, because the sext is only better if we can fold away the not. As the comment below indicates. The actual fold that creates the zext of this form is not here, but in visitSIGN_EXTEND.
I will update the patch to delete the Not node if not necessary.
As for why we are doing this, we would regress in some cases (eg in the sext_of_not_cmp tests added to RISCV and SystemZ) where we would have otherwise folded away the (not i1 x). In general as the TODO above indicates we should probably have a TLI method to determine whether we want to prefer the zext or sext versions of this and several other folds. It might make sense to use TLI.getBooleanContents for this, as we already do for the (add (sext i1 Y), X) -> (sub X, (zext i1 Y)) fold in visitADDLikeCommutative.
Yes - using getBooleanContents seems like a better option.
I tried an implementation using getBooleanContents here and in a few other places, and it seems promising, but there are a few regressions I was having a hard time nailing down. I would like to save this for a follow up patch if possible.
While inverting a setcc is probably the most common way we would be able to absorb the not, I'm don't know if it makes sense to only check that case. At the least, I think (not (or x, y)) -> (and (not x), (not y)) iff x or y are setcc could also be useful.
Ok - I understand now, but I think we can improve the order of the statements to make it clearer.
// add (zext i1 X), -1 -> sext (not i1 X) // Usually, we transform this pattern in the opposite // direction because most targets generate better code for // the zext form. However, if we can eliminate the 'not', the // sext form should be better. // TODO: ...
Please add a code comment with something like this:
// The speculatively created 'not' node could not be folded away, // so delete it immediately to avoid conflicting with other folds.
Can we use this here:
/// Returns true if \p V is a bitwise not operation. Assumes that an all ones /// constant is canonicalized to be operand 1. bool isBitwiseNot(SDValue V, bool AllowUndefs = false);
Using isBitwiseNot below as suggested caused us to regress in a new case. Upon investigating, I've realized my argument for using visitXOR here is incorrect. At the time we try to fold away the created not, the value we're trying to fold the not into has two uses, the new not, and the original zext. This causes most of the folds in visitXOR not to fire. I'm not sure there is a good way to overcome this. I believe conditioning the initial fold on having a non-foldable not is probably the better solution.
Avoid folding cases where (not i1 x) can already be folded
Switch to only matching sign extends from scalar i1s. This is what the X86 backend is currently doing. I don't believe this transform is always better for vector types as it requires materializing an all ones vector.