Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Maybe add RISCV tests? That's another target that stores compare results as 0/1 instead of 0/-1 - could you also pre-commit the tests for RISCV/SystemZ with current codegen and rebase the patch to show the diff?
@laytonio Please can you rebase? BTW I extended the RISCV tests to include i64 tests as well as i32
llvm/test/CodeGen/RISCV/sext-zext-trunc.ll | ||
---|---|---|
477 | Not sure what comment should be here if RISCV doesn't actually do the fold |
llvm/test/CodeGen/RISCV/sext-zext-trunc.ll | ||
---|---|---|
477 | Unless I'm missing something trunk was already doing this fold. I added this as a test case because, without checking if we could fold away the not by converting back to a sign extend, we would have regressed here. After this patch the codegen of sext_of_not_cmp and dec_of_zexted_cmp should be identical. I can remove these if you think they're redundant. |
I might have missed something - how are we guarding against these 2 transforms getting into an infinite loop?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2317 | Shouldn't that "zext" be "sext"? We are creating sext in this block. | |
2329–2331 | We do not usually speculatively create nodes like that 'Not' end then not explicitly delete it if it wasn't necessary. It's not clear to me from the description or code comments why we are doing this transform at all. This is the opposite of the patch title? |
I don't believe this is a danger because going from (sext (not i1 x)) to (add (zext i1 x), -1) requires (not i1 x), and we only go back when we immediately fold away the created (not i1 x). We could instead decide not to do the initial fold if we have a (not i1 x), but if we fold to (add (zext i1 x), -1) first we have the opportunity to constant fold the -1 away, and additionally cover cases where we initially have the (add (zext i1 x), -1) version. (eg in the dec_of_zexted_cmp tests added to RISCV and SystemZ)
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2317 | No, because the sext is only better if we can fold away the not. As the comment below indicates. The actual fold that creates the zext of this form is not here, but in visitSIGN_EXTEND. | |
2329–2331 | I will update the patch to delete the Not node if not necessary. As for why we are doing this, we would regress in some cases (eg in the sext_of_not_cmp tests added to RISCV and SystemZ) where we would have otherwise folded away the (not i1 x). In general as the TODO above indicates we should probably have a TLI method to determine whether we want to prefer the zext or sext versions of this and several other folds. It might make sense to use TLI.getBooleanContents for this, as we already do for the (add (sext i1 Y), X) -> (sub X, (zext i1 Y)) fold in visitADDLikeCommutative. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2329–2331 | Yes - using getBooleanContents seems like a better option. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2329–2331 | I tried an implementation using getBooleanContents here and in a few other places, and it seems promising, but there are a few regressions I was having a hard time nailing down. I would like to save this for a follow up patch if possible. While inverting a setcc is probably the most common way we would be able to absorb the not, I'm don't know if it makes sense to only check that case. At the least, I think (not (or x, y)) -> (and (not x), (not y)) iff x or y are setcc could also be useful. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2317 | Ok - I understand now, but I think we can improve the order of the statements to make it clearer. // add (zext i1 X), -1 -> sext (not i1 X) // Usually, we transform this pattern in the opposite // direction because most targets generate better code for // the zext form. However, if we can eliminate the 'not', the // sext form should be better. // TODO: ... | |
2333 | Please add a code comment with something like this: // The speculatively created 'not' node could not be folded away, // so delete it immediately to avoid conflicting with other folds. | |
10668 | Can we use this here: /// Returns true if \p V is a bitwise not operation. Assumes that an all ones /// constant is canonicalized to be operand 1. bool isBitwiseNot(SDValue V, bool AllowUndefs = false); |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2329–2331 | Using isBitwiseNot below as suggested caused us to regress in a new case. Upon investigating, I've realized my argument for using visitXOR here is incorrect. At the time we try to fold away the created not, the value we're trying to fold the not into has two uses, the new not, and the original zext. This causes most of the folds in visitXOR not to fire. I'm not sure there is a good way to overcome this. I believe conditioning the initial fold on having a non-foldable not is probably the better solution. |
Use isBitwiseNot
Avoid folding cases where (not i1 x) can already be folded
Switch to only matching sign extends from scalar i1s. This is what the X86 backend is currently doing. I don't believe this transform is always better for vector types as it requires materializing an all ones vector.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10440 | This is unrelated refactoring? Please commit separately as an NFC patch. |
Thanks for the review! Could you commit this for me please? Layton Kifer <laytonkifer@gmail.com>
Sure - but I missed something: are we excluding vector types intentionally (do we have any test coverage?) or is that planned as a follow-up? Either way, we should add a code comment to explain.
are we excluding vector types intentionally (do we have any test coverage?) or is that planned as a follow-up? Either way, we should add a code comment to explain.
The thought behind excluding vector types was that materializing the all ones constant may be more expensive in some cases. Although thinking about it again, I'm not sure if that is really true as we would remove the constant we are xoring against, most likely making it a wash. I did try allowing vector types and it caused a regression in a single test. That regression did seem like it was avoidable though as it exposed a missing fold of (not (xor (setcc), (setcc))) -> (xor (not (setcc)), (setcc)). Implementing that turned into a bit of a rabbit hole though. What do you think the best approach is here?
I think it's ok to proceed with the scalar-only patch to make progress, but put a TODO comment on it about extending to vectors.
Hi! We have bisected a clang crash to this patch. The following reduced test case crashes clang when built with clang -O1 -frounding-math:
template <class> class a { int b() { return c == 0.0 ? 0 : -1; } int c; }; template class a<long>;
A debug build of clang produces this "assertion failed" error:
clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm:: SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.
Shouldn't that "zext" be "sext"? We are creating sext in this block.