This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
ClosedPublic

Authored by laytonio on Nov 16 2020, 10:51 PM.

Details

Summary

Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.

Diff Detail

Event Timeline

laytonio created this revision.Nov 16 2020, 10:51 PM
laytonio requested review of this revision.Nov 16 2020, 10:51 PM

Maybe add RISCV tests? That's another target that stores compare results as 0/1 instead of 0/-1 - could you also pre-commit the tests for RISCV/SystemZ with current codegen and rebase the patch to show the diff?

@laytonio Please can you rebase? BTW I extended the RISCV tests to include i64 tests as well as i32

laytonio updated this revision to Diff 306234.Nov 18 2020, 4:09 PM

Thanks!

Rebase

RKSimon added inline comments.
llvm/test/CodeGen/RISCV/sext-zext-trunc.ll
477

Not sure what comment should be here if RISCV doesn't actually do the fold

lkail added a subscriber: lkail.Nov 19 2020, 2:55 AM
laytonio added inline comments.Nov 19 2020, 4:42 AM
llvm/test/CodeGen/RISCV/sext-zext-trunc.ll
477

Unless I'm missing something trunk was already doing this fold. I added this as a test case because, without checking if we could fold away the not by converting back to a sign extend, we would have regressed here. After this patch the codegen of sext_of_not_cmp and dec_of_zexted_cmp should be identical. I can remove these if you think they're redundant.

Does anyone have any more commments?

I might have missed something - how are we guarding against these 2 transforms getting into an infinite loop?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2313

Shouldn't that "zext" be "sext"? We are creating sext in this block.

2325–2327

We do not usually speculatively create nodes like that 'Not' end then not explicitly delete it if it wasn't necessary.

It's not clear to me from the description or code comments why we are doing this transform at all. This is the opposite of the patch title?

I might have missed something - how are we guarding against these 2 transforms getting into an infinite loop?

I don't believe this is a danger because going from (sext (not i1 x)) to (add (zext i1 x), -1) requires (not i1 x), and we only go back when we immediately fold away the created (not i1 x). We could instead decide not to do the initial fold if we have a (not i1 x), but if we fold to (add (zext i1 x), -1) first we have the opportunity to constant fold the -1 away, and additionally cover cases where we initially have the (add (zext i1 x), -1) version. (eg in the dec_of_zexted_cmp tests added to RISCV and SystemZ)

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2313

No, because the sext is only better if we can fold away the not. As the comment below indicates. The actual fold that creates the zext of this form is not here, but in visitSIGN_EXTEND.

2325–2327

I will update the patch to delete the Not node if not necessary.

As for why we are doing this, we would regress in some cases (eg in the sext_of_not_cmp tests added to RISCV and SystemZ) where we would have otherwise folded away the (not i1 x). In general as the TODO above indicates we should probably have a TLI method to determine whether we want to prefer the zext or sext versions of this and several other folds. It might make sense to use TLI.getBooleanContents for this, as we already do for the (add (sext i1 Y), X) -> (sub X, (zext i1 Y)) fold in visitADDLikeCommutative.

spatel added inline comments.Nov 21 2020, 10:23 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2325–2327

Yes - using getBooleanContents seems like a better option.
Another possibility is that we extend the pattern matching to include a setcc node (assuming that can absorb the not op).

laytonio updated this revision to Diff 307026.Nov 23 2020, 3:26 AM

Remove speculatively create (not i1 x), when not used.

laytonio added inline comments.Nov 23 2020, 3:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2325–2327

I tried an implementation using getBooleanContents here and in a few other places, and it seems promising, but there are a few regressions I was having a hard time nailing down. I would like to save this for a follow up patch if possible.

While inverting a setcc is probably the most common way we would be able to absorb the not, I'm don't know if it makes sense to only check that case. At the least, I think (not (or x, y)) -> (and (not x), (not y)) iff x or y are setcc could also be useful.

spatel added inline comments.Nov 23 2020, 5:26 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2313

Ok - I understand now, but I think we can improve the order of the statements to make it clearer.
How about:

// add (zext i1 X), -1 -> sext (not i1 X)
// Usually, we transform this pattern in the opposite 
// direction because most targets generate better code for
// the zext form. However, if we can eliminate the 'not', the
// sext form should be better.
// TODO: ...
2329

Please add a code comment with something like this:

// The speculatively created 'not' node could not be folded away, 
// so delete it immediately to avoid conflicting with other folds.
10584

Can we use this here:

/// Returns true if \p V is a bitwise not operation. Assumes that an all ones
/// constant is canonicalized to be operand 1.
bool isBitwiseNot(SDValue V, bool AllowUndefs = false);
laytonio added inline comments.Nov 24 2020, 9:38 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2325–2327

Using isBitwiseNot below as suggested caused us to regress in a new case. Upon investigating, I've realized my argument for using visitXOR here is incorrect. At the time we try to fold away the created not, the value we're trying to fold the not into has two uses, the new not, and the original zext. This causes most of the folds in visitXOR not to fire. I'm not sure there is a good way to overcome this. I believe conditioning the initial fold on having a non-foldable not is probably the better solution.

lenary removed a subscriber: lenary.Nov 25 2020, 2:25 AM
laytonio updated this revision to Diff 307908.Nov 26 2020, 11:31 AM

Use isBitwiseNot
Avoid folding cases where (not i1 x) can already be folded

Switch to only matching sign extends from scalar i1s. This is what the X86 backend is currently doing. I don't believe this transform is always better for vector types as it requires materializing an all ones vector.

spatel added inline comments.Nov 27 2020, 2:09 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10356

This is unrelated refactoring? Please commit separately as an NFC patch.

laytonio updated this revision to Diff 308494.Nov 30 2020, 3:56 PM

Address comments

laytonio marked 4 inline comments as done.Nov 30 2020, 3:57 PM
spatel accepted this revision.Dec 2 2020, 1:58 PM

LGTM

This revision is now accepted and ready to land.Dec 2 2020, 1:58 PM

Thanks for the review! Could you commit this for me please? Layton Kifer <laytonkifer@gmail.com>

spatel added a comment.Dec 3 2020, 8:44 AM

Thanks for the review! Could you commit this for me please? Layton Kifer <laytonkifer@gmail.com>

Sure - but I missed something: are we excluding vector types intentionally (do we have any test coverage?) or is that planned as a follow-up? Either way, we should add a code comment to explain.

are we excluding vector types intentionally (do we have any test coverage?) or is that planned as a follow-up? Either way, we should add a code comment to explain.

The thought behind excluding vector types was that materializing the all ones constant may be more expensive in some cases. Although thinking about it again, I'm not sure if that is really true as we would remove the constant we are xoring against, most likely making it a wash. I did try allowing vector types and it caused a regression in a single test. That regression did seem like it was avoidable though as it exposed a missing fold of (not (xor (setcc), (setcc))) -> (xor (not (setcc)), (setcc)). Implementing that turned into a bit of a rabbit hole though. What do you think the best approach is here?

are we excluding vector types intentionally (do we have any test coverage?) or is that planned as a follow-up? Either way, we should add a code comment to explain.

The thought behind excluding vector types was that materializing the all ones constant may be more expensive in some cases. Although thinking about it again, I'm not sure if that is really true as we would remove the constant we are xoring against, most likely making it a wash. I did try allowing vector types and it caused a regression in a single test. That regression did seem like it was avoidable though as it exposed a missing fold of (not (xor (setcc), (setcc))) -> (xor (not (setcc)), (setcc)). Implementing that turned into a bit of a rabbit hole though. What do you think the best approach is here?

I think it's ok to proceed with the scalar-only patch to make progress, but put a TODO comment on it about extending to vectors.

This revision was automatically updated to reflect the committed changes.

Thanks you for this! I apologise I hadn't gotten to it yet.

jgorbe added a subscriber: jgorbe.Dec 14 2020, 5:23 PM

Hi! We have bisected a clang crash to this patch. The following reduced test case crashes clang when built with clang -O1 -frounding-math:

template <class> class a {
  int b() { return c == 0.0 ? 0 : -1; }
  int c;
};
template class a<long>;

A debug build of clang produces this "assertion failed" error:

clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm::
SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.

Hi! We have bisected a clang crash to this patch. The following reduced test case crashes clang when built with clang -O1 -frounding-math:

template <class> class a {
  int b() { return c == 0.0 ? 0 : -1; }
  int c;
};
template class a<long>;

A debug build of clang produces this "assertion failed" error:

clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm::
SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.

Thanks, I will look at this right now.