This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Infer flags from add/gep in any block
ClosedPublic

Authored by reames on Oct 5 2021, 3:03 PM.

Details

Summary

This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

@nikic If you can easily run compile time impact data, doing a sanity check on this patch is probably reasonable.

Diff Detail

Event Timeline

reames created this revision.Oct 5 2021, 3:03 PM
reames requested review of this revision.Oct 5 2021, 3:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2021, 3:03 PM
mkazantsev accepted this revision.Oct 5 2021, 10:46 PM

I'm fine with that, provided that CT impact is good.

This revision is now accepted and ready to land.Oct 5 2021, 10:46 PM
nikic added a comment.Oct 6 2021, 10:05 AM

Compile-time impact looks acceptable: https://llvm-compile-time-tracker.com/compare.php?from=91d15aa0b8bff10bd1ccf279418560d17fea52ff&to=e9112b9b93ef7d96468bae3168c0d96c35d190c3&stat=instructions

The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

I can only make a guess here, but what this might be referring to is the fact that SCEV construction from IR will coalesce add/sub and mul expressions into a single call of getAddExpr/getMulExpr rather than building up a chain of binary adds/muls. Effectively, the change you do here defeats that (for the case where the IR instruction has flags, even if they are inapplicable), because you will end up calling getSCEV on each individual add due to the operand fetch in the poison check.

llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll
10–11

Comment needs update.

The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

I can only make a guess here, but what this might be referring to is the fact that SCEV construction from IR will coalesce add/sub and mul expressions into a single call of getAddExpr/getMulExpr rather than building up a chain of binary adds/muls. Effectively, the change you do here defeats that (for the case where the IR instruction has flags, even if they are inapplicable), because you will end up calling getSCEV on each individual add due to the operand fetch in the poison check.

So, while reasonable, this isn't quite right or at least isn't so any longer. When constructing an add reduction tree, we will aggressively collapse into a single add node, dropping flags as needed. This is done inside getAddExpr. The code that runs during parsing - which goes out of it's way to separate by flag type - appears to just be canonicalized back into the flattened form )if not all flags match). Now, maybe there's a difference in the number of nodes from the interior of the tree formed, but that cost should be *very* minimal. (As for one thing, how would you construct an add tree with a non-SCEVable operand type?)

("add" can be replaced with "arithmetic operation" in the above to generalize)

This revision was automatically updated to reflect the committed changes.
nikic added a comment.Oct 6 2021, 11:24 AM

The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

I can only make a guess here, but what this might be referring to is the fact that SCEV construction from IR will coalesce add/sub and mul expressions into a single call of getAddExpr/getMulExpr rather than building up a chain of binary adds/muls. Effectively, the change you do here defeats that (for the case where the IR instruction has flags, even if they are inapplicable), because you will end up calling getSCEV on each individual add due to the operand fetch in the poison check.

So, while reasonable, this isn't quite right or at least isn't so any longer. When constructing an add reduction tree, we will aggressively collapse into a single add node, dropping flags as needed. This is done inside getAddExpr. The code that runs during parsing - which goes out of it's way to separate by flag type - appears to just be canonicalized back into the flattened form )if not all flags match). Now, maybe there's a difference in the number of nodes from the interior of the tree formed, but that cost should be *very* minimal. (As for one thing, how would you construct an add tree with a non-SCEVable operand type?)

I'm not sure I understand what you're saying here. Maybe easier to talk about an example:

%x = add %a, %b
%y = add nuw %x, %c

Let's say we call getSCEV(%y) in a position where flags could not be transferred. Prior to this change, this would call getAddExpr(S(%a), S(%b), S(%c)). After this change it will additionally call getAddExpr(S(%a), S(%b)) as part of the poison check, when evaluating the %x operand. This additional expression will go unused (beyond the poison check). Previously, this would only happen if the flags could actually be transferred.