This is an archive of the discontinued LLVM Phabricator instance.

[Profile[ Enhance expect lowering to handle correlated branches
ClosedPublic

Authored by davidxl on May 13 2017, 11:59 AM.

Details

Summary

There are situations buitlin_expect is used on a set of conditions (&&, ||) where there is only one expect builtin call inserted. There is also situations where the result of builtin_expect is not fed into any branches (e.g, a ternary expression). In the above cases, the lowering phase fails to annotate the all related branches.

This patch enhances the lowering phase by doing 'backward' analysis with Phidefs. For examples, see the test cases.

Diff Detail

Repository
rL LLVM

Event Timeline

davidxl created this revision.May 13 2017, 11:59 AM
wmi edited edge metadata.May 14 2017, 5:37 PM

Overall looks good. Some minor comments inlined.

lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
112–119 ↗(On Diff #98899)

BinaryOp should already be canonicalized here, so ConstantInt will be in Opnd1.

142 ↗(On Diff #98899)

docminating --> dominating

159 ↗(On Diff #98899)

wolk -> walk

162–164 ↗(On Diff #98899)

isa<ConstantInt> (PhiOpnd) can be replaced by
ConstantInt *CI;
if (CI = dyn_cast<ConstantInt>(PhiOpnd))

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
25 ↗(On Diff #98899)

Maybe add the whole br statement? Otherwise we may check a !prof annotated to a different branch other than we expect.

davidxl marked 4 inline comments as done.May 16 2017, 2:43 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
112–119 ↗(On Diff #98899)

Done. added an assert.

davidxl updated this revision to Diff 99208.May 16 2017, 2:43 PM

Address Wei's review feebback.

chandlerc requested changes to this revision.May 16 2017, 3:06 PM
chandlerc added a subscriber: chandlerc.
chandlerc added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
86–93 ↗(On Diff #99208)

Please use a doxygen comment, and format it consistently with modern documentation comments in LLVM.

94 ↗(On Diff #99208)

LLVM always uses static void not void static.

94 ↗(On Diff #99208)

Use a reference since this cannot be null.

95 ↗(On Diff #99208)

ArgValue seems redundant, just Arg?

99–100 ↗(On Diff #99208)

This comment isn't really verbose enough to help th ereader out. Can you use proper prose and expand on it so that folks not familiar with the specific pattern you're thinking of ('copy chain'?) understand what is going on?

102 ↗(On Diff #99208)

We write infinite loops more often as 'for (;;)', but I'm not sure this really needs to use that pattern...

113 ↗(On Diff #99208)

You can't assert that IR is in canonical form. You can only assert that it would pass the verifier.

But you don't need to *handle* non-canonical form, you can just fail to add the metadata.

115–117 ↗(On Diff #99208)

If you just want to test for zero, you can do that directly without zext-ing the constant...

134–135 ↗(On Diff #99208)

You could just use one variable above (V or Phi) and write the loop to essentially be:

// Walk up through the operands until we find a PHI node.
while (!isa<PHINode>(V)) {
  ...
}

auto *PN = dyn_cast<PHINode>(V);
if (!PN)
  // Nothing to do if we cannot find a PHI node.
  return;
147–150 ↗(On Diff #99208)

If the CFG isn't simplified, we need to walk more than one. If it is, we don't need to walk any. Do you have cases where this was necessary?

157 ↗(On Diff #99208)

This comment isn't really helping me. From the code, I know you're walking the PHI operands, but I don't know why or what you're planning to do with them.

176 ↗(On Diff #99208)

I think this is definitely a place where explaining more would help. The comment as is doesn't really help me read the code.

184–185 ↗(On Diff #99208)

Why not just call BR->setMetadata twice above? The variable doesn't seem to save a lot here...

279–282 ↗(On Diff #99208)

This doesn't erase a PHI node though... Maybe this comment just needs updating?

chandlerc added inline comments.May 16 2017, 3:06 PM
test/Transforms/LowerExpectIntrinsic/phi_merge.ll
14–24 ↗(On Diff #99208)

Can you minimize the test cases rather than including all of the boilerplate introduced by Clang?

Similarly, no use variadic function calls here now that it is an LLVM IR test case.

This revision now requires changes to proceed.May 16 2017, 3:06 PM
davidxl marked 9 inline comments as done.May 16 2017, 3:59 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
113 ↗(On Diff #99208)

Removed.

115–117 ↗(On Diff #99208)

This is not for constant but the value flow.

134–135 ↗(On Diff #99208)

The loop is not greatly simplified and more readable.

147–150 ↗(On Diff #99208)

The comment is stale. We need to handle diamond shaped control. See also phi_tern.ll for an example (where cfg is not simplified).

279–282 ↗(On Diff #99208)

It is referring to the removal after this call.

davidxl updated this revision to Diff 99222.May 16 2017, 3:59 PM
davidxl edited edge metadata.

Address Chandler's feedbacks.

chandlerc added inline comments.May 16 2017, 5:08 PM
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
97 ↗(On Diff #99222)

You don't check whether this is null... Either add a check or use cast?

100 ↗(On Diff #99222)

nit: Some vertical whitespace before this comment block would make it easier to read IMO.

121–136 ↗(On Diff #99222)

I think all of this can now be re-written using early return though, and I think that plus removing some unnecessary variables helps make the rest of the loop more readable:

BinaryOperator *BinOp = dyn_cast<BinaryOperator>(V)
if (!BinOp || BinOp->getOpcode() != Instruction::Xor)
  return;
  
ConstantInt *COperand = dyn_cast<ConstantInt>(BinOp->getOperand(1));
if (!COperand)
  return;

if (!COperand->isZero())
  ValueInverted = !ValueInverted;

V = BinOp->getOperand(0);

This also makes it more clear (at least to me) that there is a bug here. When the value isn't an i1 we can't assume that all non-zero xors negate. I think this will need to check that the constant operand is specifically either zero or all ones. Which ends up looking like:

ConstantInt *COperand = dyn_cast<ConstantInt>(BinOp->getOperand(1));
if (!COperand)
  return;

if (!COperand->isZero()) {
  // If this isn't a no-op copy, check that it is exactly a negation.
  if (!COperand->isMinusOne())
    return;

  ValueInverted = !ValueInverted;
}

V = BinOp->getOperand(0);
140 ↗(On Diff #99222)

nit: I actually prefer the dyn_cast line right before the condition. Also, declare this here with auto *?

144–145 ↗(On Diff #99222)

How do you want this to handle non-i1 expect calls?

149 ↗(On Diff #99222)

Capturing by copy is pretty expensive. Given that this is called only from within this function, capturing by reference seems a bit more appropriate?

181–186 ↗(On Diff #99222)

Reading this loop, I felt like this could end up setting the same metadata on a branch multiple times...

But reading this I feel like this isn't actually the case. But I'm also not exactly sure how this is working. I mean, I understand at a high level the cases you're trying to handle (I've read the test cases), but it really isn't clear how *this* code is effectively doing that.

Maybe an ascii-art diagram of the two cases you're trying to handle here would help? Maybe specifically talking about why we're looping over all of the phi's operands? (I had asked for the latter already...)

279–282 ↗(On Diff #99208)

Ok, but you've written this comment right above code that erases the *call* and not any phi node, so it seems a bit confusing to write the comment this way. And erasing the phi node doesn't seem as relevant as erasing the call...

davidxl marked 5 inline comments as done.May 16 2017, 8:28 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
144–145 ↗(On Diff #99222)

As other cases, arbitrary value builtin_expect is not yet properly handled. One possible use of it is to annotate switch statement, but it is pretty uncommon.

I added a TODO there

181–186 ↗(On Diff #99222)

I used a lambda function and I think it is clearer now.

davidxl updated this revision to Diff 99234.May 16 2017, 8:29 PM

Address review feedback.

Any more comments?

chandlerc added inline comments.May 19 2017, 11:04 AM
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
125 ↗(On Diff #99234)

My suggestion was somewhat specifically to use early return here rather than break?

127 ↗(On Diff #99234)

This comment isn't really helping. The code already is clearly handling XOR.

I would suggest something more like before you get the value as a binary operator saying:

// Lastly, we try to handle binary operators that either copy
// or invert the boolean value. These are canonicalized in LLVM
// to an `xor` instruction.

Once you have that, not sure you even need a comment here at all.

128–129 ↗(On Diff #99234)

These variables don't seem to help anything. They're each only used once. Sink the initializer to the one usage below?

136–139 ↗(On Diff #99234)

This looks like it is wrong when the xor operand is neither zero nor -1 (possibly due to being on a zero-extended value). We move V to the operand which might be the phi, and then break out.

As I had suggested, I find this easier to read when structured using early return and it also avoids the bug:

if (!CInt->isZero()) {
  if (!CInt->isMinusOne())
    return;

  ValueInverted = !ValueInverted;
}

That said, I would also personally only assign V to the operand after this block just to avoid confusion.

154 ↗(On Diff #99234)

nit: BI is more commonly used for a branch instruction than BR in LLVM.

166 ↗(On Diff #99234)

An old comment that didn't get addressed but is on a different line now:
"""
This comment isn't really helping me. From the code, I know you're walking the PHI operands, but I don't know why or what you're planning to do with them.
"""

175–177 ↗(On Diff #99234)

We typically use 'FIXME'. I'd put this comment up on the computing of IsPhiValueNonZero as that's when this became confusing for me.

I also don't think the IsUnlikely variable is useful here. I would just sink it into the if.

The comment Not an interesting case: doesn't really tell the reader much. I think it would be more helpful to explain why only matched operands and values are meaningful to turn into metadata.

187–192 ↗(On Diff #99234)

Minor prose suggestions:

"Figuring out which" -> "Test whether a particular"
"that leads" -> "leads"

"triangular shaped" should either be just "triangular" or "triangle shaped". I mildly prefer the former.

Ending this with a ':' doesn't make much sense to me.

Overall, the second sentence here doesn't help me as much. I would maybe suggest putting the explanation of the "how does this work" attached to the code inside the lambda. I'll suggest things there.

199 ↗(On Diff #99234)

Maybe comment:

// If this successor is the incoming block for this Phi operand, then
// this successor does lead to the Phi.

However, I feel like this needs to check that OpndIncomingBB->getUniquePredecessor() returns non-null. Otherwise this might not be an actual diamond.

201–202 ↗(On Diff #99234)

Maybe comment:

// Otherwise, if the edge is directly from the branch to the Phi,
// this successor is the one feeding this Phi operand.
97 ↗(On Diff #99222)

This doesn't appear to be done.

100 ↗(On Diff #99222)

nit: This is a blonk comment line rather than a blank line?

144–145 ↗(On Diff #99222)

I don't think a TODO is really enough. We shouldn't put completely wrong probabilities on something just because we don't handle non-i1 expects. I'm fine if you just want to check at the start of this that the type is i1 and return otherwise.

279–282 ↗(On Diff #99208)

Outstanding comment.

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
12 ↗(On Diff #99234)

CHECK-LABEL?

38 ↗(On Diff #99234)

I typically put checks after the IR they are matching against.

53 ↗(On Diff #99234)

CHECK-LABEL?

63 ↗(On Diff #99234)

It would be good to make this a more precise check by first checking for the branch, and then checking that the profile data isn't there:

; CHECK: br i1 %tmp13,
; CHECK-NOT: !prof
14–24 ↗(On Diff #99208)

Can still remove the '(...)' from the call instructions...

davidxl marked 10 inline comments as done.May 19 2017, 9:43 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
136–139 ↗(On Diff #99234)

I made some flow changes to match previous patterns. It is more readable now.

166 ↗(On Diff #99234)

Added more comments.

175–177 ↗(On Diff #99234)

Added more explanation.

199 ↗(On Diff #99234)

I added the comment. There is no need to check unique predecessor here though. The existance of other predecessors does not invalidate the fact that any incoming edge to OpndIncomingBB is unlikely.

davidxl updated this revision to Diff 99663.May 19 2017, 9:44 PM

Addressed review comments.

Any more comments?

chandlerc added inline comments.May 26 2017, 7:06 PM
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
86–87 ↗(On Diff #99663)

nit: wording might be clearer as:

Handler for PHINodes that define the value argument to an @llvm.expect call.
89–90 ↗(On Diff #99663)

nit: no need to repeate the summary here. I think you can jump to If the operand ....

98 ↗(On Diff #99663)

If you're just going to assert, you should use cast rather than dyn_cast -- it does exactly this, it asserts, but it does so more cheaply in non-asserts builds.

198 ↗(On Diff #99663)

nit: ':' -> '.'.

200–203 ↗(On Diff #99663)

I think this API is overall a bit confusing because we call it with integer indices, but it captures a specific branch and phi instruction...

Maybe, rather than getting the incoming block inside the lambda, why not pull the computation of OpndIncomingBB out of the lambda entirely? It doesn't change between the calls below.

Similarly, I feel like it would be a bit more readable to pass in the successor basic block and call BI->getSuccessor() in the call as that avoids documenting the magical 1 and 0 indices.

202 ↗(On Diff #99663)

OperandIdx and SuccIdx would be better parameter names here I think.

318 ↗(On Diff #99663)

Again, I think 'Before erasing the phi' is confusing here. This code doesn't erase the phi node. I think this should say 'Before erasing the llvm.expect call, ...' which can then make it easier to refer to the first argument in the rest of the sentence.

144–145 ↗(On Diff #99222)

I'm still not seeing any response to my follow-up.

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
14–24 ↗(On Diff #99208)

Outstanding comment.

davidxl marked 10 inline comments as done.May 26 2017, 9:25 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
200–203 ↗(On Diff #99663)

I modified the function.

318 ↗(On Diff #99663)

Ouch, I meant llvm.expect.

100 ↗(On Diff #99222)

Removed the comment marker

144–145 ↗(On Diff #99222)

I am not sure what your followup question is, but non-i1 case in the patch is handled correctly (but conservatively) -- basically no wrong probability will be emitted -- and it will early return as you expected (isUnlikely will be false). I enhanced the comment.

davidxl updated this revision to Diff 100522.May 26 2017, 9:26 PM
davidxl marked an inline comment as done.

Address Chandler's feedbacks.

davidxl updated this revision to Diff 101089.Jun 1 2017, 1:05 PM

Added support to handle integer with any bit-width. New test cases are added to test int type other than i1.

some of minor nits a a small thing left... thanks for the handling and the nice test cases around larger integers!

lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
96 ↗(On Diff #101089)

You'll need to use the APInt so this can handle larger than 64-bit integers (or you'll need to bail out when the integer requires more than 64-bits). The APInt code should be pretty straight forward here though.

(also, add a test case (regardless of which direction you go) with 73 bit integer or some such?)

116–117 ↗(On Diff #101089)

MathExtras.h provides maskTrailingOnes which implements this. However, if you end up moving to APInt it has a pretty rich interface for doing these kinds of operations. You may be able to just call .zext on it to model the zext.

122–124 ↗(On Diff #101089)

It's a bit surprising that these two statements are in the opposite order as above... But anyways, doesn't this need to be a sign extend of the expected value?

181–188 ↗(On Diff #101089)

FWIW, this comment now really helps me with the whole thing. Thanks!

203 ↗(On Diff #101089)

Hoist this above the lambda, and thin skip the lambda argument? It doesn't vary between calls.

204–205 ↗(On Diff #101089)

Skip the variables and just inline the successor access?

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
119 ↗(On Diff #101089)

Add a test (somewhere) for sext when the incoming value from the Phi node will get is -1? I'd probably use an i8 so its easy to write 255 as the value and sign extend it but expect that it is still 255. I think that test case would do the wrong thing currently? (see comment above, but maybe I've misread the code above.)

270–275 ↗(On Diff #101089)

Note that in this test case and all the others you can strip off the attributes and the metadata here.

14–24 ↗(On Diff #99208)

Still outstanding...

davidxl marked 8 inline comments as done.Jun 1 2017, 5:45 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
96 ↗(On Diff #101089)

Rewrote the code to use APInt. Also do a forward computation from the operand value using the recorded operations and compare with the expected value. This is more flexible which allows us to handle other operations in the future.

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
119 ↗(On Diff #101089)

Added two sext test case.

From i8 255 --> sext to i64 and compare with 255 -- does not match -- unlikely annotation emitted.
From i8 255 --> sext to i64 and compare with -1 --> match -- no annotation.

davidxl updated this revision to Diff 101150.Jun 1 2017, 5:46 PM
davidxl marked an inline comment as done.

Address review feebacks.

chandlerc accepted this revision.Jun 1 2017, 6:01 PM

Cool! I like the replay logic. It's a little alarming that we will replay arbitrary number of operations on an arbitrary number of phi inputs, but I'm *really* not worried here and the simplicity and clarity of the logic resulting from simple replay is great.

The remaining comments are just superficial stuff. If they make sense to you, feel free to go ahead and commit with those changes. If something isn't clear or needs more discussion, just lemme know.

lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
98–101 ↗(On Diff #101150)

Probably want to update this comment to explain that we're now recording the operations so we can replay them.

135 ↗(On Diff #101150)

Heh, for once, you can omit the return type! ;] Maybe add a brief comment explaining this lambda replays the operations on an APInt?

Or could name it something more explanatory like ApplyOperations

137 ↗(On Diff #101150)

You can use a range based for loop:

for (Instruction *Op : llvm::reverse(Operations)) {
141–142 ↗(On Diff #101150)

Just use cast here, and likely skip the variable.

186–191 ↗(On Diff #101150)

I would sink the comparison into the condition. The variable 'IsUnlikely' isn't as helpful as the comment here, and its only used for the if.

This revision is now accepted and ready to land.Jun 1 2017, 6:01 PM
davidxl marked 5 inline comments as done.Jun 1 2017, 6:51 PM

thanks for the review! I am pretty happy with the final version .

This revision was automatically updated to reflect the committed changes.