This is an archive of the discontinued LLVM Phabricator instance.

[Profile[ Enhance expect lowering to handle correlated branches
ClosedPublic

Authored by davidxl on May 13 2017, 11:59 AM.

Details

Summary

There are situations buitlin_expect is used on a set of conditions (&&, ||) where there is only one expect builtin call inserted. There is also situations where the result of builtin_expect is not fed into any branches (e.g, a ternary expression). In the above cases, the lowering phase fails to annotate the all related branches.

This patch enhances the lowering phase by doing 'backward' analysis with Phidefs. For examples, see the test cases.

Diff Detail

Event Timeline

davidxl created this revision.May 13 2017, 11:59 AM
wmi edited edge metadata.May 14 2017, 5:37 PM

Overall looks good. Some minor comments inlined.

lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
112–119

BinaryOp should already be canonicalized here, so ConstantInt will be in Opnd1.

142

docminating --> dominating

159

wolk -> walk

162–164

isa<ConstantInt> (PhiOpnd) can be replaced by
ConstantInt *CI;
if (CI = dyn_cast<ConstantInt>(PhiOpnd))

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
26

Maybe add the whole br statement? Otherwise we may check a !prof annotated to a different branch other than we expect.

davidxl marked 4 inline comments as done.May 16 2017, 2:43 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
112–119

Done. added an assert.

davidxl updated this revision to Diff 99208.May 16 2017, 2:43 PM

Address Wei's review feebback.

chandlerc requested changes to this revision.May 16 2017, 3:06 PM
chandlerc added a subscriber: chandlerc.
chandlerc added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
86–93

Please use a doxygen comment, and format it consistently with modern documentation comments in LLVM.

94

LLVM always uses static void not void static.

94

Use a reference since this cannot be null.

95

ArgValue seems redundant, just Arg?

99–100

This comment isn't really verbose enough to help th ereader out. Can you use proper prose and expand on it so that folks not familiar with the specific pattern you're thinking of ('copy chain'?) understand what is going on?

102

We write infinite loops more often as 'for (;;)', but I'm not sure this really needs to use that pattern...

113

You can't assert that IR is in canonical form. You can only assert that it would pass the verifier.

But you don't need to *handle* non-canonical form, you can just fail to add the metadata.

115–117

If you just want to test for zero, you can do that directly without zext-ing the constant...

134–135

You could just use one variable above (V or Phi) and write the loop to essentially be:

// Walk up through the operands until we find a PHI node.
while (!isa<PHINode>(V)) {
  ...
}

auto *PN = dyn_cast<PHINode>(V);
if (!PN)
  // Nothing to do if we cannot find a PHI node.
  return;
147–150

If the CFG isn't simplified, we need to walk more than one. If it is, we don't need to walk any. Do you have cases where this was necessary?

157

This comment isn't really helping me. From the code, I know you're walking the PHI operands, but I don't know why or what you're planning to do with them.

176

I think this is definitely a place where explaining more would help. The comment as is doesn't really help me read the code.

184–185

Why not just call BR->setMetadata twice above? The variable doesn't seem to save a lot here...

337–340

This doesn't erase a PHI node though... Maybe this comment just needs updating?

chandlerc added inline comments.May 16 2017, 3:06 PM
test/Transforms/LowerExpectIntrinsic/phi_merge.ll
15–25

Can you minimize the test cases rather than including all of the boilerplate introduced by Clang?

Similarly, no use variadic function calls here now that it is an LLVM IR test case.

This revision now requires changes to proceed.May 16 2017, 3:06 PM
davidxl marked 9 inline comments as done.May 16 2017, 3:59 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
113

Removed.

115–117

This is not for constant but the value flow.

134–135

The loop is not greatly simplified and more readable.

147–150

The comment is stale. We need to handle diamond shaped control. See also phi_tern.ll for an example (where cfg is not simplified).

337–340

It is referring to the removal after this call.

davidxl updated this revision to Diff 99222.May 16 2017, 3:59 PM
davidxl edited edge metadata.

Address Chandler's feedbacks.

chandlerc added inline comments.May 16 2017, 5:08 PM
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
97

You don't check whether this is null... Either add a check or use cast?

100

nit: Some vertical whitespace before this comment block would make it easier to read IMO.

121–136

I think all of this can now be re-written using early return though, and I think that plus removing some unnecessary variables helps make the rest of the loop more readable:

BinaryOperator *BinOp = dyn_cast<BinaryOperator>(V)
if (!BinOp || BinOp->getOpcode() != Instruction::Xor)
  return;
  
ConstantInt *COperand = dyn_cast<ConstantInt>(BinOp->getOperand(1));
if (!COperand)
  return;

if (!COperand->isZero())
  ValueInverted = !ValueInverted;

V = BinOp->getOperand(0);

This also makes it more clear (at least to me) that there is a bug here. When the value isn't an i1 we can't assume that all non-zero xors negate. I think this will need to check that the constant operand is specifically either zero or all ones. Which ends up looking like:

ConstantInt *COperand = dyn_cast<ConstantInt>(BinOp->getOperand(1));
if (!COperand)
  return;

if (!COperand->isZero()) {
  // If this isn't a no-op copy, check that it is exactly a negation.
  if (!COperand->isMinusOne())
    return;

  ValueInverted = !ValueInverted;
}

V = BinOp->getOperand(0);
140

nit: I actually prefer the dyn_cast line right before the condition. Also, declare this here with auto *?

144–145

How do you want this to handle non-i1 expect calls?

149

Capturing by copy is pretty expensive. Given that this is called only from within this function, capturing by reference seems a bit more appropriate?

181–186

Reading this loop, I felt like this could end up setting the same metadata on a branch multiple times...

But reading this I feel like this isn't actually the case. But I'm also not exactly sure how this is working. I mean, I understand at a high level the cases you're trying to handle (I've read the test cases), but it really isn't clear how *this* code is effectively doing that.

Maybe an ascii-art diagram of the two cases you're trying to handle here would help? Maybe specifically talking about why we're looping over all of the phi's operands? (I had asked for the latter already...)

337–340

Ok, but you've written this comment right above code that erases the *call* and not any phi node, so it seems a bit confusing to write the comment this way. And erasing the phi node doesn't seem as relevant as erasing the call...

davidxl marked 5 inline comments as done.May 16 2017, 8:28 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
144–145

As other cases, arbitrary value builtin_expect is not yet properly handled. One possible use of it is to annotate switch statement, but it is pretty uncommon.

I added a TODO there

181–186

I used a lambda function and I think it is clearer now.

davidxl updated this revision to Diff 99234.May 16 2017, 8:29 PM

Address review feedback.

Any more comments?

chandlerc added inline comments.May 19 2017, 11:04 AM
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
97

This doesn't appear to be done.

100

nit: This is a blonk comment line rather than a blank line?

125

My suggestion was somewhat specifically to use early return here rather than break?

127

This comment isn't really helping. The code already is clearly handling XOR.

I would suggest something more like before you get the value as a binary operator saying:

// Lastly, we try to handle binary operators that either copy
// or invert the boolean value. These are canonicalized in LLVM
// to an `xor` instruction.

Once you have that, not sure you even need a comment here at all.

128–129

These variables don't seem to help anything. They're each only used once. Sink the initializer to the one usage below?

136–139

This looks like it is wrong when the xor operand is neither zero nor -1 (possibly due to being on a zero-extended value). We move V to the operand which might be the phi, and then break out.

As I had suggested, I find this easier to read when structured using early return and it also avoids the bug:

if (!CInt->isZero()) {
  if (!CInt->isMinusOne())
    return;

  ValueInverted = !ValueInverted;
}

That said, I would also personally only assign V to the operand after this block just to avoid confusion.

144–145

I don't think a TODO is really enough. We shouldn't put completely wrong probabilities on something just because we don't handle non-i1 expects. I'm fine if you just want to check at the start of this that the type is i1 and return otherwise.

154

nit: BI is more commonly used for a branch instruction than BR in LLVM.

166

An old comment that didn't get addressed but is on a different line now:
"""
This comment isn't really helping me. From the code, I know you're walking the PHI operands, but I don't know why or what you're planning to do with them.
"""

175–177

We typically use 'FIXME'. I'd put this comment up on the computing of IsPhiValueNonZero as that's when this became confusing for me.

I also don't think the IsUnlikely variable is useful here. I would just sink it into the if.

The comment Not an interesting case: doesn't really tell the reader much. I think it would be more helpful to explain why only matched operands and values are meaningful to turn into metadata.

187–192

Minor prose suggestions:

"Figuring out which" -> "Test whether a particular"
"that leads" -> "leads"

"triangular shaped" should either be just "triangular" or "triangle shaped". I mildly prefer the former.

Ending this with a ':' doesn't make much sense to me.

Overall, the second sentence here doesn't help me as much. I would maybe suggest putting the explanation of the "how does this work" attached to the code inside the lambda. I'll suggest things there.

199

Maybe comment:

// If this successor is the incoming block for this Phi operand, then
// this successor does lead to the Phi.

However, I feel like this needs to check that OpndIncomingBB->getUniquePredecessor() returns non-null. Otherwise this might not be an actual diamond.

201–202

Maybe comment:

// Otherwise, if the edge is directly from the branch to the Phi,
// this successor is the one feeding this Phi operand.
337–340

Outstanding comment.

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
13

CHECK-LABEL?

15–25

Can still remove the '(...)' from the call instructions...

39

I typically put checks after the IR they are matching against.

54

CHECK-LABEL?

64

It would be good to make this a more precise check by first checking for the branch, and then checking that the profile data isn't there:

; CHECK: br i1 %tmp13,
; CHECK-NOT: !prof
davidxl marked 10 inline comments as done.May 19 2017, 9:43 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
136–139

I made some flow changes to match previous patterns. It is more readable now.

166

Added more comments.

175–177

Added more explanation.

199

I added the comment. There is no need to check unique predecessor here though. The existance of other predecessors does not invalidate the fact that any incoming edge to OpndIncomingBB is unlikely.

davidxl updated this revision to Diff 99663.May 19 2017, 9:44 PM

Addressed review comments.

Any more comments?

chandlerc added inline comments.May 26 2017, 7:06 PM
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
86–87

nit: wording might be clearer as:

Handler for PHINodes that define the value argument to an @llvm.expect call.
89–90

nit: no need to repeate the summary here. I think you can jump to If the operand ....

98

If you're just going to assert, you should use cast rather than dyn_cast -- it does exactly this, it asserts, but it does so more cheaply in non-asserts builds.

144–145

I'm still not seeing any response to my follow-up.

198

nit: ':' -> '.'.

200–203

I think this API is overall a bit confusing because we call it with integer indices, but it captures a specific branch and phi instruction...

Maybe, rather than getting the incoming block inside the lambda, why not pull the computation of OpndIncomingBB out of the lambda entirely? It doesn't change between the calls below.

Similarly, I feel like it would be a bit more readable to pass in the successor basic block and call BI->getSuccessor() in the call as that avoids documenting the magical 1 and 0 indices.

202

OperandIdx and SuccIdx would be better parameter names here I think.

337

Again, I think 'Before erasing the phi' is confusing here. This code doesn't erase the phi node. I think this should say 'Before erasing the llvm.expect call, ...' which can then make it easier to refer to the first argument in the rest of the sentence.

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
15–25

Outstanding comment.

davidxl marked 10 inline comments as done.May 26 2017, 9:25 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
100

Removed the comment marker

144–145

I am not sure what your followup question is, but non-i1 case in the patch is handled correctly (but conservatively) -- basically no wrong probability will be emitted -- and it will early return as you expected (isUnlikely will be false). I enhanced the comment.

200–203

I modified the function.

337

Ouch, I meant llvm.expect.

davidxl updated this revision to Diff 100522.May 26 2017, 9:26 PM
davidxl marked an inline comment as done.

Address Chandler's feedbacks.

davidxl updated this revision to Diff 101089.Jun 1 2017, 1:05 PM

Added support to handle integer with any bit-width. New test cases are added to test int type other than i1.

some of minor nits a a small thing left... thanks for the handling and the nice test cases around larger integers!

lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
96

You'll need to use the APInt so this can handle larger than 64-bit integers (or you'll need to bail out when the integer requires more than 64-bits). The APInt code should be pretty straight forward here though.

(also, add a test case (regardless of which direction you go) with 73 bit integer or some such?)

116–117

MathExtras.h provides maskTrailingOnes which implements this. However, if you end up moving to APInt it has a pretty rich interface for doing these kinds of operations. You may be able to just call .zext on it to model the zext.

122–124

It's a bit surprising that these two statements are in the opposite order as above... But anyways, doesn't this need to be a sign extend of the expected value?

181–188

FWIW, this comment now really helps me with the whole thing. Thanks!

203

Hoist this above the lambda, and thin skip the lambda argument? It doesn't vary between calls.

204–205

Skip the variables and just inline the successor access?

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
15–25

Still outstanding...

120

Add a test (somewhere) for sext when the incoming value from the Phi node will get is -1? I'd probably use an i8 so its easy to write 255 as the value and sign extend it but expect that it is still 255. I think that test case would do the wrong thing currently? (see comment above, but maybe I've misread the code above.)

271–276

Note that in this test case and all the others you can strip off the attributes and the metadata here.

davidxl marked 8 inline comments as done.Jun 1 2017, 5:45 PM
davidxl added inline comments.
lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
96

Rewrote the code to use APInt. Also do a forward computation from the operand value using the recorded operations and compare with the expected value. This is more flexible which allows us to handle other operations in the future.

test/Transforms/LowerExpectIntrinsic/phi_merge.ll
120

Added two sext test case.

From i8 255 --> sext to i64 and compare with 255 -- does not match -- unlikely annotation emitted.
From i8 255 --> sext to i64 and compare with -1 --> match -- no annotation.

davidxl updated this revision to Diff 101150.Jun 1 2017, 5:46 PM
davidxl marked an inline comment as done.

Address review feebacks.

chandlerc accepted this revision.Jun 1 2017, 6:01 PM

Cool! I like the replay logic. It's a little alarming that we will replay arbitrary number of operations on an arbitrary number of phi inputs, but I'm *really* not worried here and the simplicity and clarity of the logic resulting from simple replay is great.

The remaining comments are just superficial stuff. If they make sense to you, feel free to go ahead and commit with those changes. If something isn't clear or needs more discussion, just lemme know.

lib/Transforms/Scalar/LowerExpectIntrinsic.cpp
98–101

Probably want to update this comment to explain that we're now recording the operations so we can replay them.

135

Heh, for once, you can omit the return type! ;] Maybe add a brief comment explaining this lambda replays the operations on an APInt?

Or could name it something more explanatory like ApplyOperations

137

You can use a range based for loop:

for (Instruction *Op : llvm::reverse(Operations)) {
141–142

Just use cast here, and likely skip the variable.

186–191

I would sink the comparison into the condition. The variable 'IsUnlikely' isn't as helpful as the comment here, and its only used for the if.

This revision is now accepted and ready to land.Jun 1 2017, 6:01 PM
davidxl marked 5 inline comments as done.Jun 1 2017, 6:51 PM

thanks for the review! I am pretty happy with the final version .

This revision was automatically updated to reflect the committed changes.