Page MenuHomePhabricator

Update BFI when handling inlined landing / eh pad
Needs ReviewPublic

Authored by hhb on May 25 2020, 2:33 AM.

Details

Summary

When handling inlined landing / eh pad, we may split a block. But we didn't copy BFI to the new block.

This leads to non-deterministic behavior.
In some cases, BlockFrequencyInfoImplBase::getBlockFreq can detect this unknown block and return 0.
In other rare cases, the BasicBlock* happens to be an address used before, an old frequency value will be returned.

This error can be easily detected if -check-bfi-unknown-block-queries is enabled.

https://bugs.llvm.org/show_bug.cgi?id=45789

Diff Detail

Event Timeline

hhb created this revision.May 25 2020, 2:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2020, 2:33 AM
kongyi added a subscriber: kongyi.May 25 2020, 6:24 AM
hhb updated this revision to Diff 266091.May 25 2020, 5:28 PM

Don't output block name when a node is invalid: it is almost certainly fail.

hhb updated this revision to Diff 266093.May 25 2020, 5:47 PM

Revert to last version

mtrofin added a subscriber: mtrofin.EditedMay 30 2020, 8:42 PM

Shouldn't this happen as part of BasicBlock::splitBasicBlock? Or probably BasicBlockUtils.cpp (llvm::SplitBlock).

There should also probably be a call to BlockFrequencyInfoImpl::forgetBlock that we're missing somewhere, that triggers this in the first place.

This is a good find, and I think we should identify the places that introduce fragility (i.e. deletions of BBs without telling BFI to forget - there seems to have been a design about this, BFICallbackVH, can't seem to find uses of it though)

hhb added a comment.Jun 1 2020, 1:33 PM

Shouldn't this happen as part of BasicBlock::splitBasicBlock? Or probably BasicBlockUtils.cpp (llvm::SplitBlock).

There should also probably be a call to BlockFrequencyInfoImpl::forgetBlock that we're missing somewhere, that triggers this in the first place.

This is a good find, and I think we should identify the places that introduce fragility (i.e. deletions of BBs without telling BFI to forget - there seems to have been a design about this, BFICallbackVH, can't seem to find uses of it though)

I do want to put this into splitBasicBlock and other related functions. But it seems too intrusive. Consider the situation that all analysis passes adding a parameter to splitBasicBlock...

There are probably many places where forgetBlock is missing. Given that forgetBlock is not even exposed through BlockFrequencyInfo.

But yes this fix is not reliable. All places where a new block is added, need to also update BFI. There's no way to guarantee that.

A couple of ways to fix:

  1. Remove BFI "defaults to 0" behavior. This way if a new block is not added to BFI, compiler will likely crash and we will notice. But many other places depends on default 0 and difficult to change.
  2. Make sure removed blocks are also removed from BFI. We may need some event mechanism. I'll check BFICallbackVH.
  3. Use some other identifier that will not be reused, instead of pointer. It will be a huge change.
  4. Add a unit test and enable -check-bfi-unknown-block-queries for it. This is probably the only viable way to go...
In D80510#2067235, @hhb wrote:

Shouldn't this happen as part of BasicBlock::splitBasicBlock? Or probably BasicBlockUtils.cpp (llvm::SplitBlock).

There should also probably be a call to BlockFrequencyInfoImpl::forgetBlock that we're missing somewhere, that triggers this in the first place.

This is a good find, and I think we should identify the places that introduce fragility (i.e. deletions of BBs without telling BFI to forget - there seems to have been a design about this, BFICallbackVH, can't seem to find uses of it though)

I do want to put this into splitBasicBlock and other related functions. But it seems too intrusive. Consider the situation that all analysis passes adding a parameter to splitBasicBlock...

There are probably many places where forgetBlock is missing. Given that forgetBlock is not even exposed through BlockFrequencyInfo.

But yes this fix is not reliable. All places where a new block is added, need to also update BFI. There's no way to guarantee that.

A couple of ways to fix:

  1. Remove BFI "defaults to 0" behavior. This way if a new block is not added to BFI, compiler will likely crash and we will notice. But many other places depends on default 0 and difficult to change.
  2. Make sure removed blocks are also removed from BFI. We may need some event mechanism. I'll check BFICallbackVH.
  3. Use some other identifier that will not be reused, instead of pointer. It will be a huge change.
  4. Add a unit test and enable -check-bfi-unknown-block-queries for it. This is probably the only viable way to go...

I believe point 2 is critically important, and points 1 or 4 are "nice to have": we need to remove the source of non-determinism due to BB pointers being used as keys, and not removed. If a map has a key value "p1" from a BB1, and BB1 is deleted, it is possible, and non-deterministic, that a BB2 be allocated at p, and this would appear as a known BB. -check-bfi-unknown-block-queries would only help with the resulting BB from a split not having BFI data.

hhb added a comment.Jun 1 2020, 2:46 PM

D75341 and comments there seem highly related.

hhb added a reviewer: mtrofin.Jun 1 2020, 2:47 PM
hhb edited the summary of this revision. (Show Details)Jun 1 2020, 2:50 PM
hhb added a comment.Jun 1 2020, 2:56 PM
In D80510#2067446, @hhb wrote:

D75341 and comments there seem highly related.

Actually that change supposes to fix this kind of non-deterministic. Not sure why it does not work as intended.

We may still want to copy BFI when split BBs. But other concerns should have been solved by CallbackVH...

In D80510#2067484, @hhb wrote:
In D80510#2067446, @hhb wrote:

D75341 and comments there seem highly related.

Actually that change supposes to fix this kind of non-deterministic. Not sure why it does not work as intended.

We may still want to copy BFI when split BBs. But other concerns should have been solved by CallbackVH...

Huh.. added davidxl for more background, too, he remembered something along the lines of this non-determinism being addressed somehow (probably via the CallbackVH)

hhb added a comment.Jun 1 2020, 3:55 PM
In D80510#2067484, @hhb wrote:
In D80510#2067446, @hhb wrote:

D75341 and comments there seem highly related.

Actually that change supposes to fix this kind of non-deterministic. Not sure why it does not work as intended.

We may still want to copy BFI when split BBs. But other concerns should have been solved by CallbackVH...

Huh.. added davidxl for more background, too, he remembered something along the lines of this non-determinism being addressed somehow (probably via the CallbackVH)

Aha it is simply because BFICallbackVH is new enough to not include in Android tree (that's what I test against). I will try again with latest llvm.

the related fix is https://reviews.llvm.org/D75341 which will turn non-determinstic behavior into 'returning default unknown' which helps -check-bfi-unknown-block-queries to find the issue.

hhb added a comment.Jun 1 2020, 5:17 PM
In D80510#2067617, @hhb wrote:
In D80510#2067484, @hhb wrote:
In D80510#2067446, @hhb wrote:

D75341 and comments there seem highly related.

Actually that change supposes to fix this kind of non-deterministic. Not sure why it does not work as intended.

We may still want to copy BFI when split BBs. But other concerns should have been solved by CallbackVH...

Huh.. added davidxl for more background, too, he remembered something along the lines of this non-determinism being addressed somehow (probably via the CallbackVH)

Aha it is simply because BFICallbackVH is new enough to not include in Android tree (that's what I test against). I will try again with latest llvm.

D75341 did fix the issue. Then this change is kind of optional. I'll close the bug.

The direction of the patch is still valid. The patch I mentioned only fixes the non-deterministicness of the problem -- the resulting profile data is still fully updated -- though for landing pad, the default frequency seems ok.