This is an archive of the discontinued LLVM Phabricator instance.

[PPC] Adjust the computed branch offset for the possible shorter distance
ClosedPublic

Authored by Carrot on Feb 4 2019, 3:30 PM.

Details

Summary

In file PPCBranchSelector.cpp we tend to over estimate code size due to large alignment and
inline assembly. Usually it causes larger computed branch offset, it is not big problem.
But sometimes it may also causes smaller computed branch offset
than actual branch offset. If the offset is close to the limit of
encoding, it may cause problem at run time.
Following is a simplified example.

           actual        estimated
           address        address
 ...
bne Far      100            10c
.p2align 4
Near:        110            110
 ...
Far:        8108           8108

Actual offset:    0x8108 - 0x100 = 0x8008
Computed offset:  0x8108 - 0x10c = 0x7ffc

The computed offset is at most ((1 << alignment) - 4) bytes smaller
than actual offset. So we add this number to the offset for safety.

Diff Detail

Repository
rL LLVM

Event Timeline

Carrot created this revision.Feb 4 2019, 3:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2019, 3:30 PM
jsji added a comment.Feb 11 2019, 12:00 PM

Any testcases that can show the problem and test the fix? Thanks.

Any testcases that can show the problem and test the fix? Thanks.

It can only be triggered by a very large (>32KB) function body because the range of conditional branch is +/- 32KB.

In our case, the large function body is caused by aggressive thinlto guided inlining.

So no small test case can demonstrate the problem :(

jsji added a comment.EditedFeb 12 2019, 8:40 AM

I think we should be able to come up with a smaller test case that demonstrate the problem here?
No necessary causing run time problem due to wrong branch,
but causing smaller calculated branch offset, and we can check that by adding a few dbgs() to check calculated branch offset in code?

Also , looks like to me that we don't need to get max alignment here?

lib/Target/PowerPC/PPCBranchSelector.cpp
214 ↗(On Diff #185174)

Looks like to me that the root cause here is that

  1. we may have conservative (larger) MBBStartOffset due to inline asm or alignment, (eg: 100 -> 10c)
  2. we are subtracting MBBStartOffset here (eg: 0x8108 - 0x10c)

So instead of getting larger BranchSize, subtracting may lead us to getting smaller BranchSize.

For other blocks, even if we may still have conservative (larger) size, it should be OK, as we are adding them towards BranchSize.

So, shouldn't we just need to adjust the value of MBBStartOffset , checking inline asm & alignment when calculating it in previous loop?

Carrot added a comment.EditedFeb 12 2019, 2:44 PM

I think we should be able to come up with a smaller test case that demonstrate the problem here?
No necessary causing run time problem due to wrong branch,
but causing smaller calculated branch offset, and we can check that by adding a few dbgs() to check calculated branch offset in code?

Because of the code at line 137

if (FuncSize < (1 << 15)) {
  BlockSizes.clear();
  return false;
}

Even if you just want to trigger the patched code, the smaller test case still need to be at least 32KB.

Carrot marked an inline comment as done.Feb 12 2019, 3:08 PM
Carrot added inline comments.
lib/Target/PowerPC/PPCBranchSelector.cpp
214 ↗(On Diff #185174)

The root cause in your understanding is correct!

The inline asm size is handled by

MBBStartOffset += TII->getInstSizeInBytes(*I);

The BB alignment is already handled when computing BlockSizes.

Here the problem is the computed MBBStartOffset can't be precise, and we can't tell what's the difference between its value and the actual address. For the BranchSize, if it is larger than actual offset, it is safe for us. If it is smaller than actual branch offset, the max delta is

(1 << MaxAlign) - 4

Even if you just want to trigger the patched code, the smaller test case still need to be at least 16KB.

On ARM and AArch64, we have special intrinsics int_aarch64_space/int_arm_space which are specifically intended to eat space, to allow constructing small testcases for branch relaxation etc. Maybe worth adapting to PPC if you expect to be writing multiple testcases like that. Actually, you probably don't need an IR intrinsic now that you can write MIR testcases; an MIR instruction should be sufficient.

jsji added inline comments.Feb 12 2019, 3:47 PM
lib/Target/PowerPC/PPCBranchSelector.cpp
214 ↗(On Diff #185174)

Thanks.
Can we tell whether MBBStartOffset is precise or not by checking whether we meet inline asm & alignment?
And only apply the delta when we are sure they are not precise?

Also is it possible that MBBStartOffset is smaller than (1 << MaxAlign) -4 ?

And can we add some comments about why the max delta is
(1 << MaxAlign) -4 ?

Any testcases that can show the problem and test the fix? Thanks.

It can only be triggered by a very large (>32KB) function body because the range of conditional branch is +/- 32KB.

In our case, the large function body is caused by aggressive thinlto guided inlining.

So no small test case can demonstrate the problem :(

You can use the assembly directive '.space' to create arbitrary sized basic blocks. See test/CodeGen/RISCV/branch-relaxation.ll for examples.

Any testcases that can show the problem and test the fix? Thanks.

It can only be triggered by a very large (>32KB) function body because the range of conditional branch is +/- 32KB.

In our case, the large function body is caused by aggressive thinlto guided inlining.

So no small test case can demonstrate the problem :(

You can use the assembly directive '.space' to create arbitrary sized basic blocks. See test/CodeGen/RISCV/branch-relaxation.ll for examples.

Looks very useful. Will try it. Thanks a lot!

Carrot updated this revision to Diff 186767.Feb 13 2019, 4:11 PM
Carrot marked an inline comment as done.

Add a test case.

Improve the code to adjust branch offset only when we have imprecise code size.

jsji added a comment.Feb 14 2019, 3:16 PM

Thank you for providing the testcase, I think it corrects my understanding.

lib/Target/PowerPC/PPCBranchSelector.cpp
214 ↗(On Diff #185174)

Actually, after seeing your new comments and the testcase. I think my initial understanding was wrong.
The root cause of the estimate gap is not due to that we are subtracting MBBStartOffset .

The root cause looks like to me now:

  1. We may have conservative (larger) BlockSizes due to inline asm . eg: In your example, we add 4 extra bytes due to label in inline asm
  2. We may get smaller than actual alignment adjustment due to that inaccurate Offset. eg: In your example, we get offset 0, as we though the offset will be 32, however, it is actually 28, so , we should get an adjustment of 12 here! So that is why we are actually -12 to real value here!

So looks like to me, after an inaccurate inline asm BB, all the alignment adjustment of a BB may get a smaller than actual value (negative!). , and since we are adding all these toward BranchSize, I think this problem will not only happen in forward branch, but also in backwards branch.:

So maybe we should fix this in GetAlignmentAdjustment to let is be conservative when a BB contains inline asm?

jsji added inline comments.Feb 14 2019, 3:30 PM
test/CodeGen/PowerPC/branch_selector.ll
1 ↗(On Diff #186767)

Why we require -mcpu=pwr8 here? This should apply to all sub-target?

2 ↗(On Diff #186767)

We should also test by changing default alignment here,
eg: -align-all-functions=3, which will make it pass without fix!

Carrot updated this revision to Diff 187060.Feb 15 2019, 11:58 AM
Carrot marked 3 inline comments as done.

Also handle the backward branch.

Carrot added inline comments.Feb 15 2019, 1:37 PM
lib/Target/PowerPC/PPCBranchSelector.cpp
214 ↗(On Diff #185174)

So looks like to me, after an inaccurate inline asm BB, all the alignment adjustment of a BB may get a smaller than actual value (negative!). , and since we are adding all these toward BranchSize, I think this problem will not only happen in forward branch, but also in backwards branch.:

Not all estimated alignment adjustment is smaller than actual alignment adjustment. Suppose the estimated bne address is 100, the actual address 9c, the accumulated function size to the end of this block is 104 for estimated case and 100 for actual value. So the following 16 bytes alignment will cause 12 padding bytes for estimated case and no padding for actual code. But this situation is not interesting for us here.
The shorter estimated branch offset can indeed occur in backward branch, so I modified the patch.

So maybe we should fix this in GetAlignmentAdjustment to let is be conservative when a BB contains inline asm?

I don't know how to fix it in GetAlignmentAdjustment. For whatever extra alignment adjustment, you will add the value to branch instruction address and target address in following blocks, but you can't change the alignment padding, and you can't change the estimated branch offset.

test/CodeGen/PowerPC/branch_selector.ll
1 ↗(On Diff #186767)

Without -mcpu=pwr8 llvm doesn't generate aligned loop body. I learned it from test/CodeGen/PowerPC/code-align.ll.

2 ↗(On Diff #186767)

In this case, the imprecise address is caused by inline assembly, the function alignment doesn't matter.

Thanks for updating!

lib/Target/PowerPC/PPCBranchSelector.cpp
214 ↗(On Diff #185174)

Not all estimated alignment adjustment is smaller than actual alignment adjustment.

Yes, agree, not all adjustment will be smaller, that is why I use *may* in my last comment. :)

may get a smaller than actual value(negative!).

I don't know how to fix it in GetAlignmentAdjustment. For whatever extra alignment adjustment, you will add the value to branch instruction address and target address in following blocks, but you can't change the alignment padding, and you can't change the estimated branch offset.

My first thought was that we return adjustment of (1 << Align) -4 when we know that the Offset is imprecise, alwasy add largest possible padding instead calculating values from Offset ?

But yes looks like this is too conservative.

test/CodeGen/PowerPC/branch_selector.ll
1 ↗(On Diff #186767)

OK. Thanks, so it applys to almost all targets, except the generic ones like ppc64.

Carrot marked an inline comment as done.Feb 15 2019, 7:09 PM
Carrot added inline comments.
lib/Target/PowerPC/PPCBranchSelector.cpp
214 ↗(On Diff #185174)

My first thought was that we return adjustment of (1 << Align) -4 when we know that the Offset is imprecise, alwasy add largest possible padding instead calculating values from Offset ?

If you do this in GetAlignmentAdjustment, it will defeat the objective of GetAlignmentAdjustment, causes following block unaligned, and mess up all following address computation.

I can't help but feel like this patch adds further complication to an already excessively complicated function. To put things in perspective - this file is currently 282 lines and 210 of those lines are in a single function. Frankly, I think refactoring this into a few conceptual sections implemented in separate functions would go a long way.

Seems to me that conceptually this pass does the following:

  • Renumber blocks
  • Compute the size of each block
  • Identify PC-relative branches and compute the branch distance
  • Convert PC-relative branches that branch too far into the "long branch sequence" (i.e. invert branch condition, convert to bc+8, b)

I understand if you don't want to do this refactoring and want to proceed with this patch as-is to solve the immediate problem. However, I thought I would bring this up since it seems like refactoring this would go a long way to making this readable.

I can't help but feel like this patch adds further complication to an already excessively complicated function. To put things in perspective - this file is currently 282 lines and 210 of those lines are in a single function. Frankly, I think refactoring this into a few conceptual sections implemented in separate functions would go a long way.

Seems to me that conceptually this pass does the following:

  • Renumber blocks
  • Compute the size of each block
  • Identify PC-relative branches and compute the branch distance
  • Convert PC-relative branches that branch too far into the "long branch sequence" (i.e. invert branch condition, convert to bc+8, b)

I understand if you don't want to do this refactoring and want to proceed with this patch as-is to solve the immediate problem. However, I thought I would bring this up since it seems like refactoring this would go a long way to making this readable.

I have similar feeling when I read this file at the first time. I will do the refactoring after commit this patch. I prefer to separate bug fixing and refactoring.

I think I misunderstood the original problem this is solving. What I thought it was solving is any inline asm or alignment directives that appear prior to the branch or its target (whichever comes last in the function). However, after looking through the code, it appears that the problem is only if the inline asm or alignment directive appears prior to the branch or its target (whichever comes first in the function).
It would probably be good to explain why it isn't a problem if the block that throws off the computation is between the branch and its target.

lib/Target/PowerPC/PPCBranchSelector.cpp
80 ↗(On Diff #187060)

It seems that this might be better named FirstImpreciseBlock.

200 ↗(On Diff #187060)

I don't actually follow why we don't count the size of the destination block since we're branching to the top of it. Could you add a comment explaining why?

207 ↗(On Diff #187060)

I also don't understand this... How about if the first "imprecise block" is between the destination block and the branch block? Wouldn't we still possibly need the extra adjustment? Namely, shouldn't this say something like MBB.getNumber() >= MinImpreciseBlock?

nemanjai added inline comments.Feb 19 2019, 6:49 PM
lib/Target/PowerPC/PPCBranchSelector.cpp
200 ↗(On Diff #187060)

Oh, nvm. I just realized that you moved the addition of the size of the destination block to before the loop. I suppose the idea is that we do want to consider the destination block's size but not its alignment.

Carrot updated this revision to Diff 187682.Feb 20 2019, 2:45 PM
Carrot marked 3 inline comments as done.

Add comment to explain the case when the inline asm occurs between branch block and dest block.

lib/Target/PowerPC/PPCBranchSelector.cpp
200 ↗(On Diff #187060)

Exactly.

207 ↗(On Diff #187060)

Added comment to explain this situation.

Any other comments?

nemanjai accepted this revision.Mar 4 2019, 4:12 PM

Sorry for the delay. LGTM.

This revision is now accepted and ready to land.Mar 4 2019, 4:12 PM
This revision was automatically updated to reflect the committed changes.