This is an archive of the discontinued LLVM Phabricator instance.

[AArch64CompressJumpTables] Prevent over-compression caused by invalid alignment.
ClosedPublic

Authored by paulwalker-arm on May 5 2023, 4:18 PM.

Details

Summary

AArch64CompressJumpTables assumes it can calculate exact block
offsets. This assumption is bogus because getInstSizeInBytes()
only returns an upper bound rather than an exact size. The
assumption is also invalid when a block alignment is bigger than
the function's alignment.

To mitigate both scenarios this patch changes the algorithm to
compute the maximum upper bound for all block offsets. This is
pessimistic but safe because all offsets are treated as unsigned.

Diff Detail

Event Timeline

paulwalker-arm created this revision.May 5 2023, 4:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 4:18 PM
paulwalker-arm requested review of this revision.May 5 2023, 4:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 4:18 PM

Not sure of the performance ramifications of this patch but this is a bug fix for something I've hit in real world code. Looking at AArch64Subtarget.cpp has the affected targets as:

CortexA76
CortexA77
CortexA78
CortexA78C
CortexR82
CortexX1
CortexX1C
CortexA710
CortexA715
CortexX2
CortexX3
NeoverseN1
NeoverseN2
NeoverseV2
NeoverseV1

I'm not sure where the value is for having function alignments smaller than loop alignments, so matching them up is a potential easy fix to restore jump table compression. Even if that's done I believe this fix is required to catch other places where alignments can be set.

Don't you just need to add "Alignment - FunctionAlignment" bytes to AlignedOffset? The assembler resolves the final offsets, so we don't need precise offsets at the point we're doing the compression; we just need a conservative estimate so we don't over-compress.

Don't you just need to add "Alignment - FunctionAlignment" bytes to AlignedOffset? The assembler resolves the final offsets, so we don't need precise offsets at the point we're doing the compression; we just need a conservative estimate so we don't over-compress.

Is it that simple? Looking at the logic I convinced myself that overestimating the offsets will be just as bad because it can make it look like MinBlock is closer to MaxBlock than it actually is. My assumption is that to be safe we need to calculate min and max offset estimates, with min == max representing the case where the offset is known. Do you agree? or am I over complicating things?

If it is more involved do you object to me getting the fix in first and the following up with the rework?

efriedma added a comment.EditedMay 9 2023, 10:49 AM

If we have the maximum size of each block, and the alignment of each block, we can compute a conservative estimate of the layout of the function:

-- block1 offset
block1 instruction size estimate
block2 padding estimate (alignment - 4)
--block2 offset
block2 instruction size estimate
block3 padding estimate (alignment - 4)
--block3 offset
[...]

The actual distance between two of the block offsets will always be less than an estimate computed using simple subtraction: there isn't any actual alignment computation involved, so blocks can't appear closer than they actually are in either direction.

Looking more closely, though, it looks like it doesn't actually work that way at the moment, though: instead of using an alignment estimate, it uses alignTo(). That's probably wrong; I think getInstSizeInBytes is a conservative estimate in some cases.

So the right fix is instead of:

if (Alignment == Align(1))
  AlignedOffset = Offset;
else
  AlignedOffset = alignTo(Offset, Alignment);

We should do:

if (Alignment <= Align(4))
  AlignedOffset = Offset;
else
  AlignedOffset = Offset + Alignment.value() - 4;

That should solve the issue without specifically referencing the function's alignment.

Sorry of the delay in following up. I've notice that branch relaxation solves the same problem so I've copied its solution and updated the test accordingly.

paulwalker-arm edited the summary of this revision. (Show Details)Jun 6 2023, 10:36 AM
paulwalker-arm retitled this revision from [AArch64CompressJumpTables] Prevent compression when block alignment is bigger than function alignment. to [AArch64CompressJumpTables] Prevent over-compression when block alignment is bigger than function alignment..

I'm pretty sure what branch relaxation is doing is wrong, in a subtle way.

Suppose, at an extreme, you have an instruction that's somewhere between 0 and 124 bytes. So you have a sequence like this:

bb align 128:
ZERO_TO_124_BYTES_INSTR
B another_bb

bb2 align 128:
[...]

Say the start of the block is aligned. We add up the size of the block: ZERO_TO_124_BYTES_INSTR is 124 bytes, B is 4 bytes, so the block is 128 bytes long. Calling alignTo() does nothing, so we conclude there's no padding after the branch. But actually, there can be up to 124 bytes of padding depending on the actual size of ZERO_TO_124_BYTES_INSTR. And that padding increases the distance from the branch to the destination.

To get this right, we must not call alignTo(): trying to compute padding after a block using alignTo() assumes we know the exact offset of the end of the block, which is not a correct assumption. We need to instead estimate the maximum amount of padding the .p2align directive will insert.

I'm not sure if there are any pseudo-instructions that are actually variable-length on aarch64 at the moment, but that's how the interface works in general; such instructions exist on other targets, and they've existed in the past on AArch64. We also need to consider the interaction with MachineOutliner, which can effectively shrink the size of arbitrary sequences of instructions.

Branch relaxation is pretty forgiving, generally, so I'm not surprised nobody has tripped over this.

Updated to always calculate pessimistic offsets based on Eli's education as to what getInstSizeInBytes really means.

paulwalker-arm retitled this revision from [AArch64CompressJumpTables] Prevent over-compression when block alignment is bigger than function alignment. to [AArch64CompressJumpTables] Prevent over-compression caused by invalid alignment..Jun 13 2023, 7:59 AM
paulwalker-arm edited the summary of this revision. (Show Details)
paulwalker-arm edited the summary of this revision. (Show Details)
This revision is now accepted and ready to land.Jun 13 2023, 10:22 AM