This is an archive of the discontinued LLVM Phabricator instance.

[MachineBlockPlacement] Remove the pad limit for no-fallthrough loops
Needs ReviewPublic

Authored by chill on Jul 25 2023, 6:32 AM.

Details

Summary

This patch removes the limit on how many padding bytes are allowed to
be inserted in order to align loop blocks that have no fallthrough
edges into them and are either a loop header or are preceded in the
layout by a block in a different loop.

This change gives some small performance improvements on AArch64 and
also makes benchmark results less susceptible for variations due to
block placement.

Diff Detail

Event Timeline

chill created this revision.Jul 25 2023, 6:32 AM
chill requested review of this revision.Jul 25 2023, 6:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2023, 6:32 AM
nikic resigned from this revision.Jul 25 2023, 6:42 AM

(Pretty far outside my area of expertise)

I'm having a hard time following what changes are actually semantically significant here. Can you split the "remove MBB argument from getMaxPermittedBytesForAlignment" part into a separate patch?

I think if someone specifies -max-bytes-for-alignment, the command line argument should not be ignored for non-fallthrough loop headers. It should ideally take precedence over this heuristic.

I would also make this AArch64 specific, as it has not been verified on any other architectures. That is debatable though, it just might be using more space than is desirable. If that can't be done via a basic block arg, maybe getMaxPermittedBytesForAlignment could take a bool indicating whether it is non-fallthrough loop header.

chill added a comment.Jul 26 2023, 5:58 AM

I'm having a hard time following what changes are actually semantically significant here. Can you split the "remove MBB argument from getMaxPermittedBytesForAlignment" part into a separate patch?

Done.

I think if someone specifies -max-bytes-for-alignment, the command line argument should not be ignored for non-fallthrough loop headers. It should ideally take precedence over this heuristic.

Done.

I would also make this AArch64 specific, as it has not been verified on any other architectures. That is debatable though, it just might be using more space than is desirable. If that can't be done via a basic block arg, maybe getMaxPermittedBytesForAlignment could take a bool indicating whether it is non-fallthrough loop header.

The behaviour already exists on all the other architectures (except LoongArch) since they keep TargetLoweringBase::MaxBytesForAlignment initialised to 0, i.e. no limit. Thus I would not expect regressions.
I've added people who touched loop alignment on LoongArch as reviewers.

efriedma added inline comments.Jul 26 2023, 11:48 AM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2954

The !LayoutPred->isSuccessor(ChainBB) check already ensures the padding will never be executed. Given that, I guess the remaining checks here are to try to maximize icache hits. In that context, why are loop headers special? Do we care if LayoutPred is part of a subloop of L?

chill added inline comments.Jul 27 2023, 2:36 AM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2954

Given that, I guess the remaining checks here are to try to maximize icache hits.

Yes, I'd like to avoid excessive padding inside a loop.

In that context, why are loop headers special?

They aren't really special, that check for loop header is a shortcut for not having to dive into
getLoopFor(), IIUC findBestLoopTop will place the loop header first (so the layout predecessor will be
in a different loop), or place another loop block in front of the header with a fallthrough to the header.

Do we care if LayoutPred is part of a subloop of L?

I haven't thought about this case. I'm going to experiment with doing this only for innermost loops.

chill updated this revision to Diff 545084.Jul 28 2023, 3:10 AM
chill marked an inline comment as done.