This is an archive of the discontinued LLVM Phabricator instance.

Relax cross-section branches
ClosedPublic

Authored by dhoekwater on Mar 2 2023, 8:09 PM.

Details

Summary

Because the final code layout is not known during compilation, the distance of
cross-section jumps cannot be determined at compile time. Unconditional branches
are relaxed via thunk insertion by the linker, but conditional branches
must be manually relaxed. Because of this, we should assume that any
cross-section conditional jump is out of range. This assumption is
necessary for machine function splitting on Arm.

Diff Detail

Event Timeline

dhoekwater created this revision. Mar 2 2023, 8:09 PM
Herald added a project: Restricted Project. Mar 2 2023, 8:09 PM
dhoekwater updated this revision to Diff 502834. Mar 6 2023, 3:39 PM

Add a missing semicolon

Add unconditional branch relaxation for AArch64 and remove the flag

Make all getCrossSectionBranchDistance instances return uint64_t

Add checks to tests

Rebase updated changes and clang-format

dhoekwater retitled this revision from "Add a flag to assume cross-section conditional branches must be relaxed" to "Relax cross-section branches". May 25 2023, 1:24 PM
dhoekwater published this revision for review. May 25 2023, 1:54 PM
dhoekwater added reviewers: arsenm, hliao, mingmingl.
Herald added a project: Restricted Project. May 25 2023, 1:54 PM

Clarify documentation comment wording

arsenm added inline comments. Jun 8 2023, 4:48 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
271–273

I think it would be better to just fix the scavenger to tolerate empty blocks.

dhoekwater marked an inline comment as done. Jun 13 2023, 10:45 AM
dhoekwater added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
271–273

Agreed, but I think this would best be done in a different patch. This same approach is used by AMDGPU/SI, RISCV, and LoongArch.

dhoekwater marked an inline comment as done. Jun 15 2023, 3:46 PM

@arsenm Do you have any other feedback?

arsenm added inline comments. Jun 16 2023, 3:40 PM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
600

I'm surprised this would require a new hook. Does seem like it should take the two blocks or section IDs

Alternatively could just add blocks to isBranchOffsetInRange?

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
271–273

Ugh. Is this even true anymore with backwards scavenging?

dhoekwater marked 2 inline comments as done. Jun 28 2023, 4:31 PM
dhoekwater added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
271–273

Yeah, this is still true. The signature of the function is

  Register scavengeRegisterBackwards(const TargetRegisterClass &RC,
                                     MachineBasicBlock::iterator To,
                                     bool RestoreAfter, int SPAdj,
                                     bool AllowSpill = true);

and To must be an iterator to a valid instruction because scavengeRegisterBackwards uses To->getParent(). It seems like fixing that could be reasonable since RegisterScavenger already has an MBB member, but again I think that's best done in a separate patch.

arsenm requested changes to this revision. Jul 11 2023, 1:10 PM

I think argumentless target hooks are generally bad and need to provide some additional context, such that a target could consider different behavior depending on the section

This revision now requires changes to proceed. Jul 11 2023, 1:10 PM
dhoekwater marked an inline comment as done and an inline comment as not done.

Use the code model instead of some arbitrary number for cross-section branches

The idea of a fixed cross-section branch distance isn't really
representative of the cross-section branch issue. The linker
can place two sections any distance away from each other
so long as both fit within the program address range specified
by the code model. Use the code model instead.

I think argumentless target hooks are generally bad and need to provide some additional context, such that a target could consider different behavior depending on the section

I know I'm replacing an argumentless target hook with another, but this one does provide additional context: the size restrictions that each code model assumes.

Rebase onto main

Refactor code so it doesn't use virtual registers. Because we only care about the liveness at the end of the basic block, we don't need to use backwards scavenging. This structure also sets us up much more nicely for adding alternate implementations of buildIndirectBranch.

arsenm accepted this revision. Jul 26 2023, 4:41 PM
This revision is now accepted and ready to land. Jul 26 2023, 4:41 PM

Extra brace, oops.

Unconditional branches are relaxed via thunk insertion by the linker, but conditional branches must be manually relaxed. Because of this, we should assume that any cross-sectional conditional jumps are out of range

As-is, it looks like this patch is also relaxing cross-sectional unconditional jumps. Given we can trust the linker to relax these, and we expect the branches to be cold, is there a reason for the compiler to relax them? I guess the linker-generated stub could clobber x16/x17?

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
295

Do we know at this point that the function isn't using the red zone? (We usually use the emergency spill slot for this sort of situation.)

dhoekwater marked an inline comment as done. Jul 27 2023, 4:06 PM

Unconditional branches are relaxed via thunk insertion by the linker, but conditional branches must be manually relaxed. Because of this, we should assume that any cross-sectional conditional jumps are out of range

As-is, it looks like this patch is also relaxing cross-sectional unconditional jumps. Given we can trust the linker to relax these, and we expect the branches to be cold, is there a reason for the compiler to relax them? I guess the linker-generated stub could clobber x16/x17?

Yeah, the linker will clobber X16 to relax unconditional jumps. The ABI seems designed around the assumption that out-of-range unconditional jumps only exist at function call boundaries, so we have to work around that.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
295

Because BranchRelaxation runs after Frame Finalization, we can't add an emergency spill slot at this point.

We could add a spill slot to every function just in case spilling is necessary, but I would expect it to degrade performance.

Since the red zone isn't widely used on AArch64, it should be alright if it isn't compatible with MFS/BBSections. I'll add an explicit check here to assert that we're not doing anything we shouldn't.

Yeah, the linker will clobber X16 to relax unconditional jumps. The ABI seems designed around the assumption that out-of-range unconditional jumps only exist at function call boundaries, so we have to work around that.

Alternatively, we could try to make register allocation understand that branches clobber x16, I guess? Haven't thought through how exactly that would work. The advantages of letting the linker split: one, we can completely avoid the extra instructions in libraries that are small, and two, linker-generated thunks would move the thunking code outside the hot section. But I won't insist on it (we can always revise this part later).

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
295

Can we write some reasonable heuristic for whether we might need to spill? I mean, during frame finalization, we should at least be able to tell if there's any cold code in the function.

If disabling red zone usage in the relevant functions is the simplest way forward, I guess I'm fine with that.

dhoekwater marked an inline comment as done.

Make sure that we don't spill into the red zone of a function.

Alternatively, we could try to make register allocation understand that branches clobber x16, I guess? Haven't thought through how exactly that would work. The advantages of letting the linker split: one, we can completely avoid the extra instructions in libraries that are small, and two, linker-generated thunks would move the thunking code outside the hot section. But I won't insist on it (we can always revise this part later).

I have a follow-up patch to this one that defers relaxation to the linker when possible and picks the manual relaxation strategy that minimally impacts the hot section.
Making register allocation understand x16 clobbering would be a neat idea; I'd definitely like to explore it when fine-tuning MFS. Thanks for the review!

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
295

Can we write some reasonable heuristic for whether we might need to spill?

Since frame finalization happens before MachineFunctionSplitter/BBSections run, there isn't any way to tell what blocks may be hot or cold at this point.

We could hypothetically say "officially, MFS / bbsections are unsupported with the red zone, but they work 99% of the time" and minimize pushing the stack pointer, but that sounds hairy.

AFAICT, disabling the red zone doesn't affect functions that use MFS / bbsections. The AArch64 Procedure Call Standard specifies that not writing past the stack pointer is a universal constraint and doesn't mention the red zone at all (although Apple and Windows platforms still respect it).

Simplify and rename the test function for clarity

Make the tests a bit less brittle. This way the tests will better accommodate future optimization.

dhoekwater updated this revision to Diff 547413. Aug 4 2023, 5:08 PM

removed reviewers because I initially added way too many.

This revision was automatically updated to reflect the committed changes.