Because the final code layout is not known at compile time, the compiler cannot
determine the distance of a cross-section jump. Unconditional branches are
relaxed via thunk insertion by the linker, but conditional branches must be
relaxed manually by the compiler. Because of this, we should assume that any
cross-section conditional branch is out of range. This assumption is
necessary for machine function splitting on AArch64.
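For example, one common way to relax a cross-section conditional branch is to invert its condition and branch over an unconditional branch (a sketch with illustrative labels, not necessarily the exact sequence this patch emits):

```
// Before relaxation: B.cond only reaches about ±1 MiB, which is not enough
// once the target block lives in a different (e.g. cold) section.
    b.eq    .cold_block

// After relaxation: the inverted conditional branch stays in-section, and the
// unconditional branch (about ±128 MiB of reach) can be rewritten further,
// e.g. by a linker thunk or an indirect branch, if even that is not enough.
    b.ne    .fallthrough
    b       .cold_block
.fallthrough:
```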
Details

Reviewers: arsenm, mingmingl
Commits: rGd7bca8e4942b: [AArch64] Relax cross-section branches

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:275–277
I think it would be better to just fix the scavenger to tolerate empty blocks.
llvm/include/llvm/CodeGen/TargetInstrInfo.h:600 (On Diff #525866)
I'm surprised this would require a new hook. It does seem like it should take the two blocks or section IDs. Alternatively, we could just add blocks to isBranchOffsetInRange?
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:275–277
Ugh. Is this even true anymore with backwards scavenging?
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:275–277
Yeah, this is still true. The signature of the function is Register scavengeRegisterBackwards(const TargetRegisterClass &RC, MachineBasicBlock::iterator To, bool RestoreAfter, int SPAdj, bool AllowSpill = true);, and To must be an iterator to a valid instruction because scavengeRegisterBackwards uses To->getParent(). Fixing that seems reasonable, since RegisterScavenger already has an MBB member, but I think that's best done in a separate patch.
I think argumentless target hooks are generally bad; they should provide some additional context so that a target can consider different behavior depending on the section.
Use the code model instead of some arbitrary number for cross-section branches
The idea of a fixed cross-section branch distance isn't really
representative of the cross-section branch issue. The linker
can place two sections any distance away from each other
so long as both fit within the program address range specified
by the code model. Use the code model instead.
I know I'm replacing one argumentless target hook with another, but this one does provide some additional context, namely the size restrictions that each code model assumes.
Refactor code so it doesn't use virtual registers. Because we only care about the liveness at the end of the basic block, we don't need to use backwards scavenging. This structure also sets us up much more nicely for adding alternate implementations of buildIndirectBranch.
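For reference, this is the kind of sequence such an indirect branch might expand to, assuming the scavenger finds a free register at the end of the source block (the register and label here are illustrative, not necessarily what buildIndirectBranch emits):

```
    adrp    x9, .cold_block              // scratch register picked from whatever
    add     x9, x9, :lo12:.cold_block    // is free at the end of the block
    br      x9
```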
> Unconditional branches are relaxed via thunk insertion by the linker, but conditional branches must be manually relaxed. Because of this, we should assume that any cross-sectional conditional jumps are out of range.
As-is, it looks like this patch is also relaxing cross-sectional unconditional jumps. Given we can trust the linker to relax these, and we expect the branches to be cold, is there a reason for the compiler to relax them? I guess the linker-generated stub could clobber x16/x17?
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:299
Do we know at this point that the function isn't using the red zone? (We usually use the emergency spill slot for this sort of situation.)
Yeah, the linker will clobber X16 to relax unconditional jumps. The ABI seems designed around the assumption that out-of-range unconditional jumps only exist at function call boundaries, so we have to work around that.
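For context, a long-range thunk generated by the linker typically looks something like the following (a sketch; the exact shape and naming are up to the linker), which is why a value held in x16 cannot be assumed to survive a branch the linker might rewrite:

```
__AArch64ADRPThunk_cold_target:           // illustrative name
    adrp    x16, cold_target
    add     x16, x16, :lo12:cold_target
    br      x16
```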
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:299
Because BranchRelaxation runs after frame finalization, we can't add an emergency spill slot at this point. We could add a spill slot to every function just in case spilling is necessary, but I would expect that to degrade performance. Since the red zone isn't widely used on AArch64, it should be alright if it isn't compatible with MFS/BBSections. I'll add an explicit check here to assert that we're not doing anything we shouldn't.
> Yeah, the linker will clobber X16 to relax unconditional jumps. The ABI seems designed around the assumption that out-of-range unconditional jumps only exist at function call boundaries, so we have to work around that.
Alternatively, we could try to make register allocation understand that branches clobber x16, I guess? I haven't thought through how exactly that would work. The advantages of leaving the relaxation to the linker: first, we can completely avoid the extra instructions in libraries that are small, and second, linker-generated thunks would keep the thunking code outside the hot section. But I won't insist on it (we can always revise this part later).
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:299
Can we write some reasonable heuristic for whether we might need to spill? I mean, during frame finalization we should at least be able to tell whether there's any cold code in the function. If disabling red zone usage in the relevant functions is the simplest way forward, I guess I'm fine with that.
I have a follow-up patch to this one that defers relaxation to the linker when possible and picks the manual relaxation strategy that minimally impacts the hot section.
Making register allocation understand x16 clobbering would be a neat idea; I'd definitely like to explore it when fine-tuning MFS. Thanks for the review!
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:299
Since frame finalization happens before MachineFunctionSplitter/BBSections run, there isn't any way to tell which blocks may be hot or cold at that point. We could hypothetically say "officially, MFS/BBSections are unsupported with the red zone, but they do work 99% of the time" and minimize pushing the stack pointer, but that sounds hairy. As far as I can tell, disabling the red zone doesn't affect those who use MFS/BBSections. The AArch64 Procedure Call Standard specifies that not writing below the stack pointer is a universal constraint and doesn't mention the red zone at all (although Apple and Windows platforms still respect it).
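To illustrate the conflict (a sketch, not taken from the patch): a leaf function that keeps a value in the red zone would have that slot silently overwritten by any relaxation spill that stores below the incoming stack pointer:

```
leaf_func:                          // hypothetical leaf function using the red zone
    str     x0, [sp, #-16]          // local stashed below sp, sp not adjusted
    // ... rest of the function body ...
    str     x16, [sp, #-16]!        // a relaxation spill writes to the same
                                    // address, clobbering the red-zone local
```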
Make the tests a bit less brittle. This way the tests will better accommodate future optimization.