In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.
Details
Diff Detail
Event Timeline
This looks good in general and easier than I suspected :)
In general though I'd like the method which gives you the clobber mask factored out of LiveIntervalAnalysis and moved into MachineBasicBlock (see comment below).
I looked shortly at the register allocators: It seems PBQP, Base, Greedy all use the infrastructure in LiveIntervalAnalysis to check clobbers, RegAllocFast seems to be querying clobbering operands manually and does not use the RegMaskSlots/RegMaskBits parts so it would need some adjustments to work correctly I believe.
lib/CodeGen/LiveIntervalAnalysis.cpp | ||
---|---|---|
225–226 | This is fine but I should go into a separate commit (which you can just commit without review). | |
239–240 | I guess this can go into a separate cleanup patch as well. | |
240 | I'm worried that we now have EH specific code in the LiveInterval analysis. I'd rather see this factored out into a MBB.getBeginClobberMask(). This will also make this functionality accessible to register allocators which do not use the RegMaskSlots/RegMaskBits acceleration logic in LiveIntervals. | |
244–249 | I'm confused by this part. Shouldn't the ReturnBlock end in a return instruction? How can there be successor blocks how can there be anything live across the return instruction that would require this clobbering? |
- Move BB mask logic out of line
lib/CodeGen/LiveIntervalAnalysis.cpp | ||
---|---|---|
225–226 | OK | |
244–249 | Funclet returns will have successor blocks. There are two cases: the funclet returns the address of the next block that the runtime should resume execution at, or the funclet is a cleanup that returns void and the runtime will run the next cleanup or keep unwinding. It is possible for the outermost cleanup funclet to have no successors, but it's also not interesting to put a clobber mask there, because the function will be over. |
LGTM.
The regalloc changes are fine. Adjusting FastRegAlloc can be done in a followup patch, something should obviously be done about the getNoPreservedMask() implementations, but we don't need further review for that.
include/llvm/CodeGen/MachineBasicBlock.h | ||
---|---|---|
383 | I would add an extra comment for each function, something like: /// Get the clobber mask for the begin of the basic block. Funclets use this ... /// Get the clobber mask for the end of the basic block. \see getBeginClobberMask() | |
lib/CodeGen/MachineBasicBlock.cpp | ||
1322 | As discussed. | |
lib/Target/ARM/ARMBaseRegisterInfo.h | ||
97 | There is no implementation for this included in this patch (same for the other Targets). |
Thanks for fixing this!
Should you also be removing the FIXME code in X86ISelLowering.cpp (~line 3445) that sets the noPreserved mask on invokes (with 32-bit funclet personalities)? I don't see that here.
I can't remove that yet because it also informs PEI that it needs to push all non-volatile registers in the prologue. We *don't* want that behavior for x64, because the runtime takes care of it for us, and redundantly saving all callee-saved XMM registers is expensive.
I think the right way to handle that is to put a nothing-preserved mask on EH_RESTORE instructions, which corresponds to where control rejoins an outer scope after a CATCHRET. That way, 32-bit functions using only cleanups will not need to spill all CSRs.
lib/Target/ARM/ARMBaseRegisterInfo.h | ||
---|---|---|
97 | The implementations already exist, I only made the method virtual. I'll make the default behavior be fatal error. |
Oops -- I didn't realize that the 32-bit runtime doesn't restore them for us on "catchret". What you say makes sense, thanks for explaining.
I would add an extra comment for each function, something like: