This is an archive of the discontinued LLVM Phabricator instance.

RegAlloc: do not consider liveins to EH-pad successors as liveout.
ClosedPublic

Authored by t.p.northover on Apr 29 2021, 5:36 AM.

Details

Summary

RegAllocFast is looking at the successors' liveins to determine the initial liveouts of the block being allocated. Unfortunately if the block ends with an invoke, one of these is the landingpad which really gets its liveins from the runtime, not us.

This leads to registers being reserved in the block until they're fully clobbered by an instruction in reverse-order. In particularly tight situations on x86 (base pointer, lots of arg registers, and al used for varargs purposes, a _NOREX regclass being allocated) we can end up with nothing spillable and fail to allocate.

Also, it turns out that RegUnit 0 is valid, which threw me for a while when I was relying on dumpState and AH was nowhere to be found. So this fixes the dumper too.

Diff Detail

Event Timeline

t.p.northover created this revision.Apr 29 2021, 5:36 AM
t.p.northover requested review of this revision.Apr 29 2021, 5:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2021, 5:36 AM
This revision is now accepted and ready to land.Apr 29 2021, 10:55 AM

Thanks, committed as 438a63e13bf8.

t.p.northover reopened this revision.Apr 29 2021, 12:02 PM

And now I've reverted it. I was musing some more and it occurred that some values can come from this block (e.g. any SSA value we produce except from the call). It's only the landingpad defs that are from the runtime. I'll work on a more targeted patch.

This revision is now accepted and ready to land.Apr 29 2021, 12:02 PM

I was musing some more and it occurred that some values can come from this block

Good point, I should have thought of this!

Right, second try. I think the TargetLowering functions return 0 (which is never a real register) when they're not set, so this should do the trick.

efriedma added inline comments.
llvm/lib/CodeGen/RegAllocFast.cpp
1455

Calling getExceptionPointerRegister() directly from fast regalloc seems hacky. We don't want to copy-paste this code everywhere that cares about physical register live-outs.

t.p.northover added inline comments.May 5 2021, 5:55 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1455

I could add a MachineBasicBlock::liveouts that does the iteration. Simplest would probably be to return a SmallSet because of the potential issue with duplicates, but that doesn't quite fit with the liveins iterator-based interface. Otherwise it's probably an iterator adapter containing that SmallSet.

efriedma added inline comments.May 6 2021, 11:03 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1455

Probably most places that would use this can handle duplicates, so it would be fine to just implement an iterator over the raw list including duplicates.

I spent a bit of time trying to come up with some way to represent this that's a bit cleaner, but I didn't really have any good ideas. Maybe we could partition the "liveins" list in MachineBasicBlock to distinguish between registers that are live-out from a predecessor, and ones that are set by the unwinder? The advantages would just be that we don't have to query TLI, and MIR dumps would be a bit easier to understand.

Moved the iteration logic into a new MachineBasicBlock::livout_iterator, fairly ugly but not terrible.

t.p.northover added inline comments.May 11 2021, 7:05 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1455

Probably most places that would use this can handle duplicates, so it would be fine to just implement an iterator over the raw list including duplicates.

I suppose so, there may well be subregisters or overlapping ones floating about in the list anyway.

Maybe we could partition the "liveins" list in MachineBasicBlock to distinguish between registers that are live-out from a predecessor, and ones that are set by the unwinder?

I can't really think of a place that would be a useful distinction except when backcalculating liveouts as here, so I'm still inclined to keep the logic here.

Hang on, this one isn't quite right. Back with a revision soon.

Fixed the iterator logic to actually advance the BB, and skip over exception registers during construction. Tests had been infinite looping before, making it look a lot greener than it was.

Switch to -1 as a sentinel for when the caller pops the stack (i.e. C calling conv situation).

Ooops, uploaded a diff to wrong review. I'll put the diff back to how it was in case anyone finds this later and gets confused.

What I meant to say here was thanks! I've committed it as c1dc267258e0.

t.p.northover closed this revision.May 19 2021, 3:02 AM