This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][InstrRef] Recover some performance by calculating IDF for register units instead of all their alises
ClosedPublic

Authored by jmorse on Oct 12 2021, 3:23 AM.

Details

Summary

In D110173 we start using the existing LLVM IDF calculator to place PHIs as we reconstruct an SSA form of machine-code program. Sadly that's slower than the old (but broken) way, this patch attempts to recover some of that performance.

The key observation: every time we def a register, we also have to def it's register units. A refresher on units is here [0]. If we def'd $rax, in the current implementation we independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and they will all have the same PHI positions. Instead of doing that, we can calculate the PHI positions for {al, ah} and place PHIs for any aliasing registers in the same positions. Any def of a super-register has to def the unit, and vice versa, so this is sound. It cuts down the SSA placement we need to do significantly.

There are a few escape hatches: stack slots don't have subregisters (yet), so they have to be handled independently. There are also scenarios where we only ever read from registers (such as arguments), and so never track their sub-registers or register units. In both of these cases, this patch just falls back to placing PHIs "normally". There's also a small amount of juggling to do with the stack pointer: we ignore defs of the stack pointer at calls, so register masks and SP defs get ignored. Now that we're placing PHIs for register units on top of the super-registers they alias, we need to be careful to not def the stack pointer register units either. Thus, there's now a collection of "aliases of the stack pointer" in MLocTracker that shouldn't be def'd by calls either.

Testing: because this is performance related, no tests, but this builds clang-3.4 with identical object files after being applied (save for the object file with the date/time in it).

[0] https://lists.llvm.org/pipermail/llvm-dev/2012-May/050177.html

Diff Detail

Event Timeline

jmorse created this revision.Oct 12 2021, 3:23 AM
jmorse requested review of this revision.Oct 12 2021, 3:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2021, 3:23 AM
jmorse added inline comments.Oct 12 2021, 3:48 AM
llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
1819–1820

NB: this seems to be the canonical way of enumerating register unit... registers? in the rest of LLVM. What MCRegUnitIterator produces doesn't seem to necessarily correspond to a single register, therefore there's an additional process of enumerating the roots too.

Either way, it works for x86, and this pattern is used elsewhere in llvm. So It Must Be Right (TM).

Orlando accepted this revision.Oct 12 2021, 4:25 AM

LGTM! Plus a couple of inline nits.

Testing: because this is performance related, no tests, but this builds clang-3.4 with identical object files after being applied (save for the object file with the date/time in it).

Out of interest do you have any numbers for the positive performance impact of this patch?

llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
1819–1820

Thanks for the explanation. Nit: Please can you rename URoots to URoot?

1833

nits: If you're also stuffing (arg) registers in too, maybe SlotsToPHI should be renamed. Something like LocsToPHI or LocsWithoutRegUnits?
Also is this a dev comment that leaked into the patch?

This revision is now accepted and ready to land.Oct 12 2021, 4:25 AM

Out of interest do you have any numbers for the positive performance impact of this patch?

Loss from PHI installation:
http://llvm-compile-time-tracker.com/compare.php?from=9b9e6f1d4a7b15d5a4a076cd7e9dfbe3cc9afce6&to=c039d62f7f2bbea4cca09be7d2355700e972f9e6&stat=instructions

Gain from this patch:
http://llvm-compile-time-tracker.com/compare.php?from=7ee343ecae52ea3cd843e508e322b6b10c649a56&to=bc07cf76a31fa769786d6631fe25586bd1396c34&stat=instructions

They're not directly comparable numbers because inbetween is the variable-location PHI installation patch, which adds additional performance loss. The full set of patches are here under the "use-ssa-phi-placement3" branch, unfortunately the commit messages are written in Jeremy-speak so might not be intelligible. The order goes:

  • Turn on instr-ref
  • Install 2 PHI placement patches
  • This patch
  • Another vloc performance patch
  • 2 attempts to improve performance further with corresponding reverts
  • A final vloc performance patch that actually works,
  • Another patch that didn't deliver any gains.

https://llvm-compile-time-tracker.com/?config=NewPM-ReleaseLTO-g&stat=instructions&remote=jmorse

llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
1833

yes and yes

Thanks for the info

This revision was landed with ongoing or failed builds.Oct 13 2021, 8:08 AM
This revision was automatically updated to reflect the committed changes.