Change the copy tracker to keep a single map of register units instead
of 3 maps of registers. This gives a very significant compile time
performance improvement to the pass. I measured a 30-40% decrease in
time spent in MCP on x86 and AArch64 and much more significant
improvements on an out of tree target with more registers.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/CodeGen/MachineCopyPropagation.cpp | ||
---|---|---|
133–136 ↗ | (On Diff #166531) | This feels odd now, as in the new code you say Avail = false instead of removing elements from the Copies set... Would it make sense to maintain a counter of entries with Avail == true for this? |
148–155 ↗ | (On Diff #166531) | Is it correct to only check the first register unit here? Isn't it possible that only some of the units gots clobbered in the meantime, while the first one still makes it seem like the copy is available? |
lib/CodeGen/MachineCopyPropagation.cpp | ||
---|---|---|
133–136 ↗ | (On Diff #166531) | It's not really worth the bookkeeping to keep track of the number of available copies, but this is used to shortcut out of forwardUses and it's measurably slower if we don't do it at all. I've renamed this to hasAnyCopies so it's more obvious what it actually checks. |
148–155 ↗ | (On Diff #166531) | Yes, as discussed offline this is correct. When we clobber units we end up marking registers unavailable based on the original instruction's defs (not just the unit itself), so we won't hit any problems here. |
I'm convinced now the clobbering works correctly because as clobberRegister does not just clobber the register at hand but the whole destination register of the copy touched by the COPY at hand.
LGTM (this time with the right review :)