Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

> Local values may also be used by no-op casts, which adds the
> register to the RegFixups table. Without reversing the RegFixups
> map direction, we don't have enough information to sink these
> instructions.

This patch implements sinking for these value materialization
instructions, by iterating over the RegFixups map. Usually there
aren't a lot, and building clang in Debug mode using a patched
clang showed a barely-above-the-noise 0.5% increase in build time.
The benefit of reducing spills and reloads also applies here; I saw
a 0.3% reduction in code size for that Debug build of clang.
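
To illustrate why the map has to be iterated, here is a minimal,
self-contained sketch in plain C++. It is a toy model and not the
code in this patch: ToyInst, findFirstUse and sinkBefore are
hypothetical names, and the fixup map is assumed to point from an
alias register to the register that replaces it. Because the map
only points in that direction, finding every name under which a
local value may be used means scanning all of its entries.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: one defined register and a
// list of used registers. None of these types are LLVM's.
struct ToyInst {
  std::string Name;
  unsigned Def;               // 0 means "defines nothing"
  std::vector<unsigned> Uses;
};

// Returns the index of the first instruction at or after Begin that
// uses DefReg, either directly or through an alias recorded in Fixups
// (alias -> replacement). Returns Block.size() if there is no use.
std::size_t findFirstUse(const std::vector<ToyInst> &Block,
                         std::size_t Begin, unsigned DefReg,
                         const std::map<unsigned, unsigned> &Fixups) {
  // The map is keyed by alias, so collecting the aliases of DefReg
  // requires a scan over every entry.
  std::set<unsigned> Names{DefReg};
  for (const auto &Entry : Fixups)
    if (Entry.second == DefReg)
      Names.insert(Entry.first);

  for (std::size_t I = Begin; I < Block.size(); ++I)
    for (unsigned U : Block[I].Uses)
      if (Names.count(U))
        return I;
  return Block.size();
}

// Moves the instruction at index From so that it sits immediately
// before the instruction currently at index To (requires From < To).
void sinkBefore(std::vector<ToyInst> &Block, std::size_t From,
                std::size_t To) {
  std::rotate(Block.begin() + From, Block.begin() + From + 1,
              Block.begin() + To);
}
```

In this sketch the scan over the fixup map is repeated for each
local value being sunk, so its cost grows with the number of map
entries.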

The original source locations were lost as part of emitting these
instructions to the local value area, but the source location of
the first use seems like a very good second best choice. I saw
another 0.3% reduction in the number of instruction bytes with
line-0 attributions after this patch; the benefit is greater
than that because only some of these instructions had been
attributed to line 0 before the patch.

One interesting effect is that some of these value instructions
previously ended up intermixed with prologue instructions (e.g.
stack homing for parameters); now that they have proper source
locations, they reliably come after the prologue.

This looks like it's reverting rGa28e767f06d. This code was added to
avoid O(n^2) complexity. Do you have an alternative solution to that
issue?