This is a follow-up to D28759 and, together with that commit, should fix
all of the benchmarks that regressed due to rL278321 (pending another
look at the benchmarks), while keeping the performance improvements in
the cases where rL278321 was beneficial.
Prior to this commit, the analysis would simply ignore any function
calls in the clearance calculation, causing incorrect results after
any function call (in the benchmarks that regressed, rL278321 just
happened to pick a register that was worse than the xmm0 default).
With this patch, we kill clearance for all registers when a function
call occurs.
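As a rough illustration of what "killing clearance" at a call means, here is
a minimal toy model. It is not the actual pass code: ClearanceTracker,
recordDef, recordCall and the NeverWritten sentinel are all hypothetical names
made up for this sketch. Clearance is modelled as the number of instructions
since the last known write to a register, and a call is treated as if it had
written every register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical toy model, not LLVM's actual data structures.
constexpr std::size_t NumRegs = 16;      // xmm0..xmm15
constexpr int NeverWritten = -(1 << 20); // sentinel: "written long ago"

struct ClearanceTracker {
  std::array<int, NumRegs> LastDef; // index of the last known write per register
  int CurInstr = 0;                 // index of the instruction being visited

  ClearanceTracker() { LastDef.fill(NeverWritten); }

  // An ordinary instruction writes Reg.
  void recordDef(std::size_t Reg) {
    assert(Reg < NumRegs);
    LastDef[Reg] = CurInstr;
  }

  // A call may write any register, so treat every register as just
  // written. This is the "kill clearance" step.
  void recordCall() { LastDef.fill(CurInstr); }

  void advance() { ++CurInstr; }

  // Instructions since the last known write to Reg.
  int clearance(std::size_t Reg) const {
    assert(Reg < NumRegs);
    return CurInstr - LastDef[Reg];
  }
};
```

In this model, ignoring calls (the old behaviour) would leave every register
other than the ones written explicitly looking like it has huge clearance
right after a call, which is the incorrect result described above.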
Similarly, we kill clearance at function entry. This is by far the more
disruptive of the two cases, but is necessary to avoid a 2x penalty in some
common cases. The most obvious case where this happens is calling a
small-ish non-inlined function in a loop. If the function uses xmm
registers that are not live-ins, it is likely to fall into this performance
trap, if we don't consider clearance small on live-ins.
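A hypothetical example of the kind of code that could hit this (halve and
sumHalves are made-up names, and the exact instructions and registers chosen
depend on the compiler version and target): on x86-64 the integer-to-double
conversion is typically lowered to cvtsi2sd, which only writes the low lanes
of its destination and so carries a false dependency on the register's
previous value. The argument arrives in an integer register, so the xmm
register used inside the callee is not a live-in.

```cpp
#include <cstddef>

// noinline is a GCC/Clang extension; it is used here only to keep the
// call in the loop (an assumption of this sketch).
#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Small function: X is passed in an integer register, the result is
// returned in an xmm register that is not a live-in of the function.
SKETCH_NOINLINE double halve(long X) {
  return static_cast<double>(X) * 0.5; // conversion likely becomes cvtsi2sd
}

// Hot loop calling the small function. Without a dependency-breaking
// instruction before the conversion, each call may have to wait for the
// previous value of the destination register.
double sumHalves(const long *V, std::size_t N) {
  double Sum = 0.0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += halve(V[I]);
  return Sum;
}
```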
This is obviously more pessimistic than reality in a lot of cases. However,
the combination of the immense penalty for not having the dependency
breaking instruction (3-5x), together with the fact that these instructions
are extremely cheap (they are special cased in the decoder, AFAIK, so
don't even take up an execution unit), makes this seem like the right
trade-off.
I was about to upload a review for this issue :)
It solves at least 2 bugzillas:
https://llvm.org/bugs/show_bug.cgi?id=25277
https://llvm.org/bugs/show_bug.cgi?id=27573
One of them is marked as a duplicate of other issues, so this will probably solve those too.
You should mention these bugzillas in your commit message.
In my version I did some small refactoring of this code section; feel free to adopt it if you like it: