When outlining optional branches, an early exit condition makes the
rest of the function optional. If we outline in favor of this exit
block, the return block can end up in the middle of the function. In
three very different benchmarks, I found that not outlining in favor of
the return block fixed the performance regression. Two of the
benchmarks are in the test-suite:
MultiSource/Applications/lambda-0.1.3/lambda.test
SingleSource/Benchmarks/fib2
lambda exhibits far worse instruction decode cache behavior with
outlining enabled and without this patch.
fib2 exhibits worse branch prediction, which by itself isn't an argument
for this patch, but it is another point in a pattern.
The internal benchmark that slowed down was essentially one large
function where incorrect exit placement caused additional icache misses.
Taken together, these three data points, along with the absence of any
obvious regressions in the test-suite with outlining enabled vs. this
patch, suggest that this is a reasonable heuristic.
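
For illustration only, the early-exit shape this heuristic targets looks
roughly like the sketch below. The function name and body are
hypothetical and are not taken from any of the benchmarks above. The
guard at the top is the early exit condition, and everything after it is
"optional" in the outlining sense. If layout favored the exit block, the
`return 0` block could sit between the guard and the loop, in the middle
of the function.

```cpp
#include <cstddef>

// Hypothetical example, not from the benchmarks mentioned above.
// The guard is the early exit condition: every block after it runs only
// when the guard fails, so the rest of the function looks "optional".
long sumOrZero(const int *data, std::size_t n) {
  if (data == nullptr || n == 0)
    return 0; // exit block: just returns

  // Remainder of the function: the part we want laid out contiguously
  // right after the guard, with the exit block placed out of the way.
  long total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

With this patch, the intent is that the loop stays adjacent to the guard
rather than being separated from it by the return block.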