This patch estimates the cost of the generated runtime checks and
relaxes the limit on the number of runtime checks if their cost is a
small fraction (currently 0.5%) of the expected scalar loop runtime.
The threshold (and other details) are not set in stone yet and require
further benchmarking/analysis. Also, ExpectedTC returns the max of the
induction variable for loops without known constant trip counts, which
means we largely overestimate the cost of loops with variable trip
counts.
The current version also keeps a hard limit of 2 *
NumRuntimePointerChecks, which needs a closer look as well.
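
For discussion, the decision described above can be sketched roughly as
follows. This is only an illustration, not the code in the patch: the
struct, its field names and allowRuntimeChecks are hypothetical, a
uniform per-check cost is assumed, and the order of the tests (existing
limit first, then the hard cap, then the 0.5% comparison) is a guess at
one reasonable arrangement.

```cpp
#include <cstdint>

// Hypothetical inputs to the decision; none of these names come from the patch.
struct RuntimeCheckQuery {
  std::uint64_t NumChecks;      // runtime pointer checks the loop would need
  std::uint64_t CostPerCheck;   // assumed uniform cost of a single check
  std::uint64_t ScalarIterCost; // estimated cost of one scalar iteration
  std::uint64_t ExpectedTC;     // expected trip count of the loop
  std::uint64_t CheckLimit;     // existing limit on the number of checks
};

// Returns true if vectorizing with runtime checks looks acceptable.
// Overflow is ignored here; a real implementation would have to handle
// very large ExpectedTC values.
constexpr bool allowRuntimeChecks(const RuntimeCheckQuery &Q) {
  // Within the existing limit: accept without looking at the cost.
  if (Q.NumChecks <= Q.CheckLimit)
    return true;
  // Hard cap: never accept more than twice the existing limit.
  if (Q.NumChecks > 2 * Q.CheckLimit)
    return false;
  // Relaxed case: accept if the checks cost at most 0.5% (1/200) of the
  // expected scalar loop runtime.
  return Q.NumChecks * Q.CostPerCheck * 200 <=
         Q.ScalarIterCost * Q.ExpectedTC;
}
```

Because the last comparison scales with ExpectedTC, an overestimated
trip count makes the relaxed case succeed more often, which is the
concern with variable trip counts mentioned above.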
If the general direction is agreed upon, I will hash out the final
Fixes PR44662 (modulo potential adjustments for unknown trip counts)