Using BPI within loop predication is non-trivial because BPI is only
preserved lossily in the loop pass manager (one fix exposed by lossy
preservation is up for review at D111448). Moreover, since loop
predication is only used in downstream pipelines, it is hard to keep
upstream changes in BPI from breaking it through incomplete BPI state.
Also, correctly preserving BPI for all loop passes is a non-trivial
undertaking (D110438 does this lossily), while the benefit of using it
in loop predication isn't clear.
In this patch, we rely on profile metadata to get nearly the same
benefit as BPI, without using the complete set of heuristics that BPI
provides. This avoids the compile-time explosion we tried to fix with
D110438 and also avoids fragile bugs caused by BPI being lossy in loop
passes (D111448).
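
To illustrate the idea of deriving a branch probability directly from profile weights rather than from BPI, here is a minimal standalone sketch. It does not use LLVM headers and is not the code in this patch: `BranchWeights`, `trueEdgeProbability`, and the 50/50 fallback for a missing profile are all hypothetical names and assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the two weights carried by a conditional
// branch's profile metadata. Not an LLVM type.
struct BranchWeights {
  uint32_t TrueWeight;
  uint32_t FalseWeight;
};

// Returns the probability of taking the true edge as a fraction in [0, 1].
// Assumption: with no profile, or with all-zero weights, fall back to 0.5.
double trueEdgeProbability(const std::optional<BranchWeights> &W) {
  if (!W)
    return 0.5;
  // Sum in 64 bits so that two large 32-bit weights cannot overflow.
  const uint64_t Total =
      static_cast<uint64_t>(W->TrueWeight) + W->FalseWeight;
  if (Total == 0)
    return 0.5;
  const double P =
      static_cast<double>(W->TrueWeight) / static_cast<double>(Total);
  assert(P >= 0.0 && P <= 1.0 && "probability must be a valid fraction");
  return P;
}
```

Because the result depends only on the weights attached to the branch itself, there is no analysis state that a loop pass could leave stale.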
This looks like a natural generalization of "bool Instruction::extractProfMetadata(uint64_t &TrueVal, uint64_t &FalseVal) const". Why don't we extend the existing implementation of "extractProfMetadata" to handle more than 2 operands and overload the existing API with "bool Instruction::extractProfMetadata(SmallVectorImpl<uint64_t> &) const"?
PS: It looks tempting to place the total weight as the first element of the resulting vector. WDYT?
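
A rough standalone sketch of what that could look like, with the total stored in slot 0. This is a model only: `extractWeightsWithTotal` is a hypothetical name, `std::vector` stands in for `SmallVectorImpl`, and the input vector stands in for the integer operands of the profile metadata node.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the generalized extractor. RawWeights holds one
// weight per successor. On success, Out[0] is the total weight and
// Out[1..N] are the per-successor weights in their original order.
bool extractWeightsWithTotal(const std::vector<uint32_t> &RawWeights,
                             std::vector<uint64_t> &Out) {
  Out.clear();
  if (RawWeights.empty())
    return false; // No profile data: leave Out empty.
  Out.reserve(RawWeights.size() + 1);
  Out.push_back(0); // Slot 0 accumulates the total.
  for (uint32_t W : RawWeights) {
    Out.push_back(W);
    Out[0] += W; // 64-bit accumulation of 32-bit weights.
  }
  assert(Out.size() == RawWeights.size() + 1);
  return true;
}
```

One trade-off with this layout is that successor i lives at index i + 1, so callers have to remember the offset. Returning the total through a separate out-parameter would avoid that.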