Make sure every conditional branch constructed by LoopUnrollRuntime code sets branch weights.
- Add new 1:127 weights for the conditional jumps checking whether the whole (unrolled) loop should be skipped in the generated prolog or epilog code.
- Remove the updateLatchBranchWeightsForRemainderLoop function and add weights immediately when constructing the relevant branches. This simplifies the code and makes it more obvious, as every call to CreateCondBr now has a BranchWeights parameter.
- Rework the formula for epilogue latch weights to assume an equal distribution of remainders, and remove the assert (I was able to reach this code when forcing small unroll factors on the command line).
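
For illustration, one way to read "equal distribution of remainders" is sketched below. This is not the formula from the patch, which is not quoted here; the helper name remainderLatchWeights and the handling of small counts are assumptions made for the example. It assumes the remainder loop is only entered for a non-zero remainder R, uniformly distributed in [1, Count - 1], so the latch takes R - 1 backedges for every exit.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper, not part of LLVM: returns {BackEdgeWeight, ExitWeight}
// for the latch of a remainder loop generated for unroll factor Count.
std::pair<uint32_t, uint32_t> remainderLatchWeights(uint32_t Count) {
  // With Count < 3 the remainder loop runs at most one iteration, so the
  // backedge is never taken. Returning early also avoids unsigned underflow
  // in (Count - 2), so small forced unroll factors need no assert.
  if (Count < 3)
    return {0, 1};
  // Assumption: R is uniform on [1, Count - 1]. Each entry into the loop
  // takes R - 1 backedges and exactly one exit, so the mean number of
  // backedges per exit is (Count - 2) / 2 (rounded down by integer division).
  return {(Count - 2) / 2, 1};
}
```

For an unroll factor of 8 this sketch yields a backedge:exit ratio of 3:1. Note that integer division rounds the ratio for a factor of 3 down to 0:1, which a real implementation might prefer to scale up instead.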
nit: in the comment, also explain why it's unlikely for the loop to not enter the unrolled version at all. For example, does the unroll only happen with a known trip count (which is bigger than the unroll factor)?