Pass the successor iterator to getEdgeProbability so that it no longer needs to do a linear search of the successor list to calculate the successor's index. This reduces an operation that took O(n^2) time to O(n) time, where n is the number of successors of the basic block.
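A rough illustration of the difference, as a standalone sketch rather than the actual LLVM code: `Block`, `probBySearch`, `probByIter`, and the two `sum...` helpers are hypothetical names, and the sketch assumes each block stores its edge probabilities in a vector parallel to its successor list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a basic block; this is not LLVM's MachineBasicBlock.
struct Block {
    std::vector<const Block*> succs;  // successor blocks
    std::vector<double> probs;        // probs[i] is the probability of the edge to succs[i]
};

using SuccIter = std::vector<const Block*>::const_iterator;

// Lookup by successor pointer: the index has to be recovered with a
// linear search, so each call costs O(n).
double probBySearch(const Block& b, const Block* succ) {
    auto it = std::find(b.succs.begin(), b.succs.end(), succ);
    assert(it != b.succs.end() && "succ is not a successor of b");
    return b.probs[static_cast<std::size_t>(std::distance(b.succs.begin(), it))];
}

// Lookup by successor iterator: the index is a constant-time subtraction.
double probByIter(const Block& b, SuccIter it) {
    assert(it != b.succs.end() && "iterator must point at a successor");
    return b.probs[static_cast<std::size_t>(it - b.succs.begin())];
}

// Visiting every outgoing edge with the pointer-based lookup: O(n^2) overall.
double sumBySearch(const Block& b) {
    double total = 0.0;
    for (const Block* s : b.succs)
        total += probBySearch(b, s);
    return total;
}

// The same traversal with the iterator-based lookup: O(n) overall.
double sumByIter(const Block& b) {
    double total = 0.0;
    for (SuccIter it = b.succs.begin(); it != b.succs.end(); ++it)
        total += probByIter(b, it);
    return total;
}
```

A caller that is already iterating over the successor list has the iterator in hand, so passing it along costs nothing extra and avoids repeating the search for every edge.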
In my WebAssembly VM, which uses LLVM to generate machine code, I ran into this quadratic behavior on a test case with a large switch from the WebAssembly reference interpreter test suite: https://github.com/WebAssembly/spec/blob/master/test/core/br_table.wast#L110
To find the bottleneck, I increased the number of cases in the switch statement further, at which point LLVM code generation took ~13 seconds. This optimization reduced the code generation time to tens of milliseconds.