This is a large patch, and it's not quite ready yet but I'd like to put it up for review now to get some feedback on the approach.
Instead of building the binary search tree and discovering jump tables and bit tests as we go along, this patch extract such parts (CaseClusters) from the switch first (FindJumpTables and FindBitTestsClusters), and then builds a binary tree around those clusters afterwards (work-list driven algorithm in visitSwitch).
This has several benefits. Most importantly, the binary tree will not be skewed because we're trying to find jump tables - it will always be balanced. We could also change how we'd like to balance it, e.g. using profile info to balance on branch frequency, causing hot cases to be closer to the root.
Secondly, decoupling the cluster extraction mechanism from tree building allows us to use other algorithms. For jump table finding, I'm using a dynamic programming approach which is basically Kannan & Proebsting's algorithm referenced in Anton's original switch lowering paper (http://llvm.org/pubs/2007-05-31-Switch-Lowering.pdf, Ref 6). I believe that's optimal, which is cool, but it's also O(n^2) in all cases, whereas the previous algorithm was only O(n^2) in worst case and usually faster. I haven't measured the impact yet, but if it's bad we could cap the search at a certain table size, or use a different approach.
The code to extract bit test clusters uses the same approach. I haven't really thought about this one much, it might be better to do something simpler.
Also, this patch makes us bypass all the fancy lowering logic in -O0 mode, which should make that faster.
I'd be very happy to hear your thoughts on this.