Add a TLI hook that lets SelectionDAG fine-tune the conversion of CTPOP to a chain of "x & (x - 1)" when CTPOP isn't legal. The tuning in particular needs close review; I'm sure it could be refined.
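For reviewers less familiar with the trick, here is a minimal sketch in plain C++ of what such a chain computes. It is not the SelectionDAG code from this patch. It assumes the chain is used to answer a comparison such as ctpop(x) < k, and the names popcountLessThan and checkExamples are hypothetical. Each "x & (x - 1)" step clears the lowest set bit, so the length of the chain grows with k. That growth is the cost a target would want to tune.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when x has fewer than k set bits,
// without using a population-count instruction.
bool popcountLessThan(uint64_t x, unsigned k) {
  if (k == 0)
    return false; // a population count is never negative
  // Clear the lowest set bit up to k - 1 times.
  for (unsigned i = 1; i < k && x != 0; ++i)
    x &= x - 1;
  // If nothing is left, x had at most k - 1 bits set.
  return x == 0;
}

// A few spot checks of the sketch above.
void checkExamples() {
  assert(popcountLessThan(0x0, 1));   // ctpop = 0
  assert(!popcountLessThan(0x1, 1));  // ctpop = 1
  assert(popcountLessThan(0x8, 2));   // ctpop = 1
  assert(!popcountLessThan(0xA, 2));  // ctpop = 2
  assert(popcountLessThan(0xA, 3));   // ctpop = 2
  assert(!popcountLessThan(0xFF, 8)); // ctpop = 8
  assert(popcountLessThan(0xFF, 9));  // ctpop = 8
}
```

In a DAG expansion the loop would presumably be fully unrolled into AND/SUB nodes followed by one compare against zero, so a larger k means more nodes.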
Also, I'm not an LLVM expert, but it seems that InstCombine is missing some boundary optimizations that should be caught before SelectionDAG. In particular:
ctpop(x) > 0 --> x != 0
ctpop(x) > C, for any constant C >= element bit width --> always false
ctpop(x) < C, for any constant C >= element bit width + 1 --> always true
Any pointers on how/where to implement the above?
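As a sanity check on the three folds above, here is a small brute-force sketch over all 8-bit values. It only checks the arithmetic and is not InstCombine code. The names ctpop8, foldsHoldForI8 and checkFolds are hypothetical, and the range of constants tried is an arbitrary choice.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Population count of an 8-bit value.
static unsigned ctpop8(uint8_t x) {
  return static_cast<unsigned>(std::bitset<8>(x).count());
}

// Returns true if the three proposed folds hold for every 8-bit x.
bool foldsHoldForI8() {
  const unsigned Bits = 8; // element bit width under test
  for (unsigned v = 0; v <= 0xFF; ++v) {
    const uint8_t x = static_cast<uint8_t>(v);
    const unsigned pop = ctpop8(x);
    // ctpop(x) > 0  -->  x != 0
    if ((pop > 0) != (x != 0))
      return false;
    // ctpop(x) > C with C >= Bits  -->  always false
    for (unsigned c = Bits; c < Bits + 4; ++c)
      if (pop > c)
        return false;
    // ctpop(x) < C with C >= Bits + 1  -->  always true
    for (unsigned c = Bits + 1; c < Bits + 5; ++c)
      if (!(pop < c))
        return false;
  }
  return true;
}

// Usage: aborts if any fold is violated.
void checkFolds() { assert(foldsHoldForI8()); }
```

The folds rest only on 0 <= ctpop(x) <= bit width, so the same argument should carry over to wider integer and vector element types.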