If ExpensiveCombines is enabled (which is the case with -O3 on the legacy PM and always on the new PM), InstCombine tries to compute the known bits of all instructions in the hope that all bits end up being known. This is the most expensive individual part of InstCombine.
How effective is it? If we add some statistics on how often the constant folding succeeds and how many KnownBits calculations are performed and run test-suite we get:
"instcombine.NumConstPropKnownBits": 642, "instcombine.NumConstPropKnownBitsComputed": 18744965,
In other words, we get one fold for every 30000 KnownBits calculations. However, the truth is actually much worse: Currently, known bits are computed before performing other folds, so there is a high chance that cases that get folded by known bits would also have been handled by other folds.
What happens if we compute known bits after all other folds (hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)?
"instcombine.NumConstPropKnownBits": 0, "instcombine.NumConstPropKnownBitsComputed": 18105547,
So it turns out despite doing 18 million known bits calculations, the known bits fold does not do anything useful on test-suite. I was originally planning to move this into AggressiveInstCombine so it only runs once in the pipeline, but seeing this, I think we're better off removing it entirely.
As this is the only use of the "expensive combines" mechanism, it may be removed afterwards, but I'll leave that to a separate patch.