If ExpensiveCombines is enabled (which is the case with -O3 on the legacy PM and always on the new PM), InstCombine tries to compute the known bits of all instructions in the hope that all bits end up being known. This is the most expensive individual part of InstCombine.
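To make the fold concrete, here is a minimal sketch of the idea in Python. It is a toy model, not LLVM's actual C++ implementation: the `KnownBits` class, the helper names, and the 8-bit width are all assumptions chosen for illustration. Each value tracks which bits are known to be zero and which are known to be one, and an instruction can be replaced by a constant only when every bit is known.

```python
from dataclasses import dataclass

WIDTH = 8                   # toy bit width (assumption, for illustration only)
MASK = (1 << WIDTH) - 1


@dataclass(frozen=True)
class KnownBits:
    zero: int = 0           # bit set => that bit is known to be 0
    one: int = 0            # bit set => that bit is known to be 1

    def is_constant(self) -> bool:
        # Every bit is known, so the value could be folded to a constant.
        return (self.zero | self.one) == MASK

    def constant(self) -> int:
        assert self.is_constant()
        return self.one


def from_const(c: int) -> KnownBits:
    return KnownBits(zero=~c & MASK, one=c & MASK)


def known_and(a: KnownBits, b: KnownBits) -> KnownBits:
    # A result bit is 0 if it is 0 in either operand, 1 only if 1 in both.
    return KnownBits(zero=a.zero | b.zero, one=a.one & b.one)


def known_or(a: KnownBits, b: KnownBits) -> KnownBits:
    # A result bit is 1 if it is 1 in either operand, 0 only if 0 in both.
    return KnownBits(zero=a.zero & b.zero, one=a.one | b.one)


def known_shl_const(a: KnownBits, amt: int) -> KnownBits:
    # Shifting left by a constant makes the low `amt` bits known zero.
    low = (1 << amt) - 1
    return KnownBits(zero=((a.zero << amt) | low) & MASK,
                     one=(a.one << amt) & MASK)


x = KnownBits()  # nothing is known about x

# (x << 4) & 0x0F: the shift clears the low nibble, the mask clears the high one.
folded = known_and(known_shl_const(x, 4), from_const(0x0F))

# x & 0x0F: the low nibble is still unknown, so no fold is possible.
not_folded = known_and(x, from_const(0x0F))
```

In this model `folded.is_constant()` holds and the expression folds to 0, while `not_folded` keeps four unknown bits. The second case is by far the more common one in practice, which is why computing known bits for every instruction rarely pays off.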
How effective is it? If we add statistics for how often the constant folding succeeds and how many KnownBits calculations are performed, and then run test-suite, we get:
"instcombine.NumConstPropKnownBits": 642,
"instcombine.NumConstPropKnownBitsComputed": 18744965,
In other words, we get roughly one fold for every 29,000 KnownBits calculations. However, the truth is actually much worse: currently, known bits are computed before performing other folds, so there is a high chance that cases folded by known bits would also have been handled by other folds.
What happens if we compute known bits after all other folds (hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)?
"instcombine.NumConstPropKnownBits": 0,
"instcombine.NumConstPropKnownBitsComputed": 18105547,
So it turns out that, despite doing 18 million known bits calculations, the known bits fold does not do anything useful on test-suite. I was originally planning to move this into AggressiveInstCombine so it only runs once in the pipeline, but seeing this, I think we're better off removing it entirely.
As this is the only use of the "expensive combines" mechanism, it may be removed afterwards, but I'll leave that to a separate patch.
I believe this test case was the original motivation for having this fold.
However, I think this should be handled by InstCombineSimplifyDemanded, which we invoke in cases where we have a reasonable expectation of either demanded bits or known bits simplifications occurring (such as an "and" root, as is the case here). SimplifyDemanded currently doesn't handle this case due to what looks to me like an implementation bug: while SimplifyDemanded normally computes known bits for instructions it doesn't handle itself, it does not do so for some instructions it only partially handles (e.g. it handles a shift by a constant amount, but does not compute known bits if the shift amount is not constant).
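The shape of that gap can be sketched with a toy model in Python. This is not the real SimplifyDemanded code: the expression classes, the `fall_back` flag, and the example expression are all hypothetical, and the expression is not the test case from this patch. The point is only that a handler which covers the constant-shift case and returns "nothing known" otherwise loses information that a generic known bits computation would have found.

```python
from dataclasses import dataclass

WIDTH = 8                   # toy bit width (assumption)
MASK = (1 << WIDTH) - 1


@dataclass(frozen=True)
class Known:
    zero: int = 0           # bits known to be 0
    one: int = 0            # bits known to be 1


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class And:
    lhs: object
    rhs: object


@dataclass(frozen=True)
class Shl:
    value: object
    amount: object


def shl_const(v: Known, s: int) -> Known:
    # Shift by a constant: low `s` bits become known zero.
    return Known(((v.zero << s) | ((1 << s) - 1)) & MASK, (v.one << s) & MASK)


def trailing_known_zeros(k: Known) -> int:
    n = 0
    while n < WIDTH and (k.zero >> n) & 1:
        n += 1
    return n


def compute_known_bits(e) -> Known:
    """Generic analysis: handles every node kind, including variable shifts."""
    if isinstance(e, Const):
        return Known(~e.value & MASK, e.value & MASK)
    if isinstance(e, And):
        l, r = compute_known_bits(e.lhs), compute_known_bits(e.rhs)
        return Known(l.zero | r.zero, l.one & r.one)
    if isinstance(e, Shl):
        v = compute_known_bits(e.value)
        if isinstance(e.amount, Const):
            return shl_const(v, e.amount.value)
        # Unknown shift amount: known trailing zeros of the input survive.
        return Known((1 << trailing_known_zeros(v)) - 1, 0)
    return Known()          # Var: nothing known


def simplify_demanded(e, fall_back: bool) -> Known:
    """Toy stand-in for a demanded-bits walk that partially handles Shl."""
    if isinstance(e, And):
        l = simplify_demanded(e.lhs, fall_back)
        r = simplify_demanded(e.rhs, fall_back)
        return Known(l.zero | r.zero, l.one & r.one)
    if isinstance(e, Shl):
        if isinstance(e.amount, Const):
            return shl_const(simplify_demanded(e.value, fall_back), e.amount.value)
        # Partially handled case: without the fallback, all information is lost.
        return compute_known_bits(e) if fall_back else Known()
    return compute_known_bits(e)    # unhandled node kinds always fall back


# Hypothetical example with an "and" root: ((x & 0xF0) << y) & 0x0F
expr = And(Shl(And(Var("x"), Const(0xF0)), Var("y")), Const(0x0F))

without_fallback = simplify_demanded(expr, fall_back=False)
with_fallback = simplify_demanded(expr, fall_back=True)
```

In this model `with_fallback` has all eight bits known zero, so the expression could fold to 0, whereas `without_fallback` only knows the four bits cleared by the outer mask. Making the partially handled case fall back to the generic computation is the kind of fix suggested above.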