SimplifyDemandedBits currently early-outs for multi-use values below the root node being simplified (it just returns their known bits). This misses a number of optimizations, as there are plenty of cases where we can still simplify a multi-use value by demanding all of its elements/bits.
@lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and that the current increase in codegen is not a major concern.
Why not start with the early returns?