The goal here is to simplify funnel shifts based on demanded or known bits. There are three parts to this change:
- In InstCombineSimplifyDemanded, determine which bits in the operands are demanded based on the demanded bits of the result. Also determine which bits in the result are known based on known bits in the operands. This only works for a known shift amount. This is the primary change.
- SimplifyDemanded may replace operands of the funnel shift with undef. As such, InstCombineCalls is taught to replace a funnel shift that has one undef operand with either a shl or an lshr. This is also limited to known shift amounts. In principle this could also be applied to a variable shamt, but that would require handling the modular reduction of the shift amount as well as zero shifts, so it doesn't seem like a clear win.
- Finally, the changes in InstructionSimplify are added to consistently handle all undef operands, including on shamt and on both inputs.
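To make the first two points concrete, here is a rough sketch of the demanded-bits mapping for fshl with a known shift amount, modelled on plain 32-bit integers. It is not the code from this patch: `OperandDemand`, `demandedOperandBits` and `checkDemandIsSufficient` are hypothetical names used only for illustration, and the real implementation works on APInt at arbitrary bit widths.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned BitWidth = 32;

// Reference semantics of a 32-bit fshl: concatenate X:Y, shift left by Amt
// (taken modulo the bit width), and keep the high 32 bits.
uint32_t fshl(uint32_t X, uint32_t Y, unsigned Amt) {
  Amt %= BitWidth;
  return Amt == 0 ? X : (X << Amt) | (Y >> (BitWidth - Amt));
}

// Hypothetical helper type: the bits of each operand that can influence the
// demanded bits of the result.
struct OperandDemand {
  uint32_t X;
  uint32_t Y;
};

// For a known shift amount, result bits [Amt, 32) come from X and result bits
// [0, Amt) come from the top Amt bits of Y, so the demanded mask is simply
// shifted back onto each operand.
OperandDemand demandedOperandBits(uint32_t DemandedResult, unsigned Amt) {
  Amt %= BitWidth;
  if (Amt == 0)
    return {DemandedResult, 0}; // Zero shift: the result is X, Y is unused.
  return {DemandedResult >> Amt, DemandedResult << (BitWidth - Amt)};
}

// Sanity check: clearing the non-demanded operand bits must not change any
// demanded bit of the result.
void checkDemandIsSufficient(uint32_t X, uint32_t Y, unsigned Amt,
                             uint32_t DemandedResult) {
  OperandDemand D = demandedOperandBits(DemandedResult, Amt);
  assert((fshl(X, Y, Amt) & DemandedResult) ==
         (fshl(X & D.X, Y & D.Y, Amt) & DemandedResult));
}
```

With a shift amount of 8 and only the low byte of the result demanded, no bits of X are demanded, so X may become undef and the funnel shift behaves like lshr Y, 24 on the demanded bits. Conversely, if the low byte is not demanded, no bits of Y are demanded and the funnel shift behaves like shl X, 8.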
The background for this patch is https://github.com/rust-lang/rust/issues/56009, where a performance regression due to the switch to funnel shift intrinsics in Rust was reported. Unfortunately, this issue is not resolved by this patch (see final test cases in fsh.ll), because this would require simplifying a fsh with multiple users. There's special code in SimplifyMultipleUseDemandedBits that handles this for and/or/xor, but adding fsh to that list would require creating new shl/lshr instructions, which is not necessarily a win. Any ideas on how that case could be handled?
This could probably use an in-code comment explaining why a funnel shift with one undef operand is replaced with just a shift, so that somebody else doesn't replace it with a straight fshl(X, undef, C) -> undef fold (which would make a lot of sense in isolation) down the road.
Ditto for the similar replacement just below.