InstCombine does the opposite fold, in the hope that the C l>>/<< Y expression
will be hoisted out of a loop if Y is loop-invariant and X is not.
But as the diffs here show, if it does not get hoisted,
the produced assembly is almost universally worse.
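
As a sanity check on the equivalence behind the fold, here is a minimal C sketch that compares both forms exhaustively on 8-bit values. The function names are hypothetical and used only for illustration, Y is assumed to be smaller than the bit width, and this is not code from the patch itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form produced by InstCombine: shift the constant, then mask X. */
static bool const_lshr_form(uint8_t x, uint8_t c, unsigned y) {
    return (x & (uint8_t)(c >> y)) != 0;
}

/* Form after this fold: shift X the other way, then mask with C. */
static bool x_shl_form(uint8_t x, uint8_t c, unsigned y) {
    return ((uint8_t)(x << y) & c) != 0;
}

/* The mirrored pair: C << Y becomes X l>> Y. */
static bool const_shl_form(uint8_t x, uint8_t c, unsigned y) {
    return (x & (uint8_t)(c << y)) != 0;
}

static bool x_lshr_form(uint8_t x, uint8_t c, unsigned y) {
    return ((uint8_t)(x >> y) & c) != 0;
}

/* Count disagreements over every 8-bit X and C, and every Y in [0, 8). */
static unsigned count_mismatches(void) {
    unsigned mismatches = 0;
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned c = 0; c < 256; ++c)
            for (unsigned y = 0; y < 8; ++y) {
                uint8_t x8 = (uint8_t)x, c8 = (uint8_t)c;
                if (const_lshr_form(x8, c8, y) != x_shl_form(x8, c8, y))
                    ++mismatches;
                if (const_shl_form(x8, c8, y) != x_lshr_form(x8, c8, y))
                    ++mismatches;
            }
    return mismatches;
}

/* Aborts if the two forms ever disagree. */
static void check_fold(void) {
    assert(count_mismatches() == 0);
}
```

Calling check_fold() should pass silently: a bit that is shifted out of X on one side corresponds to a bit that is shifted out of C on the other, so the two comparisons against zero agree.
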
Much like with my recent "hoist add/sub by/from const" patches,
we should get an almost universal win if we hoist the constant:
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by a constant, but by something else,
the live range of that something else may shrink.
Special care needs to be taken not to disturb the x86 BT / Hexagon tstbit
instruction patterns, and not to get into an endless combine loop.
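
For context, the bit-test patterns mentioned above correspond to single-bit tests at the source level. The following C sketch shows the two usual spellings of such a test; the function names are hypothetical, Y is assumed to be less than 32, and which spelling a given backend ends up matching is not something this sketch claims to show:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Single-bit test with a shifted one-bit mask: (X & (1 << Y)) != 0.
   Precondition (assumption of this sketch): y < 32. */
static bool bit_is_set_mask(uint32_t x, unsigned y) {
    return (x & (UINT32_C(1) << y)) != 0;
}

/* The same test with X shifted instead: ((X l>> Y) & 1) != 0.
   Precondition (assumption of this sketch): y < 32. */
static bool bit_is_set_shift(uint32_t x, unsigned y) {
    return ((x >> y) & UINT32_C(1)) != 0;
}

/* Check that both spellings agree for every bit position of one value. */
static void check_bit_tests(uint32_t x) {
    for (unsigned y = 0; y < 32; ++y)
        assert(bit_is_set_mask(x, y) == bit_is_set_shift(x, y));
}
```

Both functions compute the same predicate, which is why a generic rewrite between the two shapes has to be careful not to turn a form the target already selects well into one it does not.
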
From what I can tell:
- PPC changes are all good
- AMDGPU neutral
- AArch64 good except vectors (pattern with lshr improves, but shl symmetrically degrades)
- ARM all good, at least in general?
- X86 looks ok, the immediate gets encoded into the test instruction (vectors are a mess regardless)