When CTLZ is first promoted from i16 to i32 by type legalization, and then again from i32 to i64 during Legalization, two unfolded subtractions of constants remain in the output.
This was discussed on llvm-dev; see http://lists.llvm.org/pipermail/llvm-dev/2019-February/thread.html#130115. Sanjay pointed out that the optimization is missed because DAGCombiner::visitTRUNCATE() is restricted to handling this case pre-legalize only.
As an alternative to also enabling that DAGCombiner fold post-legalize, this patch addresses the issue directly: it detects this case during promotion of CTLZ and folds the new constant into the already present add (of a negative constant) instead of always creating a new sub.
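To make the arithmetic concrete, here is a small standalone sketch (plain C++, not the SelectionDAG code from the patch) of why the two constants can be merged. It assumes the usual promotion identity, ctlz at N bits equals ctlz of the zero-extended value at M bits minus (M - N). The helper names ctlz64, ctlz16Unfolded, ctlz16Folded and checkAllI16 are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Portable 64-bit count-leading-zeros; returns 64 for x == 0.
static unsigned ctlz64(uint64_t x) {
  unsigned n = 0;
  for (uint64_t bit = 1ULL << 63; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// What remains after two promotions without folding:
// the i32 -> i64 step subtracts 32, the i16 -> i32 step subtracts 16.
static int ctlz16Unfolded(uint16_t x) {
  int wide = static_cast<int>(ctlz64(x)); // x is zero-extended to 64 bits
  return (wide - 32) - 16;
}

// The same value with both constants merged into a single add of -48.
static int ctlz16Folded(uint16_t x) {
  int wide = static_cast<int>(ctlz64(x));
  return wide + (-48);
}

// Exhaustively check that both forms agree for every 16-bit input.
static void checkAllI16() {
  for (uint32_t v = 0; v <= 0xFFFF; ++v) {
    uint16_t x = static_cast<uint16_t>(v);
    assert(ctlz16Unfolded(x) == ctlz16Folded(x));
  }
}
```

Calling checkAllI16() confirms that the two forms agree on all 65536 inputs, including zero, where both yield 16. The sketch only models the integer arithmetic; it leaves out the intermediate truncate that sits between the two subtractions in the DAG.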
Does this look reasonable? This is NFC on SystemZ/SPEC, but it improves tests on both SystemZ and AMDGPU.