Hi Tim,
In the SelectionDAG builder, we simply insert ZERO_EXTEND for CopyToReg, and only if TLI thinks ZERO_EXTEND is free.
This is not optimal if the virtual register is used by multiple compare instructions in other basic blocks.
The operand of a compare instruction can be promoted (extended) as a signed or unsigned value, depending on whether the predicate of the compare instruction is signed or unsigned (isSigned()/isUnsigned()). If multiple compare instructions use the same operand with different signedness, we unavoidably end up with conflicting extensions.
We have two optimization opportunities here:
- If the number of signed-predicate users is greater than the number of unsigned-predicate users, we prefer signed promotion; otherwise we use zero promotion.
- Predicates EQ and NE are neither signed nor unsigned, so they can be treated as either. If we know the incoming value is AssertSext, we should prefer signed promotion.
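
As a rough illustration of the two rules above, here is a standalone C++ sketch. It is not the code from the patch: the Predicate and ExtendKind enums, the helper names, and the preferredExtend function are all hypothetical stand-ins, and the way the two rules are combined (AssertSext only matters when every user is EQ/NE) is an assumption made for the example.

```cpp
#include <vector>

// Hypothetical stand-ins for this sketch; these are not the LLVM types.
enum class Predicate { EQ, NE, SLT, SGT, ULT, UGT };
enum class ExtendKind { Zero, Sign };

inline bool isSignedPred(Predicate P) {
  return P == Predicate::SLT || P == Predicate::SGT;
}

inline bool isUnsignedPred(Predicate P) {
  return P == Predicate::ULT || P == Predicate::UGT;
}

// Pick one extension kind for a value, given the predicates of all compare
// instructions that use it. IncomingIsAssertSext models "the incoming value
// is known to be AssertSext".
inline ExtendKind preferredExtend(const std::vector<Predicate> &Users,
                                  bool IncomingIsAssertSext) {
  unsigned NumSigned = 0, NumUnsigned = 0;
  for (Predicate P : Users) {
    if (isSignedPred(P))
      ++NumSigned;
    else if (isUnsignedPred(P))
      ++NumUnsigned;
    // EQ and NE are neither signed nor unsigned, so they do not vote.
  }

  // Rule 1: more signed users than unsigned users -> sign extension.
  if (NumSigned > NumUnsigned)
    return ExtendKind::Sign;

  // Rule 2 (assumed combination): if only EQ/NE users exist, follow the
  // incoming value and prefer sign extension when it is AssertSext.
  if (NumSigned == 0 && NumUnsigned == 0 && IncomingIsAssertSext)
    return ExtendKind::Sign;

  // Otherwise fall back to zero extension.
  return ExtendKind::Zero;
}
```

For example, a value used by two signed compares and one unsigned compare would get sign extension in this sketch, so only the single unsigned user needs its own extension.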
With these two optimizations, fewer sign/zero extension instructions are inserted, which exposes more opportunities to the Machine CSE pass in the back end.
On Cortex-A57, performance improvements are observed for the following two benchmarks:
spec.cpu2006.ref.470_lbm 1862.8973 -6.85%
spec.cpu2006.ref.444_namd 1549.8160 -5.43%
Thanks,
-Jiangning
I think this can be generalized slightly. It looks like the check against TRUNCATE is really just guarding against us deciding to apply sign extension to (assertsext LHS, i16) when we're actually emitting an i8 setcc.
I'd suggest something like