Tries to perform
    (Xshr (add (i2^n X), (i2^n Y)), 2^(n-1))
      -> (icmp ult (add (trunc X), (trunc Y)), (trunc X))
where K is the shift amount 2^(n-1) and
- Only the K low bits of X and Y can be non-zero.
- The add is only used by the shr, or by iK (or narrower) truncates.
- The lshr type has more than 2 bits (other types are boolean math).
- K > 1

The original version of the patch instead performed
    (Xshr (add (i2^n X), (i2^n Y)), 2^(n-1))
      -> (llvm.uadd.with.overflow (trunc X), (trunc Y)).overflow
where
- Xshr can be ashr or lshr
- X and Y have at least 2^(n-1) leading zeroes.
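The equivalence behind the fold can be checked in plain C++. This is only an illustrative sketch, not code from the patch: it assumes n = 6 (a 64-bit wide type, K = 32), and the names carryViaWideShift, carryViaNarrowCompare and checkSamples are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: both operands fit in the low 32 bits of a 64-bit value, so the
// shifted sum is exactly the carry out of the 32-bit add.
// Mirrors (lshr (add X, Y), 32), assuming K = 32.
constexpr bool carryViaWideShift(uint32_t X, uint32_t Y) {
  return ((static_cast<uint64_t>(X) + static_cast<uint64_t>(Y)) >> 32) != 0;
}

// Narrow form: add in 32 bits (wrapping on overflow) and compare the result
// against one operand. The sum is smaller than X exactly when it wrapped.
// Mirrors (icmp ult (add (trunc X), (trunc Y)), (trunc X)).
constexpr bool carryViaNarrowCompare(uint32_t X, uint32_t Y) {
  return static_cast<uint32_t>(X + Y) < X;
}

// Hypothetical helper: asserts that both forms agree on boundary values.
inline void checkSamples() {
  const uint32_t Samples[] = {0u,          1u,          0x7FFFFFFFu,
                              0x80000000u, 0xFFFFFFFEu, 0xFFFFFFFFu};
  for (uint32_t X : Samples)
    for (uint32_t Y : Samples)
      assert(carryViaWideShift(X, Y) == carryViaNarrowCompare(X, Y));
}
```

Calling checkSamples() exercises the cases around the overflow boundary. The narrow form needs no 64-bit arithmetic, which is presumably the benefit on targets where the wide add is expensive.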
This seems to be a pattern that just comes from OpenCL front-ends, so adding DAG/GISel combines doesn't seem to be worth the complexity.
Original patch D107552 by @abinavpp - adapted to use (a + b < a) instead of uaddo following discussion on the review.
See this issue: https://github.com/RadeonOpenCompute/ROCm/issues/488