Fixes:
- When rebuilding new indices, s/zext should be distributed to sub-expressions. e.g., sext(a +nsw (b +nsw 5)) = sext(a) + sext(b) + 5 but not sext(a + b) + 5. This also affects the logic of recursively looking for a constant offset, we need to include s/zext into the context of the searching.
- Function find should return the bitwidth of the constant offset instead of always sign-extending it to i64.
- Stop shortcutting zext'ed GEP indices. LLVM conceptually sign-extends GEP indices to pointer-size before computing the address. Therefore, gep base, zext(a + b) != gep base, a + b
Improvements:
- Add an optimization for splitting sext(a + b): if a + b is proven non-negative (e.g., used as an index of an inbound GEP) and one of a, b is non-negative, sext(a + b) = sext(a) + sext(b)
- Function Distributable checks whether both sext and zext can be distributed to operands of a binary operator. This helps us split zext(sext(a + b)) to zext(sext(a) + zext(sext(b)) when a + b does not signed or unsigned overflow.
Refactor:
- Merge some common logic of handling add/sub/or in find.
Add many tests in split-gep.ll and split-gep-and-gvn.ll to verify the changes we made.
Are XOR and AND handled properly? OR is limited to cases where it is equivalent to add so no problem there, but it seems like hoisting a constant out of a XOR or AND expression would be problematic.