Hi Jingyue and other reviewers,
I investigated the SeparateConstOffsetFromGEP pass Jingyue added on May this year and find it is also beneficial for the address calculation in AArch64 backend. So we want to enable it in AArch64 backend. But before that, I want to fix some problems.
I find there are some run time failures in llvm test-suite benchmark sqlite3. They are caused by following two reasons:
(1) "int64 modulo uint64" generates an incorrect result. See test case "@sign_mod_unsign_bug" in my patch. This is fixed by a cast from uint64_t to int64_t
(2) The OR instruction may generate incorrect result. See the test case "@test_or_bug". This is fixed with the improvement described later.
Also, I find an opportunity to improve this pass. As your comments say, currently it can only find one constant in one index. For the index having several constants like " (a + 4) + (b + 1)", enabling instcombine may not always work as following reasons:
(1) We find some passes may generate such cases after instcombine. E.g. We find the CFGSimplifyPass can even generate index from two constants like "a = 1 + 2", which the current SeparateConstOffsetFromGEP pass can only find constant 1. Also, there are many cases like "(a + 4) + 1" or even more complex.
(2) The ADD-OR situation like the test case "@test_or_bug" will not be optimized by instcombine. We can only find constant 1 in "(a<<2 + 4) | 1" and ignore constant 4.
This patch improves this pass to handle such situations. I don't modify too much code. As the current logic can work well to find one constant in one index, I just add some code to enable it to find more constants in one index. Previously, it finds a constant in either operand of a binary instruction, now it can find the constants in both operands.
I tested this patch on llvm test-suite, now all the benchmarks can pass. Also spec cpu2000 and spec cpu2006 are tested. I don't know what benchmarks you tested on NVPTX backend, But I think this improvement can benefit NVPTX as well. At least it has no regression.
Review please.
Thanks,
-Hao
s/outputs/returns/