This patch addresses the following issue that came up in the context of RISC-V benchmarks, but which affects other targets. Suppose you have several loads/stores that access array elements or struct fields with large offsets:
void foo(int *x, int *y) { y[0] = x[0x10001]; y[1] = x[0x10002]; y[2] = x[0x10003]; ... }
On a target such as RISC-V you cannot add 0x10001 to the address of X in a single instruction (the constant doesn't fit the 12-bit signed immediate), so the generated code is closer to something like this:
void foo(int *x, int *y) { y[0] = *(x+0x10000+1); y[1] = *(x+0x10000+2); y[2] = *(x+0x10000+3); ... }
But you can fold the +1, etc. into the immediate offset of the load/store instructions, so you effectively get something like this:
void foo(int *x, int *y) { int *base = &x[0x10001]; y[0] = base[0]; y[1] = base[1]; y[2] = base[2]; ... }
That optimization can only be performed, though, if the +1, +2, etc. are split from the 0x10000. Fortunately, there is already a target hook that indicates we want such an address split to occur: shouldConsiderGEPOffsetSplit. When that hook returns true, CodeGenPrepare.cpp adds the GEPs with large offsets to a list of GEPs to be split and ::splitLargeGEPOffsets splits them, in a process clearly illustrated in that method's comments. Unfortunately, the split currently only occurs when the base and the GEP are in different BBs, since the DAGCombiner would just recombine those in the same BB anyway.
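As a rough illustration of what such a split buys, here is a minimal C sketch that rewrites a group of large byte offsets as a shared base plus small residuals. It is not the logic of ::splitLargeGEPOffsets: the helper names (fits_simm12, split_offsets) are hypothetical, the first offset is simply assumed to serve as the base, and the -2048..2047 range is the 12-bit signed immediate mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v fits a 12-bit signed immediate (-2048..2047). */
static bool fits_simm12(int64_t v) { return v >= -2048 && v <= 2047; }

/*
 * Hypothetical helper: express each byte offset as *base + residual[i],
 * using the first offset as the shared base (an assumption made for this
 * sketch). Returns false if the list is empty or any residual does not
 * fit the immediate, in which case the group cannot share one base.
 */
static bool split_offsets(const int64_t *offsets, int n,
                          int64_t *base, int64_t *residual)
{
    if (n <= 0)
        return false;
    *base = offsets[0];
    for (int i = 0; i < n; i++) {
        residual[i] = offsets[i] - *base;
        if (!fits_simm12(residual[i]))
            return false;
    }
    return true;
}
```

Assuming a 4-byte int, the accesses x[0x10001], x[0x10002] and x[0x10003] have byte offsets 0x40004, 0x40008 and 0x4000C. None of them fits the immediate on its own, but relative to a base of 0x40004 the residuals are 0, 4 and 8, which do.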
This patch intends to:
- make the split also occur in the cases where the base and the GEP are in the same BB (that's often the case);
- ensure that the DAGCombiner doesn't reassociate them back again.
To achieve that second goal the patch adds a check before the reassociation of add instructions to see if the sum is used by loads or stores and if reassociating could break a reg+imm addressing mode for those loads/stores. This strategy seems to work, as shown in the tests.
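The idea behind that check can be sketched in C as follows. This is only an illustration under stated assumptions, not the patch's code: the function names are hypothetical, the 12-bit signed range is hard-coded where a real target would be asked about its own addressing modes, and whether the sum feeds a load or store is passed in as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v fits a 12-bit signed immediate (-2048..2047). */
static bool is_simm12(int64_t v) { return v >= -2048 && v <= 2047; }

/*
 * Hypothetical predicate for (add (add x, c1), c2).
 * Reassociating would produce (add x, c1 + c2). If the sum feeds a
 * load/store and c2 currently fits the immediate field while c1 + c2
 * does not, the reassociation would destroy a usable reg+imm
 * addressing mode, so it should be skipped.
 */
static bool reassociation_breaks_addr_mode(int64_t c1, int64_t c2,
                                           bool used_by_mem_op)
{
    if (!used_by_mem_op)
        return false;
    return is_simm12(c2) && !is_simm12(c1 + c2);
}
```

With c1 = 0x40000 and c2 = 4 on a memory access, the predicate returns true and the split form is kept. With two small constants, or when the sum is not used as an address, it returns false and reassociation can go ahead as before.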
A possible alternative would be to add a RISC-V specific pass to split the addresses, but solving this problem in a more generic fashion is probably preferable, as it avoids duplication of functionality and can benefit other targets.
(It might be possible to address https://bugs.llvm.org/show_bug.cgi?id=24447 by making the address mode checks more stringent for X86, etc.)
Doing some archaeology.
Should this be checking the uses of the (add (add x, offset1), offset2) expression? It seems to be checking the uses of (add x, offset1).