During combining, ReduceLoadWdith is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset.
Details
- Reviewers
spatel RKSimon niravd efriedma javed.absar john.brawn fpichet - Commits
- rGdd8cd6d26b17: [DAGCombine] Fix ReduceLoadWidth for shifted offsets
rG597811e7a754: [DAGCombiner] Reduce load widths of shifted masks
rL351310: [DAGCombine] Fix ReduceLoadWidth for shifted offsets
rL340261: [DAGCombiner] Reduce load widths of shifted masks
Diff Detail
Event Timeline
test/CodeGen/ARM/and-load-combine.ll | ||
---|---|---|
44 | I'd recommend reverting these changes from diff - you can avoid the problem by moving all the arguments to the same line |
test/CodeGen/ARM/and-load-combine.ll | ||
---|---|---|
7 | It's not clear to me what the purpose of these test changes are (or the previous version). If I revert these changes then the test still pases. | |
test/CodeGen/X86/fp128-i128.ll | ||
44 ↗ | (On Diff #159674) | r338821 made changes to this test which means the patch fails on this file. |
Rebased and updated changes to the x86 codegen tests.
test/CodeGen/ARM/and-load-combine.ll | ||
---|---|---|
7 | It's just because I've used the update_llc script and the format was different. Moving the arguments onto a single line removed the unnecessary diffs produced by the script. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
9232 | On big-endian targets ShAmt has been adjusted by the time we get here, in which case the shifts we do here are wrong, e.g. many of the tests you've added are checking that in big-endian the load is eliminated which is not what should be happening. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
9232 | Ah! Thanks, I'm sure BE always trips me up here. |
I recently resynced an out of tree backend and I got a miscompile because of this commit.
My target is Big Endian.
The C code is :
void swap(unsigned *ptr) { *ptr = (*ptr & 0x0000ff00 ) << 8; }
The IR is
%0 = load i32, i32* %ptr, align 4 %and = and i32 %0, 65280 %shl = shl i32 %and, 8 store i32 %shl, i32* %ptr, align 4
The resulting DAG->dump() after the function ReduceLoadWidth returned will be:
t5: i32,ch = load<(load 4 from %ir.ptr)> t0, t2, undef:i32, main.c:8:4 t6: i32 = Constant<65280> t12: i32 = add nuw t2, Constant:i32<2>, main.c:8:4 t13: i32,ch = load<(load 1 from %ir.ptr + 2, align 2), zext from i8> t0, t12, undef:i32, main.c:8:4 t7: i32 = and t13, Constant:i32<255>, main.c:8:4 t9: i32 = shl t7, Constant:i32<8>, main.c:8:4 t10: ch = store<(store 4 into %ir.ptr)> t13:1, t9, t2, undef:i32, main.c:8:4
There is a missing shl by 8 missing. (yes there should be 2 shl to be combined later)
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
9252 ↗ | (On Diff #161681) | I think this line should be: |
Hi,
Could you please describe more accurately what you are expecting to see? I'm failing to see the issue.
For example:
given test.ll:
define dso_local void @swap(i32* %ptr) #0 {
entry:
%0 = load i32, i32* %ptr, align 4 %and = and i32 %0, 65280 %shl = shl i32 %and, 8 store i32 %shl, i32* %ptr, align 4 ret void
}
llc -mtriple=armv7-linux-gnu < test.ll
will give:
ldrb r1, [r0, #1] lsl r1, r1, #8 // 8 str r1, [r0] bx lr
it should be:
ldrb r1, [r0, #1] lsl r1, r1, #16 // 16 str r1, [r0] bx lr
Thanks. Yes, I think this is to do with the way I try to insert the new shl node, the DAG is just returning the existing node hence we only have one shl node. Should have a patch up tomorrow.
ActiveBits can be zero, this might throw an error no?
and (load i32, 0x000FF000)
This is a shifted mask. You could use APInt.countPopulation as to get the number of 1s.