This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Reduce load widths of shifted masks
ClosedPublic

Authored by samparker on Aug 8 2018, 3:43 AM.

Details

Summary

During combining, ReduceLoadWdith is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset.

Diff Detail

Repository
rL LLVM

Event Timeline

samparker created this revision.Aug 8 2018, 3:43 AM
RKSimon added inline comments.Aug 8 2018, 8:10 AM
test/CodeGen/ARM/and-load-combine.ll
44 ↗(On Diff #159674)

I'd recommend reverting these changes from diff - you can avoid the problem by moving all the arguments to the same line

samparker updated this revision to Diff 159866.Aug 9 2018, 12:47 AM

Moved arguments to occupy a single line.

dnsampaio added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9094 ↗(On Diff #159866)

ActiveBits can be zero, this might throw an error no?
and (load i32, 0x000FF000)
This is a shifted mask. You could use APInt.countPopulation as to get the number of 1s.

9094 ↗(On Diff #159866)

Ignore. u do lsrh.

john.brawn added inline comments.Aug 14 2018, 4:36 AM
test/CodeGen/ARM/and-load-combine.ll
7 ↗(On Diff #159866)

It's not clear to me what the purpose of these test changes are (or the previous version). If I revert these changes then the test still pases.

test/CodeGen/X86/fp128-i128.ll
44 ↗(On Diff #159674)

r338821 made changes to this test which means the patch fails on this file.

samparker updated this revision to Diff 160560.Aug 14 2018, 5:12 AM

Rebased and updated changes to the x86 codegen tests.

test/CodeGen/ARM/and-load-combine.ll
7 ↗(On Diff #159866)

It's just because I've used the update_llc script and the format was different. Moving the arguments onto a single line removed the unnecessary diffs produced by the script.

john.brawn requested changes to this revision.Aug 15 2018, 7:42 AM
john.brawn added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9232 ↗(On Diff #160560)

On big-endian targets ShAmt has been adjusted by the time we get here, in which case the shifts we do here are wrong, e.g. many of the tests you've added are checking that in big-endian the load is eliminated which is not what should be happening.

This revision now requires changes to proceed.Aug 15 2018, 7:42 AM
samparker added inline comments.Aug 16 2018, 3:09 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9232 ↗(On Diff #160560)

Ah! Thanks, I'm sure BE always trips me up here.

samparker updated this revision to Diff 160993.Aug 16 2018, 3:44 AM

Re-adjusted ShAmt for big endian targets.

This revision is now accepted and ready to land.Aug 20 2018, 9:51 AM
This revision was automatically updated to reflect the committed changes.
fpichet reopened this revision.Jan 13 2019, 7:37 PM
fpichet added a subscriber: fpichet.

I recently resynced an out of tree backend and I got a miscompile because of this commit.
My target is Big Endian.
The C code is :

void swap(unsigned *ptr) {
  *ptr = (*ptr & 0x0000ff00 ) << 8;		
}

The IR is

%0 = load i32, i32* %ptr, align 4
%and = and i32 %0, 65280
%shl = shl i32 %and, 8
store i32 %shl, i32* %ptr, align 4

The resulting DAG->dump() after the function ReduceLoadWidth returned will be:

t5: i32,ch = load<(load 4 from %ir.ptr)> t0, t2, undef:i32, main.c:8:4
t6: i32 = Constant<65280>
  t12: i32 = add nuw t2, Constant:i32<2>, main.c:8:4
t13: i32,ch = load<(load 1 from %ir.ptr + 2, align 2), zext from i8> t0, t12, undef:i32, main.c:8:4
    t7: i32 = and t13, Constant:i32<255>, main.c:8:4
  t9: i32 = shl t7, Constant:i32<8>, main.c:8:4
t10: ch = store<(store 4 into %ir.ptr)> t13:1, t9, t2, undef:i32, main.c:8:4

There is a missing shl by 8 missing. (yes there should be 2 shl to be combined later)

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9495

I think this line should be:
SDValue Shifted = DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),

This revision is now accepted and ready to land.Jan 13 2019, 7:37 PM

Hi,

Could you please describe more accurately what you are expecting to see? I'm failing to see the issue.

For example:

given test.ll:
define dso_local void @swap(i32* %ptr) #0 {
entry:

%0 = load i32, i32* %ptr, align 4
%and = and i32 %0, 65280
%shl = shl i32 %and, 8
store i32 %shl, i32* %ptr, align 4
ret void

}

llc -mtriple=armv7-linux-gnu < test.ll
will give:

ldrb    r1, [r0, #1]
lsl     r1, r1, #8   // 8
str     r1, [r0]
bx      lr

it should be:

ldrb    r1, [r0, #1]
lsl     r1, r1, #16  // 16
str     r1, [r0]
bx      lr

Thanks. Yes, I think this is to do with the way I try to insert the new shl node, the DAG is just returning the existing node hence we only have one shl node. Should have a patch up tomorrow.

samparker updated this revision to Diff 181768.Jan 15 2019, 3:46 AM

Changed the method of creating the final shl.

fpichet accepted this revision.Jan 15 2019, 7:42 AM

I tried your patch and the bug is gone. Thanks.

This revision was automatically updated to reflect the committed changes.