This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX, LSV] Move the LSV optimization pass to later when the graph is cleaner
ClosedPublic

Authored by nouiz on Feb 11 2020, 2:14 PM.

Details

Summary

This allow it to recognize more loads as being consecutive when the load's address are complex at the start.

Diff Detail

Event Timeline

nouiz created this revision.Feb 11 2020, 2:14 PM
tra added inline comments.Feb 11 2020, 2:43 PM
llvm/test/CodeGen/NVPTX/vector-loads-complex.ll
12 ↗(On Diff #243996)

Couple of general notes:

I'd attempt to reduce IR to the bare minimum necessary to demonstrate/test intended behavior. This IR is not bad, but I think it may be reduced further to the point where we end up with just one vectorized 2-element load.

It's a good practice to 'anchor the checks with fixed start/end points

CHECK-LABEL: <function_name>

... # other checks on whatever happens inside the function.

CHECK: ret

It helps to delineate the search boundaries for FileCheck. Not a big deal now, as you only have one function, but I would not be surprised if more tests would be added over time.

14–18 ↗(On Diff #243996)

Negative matching is tricky. As written, FileCheck will only ensure that there's no ld.global between the fourth instance of ld.global.v2.u8 and the end of the file.

I'd start by attempting to place the checks close to the load statements they coalesce into. It's not always possible, but it probably works in this case.

As long as the number of loads is fixed/predictable, positively matching all of them makes negative matching unnecessary, so I'd just drop it.

nouiz updated this revision to Diff 244243.Feb 12 2020, 11:52 AM

I simplified the test. As it was way simpler, I added it to an existing test file.

tra accepted this revision.Feb 12 2020, 12:18 PM

Minor test nits. LGTM otherwise.
I'm curious, what does generated ptx for the function look before/after the patch.

llvm/test/CodeGen/NVPTX/vector-loads.ll
73

CHECK-LABEL

75

I'm not sure we need it. The check above verifies that the load did get vectorized, which was the purpose of this patch and the test.

Any condition when CHECK-NOT may fail (e.g. we're loading more than one, or it's an unrelated load/store) is unlikely to be relevant. It will also only look in-between ld.v2.u8 and ret, and such issues will not be detected before ld.v2.u8. If you want to ensure that ld.v2.u8 is the only relevant load, then you may need to have a separate test case which would fail if it's found more than one ld of any kind. Or you can add another CHECK-NOT: ld between the start of the function and CHECK: ld.v2.u8. I think that may work, but I'm not completely sure.

I'd just skip CHECK-NOT altogether.

92

CHECK: ld.v2.u8

98

CHECK: ret

This revision is now accepted and ready to land.Feb 12 2020, 12:18 PM

Before:

.visible .func foo_complex(

.param .b32 foo_complex_param_0

) // @foo_complex
{

.reg .b16       %rs<4>;
.reg .b32       %r<11>;

// %bb.0:

ld.param.u32    %r1, [foo_complex_param_0];
mov.u32         %r2, %tid.x;
mov.u32         %r3, %ctaid.x;
shl.b32         %r4, %r3, 9;
and.b32         %r5, %r4, -131072;
and.b32         %r6, %r4, 130560;
shl.b32         %r7, %r2, 1;
or.b32          %r8, %r6, %r7;
add.s32         %r9, %r1, %r5;
add.s32         %r10, %r9, %r8;
  • ld.u8 %rs1, [%r10+128]; ld.u8 %rs2, [%r10+129];** max.u16 %rs3, %rs1, %rs2; st.u8 [%r10+129], %rs3; ret; // -- End function

}

After:

.visible .func foo_complex(

.param .b32 foo_complex_param_0

) // @foo_complex
{

.reg .b16       %rs<4>;
.reg .b32       %r<11>;

// %bb.0:

ld.param.u32    %r1, [foo_complex_param_0];
mov.u32         %r2, %tid.x;
mov.u32         %r3, %ctaid.x;
shl.b32         %r4, %r3, 9;
and.b32         %r5, %r4, -131072;
and.b32         %r6, %r4, 130560;
shl.b32         %r7, %r2, 1;
or.b32          %r8, %r6, %r7;
add.s32         %r9, %r1, %r5;
add.s32         %r10, %r9, %r8;
  • ld.v2.u8 {%rs1, %rs2}, [%r10+128];** max.u16 %rs3, %rs1, %rs2; st.u8 [%r10+129], %rs3; ret; // -- End function

}

nouiz marked an inline comment as done.Feb 12 2020, 12:55 PM
nouiz added inline comments.
llvm/test/CodeGen/NVPTX/vector-loads.ll
73

Here I did as the other tests in this file. I'll fix all of them.

nouiz updated this revision to Diff 244253.Feb 12 2020, 12:55 PM

I did all the small changes.

tra accepted this revision.Feb 12 2020, 1:42 PM
This revision was automatically updated to reflect the committed changes.