This is an archive of the discontinued LLVM Phabricator instance.

Memory writes overlap in the pipelined loop
Needs ReviewPublic

Authored by yan_luo on Mar 6 2019, 11:51 AM.

Details

Reviewers
bcahoon
Summary

The original loop is like:

%15 = …

LOOP:

%26 = PHI %15, %31 
…
STORE -> [%26 + 32, %26 + 32 + 64]  <- 64 is the access size
…
STORE -> [%26 + 0, %26 + 0 + 64]
…
STORE -> [%26 + 62, %26 + 62 + 64]
…
STORE -> [%26 + 94, %26 + 94 + 64]
…
%31 = ADD %26, 124

LOOP_END

The software pipelined loop is like:

%15 = …
%234 = COPY %15

PROLOG:

…
STORE -> [%234 + 32, %234 + 32 + 64]
…
STORE -> [%234 + 0, %234 + 0 + 64]
…
STORE -> [%234 + 62, %234 + 62 + 64]
…
%227 = ADD %234, 124

KERNEL:

%291 = PHI %227, %241
%292 = PHI %15, %291
…
%303 = COPY %291
%241 = ADD %303, 124
…
STORE -> [%303 + 32, %303 + 32 + 64]
…
STORE -> [%303 + 0, %303 + 0 + 64]
…
STORE -> [%292 + 94, %292 + 94 + 64] <- !!!
…
STORE -> [%303 + 62, %303 + 62 + 64]


KERNEL_END
EPILOG:

%299 = PHI %15, %303
STORE -> [%299 + 94, %299 + 94 + 64]

The new content in [offset 0, offset 0 + 64] in the next iteration is overwritten by the write [offset 94, offset94 + 64] in the previous iteration. The loop carried dependency could also exist between store and store instructions.

Diff Detail

Event Timeline

yan_luo created this revision.Mar 6 2019, 11:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2019, 11:51 AM

Currently I only have an in-house benchmark for this. Not sure I can share this, but if Brendon confirms this is a correct fix, I will try to come up with a reproducible testcase.

This needs to be a test somewhere in llvm/test/CodeGen/?arch?/.
Not sure if it should be an IR test, or MIR test.

It seems that Hexagon is the only target that is using SWP in the upstream. Not sure a reproducible testcase at our side is also reproducible in the Hexagon target. But I will give it a try.

Hi Yan,

Thanks for the patch! I agree that the change is needed. Let me see if I can generate a Hexagon test case also.

Thanks,
Brendon

lib/CodeGen/MachinePipeliner.cpp
3134

I think the following is equivalent:

if (!DI->mayStore())

return false;
yan_luo updated this revision to Diff 189727.Mar 7 2019, 8:57 AM
yan_luo marked an inline comment as done.

Only need to check mayStore() on DI.

lebedev.ri added inline comments.Mar 7 2019, 9:03 AM
lib/CodeGen/MachinePipeliner.cpp
3134–3135

The diff should not be relative to the previously uploaded patch.

yan_luo updated this revision to Diff 189736.Mar 7 2019, 9:47 AM

Updated the diff.

dantrushin added inline comments.
llvm/lib/CodeGen/MachinePipeliner.cpp
3170 ↗(On Diff #189736)

BTW, is this code correct?
I would expect something like

(OffsetS + DeltaS) <= (OffsetD + AccessSizeD) && (OffsetS + AccessSizeS + Delta) >= OffsetD

Am I missing something?

bcahoon added inline comments.Mar 14 2019, 11:02 AM
llvm/lib/CodeGen/MachinePipeliner.cpp
3170 ↗(On Diff #189736)

This is wrong. I need to upstream our change to this, which is:

return DeltaS != DeltaD || OffsetS + AccessSizeS < OffsetD + AccessSizeD;

to replace the if-else and return.