This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Do Not Move Loads/Stores Beyond Stacksave/Stackrestore Boundaries
ClosedPublic

Authored by qiongsiwu1 on Nov 23 2022, 9:08 AM.

Details

Summary

If left unchecked, the SLPVecrtorizer can move loads/stores below a stackrestore. The move can cause issues if the loads/stores have pointer operands from allocas that are reset by the stackrestores. This patch adds the dependency check.

The check is conservative, in that it does not check if the pointer operands of the loads/stores are actually from allocas that may be reset. We did not observe any SPECCPU2017 performance degradation so this simple fix seems sufficient.

The test could have been added to llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll, but that test has not been updated to use opaque pointers. I am not inclined to add tests that still use typed pointers, or to refactor llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll to use opaque pointers in this patch. If desired, I will open a different patch to refactor and consolidate the tests.

Diff Detail

Event Timeline

qiongsiwu1 created this revision.Nov 23 2022, 9:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2022, 9:08 AM
qiongsiwu1 requested review of this revision.Nov 23 2022, 9:08 AM
qiongsiwu1 edited the summary of this revision. (Show Details)Nov 23 2022, 9:11 AM
qiongsiwu1 edited the summary of this revision. (Show Details)Nov 23 2022, 9:25 AM

Updating code comment.

Ping for review. @reames could you also let me know if I should tag anyone else for a review? Thanks so much!

ABataev accepted this revision.Nov 28 2022, 5:33 AM

Thanks! I think it is good enough for now, later we can try to tweak, if required.

This revision is now accepted and ready to land.Nov 28 2022, 5:33 AM

@ABataev Thanks so much for the fast review! I will land the patch in a moment.

Can you explain why memory dependence wasn't enough to prevent the problematic reordering? I'm a bit fuzzy on the details - it's been a while - but I though that stacksave and stackrestore were modeled as modifying all memory to prevent this reordering?

Can you explain why memory dependence wasn't enough to prevent the problematic reordering? I'm a bit fuzzy on the details - it's been a while - but I though that stacksave and stackrestore were modeled as modifying all memory to prevent this reordering?

Did a quick search - looks like it does not work for static allocas, only for dynamic

qiongsiwu1 added a comment.EditedNov 28 2022, 5:49 PM

Can you explain why memory dependence wasn't enough to prevent the problematic reordering? I'm a bit fuzzy on the details - it's been a while - but I though that stacksave and stackrestore were modeled as modifying all memory to prevent this reordering?

I took a closer look at the new test case, and it seems that the vectorizer never added the four loads that concerns us to the memory dependency list of the stackrestore. It seemed that for some reason the loads memory locations were never treated as aliasing the stackrestore, hence the condition here was never true for the loads when the DepDest was the stackrestore. If I hardcode the result of isAliased to true as long as inst2 is a stackrestore (here), the new test passed without the fix included in this patch. I have not looked at BatchedAA closely to see what exactly is going on. I will do so if fixing BatchedAA is more desirable.

Did a quick search - looks like it does not work for static allocas, only for dynamic

Ah thanks so much for the fast reply! Maybe it is the alias analysis that is causing static allocas to fail, as stated above? BTW, by static alloca, do we mean the allocas that are defined in the entry block of a function and of a fixed size, as documented here? In the original test case where this issue was observed, the alloca was not defined in the entry block, but was of a known and fixed size (the alloca definition itself was nearly identical to this one).