This is an archive of the discontinued LLVM Phabricator instance.

[LoopIdiom] let the pass deal with runtime memset size
ClosedPublic

Authored by eopXD on Aug 3 2021, 7:46 AM.

Details

Summary

The current LIR does not deal with runtime-determined memset-size. This patch
utilizes SCEV and check if the PointerStrideSCEV and the MemsetSizeSCEV are equal.
Before comparison the pass would try to fold the expression that is already
protected by the loop guard.

Testcase file memset-runtime.ll, memset-runtime-debug.ll added.

This patch deals with proper loop-idiom. Proceeding patch wants to deal with SCEV-s
that are inequal after folding with the loop guards.

Diff Detail

Event Timeline

eopXD created this revision.Aug 3 2021, 7:46 AM
eopXD requested review of this revision.Aug 3 2021, 7:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2021, 7:46 AM
eopXD updated this revision to Diff 363752.Aug 3 2021, 8:24 AM

Fix clang-format warnings.

eopXD updated this revision to Diff 363764.Aug 3 2021, 9:07 AM

Fix more clang-format warning.

eopXD edited the summary of this revision. (Show Details)Aug 3 2021, 9:45 AM

Thanks.
I would also like to ask to split this up further.
In particular, SCEVExprConverter looks scary.
Can we handle PositiveStrideSCEV != MemsetSizeSCEV case later?

eopXD updated this revision to Diff 363787.Aug 3 2021, 9:59 AM

Address comments.

Abort optimization if PositiveStrideSCEV != MemsetSizeSCEV.
Added TODO and will deal with it later.

eopXD edited the summary of this revision. (Show Details)Aug 3 2021, 10:03 AM
eopXD added a comment.Aug 3 2021, 10:17 AM

In particular, SCEVExprConverter looks scary.

I agree that adding a bug chunk is quite scary.
Its purpose is to reduce SCEV-s for comparison (convert sext to zext or fold smax constants)
I think most expression cases in the converter is not touched (as LIR is dealing with loops here).

However I currently can't think of a more simpler structure though, do you have any suggestions?
(Or is it acceptable since its a local structure in LIR?)

It might be good do change the function signatures (to take the ScalarEvolution argument, to take a SCEV instead of unsigned) in a preparatory NFC patch.
Some more catch-all inline comments..

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
220

Can this be changed in a preparatory NFC patch?

789–791

Can this be changed in a preparatory NFC patch?

923

Can you make this change globally trouough the file in a preparatory NFC patch?

(Though, somehow it feels conceptually wrong to pass it into processLoopStridedStore().)

954

Negative test missing

958

Negative test missing

959

Negative test missing

966

Test missing

973

Test missing

1002

Can this be changed in a preparatory NFC patch?

eopXD added a comment.Aug 4 2021, 2:12 AM

NFC patch for changing StoreSize from unsigned to SCEV: D104595

eopXD updated this revision to Diff 364156.Aug 4 2021, 9:22 AM

Rebase to main HEAD.

eopXD updated this revision to Diff 364380.Aug 5 2021, 1:33 AM

Landed D104595, rebase.

eopXD marked 3 inline comments as done.Aug 5 2021, 1:34 AM
eopXD added inline comments.Aug 5 2021, 7:32 AM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
923

NFC patch: D107570.

eopXD updated this revision to Diff 364491.Aug 5 2021, 8:19 AM
eopXD marked an inline comment as done.

Landed D107570, rebase.

eopXD updated this revision to Diff 364508.Aug 5 2021, 9:27 AM

Rebase.

eopXD updated this revision to Diff 364848.Aug 6 2021, 11:17 AM
eopXD marked 5 inline comments as done.

Address comments.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
954

isLegalStore already blocks non-zero address spaces. So this guard will never trigger and should be deleted.

958

Added memset-runtime-negative-cases.ll - @MemsetSize_LoopVariant.

959

isLegalStore already blocks non-affine SCEV for the pointer in the memset instruction. So this guard will never trigger and should be deleted.

966

Added memset-runtime.ll - @For_NegativeStride

973

Added memset-runtime-negative-cases.ll - @MemsetSize_Stride_Mismatch.

eopXD updated this revision to Diff 364934.Aug 6 2021, 11:24 PM

isLegalStore should be kept the same.

eopXD updated this revision to Diff 364966.EditedAug 7 2021, 9:18 AM

processLoopMemset does not goes through isLegalStore.
Checks and test cases added.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
954

Added memset-runtime-debug.ll - @NonZeroAddressSpace.

959

Added memset-runtime-debug.ll - @NonAffinePointer.

eopXD edited the summary of this revision. (Show Details)Aug 7 2021, 9:33 AM
Whitney added inline comments.Aug 9 2021, 6:53 AM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
925–981

initialize the variable?

929

Add a period at the end of a sentence.

948

Add a period at the end of a sentence.

955
// If the original StrideSCEV and MemsetSizeSCEV does not match, the pass
// will fold expressions that is covered by the loop guard at loop entry.
// The pass will compare again after the folding and proceed if equal.

That should be added in the future patch that implement that.

llvm/test/Transforms/LoopIdiom/memset-runtime-debug.ll
2

is that true? Or you manually generated this test case?
If not, please remove this line.

48

That's not ar[i]

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
3

Why do we need assert for this test case?

Whitney added a subscriber: bmahjour.
bmahjour added inline comments.Aug 9 2021, 10:09 AM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
950

isn't -> aren't

951

handle -> handles

954

Is this part of the comment talking about what you plan to do in the future? If so please either move it to the TODO section on line 977, or put a TODO marker here.

llvm/test/Transforms/LoopIdiom/memset-runtime-debug.ll
44

int *ar is actually the first parameter in the IR below.

115

Would be useful to have the psudo-C comment like other tests.

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
3

This one doesn't require asserts.

52

ar is the first parameter in the IR below

60

could you rename %1 and %2 to %n and %m and update the CHECKS?

eopXD updated this revision to Diff 365409.Aug 10 2021, 3:14 AM
eopXD marked 15 inline comments as done.

Address comments.

This revision is now accepted and ready to land.Aug 12 2021, 11:07 AM

Added some minor NFC comments.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
899

"with a constant size" can be removed, since the new code is trying to handle a non-constant size.

965

"MemsizeSizeSCEV" -> "MemsetSizeSCEV"

eopXD edited the summary of this revision. (Show Details)Aug 13 2021, 9:48 AM
eopXD updated this revision to Diff 366304.Aug 13 2021, 9:52 AM
eopXD marked 2 inline comments as done.

Address comments.

Thank you @lebedev.ri for guiding my huge code chunk in D104636 into this patch that is more concise.
Also thank you @Whitney, @bmahjour, @qianzhen for reviewing my code.
I am now going to commit it.

Next I would work on the case when MemsetSizeSCEV != PointerStrideSCEV. In this case, the SCEV-s can be folded given the loop entry is already protected by loop guard.

Thank you, i think this is about right for this chunk.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
930–934

Since non-constant case has to deal with this problem also, i suspect this should go away

llvm/test/Transforms/LoopIdiom/memset-runtime-debug.ll
4

It's not great that we don't check what the IR is.
At the very least i think you want FileCheck --implicit-check-not=memset %s

eopXD updated this revision to Diff 366313.Aug 13 2021, 11:00 AM
eopXD marked 2 inline comments as done.

Address comments.

Put debug output checks at very front, then IRs are checked.

Actually, we've talked about not needing the overflow check, but IIRC just saturating the number of bytes.
Can you point me at the code that does that?

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1078–1104

This doesn't look right.

  1. There absolutely should not be two almost identical functions
  2. Shouldn't this be saturating unsigned multiplication?

Actually, we've talked about not needing the overflow check, but IIRC just saturating the number of bytes.

May I ask what does IIRC mean?

Actually, we've talked about not needing the overflow check, but IIRC just saturating the number of bytes.

May I ask what does IIRC mean?

If I Remember Correctly

Actually, we've talked about not needing the overflow check, but IIRC just saturating the number of bytes.
Can you point me at the code that does that?

No code for saturation here.
I thought restricting to address space zero prevents the need of saturation.

eopXD updated this revision to Diff 366318.Aug 13 2021, 11:24 AM
eopXD marked an inline comment as done.

Address comment.
There is only one getNumBytes now.

Actually, we've talked about not needing the overflow check, but IIRC just saturating the number of bytes.
Can you point me at the code that does that?

No code for saturation here.
I thought restricting to address space zero prevents the need of saturation.

@efriedma does that sound right?

No code for saturation here.
I thought restricting to address space zero prevents the need of saturation.

@efriedma does that sound right?

I think I mentioned this on the other bug... but sounds right to me; overflow implies undefined behavior. If the amount does overflow, that means the memset is writing at least once to every byte in the address space. Among other things, that implies a store to null.

lebedev.ri accepted this revision.Aug 13 2021, 12:28 PM

No code for saturation here.
I thought restricting to address space zero prevents the need of saturation.

@efriedma does that sound right?

I think I mentioned this on the other bug... but sounds right to me; overflow implies undefined behavior. If the amount does overflow, that means the memset is writing at least once to every byte in the address space. Among other things, that implies a store to null.

Well, alrighy then.
@efriedma feel like stamping this too then?

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
930

Comment needs updating?

eopXD updated this revision to Diff 366408.Aug 14 2021, 1:00 AM

Rebase to main.

eopXD updated this revision to Diff 366412.Aug 14 2021, 1:41 AM
eopXD marked an inline comment as done.

Address comments.