This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Outline GEP chain cost modeling into new TTI interface - NFCI.
ClosedPublic

Authored by vdmitrie on Feb 24 2023, 7:20 PM.

Details

Summary

Cost modeling for GEPs should actually be target dependent but is currently
done inside SLP target-independent way.
Sinking it into TTI enables target dependent implementation.
This patch adds new TTI interface and implementation of the basic functionality
trying to retain existing cost modeling.

Diff Detail

Event Timeline

vdmitrie created this revision.Feb 24 2023, 7:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 7:20 PM
vdmitrie requested review of this revision.Feb 24 2023, 7:20 PM

I'd expect this to cause cost changes - have you had any luck creating tests demonstrating that?

llvm/include/llvm/Analysis/TargetTransformInfo.h
2024

Pass by const ref?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2817

Drop the nullptr (maybe convert to StringRef?) and add "Calculated costs for Tree" to the only existing dumpTreeCosts call

7228

This kind of thing always feels like a cheat - is the only reason you don't return TCC_Free here because we still need to do the dumpTreeCosts call below?

7251

It's amazing how even a patch that tries to pull functionality out of SLP increases the size and complexity of SLPVectorizer.cpp :-(

ABataev added inline comments.Feb 26 2023, 6:07 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7251

It not just tries to pull something out, it introduces new interfaces, which will improve cost modeling. This, of course, may increase the complexity. Unfortunate, but required.

vdmitrie updated this revision to Diff 500976.Feb 27 2023, 5:15 PM

Address comments. Rebase.

I'd expect this to cause cost changes - have you had any luck creating tests demonstrating that?

I was not able to make one. Existing tests that are sensitive to this cost modeling are:

  • for plain wide loads:

llvm/test/Transforms/SLPVectorizer/X86/remark_horcost.ll
llvm/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll

  • for gather loads:

llvm/test/Transforms/SLPVectorizer/X86/remark_gather-load-redux-cost.ll

But yep, I agree that this is not a great coverage.

vdmitrie marked 2 inline comments as done.Feb 27 2023, 5:34 PM
vdmitrie added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7228

This is just a short cut. If we have all the scalar GEPs to remain in vectorized code then vector cost would be the same. We could just return TCC_Free here but as you noted I'd like to have costs printed.

7251

I tried to add/reword some comments. May be that helps a bit.
We need to sink this target specific cost estimation down to TTI somehow (hence enable target specific implementations). I'm open for any suggestions that could help to solve this issue better way.

I'd expect this to cause cost changes - have you had any luck creating tests demonstrating that?

I was not able to make one. Existing tests that are sensitive to this cost modeling are:

  • for plain wide loads:

llvm/test/Transforms/SLPVectorizer/X86/remark_horcost.ll
llvm/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll

  • for gather loads:

llvm/test/Transforms/SLPVectorizer/X86/remark_gather-load-redux-cost.ll

But yep, I agree that this is not a great coverage.

Make them to emit remarks with costs estimations?

I'd expect this to cause cost changes - have you had any luck creating tests demonstrating that?

I was not able to make one. Existing tests that are sensitive to this cost modeling are:

  • for plain wide loads:

llvm/test/Transforms/SLPVectorizer/X86/remark_horcost.ll
llvm/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll

  • for gather loads:

llvm/test/Transforms/SLPVectorizer/X86/remark_gather-load-redux-cost.ll

But yep, I agree that this is not a great coverage.

Make them to emit remarks with costs estimations?

I'm not sure I understand. All these tests do already check for pass-remarks output which includes vec tree cost. Did you mean something else?

I'd expect this to cause cost changes - have you had any luck creating tests demonstrating that?

I was not able to make one. Existing tests that are sensitive to this cost modeling are:

  • for plain wide loads:

llvm/test/Transforms/SLPVectorizer/X86/remark_horcost.ll
llvm/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll

  • for gather loads:

llvm/test/Transforms/SLPVectorizer/X86/remark_gather-load-redux-cost.ll

But yep, I agree that this is not a great coverage.

Make them to emit remarks with costs estimations?

I'm not sure I understand. All these tests do already check for pass-remarks output which includes vec tree cost. Did you mean something else?

I thought they don't have remarks. Maybe add remarks to some other tests?

ABataev added inline comments.Feb 28 2023, 4:13 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
284–291

Better to represent them as bitfields:

bool IsSameBaseAddress:1 = false;
bool IsUniformStride:1 = false;
bool IsKnownStride:1 = false;
unsigned Reserved: 29;
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2816–2826

This change may be committed separately

7201

Use areAllUsersVectorized?

vdmitrie added inline comments.Feb 28 2023, 11:20 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
284–291

That's fair
Will do

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2816–2826
7201

Yep. That what I was considering. Just though about addressing it in a follow up patch.

vdmitrie updated this revision to Diff 501335.Feb 28 2023, 4:57 PM

Address comment.
Rebase.

I thought they don't have remarks. Maybe add remarks to some other tests?

Please let me know if you have some specific tests in mind to beef up with pass remark checks.
May be tests with specific patterns? Finding them by looking through every test manually is a bit tedious work.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7201

I experimented with replacing single use check with areAllUsersVectorized(). Yes, that's what we want but it makes the patch non-NFC. So I'd like to make that change separately.

ABataev added inline comments.Mar 1 2023, 6:12 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1045

Add an assert that all geps are from the same basic block?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7179

What if it is scatter vectorize node? VL0 still should be LoadInst.

vdmitrie added inline comments.Mar 1 2023, 8:34 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1045

will do

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7179

We do not call this routine for scatter vectorize loads node. We did that in the past but that was a bug (since we evaluated GEPs twice as the result). For scatter vectorize we evaluate GEPs only when calculating cost for tree node that correspond to pointer operands of the loads. The comment above explains exactly that distinction.

ABataev added inline comments.Mar 1 2023, 8:48 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7179

Could you add an assertion for the expected cases? Like ScatterVectorize is not expected, etc.

vdmitrie added inline comments.Mar 1 2023, 8:55 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7179

Sure

vdmitrie updated this revision to Diff 503141.Mar 7 2023, 1:20 PM

Address comments to add assertions.
Rebase.

ABataev added inline comments.Mar 8 2023, 8:10 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1047–1048

Can it happen at all?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7176

Better just return here with diff 0.

vdmitrie added inline comments.Mar 8 2023, 10:10 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1047–1048

Technically, yes. Ptrs is not prohibited to contain just single Base entry (which can be a value or a non-GEP instruction). How practical such use would be is the different kind of question.
If we make the assertion stricter by adding add It != Ptrs.end() we anyway have to add ugly fake use for "It" to avoid build time warning when assertions are suppressed.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7176

I'd like to not bypass the debug printer in this case.
So here are couple of options:
1 - leave it as is
2 -add debug printer prior to the early re
turn.
Pls let me know which way would you prefer.

ABataev added inline comments.Mar 8 2023, 10:16 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1047–1048

Can you add some extra asserts that if there is a single Base, other elements are Undefs/Poisons?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7176

Debug print and early exit

tschuett added inline comments.
llvm/include/llvm/Analysis/TargetTransformInfo.h
283

Can you static_assert that the size of PointerChainInfo is 4?

vdmitrie updated this revision to Diff 503579.Mar 8 2023, 5:33 PM
vdmitrie edited the summary of this revision. (Show Details)

Address comments

vdmitrie updated this revision to Diff 503589.Mar 8 2023, 6:21 PM

add zero initializer for reserved field as some compilers complain

ABataev accepted this revision.Mar 9 2023, 6:28 AM

LG with a nit

llvm/include/llvm/Analysis/TargetTransformInfo.h
316

Add the meaningful message

This revision is now accepted and ready to land.Mar 9 2023, 6:28 AM
vdmitrie added inline comments.Mar 9 2023, 12:12 PM
llvm/include/llvm/Analysis/TargetTransformInfo.h
316

Yep. I'll add it. I was not sure about wording. How about "Was size increase justified?" ?

BTW. I need to make another change to the struct (bool => unsigned) as on Windows that works differently and the size appears 8.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1055

There is actually a problem with this assertion. I suspected about it but only revealed for sure when tried to test the patch more extensively. The problem is that it is very possible that GEPS are not in the same block.
The only guaranteed is that their current uses (loads or stores) are all in the same block.
Imagine we have some struct with 4 fields of same size.
It is possible that 2nd field is written in one block, and all four fields being written in another block. When we try to vectorize these four stores we may have definition for GEP of 2d element store coming from strictly dominating block.
Any ideas about how this should be handled?

ABataev added inline comments.Mar 9 2023, 1:32 PM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1055

For the vectorized blocks it is not possible - we expect that vectorized GEPs are from the same block, otherwise they will be gathered

vdmitrie added inline comments.Mar 9 2023, 1:54 PM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1055

For GEPs that are operands of scatter vectorize node we probably have this check. I did not verify that but I'm not arguing that it is not true.
We definitely have problem for plain (wide) load/stores. We never build dedicated GEP vec-tree node for them and we never check in SLP vectorizer their pointer operands are defined all in same block when we vectorize loads/stores.

ABataev added inline comments.Mar 9 2023, 1:56 PM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1055

Right, for vectorized loads/store we do not do it, we just don't need it. Then you need to remove it.

vdmitrie updated this revision to Diff 504767.Mar 13 2023, 11:31 AM

Fixed issues revealed with more extensive test run.
Removed asserts from base implementation. Same GEPs origin is not guaranteed and and I don't find asserting for kinds of incoming pointers as quite useful.
Added assertion that Base is provided if sameBase is set.
PointersChainInfo changed to satisfy static assert condition on Windows.
Rebased.