This is an archive of the discontinued LLVM Phabricator instance.

[LoadStoreVectorizer] Consider if operation is faster than before
ClosedPublic

Authored by rampitec on Apr 21 2022, 5:17 PM.

Details

Summary

Compare a relative speed of misaligned accesses before and
after vectorization, not just check the new instruction is
not going to be slower.

Since no target now returns anything but 0 or 1 for Fast
argument of the allowsMisalignedMemoryAccesses this is still NFCI.

The subsequent patch will tune actual vaues of Fast on AMDGPU.

Diff Detail

Event Timeline

rampitec created this revision.Apr 21 2022, 5:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 5:17 PM
Herald added subscribers: hiraditya, tpr. · View Herald Transcript
rampitec requested review of this revision.Apr 21 2022, 5:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 5:17 PM
Herald added a subscriber: wdng. · View Herald Transcript
foad added inline comments.Apr 22 2022, 3:17 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1086

It is not at all clear to me whether it makes sense to compare FastBefore with Fast directly, or whether one of them needs to be multiplied by ChainSize. Especially since the definition of Fast is target-specific.

rampitec added inline comments.Apr 22 2022, 10:28 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1086

I thought about this, but 'fast' is not an actual memory transfer speed like Gb/s, it is a speed rank. Defining real memory speed will be hard to do, especially given alignment considerations. I do not think this is additive like a cost. It can be compared, but hardly added or multiplied. More so when an ultimate 'slow' is still zero.

rampitec updated this revision to Diff 424952.Apr 25 2022, 10:01 AM
arsenm added inline comments.May 17 2022, 9:04 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1081–1082

Maybe rename to RelativeCost or something?

rampitec updated this revision to Diff 430174.May 17 2022, 1:04 PM
rampitec marked an inline comment as done.

Renamed 'Fast' used in the file to 'RelativeSpeed'.

foad added inline comments.May 18 2022, 2:10 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1086

I still don't understand how generic code can meaningfully compare, for example, one load with speed rank 42 vs two loads with speed rank 99. Are you supposed to ignore the number of loads completely, and just compare the speed rank numbers?

rampitec added inline comments.May 18 2022, 10:30 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1086

Yes, the rank behaves more like a throughput. It is not precisely a throughput because nobody can guarantee it, but a similar concept.

arsenm requested changes to this revision.Nov 16 2022, 3:29 PM

Needs testcase

This revision now requires changes to proceed.Nov 16 2022, 3:29 PM

Needs testcase

This is NFCI. Even the next in stack (D124219) does not really change the codegen, just makes model more sane.

Needs testcase

This is NFCI. Even the next in stack (D124219) does not really change the codegen, just makes model more sane.

This is really NFCI, since no targets use it and return a relative speed yet. However, subsequent D124219 could have some tests for sub-dword LDS operations with alignment smaller than 4. I will add these to the D124219.
The other patch which should depend on this (incomplete, as this one hung for a long time) is the scratch handling, it is always faster to use a dword access no matter what.

arsenm accepted this revision.Nov 18 2022, 3:48 PM
arsenm added inline comments.
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1341

Single quote newline

This revision is now accepted and ready to land.Nov 18 2022, 3:48 PM
rampitec updated this revision to Diff 478383.Nov 28 2022, 2:43 PM
rampitec marked an inline comment as done.
This revision was landed with ongoing or failed builds.Nov 28 2022, 3:52 PM
This revision was automatically updated to reflect the committed changes.