Page MenuHomePhabricator

[LoadStoreVectorizer] Consider if operation is faster than before
Needs ReviewPublic

Authored by rampitec on Apr 21 2022, 5:17 PM.

Details

Reviewers
arsenm
foad
Summary

Compare a relative speed of misaligned accesses before and
after vectorization, not just check the new instruction is
not going to be slower.

Since no target now returns anything but 0 or 1 for Fast
argument of the allowsMisalignedMemoryAccesses this is still NFCI.

The subsequent patch will tune actual vaues of Fast on AMDGPU.

Diff Detail

Unit TestsFailed

TimeTest
60,260 msx64 debian > Clang.Driver::fsanitize.c
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -target x86_64-linux-gnu -fsanitize=undefined -fsanitize-trap=undefined /var/lib/buildkite-agent/builds/llvm-project/clang/test/Driver/fsanitize.c -### 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/clang/test/Driver/fsanitize.c --check-prefix=CHECK-UNDEFINED-TRAP
60,090 msx64 debian > LLVM.CodeGen/NVPTX::wmma.py
Script: -- : 'RUN: at line 5'; "/usr/bin/python3.9" /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/NVPTX/wmma.py --ptx=60 --gpu-arch=70 > /var/lib/buildkite-agent/builds/llvm-project/build/test/CodeGen/NVPTX/Output/wmma.py.tmp-ptx60-sm_70.ll
60,060 msx64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -msse4.2 -gline-tables-only -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/../ -O1 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/restore_stack.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/tsan/X86_64Config/Output/restore_stack.cpp.tmp && not /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/tsan/X86_64Config/Output/restore_stack.cpp.tmp 2>&1 | FileCheck /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/restore_stack.cpp

Event Timeline

rampitec created this revision.Apr 21 2022, 5:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 5:17 PM
Herald added subscribers: hiraditya, tpr. · View Herald Transcript
rampitec requested review of this revision.Apr 21 2022, 5:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 5:17 PM
Herald added a subscriber: wdng. · View Herald Transcript
foad added inline comments.Apr 22 2022, 3:17 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1089

It is not at all clear to me whether it makes sense to compare FastBefore with Fast directly, or whether one of them needs to be multiplied by ChainSize. Especially since the definition of Fast is target-specific.

rampitec added inline comments.Apr 22 2022, 10:28 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1089

I thought about this, but 'fast' is not an actual memory transfer speed like Gb/s, it is a speed rank. Defining real memory speed will be hard to do, especially given alignment considerations. I do not think this is additive like a cost. It can be compared, but hardly added or multiplied. More so when an ultimate 'slow' is still zero.

rampitec updated this revision to Diff 424952.Apr 25 2022, 10:01 AM
arsenm added inline comments.May 17 2022, 9:04 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1084–1085

Maybe rename to RelativeCost or something?

rampitec updated this revision to Diff 430174.May 17 2022, 1:04 PM
rampitec marked an inline comment as done.

Renamed 'Fast' used in the file to 'RelativeSpeed'.

foad added inline comments.May 18 2022, 2:10 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1089

I still don't understand how generic code can meaningfully compare, for example, one load with speed rank 42 vs two loads with speed rank 99. Are you supposed to ignore the number of loads completely, and just compare the speed rank numbers?

rampitec added inline comments.May 18 2022, 10:30 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
1089

Yes, the rank behaves more like a throughput. It is not precisely a throughput because nobody can guarantee it, but a similar concept.