It has been discovered that lowering the priority after VMEM loads
following long sequences of VALU instructions does not improve
Updated as suggested.
We don't know yet, I'm afraid. For the couple of real use cases we have for the issue this pass tries to address, counting instructions looks sufficient. Given also that proper cycle counting might not be trivial, we risk over-engineering here.
We could, but I'm not sure I know how that might be useful in practice.
Yes, that's not good. Combined the two new loops into one and simplified related code. Thanks.
Well, the instruction we create here is not just an auxiliary value that we coincidentally happen to use in both cases, if that's what you mean. That instruction must be spent in all cases, and we do want it to be the same instruction.
But you can construct it in the right place rather than constructing the instruction and then inserting it afterwards. You have the insert point you want, so can't you just construct it there? It's unusual to need insert/insertAfter.
I'm not sure I see how this answers the point. Granted, constructing the instruction at the points of insertion is possible, but as I said, in this case we have reasons to create it in a single place.
I'm not sure how to review this. Can you explain why this heuristic makes sense intuitively? Or do you have any benchmarks to back it up? Or preferably both? :)
If you're ignoring loops, can you describe succinctly what you *are* counting? Is it something like the minimum number of VALU instructions along any path from the start of the function to the VMEM load in question?
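To make the reading proposed in that question concrete, here is a minimal sketch of "minimum number of VALU instructions along any path from the start of the function to a given block". It illustrates the question only and does not describe what the patch actually computes. The names BlockNode, NumVALU, Succs and minVALUBefore are hypothetical, and the CFG is a plain vector rather than LLVM's MachineFunction. Because the per-block counts are non-negative, going around a loop can never shorten a path, so loops drop out of the minimum naturally.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a basic block: its VALU instruction count
// and the indices of its successor blocks.
struct BlockNode {
  unsigned NumVALU;
  std::vector<int> Succs;
};

constexpr unsigned Unreachable = std::numeric_limits<unsigned>::max();

// Minimum number of VALU instructions executed on any path from the entry
// block (index 0) to the start of block Target. Instructions inside Target
// itself are not counted. Uses Dijkstra's algorithm with block counts as
// edge weights.
unsigned minVALUBefore(const std::vector<BlockNode> &CFG, int Target) {
  if (CFG.empty())
    return Unreachable;
  std::vector<unsigned> Dist(CFG.size(), Unreachable);
  using Item = std::pair<unsigned, int>; // (distance so far, block index)
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> Work;
  Dist[0] = 0;
  Work.push(Item(0u, 0));
  while (!Work.empty()) {
    Item Top = Work.top();
    Work.pop();
    unsigned D = Top.first;
    int B = Top.second;
    if (D > Dist[B])
      continue; // Stale queue entry; a shorter path was already found.
    for (int S : CFG[B].Succs) {
      unsigned Cand = D + CFG[B].NumVALU;
      if (Cand < Dist[S]) {
        Dist[S] = Cand;
        Work.push(Item(Cand, S));
      }
    }
  }
  return Dist[Target];
}
```

A real pass would also need to count the VALU instructions that precede the load within its own block, which this sketch leaves out.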
I agree with Matt that it would be more normal to construct-and-insert at the same time, though I don't feel very strongly about it. I think I suggested before that you could pass the insertion point into BuildSetprioMI. In this case the insertion point would be something like MBBInfos[MBB].LastVMEMLoad ? std::next(MBBInfos[MBB].LastVMEMLoad) : MBB->begin().
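As a rough illustration of the construct-and-insert suggestion above, the following sketch models a basic block as a std::list of strings instead of using the real LLVM classes. Block, BlockInfo, insertionPoint and buildSetprio are hypothetical stand-ins; only the shape of the insertion-point expression is taken from the comment.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <optional>
#include <string>

// Hypothetical model of a basic block: an ordered list of instructions.
using Block = std::list<std::string>;

// Hypothetical per-block info; empty when the block has no VMEM load.
struct BlockInfo {
  std::optional<Block::iterator> LastVMEMLoad;
};

// Choose the insertion point: just after the last VMEM load if there is
// one, otherwise the beginning of the block.
Block::iterator insertionPoint(Block &B, const BlockInfo &Info) {
  return Info.LastVMEMLoad ? std::next(*Info.LastVMEMLoad) : B.begin();
}

// Create the instruction directly at the given insertion point, so no
// separate insert/insertAfter step is needed. Returns an iterator to it.
Block::iterator buildSetprio(Block &B, Block::iterator InsertPt, int Prio) {
  return B.insert(InsertPt, "s_setprio " + std::to_string(Prio));
}
```

With this shape, the caller computes the insertion point once and passes it in, and the builder stays the single place where the instruction is created.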
Right, that's the hardest part of it, isn't it? Because we don't have many use cases to go on, the idea was to start by replicating more or less the same counting logic we have in the other compiler, and then adjust things using feedback from people who can give it some proper testing.
Good point. Amended the comment.