Page MenuHomePhabricator

[AMDGPU] Ignore KILLs when forming clauses
AcceptedPublic

Authored by sebastian-ne on Jul 15 2021, 12:31 AM.

Details

Reviewers
arsenm
foad
Summary

KILL instructions are sometimes present and prevented hard
clauses from being formed.

Fix this by ignoring all meta instructions in clauses.

Diff Detail

Unit TestsFailed

TimeTest
2,590 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
2,500 msx64 debian > libarcher.critical::lock.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/lock.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/lock.c
2,450 msx64 debian > libarcher.parallel::parallel-simple.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp -latomic && env OMP_TOOL_VERBOSE_INIT=stderr env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp.log 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c --check-prefixes CHECK,TSAN_ON
2,720 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
2,680 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
View Full Test Results (21 Failed)

Event Timeline

sebastian-ne created this revision.Jul 15 2021, 12:31 AM
sebastian-ne requested review of this revision.Jul 15 2021, 12:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2021, 12:31 AM
foad added inline comments.Jul 15 2021, 3:04 AM
llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
80

What is SII for here?

132

I'm not sure if it's safe to use CI.Length instead of Size here. The difference is that if the clause has any trailing INTERNAL instructions then CI.Length includes them but Size does not. So with your changes, I would expect a sequence like this to be marked as a clause of length 4 instead of length 3:

load
internal
load
internal
sebastian-ne marked an inline comment as done.

Thanks for the review. You’re right about the too large clauses.
I fixed it and added it as a testcase.

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
80

Ups, I used that in an earlier version of this patch and forgot to remove it.

foad accepted this revision.Jul 15 2021, 6:30 AM

I think this looks fine, thanks.

A possible alternative approach would be to put something like this at the start of emitClause:

while (getHardClauseType(CI.Last) == HARDCLAUSE_INTERNAL)
  --CI.Last;
llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
61

Nit: maybe call them "meta instructions" here, for consistency with isMetaInstruction, and because "pseudo instruction" means something differrent in the AMDGPU backend.

119

"at the end"

191

Nit: do the == case first, otherwise the "else" case reads like a double negative.

This revision is now accepted and ready to land.Jul 15 2021, 6:30 AM
arsenm added inline comments.Jul 15 2021, 3:34 PM
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

I don't see a reason to emit a comment for these. Kills for example already have a comment for them

sebastian-ne marked 3 inline comments as done.

Fix review comments

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

The problem is that the AsmPrinter only handles top-level instructions, but not instructions inside bundles.
So, without the code here, printing a bundle containing a KILL will crash because KILL and others are unhandled in AMDGPUInstPrinter::printInstruction.

arsenm added inline comments.Jul 16 2021, 8:08 AM
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

We shouldn't be bundling these?

sebastian-ne added inline comments.Jul 16 2021, 8:20 AM
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

How do we create the clause then?
In the tests, we have

$sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0
KILL undef renamable $sgpr4
$sgpr3 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 4, 0

and we should be able to emit an s_clause 0x1 in front of the two memory instructions.

arsenm added inline comments.Jul 16 2021, 8:22 AM
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

Where did this kill come from? A kill of an undef source is pointless and can be deleted. The point of kill is to artifically extend live ranges anyway, so it can be moved outside of the bundle. Kills are ordinarily deleted after RA anyway

sebastian-ne added inline comments.Jul 16 2021, 8:47 AM
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

The case where I encountered it looks like this:

%470:sgpr_128 = S_LOAD_DWORDX4_IMM %469:sreg_64, 48, 0 :: (load (s128) from %ir.73, addrspace 4)
%25:sgpr_128 = S_LOAD_DWORDX4_IMM %3:sreg_64, 320, 0 :: (load (s128) from %ir.67, addrspace 4)
KILL %469.sub0:sreg_64, %469.sub1:sreg_64
%26:sgpr_128 = S_LOAD_DWORDX4_IMM %3:sreg_64, 480, 0 :: (load (s128) from %ir.70, addrspace 4)

The kill is inserted by the SIFormMemoryClauses pass.
In the above example, if flat scratch is enabled, the kill gets inserted after the third s_load, so a clause is inserted successfully later. Without flat scratch, no clause is formed.

sebastian-ne added inline comments.Jul 16 2021, 9:01 AM
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

Maybe SIFormMemoryClauses does not extend the potential clause to the last s_load because the register pressure estimate would get too high?
Therefore it inserts the KILL after the second s_load, which in turn prevents inserting a hard clause containing the latter two loads.

The would explain why the position of the KILL changes when flat scratch is enabled or disabled (enabling it frees the 4 SGPRs of the scratch SRD, so it has lower register pressure).

arsenm added inline comments.Jul 19 2021, 7:15 PM
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

SIFormMemoryClauses's pressure estimate is extremely crude, so I wouldn't put so much faith in it (I think the pass needs a replacement overall, I think the allocator really needs to be aware of the restriction). The kill is there as a hint to the allocator, but doesn't need to stick around afterwards

sebastian-ne added inline comments.Jul 20 2021, 1:07 AM
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
300

Ok, should we remove the kill instructions if we encounter them in SIInsertHardClauses?
But that doesn’t help for other meta instructions like DBG instructions.