Page MenuHomePhabricator

[OpenMP] Disable trap before unreachable for OpenMP device jobs
AbandonedPublic

Authored by jdoerfert on Mon, Jul 19, 12:06 PM.

Details

Summary

We want to fold more aggressively on the GPU and we therefore disable
the generation of llvm.trap before an unreachable by default.

Depends on D106299

Diff Detail

Unit TestsFailed

TimeTest
3,090 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
3,140 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
3,330 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
3,610 msx64 debian > libarcher.races::lock-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-unrelated.c
3,520 msx64 debian > libarcher.races::parallel-simple.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/parallel-simple.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/parallel-simple.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/parallel-simple.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/parallel-simple.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/parallel-simple.c
View Full Test Results (16 Failed)

Event Timeline

jdoerfert created this revision.Mon, Jul 19, 12:06 PM
jdoerfert requested review of this revision.Mon, Jul 19, 12:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptMon, Jul 19, 12:06 PM

What's the problem with emitting llvm.trap in various unreachable places? Wondering if it also affects translating assert to an llvm.trap

Fix copy&paste error

What's the problem with emitting llvm.trap in various unreachable places?

llvm.trap is preserved, thus branches to an llvm.trap are preserved.

Wondering if it also affects translating assert to an llvm.trap

no.

As an example, often end up with code like this right now:

  %26 = load i32, i32* addrspacecast (i32 addrspace(3)* @execution_param to i32*), align 4, !dbg !39, !tbaa !27
  %and.i13.i.i = and i32 %26, 4, !dbg !39
  %cmp.i14.not.i.i = icmp eq i32 %and.i13.i.i, 0, !dbg !39
  br i1 %cmp.i14.not.i.i, label %if.end.i129.i.i, label %__kmpc_parallel_51.exit.i, !dbg !39

if.end.i129.i.i:                                  ; preds = %_Z16DecParallelLevelbj.exit.i.i
  tail call void @llvm.trap() #10, !dbg !39
  unreachable, !dbg !39

which could be:

br label %__kmpc_parallel_51.exit.i, !dbg !39
JonChesterfield added a comment.EditedMon, Jul 19, 12:50 PM

llvm.trap is preserved, thus branches to an llvm.trap are preserved.

That's interesting. Consistent with IR in general,

template <bool Trap> int test(int x) {
  if (x < 42) {
    return x;
  } else {
    if (Trap)
      __builtin_trap();
    __builtin_unreachable();
  }
}

extern "C" {
int trap(int x) { return test<true>(x); }
int none(int x) { return test<false>(x); }
}

=>

define i32 @trap(i32 returned %0) {
  %2 = icmp slt i32 %0, 42
  br i1 %2, label %4, label %3

3:                                                ; preds = %1
  tail call void @llvm.trap() #3
  unreachable

4:                                                ; preds = %1
  ret i32 %0
}

define i32 @none(i32 returned %0)  {
  %2 = icmp slt i32 %0, 42
  tail call void @llvm.assume(i1 %2) #3
  ret i32 %0
}

So yes, we'll get faster codegen if we are willing to throw away traps followed by unreachable code.

If that's a legitimate transform to do, it seems like something we should do in instcombine, instead of a separate pass. I.e. fold trap, unreachable to unreachable.

Can we do that instead?

edit: seems reasonably likely that the transform always makes gpu code faster, so perhaps it could be an architecture-specific instcombine

llvm.trap is preserved, thus branches to an llvm.trap are preserved.

That's interesting. Consistent with IR in general,

template <bool Trap> int test(int x) {
  if (x < 42) {
    return x;
  } else {
    if (Trap)
      __builtin_trap();
    __builtin_unreachable();
  }
}

extern "C" {
int trap(int x) { return test<true>(x); }
int none(int x) { return test<false>(x); }
}

=>

define i32 @trap(i32 returned %0) {
  %2 = icmp slt i32 %0, 42
  br i1 %2, label %4, label %3

3:                                                ; preds = %1
  tail call void @llvm.trap() #3
  unreachable

4:                                                ; preds = %1
  ret i32 %0
}

define i32 @none(i32 returned %0)  {
  %2 = icmp slt i32 %0, 42
  tail call void @llvm.assume(i1 %2) #3
  ret i32 %0
}

So yes, we'll get faster codegen if we are willing to throw away traps followed by unreachable code.

If that's a legitimate transform to do, it seems like something we should do in instcombine, instead of a separate pass. I.e. fold trap, unreachable to unreachable.

Can we do that instead?

We could, but we should not. A trap inserted by assert or the user should stay (IMHO).
What we do here is to avoid new traps that were arbitrarily inserted with unreachables. There is no particular reason why some "reasons" for an unreachable insert a trap
and others do not, it's just "to help debugging".

jdoerfert abandoned this revision.Mon, Jul 19, 2:16 PM

Replaced by D106308

JonChesterfield added a comment.EditedMon, Jul 19, 2:18 PM

I'm not sure about that - we could tie instcombine to -O0 or some similar proxy for debugging vs performance - but in practice it's fairly likely that most traps are compiler inserted so it probably works out the same.

Conditional instcombine would let us remove much of the current logic for conditionally inserting traps which seems a win for implementation complexity.

Doesn't matter much for this patch, if D106299 lands then sure, let's switch it on for openmp GPU. If it goes the instcombine route then we don't need to toggle a switch.