This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Disable trap before unreachable for OpenMP device jobs
AbandonedPublic

Authored by jdoerfert on Jul 19 2021, 12:06 PM.

Details

Summary

We want to fold more aggressively on the GPU and we therefore disable
the generation of llvm.trap before an unreachable by default.

Depends on D106299

Diff Detail

Event Timeline

jdoerfert created this revision.Jul 19 2021, 12:06 PM
jdoerfert requested review of this revision.Jul 19 2021, 12:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2021, 12:06 PM

What's the problem with emitting llvm.trap in various unreachable places? Wondering if it also affects translating assert to an llvm.trap

Fix copy&paste error

What's the problem with emitting llvm.trap in various unreachable places?

llvm.trap is preserved, thus branches to an llvm.trap are preserved.

Wondering if it also affects translating assert to an llvm.trap

no.

As an example, often end up with code like this right now:

  %26 = load i32, i32* addrspacecast (i32 addrspace(3)* @execution_param to i32*), align 4, !dbg !39, !tbaa !27
  %and.i13.i.i = and i32 %26, 4, !dbg !39
  %cmp.i14.not.i.i = icmp eq i32 %and.i13.i.i, 0, !dbg !39
  br i1 %cmp.i14.not.i.i, label %if.end.i129.i.i, label %__kmpc_parallel_51.exit.i, !dbg !39

if.end.i129.i.i:                                  ; preds = %_Z16DecParallelLevelbj.exit.i.i
  tail call void @llvm.trap() #10, !dbg !39
  unreachable, !dbg !39

which could be:

br label %__kmpc_parallel_51.exit.i, !dbg !39
JonChesterfield added a comment.EditedJul 19 2021, 12:50 PM

llvm.trap is preserved, thus branches to an llvm.trap are preserved.

That's interesting. Consistent with IR in general,

template <bool Trap> int test(int x) {
  if (x < 42) {
    return x;
  } else {
    if (Trap)
      __builtin_trap();
    __builtin_unreachable();
  }
}

extern "C" {
int trap(int x) { return test<true>(x); }
int none(int x) { return test<false>(x); }
}

=>

define i32 @trap(i32 returned %0) {
  %2 = icmp slt i32 %0, 42
  br i1 %2, label %4, label %3

3:                                                ; preds = %1
  tail call void @llvm.trap() #3
  unreachable

4:                                                ; preds = %1
  ret i32 %0
}

define i32 @none(i32 returned %0)  {
  %2 = icmp slt i32 %0, 42
  tail call void @llvm.assume(i1 %2) #3
  ret i32 %0
}

So yes, we'll get faster codegen if we are willing to throw away traps followed by unreachable code.

If that's a legitimate transform to do, it seems like something we should do in instcombine, instead of a separate pass. I.e. fold trap, unreachable to unreachable.

Can we do that instead?

edit: seems reasonably likely that the transform always makes gpu code faster, so perhaps it could be an architecture-specific instcombine

llvm.trap is preserved, thus branches to an llvm.trap are preserved.

That's interesting. Consistent with IR in general,

template <bool Trap> int test(int x) {
  if (x < 42) {
    return x;
  } else {
    if (Trap)
      __builtin_trap();
    __builtin_unreachable();
  }
}

extern "C" {
int trap(int x) { return test<true>(x); }
int none(int x) { return test<false>(x); }
}

=>

define i32 @trap(i32 returned %0) {
  %2 = icmp slt i32 %0, 42
  br i1 %2, label %4, label %3

3:                                                ; preds = %1
  tail call void @llvm.trap() #3
  unreachable

4:                                                ; preds = %1
  ret i32 %0
}

define i32 @none(i32 returned %0)  {
  %2 = icmp slt i32 %0, 42
  tail call void @llvm.assume(i1 %2) #3
  ret i32 %0
}

So yes, we'll get faster codegen if we are willing to throw away traps followed by unreachable code.

If that's a legitimate transform to do, it seems like something we should do in instcombine, instead of a separate pass. I.e. fold trap, unreachable to unreachable.

Can we do that instead?

We could, but we should not. A trap inserted by assert or the user should stay (IMHO).
What we do here is to avoid new traps that were arbitrarily inserted with unreachables. There is no particular reason why some "reasons" for an unreachable insert a trap
and others do not, it's just "to help debugging".

jdoerfert abandoned this revision.Jul 19 2021, 2:16 PM

Replaced by D106308

JonChesterfield added a comment.EditedJul 19 2021, 2:18 PM

I'm not sure about that - we could tie instcombine to -O0 or some similar proxy for debugging vs performance - but in practice it's fairly likely that most traps are compiler inserted so it probably works out the same.

Conditional instcombine would let us remove much of the current logic for conditionally inserting traps which seems a win for implementation complexity.

Doesn't matter much for this patch, if D106299 lands then sure, let's switch it on for openmp GPU. If it goes the instcombine route then we don't need to toggle a switch.