This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic
ClosedPublic

Authored by kerbowa on Apr 29 2022, 2:28 PM.

Details

Summary

Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If
there is a need to have highly optimized codegen and kernel developers have
knowledge of inter-wave runtime behavior which is unknown to the compiler this
builtin can be used to tune scheduling.

This intrinsic creates a barrier between scheduling regions. The immediate
parameter is a mask to determine the types of instructions that should be
prevented from crossing the sched_barrier. In this initial patch, there are only
two variations. A mask of 0 means that no instructions may be scheduled across
the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing
instructions may cross the sched_barrier.

Note that this intrinsic is only meant to work with the scheduling passes. Any
other transformations that may move code will not be impacted in the ways
described above.

Diff Detail

Event Timeline

kerbowa created this revision.Apr 29 2022, 2:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2022, 2:28 PM
kerbowa requested review of this revision.Apr 29 2022, 2:28 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 29 2022, 2:28 PM
kerbowa updated this revision to Diff 426169.Apr 29 2022, 2:48 PM

Add mir tests.

You do not handle masks other than 0 yet?

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
219

Since you are going to extend it I'd suggest this is -1. Then you will start carving bits outs of it. That way if someone start to use it it will still work after update.

222

Why not full i32? This is immediate anyway but you will have more bits for the future.

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
213

Use hex?

You do not handle masks other than 0 yet?

We handle 0 and 1 only.

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
219

Since the most common use case will be to block all instruction types I thought having that be MASK = 0 made the most sense. After that, we carve out bits for types of instructions that should be scheduled across it.

There may be modes where we restrict certain types of memops, so we cannot have MASK = 1 above changed to -1. Since this (MASK = 1) is allowing all ALU across we could define which bits mean VALU/SALU/MFMA etc and use that mask if you think it's better. I'm worried we won't be able to anticipate all the types that we could want to be maskable. It might be better to just have a single bit that can mean all ALU, or all MemOps, and so on to avoid this problem.

222

Good point thanks.

You do not handle masks other than 0 yet?

We handle 0 and 1 only.

Do you mean 1 is supported simply because it has side effects? If I understand it right you will need to remove this to support more flexible masks, right?

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
219

Ok. Let it be 1.

You do not handle masks other than 0 yet?

We handle 0 and 1 only.

Do you mean 1 is supported simply because it has side effects? If I understand it right you will need to remove this to support more flexible masks, right?

Yes.

rampitec accepted this revision.Apr 29 2022, 5:29 PM

You do not handle masks other than 0 yet?

We handle 0 and 1 only.

Do you mean 1 is supported simply because it has side effects? If I understand it right you will need to remove this to support more flexible masks, right?

Yes.

LGTM given that. But change imm to i32 before committing.

This revision is now accepted and ready to land.Apr 29 2022, 5:29 PM

Can you add a test to make sure the hazard recognizer and code size estimate don’t think this is a real instruction

kerbowa updated this revision to Diff 427747.May 6 2022, 2:22 PM

Use i32.
Output hex.
Fix hazard rec tests for pseudo instructions.

rampitec accepted this revision.May 6 2022, 2:27 PM

LGTM

This revision was automatically updated to reflect the committed changes.