This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add optional bounds checking for scratch accesses
AbandonedPublic

Authored by critson on Apr 16 2019, 6:12 AM.

Details

Reviewers
arsenm
nhaehnle
Summary

Implement a pass, enabled by -amdgpu-scratch-bounds-checking,
which adds bounds checks to scratch accesses.
When this pass is enabled, out-of-bounds writes have no effect
and out-of-bounds reads return zero.
This is useful for GFX9 where hardware no longer performs
bounds checking on scratch accesses and hence page faults
result for out-of-bounds accesses by generated by shaders.

Change-Id: Id2ee4b1f32e70b6bde2541db755727b6a407721b

Diff Detail

Event Timeline

critson created this revision.Apr 16 2019, 6:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 16 2019, 6:12 AM

I don't understand the problem being solved here. Who/what is this intended to benefit? An out of bounds access is going to be undefined. I don't want to start trying to define in at an arbitrary point in the backend. Crashing on the invalid access is much easier problem to debug. This seems to be a partial replacement for asan? If this is intended as some user visible semantic fix, that requires more thought and probably needs to be a new address space.

lib/Target/AMDGPU/SIInsertScratchBounds.cpp
210–228

You can use TII::getAddNoCarry

243–244

I'm working on only allowing instructions that are terminators to modify exec, but this is introducing a new exec write in the middle of a block

328–329

This isn't going to catch scratch accesses without an MMO, you need to check the opcodes

331–332

I dislike building a worklist of all instructions in the function to handle. This shouldn't be hard to handle as you go through the function

341–342

This isn't going to be strong enough to guarantee that there will never be an out of bounds access.

This approach also isn't going to work in a callable function, in the presence of calls, or variable sized stack objects (which I'm planning on implementing)

If the goal is to have a semantically always dereferencable stack pointer, I think we need to create a new addrspace. It would then be a no-op addrspacecast from an alloca, which the frontend desiring safe stack access would be responsible for inserting. We would then need to track the current global stack size in the ABI somewhere, and selection would need to insert this kind of bounds check code based on that

critson added inline comments.May 7 2019, 7:55 AM
lib/Target/AMDGPU/SIInsertScratchBounds.cpp
341–342

On that basis do you have any opinions on the appropriate method for establishing stack size going forward?

I don't understand the problem being solved here. Who/what is this intended to benefit? An out of bounds access is going to be undefined.

Graphics APIs have "robust (buffer) access" settings in which out of bounds accesses must be guaranteed not to crash the program.

lib/Target/AMDGPU/SIInsertScratchBounds.cpp
341–342

You could emit the stack size as a TargetGlobalAddress referring to a special symbol, and ask PAL to fixup the relocations for that symbol at load time. Details would have to be figured out, but that works as a rough approach.

critson updated this revision to Diff 201929.May 29 2019, 8:33 AM
  • Remove worklist usage.
  • Move from global to subtarget option.
  • Add second pass to resolve scratch size later in backend.
  • Add missing lit tests.
critson marked 6 inline comments as done.May 29 2019, 8:38 AM
critson added inline comments.
lib/Target/AMDGPU/SIInsertScratchBounds.cpp
243–244

This code inserts new basic blocks such that only terminators modify exec, unless I missed something?

328–329

I don't think we can do anything sensible with instructions without MMO. They could be accessing any address space, so it would not be appropriate to apply scratch bounds checks to them.

critson marked an inline comment as done.Jun 6 2019, 2:00 AM

Ping.
I was wondering if I can get a second reading on this?
I believe I've addressed everything except the introduction of a new address space, as this seems somewhat heavyweight to me.
If the address space is considered absolutely necessary then I'll work on that next.

arsenm added a comment.Jun 6 2019, 7:26 PM

I don't understand the problem being solved here. Who/what is this intended to benefit? An out of bounds access is going to be undefined.

Graphics APIs have "robust (buffer) access" settings in which out of bounds accesses must be guaranteed not to crash the program.

And this is allowed to be set for stack objects?

lib/Target/AMDGPU/SIFixScratchSize.cpp
71 ↗(On Diff #201929)

No c string functions

arsenm added inline comments.Jun 6 2019, 7:29 PM
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
896

I would expect this kind of handling to be done as part of selection, not a pretty late pass

lib/Target/AMDGPU/SIFixScratchSize.cpp
62 ↗(On Diff #201929)

This whole pass doesn't work when there's dynamic stack usage?

65 ↗(On Diff #201929)

Can early exit if there is no stack usage

68 ↗(On Diff #201929)

Can't assume the instruction use

arsenm added a comment.Jun 6 2019, 8:20 PM

I don't understand the problem being solved here. Who/what is this intended to benefit? An out of bounds access is going to be undefined.

Graphics APIs have "robust (buffer) access" settings in which out of bounds accesses must be guaranteed not to crash the program.

There must be some restrictions on this? I don't think any kind of codegen pass like this will be strong enough to guarantee this. An IR optimization could have introduced a trap or something based on a detected invalid access. This probably won't happen today since things are overly conservative with non-0 address spaces, but this is something I want to fix

critson abandoned this revision.Oct 11 2020, 10:34 PM

This has been superseded by front-end work in graphics compiler.