This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid splitting FLAT offsets in unsafe ways
ClosedPublic

Authored by foad on Jul 8 2020, 6:51 AM.

Details

Summary

As explained in the comment:

For a FLAT instruction the hardware decides whether to access
global/scratch/shared memory based on the high bits of vaddr,
ignoring the offset field, so we have to ensure that when we add
remainder to vaddr it still points into the same underlying object.
The easiest way to do that is to make sure that we split the offset
into two pieces that are both >= 0 or both <= 0.

In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have
an unsigned immediate offset field, so we can't use it to help split a
negative offset.

Diff Detail

Event Timeline

foad created this revision.Jul 8 2020, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2020, 6:51 AM
arsenm added a comment.Jul 8 2020, 7:19 AM

The mirror change is needed for globalisel

arsenm added inline comments.Jul 8 2020, 7:28 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

This limitation also only needs to be applied if AS == FLAT_ADDRESS

foad marked an inline comment as done.Jul 8 2020, 7:33 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing??

arsenm added inline comments.Jul 8 2020, 7:42 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

Correct. This is always the case pre-gfx9 which did not have the "global" flat instructions

arsenm added inline comments.Jul 8 2020, 7:46 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

Actually pre-gfx9 also didn't have flat offsets. However gfx10 does have a bug with flat offsets, so I think it would still be correct to model this correctly. The instruction patterns do accept either (and global instructions are only preferred through pattern priority)

foad marked an inline comment as done.Jul 8 2020, 8:39 AM

The mirror change is needed for globalisel

AMDGPUInstructionSelector::selectFlatOffsetImpl doesn't attempt to split offsets so I don't think there's anything to fix.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

This limitation also only needs to be applied if AS == FLAT_ADDRESS

I still don't get this. Surely if we're using a FLAT instruction, even if we know which specific address space the programmer is trying to access, we still have to avoid setting vaddr to an address that might point into the wrong aperture.

foad updated this revision to Diff 276792.Jul 9 2020, 11:46 AM

Rebase.
Fix silly mistake in checking for negative offsets.

arsenm added inline comments.Jul 9 2020, 11:47 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

My understanding was the aperture only means anything for private or local. If it's a global address, it's neither aperture and behaves as a normal instruction (i.e. there's no aperture for global pointers)

foad marked an inline comment as done.Jul 9 2020, 11:51 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

But in that case, you should still avoid making drastic changes to vaddr in case it ends up accidentally pointing *into* one of the apertures, when you wanted a global access. E.g. if you're accessing a global that happens to be just past the end of the private or local aperture.

arsenm added inline comments.Jul 9 2020, 2:27 PM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

I thought Nicolai mentioned this might not be possible?

I guess I don't understand why the aperture is so complicated and not just a couple of flags in the high, unused bits

Rebase.
Fix silly mistake in checking for negative offsets.

It's hard to see through the rebase, but did fixing the negative offset check add more tests? I assuming that the tests in the original patch did not capture this mistake, so it should warrant a new test.

foad added a comment.Jul 10 2020, 1:15 AM

Rebase.
Fix silly mistake in checking for negative offsets.

It's hard to see through the rebase, but did fixing the negative offset check add more tests? I assuming that the tests in the original patch did not capture this mistake, so it should warrant a new test.

There are no new tests. All the testing comes from staring at the changes in offset-split-flat.ll and offset-split-global.ll.

In retrospect I should have noticed that there was something wrong with the original patch, because it didn't cause any changes in offset-split-flat.ll.

foad marked an inline comment as done.Jul 14 2020, 1:14 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

I don't remember hearing that it wasn't possible. @nhaehnle ?

arsenm added inline comments.Jul 15 2020, 3:09 PM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

maskTrailingZeros

foad marked an inline comment as done.Jul 16 2020, 12:30 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705

That would round towards -infinity. As the comment says, we're deliberately rounding towards zero instead.

arsenm accepted this revision.Jul 16 2020, 11:24 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1698

Can you add a fixme to check if this is needed if we know the address isn't FLAT_ADDRESS?

This revision is now accepted and ready to land.Jul 16 2020, 11:24 AM
nhaehnle added inline comments.Jul 16 2020, 12:38 PM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1698

Yes, I'm pretty sure it's needed. If we generate a FLAT instruction, the hardware will check apertures regardless, and you could run into the following scenario:

end_of_scratch_aperture: 0x2'0000'0000
vaddr: 0x1'ffff'fff0
inst_offset: 32

The hardware executes the flat instruction, sees that vaddr falls into the scratch aperture, and executes the instruction as accessing scratch memory.

The only way to fix that would be if we knew that there is never any memory mapped directly after the end of any of the apertures.

This revision was automatically updated to reflect the committed changes.

I'm seeing (difficult to minimise) failure modes with gfx10 that look the same as those on gfx9 before this patch. I also see the comment:

"gfx10 does have a bug with flat offsets"

Is there a corresponding patch needed for gfx10?

foad added a comment.EditedOct 1 2020, 4:23 AM

I'm seeing (difficult to minimise) failure modes with gfx10 that look the same as those on gfx9 before this patch. I also see the comment:

"gfx10 does have a bug with flat offsets"

Is there a corresponding patch needed for gfx10?

This patch affected gfx10 just as much as gfx9, so there is no extra patch required for gfx10.

I am confused about the gfx10 bug. I assume it refers to this:

def FeatureFlatSegmentOffsetBug : SubtargetFeature<"flat-segment-offset-bug",
  "HasFlatSegmentOffsetBug",
  "true",
  "GFX10 bug, inst_offset ignored in flat segment"
>;

But (a) I think this is just the intended behaviour of the hardware, so I wouldn't call it a bug, and (b) I think gfx9 works the same way, otherwise there would have been no need for this patch in the first place!

Update: actually I think there is a gfx10-only hardware bug, but the description of that subtarget feature doesn't describe it very well.