This is an archive of the discontinued LLVM Phabricator instance.

Use Flat For 64-bit Global Buffer
ClosedPublic

Authored by cfang on Dec 15 2015, 3:14 PM.

Details

Summary

Generate flat instructions for 64-bit global buffer instructions. In this patch, we defined
a subtarget feature: FlatForGlobal, and add this feature for CI. This means for CI, we will
generate a flat instruction for a 64-bit global buffer reference.

Diff Detail

Event Timeline

cfang updated this revision to Diff 42922.Dec 15 2015, 3:14 PM
cfang retitled this revision from to Use Flat For 64-bit Global Buffer.
cfang updated this object.

The commit message should be prefixed with: AMDGPU/SI:

lib/Target/AMDGPU/AMDGPU.td
240

We don't want to enable FeatureFlatForGlobal globally. It should only be used for HSA.

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
417–418 ↗(On Diff #42922)

I think this fix should be in a separate patch.

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
995–998

You should move the check for useFlatForGlobal() into SelectMUBUF() that way you only need to add it in one place.

1064–1068

You won't need this once you have the check in SelectMUBUF().

lib/Target/AMDGPU/AMDGPUSubtarget.h
73

You need to initialize this value to something in the AMDGPUSubtarget constructor. 'true' when targeting HSA, 'false' otherwise.

lib/Target/AMDGPU/VIInstructions.td
107–112

SInce the patterns below this will be used for both CI and VI now, they should be moved into CIInstructions.td

test/CodeGen/AMDGPU/ci-use-flat-for-global.ll
2–3

Should add a RUN line for non-hsa: -march=amdgcn instead of -mtriple=amdgcn--amdhsa to make sure we are still using the buffer instructions.

cfang updated this revision to Diff 43042.Dec 16 2015, 11:19 AM
cfang edited edge metadata.

Update based on Tom's comments.

lib/Target/AMDGPU/AMDGPUSubtarget.cpp
102–105

You need to set this value before you call initializeSubtargetDependencies() otherwise users won't be able to toggle this using -mattr=flat-for-global.

test/CodeGen/AMDGPU/ci-use-flat-for-global.ll
2–3

I would add two more run line here to test setting the non-default value for each target using -mattr=

7

You can drop the spir_kernel calling convention here.

30–37

Since we are disabling buffer instructions globally, I think you can use a much more simple test case:

void @test(i32 addrspace(1)* %out) {
entry:
  store i32 0, i32 addrspace(1)* %out
  ret void
}
cfang updated this revision to Diff 43053.Dec 16 2015, 1:03 PM

update based on Tom's reviews:

  1. set FlatForGlobal before "initializeSubtargetDependencies()"
  2. update LIT test for a simple one and also check around -mattr=(+/-)flat-for-global
tstellarAMD accepted this revision.Dec 17 2015, 8:44 AM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Dec 17 2015, 8:44 AM
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
81–82

The test case you added fails now, because the generation has not been initialized yet. I think we can just check for AmdHsaOS() here.

tstellarAMD requested changes to this revision.Dec 17 2015, 1:36 PM
tstellarAMD edited edge metadata.
This revision now requires changes to proceed.Dec 17 2015, 1:36 PM
cfang added a subscriber: cfang.Dec 17 2015, 1:49 PM

Hello, tom:

I am taking this afternoon again. Call you make the change or I can do it tomorrow!

Thanks

Changpeng

cfang updated this revision to Diff 43252.Dec 18 2015, 10:31 AM
cfang edited edge metadata.

Don't check the subtarget before the subtarget is set up.

cfang updated this revision to Diff 43415.Dec 21 2015, 4:41 PM
cfang edited edge metadata.

Change the way of initialization for FlatForGlobal to make -mattr=-flat-for-global work;
Fix LIT test failures due to generating flat instructions instead of buffer ones.

tstellarAMD accepted this revision.Dec 22 2015, 11:03 AM
tstellarAMD edited edge metadata.

LGTM.

I think it would be good to explain why we are making this change in the commit message. Here is the commit message I wrote when I was getting ready to commit an earlier version of the change. Feel free to use it:

AMDGPU/SI: Use flat for global load/store when targeting HSA

For some reason doing executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.

This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs.  The base pointer field in the resource descriptor
is set to zero and and the pointer is stored in the vaddr field.

This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
lib/Target/AMDGPU/CIInstructions.td
36–37

Unnecessary whitepsace change, please drop this before you commit.

This revision is now accepted and ready to land.Dec 22 2015, 11:03 AM
This revision was automatically updated to reflect the committed changes.

Commited but was reverted because of a lIT failure on llvm.dbg.ll.

But this is the test I modified and my local test of this passed.

What could happen?

Thanks;

Changpeng