Page MenuHomePhabricator

[amdgpu] Add codegen support for HIP dynamic shared memory.
ClosedPublic

Authored by hliao on Jun 24 2020, 12:53 PM.

Details

Summary
  • HIP uses an unsized extern array extern __shared__ T s[] to declare the dynamic shared memory, which size is not known at the compile time.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
arsenm added inline comments.Aug 7 2020, 1:36 PM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
34

This doesn't need to be MaybeAlign, just Align. Also expand to Dynamic?

Also should probably elaborate that this is used for the case where a dynamically sized global is used

94

Set is the wrong word here; ensureDynamicLDSAlign()?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5585

This should check if the allocated size of GV->getValueType() is 0, not special case 0 sized arrays

5589

Should be the alignment of the aggregate itself, not the element type

arsenm added a comment.Aug 7 2020, 1:49 PM

For the globalisel part, you'll need D84638 and the global lowering should introduce the intrinsic, not the machine pseudo

arsenm added inline comments.Aug 7 2020, 2:01 PM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
34

This also needs to be added to the MachineFunctionInfo serialization

69

I think having this here actually breaks the calculation of the total size for all of the statically known globals

hliao updated this revision to Diff 284255.Aug 9 2020, 11:11 PM

Add GlobalISel and MIR support.

hliao marked an inline comment as done.Aug 9 2020, 11:15 PM
hliao added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
68

To report the LDS usage accurately, the final LDSSize need to count the padding due to dynamic shared memory alignment. Now, the re-alignment on LDS is explicitly done before the final instruction selection.

69

The re-alignment is done explicitly just before final instruction selection and just once. BTW, getLDSSize should be only valid after instruction selection.

arsenm added inline comments.Aug 10 2020, 12:54 PM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2808–2811 ↗(On Diff #284255)

I think these should remain distinct queries/fields, not fixed up at an arbitrary point. GlobalISel will miss this for example. The asm printer would query the kind that accounts for the dynamic padding

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284

This should not special case 0 sized arrays and should check the allocated size

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
38

This can still just be Align

96

This isn't a set since it does more than set the value. ensureDynamicLDSAlign?

102–103

This shouldn't be a mutation, but return the aligned up size.

totalLDSAllocSize()?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5585

This should not special case 0 sized arrays. This is 0 allocation size

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll
50

The default should be 1

hliao marked an inline comment as done.Aug 10 2020, 1:05 PM
hliao added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2808–2811 ↗(On Diff #284255)

GlobalISel calls adjustLDSSizeForDynLDSAlign similarly before the finalization of ISel.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284

I tend to be restrictive here to follow how that is used in HIP, zero-sized array always has zero allocated size. If other clients need similar usage but general zero-allocated type, we may enhance accordingly.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
102–103

don't we need to report the accurate usage of LDS? Does that alignment padding need counting as well for the final LDSSize?

arsenm added inline comments.Aug 10 2020, 1:07 PM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2808–2811 ↗(On Diff #284255)

Having extra state that needs to be made consistent is bad. It's better to just track the two independent fields and do the roundup when needed at the end

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284

Types don't mean anything. Any 0 sized globals are getting allocated to the same address. We're just going to miscompile other 0 sized types. We have no reason to treat other 0 sized types differently

2286–2287

Should use the aggregate alignment, not the element

hliao updated this revision to Diff 286135.Aug 17 2020, 1:46 PM

Add preFinalizeLowering so that both DAGISel and GISel shares the same path
to adjust LDS size.

hliao marked an inline comment as done.Aug 17 2020, 1:48 PM
hliao added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2286–2287

Propose to add preFinalizeLowering before pseudo instruction expansion so that both GISel and DAGISel have the chance to adjust LDS size.

arsenm added inline comments.Aug 17 2020, 2:14 PM
llvm/include/llvm/CodeGen/TargetLowering.h
2794 ↗(On Diff #286135)

The last thing we need is more callbacks called at random points

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2286–2287

It's really cleaner to just have this compute what you care about at the point you care about it. Having a point where this needs to be made consistent is both worse from a serialization perspective, and from an optimization point since theoretically we could spill into the padding later

hliao added inline comments.Aug 17 2020, 2:40 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2286–2287

But, as we discuss this is a short-term solution before the linker could perform the per-kernel LDS resolution. As the spilling to LDS is not implemented yet, this short-term solution should be kept as simple as possible and finally reverted. To pad LDS for the shared memory array, we have to wait until all static LDS ones are allocated. That's should be the point where amdgcn_groupstatcisize is about to be expanded, i.e. before finalizing lowering.

arsenm added inline comments.Aug 17 2020, 2:44 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2286–2287

Short term workaround or not, the less mutable state the better. I see no advantage to accumulating the allocated + padding size in a single variable vs. keeping the two separate. You have to track both in MFI anyway, and accumulating like this loses information.

hliao updated this revision to Diff 286197.Aug 17 2020, 9:38 PM

Remove adjustLDSSizeForDynLDSAlign.

arsenm added inline comments.Aug 18 2020, 3:28 PM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
41

Leftover comment: Just before the final selection, LDSSize is adjusted accordingly.

42

This doesn't need to be MaybeAlign, it can just be Align and default to 1

102–103

ensureDynamicLDSAlign(), and don't need to conditionally set it. This also does not need to mutate LDSSize

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir
126

Should be 1

hliao marked 4 inline comments as done.Aug 18 2020, 8:10 PM
hliao updated this revision to Diff 286463.Aug 18 2020, 8:10 PM

Revise following comments.

hliao updated this revision to Diff 286555.Aug 19 2020, 7:05 AM

Minor coding style fix.

arsenm added inline comments.Aug 19 2020, 8:48 AM
llvm/include/llvm/CodeGen/MIRYamlMapping.h
168–171

Should add parser tests for these cases

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59

This is an independent field and should not be changed here

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
99

This should still not be modifying LDSSize. This is still missing an independent query to give the static + rounded size

hliao added inline comments.Aug 19 2020, 9:10 AM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59

As the sequence of static LDS allocation and dynamic LDS alignment updates are processed in the program order or reverse of that order, we need to collect all static LDS usage and dynamic LDS alignment. As we remove the previous the one single point adjustment, we need to update LDSSizze if there's any static LDS allocation or dynamic LDS alignment updates.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
99

Valid LDS queries should be done after instruction selection. LDSSize is ONLY modified within instruction selection through static LDS allocation and dynamic LDS alignment update.

hliao added inline comments.Aug 19 2020, 9:14 AM
llvm/include/llvm/CodeGen/MIRYamlMapping.h
168–171

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir covers that.

arsenm added inline comments.Aug 19 2020, 9:27 AM
llvm/include/llvm/CodeGen/MIRYamlMapping.h
168–171

It doesn't cover the error cases

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59

OK yes, this is in the right place now. However,r it should be where the alignment is updated

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5588

This logic should be moved into allocateLDSGlobal

hliao added inline comments.Aug 19 2020, 9:39 AM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59

As we don't know which one will be processed last, we need to update LDSSize in both cases to ensure the correct one is calculated.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5588

the allocation of dynamic LDS is not handled by the compiler. We only collect the alignment.

arsenm added inline comments.Aug 19 2020, 9:49 AM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59

This is actually terrible and we're going to burn extra padding, but I guess it's conservatively correct

hliao updated this revision to Diff 286591.Aug 19 2020, 9:50 AM

Add the invalid case for alignment parsing.

hliao marked an inline comment as done.Aug 19 2020, 9:50 AM
hliao added inline comments.
llvm/include/llvm/CodeGen/MIRYamlMapping.h
168–171

Add an invalid case.

hliao marked an inline comment as done.Aug 19 2020, 9:54 AM
hliao added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59

By maintaining StaticLDSSize, the padding for dynamic LDS is done only once. However, we need to continuously update it if there's any static LDS allocation or dynamic LDS alignment updates.

hliao updated this revision to Diff 286819.Aug 20 2020, 7:48 AM

Rebase.

arsenm added inline comments.Aug 20 2020, 12:55 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284

Should get global's alignment, not just the type

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5588

This should take the alignment from the global, not just the type alignment

llvm/test/CodeGen/AMDGPU/GlobalISel/hip.extern.shared.array.ll
11

Needs some tests with larger explicit alignments

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-dynlds-align-invalid-case.mir
2–3

This isn't XFAIL, it's run with not and check the error message output

hliao updated this revision to Diff 286896.Aug 20 2020, 1:41 PM

Check the explicit alignment if any and fix the negative test case.

hliao marked 4 inline comments as done.Aug 20 2020, 1:43 PM
hliao added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284

Good catch, fixed in the latest revision.

hliao marked an inline comment as done.Aug 20 2020, 2:00 PM
arsenm added inline comments.Aug 20 2020, 4:36 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5585

Should probably add a comment explaining this is dynamically allocated or something

llvm/test/CodeGen/AMDGPU/hip.extern.shared.array.ll
4–11

None of these use an unnaturally high alignment

hliao updated this revision to Diff 286927.Aug 20 2020, 5:36 PM

Add comment and revist the test case.

arsenm accepted this revision.Aug 20 2020, 6:15 PM
This revision is now accepted and ready to land.Aug 20 2020, 6:15 PM
This revision was automatically updated to reflect the committed changes.