This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Wave and register controls
ClosedPublic

Authored by tstellarAMD on Jun 21 2016, 11:01 AM.

Details

Summary
  • Implemented amdgpu-flat-work-group-size attribute
  • Implemented amdgpu-num-active-waves-per-eu attribute
  • Implemented amdgpu-num-sgpr attribute
  • Implemented amdgpu-num-vgpr attribute
  • Dynamic LDS constraints are in a separate patch

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 61406.Jun 21 2016, 11:01 AM
kzhuravl retitled this revision from to [AMDGPU] Wave and register controls.
kzhuravl updated this object.
kzhuravl added reviewers: tstellarAMD, arsenm.
kzhuravl added a subscriber: llvm-commits.
kzhuravl added inline comments.Jun 21 2016, 11:04 AM
lib/Target/AMDGPU/AMDGPUSubtarget.h
553

The reason I put this function here is because it either returns flat work group size for a given function or subtarget's defaults, but I am not sure if this is the right place.

563

The reason I put this function here is because it either returns number of active waves per execution unit for a given function or subtarget's defaults, but I am not sure if this is the right place.

lib/Target/AMDGPU/SIRegisterInfo.h
219

Is there a better place for this function?

254

Is there a better place for this function?

arsenm edited edge metadata.Jul 1 2016, 1:02 PM

Can you shrink the test cases? They are pretty big for not testing the content

lib/Target/AMDGPU/AMDGPUSubtarget.cpp
338–354

More DiagnosticInfos. These should also provide the function so clang will be able to provide a more useful debug location

363–378

Can you put blank lines after code before a comment, it's hard to read with it this dense

396–398

This should use DiagnosticInfoUnsupported. Although on second thought these should probably just clamp and diagnosis in clang

402–404

Ditto

411–414

Ditto

lib/Target/AMDGPU/AMDGPUSubtarget.h
553

We probably shouldn't be using IR classes in the subtarget, since there's a desire to someday be able to throw away the IR during codegen, but since there's already one thing already doing this here is fine for now

lib/Target/AMDGPU/SIInstrInfo.cpp
31–32

This doesn't look right

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
51–55

I think just 0, 0 or 0u, 0u should work

lib/Target/AMDGPU/SIMachineFunctionInfo.h
73

Separating line

lib/Target/AMDGPU/SIRegisterInfo.cpp
219–220

SIInstrInfo

245–246

You can remove the cast

lib/Target/AMDGPU/SIRegisterInfo.h
28

I don't think the RegisterInfo should have a reference to the Subtarget

test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-many-regs.ll
7–18 ↗(On Diff #61406)

Can you run instnamer on this and use CHECK-LABEL

test/CodeGen/AMDGPU/attr-amdgpu-num-active-waves-per-eu-invalid.ll
8 ↗(On Diff #61406)

Some tests with unparsable sizes would be good

kzhuravl updated this revision to Diff 65930.Jul 28 2016, 7:30 AM
kzhuravl edited edge metadata.
kzhuravl marked 16 inline comments as done.

Review feedback (I will add more tests a bit later today)

kzhuravl added inline comments.Jul 28 2016, 7:39 AM
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
338–354

Changed to clamp

arsenm added inline comments.Jul 28 2016, 2:12 PM
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
522

Line before comment

lib/Target/AMDGPU/AMDGPUAsmPrinter.h
79

Line before comment

lib/Target/AMDGPU/AMDGPUSubtarget.cpp
212

You can use those as the initializer value instead of the =

lib/Target/AMDGPU/SIMachineFunctionInfo.h
350–379

The Returns should be \returns to doxygenify, same for the rest of these

lib/Target/AMDGPU/SIRegisterInfo.h
183

More \returns

kzhuravl updated this revision to Diff 66480.Aug 2 2016, 8:23 AM
kzhuravl marked 5 inline comments as done.

Review feedback + Added tests

kzhuravl updated this revision to Diff 66489.Aug 2 2016, 9:25 AM

Add missing tests

kzhuravl updated this revision to Diff 66497.Aug 2 2016, 10:10 AM

Trim down large test cases

whchung added a subscriber: whchung.Aug 3 2016, 3:00 AM
tstellarAMD added inline comments.Aug 3 2016, 7:58 AM
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
128

Mesa is using this attribute, so we still need to support it. We should also leave some tests which use it.

kzhuravl updated this revision to Diff 66756.Aug 3 2016, 9:40 PM
kzhuravl marked an inline comment as done.

Review feedback

LGTM. We just need to resolve the scheduler bug first, and then I think this can be committed.

lib/Target/AMDGPU/SIRegisterInfo.cpp
169–170

This actually uncovers a bug, because the scheduler will subtract the number of reserved registers from the value returned by this function, which in some cases gives us a negative pressure set limit. I think we need to remove this function before this patch can be committed. I will look into this.

Ok, thanks for taking a look!

kzhuravl updated this revision to Diff 67801.Aug 11 2016, 11:10 PM
kzhuravl edited edge metadata.

Rebase

lib/Target/AMDGPU/SIRegisterInfo.cpp
169–170

The fix for this is more complicated than I thought. I think we should fix this by maintaining the current behavior of this function.

This means that if the user does not specify any attributes that would limit the number of registers, then SGPRLimit and VGPRLimit should be set to the 10 waves per CU limits (24 VGPR and .48/80 SGPRS depending on the target).

If the user does specify attributes to limit the number of registers the function should return AMDGPURegisterInfo::getRegPressureSetLimit().

kzhuravl updated this revision to Diff 68176.Aug 16 2016, 7:52 AM
kzhuravl marked an inline comment as done.

Review feedback

kzhuravl added a reviewer: Restricted Project.Aug 30 2016, 3:05 PM
scchan added a subscriber: scchan.Aug 31 2016, 7:26 AM
kzhuravl updated this revision to Diff 69960.Sep 1 2016, 2:00 AM

Rebase + Fix WorkGroupSize -> WavesPerEU calculation

yaxunl added a subscriber: yaxunl.Sep 2 2016, 3:07 PM
yaxunl added inline comments.
test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll
75 ↗(On Diff #69960)

Is this extra?

101 ↗(On Diff #69960)

extra line?

tstellarAMD commandeered this revision.Sep 2 2016, 4:58 PM
tstellarAMD edited reviewers, added: kzhuravl; removed: tstellarAMD.

Fixed some bugs with mesa and integrate with the new scheduler.

arsenm accepted this revision.Sep 2 2016, 6:09 PM
arsenm edited edge metadata.

LGTM

lib/Target/AMDGPU/SIRegisterInfo.cpp
871–872

This should be narrowed to just if the xnack feature is enabled

This revision is now accepted and ready to land.Sep 2 2016, 6:09 PM
kzhuravl accepted this revision.Sep 5 2016, 2:59 AM
kzhuravl edited edge metadata.

LGTM

tstellarAMD added inline comments.Sep 6 2016, 6:45 AM
lib/Target/AMDGPU/SIRegisterInfo.cpp
871–872

We have to use 6 here for all VI, because FLAT_SCRATCH is located 2 sgprs below XNACK, so if you use FLAT_SCRATCH, you have to allocate space for XNACK too, even if the device does not support it.

This revision was automatically updated to reflect the committed changes.
test/CodeGen/AMDGPU/large-work-group-promote-alloca.ll