- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch
Details
- Reviewers
kzhuravl arsenm - Group Reviewers
Restricted Project - Commits
- rG1d65026ca684: [AMDGPU] Wave and register controls
rL280747: [AMDGPU] Wave and register controls
Diff Detail
Event Timeline
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
445 | The reason I put this function here is because it either returns flat work group size for a given function or subtarget's defaults, but I am not sure if this is the right place. | |
455 | The reason I put this function here is because it either returns number of active waves per execution unit for a given function or subtarget's defaults, but I am not sure if this is the right place. | |
lib/Target/AMDGPU/SIRegisterInfo.h | ||
226 | Is there a better place for this function? | |
261 | Is there a better place for this function? |
Can you shrink the test cases? They are pretty big for not testing the content
lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
---|---|---|
269–285 | More DiagnosticInfos. These should also provide the function so clang will be able to provide a more useful debug location | |
294–309 | Can you put blank lines after code before a comment, it's hard to read with it this dense | |
327–329 | This should use DiagnosticInfoUnsupported. Although on second thought these should probably just clamp and diagnosis in clang | |
333–335 | Ditto | |
342–345 | Ditto | |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
445 | We probably shouldn't be using IR classes in the subtarget, since there's a desire to someday be able to throw away the IR during codegen, but since there's already one thing already doing this here is fine for now | |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
32 | This doesn't look right | |
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
51–52 | I think just 0, 0 or 0u, 0u should work | |
lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
66 | Separating line | |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
214 | SIInstrInfo | |
236 | You can remove the cast | |
lib/Target/AMDGPU/SIRegisterInfo.h | ||
28 | I don't think the RegisterInfo should have a reference to the Subtarget | |
test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-many-regs.ll | ||
7–18 | Can you run instnamer on this and use CHECK-LABEL | |
test/CodeGen/AMDGPU/attr-amdgpu-num-active-waves-per-eu-invalid.ll | ||
8 | Some tests with unparsable sizes would be good |
lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
---|---|---|
269–285 | Changed to clamp |
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
502 | Line before comment | |
lib/Target/AMDGPU/AMDGPUAsmPrinter.h | ||
77 | Line before comment | |
lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
247 | You can use those as the initializer value instead of the = | |
lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
334–347 | The Returns should be \returns to doxygenify, same for the rest of these | |
lib/Target/AMDGPU/SIRegisterInfo.h | ||
190 | More \returns |
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
128 | Mesa is using this attribute, so we still need to support it. We should also leave some tests which use it. |
LGTM. We just need to resolve the scheduler bug first, and then I think this can be committed.
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
153–155 | This actually uncovers a bug, because the scheduler will subtract the number of reserved registers from the value returned by this function, which in some cases gives us a negative pressure set limit. I think we need to remove this function before this patch can be committed. I will look into this. |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
153–155 | The fix for this is more complicated than I thought. I think we should fix this by maintaining the current behavior of this function. This means that if the user does not specify any attributes that would limit the number of registers, then SGPRLimit and VGPRLimit should be set to the 10 waves per CU limits (24 VGPR and .48/80 SGPRS depending on the target). If the user does specify attributes to limit the number of registers the function should return AMDGPURegisterInfo::getRegPressureSetLimit(). |
LGTM
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
915–916 | This should be narrowed to just if the xnack feature is enabled |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
915–916 | We have to use 6 here for all VI, because FLAT_SCRATCH is located 2 sgprs below XNACK, so if you use FLAT_SCRATCH, you have to allocate space for XNACK too, even if the device does not support it. |
Line before comment