This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GCN: Remove xnack from 801 and 810
Abandoned, Public

Authored by kzhuravl on Nov 14 2017, 12:06 PM.

Event Timeline

kzhuravl created this revision. Nov 14 2017, 12:06 PM
t-tye added a subscriber: mareko.
t-tye added inline comments.
lib/Target/AMDGPU/AMDGPU.td:529

D40051 proposes making Bonaire separate from Kaveri and named gfx704.

Looks good to me.

tstellar edited edge metadata. Nov 14 2017, 5:58 PM

Does this change break backwards compatibility with the gfx801 target? If so, which ROCm version will I need to use with these changes?

It depends on whether the KMD enables instruction replay. By default, the KMD enables it for compute rings, but I don't know if the ROCm userspace can override the setting.

t-tye edited edge metadata. Nov 14 2017, 6:25 PM

It changes the default setting for XNACK, which may break backwards compatibility depending on whether the runtime sets the hardware to XNACK replay. I do not believe gfx801 has ever been fully working, as there are still known bugs in OpenCL conformance related to XNACK support.

Not sure if graphics enables XNACK replay as all buffers may be pinned.

One possibility, now that all APUs have distinct GFX names if D40051 is accepted, would be to default the target feature of XNACK to enabled for APUs. However, it is harder to know how to handle dGPUs, as each runtime may choose a different setting of XNACK for the same target, depending on whether buffers are transferred/pinned. In other words, the setting of XNACK is not directly dependent on the target name. So always defaulting to disabled seems cleaner; then each runtime can decide if it wants to enable XNACK and use the -mxnack option (or cc1 equivalent).
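The "default to disabled, let the runtime opt in" idea can be sketched roughly as follows. This is only an illustration in Python, not code from the patch: the helper name `xnack_feature` and its parameter are hypothetical, and only the `xnack` target-feature name is taken from the discussion.

```python
from typing import Optional


def xnack_feature(runtime_choice: Optional[bool] = None) -> str:
    """Return the target-feature string a runtime would hand to the compiler.

    Hypothetical helper: the compiler default is XNACK disabled for every
    target, and a runtime that knows the hardware is configured for XNACK
    replay opts in explicitly.
    """
    if runtime_choice is None:
        # The runtime expressed no preference: use the proposed default.
        return "-xnack"
    return "+xnack" if runtime_choice else "-xnack"
```

Under this sketch the target name plays no role at all, which is the point being argued: the right setting depends on what the runtime does with its buffers, not on which chip is targeted.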

t-tye added a comment. Nov 14 2017, 6:29 PM

The high-level runtime does not have to override the setting to still choose to load and execute code without XNACK support. For example, the runtime may choose to pin all buffers and so know XNACK can never happen. This is what the ROCm OpenCL 1.2 runtime does for dGPUs even when they have demand paging enabled. I am not sure how APUs are supported.

Took a look at the Mesa3D runtime with @kzhuravl and it appears that the non-gfx names are being used, and xnack is being explicitly set using the target features. So this change should not have any impact on Mesa3D.

Did notice that for Mesa3D XNACK is being forcibly disabled for all targets <gfx9 and forcibly enabled for all targets >=gfx9. Is that the best choice? It seems it will likely not match how the driver is configuring the hardware. For example, how does the driver configure the APUs? Is XNACK always being enabled for gfx9 (it is not for compute)?

Mesa can't modify SH_MEM_CONFIG to enable/disable XNACK. What is hardcoded in the kernel is what we get. XNACK is only enabled on compute rings on gfx8 APUs and on all rings on gfx9. In practice, Mesa should never access an unmapped page. I don't know if setting -xnack on all chips is a good idea in that case. We might also have suboptimal performance on gfx9 due to XNACK being always enabled by the KMD.
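To make the possible mismatch concrete, here is a small sketch that encodes the two policies exactly as they are described in this thread. Both function names are hypothetical, the ring names are placeholder strings, and treating every case not mentioned in the thread as "replay off" is an assumption of the sketch.

```python
def mesa_compiles_with_xnack(gfx_major: int) -> bool:
    # Mesa3D policy as described above: XNACK forced off below gfx9,
    # forced on for gfx9 and newer.
    return gfx_major >= 9


def kmd_enables_replay(gfx_major: int, is_apu: bool, ring: str) -> bool:
    # Kernel policy as described above: compute rings on gfx8 APUs,
    # all rings on gfx9. Everything else is assumed to be off here.
    if gfx_major == 9:
        return True
    if gfx_major == 8 and is_apu:
        return ring == "compute"
    return False
```

For a gfx8 APU compute ring the two disagree: the kernel enables replay while Mesa3D compiles with XNACK disabled, which is the situation the question above is asking about.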

If Mesa always guarantees that the shaders will never access non-resident memory, and so will never have an XNACK, then it can always generate shaders that have XNACK disabled regardless of whether the kernel enables XNACK replay. In other words, enabling XNACK replay does not affect performance unless the shader chooses to generate XNACK compatible code. And the shader does not need to generate XNACK compatible code if it will never access a non-resident page.

For example, the OpenCL 1.2 runtime always ensures all buffers are resident, and so compiles all shaders with XNACK disabled, regardless of whether the kernel has enabled XNACK replay.

So, does the Mesa runtime always ensure all data accessed will be resident? Even for APUs? If so, it would likely be a performance gain to always request no-XNACK unless page migration may also be active on the data accessed.
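The rule argued for in this comment can be written as a one-line predicate. This is a sketch of the reasoning only; the function and parameter names are hypothetical.

```python
def needs_xnack_code(all_data_resident: bool, page_migration_active: bool) -> bool:
    """Decide whether shaders must be compiled as XNACK-compatible.

    Encodes the argument above: XNACK-compatible code is only needed if a
    shader may touch a non-resident page, either because residency is not
    guaranteed or because pages may migrate while the shader runs.
    """
    return (not all_data_resident) or page_migration_active
```

Note that the kernel's replay setting is deliberately not an input: per the argument above, whether the kernel enables XNACK replay does not matter if the shader can never fault.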

Actually not always. We support partially-resident buffers. If those cause an infinite replay loop, I think we have to disable replay in the KMD. I don't know if no-XNACK is OK for partially-resident buffers.

nhaehnle edited edge metadata. Nov 16 2017, 10:10 AM

We should really figure out how to fix that and get gfx8-like behavior though. I.e. don't return XNACK for non-resident buffers, just return 0xdeadbeef or whatever.

kzhuravl abandoned this revision. Nov 16 2017, 2:56 PM

Leaving them on by default for APUs.