This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix ds_{read,write}2_b64 on SI/gfx6
AbandonedPublic

Authored by nhaehnle on May 4 2019, 9:48 AM.

Details

Reviewers
arsenm
rampitec
Summary

Offsets for these instructions are in dwords, like for the b32 variants.

This is exposed by a combination of:

  • better load / store vectorization in LLVM
  • an in-flight change in Mesa to specify an increased alignment for LDS in compute shaders, which enables more vectorization opportunities

This has been observed on real hardware in the following tests:

  • dEQP-GLES31.functional.compute.shared_var.basic_type.mat4_{lowp,mediump,highp}
  • dEQP-GLES31.functional.compute.shared_var.work_group_size.mat4_{64_1_1, 1_64_1, 1_1_64}

Change-Id: I63c20afd1467b126199be2891ab45451b0430103

Diff Detail

Event Timeline

nhaehnle created this revision.May 4 2019, 9:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2019, 9:48 AM
rampitec added inline comments.May 4 2019, 10:15 AM
lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
319

Was it really changed in CI?

arsenm added inline comments.May 4 2019, 2:04 PM
lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
319

I doubt this changed, and the manual for SI and gfx9 plainly state it uses 8. This should also not use a hardcoded generation check if this is really true

nhaehnle abandoned this revision.May 7 2019, 4:27 AM

After some more investigation, you're right.

However, there is some SI-specific bug that none of D60459, D61313, and D60457 fix. It's triggered by a trivial modification of an unfortunately relatively complex shader and is extremely sensitive to code changes, so it might be some missing hazard or similar. I'll have to investigate that further.