Page MenuHomePhabricator

[AMDGPU] Implement hardware bug workaround for image instructions
Needs ReviewPublic

Authored by rdomingu on Jun 4 2020, 8:57 AM.

Details

Reviewers
arsenm
rampitec
Summary

[AMDGPU] Implement hardware bug workaround for image instructions

This implements a workaround for a hardware bug in gfx8 and gfx9,
where register usage is not estimated correctly for image_store and
image_gather4 instructions when D16 is used.

Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899

Diff Detail

Unit TestsFailed

TimeTest
20 mslinux > Flang.Preprocessing::compiler_defined_macros.F90
Script: -- : 'RUN: at line 6'; /mnt/disks/ssd0/agent/llvm-project/build/bin/f18 -intrinsic-module-directory /mnt/disks/ssd0/agent/llvm-project/build/tools/flang/include/flang -E /mnt/disks/ssd0/agent/llvm-project/flang/test/Preprocessing/compiler_defined_macros.F90 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck --ignore-case /mnt/disks/ssd0/agent/llvm-project/flang/test/Preprocessing/compiler_defined_macros.F90

Event Timeline

rdomingu created this revision.Jun 4 2020, 8:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2020, 8:57 AM
rdomingu edited the summary of this revision. (Show Details)Jun 4 2020, 10:42 AM
rdomingu added reviewers: arsenm, rampitec.
arsenm added a comment.Jun 4 2020, 1:49 PM

Also needs to update the GlobalISel path

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5932–5933

IsD16 && (!hasUnpackedD16() || (Gather4 && hasGatherbug))?

7248

You can initialize this to the target size and then avoid push_back

7249

This is the same as just MVT::v2i16

7251

Formatting

rdomingu marked 3 inline comments as done.Jun 5 2020, 6:15 AM
rdomingu added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5932–5933

We want NumVDataDwords = DMaskLanes for Gather4 && IsD16 && hasGatherBug. That wouldn't be the case with your suggestion.

rdomingu updated this revision to Diff 276533.Jul 8 2020, 1:09 PM

Updating D81172: [AMDGPU] Implement hardware bug workaround for image instructions

arsenm added inline comments.Jul 9 2020, 4:30 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3400–3401

Brace formatting

3405–3407

There's no obstacle to handling v3 here, it should work in the other cases

rdomingu marked 2 inline comments as done.Jul 13 2020, 10:51 AM
rdomingu added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3405–3407

Note that v3f16 is not handled in SIISelLowering.cpp either (line 7255). Are you suggesting we implement v3f16 there too?

arsenm added inline comments.Jul 13 2020, 10:56 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3405–3407

There is a v3f16 definition now, so you could. I think it's more important for globalisel to be more complete

rdomingu updated this revision to Diff 281734.Jul 29 2020, 2:02 PM

Updating D81172: [AMDGPU] Implement hardware bug workaround for image instructions

rdomingu updated this revision to Diff 281970.Jul 30 2020, 10:21 AM

Updating D81172: [AMDGPU] Implement hardware bug workaround for image instructions

rdomingu updated this revision to Diff 282042.Jul 30 2020, 2:09 PM

Updating D81172: [AMDGPU] Implement hardware bug workaround for image instructions

arsenm added inline comments.Aug 4 2020, 7:30 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3439

No else after return

3454

No else after return

3456

llvm_unreachable instead of assert(alse)

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
303

I know clang-format really wants to pack these onto a single line, but it's a terrible idea and you shouldn't listen to it

rdomingu updated this revision to Diff 284443.Aug 10 2020, 10:48 AM
rdomingu marked 4 inline comments as done.

Updating D81172: [AMDGPU] Implement hardware bug workaround for image instructions

arsenm added inline comments.Fri, Aug 28, 11:27 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3410

If you just resize to 8 below, why not use an array?

3411–3415

It would be preferable to emit a concat_vectors of <2 x s16> pieces here

3414

Resize here is weird. You can push back, or constructed PackedRegs to the desired size?

3420

Same as above

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7249

++I

rdomingu added inline comments.Thu, Sep 10, 6:55 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3411–3415

Sorry, I'm new to this. Why would concat_vectors be preferable than build_vector? Could you please elaborate?

arsenm added inline comments.Wed, Sep 16, 7:58 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3411–3415

Because a G_BUILD_VECTOR with 16-bit sources isn't naturally legal. This works, it just adds more work for the legalizer to reprocess these when you could produce something that's legal to begin with to save compile time

rdomingu added inline comments.Wed, Sep 16, 8:07 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3411–3415

I see. But how would you go from v3f16 to concat_vectors of <2 x 16> to v4f32 (which is what we want at the end)?

arsenm added inline comments.Wed, Sep 16, 8:16 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3411–3415

I think I'm missing something. Why is this going from <3 x s16> to <4 x s32>? Isn't this the unpacked layout case? Why isn't this just an G_ANYEXT from <3 x s16> to <3 x s32>?

3414

You can do this in the initial construction, also the small size should just be 8? Or use std::array?