Page MenuHomePhabricator

AMDGPU: Fixes for 'LOD bias' operand in ISelDAG path and GobalISel path when A16-bit is 'ON'
AcceptedPublic

Authored by Ravi on Oct 13 2021, 12:21 PM.

Details

Summary

Background: https://reviews.llvm.org/D74314

The LOD bias operand is of type 'half' when the A16-bit is ON' for MIMG instructions. 'bias' is only 16-bit but occupies 32-bits with upper 16-bits containing junk. The patch fixes both the paths(ISelDAG and GlobalISel) for proper encoding of LOD bias operand. The fix could have been done in one of 2 approaches.

Approaches:

  1. First approach is to add 'bias' operand index for each MIMG instruction in the table-gen generated image intrinsic info to later identify the 'bias' operand with it and fix it as 2 packed 16-bit operands with the upper 16-bit being undefined.
  2. The 'bias' operand is the only operand in the MIMG intrinsics that would be a 16-bit incoming operand below the index for gradients. The other image address operands 'offset' and 'z-compare' that come before gradients are always 32-bit irrespective of the A16-bit. The patch implements this logic.

Testing:
Multiple tests with A16 ON and OFF in both GlobalISel and ISelDAG path are checked and updated. Especially the SAMPLE and GATHER4 instruction tests. Few tests with
SAMPLE were missing in both the paths with A16 'OFF'. But the GATHER4 tests have covered this case. So no additional tests are added.
All the lit tests have passed.

Observations:

  1. The ISelDAG path generates the same code as earlier without any inefficiency. But the GlobalISel path adds explicit instructions to fill the upper 16-bit with junk. This has to be probably analysed as a new optimization issue and identify the path that's introducing them.
  2. Occuring with earlier code as well as with theis patch. The image resource constant(a group of 8 registers) are being copied from set of contiguous registers to another set of contiguous registers to take care of alignment. These copies can be avoided with a custom lowering of formal arguments. And the specific register info could be reported back to the driver/encoded in the code objects.

Diff Detail

Event Timeline

Ravi created this revision.Oct 13 2021, 12:21 PM
Ravi requested review of this revision.Oct 13 2021, 12:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2021, 12:21 PM
Ravi retitled this revision from Fixes for 'LOD bias' operand in ISelDAG path and GobalISel path when A16-bit is 'ON' to AMDGPU: Fixes for 'LOD bias' operand in ISelDAG path and GobalISel path when A16-bit is 'ON'.Oct 13 2021, 10:07 PM
Ravi edited the summary of this revision. (Show Details)
Ravi added a reviewer: sameerds.

This also needs fixing in the combiner (which was introduced in D85887). I.e. checking that the bias can be losslessly converted to f16 and converting it when converting the address.
(I guess this doesn’t need to be part of this change.)

arsenm accepted this revision.Oct 18 2021, 2:26 PM

LGTM although I'm not the most familiar person with images

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4245

Capitalize

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
6227

Capitalize

This revision is now accepted and ready to land.Oct 18 2021, 2:26 PM
foad added inline comments.Oct 19 2021, 1:50 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.sample.a16.ll
9

Why has the indentation changed in almost every test?

foad added a comment.Oct 19 2021, 4:35 AM

But the GlobalISel path adds explicit instructions to fill the upper 16-bit with junk.

I think D112064 should clean this up.

foad added a comment.Wed, Nov 17, 3:36 AM

@Ravi are you still working on this?

Ravi added a comment.Wed, Nov 17, 9:13 PM

@Ravi are you still working on this?

I don't have commit permissions, so checked with @cdevadas and we decided to check-in after D112064 goes in. The suggestions from @arsenm are taken care and the indent changes have also been removed.

foad added a comment.Thu, Nov 18, 6:07 AM

we decided to check-in after D112064 goes in.

OK. I have updated D112064.