This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add support for a16 modifier for gfx9
ClosedPublic

Authored by rtaylor on Aug 10 2018, 11:05 AM.

Details

Summary

Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9.

Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841

Diff Detail

Repository
rL LLVM

Event Timeline

rtaylor created this revision.Aug 10 2018, 11:05 AM
rtaylor retitled this revision from [AMDGPU] Add support for a16 modifiear for gfx9 to [AMDGPU] Add support for a16 modifier for gfx9.
arsenm added inline comments.Aug 10 2018, 12:14 PM
lib/Target/AMDGPU/AMDGPU.td
245–246 ↗(On Diff #160149)

Feature name doesn't match, r128-a16?

248 ↗(On Diff #160149)

Should be more specific about what kind of address

lib/Target/AMDGPU/SIISelLowering.cpp
4613–4614 ↗(On Diff #160149)

Capitalize and punctuate

4618 ↗(On Diff #160149)

This should be checking the subtarget feature you added?

4620 ↗(On Diff #160149)

Why is this clearing?

4621 ↗(On Diff #160149)

Capitalize etc.

4624 ↗(On Diff #160149)

Capitalize etc.

4628–4630 ↗(On Diff #160149)

This is a SCALAR_TO_VECTOR

4631 ↗(On Diff #160149)

Why this bitcast, and why f32? If we're coercing vector types i32 would be correct

4648 ↗(On Diff #160149)

I don't understand the overall flow of all of these sections. There 3 nearly identical looking parts for the different NumGradients, and I'm further confused by the manipulation of i inside the loop

4660–4662 ↗(On Diff #160149)

Ditto

rtaylor added inline comments.Aug 13 2018, 6:06 AM
lib/Target/AMDGPU/SIISelLowering.cpp
4618 ↗(On Diff #160149)

Yes, that makes sense.

4620 ↗(On Diff #160149)

VAddrs is set above, the L to LZ mapping needs VAddrs to find _L to convert to _LZ, so this is building the packed version of VAddrs, so clearing it first. Actually, a better reasoning is that the lod could be the high 16 or low 16 depending on the number of coordinates, so I could remove this clear, do the non-a16 packing as an 'else' to this 'if' and then do the _L to _LZ after. If the lod is high 16 it will just be ignored anyway since it's a don't care. If it's in low 16 (NumCoords+LodOrClampOrMip is odd), high 16 is guaranteed to be don't care since lod is always last and then we can just pop it. I will implement this.

4628–4630 ↗(On Diff #160149)

Ok.

4631 ↗(On Diff #160149)

Ok.

4648 ↗(On Diff #160149)

Fair enough, this could probably be consolidated, there is some redundancy, I suppose I could make these functions and call them and pass them, though that seems excessive also. The way the address components are packed differs depending on the dimension and the ordering makes it a bit difficult to cleanly do this. I could also do short sections with long conditions.

1D condition packs an undef with dx/dh and an undef with dy/dh.
2D condition packs dx/dh and dy/dh together and dx/dt and dy/dt together.
3D condition does 2D condition but packs an undef with dz/dh and an undef with dz/dt.

The ordering all out would be dxdh, dydh, dzdh, dxdt, dydt, dzdt.

rtaylor added inline comments.Aug 13 2018, 4:33 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4648 ↗(On Diff #160149)

Fair enough, this could probably be consolidated, there is some redundancy, I suppose I could make these functions and call them and pass them, though that seems excessive also. The way the address components are packed differs depending on the dimension and the ordering makes it a bit difficult to cleanly do this. I could also do short sections with long conditions.

1D condition packs an undef with dx/dh and an undef with dx/dt.
2D condition packs dx/dh and dy/dh together and dx/dt and dy/dt together.
3D condition does 2D condition but packs an undef with dz/dh and an undef with dz/dt.

The ordering all out would be dxdh, dydh, dzdh, dxdt, dydt, dzdt.

rtaylor updated this revision to Diff 160612.Aug 14 2018, 9:25 AM

Made changes according to suggestions. Reduced redundancy and simplified code structure.

The operand name / encoding bits in TableGen should really be changed as well to indicate the overload between R128 and A16.

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
294 ↗(On Diff #160612)

Name of the function should be adjusted as well.

lib/Target/AMDGPU/SIISelLowering.cpp
4625–4626 ↗(On Diff #160612)

Please check formatting with clang-format-diff, this looks wrong.

4633–4634 ↗(On Diff #160612)

Here as well.

4637 ↗(On Diff #160612)

Should be dz/dv instead of dz/dt, and dx/dv instead of dy/dh.

4639–4641 ↗(On Diff #160612)

I think the second part of the condition would be easier to follow as (NumGradients / 2) % 2 == 1 && (i == DimIdx + (NumGradients / 2) - 1 || i == DimIdx + NumGradients - 1) ("the number of texture coordinates relevant for the gradients is odd, and the current index is the last gradient with respect to dh or with respect to dv").

rtaylor updated this revision to Diff 162294.Aug 23 2018, 4:12 PM

Made suggested changes.

rtaylor updated this revision to Diff 162685.Aug 27 2018, 8:22 AM

Realized that I had updated clang-format runs of of SIISelLowering.h/.cpp and this caused a lot of re-formatting.

nhaehnle accepted this revision.Aug 28 2018, 3:10 AM

Thanks for the changes.

Sorry to be a stickler about this, but the formatting of the odd gradients check in lowerImage is still wrong, please adjust this according to clang-format-diff before you commit.

No need to re-upload here, the rest of the changes looks good to me.

This revision is now accepted and ready to land.Aug 28 2018, 3:10 AM
This revision was automatically updated to reflect the committed changes.

Thanks for the changes.

Sorry to be a stickler about this, but the formatting of the odd gradients check in lowerImage is still wrong, please adjust this according to clang-format-diff before you commit.

No need to re-upload here, the rest of the changes looks good to me.

No problem, I understand. I thought I had copied from the previous clang-format but I had copied from an earlier version, thanks for catching it.

It is interesting to note though that clang-format changed so much in this file.

It is interesting to note though that clang-format changed so much in this file.

Yes, there are unfortunately parts of LLVM that don't quite follow the coding convention. clang-format-diff is usually pretty good at limiting its changes to whatever you actually changed in the diff (I often use it as git diff | clang-format-diff -p1), but if your changes happen to be next to a large section that doesn't follow the coding convention, even that tool can be a bit overzealous in the changes it suggests.