This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add MIMG NSA threshold configuration attribute
ClosedPublic

Authored by critson on Sep 27 2022, 6:31 PM.

Details

Summary

Make MIMG NSA minimum addresses threshold an attribute that can
be set on a function or configured via command line.
This enables frontend tuning which allows increased NSA usage
where beneficial.

Diff Detail

Event Timeline

critson created this revision.Sep 27 2022, 6:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2022, 6:31 PM
critson requested review of this revision.Sep 27 2022, 6:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2022, 6:31 PM
foad accepted this revision.Sep 28 2022, 1:39 AM

Seems reasonable.

I would also be in favour of changing the default to 2. That would tend to introduce fewer "mov"s to shuffle data around, at the expense of using a larger encoding on average for mimg instructions. But I think that is a good trade-off to make for the sake of performance.

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
959

Use max()

963

Use max()

This revision is now accepted and ready to land.Sep 28 2022, 1:39 AM

Seems reasonable.

I would also be in favour of changing the default to 2. That would tend to introduce fewer "mov"s to shuffle data around, at the expense of using a larger encoding on average for mimg instructions. But I think that is a good trade-off to make for the sake of performance.

Thank you for the quick review!
I did test the idea of using default 2, but I am not convinced yet that the benefit is universal enough.
Out of ~20000 pipelines, ~2000 had higher VGPR usage with threshold 2 and ~1000 had lower VGPR usage.
When VGPR usage decreases, it does so more beneficial ways though.
Spilling decreased in 2 pipelines, and only increased in 1.

foad added a comment.Sep 28 2022, 3:10 AM

Out of ~20000 pipelines, ~2000 had higher VGPR usage with threshold 2 and ~1000 had lower VGPR usage.

That's weird. I can't see why enabling NSA would consistently cause higher vgpr usage.

critson updated this revision to Diff 463490.Sep 28 2022, 3:39 AM
critson marked 2 inline comments as done.
  • Address reviewer feedback
foad accepted this revision.Sep 28 2022, 3:43 AM

LGTM, thanks.

This revision was landed with ongoing or failed builds.Sep 28 2022, 4:04 AM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Sep 28 2022, 5:34 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
59

On the "don't repeat yourself" principle you could either remove the cl::init here...

959

I don't think you need the .getValue().

965

... or return NSAThreshold here.

foad added a comment.Sep 28 2022, 9:35 AM

Out of ~20000 pipelines, ~2000 had higher VGPR usage with threshold 2 and ~1000 had lower VGPR usage.

That's weird. I can't see why enabling NSA would consistently cause higher vgpr usage.

FYI I looked at one case where this happens and it was caused by GCNNSAReassign making strange (well, different) decisions. So now I need to try to understand what that pass does.

Out of ~20000 pipelines, ~2000 had higher VGPR usage with threshold 2 and ~1000 had lower VGPR usage.

That's weird. I can't see why enabling NSA would consistently cause higher vgpr usage.

FYI I looked at one case where this happens and it was caused by GCNNSAReassign making strange (well, different) decisions. So now I need to try to understand what that pass does.

Yes, I think we probably need to revisit the logic in GCNNSAReassign.
I seem to remember it lack some support for subregisters which might be part of the issue.

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
59

True, it would be nice to have this number only once.

959

Pretty sure it is required, because std:max does not know how to handle cl::opt.