Details

Reviewers

Group Reviewers

Restricted Project

Commits

rG6821a3ccd69f: [AMDGPU] Add attribute for target loop unroll threshold default

Summary

Amend the loop unroll thresholds for PAL shaders to be more aggressive.
This gives an overall performance benefit on a representative sample
of shaders.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 41124
Build 41295: arc lint + arc unit

Event Timeline

timcorringham created this revision. Oct 11 2019, 9:16 AM

Herald added a project: Restricted Project. Oct 11 2019, 9:16 AM

Herald added subscribers: llvm-commits, hiraditya, t-tye and 8 others.

Harbormaster completed remote builds in B39413: Diff 224612. Oct 11 2019, 9:19 AM

timcorringham added a reviewer: Restricted Project. Oct 11 2019, 9:21 AM

Could use a test

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
93–94	This would now be dead
101	These should probably be the same for all OSes
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
52 ↗	(On Diff #224612)	You don't need to add this field. You already have the subtarget available here, you just need to change the type

How big was the performance testing?

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
62	This change penalizes loops which should have unroll boosted instead. Your new default thresholds are now higher than boosted.

This revision now requires changes to proceed. Oct 11 2019, 11:05 AM

nhaehnle added inline comments. Oct 14 2019, 9:36 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
62	I see no change here. Is something weird going on with the diff?

Changes to address review comments

timcorringham added inline comments. Oct 17 2019, 7:34 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
62	I now initialise ThresholdLocal to be the max of UnrollThresholdLocal and UP.Threshold, so the value used will only be increased for PAL.
93–94	This value is still used if the OS isn't PAL.
101	That is quite possible, but I don't have tests to confirm that for all OSes. Since the effect of crossing a cliff-edge can be significant (good or bad) I don't want to risk making that change for OSes without performance figures to justify it.
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
52 ↗	(On Diff #224612)	Good point. I didn't pay enough attention when I resolved the merge of my code - fixed.
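
The initialisation described in the reply on line 62 above can be sketched in isolation. This is only an illustration, not the code from the diff: `UnrollPrefs` is a hypothetical stand-in for the preferences struct that holds `UP.Threshold`, and the numbers 1000 (boosted) and 1100 (general) are the ones quoted elsewhere in this review.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the unrolling preferences struct; only the
// field discussed in this review is modelled.
struct UnrollPrefs {
  unsigned Threshold;
};

// Boosted threshold for local-memory loops. The value 1000 is the one
// quoted in this review; treat it as an assumption here.
const unsigned UnrollThresholdLocal = 1000;

// Start the boosted threshold from whichever value is larger, so the
// boost can never end up below the general threshold.
unsigned initialThresholdLocal(const UnrollPrefs &UP) {
  return std::max(UnrollThresholdLocal, UP.Threshold);
}

// Usage example: with a general threshold of 1100 the result follows the
// general threshold; with a smaller one it stays at the boosted value.
void exampleLocalThreshold() {
  assert(initialThresholdLocal(UnrollPrefs{1100}) == 1100);
  assert(initialThresholdLocal(UnrollPrefs{200}) == 1000); // 200 is arbitrary
}
```

With this initialisation the boosted value is never lower than the general one, but it also means the boost adds nothing once the general threshold exceeds it.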

Harbormaster completed remote builds in B39731: Diff 225431. Oct 17 2019, 7:39 AM

I disagree with the idea of having different thresholds based on the runtime. The runtime has nothing to do with it. For example, compute can run on top of either ROCm or PAL. Can you justify different results for the same programs?

I understand that you have one or more codes which benefit from a specific threshold. I suggest you analyze these codes, understand and explain the root cause of the performance gain, then create a new heuristic in this function. That is why this function exists in the first place. It will also allow you to create a testcase.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
62	It still does not make sense. You are initializing general threshold higher (1100) than boosted (1000).

In addition it can be unrelated to the threshold at all. It may be a flaw in the cost model for specific instructions. Please also see D68881 which started to address cost model issues.

Function attribute for loop unroll threshold default

Changed approach for this in response to review comments. The new change is less invasive
and avoids OS/environment specific behavior in the target specific heuristics.
It allows the front-end to provide a minimum value for the loop unroll threshold on a
per-function basis, while still allowing the heuristics to adjust that threshold.
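
The revised approach can be sketched as follows: a per-function attribute supplies the starting threshold, and the heuristics may raise it but never lower it. This is a minimal standalone model and not the committed code. `FnAttrs`, `integerAttrOr`, `unrollThreshold`, and the default of 300 are assumptions made for illustration; only the attribute name `amdgpu-unroll-threshold` comes from this review.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes.
using FnAttrs = std::map<std::string, std::string>;

// Assumed placeholder default; the real default is not stated in this review.
const unsigned DefaultThreshold = 300;

// Returns the attribute parsed as a decimal integer, or Default when the
// attribute is absent or not a plain number. Overflow is not checked here.
unsigned integerAttrOr(const FnAttrs &Attrs, const std::string &Name,
                       unsigned Default) {
  auto It = Attrs.find(Name);
  if (It == Attrs.end() || It->second.empty())
    return Default;
  unsigned Value = 0;
  for (char C : It->second) {
    if (C < '0' || C > '9')
      return Default;
    Value = Value * 10 + static_cast<unsigned>(C - '0');
  }
  return Value;
}

// The front-end value is the starting point; a loop that qualifies for a
// boost may raise the threshold, but nothing lowers it.
unsigned unrollThreshold(const FnAttrs &Attrs, bool LoopQualifiesForBoost,
                         unsigned BoostedThreshold) {
  unsigned Threshold =
      integerAttrOr(Attrs, "amdgpu-unroll-threshold", DefaultThreshold);
  if (LoopQualifiesForBoost)
    Threshold = std::max(Threshold, BoostedThreshold);
  return Threshold;
}

// Usage example with the 1100 and 1000 values mentioned in this review.
void exampleAttributeFloor() {
  FnAttrs WithAttr = {{"amdgpu-unroll-threshold", "1100"}};
  FnAttrs NoAttr;
  assert(unrollThreshold(WithAttr, false, 1000) == 1100);
  assert(unrollThreshold(WithAttr, true, 1000) == 1100);
  assert(unrollThreshold(NoAttr, true, 1000) == 1000);
  assert(unrollThreshold(NoAttr, false, 1000) == DefaultThreshold);
}
```

Keeping the value on the function rather than keying it off the OS lets a front-end opt in without changing results for other environments, which is the objection raised earlier in the review.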

Harbormaster completed remote builds in B41124: Diff 229867. Nov 18 2019, 9:20 AM

This looks acceptable, but needs a test.

Added test to confirm that the amdgpu-unroll-threshold attribute has the expected effect.

Herald added a subscriber: zzheng. Nov 20 2019, 7:23 AM

Harbormaster completed remote builds in B41245: Diff 230262. Nov 20 2019, 7:28 AM

LGTM

This revision is now accepted and ready to land. Nov 20 2019, 11:41 AM

Closed by commit rG6821a3ccd69f: [AMDGPU] Add attribute for target loop unroll threshold default (authored by timcorringham). Nov 21 2019, 2:00 AM

This revision was automatically updated to reflect the committed changes.

Meinersbur mentioned this in D88215: Add llvm.loop.unroll.threshold metadata. Oct 20 2020, 6:55 PM