This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Lower regbanks reassign threshold to 15000
ClosedPublic

Authored by rampitec on Apr 20 2021, 4:00 PM.

Details

Summary

Let it work on a very small kernels only. Measurements showed
the performance benefit is not worth the compile time.

Diff Detail

Event Timeline

rampitec created this revision.Apr 20 2021, 4:00 PM
rampitec requested review of this revision.Apr 20 2021, 4:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2021, 4:00 PM
Herald added a subscriber: wdng. · View Herald Transcript
This revision is now accepted and ready to land.Apr 20 2021, 4:49 PM

Have you looked into changing this to operate on regunits?

Have you looked into changing this to operate on regunits?

I still don't understand how shall it work with reg units, but even then it will not change the complexity of the algorithm, just a cost of a single pass

foad added a comment.Apr 21 2021, 2:45 AM

Is there a measurable benefit on any kernels?

Is there a measurable benefit on any kernels?

Not that I know.

This revision was landed with ongoing or failed builds.Apr 21 2021, 8:34 AM
This revision was automatically updated to reflect the committed changes.
foad added a comment.Apr 22 2021, 1:59 AM

Is there a measurable benefit on any kernels?

Not that I know.

Then maybe delete the pass, or at least disable it by default?

Is there a measurable benefit on any kernels?

Not that I know.

Then maybe delete the pass, or at least disable it by default?

Both are options as well as to enable only at -O3. However, I still think we have not enough data. JBTW, what gfx benchmark suite tells us?

I feel we should probably discuss more about the optimal solution than the current one. I have a gut feeling that the problem pass is trying to solve is a flavor of graph coloring problem where we have to color the interference graph with K number of colors where K is the number of register banks. We may need to redo the interference graph in this pass and then attempt the coloring.

I feel we should probably discuss more about the optimal solution than the current one. I have a gut feeling that the problem pass is trying to solve is a flavor of graph coloring problem where we have to color the interference graph with K number of colors where K is the number of register banks. We may need to redo the interference graph in this pass and then attempt the coloring.

It would be much better to implement it inside the RA itself. But only if it is a real thing. The current pass can eliminate most of the conflicts. I yet want to see a confirmation it improves performance just anywhere and quantify it.

foad added a comment.Apr 23 2021, 7:31 AM

JBTW, what gfx benchmark suite tells us?

We don't have an easy way to run lots of benchmarks. I tried running GFXBench 5.0.0 on my own machine with GFX10 hardware, and I couldn't detect any difference from enabling/disabling this pass. The results seemed to be stable within about +/- 0.1%.

JBTW, what gfx benchmark suite tells us?

We don't have an easy way to run lots of benchmarks. I tried running GFXBench 5.0.0 on my own machine with GFX10 hardware, and I couldn't detect any difference from enabling/disabling this pass. The results seemed to be stable within about +/- 0.1%.

Maybe we really should just drop it then.