Let the pass work on very small kernels only. Measurements showed
that the performance benefit is not worth the compile time.
Repository: rG LLVM Github Monorepo
Event Timeline
I still don't understand how it would work with reg units, but even then it would not change the complexity of the algorithm, just the cost of a single pass.
Both are options, as is enabling it only at -O3. However, I still think we do not have enough data. BTW, what does the gfx benchmark suite tell us?
I feel we should probably discuss the optimal solution rather than the current one. I have a gut feeling that the problem this pass is trying to solve is a flavor of the graph coloring problem: we have to color the interference graph with K colors, where K is the number of register banks. We may need to rebuild the interference graph in this pass and then attempt the coloring.
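To make the coloring idea concrete, here is a minimal sketch of a greedy K-coloring, where K is the number of register banks. It is not what the pass does today and it does not use any LLVM API. The names `ConflictGraph`, `colorBanks` and `countConflicts` are hypothetical, and the graph is assumed to be a symmetric adjacency list in which an edge means "these two registers should not land in the same bank". Graph coloring is NP-hard in general, so this heuristic may leave some nodes uncolored even when a valid coloring exists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical conflict graph: node i is a register, and Adj[i] lists the
// registers that should not share a bank with it. Edges are assumed to be
// listed in both directions.
using ConflictGraph = std::vector<std::vector<std::size_t>>;

// Greedily assign each node one of NumBanks colors (banks), visiting nodes
// in index order. A node keeps -1 when every bank is already taken by one
// of its colored neighbors, i.e. the heuristic could not avoid a conflict.
std::vector<int> colorBanks(const ConflictGraph &Adj, int NumBanks) {
  assert(NumBanks > 0 && "need at least one bank");
  std::vector<int> Bank(Adj.size(), -1);
  for (std::size_t Node = 0; Node < Adj.size(); ++Node) {
    std::vector<bool> Used(NumBanks, false);
    for (std::size_t Neighbor : Adj[Node]) {
      assert(Neighbor < Adj.size() && "edge to unknown node");
      if (Bank[Neighbor] >= 0)
        Used[Bank[Neighbor]] = true;
    }
    // Pick the lowest-numbered bank not used by any neighbor.
    for (int B = 0; B < NumBanks; ++B) {
      if (!Used[B]) {
        Bank[Node] = B;
        break;
      }
    }
  }
  return Bank;
}

// Count edges whose endpoints share a bank or where either endpoint was
// left uncolored. Each undirected edge is counted once.
std::size_t countConflicts(const ConflictGraph &Adj,
                           const std::vector<int> &Bank) {
  std::size_t Conflicts = 0;
  for (std::size_t U = 0; U < Adj.size(); ++U)
    for (std::size_t V : Adj[U])
      if (U < V && (Bank[U] < 0 || Bank[V] < 0 || Bank[U] == Bank[V]))
        ++Conflicts;
  return Conflicts;
}
```

Each node is visited once and scans its neighbors plus the K banks, so the cost is linear in the number of edges plus K times the number of nodes. The remaining-conflict count from `countConflicts` could serve as a rough quality metric when comparing against the current pass.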
It would be much better to implement it inside the register allocator (RA) itself, but only if the problem is real. The current pass can eliminate most of the conflicts. I have yet to see confirmation that it improves performance anywhere, and I would like to quantify it.
BTW, what does the gfx benchmark suite tell us?
We don't have an easy way to run lots of benchmarks. I tried running GFXBench 5.0.0 on my own machine with GFX10 hardware, and I couldn't detect any difference from enabling/disabling this pass. The results seemed to be stable within about +/- 0.1%.