Page MenuHomePhabricator

[GlobalISel] Introduce global variant of regbankselect
Needs ReviewPublic

Authored by ehjogab on Oct 28 2020, 6:25 AM.

Details

Summary

This patch introduces a new option '--regbankselect-global' to the
register bank selector. The algorithm is based on dynamic programming
and works as follows:

  1. For each instruction, working top-down, compute cost of applying each InstructionMapping. If the result needs to be converted into some other register bank not supported by any of the InstructionMappings, compute the cost of introducing a copy.
  2. For each instruction, working bottom-up, select the InstructionMapping with the highest utilization score. The utilization score is the sum of the cost of all mappings that make use of this mapping.
  3. For each instruction, working top-down, apply the selected mapping.

Compared to the greedy version, which looks at and make decision for one
instruction at a time, this version operates at a global scope.

Diff Detail

Unit TestsFailed

TimeTest
380 msx64 debian > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
530 msx64 windows > lld.ELF/invalid::symtab-sh-info.s
Script: -- : 'RUN: at line 4'; c:\ws\w1\llvm-project\premerge-checks\build\bin\yaml2obj.exe --docnum=1 C:\ws\w1\llvm-project\premerge-checks\lld\test\ELF\invalid\symtab-sh-info.s -o C:\ws\w1\llvm-project\premerge-checks\build\tools\lld\test\ELF\invalid\Output\symtab-sh-info.s.tmp.o

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2020, 6:25 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
ehjogab requested review of this revision.Oct 28 2020, 6:25 AM
foad added a subscriber: foad.Oct 28 2020, 7:15 AM
arsenm added inline comments.Oct 28 2020, 10:58 AM
llvm/include/llvm/CodeGen/GlobalISel/RegBankSelect.h
103–105

I think calling this "optimal" is a bit overly optimistic. Maybe Global?

llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
1603

!isPreIselOpcode should cover asm already?

1608

This name is a bit obnoxious. If it's really necessary, I would rather just add this helper directly to MachineOperand

1608–1610

Can do just .getReg().isVirtual()

1703–1704

This explicit empty check is redundant

1732

Should not construct a fresh MachineIRBuilder every time

1735–1736

Is this intended to allow applyMappingImpl to insert new instructions which will iteratively have banks assigned? If so I think having a feature disparity with the single pass path is a problem

2005

Comment is unnecessary

2007

Don't like this getGSIOfMI name

2102

This checks for multiple explicit defs, but then seems to assume only a single def

arsenm added inline comments.Oct 28 2020, 11:05 AM
llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
1863

I think checking if this is virtual is redundant; any instruction subject to rgebank select should not have physical register operands

Thank you very much for your feedback, arsenm! Really appreciate it!

I noticed a crapload of lint warnings so I will fix them first, then fix your comments that I agree with, and then we can discuss further on those that I don't. =)

llvm/include/llvm/CodeGen/GlobalISel/RegBankSelect.h
103–105

With respect to the cost function, it should be optimal. I don't have a detailed proof yet, but the intuition is that it computes the cost of putting a certain register in a certain register bank, and by selecting the choice with lowest cost and working its way backwards to achieve that, it really should give the optimal solution. However, there is a stage in the algorithm where it may need to rematerialize the instruction (a certain situation when there exist multiple usages), and there I'm not sure whether optimality is kept...

llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
1603

Don't know; I just copied what was already done in greedy, so you may be right...

1608

I agree, it's not a good name. I'll see if I can either get rid of this function, or else put it directly in MachineOperand.

1608–1610

You mean this works even if MO is not actually a register? I just assumed that you're not allowed to invoke getReg() unless you are certain that MO is really a register.

1703–1704

Yes, that's correct.

1732

I need to have a new position of insertion for each instruction, hence the creation of a new MachineIRBuilder each time. Is it possible to keep the same MachineIRBuilder and simply change the insertion point, without affecting already inserted instructions?

1735–1736

Are you saying that I should rewrite the code to make use of applyMappingImpl, such that both single- and multi-pass make use of it? Because currently the multi-pass doesn't use it at all.

1863

For uniformity, I decided to treat all instructions (except debug, which are skipped) equally in order to preserve the cost chain and register bank preferences. Hence this step will look at instructions which by themselves are not subject to regbankselect, but they may produce results which are used by instructions that are.

2007

What name would you have liked instead? =)

2102

Hm, true. The code below assumes that there's exactly one such def, and if there are multiple explicit defs it should crash as it cannot handle such cases at the moment.

ehjogab updated this revision to Diff 301518.Oct 28 2020, 11:01 PM

Fix lint warnings

ehjogab marked 9 inline comments as not done.Oct 28 2020, 11:06 PM
ehjogab added inline comments.
llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
1608–1610

Just checked MachineOperand.h, and there's an assert that will fail if getReg() is invoked and isReg() does not hold.

ehjogab marked an inline comment as not done.Oct 28 2020, 11:08 PM
ehjogab added inline comments.
llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
1608–1610

Argh, my mistake; I saw what you meant now...

ehjogab updated this revision to Diff 301521.Oct 28 2020, 11:28 PM

Fixes based on arsenm's comments

ehjogab marked 5 inline comments as done and an inline comment as not done.Oct 28 2020, 11:30 PM
ehjogab updated this revision to Diff 301522.Oct 28 2020, 11:42 PM

Fixed arsenm's comment on MachineIRBuilder

ehjogab marked an inline comment as done.Oct 28 2020, 11:43 PM
gargaroff added inline comments.Oct 30 2020, 2:54 AM
llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
1885

This line is causing an assertion with some of our tests .ll tests as it tries to access the regbank of a physical register.

You can provoke the assertion in existing AArch64 tests by adding RegBankSelect::Optimal to the RegBankSelect constructor call in AArch64TargetMachine.cpp.

One of the tests triggering this for example is llvm/test/CodeGen/AArch64/GlobalISel/regbank-ceil.ll.

ehjogab updated this revision to Diff 302255.Nov 2 2020, 5:52 AM
  • fix bug when register is a physical register
  • fix lint warnings
ehjogab marked an inline comment as done.Nov 2 2020, 5:55 AM
ehjogab added inline comments.
llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
1885

Ah, apparently getRegBankOrNull() fails when the register is not a virtual register. This has been fixed in the update. Thanks for spotting this!

ehjogab retitled this revision from [GlobalISel] Introduce optimal variant of regbankselect to [GlobalISel] Introduce global variant of regbankselect.Nov 2 2020, 6:08 AM
ehjogab edited the summary of this revision. (Show Details)
ehjogab updated this revision to Diff 302265.Nov 2 2020, 6:09 AM
ehjogab marked an inline comment as done.

replace 'optimal' with 'global'

ehjogab marked an inline comment as done.Nov 2 2020, 6:12 AM

So what happens now? Should it be merge as is, or does it need to be tested more stringently before that happens? Or are there any further changes that need to be made?

I haven't had the time to test this in the new state yet, but I would love to see this integrated. However I'm not in a position to review and approve this. Someone of the GlobalISel folks should do that. As for testing, ideally you could simply leverage the existing tests of the existing GlobalISel backends by enabling this algorithm conditionally. This also requires the approval of those backend maintainers however.

Maybe it would make sense to ping the mailing list again, explaining the current state, giving some numbers how this algorithm impacts compile time and code gen, and try to discuss some integration strategy.

I haven't had the time to test this in the new state yet, but I would love to see this integrated. However I'm not in a position to review and approve this. Someone of the GlobalISel folks should do that. As for testing, ideally you could simply leverage the existing tests of the existing GlobalISel backends by enabling this algorithm conditionally. This also requires the approval of those backend maintainers however.

Maybe it would make sense to ping the mailing list again, explaining the current state, giving some numbers how this algorithm impacts compile time and code gen, and try to discuss some integration strategy.

Thanks for your suggestions! I'll attend to them when my work pipeline has cleared up a bit.

How about a test-case with fast, greedy, and global to show the differences?

How about a test-case with fast, greedy, and global to show the differences?

That's a good idea! Unfortunately, I don't know of any such test case. I will consult the mailing list for this; I know there has been some discussion where users were not happy with what the greedy algorithm produced.

Sorry for being terse. What I meant to say, you could take an existing test case and show the effects of the different algorithms. There is probably an existing test case for the existing algorithms.

Sorry for being terse. What I meant to say, you could take an existing test case and show the effects of the different algorithms. There is probably an existing test case for the existing algorithms.

Indeed, there exists ample test cases for regbank-select (at least in some targets). However, so far the ones I have seen have been quite small and therefore would probably not benefit from running the global variant. It would be good to have a test case where it is known that the greedy algorithm chooses poorly and hence the global variant should show improvement, and I recall that there was mentioning of such a case in the mailing list. In lack of such input, I will look into the existing test cases and see what I can come up with.

This does of course not exclude the need for tests showing that the global variant produces the same results as the other algorithms when there's no better solution to be had.

ehjogab updated this revision to Diff 307321.Nov 24 2020, 5:50 AM

Rewrote code to make use of applyMapping(). Now fast, greedy, and global version
all use the same code for enforcing the selection of instruction mappings.

ehjogab marked an inline comment as done.Nov 24 2020, 5:52 AM
ehjogab added inline comments.
llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
1735–1736

Feature disparity addressed.

ehjogab updated this revision to Diff 307325.Nov 24 2020, 5:58 AM

Rearranged function declarations in RegBankSelect.cpp (NFC)

ehjogab updated this revision to Diff 307343.Nov 24 2020, 7:00 AM
  • Improved selection: Now choosing the mapping with the highest utility instead of lowest cost, which should direct optimization towards blocks with high frequency.
  • Reduce vector copying.
Harbormaster completed remote builds in B79935: Diff 307325.
Harbormaster completed remote builds in B79950: Diff 307344.
ehjogab updated this revision to Diff 307364.Nov 24 2020, 8:00 AM

Use MappingCost instead of int for representing costs.

ehjogab edited the summary of this revision. (Show Details)Nov 24 2020, 8:05 AM
ehjogab updated this revision to Diff 307796.Nov 26 2020, 2:09 AM

Bug fixes

ehjogab updated this revision to Diff 308263.Nov 29 2020, 10:53 PM

More bugfixes

ehjogab updated this revision to Diff 309166.Dec 2 2020, 11:43 PM
  • Directly apply mapping of there is only one. This leads to improved performance as well as improved code quality for targets that only output a single mapping that is context-dependent (now only failing 32 testcases when forcing greedy to use global instead).
  • Bug fix