This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/CodeGen/
-
CodeGen/
-
RegAllocFast.cpp
-
test/
-
CodeGen/
-
AArch64/
-
GlobalISel/
-
darwin-tls-call-clobber.ll
-
arm64-fast-isel-br.ll
-
arm64-fast-isel-call.ll
-
arm64-fast-isel-conversion-fallback.ll
-
arm64-fast-isel-conversion.ll
-
arm64-vcvt_f.ll
-
arm64_32-fastisel.ll
-
arm64_32-null.ll
-
br-cond-not-merge.ll
-
cmpxchg-O0.ll
-
combine-loads.ll
-
fast-isel-cmpxchg.ll
-
popcount.ll
-
swift-return.ll
-
swifterror.ll
-
unwind-preserved-from-mir.mir
-
unwind-preserved.ll
-
AMDGPU/
-
GlobalISel/
-
inline-asm.ll
-
control-flow-fastregalloc.ll
-
fast-ra-kills-vcc.mir
-
fastregalloc-illegal-subreg-physreg.mir
-
fastregalloc-self-loop-heuristic.mir
-
indirect-addressing-term.ll
-
mubuf-legalize-operands.ll
-
partial-sgpr-to-vgpr-spills.ll
-
reserve-vgpr-for-sgpr-spill.ll
-
spill-agpr.mir
-
spill-m0.ll
-
spill192.mir
-
unexpected-reg-unit-state.mir
-
wwm-reserved.ll
-
ARM/
-
2010-08-04-StackVariable.ll
-
Windows/
-
alloca.ll
-
cmpxchg-O0-be.ll
-
cmpxchg-O0.ll
-
crash-greedy-v6.ll
-
debug-info-blocks.ll
-
fast-isel-call.ll
-
fast-isel-intrinsic.ll
-
fast-isel-ldr-str-thumb-neg-index.ll
-
fast-isel-select.ll
-
fast-isel-vararg.ll
-
ldrd.ll
-
legalize-bitcast.ll
-
stack-guard-reassign.ll
-
swifterror.ll
-
thumb-big-stack.ll
-
Hexagon/vect/
-
vect/
-
vect-load-v4i16.ll
-
Mips/
-
Fast-ISel/
-
callabi.ll
-
memtest1.ll
-
pr40325.ll
-
GlobalISel/llvm-ir/
-
llvm-ir/
-
add.ll
-
add_vec.ll
-
aggregate_struct_return.ll
-
bitreverse.ll
-
bitwise.ll
-
branch.ll
-
brindirect.ll
-
bswap.ll
-
call.ll
-
ctlz.ll
-
ctpop.ll
-
cttz.ll
-
dyn_stackalloc.ll
-
fcmp.ll
-
float_constants.ll
-
fptosi_and_fptoui.ll
-
global_address.ll
-
global_address_pic.ll
-
icmp.ll
-
jump_table_and_brjt.ll
-
load_4_unaligned.ll
-
load_split_because_of_memsize_or_align.ll
-
long_ambiguous_chain_s32.ll
-
long_ambiguous_chain_s64.ll
-
mul.ll
-
mul_vec.ll
-
phi.ll
-
rem_and_div.ll
-
select.ll
-
sitofp_and_uitofp.ll
-
store_4_unaligned.ll
-
store_split_because_of_memsize_or_align.ll
-
sub.ll
-
sub_vec.ll
-
test_TypeInfoforMF.ll
-
var_arg.ll
-
zextLoad_and_sextLoad.ll
-
zext_and_sext.ll
-
atomic-min-max.ll
-
atomic.ll
-
atomic64.ll
-
atomicCmpSwapPW.ll
-
copy-fp64.ll
-
implicit-sret.ll
-
micromips-eva.mir
-
msa/
-
ldr_str.ll
-
PowerPC/
-
addegluecrash.ll
-
aggressive-anti-dep-breaker-subreg.ll
-
aix-overflow-toc.py
-
anon_aggr.ll
-
builtins-ppc-p10vsx.ll
-
elf-common.ll
-
fast-isel-pcrel.ll
-
fp-int128-fp-combine.ll
-
fp-strict-fcmp-noopt.ll
-
fp64-to-int16.ll
-
p9-vinsert-vextract.ll
-
popcount.ll
-
spill-nor0.ll
-
spill-nor0.mir
-
stack-guard-reassign.ll
-
vsx-args.ll
-
vsx.ll
-
SPARC/
-
fp16-promote.ll
-
SystemZ/
-
swift-return.ll
-
swifterror.ll
-
Thumb2/
-
LowOverheadLoops/
-
branch-targets.ll
-
high-reg-spill.mir
-
mve-vector-spill.ll
-
X86/
-
2009-04-14-IllegalRegs.ll
-
2010-06-28-FastAllocTiedOperand.ll
-
2013-10-14-FastISel-incorrect-vreg.ll
-
atomic-monotonic.ll
-
atomic-unordered.ll
-
atomic32.ll
-
atomic64.ll
-
atomic6432.ll
-
avx-load-store.ll
-
avx512-mask-zext-bugfix.ll
-
bug47278-eflags-error.mir
-
bug47278.mir
-
crash-O0.ll
-
extend-set-cc-uses-dbg.ll
-
fast-isel-cmp-branch.ll
-
fast-isel-nontemporal.ll
-
fast-isel-select-sse.ll
-
fast-isel-select.ll
-
fast-isel-x86-64.ll
-
mixed-ptr-sizes-i686.ll
-
mixed-ptr-sizes.ll
-
phys-reg-local-regalloc.ll
-
pr11415.ll
-
pr1489.ll
-
pr27591.ll
-
pr30430.ll
-
pr30813.ll
-
pr32241.ll
-
pr32284.ll
-
pr32340.ll
-
pr32345.ll
-
pr32451.ll
-
pr32484.ll
-
pr34592.ll
-
pr34653.ll
-
pr39733.ll
-
pr42452.ll
-
pr44749.ll
-
pr47000.ll
-
regalloc-fast-missing-live-out-spill.mir
-
stack-protector-msvc.ll
-
stack-protector-strong-macho-win32-xor.ll
-
swift-return.ll
-
swifterror.ll
-
volatile.ll
-
win64_eh.ll
-
x86-32-intrcc.ll
-
x86-64-intrcc.ll
-
DebugInfo/
-
AArch64/
-
frameindices.ll
-
prologue_end.ll
-
ARM/
-
prologue_end.ll
-
Mips/
-
delay-slot.ll
-
prologue_end.ll
-
X86/
-
dbg-declare-arg.ll
-
fission-ranges.ll
-
op_deref.ll
-
parameters.ll
-
pieces-1.ll
-
prologue-stack.ll
-
reference-argument.ll
-
spill-indirect-nrvo.ll
-
sret.ll
-
subreg.ll

Differential D52010

RegAllocFast: Rewrite and improve
ClosedPublic

Authored by arsenm on Sep 12 2018, 7:29 PM.

Download Raw Diff

Details

Reviewers

qcolombet
wmi
kparzysz
myatsina
javed.absar
atanasyan
smaksimovic
petarj
abeserminji
MatzeB

Summary

This rewrites big parts of the fast register allocator. The basic strategy of
doing block-local allocation hasn't changed but I tweaked several details:

Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases.
Process basic blocks in reverse order: Definitions are known to end register livetimes when walking backwards (contrary when walking forward then uses may or may not be a kill so we need heuristics).
Check register mask operands (calls) instead of conservatively assuming everything is clobbered.
Enhance heuristics to detect killing uses: In case of a small number of defs/uses check if they are all in the same basic block and if so the last one is a killing use.
Enhance heuristic for copy-coalescing through hinting: We check the first k defs of a register for COPYs rather than relying on there just being a single definition.

When testing this on the full llvm test-suite including SPEC externals I measured:

average 5.1% reduction in code size for X86, 4.9% reduction in code on aarch64. (ranging between 0% and 20% depending on the test)
0.5% faster compiletime (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count)

I'm still running benchmarks and I need to fix 120 tests with hardcoded registers (ugh)...

Diff Detail

Event Timeline

MatzeB created this revision.Sep 12 2018, 7:29 PM

Herald added a reviewer: javed.absar. · View Herald TranscriptSep 12 2018, 7:29 PM

Herald added subscribers: kristof.beyls, mcrosier. · View Herald Transcript

arsenm added a subscriber: arsenm.Sep 13 2018, 2:42 AM

arsenm added inline comments.

lib/CodeGen/RegAllocFast.cpp
164–169 ↗	(On Diff #165193)	Should comment meaning / reason for these values?
291–297 ↗	(On Diff #165193)	Isn't this SkipPHIsAndLabels without the handling of isBasicBlockPrologue?
449 ↗	(On Diff #165193)	!VRegDef
460 ↗	(On Diff #165193)	static unnecessary?
502–503 ↗	(On Diff #165193)	return after else

Hi Matthias,

Thanks for modernizing this allocator.

Could you split the patch to make the review easier?

You mentioned several points in your description, if at all possible, it would be nice to have one patch for each.

Cheers,
-Quentin

In D52010#1233619, @qcolombet wrote:

Hi Matthias,

Thanks for modernizing this allocator.

Could you split the patch to make the review easier?

You mentioned several points in your description, if at all possible, it would be nice to have one patch for each.

Cheers,
-Quentin

I can try to do that for review purposes... But some things depend on each other so the intermediate steps are probably not going to be functional...

I'm now depending on this for a change I'm working on rather than figuring out how to fix kill flag handling

I see a few different assertions in tests with this patch. Did you see these before or are they regressions?

In a few ARM tests during ARMExpandPseudo:

Assertion failed: (!(isKill && isDef) && "Kill flag on def"), function CreateReg, file ../include/llvm/CodeGen/MachineOperand.h, line 76

In one test:
Assertion failed: ((LRI->LastUse == &MI || MO.isTied() || !isRegUsedInInstr(LRI->PhysReg)) && "assigned reg free"), function useVirtReg, file ../lib/CodeGen/RegAllocFast.cpp, line 723.

In a few different targets' tests:
Assertion failed: (MBB != &MBB->getParent()->front() && "no reload in start block"), function reloadAtBegin, file ../lib/CodeGen/RegAllocFast.cpp, line 320.

I am slowly adatpting all the tests to the changes here; I have a commit fixing ~100 of them but there is still maybe 30-40 to go.
I believe I have the first two asserts fixed locally here.
The 3rd assert is picking up invalid MIR (a value is used before it is defined), my local version now has a commandline flag that allows us to disable that assert for the couple tests that show invalid MIR (planning to file PRs for the target owners to fix, so we can remove the commandline flag once the last test is fixed).

In D52010#1273280, @arsenm wrote:

I see a few different assertions in tests with this patch. Did you see these before or are they regressions?

In a few ARM tests during ARMExpandPseudo:

Assertion failed: (!(isKill && isDef) && "Kill flag on def"), function CreateReg, file ../include/llvm/CodeGen/MachineOperand.h, line 76

In one test:
Assertion failed: ((LRI->LastUse == &MI || MO.isTied() || !isRegUsedInInstr(LRI->PhysReg)) && "assigned reg free"), function useVirtReg, file ../lib/CodeGen/RegAllocFast.cpp, line 723.

In a few different targets' tests:
Assertion failed: (MBB != &MBB->getParent()->front() && "no reload in start block"), function reloadAtBegin, file ../lib/CodeGen/RegAllocFast.cpp, line 320.

xbolva00 added a subscriber: xbolva00.Oct 23 2018, 2:25 PM

Can you post the updated version?

Updated version

I'm now depending on this for a change I'm working on rather than figuring out how to fix kill flag handling

I'm not sure what exactly you mean by that. But just as a warning: This patch will lead to more kill flags being added than before, but it does not guarantee that every last use has a kill flag (especially vregs live across blocks will still lack kill flags).

In D52010#1273686, @MatzeB wrote:

I'm now depending on this for a change I'm working on rather than figuring out how to fix kill flag handling

I'm not sure what exactly you mean by that. But just as a warning: This patch will lead to more kill flags being added than before, but it does not guarantee that every last use has a kill flag (especially vregs live across blocks will still lack kill flags).

This patch avoids a problem I'm having applying my partial allocation patch to RegAllocFast. It seems to have a bug now where the first time a physical register is used, it immediately sets kill on it even though there are further uses of the same def

It seems to have a bug now where the first time a physical register is used, it immediately sets kill on it even though there are further uses of the same def

You mean for a pre-assigned physreg that looks something like this:
bb.0:

use $somereg ...

...

use $somereg ...

we add a kill flag on the first use? Will see if I figure this out, I'm surprised this didn't hit me in unittests then...

Adapted all the unit tests
Slight tuning of register hinting code
Improved allocation heuristic for livethrough and earlyclobber registers, allowing us to allocate more inline assembly constructs consistently (rather than the previous allocator doing it by luck).

This passes ninja check-llvm for the first time.

Next steps:

Code cleanup
Splitting into multiple commits
Re-run test-suite benchmarks to gather correctness, compiletime, code size, and execution-speed metrics

Herald added subscribers: jsji, jfb, arphaman and 7 others. · View Herald TranscriptNov 6 2018, 3:07 PM

In D52010#1289522, @MatzeB wrote:
It seems to have a bug now where the first time a physical register is used, it immediately sets kill on it even though there are further uses of the same def

You mean for a pre-assigned physreg that looks something like this:
bb.0:
use $somereg ...
...
use $somereg ...
we add a kill flag on the first use? Will see if I figure this out, I'm surprised this didn't hit me in unittests then...

Yes. It might have specifically been when it was a live-in. I don't see it with this patch, so I didn't spend too much time trying to understand why

@mips target owners: Please take a look at the unittests where I had to use -rafast-ignore-missing-defs to workaround invalid machine code. Those tests use a virtual register before it is defined (the old regalloc fast used to just ignore this in some cases).

@abeserminji: Aleksandar, could you take a look at the RegAllocFast::reloadAtBegin? It looks like a necessity of using the rafast-ignore-missing-defs option is caused by the rL336328 commit.

In D52010#1289784, @atanasyan wrote:

@abeserminji: Aleksandar, could you take a look at the RegAllocFast::reloadAtBegin? It looks like a necessity of using the rafast-ignore-missing-defs option is caused by the rL336328 commit.

BTW: You don't even need to apply this patch or look at any of the changes here. Just the dump the machine code of the tests before regalloc might be enough to see which vreg has no definition.

I extracted a couple of obvious cleanups to reduce the code churn here. They landed in rL346287, rL346288, rL346289, rL346296, rL346297, rL346298
(will update and split the diff here soon)

petpav01 mentioned this in D49364: [ARM] Add support for spilling high registers in Thumb1.Nov 7 2018, 5:59 AM

MatzeB mentioned this in D54364: RegAllocFast: Do not allocate registers for undef uses.Nov 9 2018, 4:27 PM

MatzeB mentioned this in D54365: RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs.Nov 9 2018, 4:31 PM

MatzeB mentioned this in D54366: RegAllocFast: Add heuristic to detect values not live-out of a block.

MatzeB mentioned this in D54367: RegAllocFast: Improve hinting heuristic.Nov 9 2018, 4:35 PM

MatzeB mentioned this in D54368: RegAllocFast: Record internal state based on register units.Nov 9 2018, 4:38 PM

I split up two more NFC commits: rL346575 and rL346576.
I split several step by step patches to ease review (note that I did not bother to fix hundreds of testcases for the intermediate steps, so I will not commit them independently): D54364, D54365, D54366, D54367, D54368

Simpler patch now that more things have been split up into separate review and NFC patches.

Some notes on what is left here:

Change allocation direction to go from end to begin of a basic block.
Change debug value handling: The tricky thing with the changed direction is that we dealing with dangling DBG_VALUE instructions (they have a vreg use after the last non-debug use of a vreg) gets trickier. They are rare but occur in several unittests so we now track them separately in the code and perform a forward walk each time after the vreg is allocation by the real last use to see whether the register reached the dangling DBG_VALUE.
This also simplifies DBG_VALUE handling, to always referenvce stack slots in case of spills as we now place spills immediately adjacent to the definition of a value/register.
Generally rewrote allocateInstruction to be hopefolly somewhat simpler
Added some heuristic for tricky inline assembly early clobber situations: If we detect a tricky situation now, we will take all def and early-clobber like operands and sort them by the size of the register class so that we can assign tricky cases more consistently and are more likely to be successfull (this was necessary to fix some unittests that were lucky with the old allocation algorithm but unlucky with the new one. With the new heuristic we should be less dependent on luck).

From what I found in rL336328, I believe that the idea was to mark registers as implicit-def which would prevent such an error.
One of the comments say:

// The scratch registers here with the EarlyClobber | Define | Implicit
// flags is used to persuade the register allocator and the machine
// verifier to accept the usage of this register.

When I do --print-before=regallocfast I get these flags included:

...implicit-def dead early-clobber %10:gpr32

But I am not sure if that's enough, or if the new implementation is taking this into account.

arsenm added inline comments.Nov 16 2018, 12:04 PM

lib/CodeGen/RegAllocFast.cpp
480–483 ↗	(On Diff #173474)	This is dead code that also doesn't compile for me

ping, I'm depending on this set of patches for D55301

The asm clobber logic is failing for me on test/CodeGen/X86/2008-09-18-inline-asm-2.ll. It fails to allocate, and then the pass fails to not assert in the case of the reported out of registers error.

Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2019, 6:28 PM

Herald added a subscriber: jdoerfert. · View Herald Transcript

arsenm added inline comments.Mar 21 2019, 11:39 AM

lib/CodeGen/RegAllocFast.cpp
1019–1020 ↗	(On Diff #173474)	I was able to fix the failure I see by refining this to the number of defs of a class that may alias the current register's class, though it seems a bit overkill

arsenm added inline comments.Mar 21 2019, 1:50 PM

lib/CodeGen/RegAllocFast.cpp
390–393 ↗	(On Diff #173474)	This is problematic for AMDGPU. The order of the spills matters in the case the registers used are needed to restore the exec mask in the target block. Additionally, getMBBBeginInsertionPoint should be using SkipPHIsAndLabels so that isBasicBlockPrologue is respected. Because of this the updates to test/CodeGen/AMDGPU/control-flow-fastregalloc.ll are wrong

arsenm mentioned this in rL360887: RegAllocFast: Improve hinting heuristic.May 16 2019, 5:48 AM

arsenm mentioned this in rG828b685ebe3f: RegAllocFast: Improve hinting heuristic.

sidorovd mentioned this in rG8a5ad76dc2ce: RegAllocFast: Improve hinting heuristic.May 30 2019, 10:30 AM

arsenm mentioned this in D55301: RegAlloc: Allow targets to split register allocation.Apr 20 2020, 3:25 PM

I've scanned through the patch and it looks sensible.

@arsenm I let you fix the SkipPHIAndLabel issue and move forward with the patch.

Herald added subscribers: kerbowa, • wuzish. · View Herald TranscriptMay 4 2020, 10:19 AM

pratlucas added a subscriber: pratlucas.May 14 2020, 7:02 AM

arsenm mentioned this in D80999: [ARM][CodeGen] Enabling spilling of high registers in RegAllocFast for Thumb1.Jun 4 2020, 12:00 PM

arsenm mentioned this in D86888: RegAllocFast: Make self loop live-out heuristic more aggressive.Aug 31 2020, 12:40 PM

arsenm commandeered this revision.Sep 11 2020, 2:35 PM

arsenm added a reviewer: MatzeB.

Herald added a subscriber: wdng. · View Herald TranscriptSep 11 2020, 2:35 PM

Rebase and fix AMDGPU failures

Herald added subscribers: dmgreen, fedor.sergeev, kbarton and 2 others. · View Herald TranscriptSep 11 2020, 2:36 PM

foad added a subscriber: foad.Sep 14 2020, 9:50 AM

Fix verifier error from live out eflags encountered when looking at the testcase from bug 47278. There was an if 0'd out section for dealing with live-ins. Add all live out registers as pre-assigned. Previously allocatable live out physregs would be incorrectly killed, and non-allocatable registers would hit a verifier error

RKSimon added a subscriber: RKSimon.Sep 14 2020, 12:06 PM

Add a few tests which were failing before this is applied

qcolombet accepted this revision.Sep 14 2020, 2:22 PM

This revision is now accepted and ready to land.Sep 14 2020, 2:22 PM

With this patch applied I'm getting:

$ ~/llvm-debug/bin/llc -march=amdgcn -o /dev/null -regalloc=fast x.ll
llc: /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/RegAllocFast.cpp:876: void (anonymous namespace)::RegAllocFast::useVirtReg(llvm::MachineInstr &, unsigned int, llvm::Register): Assertion `!MO.isTied() && "tied op should be allocated"' failed.

x.ll1 KBDownload

hans mentioned this in rGa21387c65470: Revert "RegAllocFast: Record internal state based on register units".Sep 15 2020, 4:29 AM

Fix not dropping subreg indexes from tied operands

arsenm added parent revisions: D87542: AMDGPU: Don't sometimes allow instructions before lowered si_end_cf, D87472: AMDGPU: Clear offset register when using local stack area, D86888: RegAllocFast: Make self loop live-out heuristic more aggressive.Sep 15 2020, 7:04 AM

arsenm mentioned this in rG738c73a45488: RegAllocFast: Make self loop live-out heuristic more aggressive.Sep 16 2020, 10:12 AM

arsenm mentioned this in rG870fd53e4f63: Reapply "RegAllocFast: Record internal state based on register units".Sep 18 2020, 11:05 AM

c8757ff3aa7dd7a25a6343f6ef74a70c7be04325

vitalybuka mentioned this in rG3ab118a57d38: [NFC][Asan] Fix test broken by RegAllocFast.Sep 18 2020, 4:50 PM

arsenm mentioned this in D88440: RegAllocFast: Add extra DBG_VALUE for live out spills.Sep 28 2020, 12:37 PM

arsenm mentioned this in rGa66fca44ac92: RegAllocFast: Add extra DBG_VALUE for live out spills.Sep 30 2020, 7:35 AM

pratlucas mentioned this in D96335: [NFC] Adding test for clobbering of high registers in Thumb.Feb 9 2021, 6:13 AM

pratlucas mentioned this in D81942: [NFC][CodeGen] Refactor: splitting register allocation method in RegAllocFast.Feb 9 2021, 6:16 AM