Page MenuHomePhabricator

[PostRASched] Breaking More CriticalAntiDeps
Needs ReviewPublic

Authored by arangasa on Jan 6 2021, 11:47 PM.



The current version of CriticalAntiDepBreaker makes conservative decisions:

#1 If we see a definition of a register r, we set Classes[SuperReg(r)] = -1 in scan function conservatively. Later, in pre-scan function, for a register operand r of the current instruction, if Classes[a] is non-null, where a is an alias of r, then we set Classes[r] = -1, and prevent r getting renamed.

#2 We also initialize all aliases of live-out registers to be alive, even when they are never used in the program. This may prevent renaming of registers that overlap with such aliases.

This may result in the following missed opportunities: Assume r0 and r1 are subregisters of a super-register R0.

Case1: We may only rename the last definition of a register r0, which has a super-register.
Because its super-register will be marked as live after processing r0’s definition.
This will stop r0 getting renamed later, till R0 is really defined (when Classes[R0] will be set to nullptr).
Case 2: if r0 is defined after r1, then we can’t rename even the last definition of r1.
Because r0’s definition marks Classes[R0] = -1, which will prevent both r0 and r1 getting renamed.
Case 3: If r0 is live-in and live-out, and not modified in a basic block, and r1 participates in a CriticalAntiDependence, then that can’t be broken.
Because we initialize all super-regs of r0 (liveout) as live.
Case 4: We can’t use R0 for renaming a live-range that does not overlap with another subsequent live-range involving r0.

Case 1 is exercised, for instance, in v16f32_two_step function of llvm/test/CodeGen/X86/recip-fastmath.ll - The following anti-dep involving $ymm2 is not broken.

renamable $ymm0 = nnan ninf nsz arcp contract afn reassoc nofpexcept VFNMADDPS4Yrr killed renamable $ymm2, killed renamable $ymm0, renamable $ymm2, implicit $mxcsr //broken with ymm5 in this patch
renamable $ymm2 = nnan ninf nsz arcp contract afn reassoc VRCPPSYr renamable $ymm1

Case 4 is exercised in Part_Create function in llvm/test/CodeGen/X86/critical-anti-dep-breaker.ll
In the existing version, r10 is used to break the following anti-dependence. In the new version, rsi is used (rsi is considered before r10).
MOV64mr $rsp, 1, $noreg, 8, $noreg, killed renamable $rax :: (store 8 into %ir.Vchunk) // broken by rsi in this patch
renamable $rax = MOV64rm $rip, 1, $noreg, target-flags(x86-gotpcrel) @PartClass, $noreg :: (load 8 from got)

At a high level, this patch makes the following changes:

  1. When we process a register’s definition, don’t update liveness info of aliased registers eagerly.
  2. Keep liveness info separate for aliases as much as possible.
  3. Be careful when choosing a free register for renaming, or when considering replacing a register, by considering liveness information of all aliases.

This patch updates the test cases as well, to account for the broken anti-dependencies.

Diff Detail

Event Timeline

arangasa created this revision.Jan 6 2021, 11:47 PM
arangasa requested review of this revision.Jan 6 2021, 11:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2021, 11:47 PM
pgode added a subscriber: pgode.Jan 7 2021, 12:06 AM
arangasa edited the summary of this revision. (Show Details)Jan 7 2021, 12:32 AM
arangasa added reviewers: kparzysz, efriedma.
arangasa updated this revision to Diff 316910.Jan 15 2021, 5:27 AM

Fix iterator variable names.

craig.topper added a reviewer: wxiao3.
craig.topper added inline comments.Jan 15 2021, 4:37 PM

This comment still mentions R10, but the check line now uses RSI.

Probably best to regenerate the whole test with and pre-commit it. Then rebase this patch to show how the whole function chagnes. Seeing only one instruction is hard to understand.

arangasa added inline comments.Jan 18 2021, 3:41 AM

Thank you, Craig for your time and helpful inputs. is the updated test.

arangasa updated this revision to Diff 318421.Jan 21 2021, 10:53 PM

Updated llvm/test/CodeGen/X86/critical-anti-dep-breaker.ll to show the whole test in

Rebasing this change on top of

arangasa marked an inline comment as done.Feb 2 2021, 6:01 AM

I tested this patch with the following SPEC 2006 benchmarks, with or without this patch on 11.0.0 release. This is the config I used for x86 target:

COPTIMIZE = -O2 -mllvm -stats -mllvm -break-anti-dependencies=critical -mllvm -post-RA-scheduler=1
CXXOPTIMIZE = -O2 -mllvm -stats -mllvm -break-anti-dependencies=critical -mllvm -post-RA-scheduler=1

Benchmarks tested: 401.bzip2 429.mcf 456.hmmer 458.sjeng 462.libquantum 464.h264ref 471.omnetpp 433.milc 444.namd 453.povray 470.lbm 482.sphinx3

The above are the benchmarks I could successfully compile and run in my setup with 11.0.0 (irrespective of whether this change is present or not).

I could see that this change results in 33% more broken antideps, when compared to the one without this change (5692 vs 4258).

Can you provide any performance data for SPEC? SPEC2017 is better than SPEC2006.

The following SPEC2017 benchmarks were tested. In our setup, this patch results in 19% more antideps getting broken (116008 vs 97228). Baseline is llvm 11.0.0 release, and this patch was added to 11.0.0 release.
Exact flags used (with/without this patch): -O2 -mllvm -stats -mllvm -break-anti-dependencies=critical -mllvm -post-RA-scheduler=1.
Benchmarks: deepsjeng, gcc, imagick, lbm, leela, mcf, nab, namd, omnetpp, parest, perlbench, povray, x264, xalancbmk, xz.

arangasa updated this revision to Diff 334410.Mar 31 2021, 5:00 AM

Rebase patch.

The broken antideps number looks good but how about the final impact to runtime performance?
Could you please provide SPEC2017 performance data such as rate (like