This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- llvm/
  - lib/CodeGen/
    - MachineCombiner.cpp
  - test/CodeGen/
    - RISCV/
      - addc-adde-sube-subc.ll
      - addcarry.ll
      - addimm-mulimm.ll
      - alu64.ll
      - bswap-bitreverse.ll
      - calling-conv-ilp32-ilp32f-common.ll
      - calling-conv-ilp32-ilp32f-ilp32d-common.ll
      - calling-conv-lp64-lp64f-lp64d-common.ll
      - copysign-casts.ll
      - div-by-constant.ll
      - div-pow2.ll
      - div.ll
      - fpclamptosat.ll
      - fpclamptosat_vec.ll
      - iabs.ll
      - mul.ll
      - neg-abs.ll
      - rotl-rotr.ll
      - rv32zbb.ll
      - rv64i-w-insts-legalization.ll
      - rv64zbb.ll
      - rvv/
        - expand-no-v.ll
        - fixed-vectors-elen.ll
        - fixed-vectors-reduction-mask-vp.ll
        - fixed-vectors-unaligned.ll
        - vreductions-mask-vp.ll
      - sadd_sat.ll
      - sadd_sat_plus.ll
      - select-binop-identity.ll
      - shadowcallstack.ll
      - shifts.ll
      - split-udiv-by-constant.ll
      - srem-lkk.ll
      - srem-seteq-illegal-types.ll
      - srem-vector-lkk.ll
      - ssub_sat.ll
      - ssub_sat_plus.ll
      - uadd_sat.ll
      - uadd_sat_plus.ll
      - umulo-128-legalisation-lowering.ll
      - unaligned-load-store.ll
      - urem-lkk.ll
      - urem-seteq-illegal-types.ll
      - urem-vector-lkk.ll
      - usub_sat.ll
      - usub_sat_plus.ll
      - vararg.ll
      - wide-scalar-shift-by-byte-multiple-legalization.ll
      - wide-scalar-shift-legalization.ll
      - xaluo.ll
    - X86/
      - abdu-vector-128.ll
      - add-sub-bool.ll
      - alias-static-alloca.ll
      - avx-vinsertf128.ll
      - avx512-broadcast-unfold.ll
      - avx512-intrinsics-x86_64.ll
      - avx512-intrinsics.ll
      - avx512-mask-op.ll
      - avx512-regcall-Mask.ll
      - avx512-regcall-NoMask.ll
      - avx512bw-intrinsics-upgrade.ll
      - avx512fp16-intrinsics.ll
      - avx512fp16-mov.ll
      - avx512fp16-mscatter.ll
      - avx512vl-intrinsics.ll
      - bmi-out-of-order.ll
      - combine-add.ll
      - div-rem-pair-recomposition-signed.ll
      - div-rem-pair-recomposition-unsigned.ll
      - divide-by-constant.ll
      - divmod128.ll
      - fold-add.ll
      - fold-masked-merge.ll
      - fold-tied-op.ll
      - h-registers-1.ll
      - hipe-cc.ll
      - hipe-cc64.ll
      - horizontal-reduce-add.ll
      - horizontal-reduce-fadd.ll
      - horizontal-reduce-smax.ll
      - horizontal-reduce-smin.ll
      - horizontal-reduce-umax.ll
      - horizontal-reduce-umin.ll
      - horizontal-sum.ll
      - icmp-shift-opt.ll
      - imul.ll
      - lea-opt-cse4.ll
      - lea-opt2.ll
      - logic-shift.ll
      - machine-cp.ll
      - madd.ll
      - masked_gather.ll
      - masked_gather_scatter.ll
      - memcmp-more-load-pairs-x32.ll
      - memcmp-more-load-pairs.ll
      - midpoint-int-vec-128.ll
      - midpoint-int-vec-256.ll
      - midpoint-int.ll
      - movmsk-cmp.ll
      - mul-constant-i64.ll
      - mul-constant-result.ll
      - mul-i1024.ll
      - mul-i256.ll
      - mul-i512.ll
      - mul128.ll
      - mul64.ll
      - muloti.ll
      - optimize-max-0.ll
      - popcnt.ll
      - pr34080-2.ll
      - ptest.ll
      - rev16.ll
      - rotate-multi.ll
      - sad.ll
      - setcc-wide-types.ll
      - shift-combine.ll
      - smul-with-overflow.ll
      - smul_fix.ll
      - smul_fix_sat.ll
      - smulo-128-legalisation-lowering.ll
      - sse-regcall.ll
      - stack-clash-large.ll
      - statepoint-live-in.ll
      - statepoint-regs.ll
      - swift-return.ll
      - umul-with-overflow.ll
      - umul_fix.ll
      - umul_fix_sat.ll
      - umulo-128-legalisation-lowering.ll
      - umulo-64-legalisation-lowering.ll
      - urem-seteq-nonzero.ll
      - v8i1-masks.ll
      - vec_smulo.ll
      - vec_umulo.ll
      - vector-fshr-128.ll
      - vector-interleaved-store-i16-stride-7.ll
      - vector-interleaved-store-i8-stride-5.ll
      - vector-interleaved-store-i8-stride-8.ll
      - vector-pcmp.ll
      - vector-reduce-add-mask.ll
      - vector-reduce-add-sext.ll
      - vector-reduce-add-zext.ll
      - vector-reduce-add.ll
      - vector-reduce-and-bool.ll
      - vector-reduce-and-cmp.ll
      - vector-reduce-and.ll
      - vector-reduce-mul.ll
      - vector-reduce-or-bool.ll
      - vector-reduce-or-cmp.ll
      - vector-reduce-or.ll
      - vector-reduce-smax.ll
      - vector-reduce-smin.ll
      - vector-reduce-umax.ll
      - vector-reduce-umin.ll
      - vector-reduce-xor-bool.ll
      - vector-reduce-xor.ll
      - vector-trunc-math.ll
      - vector-trunc-packus.ll
      - vp2intersect_multiple_pairs.ll
      - win-smallparams.ll
      - x86-32-vector-calling-conv.ll
      - x86-interleaved-access.ll
      - x86-no_caller_saved_registers-preserve.ll
      - xmulo.ll

Differential D141017

[MachineCombiner] Use default latency model when no detailed model available
Closed, Public

Authored by reames on Jan 4 2023, 4:01 PM.

Details

Reviewers

craig.topper
asi-sc
Carrot
shchenz

Commits

rG86eff6be686a: [MachineCombiner] Use default latency model when no detailed model available

Summary

This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction latencies. After this change, we use the default latency information available from TargetSchedule. The default latency information essentially ends up treating most instructions as latency 1, with a few "expensive" ones getting a higher cost.

Previously, we unconditionally applied the first legal pattern, without any consideration of profitability. As a result, this change both prevents some patterns from being applied and changes which patterns are exercised (i.e. previously the first pattern was always applied; now the second one may be applied instead because the first wasn't profitable).
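
The difference between the two selection policies can be sketched in a few lines of standalone C++. This is only an illustration, not the MachineCombiner code: `Candidate`, `pickFirstLegal` and `pickFirstProfitable` are hypothetical names, and "profitable" is reduced to "strictly lower latency than the original sequence", which is a simplifying assumption.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical candidate rewrite: whether it is legal, and the latency of
// the replacement sequence under whatever cost model is in use.
struct Candidate {
  bool legal;
  unsigned newLatency;
};

// Prior behaviour as described above: take the first legal pattern,
// ignoring cost entirely.
std::optional<std::size_t> pickFirstLegal(const std::vector<Candidate> &cands) {
  for (std::size_t i = 0; i < cands.size(); ++i)
    if (cands[i].legal)
      return i;
  return std::nullopt;
}

// New behaviour, simplified: take the first legal pattern that is also
// profitable. Profitability is modelled as strictly lower latency, which
// is an assumption of this sketch.
std::optional<std::size_t>
pickFirstProfitable(const std::vector<Candidate> &cands, unsigned oldLatency) {
  for (std::size_t i = 0; i < cands.size(); ++i)
    if (cands[i].legal && cands[i].newLatency < oldLatency)
      return i;
  return std::nullopt;
}
```

With two legal candidates where only the second is cheaper than the original, `pickFirstLegal` returns index 0 and `pickFirstProfitable` returns index 1; if neither is cheaper, the second function returns no pattern at all.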

The motivation here is twofold.

First, this brings the default behavior in line with the behavior when -mcpu or -mtune is specified. This improves test coverage, and generally makes it less likely we will have bad surprises when providing more information to the compiler.

Second, this enables some reassociation for ILP by default. Despite being unconditionally enabled, the prior code tended to "reassociate" repeatedly through an entire chain, simply moving the first operand to the end. The result was still a serial chain, just a different one. With this change, one of the intermediate transforms is unprofitable and we end up with a partially flattened tree.
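
To make the serial-chain point concrete, here is a small standalone C++ sketch that measures the critical path of an expression tree when every operation has latency 1. The `Expr` type and helper names are hypothetical and exist only for this illustration; nothing here is taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical expression node: a named leaf, or an add of two subtrees.
struct Expr {
  char name = 0;                  // set for leaves only
  std::unique_ptr<Expr> lhs, rhs; // both null for a leaf
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr leaf(char name) {
  auto e = std::make_unique<Expr>();
  e->name = name;
  return e;
}

ExprPtr add(ExprPtr l, ExprPtr r) {
  assert(l && r && "an add needs two operands");
  auto e = std::make_unique<Expr>();
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Critical-path length assuming every add has latency 1 and leaves are free.
unsigned criticalPath(const Expr &e) {
  if (!e.lhs)
    return 0;
  return 1 + std::max(criticalPath(*e.lhs), criticalPath(*e.rhs));
}

// ((a + b) + c) + d: the original serial chain.
ExprPtr serialChain() {
  return add(add(add(leaf('a'), leaf('b')), leaf('c')), leaf('d'));
}

// ((b + c) + d) + a: first operand moved to the end, still serial.
ExprPtr rotatedChain() {
  return add(add(add(leaf('b'), leaf('c')), leaf('d')), leaf('a'));
}

// (a + b) + (c + d): two independent adds feed the final one.
ExprPtr flattenedTree() {
  return add(add(leaf('a'), leaf('b')), add(leaf('c'), leaf('d')));
}
```

Under this unit-latency assumption, both chains have a critical path of three adds, while the flattened form has a critical path of two, which is the ILP benefit the summary refers to.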

Note that the resulting code diffs show significant room for improvement in the basic algorithm. I am intentionally excluding those from this patch.

For the test diffs, I don't see any concerning regressions. I took a fairly close look at the RISCV ones, but only skimmed the x86 (particularly vector x86) changes.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Jan 4 2023, 4:01 PM

Herald added a project: Restricted Project. Jan 4 2023, 4:01 PM

Herald added subscribers: armkevincheng, sjarus, eric-k256 and 27 others.

reames requested review of this revision. Jan 4 2023, 4:01 PM

Herald added subscribers: pcwang-thead, MaskRay.

lebedev.ri added a comment, quoting the summary:

"the default latency information available from TargetSchedule."

Could you add some more wording here please?
Where does it come from?

Harbormaster completed remote builds in B205792: Diff 486367. Jan 4 2023, 5:29 PM

In D141017#4027369, @lebedev.ri wrote:

the default latency information available from TargetSchedule.

Could you add some more wording here please?
Where does it come from?

TargetSchedModel::computeOperandLatency and computeInstrLatency end up calling defaultDefLatency. That in turn returns 0 for transient instructions, MCSchedModel.LoadLatency for loads, MCSchedModel.HighLatency for "high latency defs", and 1 for everything else.

The TargetSchedModel is initialized from the MCSchedModel and the subtarget info. It mostly just pulls information from the MCSchedModel. When no -mcpu is given, the MCSchedModel is a default-initialized instance of the class (see GetDefaultSchedModel and MCSchedule.cpp Ln 25). The default values for LoadLatency and HighLatency are 4 and 10 respectively.

isHighLatencyDef defaults to false, and only x86 and AMDGPU override it. AMDGPU isn't interesting as it doesn't use MachineCombiner. X86 lists a set of divides, sqrts, scatter, and gather instructions.
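
The fallback described above can be modelled in standalone C++ roughly as follows. This is a sketch, not the LLVM source: `SchedDefaults`, `InstrProps` and `defaultLatency` are hypothetical stand-ins, the default values 4 and 10 are the ones quoted above, and the order of the checks simply follows the order of the description.

```cpp
// Stand-in for the two MCSchedModel fields mentioned above, using the
// default values quoted in the comment (4 and 10).
struct SchedDefaults {
  unsigned LoadLatency = 4;
  unsigned HighLatency = 10;
};

// Hypothetical summary of the instruction properties the fallback consults.
struct InstrProps {
  bool isTransient = false;      // e.g. copies that are expected to vanish
  bool mayLoad = false;          // instruction reads memory
  bool isHighLatencyDef = false; // target marked it as expensive
};

// Default latency when no per-instruction scheduling data is available.
// The check order mirrors the prose above and is an assumption.
unsigned defaultLatency(const SchedDefaults &model, const InstrProps &mi) {
  if (mi.isTransient)
    return 0;
  if (mi.mayLoad)
    return model.LoadLatency;
  if (mi.isHighLatencyDef)
    return model.HighLatency;
  return 1; // everything else
}
```

With the default values, a plain ALU instruction costs 1, a load costs 4, and an instruction the target flags as a high-latency def (on X86, the divides, sqrts, scatters and gathers mentioned above) costs 10.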