This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use materialization cost when lowering constant build_vector
Needs ReviewPublic

Authored by luke on Aug 15 2023, 6:00 AM.

Details

Summary

[RISCV] Use materialization cost when lowering constant build_vector

When lowering a constant build_vector, we currently match the following
patterns in this order:

  1. Splats
  2. vid sequences
  3. <= 32 bit scalar vmv.s.x
  4. Hidden splats
  5. Dominant values

So if a build_vector could be expressed by both a vid sequence and a vmv.s.x,
then it will always lower it to a vid. However, the vmv.s.x might be a more
profitable lowering if the constants are cheap to materialize, e.g.:

<4 x i8> <i8 1, i8 3, i8 5, i8 7>

Could be lowered as:

vsetivli zero, 4, e8, mf4, ta, ma
vid.v v8
vadd.vv v8, v8, v8
vadd.vi v8, v8, 1

Or as 3 instructions with:

vsetivli zero, 4, e32, m1, ta, ma
lui a0, 28752
vmv.s.x v8, a0

This patch computes the (rough) cost for vid sequences, scalar vmv.s.x and
hidden splat lowerings, then chooses the cheapest lowering (I've left out
dominant values for now).
An arbitrary maximum cost has been chosen for now, but it should be replaced
with an approximate constant pool load cost.
The main result of this is that we get some more vmv.s.xs where there would be
vids.

Diff Detail

Event Timeline

luke created this revision.Aug 15 2023, 6:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2023, 6:00 AM
luke requested review of this revision.Aug 15 2023, 6:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2023, 6:00 AM
LWenH added a subscriber: LWenH.Aug 15 2023, 8:10 PM

Please separate the removal of the immediate restriction on the VID into it's own change, either before or after.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
3294

Copying everything by value here to avoid scope issues is rather subtle. I don't spot a scope bug here, but I'm not a huge fan of the code structure.

3361

This is off when we can use a vmv.v.i to perform the insert.

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
195

Not this change, but there's something odd here. Unless I'm misreading this, we should be doing a splat of 2.0 as double here. That's the hex value 0x4000000000000000 which is a constant we can materialize in two instructions at worst. (1 with zbs).

It looks maybe we've got a problem with how we lower constant splats of floating point types? We should be able to use the integer mat path, and it looks like we're not doing so?

reames requested changes to this revision.Aug 16 2023, 2:45 PM
This revision now requires changes to proceed.Aug 16 2023, 2:45 PM
luke added inline comments.Aug 17 2023, 10:03 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
3294

Not a fan either. I admittedly spent some time trying to track down a miscompile due to UB, where I had originally copied some stack variables by reference.

Two possible alternatives:

  • Explicitly specify the list of captures
  • Create classes for each type of lowering and move DAG logic & cost logic into them
luke added inline comments.Aug 18 2023, 4:20 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
3361

I think I'm missing something, I can't find anywhere where the insert_vector_elt below would get lowered or combined to a vmv.v.i. There doesn't seem to be any vmv.v.i's introduced in the test diff in https://reviews.llvm.org/D157299 either?

luke updated this revision to Diff 551474.Aug 18 2023, 5:35 AM

Split out relxation on vid sequence. (Still need to think about code structure but haven't
come up with any better ideas yet)

luke edited the summary of this revision. (Show Details)Aug 18 2023, 5:39 AM
luke added inline comments.Aug 29 2023, 8:20 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
195

Took a look at this, this seems to be working as intended, it uses the integer mat path only if the cost to materialise the integer < 2. I.e. adding zbs gives:

vrgather_shuffle_vx_v4f64:              # @vrgather_shuffle_vx_v4f64
	.cfi_startproc
# %bb.0:
	vsetivli	zero, 4, e64, m2, ta, ma
	vid.v	v10
	li	a0, 3
	vmul.vx	v12, v10, a0
	bseti	a0, zero, 62
	vmv.v.x	v10, a0
	vsetivli	zero, 1, e8, mf8, ta, ma
	vmv.v.i	v0, 3
	vsetivli	zero, 4, e64, m2, ta, mu
	vrgather.vv	v10, v8, v12, v0.t
	vmv.v.v	v8, v10
	ret

or alternatively, increasing -riscv-lower-fp-imm-cost from the default 2 to 3:

vrgather_shuffle_vx_v4f64:              # @vrgather_shuffle_vx_v4f64
	.cfi_startproc
# %bb.0:
	vsetivli	zero, 4, e64, m2, ta, ma
	vid.v	v10
	li	a0, 3
	vmul.vx	v12, v10, a0
	li	a0, 1
	slli	a0, a0, 62
	vmv.v.x	v10, a0
	vsetivli	zero, 1, e8, mf8, ta, ma
	vmv.v.i	v0, 3
	vsetivli	zero, 4, e64, m2, ta, mu
	vrgather.vv	v10, v8, v12, v0.t
	vmv.v.v	v8, v10
	ret