This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Hoist VSETVLI out of idiomatic fixed length vector loops
ClosedPublic

Authored by reames on May 3 2022, 12:32 PM.

Details

Summary

This patch teaches the VSETVLI insertion pass to perform a very limited form of partial redundancy elimination. The motivating example comes from the fixed length vectorization of a simple loop such as:

for (unsigned i = 0; i < a_len; i++)
    a[i] += b;

Without this change, the core vector loop and preheader is as follows:

.LBB0_3:                                # %vector.ph
	andi	a1, a6, -8
	addi	a4, a0, 16
	mv	a5, a1
.LBB0_4:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	addi	a3, a4, -16
	vsetivli	zero, 4, e32, m1, ta, mu
	vle32.v	v8, (a3)
	vle32.v	v9, (a4)
	vadd.vx	v8, v8, a2
	vadd.vx	v9, v9, a2
	vse32.v	v8, (a3)
	vse32.v	v9, (a4)
	addi	a5, a5, -8
	addi	a4, a4, 32
	bnez	a5, .LBB0_4

The key thing to note here is that, I believe, the execution of the vsetivli only needs to happen once. Since there's no tail folding happening here, the value of the vector configuration registers are invariant through the loop.

After this patch, we hoist the configuration into the preheader and perform it once.

.LBB0_3:                                # %vector.ph
	andi	a1, a6, -8
	vsetivli	zero, 4, e32, m1, ta, mu
	addi	a4, a0, 16
	mv	a5, a1
.LBB0_4:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	addi	a3, a4, -16
	vle32.v	v8, (a3)
	vle32.v	v9, (a4)
	vadd.vx	v8, v8, a2
	vadd.vx	v9, v9, a2
	vse32.v	v8, (a3)
	vse32.v	v9, (a4)
	addi	a5, a5, -8
	addi	a4, a4, 32
	bnez	a5, .LBB0_4

Once this lands, I plan on extending it to non-immediate AVLs in a separate patch, but frankly, that's less important. For a scalable loop, we always have a setvli above, and in most cases, full redundancy (via the existing dataflow) kicks in. There's enough cases to be worth handling via FRE eventually, but fixed length vectors are definitely much higher impact.

Diff Detail

Event Timeline

reames created this revision.May 3 2022, 12:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 12:32 PM
reames requested review of this revision.May 3 2022, 12:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 12:32 PM

I like the idea. We should move vsetvli out when possible. It doesn't look like all the test changes are improvements.

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
1297

confiuuered -> configured

1308

You can use the not well named ST.getRealMinVLen() here I think. It takes into account the Zvl*b provided value.

1313

begining -> beginning

1326

Use curly braches for this else body for consistency

1399

Would it make sense to use a BitVector using MBB numbering?

llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
113 ↗(On Diff #426801)

This seems worse

frasercrmck added inline comments.
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
308

This is very helpful, thank you!

1313

it's -> its

1324

return on new line

llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store.ll
510

This seems like a regression too

llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
521 ↗(On Diff #426801)

I'm curious, would this be showing a bug if it was instead e16,mf2 followed by vle16.v in bb.0 and e32,m1 followed by vle32.v in bb.1? If we moved the vsetvli down past the vle32.v we'd be using the old vtype with a 2-byte alignment requirement, which the vle32.v isn't obliged to use.

reames planned changes to this revision.May 5 2022, 3:23 PM

JFYI: Staring more closely at the test diffs on this WIP led me right into wondering about the correctness of the data flow. This is probably on hold until we've addressed correctness of the underlying algorithm.

reames updated this revision to Diff 431704.May 24 2022, 9:59 AM
reames retitled this revision from [WIP][RISCV] Hoist VSETVLI out of idiomatic fixed length vector loops to [RISCV] Hoist VSETVLI out of idiomatic fixed length vector loops.
reames edited the summary of this revision. (Show Details)

Rebase, and generally cleanup. Now ready for actual review.

craig.topper added inline comments.May 24 2022, 10:43 AM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
105

Any reason to restrict this uint8_t? Looks like it's a assigned to unsigned where it's used.

1287

Info could be a const reference I think.

1302

VSETVL -> VSETVLI

1317

Can PredInfo be a reference? Though maybe it's small enough that copies are cheap?

1363

This is more than 80 columns

reames added inline comments.May 24 2022, 10:53 AM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
105

uint8_t matches the actual type of the field. I can change it if you want, but it seems cleaner to me to match the field type.

reames updated this revision to Diff 431726.May 24 2022, 10:58 AM

address review comments

craig.topper added inline comments.May 24 2022, 11:23 AM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
105

I was thinking of the uint8_t as an implementation choice I made to keep the data structure size down. It could have been a 7 bit bitfield or stored in log2 form. It would also need to change if SEW=256, 512, or 1024 came back to life.

That was just my thinking. If you want to keep it as uint8_t it's fine with me.

craig.topper added inline comments.May 24 2022, 11:30 AM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
1293

What prevents supporting other LMULs?

reames added inline comments.May 24 2022, 1:03 PM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
105

I see your point, could go either way. I think either answer is reasonable here.

1293

Nothing, we just need to adjust the formula below for it. Mind if I do this in a separate change? I'm still not 100% trusting my mental model of L:MUL, and would prefer to have that reviewed on its own.

I will add a TODO.

craig.topper added inline comments.May 24 2022, 2:01 PM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
1293

Separate review is fine.

Anything left outstanding here? I think this could land now; it shouldn't need to wait for the strict assert change as the workaround should be perfectly compatible.

LGTM. Other than that last comment.

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
1371

Can this use getFirstInstrTerminator()?

craig.topper accepted this revision.May 24 2022, 2:11 PM
This revision is now accepted and ready to land.May 24 2022, 2:11 PM
This revision was landed with ongoing or failed builds.May 24 2022, 2:56 PM
This revision was automatically updated to reflect the committed changes.