This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use vmv.v.[v|i] if we know COPY is under the same vl and vtype.
ClosedPublic

Authored by HsiangKai on Jun 2 2021, 2:37 AM.

Details

Summary

If we know the source operand of COPY is defined by a vector instruction
with tail agnostic and the same LMUL and there is no vsetvli between
COPY and the define instruction to change the vl and vtype, we could use
vmv.v.v or vmv.v.i to copy vector registers to get better performance than
the whole vector register move instructions.

If the source of COPY is from vmv.v.i, we could use vmv.v.i for the
COPY.

This patch only considers all these instructions within one basic block.

Case 1:

bb.0:
  ...
  VSETVLI          # The first VSETVLI before COPY and VOP.
  ...              # Use this VSETVLI to check LMUL and tail agnostic.
  ...
  vy = VOP va, vb  # Define vy.
  ...              # There is no vsetvli between VOP and COPY.
  vx = COPY vy

Case 2:

bb.0:
  ...
  VSETVLI          # The first VSETVLI before VOP.
  ...              # Use this VSETVLI to check LMUL and tail agnostic.
  ...
  vy = VOP va, vb  # Define vy.
  ...              # There is no vsetvli to change vl between VOP and COPY.
  ...
  VSETVLI          # This vsetvli will not change vl.
  ...
  VSETVLI          # The first VSETVLI before COPY.
  ...              # This VSETVLI does not change vl and vtype.
  ...
  vx = COPY vy

Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

Diff Detail

Event Timeline

HsiangKai created this revision.Jun 2 2021, 2:37 AM
HsiangKai requested review of this revision.Jun 2 2021, 2:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2021, 2:38 AM
craig.topper added inline comments.Jun 2 2021, 12:15 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
167

What if the defining instruction something that doesn't obey vsetvli for example an earlier COPY that became a whole register move. Or a whole register load which could be a spill reload or just a VLMAX load.

Or a reduction which only writes element 0 regardless of the tail policy. I think we currently force tail agnostic on the vsetvli for those, but I might update it to keep whatever tail policy the previous instruction used.

184

If the SEW and LMul are equal, isn't the vsetvli insertion pass usually going to use x0, x0 for VSetVLIForCopy?

204

Is this equivalent to "return LMul == SetLMul;"?

209

What about this

vsetvli
v0 = vectorop
v1 = vleff
copy v2, v0

The vleff may change VL, but if you replace the copy you need the VL from before the vsetvli. So I think you need to stop if you encountered a vleff.

368

Is this supposed to be in this patch. It's not mentioned in the description.

HsiangKai updated this revision to Diff 349776.Jun 4 2021, 12:33 AM

Consider vleff and whole register load/store.

HsiangKai marked 4 inline comments as done.Jun 4 2021, 12:35 AM
HsiangKai retitled this revision from [RISCV] Use vmv.v.v if we know COPY is under the same vl and vtype. to [RISCV] Use vmv.v.[v|i] if we know COPY is under the same vl and vtype..
HsiangKai edited the summary of this revision. (Show Details)
HsiangKai updated this revision to Diff 349804.Jun 4 2021, 3:40 AM

Consider vsetvli x0, x0, vtype.

HsiangKai edited the summary of this revision. (Show Details)Jun 4 2021, 3:47 AM

My browser's really chugging on this huge patch so my input has to be brief. Maybe we could hide the test changes for now?

Regarding vmv.v.i, is there not already some support for rematerialization (e.g. of immediates)? Or is it non-trivial due to having to manage VSETVLIs?

craig.topper added inline comments.Jun 4 2021, 6:49 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
192

Can we use MBBI->modifiesRegister(RISCV::VL)?

375–387

I think we need to add the VL and VTYPE implicit uses?

craig.topper added inline comments.Jun 4 2021, 9:00 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
167

Looks like I was wrong about reductions. The behavior changed after 0.9 and our implementation is incorrectly forcing tail agnostic for those.

HsiangKai updated this revision to Diff 350027.Jun 4 2021, 10:29 PM
  • Remove most of test updates.
  • Add implicit-use arguments for vmv.v.v or vmv.v.i.

My browser's really chugging on this huge patch so my input has to be brief. Maybe we could hide the test changes for now?

Regarding vmv.v.i, is there not already some support for rematerialization (e.g. of immediates)? Or is it non-trivial due to having to manage VSETVLIs?

There seems no support for rematerialization for vmv.v.i. Do you mean to set isReMaterializable = 1, isAsCheapAsAMove = 1 to vmv.v.i pseudo instructions?

My browser's really chugging on this huge patch so my input has to be brief. Maybe we could hide the test changes for now?

Regarding vmv.v.i, is there not already some support for rematerialization (e.g. of immediates)? Or is it non-trivial due to having to manage VSETVLIs?

There seems no support for rematerialization for vmv.v.i. Do you mean to set isReMaterializable = 1, isAsCheapAsAMove = 1 to vmv.v.i pseudo instructions?

It still won’t work due to the implicit VL and VTYPE operands from vsetvli insertion. If we move vsetvli insertion later after coalescing, rematerialization would work when avl is a 5 bit immediate. Otherwise AVL being a register also disables rematerialization.

HsiangKai updated this revision to Diff 351041.Jun 9 2021, 7:01 PM
  • Add more strict constraint for the conversion.
  • Add comments.
HsiangKai updated this revision to Diff 351047.Jun 9 2021, 7:45 PM

It may have more implicit arguments in the end of the producing instruction. Such as,

$v25 = PseudoVNCLIP_WV_M1 killed renamable $v8m2, killed renamable $v10, $noreg, 3, implicit-def dead $vxsat, implicit $vxrm, implicit $vl, implicit $vtype

HsiangKai updated this revision to Diff 351339.Jun 10 2021, 8:29 PM

Add VL, VTYPE directly.

MIB.addReg(RISCV::VL, RegState::Implicit);
MIB.addReg(RISCV::VTYPE, RegState::Implicit);
HsiangKai updated this revision to Diff 356086.Jul 1 2021, 7:30 PM

Update to the latest downstream version.

HsiangKai updated this revision to Diff 376252.Sep 30 2021, 9:43 AM
HsiangKai added a subscriber: khchen.

When producing instruction is a widening reduction instruction with LMUL = 1,
we could not convert whole register move to vmv. The result of reduction
instructions is LMUL = 1 with widening SEW. @khchen has found the bug and help
to fix it. This update includes his fix.

HsiangKai edited the summary of this revision. (Show Details)Sep 30 2021, 9:44 AM
craig.topper added inline comments.Oct 8 2021, 4:17 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
410

Do we have test cases for the tuple classes?

HsiangKai updated this revision to Diff 378505.Oct 10 2021, 6:50 AM

In the current implementation, the size of registers of producing and copying must be the same. For Zvlsseg, these sizes must be different. So, we have no need to consider it. I removed the checking in the update.

HsiangKai updated this revision to Diff 378506.Oct 10 2021, 6:56 AM

Revert the unnecessary change.

craig.topper added inline comments.Oct 11 2021, 2:08 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
395

Use RISCVVType::decodeVLMUL

402

Write an assert for this?

craig.topper added inline comments.Oct 11 2021, 2:35 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
128

Does LMul need to be passed by value?

128

Oops I meant to say does it need to be passed by reference

This test case triggers a conversion for a tuple class

dead $x0 = PseudoVSETVLI killed renamable $x11, 64, implicit-def $vl, implicit-def $vtype
renamable $v0_v1 = PseudoVLSEG2E8_V_M1 killed renamable $x10, $noreg, 3, implicit $vl, implicit $vtype
$v2_v3 = COPY $v0_v1                                                         
PseudoVSSEG2E8_V_M1 killed renamable $v2_v3, killed renamable $x12, $noreg, 3, implicit $vl, implicit $vtype
PseudoRET
HsiangKai marked an inline comment as done.Oct 12 2021, 4:15 AM

This test case triggers a conversion for a tuple class

dead $x0 = PseudoVSETVLI killed renamable $x11, 64, implicit-def $vl, implicit-def $vtype
renamable $v0_v1 = PseudoVLSEG2E8_V_M1 killed renamable $x10, $noreg, 3, implicit $vl, implicit $vtype
$v2_v3 = COPY $v0_v1                                                         
PseudoVSSEG2E8_V_M1 killed renamable $v2_v3, killed renamable $x12, $noreg, 3, implicit $vl, implicit $vtype
PseudoRET

Thanks. I added the test case and added back the checking for zvlsseg instructions.

HsiangKai updated this revision to Diff 378976.Oct 12 2021, 5:00 AM

Add test cases for zvlsseg register classes and add back the checkings for zvlsseg cases.

HsiangKai marked 3 inline comments as done.Oct 12 2021, 5:02 AM
craig.topper added inline comments.Oct 12 2021, 2:37 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
335

Is fractional lmul possible here?

397

Is a fractional lmul even possible?

HsiangKai updated this revision to Diff 379244.Oct 12 2021, 6:35 PM

Add assertion for fractional lmul.

HsiangKai marked 2 inline comments as done.Oct 12 2021, 6:36 PM
frasercrmck added inline comments.Oct 19 2021, 8:18 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
203

Do we need to worry about implicit defs?

Sorry for being sluggish on the review. I'm okay with the idea and the approach but I wouldn't have caught the various issues which seem to have cropped up during review, like those with widening reductions or zvlsseg tuples, etc. I think that's why I'm hesitant to give it an official approval. I'm not sure how to take it forward though. Presumably you're using this pass by default in your downstream fork?

I can say that I have taken this patch and run some of our testing with it. Everything seemed fine, though I realise that's not formal verification.

HsiangKai added inline comments.Oct 22 2021, 8:35 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
203

The function only handles vector COPY. Is there an instruction to have an implicit define for vector registers?

Sorry for being sluggish on the review. I'm okay with the idea and the approach but I wouldn't have caught the various issues which seem to have cropped up during review, like those with widening reductions or zvlsseg tuples, etc. I think that's why I'm hesitant to give it an official approval. I'm not sure how to take it forward though. Presumably you're using this pass by default in your downstream fork?

I can say that I have taken this patch and run some of our testing with it. Everything seemed fine, though I realise that's not formal verification.

Sorry for replying late. Yes, we enable the conversion by default in our downstream. We have a random test generator to run thousands and thousands tests on it. We found several bugs in my initial version and I have applied these fixes on this patch. You could see these cases from the test case in this patch. How about to turn it off by default on the upstream?

frasercrmck accepted this revision.Oct 26 2021, 6:38 AM

Sorry for being sluggish on the review. I'm okay with the idea and the approach but I wouldn't have caught the various issues which seem to have cropped up during review, like those with widening reductions or zvlsseg tuples, etc. I think that's why I'm hesitant to give it an official approval. I'm not sure how to take it forward though. Presumably you're using this pass by default in your downstream fork?

I can say that I have taken this patch and run some of our testing with it. Everything seemed fine, though I realise that's not formal verification.

Sorry for replying late. Yes, we enable the conversion by default in our downstream. We have a random test generator to run thousands and thousands tests on it. We found several bugs in my initial version and I have applied these fixes on this patch. You could see these cases from the test case in this patch. How about to turn it off by default on the upstream?

No I think that's fine. I've given it another look over and it LGTM aside from my last comment. Thanks for your patience!

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
203

I'm not aware of one, but I still think it's more theoretically sound to check all defs in the instruction. I think we agree that if there was such an implicit def we'd need to handle it? Therefore at lesat an assert that we're not expecting that case.

This revision is now accepted and ready to land.Oct 26 2021, 6:38 AM
HsiangKai updated this revision to Diff 382513.Oct 26 2021, 8:05 PM

Check implicit defined operands.

This revision was landed with ongoing or failed builds.Oct 27 2021, 8:40 PM
This revision was automatically updated to reflect the committed changes.