Page MenuHomePhabricator

[RISCV] Initial infrastructure for code generation of the RISC-V V-extension
ClosedPublic

Authored by craig.topper on Oct 14 2020, 10:53 PM.

Details

Summary

The companion RFC (http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html) gives lots of details on the overall strategy, but we summarize it here:

  • LLVM IR involving vector types is going to be selected using pseudo instructions (only MachineInstr). These pseudo instructions contain dummy operands to represent the vector type being operated and the vector length for the operation.
  • These two dummy operands, as set by instruction selection, will be used by the custom inserter to prepend every operation with an appropriate vsetvli instruction that ensures the vector architecture is properly configured for the operation. Not in this patch: later passes will remove the redundant vsetvli instructions.
  • Register classes of tuples of vector registers are used to represent vector register groups (LMUL > 1).
  • Those pseudos are eventually lowered into the actual instructions when emitting the MCInsts.

About the patch:

Because there is a bit of initial infrastructure required, this is the minimal patch that allows us to select instructions for 3 LLVM IR instructions: load, add and store vectors of integers. LLVM IR operations have "whole-vector" semantics (as in they generate values for all the elements).

Later patches will extend the information represented in TableGen.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Co-Authored-by: Craig Topper <craig.topper@sifive.com>

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
xmj added a subscriber: xmj.Oct 16 2020, 1:20 AM
StephenFan added inline comments.Oct 16 2020, 2:00 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
208

May be

assert(MI.getOpcode() == RISCV::PseudoVSETVLI && "Unexpected pseudo instruction");
MCInstr = &TII.get(RISCV::VSETVLI);

is better

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1995

may be

assert(SEWIndex >= 0 && "SEWIndex must be >= 0");

is better

llvm/utils/TableGen/GlobalISelEmitter.cpp
190–193

Is this if statement necessary?

StephenFan added inline comments.Oct 16 2020, 2:13 AM
llvm/lib/Target/RISCV/RISCVInstrInfoPseudoV.td
209 ↗(On Diff #298303)

The PseudoVSETVLI has the hasSideEffects = 1, mayLoad = 0, mayStore = 0

frasercrmck added inline comments.Oct 16 2020, 2:27 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1979

Perhaps this logic should go into RISCVBaseInfo, as I would expect other parts of the compiler will need to manipulate this operand's data at some point. It would be nice to have getters/setters for that, rather than relying on the underlying "encoding".

llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir
17

Is this test function missing a body? I can't see how it would generate the expected MIR

evandro marked 3 inline comments as done.Oct 16 2020, 6:09 PM
evandro added inline comments.
llvm/utils/TableGen/GlobalISelEmitter.cpp
190–193

Yes, as the last test below could prove true.

evandro updated this revision to Diff 298794.Oct 16 2020, 6:50 PM
evandro edited the summary of this revision. (Show Details)
evandro set the repository for this revision to rL LLVM.
arcbbb added a subscriber: arcbbb.Oct 18 2020, 11:14 PM
rogfer01 added inline comments.Oct 19 2020, 2:42 AM
llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir
17

This test checks the case when the vl is not RISCV::X0 by using the vreg %3.

We can't currently express this in LLVM but we still need some LLVM IR function. Perhaps we can add some comment explaining this.

frasercrmck added inline comments.Oct 19 2020, 6:23 AM
llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir
17

Okay yeah I see my original misunderstanding; sorry about that. Presumably there will eventually be intrinsics that can set vl; I've seen those in other proposals. Until then, a comment wouldn't hurt.

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
99

The reserved VL and VTYPE would immediately dead after implicit def. How do you support the calling convention? Both CSRs are caller-saved.

Could you demonstrate that RVV intrinsic can share the same infrastructure?

frasercrmck added inline comments.Oct 21 2020, 2:23 AM
llvm/lib/Target/RISCV/RISCVInstrInfoPseudoV.td
145 ↗(On Diff #298794)

I notice that the RFC mentions using early-clobber constraints but don't see it being used here. From the RFC:

early-clobber %2:vrm2 = PseudoVADD_VV_M2 %3:vrm2(tied-def 0), %0:vrm2, %1:vrm2,$noreg, $x0, 32, implicit $vl, implicit $vtype

(If you wonder about the early-clobber it is needed to fulfill some constraints between sources and destination registers under lmul>1)

I ask because I'm concerned about the use of tied and early-clobber on the same operand: it is a special-case in SlotIndexes (as once explained in the mailing lists) and I've seen issues with this on another target I was working on, where LLVM forgets about this special case in several places and generates wrong code (subregister lanes are incorrectly deemed to be undef). I worry we're going to see really hard-to-track bugs a few months down the line when trying to compile more complex programs.

Is early-clobber really needed? Perhaps you could explain which constraints under lmul>1 are fulfilled by using this?

rogfer01 added inline comments.Oct 21 2020, 5:39 AM
llvm/lib/Target/RISCV/RISCVInstrInfoPseudoV.td
145 ↗(On Diff #298794)

Hi, apologies I wasn't clear enough with this aspect of the proposal.

early-clobber is in practice only relevant for widenings and narrowings (and a few other instructions). That is the reason why it is not in this very first patch. In the particular case of widenings and narrowings, we cannot have a def operand and a use operand where their sew is different and their actual vector registers overlap (under some conditions).

For instance vwadd.vv v2, v1, v2 is not valid (the rule is a bit obscure as I understand vwadd.vv v2, v1, v3 might be valid, see https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands). The simplest way to avoid this issue was using early-clobber.

If we don't use early-clobber then I understand we need to amend somehow the instructions after RA. Perhaps it is possible to let RA know what registers are still feasible as it goes allocating them? (I have not looked into that, tbh)

There are some cases where early-clobber may be too strict. I asked the list whether there is a way to model something more than what early-clobber does (for operands that have the same EEW as the destination, consider vwadd.wv) in http://lists.llvm.org/pipermail/llvm-dev/2020-May/141383.html but apparently there is no straightforward solution at the moment.

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
99

To be honest we haven't outlined any calling convention at this stage yet. See for some ideas we're considering but nothing is set in stone https://github.com/riscv/rvv-intrinsic-doc/issues/38

However as you mention, even if both CSRs are caller-saved and given the current mechanism in which every instruction using them is prefixed with a vsetvl, vl is implicitly saved in a GPR whose value will be preserved through the call.

Until the calling convention is clarified, any pass that removes redundant vsetvl instructions must be aware that calls may have clobbered it.

From your question, though, now I realize that we want to extend the lowering of a call in selectiondag to assert in the regmask that vl and vtype are clobbered. I think this would the safe thing to do for the usual calling convention. Does this seem reasonable at this stage?

Could you demonstrate that RVV intrinsic can share the same infrastructure?

I'll talk with @evandro and see if we can put together some patch showing it.

frasercrmck added inline comments.Oct 22 2020, 12:48 AM
llvm/lib/Target/RISCV/RISCVInstrInfoPseudoV.td
145 ↗(On Diff #298794)

Hi Roger, thanks for the explanation. I see where you're coming from and why early-clobber is needed. I think we should just go with early-clobber for now and fix the bugs that come our way, or improve LLVM in line with the question you asked on the list.

And on that note, it sounds like what we need is something like let Constraints = "$src != $dst": it sounds like a register allocation thing rather than a liveness thing, doesn't it?

frasercrmck added inline comments.Oct 23 2020, 5:01 AM
llvm/lib/Target/RISCV/RISCVRegisterInfo.td
277

Perhaps it would be good to explain in the code if/how an implementation with ELEN<32 or ELEN>64 could/would be supported, as there's bound to be one some day. For example, is it impossible, is it incompatible, is it awkward, or are there just performance implications? Basically, what are the tradeoffs to the imposed constraints? The RFC goes a little bit into that regarding i128 but that might get lost in time.

craig.topper added inline comments.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1919

Is VLMul here already in the proper encoding? The cases seem to align with the enum values.

1975

Seems weird to define a reserved register constant register. Are there are other examples of this in RISC-V or another target?

llvm/lib/Target/RISCV/RISCVInstrInfoPseudoV.td
1 ↗(On Diff #298794)

nit 80 columns

llvm/lib/Target/RISCV/RISCVRegisterInfo.td
281

Not sure if we should be using these special fraction characters in source files.

282

Looks like some formatting of columns was lost here?

craig.topper added inline comments.Oct 28 2020, 3:59 PM
llvm/lib/Target/RISCV/CMakeLists.txt
16 ↗(On Diff #298794)

I think these are in alphabetical order?

llvm/lib/Target/RISCV/RISCVInstrInfoPseudoV.td
174 ↗(On Diff #298794)

Is this foreach just to make instruction a variable?

llvm/lib/Target/RISCV/RISCVMCInstLower.cpp
222

I think you can use explicit_operands() here to avoid the isImplicit check later.

234

Can this just be an llvm_unreachable?

llvm/lib/Target/RISCV/RISCVRegisterInfo.td
390–391

Is this list used?

llvm/lib/Target/RISCV/Utils/RISCVBaseInfo.cpp
17

This is arguably a layering violation since Utils is supposed to be usable by the MC layer. And MC layer tools don't use IR. But since its only a header it might not be an issue. Though it might break a modules build?

jrtc27 added inline comments.Oct 28 2020, 4:09 PM
llvm/lib/Target/RISCV/Utils/RISCVBaseInfo.cpp
17

It's a TableGen'ed header that pull in who knows what else generated so that can cause issues in parallel builds due to not having dependencies in that direction. Not sure if that's a problem in this particular case but I have definitely seen that cause issues in our fork when we've made that mistake before in clang/.

Would you be able to explain how spills & reloads of vector registers work with this method? Namely, LLVM can insert spills and reloads at any point in the instruction sequence (IIRC). I would imagine that this includes right between VSETVLI/PseudoInst pairs:

dead %25:gpr = PseudoVSETVLI %12:gpr, /*e32,mf4*/, implicit-def $vl, implicit-def $vtype
; insert spill of V4 here!
%17:vr = PseudoVFMUL_VF_M1 %18:vr(tied-def 0), killed %13:vr, %15:fpr64, $noreg, $noreg, -1, implicit $vl, implicit $vtype

I would imagine that the spill needs its own VSETVLI as it must spill the whole physical register: it must ensure a vtype of e.g. e8,m1. If it needs to spill V4M4 it would configure e8,m4. This is likely overwrite the previous configuration, so will we have to save/restore vtype around the spill/reload? I'm not sure that's possible in general. I don't even think storeRegToStackSlot and loadRegFromStackSlot allow the insertion of multiple MIs, and we'd need a reserved/scavenged register. Though perhaps there's another trick I'm not thinking of.

Would you be able to explain how spills & reloads of vector registers work with this method? Namely, LLVM can insert spills and reloads at any point in the instruction sequence (IIRC). I would imagine that this includes right between VSETVLI/PseudoInst pairs:

dead %25:gpr = PseudoVSETVLI %12:gpr, /*e32,mf4*/, implicit-def $vl, implicit-def $vtype
; insert spill of V4 here!
%17:vr = PseudoVFMUL_VF_M1 %18:vr(tied-def 0), killed %13:vr, %15:fpr64, $noreg, $noreg, -1, implicit $vl, implicit $vtype

I would imagine that the spill needs its own VSETVLI as it must spill the whole physical register: it must ensure a vtype of e.g. e8,m1. If it needs to spill V4M4 it would configure e8,m4. This is likely overwrite the previous configuration, so will we have to save/restore vtype around the spill/reload? I'm not sure that's possible in general. I don't even think storeRegToStackSlot and loadRegFromStackSlot allow the insertion of multiple MIs, and we'd need a reserved/scavenged register. Though perhaps there's another trick I'm not thinking of.

There is no need to specify the vtype for spilling code. In RVV specification, we have whole register load/store without setting vtype. To know LMUL values is enough to do spilling.

evandro updated this revision to Diff 303600.Nov 6 2020, 5:39 PM
evandro marked 12 inline comments as done.

This patch doesn't apply cleanly to trunk. Especially RISCVRegisterInfo.td

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1961

else should be on the same line as the closing curly brace above

1975

Is this from clang-format? I would have expected ElementWidth to line up more with Multiplier on the previous line.

craig.topper added inline comments.Nov 10 2020, 10:17 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1964

I think this does need RegState::Define, but this path isn't exercised in this patch.

I found that AArch64 has a pass (AArch64DeadRegisterDefinitionsPass) that replaces defs of some instructions with WZR(their zero register). So I guess its not unprecedented.

frasercrmck added inline comments.Nov 10 2020, 11:02 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1964

For what it's worth, I worked on a downstream target which would often define and use a reserved constant predicate register.

craig.topper added inline comments.Nov 10 2020, 11:14 AM
llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir
32

I'm a little concerned that Machine IR immediately out of SelectionDAG doesn't reflect the real semantics. The Pseudos should implicit-def $vl and $type and not implicit use them. As long as their fused with vsetvli, they should have the semantics of the pair.

But I don't know how to get the right semantics without having 2 sets of pseudos. How many instructions would that come out to?

craig.topper added inline comments.Nov 10 2020, 6:23 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
133

Should the types be qualified with hasStdExtD()? I have the same question for F but it looks like V extension currently implicitly enables the F extension.

rogfer01 added inline comments.Nov 11 2020, 1:45 AM
llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir
32

One option could be to remove their Uses=[VL, VTYPE] but then they would come out without those implicit uses which is not ideal either but I wonder if this might lead to other issues elsewhere (e.g. in the machine inst verifier?)

Perhaps we could hook in the InstrEmitter (or whoever creates the actual MachineInstrs) somehow and add the implicit-vdefs. Then in the custom inserter remove those and put the implicit-uses after we have emitted vsetvli.

HsiangKai added inline comments.Nov 12 2020, 8:49 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
133

I am curious about V should imply F or not. It is vague in the V specification.

jrtc27 added inline comments.Nov 12 2020, 9:13 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
133

Currently I read it as no, in that V+Zfinx is a valid combination and would have the FP vector-scalar instructions reading the FP scalar from the "integer' register file.

evandro marked 5 inline comments as done.Nov 24 2020, 10:51 AM
evandro added inline comments.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
133

V does not imply Zfh, F, D or Zfinx. However, V supports half, float and double types regardless of F or D or Zfinx. The V instructions which specify scalar half, float or double operands do require the respective extensions. At least as I read the spec.

evandro updated this revision to Diff 307412.Nov 24 2020, 11:04 AM
craig.topper added inline comments.Nov 24 2020, 11:09 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
133

This doesn't sound like it is just talking about the scalar operands, but maybe I'm reading it wrong.

Vector floating-point instructions require the presence of base scalar floating-point extensions corresponding to the supported vector floating-point element widths.

Note
Profiles supporting 16-bit half-precision floating-point values will also have to implement scalar half-precision floating-point support in the f registers.
evandro added inline comments.Nov 24 2020, 12:05 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
133

Indeed. Now I wonder about the integer instructions if they have requirements on XLEN to support the corresponding integer SEWs.

frasercrmck added inline comments.Nov 25 2020, 4:29 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
133

In section 3.3.1, we have this:

In the base vector "V" extension, only SEW up to ELEN = max(XLEN,FLEN) are required to be supported. Other platforms may impose different constraints on ELEN

So I wouldn't say it's required. We can have e64 on RV32, for instance. And at least for the integer scalar move instructions (17.1) it accounts for SEW > XLEN:

The vmv.x.s instruction copies a single SEW-wide element from index 0 of the source vector register to a destination integer register. If SEW > XLEN, the least-significant XLEN bits are transferred and the upper SEW-XLEN bits are ignored. If SEW < XLEN, the value is sign-extended to XLEN bits.

I haven't thought through the implications for SEW > XLEN on code generation but it might trip us up if we're not careful.

craig.topper commandeered this revision.Nov 30 2020, 7:48 PM
craig.topper edited reviewers, added: evandro; removed: craig.topper.

Commandeering so I can rebase the patch

Rebase. Remove RISCVGenSystemOperands.inc from CMakeLists.txt. It's the same as RISCVSearchableTables.inc now. I'll probably pre-commit that rename.

Rebase after pre-commiting rename of RISCVGenSearchableTables.inc

clang-format

-Simplify the code in RISCVExpandPseudoInsts by using MBBI directly instead of creating MI from it and using the TII pointer already in the class.
-Copy the dead flag from the dest register when expanding PseudoVSETVLI.

craig.topper edited the summary of this revision. (Show Details)

Make LowerRISCVVMachineInstrToMCInst local to RISCVMCCodeEmitter.cpp instead of exposing it in RISCV.h

Rename RISCVInstrInfoPseudoV.td to RISCVInstrInfoVPseudos.td so it alphabetizes next to RISCVInstrInfoV.td making their relationship more obvious.

Does anyone have any additional feedback on this patch or the direction? @evandro should be providing a patch to show how intrinsics will work soon.

Does anyone have any additional feedback on this patch or the direction? @evandro should be providing a patch to show how intrinsics will work soon.

I'm generally happy with this approach; I've been using it downstream to support fixed-length vectors with success.

I think there was your outstanding question about the pseudos "using" VL and VTYPE immediately after ISel, and there's the idea you brought up in D92228 about duplicating the pseudos for masked and unmasked operations. I think it'd be good to come to some kind of agreement on those, but I'm also happy for them to be deferred until later. I don't think either affect the "correctness" of this patch.

Does anyone have any additional feedback on this patch or the direction? @evandro should be providing a patch to show how intrinsics will work soon.

I'm generally happy with this approach; I've been using it downstream to support fixed-length vectors with success.

I think there was your outstanding question about the pseudos "using" VL and VTYPE immediately after ISel, and there's the idea you brought up in D92228 about duplicating the pseudos for masked and unmasked operations. I think it'd be good to come to some kind of agreement on those, but I'm also happy for them to be deferred until later. I don't think either affect the "correctness" of this patch.

I'm hoping the VL/VTYPE uses don't cause a problem as I don't know how to make InstrEmitter fix it. It looks like the VE target is also marking their instructions as using their VL register and don't insert LVL instructions until very late in the pipeline.

I don't know if we're going to have an answer for the masked/unmasked soon. I think we might just use earlyclobber in the initial patches and suffer the bad register allocation so we can make forward progress.

I don't know if we're going to have an answer for the masked/unmasked soon. I think we might just use earlyclobber in the initial patches and suffer the bad register allocation so we can make forward progress.

I also agree that at this point we have something correct, even if at times suboptimal, is better than nothing at all. We can always improve on top of that baseline.

This revision is now accepted and ready to land.Dec 4 2020, 7:08 AM
evandro added inline comments.Dec 4 2020, 7:12 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
133

I asked Andrew Waterman and his answer is, in English, that:

The data types fully supported in V, for both scalar and vector operations, mirror those supported by the base ISA. For example, RV32IMCFV, only SEW up to 32, for both integer and FP, both scalar and vector operations; RV32IMCFDV, only SEW up to 32, both scalar and vector, integer operations and SEW up to 64, both scalar and vector, FP operations.

This comment was removed by NickHung.
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
99

At this stage, inserting vsetvli in front of every instruction is

craig.topper removed a reviewer: efriedma.