Page MenuHomePhabricator

[RISCV] Add MIR tests exposing missed InstAliases
Needs ReviewPublic

Authored by frasercrmck on Nov 27 2020, 6:40 AM.

Details

Summary

The InstAlias framework cannot match registers against zero_reg, which
RVV uses to encode unmasked operations.

Diff Detail

Event Timeline

frasercrmck created this revision.Nov 27 2020, 6:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 27 2020, 6:40 AM
frasercrmck requested review of this revision.Nov 27 2020, 6:40 AM

I thought I'd put this up for review since it's (somehow) the first RISC-V MIR test, and in case anyone had looked into this before. I know @craig.topper has some experience with InstAlias.

From a brief look, it looks like one way forward could be to allow RegisterOperand and RegisterClass records to use something like MCOperandPredicate when matching for aliases (with AliasPatternCond::K_Custom). I say "something like" because that's a property of Operand, not RegisterOperand.

That or just handle zero_reg for aliases, but I'm not yet aware of the broader implications of doing that.

Adding @arcbbb. Him and I were talking about maybe splitting masked and unmasked into separate instructions. We need to prevent masked instructions from using v0 as a destination. To do that we've been using @earlyclobber, but that prevents any source operand from being the same as the destination. To fix this it might be better to have separate masked and unmasked instructions. Then we won't use @earlyclobber and instead make the masked instruction forbid v0 as a destination with a register class constraint. Not sure how many additional instructions and pseudo instructions that will require. Thoughts?

Adding @arcbbb. Him and I were talking about maybe splitting masked and unmasked into separate instructions. We need to prevent masked instructions from using v0 as a destination. To do that we've been using @earlyclobber, but that prevents any source operand from being the same as the destination. To fix this it might be better to have separate masked and unmasked instructions. Then we won't use @earlyclobber and instead make the masked instruction forbid v0 as a destination with a register class constraint. Not sure how many additional instructions and pseudo instructions that will require. Thoughts?

Initially, I implemented the V instructions using separate classes for masked and unmasked instructions. You could see the difference from
https://reviews.llvm.org/D69987?vs=228607&id=229020#toc

We need to define some additional classes for masked instructions. After that, we could use multiclass to define unmasked/masked instructions at once.

Adding @arcbbb. Him and I were talking about maybe splitting masked and unmasked into separate instructions. We need to prevent masked instructions from using v0 as a destination. To do that we've been using @earlyclobber, but that prevents any source operand from being the same as the destination. To fix this it might be better to have separate masked and unmasked instructions. Then we won't use @earlyclobber and instead make the masked instruction forbid v0 as a destination with a register class constraint. Not sure how many additional instructions and pseudo instructions that will require. Thoughts?

Initially, I implemented the V instructions using separate classes for masked and unmasked instructions. You could see the difference from
https://reviews.llvm.org/D69987?vs=228607&id=229020#toc

We need to define some additional classes for masked instructions. After that, we could use multiclass to define unmasked/masked instructions at once.

I think this is definitely worth considering. The @earlyclobber isn't modeling what we want, and duplicating instructions saves us having to add some new constraint to llvm.

However I think we should bear in mind that the vector codegen proposal(s) use pseudo instructions for the majority of the pipeline, only resolving to the "final" vector instructions during the MCInstLower step. So those are arguably more important to duplicate since they're the ones likely to ever go through register allocation. A quick look shows that we have over 800 vector instructions. That's a rough number, but if we were to naively duplicate them for masked/unmasked/pseudo-masked/pseudo-unmasked we'd have 3200. I've seen worse, but it does get a bit hairy.

So one option we can keep in mind is to duplicate only the pseudos for masked/unmasked, and then keep the existing vector instructions as they are.

Hi all,

a couple of comments.

  1. First regarding the non-matched aliases: we also observed this issue when working on that and we deployed a bit of a hack. We added a new attribute AllowsNoRegister
// AllowsNoRegister - Specify that instructions may use NoRegister (0) instead
// of a physical register for an operand of this register class.
bit AllowsNoRegister = 0;

And then we extended VMV0 register class with this attribute

def VMV0 : RegisterClass<"RISCV", VMaskVT.types, 64, (add V0)> {
  let Size = 64;
  let AllowsNoRegister = 1;
}

and then we updated matchAliasCondition in MCInstPrinter.cpp to look like this (after having implemented some needed plumbing to bring the bit where used)

case AliasPatternCond::K_RegClass:
  // Operand must be a register in this class. Value is a register class id. 
  return Opnd.isReg() && (MRI.getRegClass(C.Value).contains(Opnd.getReg()) ||
                          (MRI.getRegClass(C.Value).allowsNoRegister() &&    
                           Opnd.getReg() == 0));

but this might feel a bit hackish, so I'm not suggesting this approach (unless there is a general sentiment that it is sensible).

  1. Regarding @earlyclobber, yes it is not ideal as it is now. But it doesn't only impact masking (though perhaps that is the clearest case).

Unless the ISA has changed (and I admit I might be a bit outdated here) my understanding is that we still have cases where @earlyclobber will be needed and it will still fall short.

For instance, mixed widenings such as vwadd.wv are complex to model because iirc an instruction like vwadd.wv v2, v2, v1 is fine while vwadd.wv v2, v2, v3 is not. I understand that using @earlyclobber a thing like vwadd.wv v4, v2, v1 is the best we can get.

I asked about this here http://lists.llvm.org/pipermail/llvm-dev/2020-May/141383.html

One of the suggestions made in that thread is teaching RA new tricks, which I don't feel qualified to do :) (I envision that if RA were able to dynamically limit the valid registers, as it assigns registers to operands, by virtue of some target-specific constraints, then, perhaps, this would be doable. But tbh I have no idea such approach is even feasible in the current RA. If we could do that we could also solve the case where v0 can't be used as the destination of masked instructions).

The other suggestion is to amend the instructions after RA in a specific pass. Seems doable at first but I am afraid this is going to be testing hell just to make sure no instruction manages to escape from that pass incorrectly.

I personally don't oppose duplicating all the instructions between mask and unmasked if this simplifies things. Just let's be conscious that, for instructions with a mask, we would go from 7 pseudos per actual instruction to 14. Perhaps the impact is not that noticeable.

khchen added a subscriber: khchen.Nov 30 2020, 6:29 PM

Hi all,

a couple of comments.

  1. First regarding the non-matched aliases: we also observed this issue when working on that and we deployed a bit of a hack. We added a new attribute AllowsNoRegister
// AllowsNoRegister - Specify that instructions may use NoRegister (0) instead
// of a physical register for an operand of this register class.
bit AllowsNoRegister = 0;

And then we extended VMV0 register class with this attribute

def VMV0 : RegisterClass<"RISCV", VMaskVT.types, 64, (add V0)> {
  let Size = 64;
  let AllowsNoRegister = 1;
}

and then we updated matchAliasCondition in MCInstPrinter.cpp to look like this (after having implemented some needed plumbing to bring the bit where used)

case AliasPatternCond::K_RegClass:
  // Operand must be a register in this class. Value is a register class id. 
  return Opnd.isReg() && (MRI.getRegClass(C.Value).contains(Opnd.getReg()) ||
                          (MRI.getRegClass(C.Value).allowsNoRegister() &&    
                           Opnd.getReg() == 0));

but this might feel a bit hackish, so I'm not suggesting this approach (unless there is a general sentiment that it is sensible).

  1. Regarding @earlyclobber, yes it is not ideal as it is now. But it doesn't only impact masking (though perhaps that is the clearest case).

Unless the ISA has changed (and I admit I might be a bit outdated here) my understanding is that we still have cases where @earlyclobber will be needed and it will still fall short.

For instance, mixed widenings such as vwadd.wv are complex to model because iirc an instruction like vwadd.wv v2, v2, v1 is fine while vwadd.wv v2, v2, v3 is not. I understand that using @earlyclobber a thing like vwadd.wv v4, v2, v1 is the best we can get.

I asked about this here http://lists.llvm.org/pipermail/llvm-dev/2020-May/141383.html

One of the suggestions made in that thread is teaching RA new tricks, which I don't feel qualified to do :) (I envision that if RA were able to dynamically limit the valid registers, as it assigns registers to operands, by virtue of some target-specific constraints, then, perhaps, this would be doable. But tbh I have no idea such approach is even feasible in the current RA. If we could do that we could also solve the case where v0 can't be used as the destination of masked instructions).

The other suggestion is to amend the instructions after RA in a specific pass. Seems doable at first but I am afraid this is going to be testing hell just to make sure no instruction manages to escape from that pass incorrectly.

I personally don't oppose duplicating all the instructions between mask and unmasked if this simplifies things. Just let's be conscious that, for instructions with a mask, we would go from 7 pseudos per actual instruction to 14. Perhaps the impact is not that noticeable.

Thanks for your input, Roger. I think we can defer any work on the InstAliases until we know what we're doing about masked/unmasked operations. If we duplicate the instructions we're going to lose the only motivation (that I know of) to fix this in the InstAlias support.

Does this question affect the scope of D89449 at all, or shall we address that question later?

arcbbb added a comment.Dec 3 2020, 1:20 AM
  1. Regarding @earlyclobber, yes it is not ideal as it is now. But it doesn't only impact masking (though perhaps that is the clearest case).

Unless the ISA has changed (and I admit I might be a bit outdated here) my understanding is that we still have cases where @earlyclobber will be needed and it will still fall short.

For instance, mixed widenings such as vwadd.wv are complex to model because iirc an instruction like vwadd.wv v2, v2, v1 is fine while vwadd.wv v2, v2, v3 is not. I understand that using @earlyclobber a thing like vwadd.wv v4, v2, v1 is the best we can get.

I asked about this here http://lists.llvm.org/pipermail/llvm-dev/2020-May/141383.html

Thank you for the sharing, @rogfer01.
This is a good case that shows I cannot get an ideal RA by just duplicating a set of masked/unmasked pseudos.
In this vwadd.wv case,

; Assume LMUL=M1, EMUL=M2
; Widening signed integer add, 2*SEW = 2*SEW +/- SEW
vwadd.wv  {v2,v3}, {v2 v3}, v4 ; valid: vd eew == vs2 eew
vwadd.wv  {v4,v5}, {v2 v3}, v5 ; valid: vd eew > vs1 eew, overlap is in the highest-numbered part.
vwadd.wv  {v4,v5}, {v2 v3}, v4 ; invalid

A register class constraint still falls short for this.

  1. Regarding @earlyclobber, yes it is not ideal as it is now. But it doesn't only impact masking (though perhaps that is the clearest case).

Unless the ISA has changed (and I admit I might be a bit outdated here) my understanding is that we still have cases where @earlyclobber will be needed and it will still fall short.

For instance, mixed widenings such as vwadd.wv are complex to model because iirc an instruction like vwadd.wv v2, v2, v1 is fine while vwadd.wv v2, v2, v3 is not. I understand that using @earlyclobber a thing like vwadd.wv v4, v2, v1 is the best we can get.

I asked about this here http://lists.llvm.org/pipermail/llvm-dev/2020-May/141383.html

Thank you for the sharing, @rogfer01.
This is a good case that shows I cannot get an ideal RA by just duplicating a set of masked/unmasked pseudos.
In this vwadd.wv case,

; Assume LMUL=M1, EMUL=M2
; Widening signed integer add, 2*SEW = 2*SEW +/- SEW
vwadd.wv  {v2,v3}, {v2 v3}, v4 ; valid: vd eew == vs2 eew
vwadd.wv  {v4,v5}, {v2 v3}, v5 ; valid: vd eew > vs1 eew, overlap is in the highest-numbered part.
vwadd.wv  {v4,v5}, {v2 v3}, v4 ; invalid

A register class constraint still falls short for this.

Although some cases are unavoidable to use @earlyclobber to conform the instruction constraints, it is still beneficial to model the pseudo instructions using different classes for mask and unmasked instructions. It could avoid unnecessary copies between vector registers for @earlyclobber constraint. Vector copy should be expensive and it hurts performance of vector code.

One of the suggestions made in that thread is teaching RA new tricks, which I don't feel qualified to do :) (I envision that if RA were able to dynamically limit the valid registers, as it assigns registers to operands, by virtue of some target-specific constraints, then, perhaps, this would be doable. But tbh I have no idea such approach is even feasible in the current RA. If we could do that we could also solve the case where v0 can't be used as the destination of masked instructions).

The other suggestion is to amend the instructions after RA in a specific pass. Seems doable at first but I am afraid this is going to be testing hell just to make sure no instruction manages to escape from that pass incorrectly.

IMO, this last suggestion is no different from the first one: new RA tricks. Like you, I don't feel qualified, but I also feel that it's a rabbit hole too, especially the last suggestion, in terms of maintenance.

Although some cases are unavoidable to use @earlyclobber to conform the instruction constraints, it is still beneficial to model the pseudo instructions using different classes for mask and unmasked instructions. It could avoid unnecessary copies between vector registers for @earlyclobber constraint. Vector copy should be expensive and it hurts performance of vector code.

I agree, this is the right thing to do.

llvm/test/CodeGen/MIR/RISCV/aliases-v.mir
34 ↗(On Diff #308036)

Is there any reason why the bodies are outside the definitions? It's a bit confusing...

rebase & move test

@evandro I'm not sure what your comment was specifically referring to, but I've moved the test into the CodeGen/RISCV so I'll continue it here. If you were asking why the checks were in an odd place, it's because I can't use update_mir_test_checks.py because it requires MIR output, and update_llc_test_checks.py puts the CHECKs under the IR functions. So instead I just manually put the checks in the MIR bodies.

lenary resigned from this revision.Thu, Jan 14, 9:42 AM