This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Define vmclr.m/vmset.m intrinsics.
ClosedPublic

Authored by khchen on Dec 27 2020, 7:18 PM.

Details

Summary

Define vmclr.m/vmset.m intrinsics and lower to vmxor.mm/vmxnor.mm.

Ideally all rvv pseudo instructions could be implemented in C header,
but those two instructions don't take an input, codegen can not guarantee
that the source register becomes the same as the destination.

I define two pseduo insts, one would expand to vsetvli + pseduo-v-inst,
then pseudo-v-inst would expand to corresponding v-inst.

Diff Detail

Event Timeline

khchen created this revision.Dec 27 2020, 7:18 PM
khchen requested review of this revision.Dec 27 2020, 7:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2020, 7:18 PM
khchen updated this revision to Diff 313814.Dec 27 2020, 7:20 PM

update ceudoinstructions comments

craig.topper added inline comments.Dec 27 2020, 8:30 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
65

Drop the default and just pass value at both call sites. There's no strong reason for vmset to the be the default behavior so its better to make it explicit.

llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
559

pseduo -> pseudo

llvm/test/CodeGen/RISCV/rvv/vmclr-rv32.ll
3

Can we drop -riscv-no-aliases so that we'll print vmset/vmclr?

khchen updated this revision to Diff 313826.Dec 27 2020, 11:04 PM
  1. Address craig.topper's comments.
  2. Set BaseInstr as corresponding v-inst, so we only need to define one pseudo instruction.
khchen marked 3 inline comments as done.Dec 27 2020, 11:06 PM

LGTM with that change.

llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
240

Just use MBBI->getDebugLoc() here and drop MI.

Out of curiosity, what would the pros/cons be between instead expanding these early in the ISel custom inserter vs. late in the expand-pseudos pass as you've done? I would imagine that the vmxor and vmxnor instructions will have better integration with the rest of the compiler (scheduling info, for instance) so we'd prefer to have those around. Otherwise we might have to add special-cases for these pseudos to treat them like their underlying instructions.

llvm/include/llvm/IR/IntrinsicsRISCV.td
407

nit: the class is called RISCVNullaryIntrinsic producing llvm_anyvector_ty but says it outputs "mask type". Is that too specific? The other intrinsic classes which output "mask type" are called e.g. RISCVMaskXXX. Should this be named e.g. RISCVMaskNullary(Intrinsic)? I don't think we expect other nullaries, do we?

llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
238

Opcode would be a more familiar casing for this variable name.

246

Is there a reason not to use BuildMI(..., DstReg) instead of this extra .addReg?

Out of curiosity, what would the pros/cons be between instead expanding these early in the ISel custom inserter vs. late in the expand-pseudos pass as you've done? I would imagine that the vmxor and vmxnor instructions will have better integration with the rest of the compiler (scheduling info, for instance) so we'd prefer to have those around. Otherwise we might have to add special-cases for these pseudos to treat them like their underlying instructions.

If we expand them early there's no guarantee the register allocator would pick the same register for the inputs as the output. Our downstream repo and probably BSC's repo was expanding them in a custom inserter and putting the same vreg on the inputs and the outputs with the input operands marked with undef. Which seemed to work, but I'm surprised that isn't flagged as a violation of SSA by the machine verifier. Expanding after register allocation allows us to force the source registers explicitly.

I would imagine a high performance out of order processor would want to detect these instructions as a special case of xor/xnor and not read the data from the input register. Whether they would check that the inputs and the output are the same register or just the inputs are the same register I'm not sure. X86 CPUs for many generations have recognized xor eax, eax as special zero idiom that doesn't have a dependency on the previous value of eax. The X86 backend also uses pseudos and a post RA expansion. For scheduling, X86 has various WriteZero classes the pseudos are assigned to and I believe there is special code in most of the scheduler models to detect xor with same input post RA.

khchen updated this revision to Diff 313836.Dec 28 2020, 1:27 AM

Address frasercrmck's comments, thanks!

I did this way jsut because I think PseduoInst should be expanded after RA
which can ensure source and dest use the same register(non-SSA). I didn't
consider the pros/cons yet.
Could you please share me is there any target also expand PsedoInst in the
ISel custom inserter? I'm wondering that in order to get more precise schedule info,
it means ideally any target would prefer expand pseudo as early as possible.
Maybe the another way is adding schedule info for those pseudo insts?

khchen marked 4 inline comments as done.Dec 28 2020, 1:28 AM
frasercrmck accepted this revision.Dec 28 2020, 11:51 AM

If we expand them early there's no guarantee the register allocator would pick the same register for the inputs as the output. Our downstream repo and probably BSC's repo was expanding them in a custom inserter and putting the same vreg on the inputs and the outputs with the input operands marked with undef. Which seemed to work, but I'm surprised that isn't flagged as a violation of SSA by the machine verifier. Expanding after register allocation allows us to force the source registers explicitly.

I would imagine a high performance out of order processor would want to detect these instructions as a special case of xor/xnor and not read the data from the input register. Whether they would check that the inputs and the output are the same register or just the inputs are the same register I'm not sure. X86 CPUs for many generations have recognized xor eax, eax as special zero idiom that doesn't have a dependency on the previous value of eax. The X86 backend also uses pseudos and a post RA expansion. For scheduling, X86 has various WriteZero classes the pseudos are assigned to and I believe there is special code in most of the scheduler models to detect xor with same input post RA.

I similarly don't understand how the pre-RA undef inputs managed to "just work".

That's really interesting to learn about the X86 aliases/idioms, thanks. The processors I've worked on have been rather different so that might explain where I was coming from (see below).

I did this way jsut because I think PseduoInst should be expanded after RA
which can ensure source and dest use the same register(non-SSA). I didn't
consider the pros/cons yet.
Could you please share me is there any target also expand PsedoInst in the
ISel custom inserter? I'm wondering that in order to get more precise schedule info,
it means ideally any target would prefer expand pseudo as early as possible.
Maybe the another way is adding schedule info for those pseudo insts?

My previous (downstream) project had a complex auto-generated per-operand scheduling model with all sorts of bypasses, all to satisfy an exposed pipeline, so it was important for us to have information about the underlying instruction as accurate as possible as early as possible to avoid hundreds of special cases and copy/paste code in the rest of the compiler. So our general approach was to limit the number pseudos that survived past pre-RA scheduling. I can't point to an upstream target that does this. I haven't yet familiarised myself with the RISCV scheduling info so I can't be of much help about the best approach there. If Craig's right then it sounds like it's up to the RISC-V implementation about whether aliases could be "special" from the processor's point of view or whether they're just assembler syntactic-sugar.

I think Craig's comments help resolve the matter about where to do this expansion. So LGTM!

This revision is now accepted and ready to land.Dec 28 2020, 11:51 AM

Those different chooses reason are interesting to learn, I really appreciate your information.

This revision was landed with ongoing or failed builds.Dec 28 2020, 6:57 PM
This revision was automatically updated to reflect the committed changes.