This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add sched to pseudo function call instructions
ClosedPublic

Authored by pcwang-thead on Apr 12 2022, 2:21 AM.

Details

Summary

To fix llvm-mca's error of 'found an unsupported instruction
in the input assembly sequence.' caused by the lack of
scheduling info.

Pseudo function call instructions will be expanded to auipc
and jalr, so their scheduling info are the combination of
two.

This may not be the right way to fix the error. I have tried
to expand calls in RISCVAsmParser but it failed because of
some encoding errors.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 2:21 AM
pcwang-thead requested review of this revision.Apr 12 2022, 2:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 2:22 AM
craig.topper added inline comments.Apr 12 2022, 9:36 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1282–1283

Don't exceed 80 characters per line.

1282–1283

I'm not sure it's correct to have more than one Write scheduling class with only a single output register. Each Write is supposed to correspond to the latency of the register.

pcwang-thead marked an inline comment as done.Apr 13 2022, 12:14 AM
pcwang-thead added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1282–1283

According to comments of Sched class in llvm/include/llvm/Target/TargetSchedule.td:

One SchedWrite type must be listed for each explicit def operand in order. Additional SchedWrite types may optionally be listed for implicit def operands.

It is legal to add more Writes for implicit operands and I see a lot of definitions like this in other targets. Example in AArch64:

def TLSDESC_CALLSEQ
    : Pseudo<(outs), (ins i64imm:$sym),
             [(AArch64tlsdesc_callseq tglobaltlsaddr:$sym)]>,
      Sched<[WriteI, WriteLD, WriteI, WriteBrReg]>;
craig.topper added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1282–1283

My understanding of "implicit def operands" would be physical registers listed in let Defs = [] syntax.

@andreadb is this the right thing to do?

llvm/lib/Target/RISCV/RISCVInstrInfo.td
1282–1283

I'm not for sure. Here is an example in llvm/lib/Target/AArch64/AArch64InstrInfo.td:

let isCall = 1, Defs = [LR, X0, X1], hasSideEffects = 1, Size = 16,
    isCodeGenOnly = 1 in
def TLSDESC_CALLSEQ
    : Pseudo<(outs), (ins i64imm:$sym),
             [(AArch64tlsdesc_callseq tglobaltlsaddr:$sym)]>,
      Sched<[WriteI, WriteLD, WriteI, WriteBrReg]>;

Defs are 3 registers ([LR, X0, X1]) and there is no output register, while there are 4 writes.

andreadb added inline comments.Apr 21 2022, 10:31 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1282–1283

Sorry for the late reply (I am just back from holiday).

My understanding of "implicit def operands" would be physical registers listed in let Defs = [] syntax.

@andreadb is this the right thing to do?

Your understanding is correct.

Each write is associated with a specific register definition. The sequence of writes matters, and implicit definitions always follow explicit definitions. Implicit definitions come from the let Defs = [...] directive.

To answer to your question:

It is legal to declare a scheduling class with more writes than register definitions.
The subtarget emitter doesn't consider it a user error, and a valid scheduling class would be generated for it.

The generated scheduling class would declare a set of MCWriteProcResEntry entries which also includes contributions from the extra write(s). However, that class would declare more MCWriteLatencyEntry than register definitions.

Scheduling algorithms would still work fine because method MCSchedModel::computeInstrLatency always iterates over the fill set of MCWriteLatencyEntry when computing the "max" instruction latency.

Method TargetSchedModel::computeOperandLatency (lib/CodeGen/TargetSchedule.cpp) can be used by scheduling algorithms to query the latency of specific register definition. That method would still work fine (simply because it would never be queried for an invalid (e.g. off-by-one) DefOperIdx value.

In conclusion: resource consumption related to extra writes would be added to the set of processor resources. Extra writes can still affect the total instruction latency. However, scheduling algorithms won't be able to associate those to any particular register operand.

Ideally, if possible, it would be better to have a matching number of writes. That's just my opinion though.

The TLSDESC_CALLSEQ pseudo that Wang mentioned in his last comment is associated with a scheduling class descriptor named WriteI_WriteLD_WriteI_WriteBrReg (whose definition can be found in AArch64GenSubtargetInfo.inc).
That scheduling class reports the right number of MCWriteProcResEntry (it also includes contributions from the "extra" write, i.e. WriteBrReg).
The "problem" is that it declares 4 MCWriteLatencyEntry, but the instruction only specifies 3 (implicit) register writes.
So, the fourth write latency cannot be attributed to any register operand in particular.

I hope it helps.

llvm/lib/Target/RISCV/RISCVInstrInfo.td
1282–1283

Thank you for your detailed explanation! I have learnt a lot from it!

So such definitions in this patch and other targets won't cause errors and scheduler will work fine, the only 'problem' is that we can't associate extra writes to any specific register. But I think it is not a problem in this patch, because all calls are scheduling boundaries (sees isSchedBoundary in CodeGen/MachineScheduler.cpp) and llvm-mca simply sets max latency of calls to 100 cycles.

Correct me if I am wrong, this patch makes no difference to scheduler, just makes llvm-mca happy. :)

andreadb added inline comments.Apr 22 2022, 3:32 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1282–1283

Thank you for your detailed explanation! I have learnt a lot from it!

So such definitions in this patch and other targets won't cause errors and scheduler will work fine, the only 'problem' is that we can't associate extra writes to any specific register. But I think it is not a problem in this patch, because all calls are scheduling boundaries (sees isSchedBoundary in CodeGen/MachineScheduler.cpp) and llvm-mca simply sets max latency of calls to 100 cycles.

Correct me if I am wrong, this patch makes no difference to scheduler, just makes llvm-mca happy. :)

It makes no difference to scheduling algorithms because - as you wrote - calls are treated like scheduling boundaries anyway.

I was a bit surprised to see that this patch is required for llvm-mca.

Is it because the RISCV assembly parser can match assembly against a "Pseudo" MCInst? For example, does this means that call foo is parsed as MCInst PseudoCALL $foo ?

If so, then the change looks good to me.

This revision is now accepted and ready to land.Apr 22 2022, 9:01 AM
This revision was landed with ongoing or failed builds.Apr 23 2022, 11:58 PM
This revision was automatically updated to reflect the committed changes.

I think using two Write is not accurate because the latency are not added together but is the max one. It consumes resource simultaneously, but the expanded instructions auipc and jalr do have data dependency.
I think it should create a new SchedWrite such as WritePesudoCall and the cycle is sum up of WriteIALU and WriteJalr.

let Latency = 2 in
def : WriteRes<WritePesudoCall, [RocketUnitALU, RocketUnitB]>;

Or we can make it simple with leveraging WriteSequence to create WritePesudoCall.

def WritePesudoCall : WriteSequence<[WriteIALU, WriteJalr]>;

I think using two Write is not accurate because the latency are not added together but is the max one. It consumes resource simultaneously, but the expanded instructions auipc and jalr do have data dependency.
I think it should create a new SchedWrite such as WritePesudoCall and the cycle is sum up of WriteIALU and WriteJalr.

let Latency = 2 in
def : WriteRes<WritePesudoCall, [RocketUnitALU, RocketUnitB]>;

Or we can make it simple with leveraging WriteSequence to create WritePesudoCall.

def WritePesudoCall : WriteSequence<[WriteIALU, WriteJalr]>;

I think it is unnecessary, since calls won't be part of scheduling regions.
As we discussed above, this patch just wants to make llvm-mca happy. :)