This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Make Latency/ResourceCycles relevant to LMUL
Accepted · Public

Authored by wangpc on Mar 16 2023, 1:00 AM.

Details

Summary

When modeling vector WriteRes, there are some fields that we can
specify to model its costs like Latency, ResourceCycles, etc.

For Latency, it may not be relevant to LMUL with mechanisms like
chaining[1].

But for ResourceCycles, it may be different. The cycles of some
resources can be relevant to LMUL. For example, the generation and
issuing of uops.

In this patch, we add two new template parameters, latency and
resourceCycles. latency is a function that accepts LMUL and SEW and
returns a cycle count. resourceCycles is a list of such functions,
each giving the cycles of one resource.

We provide a pre-defined function, fixed, that returns a function
returning a fixed value, to model latencies/resources that are not
relevant to LMUL. Users may define their own functions according to
their processor model.

References:
[1] Chaining (vector processing)

Event Timeline

Herald added a project: Restricted Project. · Mar 16 2023, 1:00 AM
pcwang-thead requested review of this revision. · Mar 16 2023, 1:00 AM
Herald added a project: Restricted Project. · Mar 16 2023, 1:00 AM
pcwang-thead edited the summary of this revision. · Mar 16 2023, 1:03 AM

One concern I have is that a microarchitecture may wish to have more flexibility over the number of resource cycles for each LMUL than the multiplier subroutine allows for. I am imagining a scenario where the number of resource cycles for different LMUL is more complex than multiplication by the LMUL factor. Something like this would allow for maximum flexibility:

foreach mx = MxList in {
  defvar RC = ...; // Something that may be more complex than BaseCycles * multiplier
  defvar L = ...; // Maybe latency is still relevant, even if it is less important than ResourceCycles
  let ResourceCycles = [RC], Latency = L in {
    ...
  }
}
pcwang-thead added a comment. · Edited · Mar 16 2023, 6:57 PM

Yes, I wanted it to be as flexible as what you described.
But Latency and ResourceCycles are both TableGen compile-time constants (of course we can override them via some target hooks, but that is off the table), so there is no way to specify them with custom handling code.
I tried to model them like this:

  1. We pass subroutines to LMULWriteResImpl in RISCVSchedule.td
multiclass LMULWriteResImpl<string name, list<ProcResourceKind> resources,
                            LatencySubroutine latencySubroutine,
                            list<ResourceCycleSubroutine> resourceCycleSubroutines>{
  foreach mx = MxList in {
    defvar RC = apply resourceCycleSubroutines to mx; // It acts like calling these subroutines.
    defvar L = apply latencySubroutine to mx; // Same as above.
    let ResourceCycles = [RC], Latency = L in {
      ...
    }
  }
}
  2. We define these subroutines in the custom scheduling model RISCVSchedXXX.td
class CustomSubroutine1<string mx> {
  list<int> ResourceCycles = ...; // Custom handling of different LMULs.
}
class CustomSubroutine2<string mx>{
......
}
……

But in TableGen we can't pass functions, since it is a template description language, so we can't achieve something like this (if I understand TableGen correctly). So I think we could provide some pre-defined subroutines like fixed, multiplier and so on in RISCVScheduleV.td, and users can then use them in their scheduling models. If there are some microarchitectures that can't be modeled, just add a new subroutine to upstream if approved.

If there are some microarchitectures that can't be modeled, just add a new subroutine to upstream if approved.

Does this mean that subtarget routines must be added to the RISCVScheduleV file since the following function needs to know about the custom subroutine to do its isa checks:

// Helper class for generating a list of resource cycles of different LMULs.
class ResourceCycles<list<ResourceCycle> resourceCycles, string mx> {

I am concerned that the RISCVScheduleV file will take on bloat due to holding subtarget related routines if this is the case.

Yes. So I posted this patch here just to discuss how we should handle this.
For example, solutions may be:

  1. Add routines to RISCVScheduleV.td, as I have done.
  2. Extend TableGen to support passing functions:
// Suppose we have a Function class that represents a function object whose template parameters are the function's parameters.
class TargetSubroutine<int base, string mx> : Function;

// Then suppose we have a new bang operator that applies this function to the input parameters, with the result being `ret`.
class ResourceCycles<list<TargetSubroutine> subroutines, int base, string mx> {
  list<int> value = !foreach(subroutine, subroutines,
                             !apply(subroutine, base, mx)
                            );
}

// In SchedXXX.td, we can define our own routines.
class Multiplier<int base, string mx> : TargetSubroutine<base, mx> {
  // We return an int value calculated from mx.
  int ret = !mul(base, multiplier<mx>.value);
}
  3. Find some template scheme that is flexible enough to specify cycles according to LMUL (I haven't figured one out...).
michaelmaitland added a comment. · Edited · Mar 20 2023, 11:08 AM

What stops us from doing something like this:
https://github.com/llvm/llvm-project/blob/0c0468e6df2bcabd207858891c2387357857b0bc/llvm/lib/Target/SystemZ/SystemZScheduleZ13.td#L95
or
https://github.com/llvm/llvm-project/blob/0c0468e6df2bcabd207858891c2387357857b0bc/llvm/lib/Target/AMDGPU/SISchedule.td#L160
?

Num or 2 could be replaced with something like MyTargetGetCycles<mx>.c without needing to extend the TableGen language. For example, in the SchedXXX.td file:

class MyTargetGetCycles<string mx> {
  int c = !cond(
    !eq(mx, "M1") : 1,
    !eq(mx, "M2") : 1,
    !eq(mx, "M4") : 1,
    !eq(mx, "M8") : 1,
    !eq(mx, "MF2") : 1,
    !eq(mx, "MF4") : 1,
    !eq(mx, "MF8") : 1,
    !eq(mx, "UpperBound") : 1
  );
}

foreach mx = SchedMxList in {
  defvar Cycles = MyTargetGetCycles<mx>.c;
  let Latency = Cycles, ResourceCycles = [Cycles] in {
    defm "" : LMULWriteResMX<"WriteVLDE",   [MyTargetSomeResource], mx>;
    defm "" : LMULWriteResMX<"WriteVSTE",   [MyTargetSomeResource], mx>;
  }
}

That's because LMULWriteRes is already LMUL-aware: LMULWriteResImpl already loops over SchedMxList. If we loop over it again, the result would be weird.
Another approach is to define something like LMULWriteResImpl in SchedXXX.td and override Latency and ResourceCycles according to LMUL. But in that case, why bother defining boilerplate like LMULWriteRes in RISCVScheduleV.td?

Then yea, I think (2) is probably the approach that feels most natural to me.

This comment was removed by michaelmaitland.

Why can't ResourceCycles be the base class that just contains a list of integers? Other classes inherit from it and construct the list however they want. A fixed class could take a cycle count and put that value in every entry in the list. The LMUL-scaled class could take a cycle count and multiply it.

I think the reason is that we need LMUL info to generate the list, but we can't get it in SchedXXX.td.
We had a complex implementation which seems similar to what you described (if I understand correctly). I will upload it later. :-)

Why do we need LMUL info?

We can have a class that contains an 8-entry list of resource cycles, one for each LMUL plus the upper bound. We can have derived classes that construct this list based on common cases.

LMULWriteResImpl can index into the list to get the entry corresponding to the LMUL. Nothing in RISCVSchedule.td needs to know how the list was constructed.
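
A minimal standalone sketch of this idea follows. All class names (CyclesByLMUL, FixedCycles, ScaledCycles, CyclesFor) and the entry order are assumptions made for illustration; they are not taken from the patch. The lookup deliberately uses only constant list subscripts selected through !cond, so it does not rely on subscripting a list with a variable.

```tablegen
// Hypothetical sketch; none of these names come from the patch.
// Assumed entry order: M1, M2, M4, M8, MF2, MF4, MF8, UpperBound.
class CyclesByLMUL<list<int> cycles> {
  list<int> Cycles = cycles;
}

// Same cycle count for every LMUL.
class FixedCycles<int c> : CyclesByLMUL<[c, c, c, c, c, c, c, c]>;

// Cycle count scaled by integer LMUL; fractional LMULs keep the base value,
// and the upper bound is assumed to match M8.
class ScaledCycles<int c>
    : CyclesByLMUL<[c, !mul(c, 2), !mul(c, 4), !mul(c, 8),
                    c, c, c, !mul(c, 8)]>;

// Generic lookup: it only knows the entry order, not how the list was built.
class CyclesFor<CyclesByLMUL table, string mx> {
  int value = !cond(
    !eq(mx, "M1")         : table.Cycles[0],
    !eq(mx, "M2")         : table.Cycles[1],
    !eq(mx, "M4")         : table.Cycles[2],
    !eq(mx, "M8")         : table.Cycles[3],
    !eq(mx, "MF2")        : table.Cycles[4],
    !eq(mx, "MF4")        : table.Cycles[5],
    !eq(mx, "MF8")        : table.Cycles[6],
    !eq(mx, "UpperBound") : table.Cycles[7]
  );
}

// Usage: FixedAtM4 should resolve to 3 and ScaledAtM4 to 12.
def Example {
  int FixedAtM4 = CyclesFor<FixedCycles<3>, "M4">.value;
  int ScaledAtM4 = CyclesFor<ScaledCycles<3>, "M4">.value;
}
```

With this shape, a subtarget file would only need to define its own CyclesByLMUL subclass, and the shared lookup would stay unchanged.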

pcwang-thead added a comment. · Edited · Mar 30 2023, 2:02 AM

Oh, I get it. The key point is that we can't index a list with a dynamic index(?); the index can only be a constant:

[build] llvm/lib/Target/RISCV/RISCVScheduleV.td:63:82: error: Variable not defined: 'i'
[build]   defvar i = IndexOfLMUL<mx>.value;
[build]   list<int> value = !foreach(resourceCycle, resourceCycles, resourceCycle.Cycles[i]);
                                                                                         ^

OK, I now know how to do it.

pcwang-thead retitled this revision from [RISCV] Make ResourceCycles relevant to LMUL to [RISCV] Make Latency/ResourceCycles relevant to LMUL. · Mar 30 2023, 3:37 AM
pcwang-thead edited the summary of this revision.
pcwang-thead edited the summary of this revision.
pcwang-thead edited the summary of this revision.

@craig.topper Is this what you suggested? It seems to be OK for cases where only LMUL is taken into consideration. But when both SEW and LMUL are accounted for, would it be too complicated to generate two-dimensional lists?
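
For illustration only, one way to sidestep a two-dimensional list is to compute the value from per-dimension factors. The sketch below is standalone and hypothetical: the class names and the multiplicative cost model are assumptions, not something proposed in this review, and a real microarchitecture may need a table that does not factor this way.

```tablegen
// Hypothetical sketch; names and the cost model are illustrative assumptions.
class LMULFactor<string mx> {
  int value = !cond(
    !eq(mx, "M1")         : 1,
    !eq(mx, "M2")         : 2,
    !eq(mx, "M4")         : 4,
    !eq(mx, "M8")         : 8,
    !eq(mx, "MF2")        : 1,
    !eq(mx, "MF4")        : 1,
    !eq(mx, "MF8")        : 1,
    !eq(mx, "UpperBound") : 8
  );
}

class SEWFactor<int sew> {
  int value = !cond(
    !eq(sew, 8)  : 1,
    !eq(sew, 16) : 2,
    !eq(sew, 32) : 4,
    !eq(sew, 64) : 8
  );
}

// Cycles as a function of both LMUL and SEW, without building a 2-D list.
class CyclesBySEWAndLMUL<int base, string mx, int sew> {
  int value = !mul(base, !mul(LMULFactor<mx>.value, SEWFactor<sew>.value));
}

// Usage: with base 2, LMUL M2 and SEW 32, Cycles should resolve to 2 * 2 * 4 = 16.
def ExampleSEW {
  int Cycles = CyclesBySEWAndLMUL<2, "M2", 32>.value;
}
```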

  • Fix errors.
  • Rename index to IndexByLMUL.
  • Support WorstCase.
  • Rebase.
  • Use function.
pcwang-thead edited the summary of this revision. · Apr 21 2023, 5:37 AM
michaelmaitland accepted this revision. · Apr 21 2023, 9:49 AM

LGTM.

llvm/lib/Target/RISCV/RISCVScheduleV.td:75

When we call latency("WorstCase") and resourceCycle("WorstCase"), we're treating WorstCase as an LMUL value since we're passing it as the parameter that is used to pass LMUL. The last few changes to this file have aimed to move away from this by trying to have WorstCase mean worst case SchedWrite, not mean worst case LMUL.

We still need to get the Latency and ResourceCycles for the worst case WriteRes though, and it would make sense to get it from this list. I thought about a solution where we pass a boolean parameter which signifies to return the WorstCase value:

function MyCyclesFunc() : function<bit, int, string, int> {
  return function(bit isWorstCase, string lmul = "M1", int sew = 0): int {
    return !if(isWorstCase, 10, !cond( /* return the lmul&sew cycles */ ));
  };
}

However, calling latency(true) feels worse than calling latency("WorstCase"). It also makes the body of the lambda messier. As a result, I am willing to concede to passing WorstCase to these functions as the LMUL parameter. Curious if anyone has any input here.

This revision is now accepted and ready to land. · Apr 21 2023, 9:49 AM
evandro removed a subscriber: evandro. · Jun 12 2023, 2:32 PM
wangpc commandeered this revision. · Jul 5 2023, 12:47 AM
wangpc added a reviewer: pcwang-thead.
wangpc removed a reviewer: pcwang-thead.