Page MenuHomePhabricator

[RISCV] Do not mandate scheduling for CSR instructions
ClosedPublic

Authored by evandro on Aug 5 2020, 3:09 PM.

Details

Summary

Scheduling information is of little value when they may disrupt the pipeline. This patch allows omitting the scheduling information for CSR instructions while still setting SchedMachineModel::CompleteModel. For specific cases, any scheduling information added will be used by the scheduler.

Diff Detail

Event Timeline

evandro created this revision.Aug 5 2020, 3:09 PM
evandro requested review of this revision.Aug 5 2020, 3:09 PM
lenary added a comment.Aug 6 2020, 1:44 AM

LGTM!

Given the comment on isBarrier in Target.td, I'm not sure it applies: bit isBarrier = 0; // Can control flow fall through this instruction?

jrtc27 added a comment.Aug 6 2020, 5:59 AM

Have you seen this actually occur? The fact that hasSideEffects is set surely means that it cannot possibly be duplicated, nor scheduled out-of-order with other side-effect-affected instructions? Also, the only CSR instructions that currently get generated _by CodeGen_ are reads of cycle[h], which are very well-behaved. Naive implementations might still take a slow path just because they're CSR instructions, but it'd be very easy microarchitecturally to quickly read the cycle counter in an ALU as if it were just a normal GPR move. In such cases you really would want to be able to express scheduling information for it just like other arithmetic operations.

evandro added a comment.EditedAug 11 2020, 2:48 PM

hasSideEffects may imply isNotDuplicable, especially when rematerializing, but the latter prevents duplication more extensively in the middle end (e.g., tail end duplication).
Indeed, isReMaterializable (default to false) applies here too.

Architecting reading the cycle counter to be like a GPR is not a given, since it's usually a side band register away from the register file. Therefore, such instructions follow unusual paths in the architecture, when methinks that hasNoSchedulingInfo makes sense.

Moreover, though the CSR instructions may be used as RO, they are defined as RMW instructions. But this probably just backs up hasSideEffects.

And not all code piped through the LLVM IR comes from CodeGen. Thus, it's important to describe the instructions as faithfully as possible to the ISA, regardless of how they are used in the compiler.

hasSideEffects may imply isNotDuplicable, especially when rematerializing, but the latter prevents duplication more extensively in the middle end (e.g., tail end duplication).

But why does that matter? If LLVM decides it's better to fork a code path then let it, whether it's a CSR instruction or not really does not matter, it's just I-cache usage. In fact that's another reason why I _wouldn't_ want this change to go through.

Architecting reading the cycle counter to be like a GPR is not a given, since it's usually a side band register away from the register file. Therefore, such instructions follow unusual paths in the architecture, when methinks that hasNoSchedulingInfo makes sense.

Did you read what I said? I didn't say it was a given. I said it was a possibility, and something that _should_ be done in a real high-end core, but no such implementation yet exists. The performance monitoring counters are _meant_ to be cheap to access per the spec ("Additional counters should be provided to help diagnose performance problems and these should be made accessible from user-level application code with low overhead.") and that's key to not perturbing the behaviour of the system by probing it. Yes, _some_ implementations (currently all that I know of) make them expensive to read. But that does not mean we should declare that _all_ implementations forever will, and having scheduling info is _key_ to being able to capture that. If your architecture makes them really expensive then go and tell LLVM that by writing a processor-specific schedule description. But don't try and enforce a one-size-fits-all approach for something where that's not true.

Moreover, though the CSR instructions may be used as RO, they are defined as RMW instructions. But this probably just backs up hasSideEffects.

Depends what you do with them. But even if they're RO, marking them hasSideEffects is I believe still important to ensure LLVM doesn't merge them as they're volatile reads.

And not all code piped through the LLVM IR comes from CodeGen. Thus, it's important to describe the instructions as faithfully as possible to the ISA, regardless of how they are used in the compiler.

Yes and no. There's currently no way to express IR that asks for anything other than RDCYCLE[H], so any frontend can only ever generate that; you'd have to hook into the MachineInstr bits and MC layer directly if you really wanted to emit anything else without the use of inline assembly (which is not subject to scheduling, so your flags wouldn't affect that) and that's really not encouraged, nor am I aware of anyone who does that. You are right though in the sense that we shouldn't assume RDCYCLE[H] will always be the only things emitted. But that potentially just means the performance counters should be separated out from the other CSRs if implementations exist (or are created) that behave differently between the two. Importantly, though, we should not sacrifice expressivity and flexibility by limiting all CSR instructions in this way. As I've said before and will keep saying, that is up to your machine scheduler, not the generic instruction info.

So I am still very against the entire concept of this patch. Before you reply next, please carefully read what I wrote last time and what I've written this time, because what you are saying leads me to believe you have not digested it; you're certainly not addressing the points I actually made.

hasSideEffects may imply isNotDuplicable, especially when rematerializing, but the latter prevents duplication more extensively in the middle end (e.g., tail end duplication).

But why does that matter? If LLVM decides it's better to fork a code path then let it, whether it's a CSR instruction or not really does not matter, it's just I-cache usage. In fact that's another reason why I _wouldn't_ want this change to go through.

Control registers may have their own requirements that are often akin to volatile memory. In my view this is equivalent to hasSideEffects and isNotDuplicable.

Architecting reading the cycle counter to be like a GPR is not a given, since it's usually a side band register away from the register file. Therefore, such instructions follow unusual paths in the architecture, when methinks that hasNoSchedulingInfo makes sense.

Did you read what I said? I didn't say it was a given. I said it was a possibility, and something that _should_ be done in a real high-end core, but no such implementation yet exists. The performance monitoring counters are _meant_ to be cheap to access per the spec ("Additional counters should be provided to help diagnose performance problems and these should be made accessible from user-level application code with low overhead.") and that's key to not perturbing the behaviour of the system by probing it. Yes, _some_ implementations (currently all that I know of) make them expensive to read. But that does not mean we should declare that _all_ implementations forever will, and having scheduling info is _key_ to being able to capture that. If your architecture makes them really expensive then go and tell LLVM that by writing a processor-specific schedule description. But don't try and enforce a one-size-fits-all approach for something where that's not true.

High end cores are the least likely to implement this possibility. This is the state of things among all such cases currently.

But you are misinterpreting what hasNoSchedulingInfo does:

This instruction is not expected to be queried for scheduling latencies and therefore needs no scheduling information even for a complete scheduling model.

IOW, it just makes TableGen happy if the its scheduling information is not provided. If it is, it will continue to be happy, and so will the scheduler.

asb added a comment.Aug 20 2020, 2:28 AM

The description of isNotDuplicable in MachineInstr.h is:

/// Return true if this instruction cannot be safely duplicated.
/// For example, if the instruction has a unique labels attached
/// to it, duplicating it would cause multiple definition errors.
bool isNotDuplicable(QueryType Type = AnyInBundle) const {
  return hasProperty(MCID::NotDuplicable, Type);
}

I don't think this obviously applies to CSRs, and the property doesn't seem to be applied to instructions that modify control registers for other in-tree targets.

evandro updated this revision to Diff 287768.Aug 25 2020, 1:31 PM
evandro retitled this revision from [RISCV] Disparage CSR instructions to [RISCV] Do not mandate scheduling for CSR instructions.
evandro edited the summary of this revision. (Show Details)
asb accepted this revision.Sep 3 2020, 4:55 AM

LGTM, thanks.

This revision is now accepted and ready to land.Sep 3 2020, 4:55 AM
evandro closed this revision.Sep 22 2020, 10:47 AM