This is an archive of the discontinued LLVM Phabricator instance.

SystemZ scheduling implementation
ClosedPublic

Authored by jonpa on Feb 15 2016, 2:31 AM.

Details

Summary

General review and comments / suggestions would be greatly appreciated for this implementation of instruction scheduling for SystemZ.

It contains some changes outside the SystemZ backend, which will be addressed in separate revisions, but are included here also since they are part of this project.

There are some experimental parts left, which will not be commited, such as statistics and options.

Main points are:

  • post-ra mischeduling with a HazardRecognizer for decoder groups, and a custom SchedStrategy.
  • pre-ra mischeduling enabled.
  • Instruction scheduling classes and definitions for z13, zEC12 and z196. z10 scheudling is not supported.

Diff Detail

Event Timeline

jonpa updated this revision to Diff 47963.Feb 15 2016, 2:31 AM
jonpa retitled this revision from to SystemZ scheduling implementation.
jonpa updated this object.
jonpa added reviewers: atrick, hfinkel.
jonpa added a subscriber: llvm-commits.
jonpa updated this revision to Diff 47989.Feb 15 2016, 6:22 AM

A minor change to make Release builds succeed.

atrick accepted this revision.Feb 15 2016, 11:22 AM
atrick edited edge metadata.

Overall nice job. I won't be able to review your hazard recognizer or any of the SystemZ models.

lib/Target/SystemZ/SystemZISelLowering.cpp
123–125 ↗(On Diff #47989)

You won't be able to rely on this since SelectionDAG scheduler is deprecated. It's just waiting for a replacement. I think you should focus on the right MI scheduler heuristics for your target.

That said, I can see why you did this because your MI scheduler is top-down only so it might be hard for you to control register pressure. As an alternative, we could easily support a two-pass MI scheuler, bottom-up, then top-down.

lib/Target/SystemZ/SystemZInstrInfo.h
171–173

FYI, the "new" machine model is meant to be flexible enough that you don't need to create your own hazard recognizer (you can add predicates and arbitrary pseudo machine resources). However, it's tricky to do that and fine just to use a hazard recognizer when you have complicated decode/issue group constraints.

lib/Target/SystemZ/SystemZScheduleZ13.td
44

In-order scheduling with multiple functional units of the same type is somewhat broken in the generic scheduler. ReservedCycles is only tracking a worst-cast resource availability across all units. It should really be a two-dimensional array. I know this has been fixed out-of-tree at least one target but never pushed back. It would be great if you fixed that!

This revision is now accepted and ready to land.Feb 15 2016, 11:22 AM
jonpa added inline comments.Feb 16 2016, 6:08 AM
lib/Target/SystemZ/SystemZISelLowering.cpp
123–125 ↗(On Diff #47989)

I would hope that the pre-ra MI scheduler is bidirectional, due to the overrideSchedPolicy() call, where this is selected. Is there a better /normal way of doing this, perhaps?

lib/Target/SystemZ/SystemZScheduleZ13.td
44

This unit is handled as a special case by the HazardRecognizer, I guess partly because I couldn't see that the code - as you say - did what I wanted.

That would be interesting to fix... could perhaps this out-of-tree target push this possibly?

jonpa updated this revision to Diff 48068.Feb 16 2016, 6:12 AM
jonpa edited edge metadata.

Latencies of z13 vector instructions corrected.

One more test case updated.

atrick added inline comments.Feb 17 2016, 9:44 AM
lib/Target/SystemZ/SystemZISelLowering.cpp
123–138 ↗(On Diff #48068)

Right, I was looking at your PostRA policy, which is rightly top-down. That said, you might still achieve better register pressure results by a two-pass scheduling approach. I tried hard to wedge all heuristics into a single pass because I was paranoid about compile time.

Ultimately it's whatever works for your target, I was just pointing out that

  • SelectionDAG is a bad place for scheduler heuristics
  • We could support a multiple-pass MI scheduler if anyone needs it
lib/Target/SystemZ/SystemZScheduleZ13.td
45

It wasn't a complete/general fix. But yes, I'll encourage anyone I can to improve the in-tree code. Out-of-tree work usually leads to problems.

On the other hand, making it easy to write custom, possibly out-of-tree schedulers was a major goal of MI scheduling.

jonpa added inline comments.Feb 17 2016, 11:54 PM
lib/Target/SystemZ/SystemZISelLowering.cpp
123–138 ↗(On Diff #48068)

I am curious as to what you think would be the possibilities of a multi-pass scheduling approach. I have tried this once before like: First do a minimal reg-pressure scheduling. Then increase parallelism (overlapping live intervals) only when it seems to not cause too much spilling. Is this also what you had in mind?

Currently this is not needed for SystemZ, as the main focus at least right now is on JIT compilation.

atrick added inline comments.Feb 18 2016, 8:27 AM
lib/Target/SystemZ/SystemZISelLowering.cpp
123–138 ↗(On Diff #48068)

Yes that's what I had in mind. I didn't take that approach because I wanted to preserve source order in common cases where an out-of-order target has enough registers, as well as compile time.

jonpa updated this revision to Diff 48475.Feb 19 2016, 4:52 AM

Latencies corrected, mainly for z13 vector instructions.

jonpa updated this revision to Diff 49500.Mar 1 2016, 8:59 AM

Improved modelling of execution units by separating execution units and decoder slots needed for each instruction. Instructions with a double use of exec unit or with a coupled use of the LSU now gets this modelled.

For z13, more of the pipelines have been modelled (FXa, FXb, and the various vector pipelines).

LSU latency corrected to 4.
Latencies properly summed for cracked / expanded instructions. Instructions with joined dispatch have a latency same as for single issue
type instructions.

Tried use an include of commmon defs for different SchedModels, but TableGen
rejected this -- see comment in for example top screen of SystemZScheduleZ13.td.

This patch is on its way, but some regressions need to be fixed first.

jonpa updated this revision to Diff 73919.Oct 7 2016, 5:01 AM

This was already approved before, but that was quite some time ago, so I now reopen this review since my patch then never passed the performance measurements.

This is now *post-RA scheduling only* for SystemZ. Nearly all of the patch belongs to the SystemZ backend.

jonpa updated this revision to Diff 74345.Oct 12 2016, 2:17 AM

NFC update per Uli's requests. No longer any common code changes.

uweigand added inline comments.Oct 16 2016, 8:37 AM
lib/Target/SystemZ/SystemZHazardRecognizer.cpp
45

I guess instead of this we could simply use getInstrInfo on the SchedModel.

85

Is there a reason why this is a separate function and no just done directly in ::Reset()?

155

This looks a bit ad-hoc ... isn't there a more generic way to find the shorter name?

lib/Target/SystemZ/SystemZHazardRecognizer.h
121

Why does groupingCost use the return value while resourcesCost uses an output parameter?

lib/Target/SystemZ/SystemZMachineScheduler.h
58

As discussed offline, we really need to get rid of the gobal/static variable here. I note that there is quite a bit of similarily between this "preliminary" sorter and the final sorter in Candidate. Maybe we should actually store "preliminary" candidates in the Available list (with GroupingCost and ResourceCost only set to 0 or 1 depending on whether grouping or reserved resources are involved), and the update the cost parameters with actual values once we known them? We then might even be able to reuse the same comparison routine ...

lib/Target/SystemZ/SystemZScheduleZ13.td
28

We should really try to get this complete, so that instructions added in the future don't accidentally lack scheduling information. How difficult would it be to get there?

104

Ideally, the ordering in these files should mostly correspond to the ordering in the original InstrInfo files, just to make them easier to find ...

105

Also, it is somewhat annoying that we need to list not just the basic instruction definitions, but all the various aliases as well. I'm wondering if there isn't some what to annotate the Alias definition in the main file with the opcode the alias will be resolved in the end, so that can be used for scheduling purposes ...

225

This is not an "And", it's a non-transactional store and should go with the transaction-related instructions.

582

It would be nice to at least separate out vector floating-point instructions, so we can easily see where W variants are needed.

739

I don't think there's a real difference between those and the ones listed under Other.

764

This is just an alias for a LARL and should go there.

jonpa updated this revision to Diff 74984.Oct 18 2016, 5:27 AM
jonpa marked 10 inline comments as done.

Updated per requests.

Does anyone know how to say "Instruction B should have the same scheduling class as instruction A" ? This was the only point I could not get fixed.

For the rest, please see replies to comments below.

jonpa requested a review of this revision.Oct 18 2016, 5:46 AM
jonpa edited edge metadata.
jonpa added inline comments.
lib/Target/SystemZ/SystemZHazardRecognizer.cpp
155

Fixed by using string::substr()/resize() instead. Now all units should really be named per a Z13_XXXUnit pattern.

lib/Target/SystemZ/SystemZMachineScheduler.h
58

I gave this a try, but it thought it was a bit messy to flip the sorting variables back and forth inside a set of Candidates.

I instead use the isScheduleHigh flag for groupers / FPd ops, which simplifies the SUSorter method while also eliminating the static variable. The iteration in pickNode() should be nearly unaffected, since these nodes are quite rare.

lib/Target/SystemZ/SystemZScheduleZ13.td
28

I added the hasNoSchedulingInfo flag on the appropriate instructions, and then set CompleteModel (the reason I did not do that before is that AFAICR, this flag then also demanded modeling of operand writes or something of that sort).

Worthy of mentioning is that this triggers an error during build for any instruction missing scheduling input for *all* subtargets. In a debug build, TargetSchedule.cpp will then cause an abort during compilation if the subtarget does not have scheduling input for an emitted instruction.

104

I have reorganized the files completely to match the InstrInfo file sections.

105

Could not as of yet find any way to achieve this, but there might be some way...

jonpa added inline comments.Oct 18 2016, 5:50 AM
lib/Target/SystemZ/SystemZHazardRecognizer.h
121

During experiments I have also returned a cost here for the 'other' processor side. I guess I could change that back now until it's needed again...?

uweigand edited edge metadata.Oct 18 2016, 6:47 AM

Updated per requests.

Does anyone know how to say "Instruction B should have the same scheduling class as instruction A" ? This was the only point I could not get fixed.

For the rest, please see replies to comments below.

Looking quite good now, thanks! The one thing I'm still wondering about is the hasNoSchedulingInfo flags. Those are of course fine for the .insn directive, and also for custom-inserter pseudos. However, I don't think we want them for the Asm* branch variants; those are just regular branch instructions that might be used in inline assembler, and they really should have the same scheduling info as the standard form of the branch instructions. I think it should be straightforward to implement this by adding "(Asm)?" to the instregex strings for the branch instructions.

lib/Target/SystemZ/SystemZMachineScheduler.h
58

OK, that makes sense.

lib/Target/SystemZ/SystemZScheduleZ13.td
27

I think this is the default setting, so we can just leave it out here.

28

Great, thanks! I do think we indeed want the error, even for older subtargets. Hopefully in the not-too-distant future we'll have completed the instruction set for the old processors anyway ...

104

Excellent, it is much more readable now!

105

Oh well, if there's no straightforward way, it's OK the way it is now ...

lib/Target/SystemZ/SystemZScheduleZ196.td
68

What's the EC12 doing here?

Does anyone know how to say "Instruction B should have the same scheduling class as instruction A" ? This was the only point I could not get fixed.

I'm not sure I understand the problem. I *think* you can mark your scheduling model "complete", then add InstAlias records in your .td file without adding new InstRW records...

Does anyone know how to say "Instruction B should have the same scheduling class as instruction A" ? This was the only point I could not get fixed.

I'm not sure I understand the problem. I *think* you can mark your scheduling model "complete", then add InstAlias records in your .td file without adding new InstRW records...

This is not about InstAlias records (which are aliases for the assembler) -- those indeed work fine. The issue here was about aliases for the code generator. We use those usually because some instructions need operands on the MC level that we don't want to expose on the MI level. For example, a "return" instruction on SystemZ is actually a "br %r14", but at the MI level this doesn't have any operands, but is simply defined as an alias:

def Return : Alias<2, (outs), (ins), [(z_retflag)]>;

where Alias is formally an Instruction, but doesn't have any information about mnemonic or opcodes. Instead, when this alias is about to be emitted, we emit an actual BR instruction pattern, adding the R14 operand at this point.

This means that to get a complete scheduler model, we have to duplicate the scheduling info for BR also for Return. It would be nice if there were instead a way to say in the definition of Return to just look at BR for scheduling info.

Does anyone know how to say "Instruction B should have the same scheduling class as instruction A" ? This was the only point I could not get fixed.

I'm not sure I understand the problem. I *think* you can mark your scheduling model "complete", then add InstAlias records in your .td file without adding new InstRW records...

This is not about InstAlias records (which are aliases for the assembler) -- those indeed work fine. The issue here was about aliases for the code generator. We use those usually because some instructions need operands on the MC level that we don't want to expose on the MI level. For example, a "return" instruction on SystemZ is actually a "br %r14", but at the MI level this doesn't have any operands, but is simply defined as an alias:

def Return : Alias<2, (outs), (ins), [(z_retflag)]>;

where Alias is formally an Instruction, but doesn't have any information about mnemonic or opcodes. Instead, when this alias is about to be emitted, we emit an actual BR instruction pattern, adding the R14 operand at this point.

This means that to get a complete scheduler model, we have to duplicate the scheduling info for BR also for Return. It would be nice if there were instead a way to say in the definition of Return to just look at BR for scheduling info.

The scheduling class is on the MCInstrDesc. AFAIK, there's no abstract way to tie your aliasing pseudo instruction's MCInstrDesc to their lowered instruction's MCInstrDesc. You can try to factor instruction records in the InstrInfo.td file itself by using a SchedRW field, but that's not the way you've structured things. I think the only reasonable thing to do here is duplicate the InstRW records. You've basically told CodeGen that these are two distinct MC instrs.

In D17260#573190, @atrick wrote:

The scheduling class is on the MCInstrDesc. AFAIK, there's no abstract way to tie your aliasing pseudo instruction's MCInstrDesc to their lowered instruction's MCInstrDesc. You can try to factor instruction records in the InstrInfo.td file itself by using a SchedRW field, but that's not the way you've structured things. I think the only reasonable thing to do here is duplicate the InstRW records. You've basically told CodeGen that these are two distinct MC instrs.

Yes, it look like this is what we'll have to do. But I guess that's fine, we don't have all that many of those aliases anyway.

jonpa updated this revision to Diff 75127.Oct 19 2016, 4:37 AM
jonpa edited edge metadata.
jonpa marked 3 inline comments as done.

Updated:
Asm* instructions now also get a useful schedclass.
getNumDecoderSlots() fixed to handle generic opcodes correctly.
resourcesCost() returns an int, just like groupingCost()
'CompleteModel = 1' removed

What are your thoughts on asserting for target instructions' sched class desc if CompleteModel is true? (see comment below)

jonpa added inline comments.Oct 19 2016, 4:39 AM
lib/Target/SystemZ/SystemZScheduleZ13.td
27

aah, right.

28

I found that my previous statement on the compile time checking for scheduling input was not actually correct - this does not really cover all instructions:

Currently asserts trigger only if

  • All subtargets have omitted the instruction from Schedule .td files. TableGen catches this, and only this because it isn't clever enough to know if a given subtarget (with a missing sched class for an instruction), actually supports that instruction or not.
  • An instruction has a def operand and the sched class does not have a WriteLatency entry for it. This is a specialized assert (in computeOperandLatency()) which doesn't cover all instructions - I could for instance remove the InstRW for a compare and not get any assert triggering.

So there really isn't any general assert that checks that for a subtarget with a CompleteModel, all instructions actually emitted have a sane scheduling class. What happened if I removed the InstRW for an instruction, was that it got its own (valid) schedclass for the subtarget, but with just 0 values.

I think we could catch the error of forgetting a subtarget/instruction sched annotation with an assert in the scheduler that demands at least one uop for any target instruction. This has at least worked well during my experiments previously.

Should I add this just in the SystemZ backend? Or could it be part of the common code somewhere? (I am thinking this should be done both pre-ra and post-ra). Or is there any reason not to demand this that I have missed?

lib/Target/SystemZ/SystemZScheduleZ196.td
68

Good heavens!

uweigand added inline comments.Oct 19 2016, 6:21 AM
lib/Target/SystemZ/SystemZHazardRecognizer.cpp
48

This looks OK to me. However, with this change we should now give a scheduling class to the DirectiveInsn pesudo instruction -- these do emit some instruction, we just don't know which one, so it should probably be modeled as some "generic" instruction.

lib/Target/SystemZ/SystemZHazardRecognizer.h
113

Seems you forgot to update the comment when changing the code :-)

lib/Target/SystemZ/SystemZScheduleZ13.td
28

I don't think doing it in the backend is the right place. In the backend, you only see the instructions that the code being compiled happens to use; and when you do find an error there, there's not much you can do.

The right place does seem to be TableGen. And in fact, I had interpreted the code in CodeGenSchedModels::checkCompleteness to do just that check. If this doesn't work as expected, it probably ought to be fixed there.

But in any case this is a separate problem and shouldn't hold up this patch.

jonpa updated this revision to Diff 75137.Oct 19 2016, 6:42 AM
jonpa marked 2 inline comments as done.

Updated with an empty sched class for Insn.. instructions, so that getNumDecoderSlots() will return 1 and not 0.

jonpa added inline comments.Oct 19 2016, 6:44 AM
lib/Target/SystemZ/SystemZHazardRecognizer.cpp
48

I used an empty InstRW construct, which seems to do the job.

uweigand accepted this revision.Oct 19 2016, 8:22 AM
uweigand edited edge metadata.

OK, this looks good to me now. Thanks!

This revision is now accepted and ready to land.Oct 19 2016, 8:22 AM
jonpa closed this revision.Oct 20 2016, 1:37 AM

Commited as r284704.