This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Reimplent SchedModel IssueWidth and WriteRes/ReadAdvance mappings to operands
ClosedPublic

Authored by jonpa on May 17 2018, 2:40 AM.

Details

Summary

As a consequence of recent discussions (http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html) this patch changes the SystemZ SchedModels so that the IssueWidth is 6, which is the decoder capacity, and NumMicroOps become the number of decoder slots needed per instruction (Note: The first diff is just for z13 and z14. z196 and zEC12 will be updated also after all seems good).

Making sure that SchedWrite latencies match the MachineInstructions def-operand indexes has also been addressed. Basically, I have added a WLat for each def-operand with the instruction latency.

ReadAdvances have as well been added on instructions with one register operand and one memory operand. The register operand is then needed later than the address registers, since the load from memory must be done first. Since all these instructions always have the register operand before the address operands, it is enough to simply insert one ReadAdvance in the list.

I have used an approach where I try to separate the different concerns (latencies, functional units and micro-ops/grouping). All instructions are one micro-op, except in cases with special grouping rules, so micro-ops and grouping belongs together. The InstrRW lists are patterned like:

[Def-operand latencies, use operand read advances, FUs, GroupingRule]

For example, Insert-Character has one register (first use operand) and one memory operand, produces its result after 1 cycle, uses the FXa and LSU units, and groups normally:

def : InstRW<[WLat1LSU, RegReadAdv, FXa, LSU, NormalGr], (instregex "IC(Y)?$")>;

I think this looks nice and simple. Does this look sound?

Some minor questions:

  • When duplicating the latency for the CC operand, I have simply inserted another WLat entry. Is it possible and better to instead use some kind of list like WLatCC, to just expose one entry in the InstRW?
  • I tried making loops for e.g.
def : WriteRes<LSU,     [Z13_LSUnit]>;
def : WriteRes<LSU2,    [Z13_LSUnit]> { let ResourceCycles = [2]; }
def : WriteRes<LSU3,    [Z13_LSUnit]> { let ResourceCycles = [3]; }
def : WriteRes<LSU4,    [Z13_LSUnit]> { let ResourceCycles = [4]; }
def : WriteRes<LSU5,    [Z13_LSUnit]> { let ResourceCycles = [5]; }

, but did not find a way to do this (I think that it didn't work to express a LSU#I SchedWrite). I guess the current above will have to do?

  • I would like to have a constant for the LSULatency a value to be used also by RegReadAdv. Right now I have it hard coded to '4' in both places. How is this done in TableGen?

Diff Detail

Event Timeline

jonpa created this revision.May 17 2018, 2:40 AM
jonpa updated this revision to Diff 148812.May 28 2018, 7:05 AM

Patch updated per off-line review.

jonpa updated this revision to Diff 150085.Jun 6 2018, 1:24 AM

ZEC12 and Z196.

jonpa updated this revision to Diff 150288.Jun 7 2018, 3:25 AM

Some SystemZ tests needed updating. Some were due to minor differences in scheduling, but more notably:

  • int-uadd-08.ll (and int-uadd-09, int-usub-08, int-usub-09):

It seems that If Converter now makes a different decision which seems equivalent to me:

# *** IR Dump After SystemZ pseudo in   # *** IR Dump After SystemZ pseudo in
# Machine code for function f7: NoPHI   # Machine code for function f7: NoPHI
Function Live Ins: $r3l, $r4d           Function Live Ins: $r3l, $r4d

bb.0 (%ir-block.0):                     bb.0 (%ir-block.0):
  successors: %bb.2(0x40000000), %bb.     successors: %bb.2(0x40000000), %bb.
  liveins: $r4d, $r3l                     liveins: $r4d, $r3l
  renamable $r0l = ALHSIK killed rena     renamable $r0l = ALHSIK killed rena
  ST killed renamable $r0l, killed re     ST killed renamable $r0l, killed re
  BRC 15, 3, %bb.2, implicit $cc          BRC 15, 3, %bb.2, implicit $cc

bb.1.exit:                              bb.1.exit:
; predecessors: %bb.0                   ; predecessors: %bb.0

  Return                                  Return

bb.2.call:                              bb.2.call:
; predecessors: %bb.0                   ; predecessors: %bb.0

  CallJG @foo, <regmask $f8d $f9d $f1     CallJG @foo, <regmask $f8d $f9d $f1

# End machine code for function f7.     # End machine code for function f7.

Ifcvt: function (6) 'f7'                Ifcvt: function (6) 'f7'
Ifcvt (Simple false): %bb.0 (1) succe | Ifcvt (Simple): %bb.0 (2) succeeded!

# *** IR Dump After If Converter ***:   # *** IR Dump After If Converter ***:
# Machine code for function f7: NoPHI   # Machine code for function f7: NoPHI
Function Live Ins: $r3l, $r4d           Function Live Ins: $r3l, $r4d

bb.0 (%ir-block.0):                     bb.0 (%ir-block.0):
  successors: %bb.1(0x40000000); %bb.     successors: %bb.1(0x40000000); %bb.
  liveins: $r4d, $r3l                     liveins: $r4d, $r3l
  renamable $r0l = ALHSIK killed rena     renamable $r0l = ALHSIK killed rena
  ST killed renamable $r0l, killed re     ST killed renamable $r0l, killed re
  CondReturn 15, 12, implicit killed  |   CallBRCL 15, 3, @foo, <regmask $f8d

bb.1.call:                            | bb.1.exit:
; predecessors: %bb.0                   ; predecessors: %bb.0

  CallJG @foo, <regmask $f8d $f9d $f1 |   Return

# End machine code for function f7.     # End machine code for function f7.
  • store_nonbytesized_vecs.ll: rewritten with REGs and -DAG checks (NFC intended).

but did not find a way to do this (I think that it didn't work to express a LSU#I SchedWrite). I guess the current above will have to do?

This will work.

foreach L = 1-5 in {

def : WriteRes<!cast<SchedWrite>("LSU"#L), [Z13_LSUnit]> { let ResourceCycles = [L]; }

}

But you have LSU, LSU2, LSU3, ... . The first one would have to change to LSU1.

javed.absar added inline comments.Jun 7 2018, 9:00 AM
lib/Target/SystemZ/SystemZSchedule.td
60

You could rewrite it more compactly as :

foreach FPUSuffix = ["a2", "a3", "a4"] in {

def FX#FPUSuffix : SchedWrite;

}

uweigand accepted this revision.Jun 25 2018, 2:43 AM

This version looks good to me, and we're also seeing good performance results.

As suggested by Javed, you may want to use foreach loops to remove some of the duplication.

But in either case, LGTM.

This revision is now accepted and ready to land.Jun 25 2018, 2:43 AM
jonpa closed this revision.Jul 20 2018, 6:59 AM
jonpa marked an inline comment as done.

Thanks for the review on the functional changes, which were committed as r337538.

I have made a new patch for the NFC changes involving tablegen loops (thanks Javed) here: https://reviews.llvm.org/D49598