This is an archive of the discontinued LLVM Phabricator instance.

[TableGen][SchedModels] Fix read/write variant substitution #2
ClosedPublic

Authored by evgeny777 on Nov 5 2020, 5:55 AM.

Details

Summary

This is continuation to D89777 and addresses the case when some SchedRW can be partially pushed (i.e some of its read variants intersect with transition vector and some don't). Patch removes write variant for WriteALUsi which causes IIC_WriteALUsi_ReadALUsr to have two read variants (Swift and A57 models), but only one write variant (Swift model).

Diff Detail

Event Timeline

evgeny777 created this revision.Nov 5 2020, 5:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2020, 5:55 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
evgeny777 requested review of this revision.Nov 5 2020, 5:55 AM
evgeny777 updated this revision to Diff 303120.Nov 5 2020, 7:40 AM

Simplified

evgeny777 updated this revision to Diff 303352.Nov 6 2020, 12:50 AM

Eliminated unused variable

I'm not sure if I see how both halves are related here. Is this a problem that you found, so you altered the A57 to show that issue, so the tablegen code fixes the issue?

llvm/lib/Target/ARM/ARMScheduleA57.td
180

This is "Move, shift by immed, no setflags" _and_ "Move, shift by immed, setflags"?
I agree that the predicated pred should not matter, but there probably should be some difference between flag setting and not.

I think the TODO above is referring to A57WriteALUSsr? I'm not sure why A57WriteALUsr is treated the same way though. From what I can see it should be using A57Write_1cyc_1.

Is A57ReadALUsr worth keeping around?

I'm not sure if I see how both halves are related here. Is this a problem that you found, so you altered the A57 to show that issue, so the tablegen code fixes the issue?

Yep, this is a problem I've found after D89777 was landed. Modifications to A57 code are required to test it. D89777 addressed the case when entire SchedRW couldn't be expanded, this one is actually improvement to which also handles case when some of SchedRW aliases are expanded and some are not.

evgeny777 added inline comments.Nov 7 2020, 10:38 AM
llvm/lib/Target/ARM/ARMScheduleA57.td
180

Is A57ReadALUsr worth keeping around?

I don't think it is, however I decided to keep it for now for testing purposes.

I'm not sure why A57WriteALUsr is treated the same way though. From what I can see it should be using A57Write_1cyc_1.

Why? From opt guide:

ALU, shift	by register,	
unconditional                          (same as above) 2 1 M
ALU, shift by register,	
conditional                            (same as above) 2 1 I0/I1
dmgreen added inline comments.Nov 9 2020, 1:30 AM
llvm/lib/Target/ARM/ARMScheduleA57.td
180

Hmm. Which opt guide is that from? I seem to see:

Move, shift by immed, no setflags    1I
Move, shift by immed, setflags    2M
Move, shift by register, no setflags, unconditional    1I
Move, shift by register, no setflags, conditional    2I
Move, shift by register, setflags, unconditional    2M
Move, shift by register, setflags, conditional    2I

So the first 2 are currently in WriteALUsi (which should probably be split up to get it correct, but that is a separate issue. A better default is probably A57Write_1cyc_1I).
A57WriteALUSsr is the last 2 and fits the opt guide correctly at least.
A57WriteALUsr is the middle two, but should probably be using Pred:A57Write_2cyc_1I and NoPred: A57Write_1cyc_1I.

evgeny777 added inline comments.Nov 9 2020, 4:25 AM
llvm/lib/Target/ARM/ARMScheduleA57.td
180

Hmm. Which opt guide is that from?

I think we're both looking at the same one. I've copy-pasted from section 3.3

I seem to see:

Right, but Move, shift by register (MOVsr) is bound to WriteALU (ARM version) and thumb version (t2MOVsr) is unbound. The following commands are currently bound to WriteALUsr:

ADCrsr ADDrsr ANDrsr BICrsr EORrsr ORRrsr RSBrsr RSCrsr SBCrsr SUBrsr SXTAB SXTAB16 SXTAH UXTAB UXTAB16 UXTAH

It seems ARM/Thumb instruction definition is incomplete and broken in many ways when it comes to scheduling. Patch however is more about simplifying adding new models, not fixing existing ones.

Hello. Sorry for the delay. I don't know this code super well and was trying to figure out what the different parts were doing. I see that the produced output has code like this:

case 703: // t2ADDSrs
  if (SchedModel->getProcessorID() == 4) { // CortexA57Model
    if ((TII->isSwiftFastImmShift(MI))
        && TII->isPredicated(*MI))
      return 1066; // _ReadDefault
  }

The isSwiftFastImmShift is odd, when paired with a CortexA57Model, and comes from using StartIdx's PredTerm in TransVec.emplace_back(TransVec[StartIdx].PredTerm) I think. The _ReadDefault as the schedule class also sounds odd to me, as opposed to something that would use Writes too.

llvm/lib/Target/ARM/ARMScheduleA57.td
180

Ah. I was thinking more about shifts than arithmetic operations. In that case, A57Write_2cyc_1M is probably a better default than A57Write_1cyc_1I.

llvm/utils/TableGen/CodeGenSchedule.cpp
1358

Does this need to use a densemap? It seems to be being used to check whether the TransVariant have already been handled. Can it use a set or something simpler for that?

evgeny777 added inline comments.Nov 23 2020, 9:49 AM
llvm/utils/TableGen/CodeGenSchedule.cpp
1358

Unfortunately it can't, because we not only need to check for same record definition, but also for same processor index (it's a bug in current implementation to ignore this). This is because same variant record may be shared between different processor models (like ReadAdrBase in ThunderX2T99 and ThunderX3T110)

@dmgreen Nice find again, thanks. Somehow I missed that. I've fixed the issue and checked ARM and AArch64 targets - looks good so far.

evgeny777 planned changes to this revision.Nov 24 2020, 7:13 AM

Needs to be rebased

evgeny777 updated this revision to Diff 308072.Nov 27 2020, 9:00 AM

@dmgreen After some studying I came up with different approach: if we get rid of artifical 'AnyCPU' (zero) processor index and explicitly create one PredTransition per processor, we can use much simpler algorithm for variant expansion. Besides fixing read variant compilation issues this also fixes few other things:

  • forwarding for sched classes with write variant (see test case)
  • per-operand latency calculation when sched class (w/o InstRWs) has write variant for one operand and plain write for other operand (seems to be not present anywhere now, but may be used in future), e.g
SchedAlias<WriteMAC64Lo, WriteMAC64LoVariant>;
SchedAlias<WriteMAC64Hi, Write_1cyc_ALU>;

Otherwise changes to ARM/AArch64/X86 subtargets look reasonable

evgeny777 added a comment.EditedNov 27 2020, 9:36 AM

Patch also lowers time of AArch64 subtarget generation from ~13s to ~10s in Debug build and from ~6s to 2.5s in Release build on my PC

dmgreen accepted this revision.Nov 30 2020, 12:33 AM

LGTM. This does seem simpler, Thanks

This revision is now accepted and ready to land.Nov 30 2020, 12:33 AM

^ Oh that's a shame. Any idea what was causing the output to be different? This fixed a problem we were seeing (and are now seeing again) in one of our downstream schedules.

^ Oh that's a shame. Any idea what was causing the output to be different? This fixed a problem we were seeing (and are now seeing again) in one of our downstream schedules.

It's most likely this beast in CodeGenSchedule.h

using ProcModelMapTy = DenseMap<Record*, unsigned>;

I'll take a look when I have time