This is continuation to D89777 and addresses the case when some SchedRW can be partially pushed (i.e some of its read variants intersect with transition vector and some don't). Patch removes write variant for WriteALUsi which causes IIC_WriteALUsi_ReadALUsr to have two read variants (Swift and A57 models), but only one write variant (Swift model).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'm not sure if I see how both halves are related here. Is this a problem that you found, so you altered the A57 to show that issue, so the tablegen code fixes the issue?
llvm/lib/Target/ARM/ARMScheduleA57.td | ||
---|---|---|
186 | This is "Move, shift by immed, no setflags" _and_ "Move, shift by immed, setflags"? I think the TODO above is referring to A57WriteALUSsr? I'm not sure why A57WriteALUsr is treated the same way though. From what I can see it should be using A57Write_1cyc_1. Is A57ReadALUsr worth keeping around? |
I'm not sure if I see how both halves are related here. Is this a problem that you found, so you altered the A57 to show that issue, so the tablegen code fixes the issue?
Yep, this is a problem I've found after D89777 was landed. Modifications to A57 code are required to test it. D89777 addressed the case when entire SchedRW couldn't be expanded, this one is actually improvement to which also handles case when some of SchedRW aliases are expanded and some are not.
llvm/lib/Target/ARM/ARMScheduleA57.td | ||
---|---|---|
186 |
I don't think it is, however I decided to keep it for now for testing purposes.
Why? From opt guide: ALU, shift by register, unconditional (same as above) 2 1 M ALU, shift by register, conditional (same as above) 2 1 I0/I1 |
llvm/lib/Target/ARM/ARMScheduleA57.td | ||
---|---|---|
186 | Hmm. Which opt guide is that from? I seem to see: Move, shift by immed, no setflags 1I Move, shift by immed, setflags 2M Move, shift by register, no setflags, unconditional 1I Move, shift by register, no setflags, conditional 2I Move, shift by register, setflags, unconditional 2M Move, shift by register, setflags, conditional 2I So the first 2 are currently in WriteALUsi (which should probably be split up to get it correct, but that is a separate issue. A better default is probably A57Write_1cyc_1I). |
llvm/lib/Target/ARM/ARMScheduleA57.td | ||
---|---|---|
186 |
I think we're both looking at the same one. I've copy-pasted from section 3.3
Right, but Move, shift by register (MOVsr) is bound to WriteALU (ARM version) and thumb version (t2MOVsr) is unbound. The following commands are currently bound to WriteALUsr: ADCrsr ADDrsr ANDrsr BICrsr EORrsr ORRrsr RSBrsr RSCrsr SBCrsr SUBrsr SXTAB SXTAB16 SXTAH UXTAB UXTAB16 UXTAH It seems ARM/Thumb instruction definition is incomplete and broken in many ways when it comes to scheduling. Patch however is more about simplifying adding new models, not fixing existing ones. |
Hello. Sorry for the delay. I don't know this code super well and was trying to figure out what the different parts were doing. I see that the produced output has code like this:
case 703: // t2ADDSrs if (SchedModel->getProcessorID() == 4) { // CortexA57Model if ((TII->isSwiftFastImmShift(MI)) && TII->isPredicated(*MI)) return 1066; // _ReadDefault }
The isSwiftFastImmShift is odd, when paired with a CortexA57Model, and comes from using StartIdx's PredTerm in TransVec.emplace_back(TransVec[StartIdx].PredTerm) I think. The _ReadDefault as the schedule class also sounds odd to me, as opposed to something that would use Writes too.
llvm/lib/Target/ARM/ARMScheduleA57.td | ||
---|---|---|
186 | Ah. I was thinking more about shifts than arithmetic operations. In that case, A57Write_2cyc_1M is probably a better default than A57Write_1cyc_1I. | |
llvm/utils/TableGen/CodeGenSchedule.cpp | ||
1360 | Does this need to use a densemap? It seems to be being used to check whether the TransVariant have already been handled. Can it use a set or something simpler for that? |
llvm/utils/TableGen/CodeGenSchedule.cpp | ||
---|---|---|
1360 | Unfortunately it can't, because we not only need to check for same record definition, but also for same processor index (it's a bug in current implementation to ignore this). This is because same variant record may be shared between different processor models (like ReadAdrBase in ThunderX2T99 and ThunderX3T110) |
@dmgreen Nice find again, thanks. Somehow I missed that. I've fixed the issue and checked ARM and AArch64 targets - looks good so far.
@dmgreen After some studying I came up with different approach: if we get rid of artifical 'AnyCPU' (zero) processor index and explicitly create one PredTransition per processor, we can use much simpler algorithm for variant expansion. Besides fixing read variant compilation issues this also fixes few other things:
- forwarding for sched classes with write variant (see test case)
- per-operand latency calculation when sched class (w/o InstRWs) has write variant for one operand and plain write for other operand (seems to be not present anywhere now, but may be used in future), e.g
SchedAlias<WriteMAC64Lo, WriteMAC64LoVariant>; SchedAlias<WriteMAC64Hi, Write_1cyc_ALU>;
Otherwise changes to ARM/AArch64/X86 subtargets look reasonable
Patch also lowers time of AArch64 subtarget generation from ~13s to ~10s in Debug build and from ~6s to 2.5s in Release build on my PC
^ Oh that's a shame. Any idea what was causing the output to be different? This fixed a problem we were seeing (and are now seeing again) in one of our downstream schedules.
It's most likely this beast in CodeGenSchedule.h
using ProcModelMapTy = DenseMap<Record*, unsigned>;
I'll take a look when I have time
This is "Move, shift by immed, no setflags" _and_ "Move, shift by immed, setflags"?
I agree that the predicated pred should not matter, but there probably should be some difference between flag setting and not.
I think the TODO above is referring to A57WriteALUSsr? I'm not sure why A57WriteALUsr is treated the same way though. From what I can see it should be using A57Write_1cyc_1.
Is A57ReadALUsr worth keeping around?