This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Improve foldMemoryOperandImpl: MS(G)RKC -> MS(G)C
ClosedPublic

Authored by jonpa on Mar 25 2020, 6:52 AM.

Details

Summary

(MS(G)RKC part of original patch)

Diff Detail

Event Timeline

jonpa created this revision.Mar 25 2020, 6:52 AM
uweigand added inline comments.Mar 25 2020, 7:31 AM
llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1201

Is this necessary for correctness, or just for performance tuning?

I thought the point of the MemFold pseudos was that they actually are three-operand instructions; and if they're used with non-tied operands, that gets fixed up in a later pass.

So I guess it would be *correct* to use them even for non-tied operands, right?

jonpa marked an inline comment as done.Mar 25 2020, 8:40 AM
jonpa added inline comments.
llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1201

Yeah, so far we have aimed to only fold instructions when we see that at least at this point the registers match. And in the case when that changes (register reassignment), SystemZPostRewrite will insert a COPY so that the 2-address reg/mem instruction can be used. This is a performance tuning consideration, I think.

So yes, I it should still work to fold into a MemFold pseudo even if the registers are different, but I haven't tried it.

The current (trunk) check with getTwoOperandOpcode() works as to find a 3-address instruction with a mapped FoldMem pseudo, an they have non-tied operands, so that is certainly correct.

jonpa updated this revision to Diff 252790.Mar 26 2020, 4:00 AM

Simplify the check to simply check for instructions with three explicit operands that are not tied (and fix the check for TIED_TO).

Moving the check for VRM up a bit in the function makes the code cleaner and is NFC (currently).

jonpa updated this revision to Diff 252796.Mar 26 2020, 4:09 AM

Fix a syntax error that slipped into last diff.

I'm not sure I understand those latest changes. You seem to no longer check at all whether the target opcode actually requires tied operands, you just always tie them?

jonpa added a comment.Mar 26 2020, 5:18 AM

I'm not sure I understand those latest changes. You seem to no longer check at all whether the target opcode actually requires tied operands, you just always tie them?

The current check 'if (SystemZ::getTwoOperandOpcode(Opcode) != -1)' finds instructions that are three address that have a two-address equivalent opcode, e.g. AGRK -> AGR. I was thinking that the check 'if (NumOps == 3 && MI.getDesc().getOperandConstraint(1, MCOI::TIED_TO) == -1)' should be exactly equivalent to that and would also include the MS(G)RKC opcodes.

I suppose all these instructions should have a MemFoldPseudo, so checking for a target memory instruction to detect this case should also be equivalent.

I am also thinking that there really isn't any reg/memory instructions that are not tied (?), so this can simply be assumed to be the case and checking for that doesn't make much sense.

This is just a heuristic to not use MemFoldPseudos unless it is fairly certain that they do not need a COPY to match the target memory instruction.

This change is NFC on benchmarks.

The more I think about it, the more it seems that the original check has always been somewhat questionable.

What is it that we really need to check here? The main correctness question is, whether the resulting MI would be correct, i.e. have the right operand types. In general, for all the possible MemOpcode values, the memory operand must come last, so it is only valid to replace the final register operand with a memory operand. The exception for this is if the original instruction is commutable, in which case we may also first commute the (commutable pair of) operands and then replace the (now) final register operand with a memory operand. All of this has nothing to do with getTwoOperandOpcode, with existing physreg assignments or anything else. We do this commutability check for compares already, and we can do it just the same for other MemOperands.

And if this is all we do, I believe the code would still be correct.

Now, everything else seems to be just a performance tuning question. And even here I'm not sure whether this is really doing us any good. We originally have a three-way opcode like

dst = ARK src1, src2

where src2 now lives in a spill slot (assuming the non-commutating case for simplicity). If dst and src1 are mapped to the same register, this is transferred to A (using the MemFold pseudo for A as an intermediate step). Now, if dst and src1 are *not* mapped to the same register, we currently do *not* perform the transformation, so we get

tmp = L spill-src2
dst  = ARK src1, tmp

However, if we simply did perform the transformation, we'd instead get

dst = COPY src1
dst = A dst, spill-src2

Now, it is a bit unclear which version is preferable. The first version may have one fewer microops, but it uses one more register (which may be an issue given that we're already spilling). So this would be worthwhile to check.

However, if we actually want to avoid the COPY and keep the existing code, then I believe the correct check is simply whether or not MemOperand is a MemFold pseudo. This is the only case where this can ever be a concern. (Note that getMemoryOpcode actually preserves the state of whether or not the destination is tied; I had been confused about that above.) So going back to your previous version of the check and simply only checking for MemFold pseudos should do what we want here.

jonpa updated this revision to Diff 253637.Mar 30 2020, 10:56 AM

Now, it is a bit unclear which version is preferable. The first version may have one fewer microops, but it uses one more register (which may be an issue given that we're already spilling). So this would be worthwhile to check.

I made a build of benchmarks (on top of these recent improvements) where the known allocations were ignored, which caused a lot more folds with the needed register moves:

lg             :              1090487              1081897    -8590
lgr            :               874009               882174    +8165
ag             :                37427                44978    +7551
agr            :               101720                94572    -7148
l              :               229926               225837    -4089
lr             :                56066                59363    +3297
a              :                59296                61866    +2570
ar             :                42553                40021    -2532
...

Over a nightly run, it looked to be just slightly in favor to *not* do this, in other words it seemed better to avoid the many register moves. But it was a close race...

However, if we actually want to avoid the COPY and keep the existing code, then I believe the correct check is simply whether or not MemOperand is a MemFold pseudo. This is the only case where this can ever be a concern. (Note that getMemoryOpcode actually preserves the state of whether or not the destination is tied; I had been confused about that above.) So going back to your previous version of the check and simply only checking for MemFold pseudos should do what we want here.

I agree this makes most sense and patch is updated.

Thanks for running the benchmark, I guess I'm OK with the current implementation then.

Just one minor inline comment, otherwise this now looks good.

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1179

Does it still make sense to move the VRM check up here? I guess this could now cancel the transformation in some cases where it isn't necessary because we don't even need VRM below ...

1219

The one thing I'm wondering about: are there any cases where we don't have a MemFold pseudo, but it still would be beneficial to commute the operands? I guess not, given that "real" memory operations are never three-operand ...

jonpa marked 2 inline comments as done.Mar 31 2020, 4:37 AM
jonpa added inline comments.
llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1179

I do believe that it is theoretically better to have the VRM check further down as before, but that it is cluttering the code. It never happens (at least not on a 2017 SPEC build) that this is reached with a null VRM, but of course that may change in the future.

1219

I checked this by building 2017, and it turns out that there are actually some cases of LOCGR/LOCRMux that show up here as commutable without a MemFold pseudo (select instructions are mapped to the LOC MemFold pseudos, but LOC instructions map directly to the target memory instruction).

However, when they show up here (during regalloc), op0 and op1 are tied and have the same virtual register, so since we are only handling cases of exactly one operand, I don't think these instructions could have anything but op2 to fold.

uweigand added inline comments.Mar 31 2020, 5:38 AM
llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1179

Given that the check is already in place, I don't see a reason to move it up -- just leave the code as it was ...

1219

OK, that looks fine then. Thanks for checking!

jonpa updated this revision to Diff 253861.Mar 31 2020, 6:20 AM

Patch updated per review.

uweigand accepted this revision.Mar 31 2020, 7:05 AM

LGTM, thanks!

This revision is now accepted and ready to land.Mar 31 2020, 7:05 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2020, 8:50 AM