Page MenuHomePhabricator

[InstrEmitter, SystemZ] Copy Access registers with the correct register class.
ClosedPublic

Authored by jonpa on Feb 22 2020, 3:40 PM.

Details

Summary

On SystemZ there are a set of "access registers" that can be copied in and out of 32-bit GPRs with special instructions. These instructions can only perform the copy using low 32-bit parts of the 64-bit GPRs. As reported and discussed at https://bugs.llvm.org/show_bug.cgi?id=44254, this is currently broken due to the fact that the default register class for 32-bit integers is GRX32, which also contains the high 32-bit part registers.

I have tried to find a simple way to constrain the register class of such copies (also at -O0), but this turned out to not be quite simple. Just selecting a pseudo instruction with a custom inserter does not seem to work since CopyToReg/CopyFromReg have special handlings in InstrEmitter.

I then tried in SystemZDAGToDAG.cpp to select a CopyToReg to (CopyToReg COPY_TO_REGCLASS), which worked fine. But I could not get the same results with CopyFromReg. (COPY_TO_REGCLASS CopyFromReg) only resulted in a later COPY into GR32, but the COPY from the Access register was still first made to GRX32.

One alternative might be to let InstrEmitter deduce the needed register class for a CopyFromReg if there is only one user which is a COPY_TO_REGCLASS. I thought this seemed a bit messy, so I instead tried adding a new TargetRegisterInfo hook that is used in InstrEmitter when emitting CopyToReg/CopyFromReg nodes.

Does this make sense, or is there a better way?

Diff Detail

Event Timeline

jonpa created this revision.Feb 22 2020, 3:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2020, 3:40 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
jonpa updated this revision to Diff 246305.Feb 24 2020, 2:23 PM

This is an "uglier" handling that does not involve any common code changes. The reason it is not so nice is that the backend needs to transform the SelectionDAG (in Select()) by first recognizing any copy to/from a special set of registers and then in the case of copyFromReg insert a target pseudo opcode just with the purpose of constraining the regclass of the created virtreg.

It would be much simpler if the target could just supply the right regclass in the first place, which was my original suggestion (please see "Diff 1" under the History tab).

hliao removed a reviewer: hliao.Feb 24 2020, 3:05 PM
hliao added a subscriber: hliao.

Removed from reviewer list as the new patch takes all target-specific approach.

jonpa added a comment.Feb 24 2020, 3:09 PM

Removed from reviewer list as the new patch takes all target-specific approach.

Sorry - I meant to post these two approaches side-by-side, and personally I hope that the target-specific approach will not end up being used... So any comments on this would be much appreciated still! Do you think the new TRI seems reasonable?

I'm wondering if this handles all cases ... for CopyFromReg you apparently rely on the logic in EmitCopyFromReg that checks whether the value is used by some MachineNode with constrained regclass. But that logic isn't unconditionally used, e.g. it is skipped for "cloned" SUs ... not sure whether this could cause issues in more complicated scenarios.

Also, I'm not sure if the Glue handling is fully correct: for CopyToReg, you seem to simply drop the glue, which doesn't look right. For CopyFromReg, you keep the glue ... but you also glue the new node in as well, which may not be necessary (and may actually confuse code in EmitNode that scans glue chains?).

That said, if we do need to handle this problem in target code, it might actually be cleaner to just do it as a separate PreReload pass that simply looks at COPY nodes and constrains source/target virtual registers if required. This could be something like the X86 FlagsCopyLowering pass, I guess (but much simpler).

(Talking about flags, it seems that COPY from/to the %cc register would also require the same handling as access registers. Well, I guess we could implement a copy *to* %cc from a high register using TMHH, but we cannot implement a copy from %cc to a high register ...)

Oh, and one more thing: either way, can you please add the original test case from D74601 so we're sure this problem is (and remains) fixed. Thanks!

jonpa added a comment.Feb 25 2020, 8:46 AM

Oh, and one more thing: either way, can you please add the original test case from D74601 so we're sure this problem is (and remains) fixed. Thanks!

I added the two test functions I could find already as @_ZTW1x -> tls-08.ll:@fun0, and @_Z6squareiiiiiii -> tls-09.ll.

Oh, and one more thing: either way, can you please add the original test case from D74601 so we're sure this problem is (and remains) fixed. Thanks!

I added the two test functions I could find already as @_ZTW1x -> tls-08.ll:@fun0, and @_Z6squareiiiiiii -> tls-09.ll.

Ah, I missed that, sorry. That's fine then.

jonpa updated this revision to Diff 246851.Feb 26 2020, 4:04 PM

... That said, if we do need to handle this problem in target code, it might actually be cleaner to just do it as a separate PreReload pass that simply looks at COPY nodes and constrains source/target virtual registers if required. This could be something like the X86 FlagsCopyLowering pass, I guess (but much simpler).

Handling this with a new pre-regalloc pass instead that transforms COPY instructions of special physregs before register allocation. An alternative to this would be to do this in EmitInstrWithCustomInserter(), but currently COPY instructions are not handled there.

  • It seems safest to build the target instructions compared to just constrain the virtual register class of the register of the COPY.
  • I don't think kill flags are useful to manage here, so they are ignored.

(Talking about flags, it seems that COPY from/to the %cc register would also require the same handling as access registers. Well, I guess we could implement a copy *to* %cc from a high register using TMHH, but we cannot implement a copy from %cc to a high register ...)

Handling also COPYs of CC. Copy *to* CC is now done either with TMLH or TMHH (depending on the source reg) in copyPhysReg(). A copy from CC is handled in this new pass instead.

It seems safest to build the target instructions compared to just constrain the virtual register class of the register of the COPY.

I'm not sure I understand this: can you explain what problem you see with constraining the register class?

Also, if you do directly emit the EAR etc., I'm wondering why you still keep the COPY in as well?

It seems safest to build the target instructions compared to just constrain the virtual register class of the register of the COPY.

I'm not sure I understand this: can you explain what problem you see with constraining the register class?

I remember seeing that the register allocator would create a new virtual register and give it the register class based on calling MI->getRegClassConstraint() (or TII->getRegClass() directly). So in theory, it seems that if there is no MCInstrDesc anywhere that demands a particular register class, regalloc might feel free to take the optimal one (GRX32). I am not sure this is needed, but there is no mechanism that I know of that would constrain a *COPY* register regclass, although it may be that a COPY of a physreg into a virtreg is left alone.

Maybe someone could instead confirm that physreg copies do not get their virtreg regclasses changed ever. Maybe that is obvious and I just wasn't aware. Or, if there is no guarantee for this, perhaps a target hook like I suggested (getPhysRegCopyRegClass()) would be a better solution after all, since that would also then be an error if regalloc broke that.

Also, if you do directly emit the EAR etc., I'm wondering why you still keep the COPY in as well?

I figured that the coalescer should typically remove it. And I wasn't sure if there could ever be a problem with any other connected virtregs involved having to use a high-register. In that case there would have to be a copy to/from a GRH32.

uweigand accepted this revision.Mar 2 2020, 7:17 AM

It seems safest to build the target instructions compared to just constrain the virtual register class of the register of the COPY.

I'm not sure I understand this: can you explain what problem you see with constraining the register class?

I remember seeing that the register allocator would create a new virtual register and give it the register class based on calling MI->getRegClassConstraint() (or TII->getRegClass() directly). So in theory, it seems that if there is no MCInstrDesc anywhere that demands a particular register class, regalloc might feel free to take the optimal one (GRX32). I am not sure this is needed, but there is no mechanism that I know of that would constrain a *COPY* register regclass, although it may be that a COPY of a physreg into a virtreg is left alone.

Maybe someone could instead confirm that physreg copies do not get their virtreg regclasses changed ever. Maybe that is obvious and I just wasn't aware. Or, if there is no guarantee for this, perhaps a target hook like I suggested (getPhysRegCopyRegClass()) would be a better solution after all, since that would also then be an error if regalloc broke that.

Hmm, I see. I would expect that the one virtreg that was constrained will certainly keep its register class. However, I guess you're right that regalloc may create *new* virtregs (e.g. due to live range splitting etc.). I'm not sure either how the case where the virtreg is a COPY of a phyregs will be handled in that case. Thanks for pointing this out.

Also, if you do directly emit the EAR etc., I'm wondering why you still keep the COPY in as well?

I figured that the coalescer should typically remove it. And I wasn't sure if there could ever be a problem with any other connected virtregs involved having to use a high-register. In that case there would have to be a copy to/from a GRH32.

OK. Well, it's certainly conservatively correct to do it this way.

Given that the patch as-is looks correct to me, and it does fix an actual bug, I think we should go ahead with it for now. If we find another solution later (e.g. via common code change), we can always take the SystemZ pass out again. LGTM.

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
824

I'm wondering if we need to check (or at least assert) that SrcReg is even a GRX32 here?

This revision is now accepted and ready to land.Mar 2 2020, 7:17 AM
jonpa marked an inline comment as done.Mar 3 2020, 7:45 AM
jonpa added inline comments.
llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
824

The MachineVerifier would catch this with "Illegal physical register for instruction". There are no subregs anymore at this point (after regalloc), so it would be a very strange error. Perhaps an assert that copyPhysReg() is only called with two physreg operands (post-RA) would make sense?

It seems that without the verifier a COPY from $r4d in tls-11.mir does not get caught anywhere.

This revision was automatically updated to reflect the committed changes.