Page MenuHomePhabricator

[SystemZ] Support i128 inline asm operands
ClosedPublic

Authored by jonpa on Apr 19 2021, 1:05 PM.

Details

Summary

As it seems difficult to mix in only some cases of legal i128 (for instance adding i128 as a type of GR128 immediately causes things to not even build), I did some experiments on how to handle this without doing that. This was (as expected) not as simple as one might have hoped - here is what I did so far to make the test case compile:

  • Use a new SystemZISD node "SUBREG128" which is used when splitting/joining i128 inline asm values. Implement splitValueIntoRegisterParts() and joinRegisterPartsIntoValue() in SystemZTargetLowering using this new node to keep track of a GR64 reg and a hi64/low64 subreg index. This unfortunately also gets called for other things than inline-assembly, but it seems to work at least during experiments to check for an existing CallingConvention, but I am not really sure that is correct...
  • Use a new psuedo instruction (also called SUBREG128) that lowers the SUBREG128 ISD node to an MI that uses a custom inserter hook. Here the INLINEASM instruction is found and replaced by a new one with GR128 operands (INLINEASM is a generic opcode without custom insertion).
  • Return GR64 for i128 in SystemZTargetLowering::getRegForInlineAsmConstraint(). Currently GR128 is returned, but that is not really working either as two GR128 vregs were created. The i128 case is identified by an inline asm operand of GR64 that uses two (GR64) registers. This seems to be the only case where a register constraint could produce this, I think.
  • Build a new INLINEASM with altered Flag operands and GR128 (tied) operands. This is a little tricky given the tied operands, updating of Flag operands, and making sure that the SUBREG128 instructions are there as expected...

I tried leaving the INLINEASM as it was all the way until the AsmPrinter, but that didn't seem workable with the tied operands as they were then two pairs of tied GR64 registers that didn't belong in a GR128 pair during regalloc...

The approach is sort of simple but yet quite intricate to implement. There are probably things that could be simplified by some change in common-code...

Diff Detail

Event Timeline

jonpa created this revision.Apr 19 2021, 1:05 PM
jonpa requested review of this revision.Apr 19 2021, 1:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2021, 1:05 PM
jonpa updated this revision to Diff 339177.EditedApr 21 2021, 4:19 AM

The idea per the initial patch to follow the general pattern of splitting the illegal i128 during isel and then do a fixup later was proven very cumbersome. I instead tried keeping GR128 as the regclass for the inline-asm operands which was actually simpler, even still a "ugly hack" since it had 2xGR128 for each i128 operands. Even better is to fix that problem so that SelectionDAGBuilder only builds one register operand to begin with, which is what the patch now does.

These are the needed common-code changes:

  • New method: TargetLowering::getNumRegistersForInlineAsmOp(). Allow target to override default number of parts for inline-asm operands.
  • splitValueIntoRegisterParts() / joinRegisterPartsIntoValue(): Add a new argument to pass the Value* of the CallInst in order to identify the case of an InlineAsm instruction.
  • SelectionDAGBuilder.cpp:GetRegistersForValue(): Don't do the RegClass/type fixup for MVT::Untyped.
  • SelectionDAGBuilder::visitInlineAsm(): Get the RegClass from the constraint code for MVT::Untyped.
jonpa updated this revision to Diff 340855.Apr 27 2021, 8:25 AM

Patch updated (NFC).

  • No need to pass the original Value to splitValueIntoRegisterParts()/joinRegisterPartsIntoValue() to detect the inline-asm case. However, this is still reached for normal calls where the i128 argument has been expanded to 2 x i64 parts, so need to check NumParts argument and do nothing in those cases (for example test/CodeGen/SystemZ/args-09.ll).
  • I tried to instead of adding a new hook "getNumRegistersForInlineAsmOp()" (per previous version of patch), pass RC to getNumRegisters() as an optional argument. This can instead be used by target to do an override. I am not quite sure this is preferred: in contrast to calling a separate new hook that puts this in a clear context, this is made general and it requires a (tiny) bit more work in SelectionDAGBuilder of passing the RegisterClass along. Perhaps an alternative would be to return this from getRegForInlineAsmConstraint() and that way have the assigned register, the register class and numparts declared there in one place..?
  • I tried to reuse the register class for a tied operand in visitInlineAsm(), but that turned out to fail in cases where the def is a specific physreg, in which case there is no regclass specified...
uweigand added inline comments.May 5 2021, 3:17 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8626

Instead of recomputing the RC here, can't we get it from the flag word of the matching operand, e.g. using hasRegClassConstraint(OpFlag, RCID)? This would match the way we get the number of registers from the flag word instead of recomputing it as well.

jonpa marked an inline comment as done.May 5 2021, 4:16 AM
jonpa added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8626

No - the matching operand may be a specific physreg in which case there is no register class specified (e.g. SystemZ/asm-17.ll).

uweigand added inline comments.May 5 2021, 4:35 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8626

Hmm, maybe I'm confused but I thought in those cases, we'd still get the RC for that physreg installed into the flags. Isnt't that exactly what the code in AddInlineAsmOperands does:

const TargetRegisterClass *RC = MRI.getRegClass(Regs.front());
Flag = InlineAsm::getFlagWordForRegClass(Flag, RC->getID());

(Regs.front() should have been set up to the physreg if there's one.)

jonpa marked 2 inline comments as done.May 5 2021, 5:23 AM
jonpa added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8626

That is guarded with a check to only do that for virtual registers...

uweigand added inline comments.May 11 2021, 2:05 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8626

Doh, of course, sorry for missing that.

Still, I think calling getRegForInlineAsmConstraint again here is redundant. The routine was already called in GetRegistersForValue, resulting in either a virtual register, in which case the RC is stored in the Flags, or else a physical register which is guaranteed to be a member of RC otherwise GetRegistersForValue will fail an assertion.

So we should check the register that was created for the matching index (which is in
AsmNodeOperands[CurOp+1] at this point); if it is a virtual register, get the RC from the flags, and if it is a physical register, use the RC this physreg is a member of. Does that make sense?

jonpa marked an inline comment as done.May 12 2021, 4:48 AM
jonpa added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8626

Ah, I was not quite aware of the fact that the phys-reg returned would also have a matching RC returned.

I also then think it would be best to call getRegForInlineAsmConstraint() once so I tried to pass an RC argument to it by reference so that the RC returned could be used in all the following places. The problem with this is however that targets fall back to the default TargetLowering::getRegForInlineAsmConstraint() which may return an unallocatable register class. I then tried demanding an allocatable register class in that method, but then physregs like CC become a problem since they don't have that. So to do this it seems then all targets must take care to return the right RC if there is not an allocatable one for a physreg.

What's more is that this wasn't actually working for an operand where the reg is a specified GR128 phys-reg. AddInlineAsmOperands will then call getNumRegisters() without an RC (and return 2 for i128). I tried returning just '1' here for a phys-reg, but that failed on other targets. Even so, InstrEmitter then fails to find the RC for the defined register, so SystemZTargetLowering::getRegClass() would have to return GR128 for MVT::Untyped...

I wonder:

  • Do we want to also handle the explicit phys reg operands (at this point)?
  • It is quite messy to first call getRegForInlineAsmConstraint() and then recompute the RC after that fact. Would it make sense to try to make TargetLowering::getRegForInlineAsmConstraint() return an allocatable RC and fix target cases of unallocatable physregs that show up in tests, and then just reuse that RC in throughout visitInlineAsm()? Alternatively, try first to return an allocatable RC and if there is none, return an unallocatable one.

If we are just handling virtual registers, maybe we could just get the RC from the flags per your suggestion...

jonpa updated this revision to Diff 346211.May 18 2021, 10:34 AM

So we should check the register that was created for the matching index (which is inAsmNodeOperands[CurOp+1] at this point); if it is a virtual register, get the RC from the flags, and if it is a physical register, use the RC this physreg is a member of. Does that make sense?

Yes... I however found that we have to implement getRegClassFor(MVT::Untyped) to make inline assembly phys-regs work in InstrEmitter::EmitCopyFromReg(). If we do this we can use this also in SelectionDAGBuilder instead of the above. This seems to work as long as we only have one untyped regclass even though it is slightly confusing to not use the regclass returned by getRegForInlineAsmConstraint().

So we should check the register that was created for the matching index (which is inAsmNodeOperands[CurOp+1] at this point); if it is a virtual register, get the RC from the flags, and if it is a physical register, use the RC this physreg is a member of. Does that make sense?

Yes... I however found that we have to implement getRegClassFor(MVT::Untyped) to make inline assembly phys-regs work in InstrEmitter::EmitCopyFromReg(). If we do this we can use this also in SelectionDAGBuilder instead of the above. This seems to work as long as we only have one untyped regclass even though it is slightly confusing to not use the regclass returned by getRegForInlineAsmConstraint().

Can you elaborate? I thought InstrEmitter::EmitCopyFromReg() was supposed to also work with types that are not legal. It uses

SrcRC = TRI->getMinimalPhysRegClass(SrcReg, VT);

to get the RC for use with a physreg ...

I don't really like implementing getRegClassFor(MVT::Untyped) ... that seems a bit underspecified.

As I said, I still think the best option would be to remove the getRegClassFor call in the tied register case in visitInlineAsm, and instead retrieve the RC either from the OpFlags (if virtual reg) or via getMinimalPhysRegClass (if phys reg).

jonpa updated this revision to Diff 346554.May 19 2021, 1:52 PM

Patch updated per review.

Can you elaborate? I thought InstrEmitter::EmitCopyFromReg() was supposed to also work with types that are not legal. It uses
SrcRC = TRI->getMinimalPhysRegClass(SrcReg, VT);
to get the RC for use with a physreg ...

It does that for SrcRC, but for DstRC it calls getRegClassFor(). Another way of solving this might be to check if the type is legal and if not then take SrcRC (see patch).

Patch updated per review.

Can you elaborate? I thought InstrEmitter::EmitCopyFromReg() was supposed to also work with types that are not legal. It uses
SrcRC = TRI->getMinimalPhysRegClass(SrcReg, VT);
to get the RC for use with a physreg ...

It does that for SrcRC, but for DstRC it calls getRegClassFor(). Another way of solving this might be to check if the type is legal and if not then take SrcRC (see patch).

Interesting. Note that for legal type, UseRC is always non-null here due to the lines above:

// Stick to the preferred register classes for legal types.
if (TLI->isTypeLegal(VT))
  UseRC = TLI->getRegClassFor(VT, Node->isDivergent());

so we really should have never arrived at the (second) getRegClassFor. Can you verify if that ever happens?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
954

Moving Regs.front() out of the Regs.empty() check is probably not a good idea.

jonpa updated this revision to Diff 346578.May 19 2021, 3:01 PM
jonpa marked an inline comment as done.

...so we really should have never arrived at the (second) getRegClassFor. Can you verify if that ever happens?

Indeed, that seems to be the case - removed.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
954

oops

This already looks a lot simpler. I'm now wondering if it would be even better to change the getNumRegisters interface from (ValueVT, RC) to (ValueVT, RegisterVT). (Or I guess an Optional<RegisterVT> if we don't want to change all callers.) All callers of getNumRegisters already have the RegisterVT immediately available, and it allows the back end to make any distinction it needs just as well as a RC would.

jonpa updated this revision to Diff 347022.May 21 2021, 7:22 AM

This already looks a lot simpler. I'm now wondering if it would be even better to change the getNumRegisters interface from (ValueVT, RC) to (ValueVT, RegisterVT). (Or I guess an Optional<RegisterVT> if we don't want to change all callers.) All callers of getNumRegisters already have the RegisterVT immediately available, and it allows the back end to make any distinction it needs just as well as a RC would.

Yes, even better! :-)

Now it's looking mostly good to me; a couple of (mainly cosmetic) remaining comments inline.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8624–8625

Maybe avoid using AsmNodeOperands[CurOp+1] twice by taking the type of the TiedReg below?

8625–8626

The wording of the error message now no longer makes sense. In fact, I think with the new code it no longer can even happen for RC to be null, so I guess the whole error can be removed.

llvm/lib/Target/SystemZ/SystemZISelLowering.h
428

A brief comment would be helpful here.

jonpa updated this revision to Diff 347818.May 25 2021, 4:17 PM
jonpa marked 3 inline comments as done.

Updated per review.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8625–8626

Yes, since RC must be provided I guess this can't happen...

The RC is asserted in createVirtualRegister(), so no need to assert here.

uweigand accepted this revision.May 26 2021, 3:09 AM

This version LGTM now. Thanks!

This revision is now accepted and ready to land.May 26 2021, 3:09 AM
This revision was landed with ongoing or failed builds.May 26 2021, 8:10 AM
This revision was automatically updated to reflect the committed changes.