This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Add not_all_same constraint check
Needs ReviewPublic

Authored by greened on Sep 30 2020, 10:19 AM.

Details

Summary

Add a new constraint to express that not all operands of an instruction can be
assigned the same register. This is necessary for efficient use of SVE's MOVPRFX
as MOVPRFX has a restriction that its destination register may only appear as
the tied source of the following instruction. MOVPRFX is used for two purposes,
to turn a destructive operation into a non-destructive operation and to turn a
passthru-merging predicate operation into zero-merging predicate operation.

The constraint is needed when using MOVPRFX for the latter.

As an example, consider FDIV and FDIVR, where FDIVR is a "reverse" divide:

a = FDIV  p/m, b, c  # a[i] = p[i] ? b[i] / c[i] : a[i]
a = FDIVR p/m, c, b  # a[i] = p[i] ? b[i] / c[i] : a[i]

These are defined with tied constraints (simplified from existing
AArch64SVEInstrInfo.td):

let Constraints = "$Zdn = $_Zdn" in {
  // a[i] = p[i] ? b[i] / c[i] : a[i]
  defm FDIV_MERGE  : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
  // a[i] = p[i] ? b[i] / c[i] : a[i]
  defm FDIVR_MERGE : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
}

There are no separate zero-merging versions of these instructions. However,
we can define pseudo-instructions that act as zero-merging instructions:

let Constraints = "not_all_same($Zdn, $_Zdn, $Zm)" in {
  // a[i] = p[i] ? b[i] / c[i] : 0
  defm FDIV_ZERO  : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
  // a[i] = p[i] ? b[i] / c[i] : 0
  defm FDIVR_ZERO : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
}

Instruction selection can generate one of these:

vz0 = FDIV_ZERO pz1/z, vz2, vz3

If register allocation can't easily tie vz0 and vz2, it could produce this:

z0 = FDIV_ZERO p1, z2, z3

This is trivially expandable with MOVPRFX:

z0 = MOVPRFX p1/z, z2         # Both copies and makes FDIV_MERGE zero-merging
z0 = FDIV_MERGE p1/m, z0, z3

Note that even if the register allocator could tie vz0 and vz2, we'd still need
the MOVPRFX to support zero-merging so there's no adtantage to using a tied
constraint.

But say instruction selection produces this:

vz0 = FDIV_ZERO pz1/z, vz2, vz2

Note that even though both sources use the same virtual register, we may not be
able to replace this instruction with "1.0" due to traps or IEEE conformance so
we can't jst delete it.

Let's say instead of not_all_same we'd specified the constraint as "$Zdn =
$_Zdn" (i.e. tied) or had no constraints at all. Register allocation could
produce this without any extra copies:

z0 = FDIV_ZERO p1, z0, z0     # vz2 was dead after this instruction

When we go to expand the pseudo we might naively do this:

z0 = MOVPRFX p1/z, z0
z0 = FDIV_MERGE p1/m, z0, z0

Except now we've violated the constraint on MOVPRFX that its destination can
only appear on the tied source operand and nowhere else. So we need an extra
MOV:

z1 = MOV z0                   # Satisfy the MOVPRFX constraint
z0 = MOVPRFX p1/z, z0         # Needed to make FDIV_MERGE zero-merging
z0 = FDIV_MERGE p1/m, z0, z1

If we use the not_all_same constraint instead, register allocation can assign a
different register to vz0:

z1 = FDIV_ZERO p1, z0, z0

Now we can easily expand this without an extra MOV:

z1 = MOVPRFX p1/z, z0         # Both copies and makes FDIV_MERGE zero-merging
z1 = FDIV_MERGE p1, z1, z0

Note than an @earlyclobber constraint will not save us because it would
pessimize the lowering of this:

vz0 = FDIV_ZERO vp1, vz2, vz3

If vz2 were dead after the FDIV_ZERO, we could allocate vz0 and vz2 to the same
register and trivially lower the FDIV_ZERO. If vz0 had an @earlyclobber
constraint, we would not be able to allocate vz0 and vz2 to the same register,
pessimizing register allocation. Similarly, if one of vz2 or vz3 were
@earlyclobber, then vz2 and vz3 could not be the same register, which would
require an extra MOV for our second example above.

Since neither the tied nor the @earlyclobber constraint can express the needed
semantics, we need a new constraint type that accurately expresses the register
allocation restrictions on these zero-merging pseudo-instructions, which is the
role that not_all_same plays.

Diff Detail

Event Timeline

greened created this revision.Sep 30 2020, 10:19 AM
greened requested review of this revision.Sep 30 2020, 10:19 AM
greened edited the summary of this revision. (Show Details)Sep 30 2020, 10:19 AM
greened edited the summary of this revision. (Show Details)Sep 30 2020, 10:21 AM
greened edited the summary of this revision. (Show Details)
greened updated this revision to Diff 295389.Sep 30 2020, 1:14 PM

Make clang-tidy (more?) happy.

greened added inline comments.Oct 1 2020, 7:05 AM
llvm/utils/TableGen/CodeGenInstruction.cpp
16

Hmm. Is this actually a correct warning? Why would "ST" be lexigraphically before "Sm?"

Thanks for working on this @greened!

llvm/utils/TableGen/CodeGenInstruction.cpp
256

nit: s/CStr/ConstraintString/

I keep reading this as C string, whereas it's actually a std::string :)

266

If you would pass the constraint as a StringRef, then you can use its convenient split() method and also use the trim method to trim any whitespace.

285

nit: given that it may return true/false based on success/failure, can you rename it to tryToPlaceNotAllSameConstraint?

291

Is there a reason it can only be applied to a single operand? I would expect the constraint to be applied to all the operands that are described in the constraint string.

/// This holds information about one operand of a machine instruction,
/// indicating the register class for register operands, etc.
class MCOperandInfo {
public:

  [...]

  /// The lower 16 bits are used to specify which constraints are set.
  /// The higher 16 bits are used to specify the value of constraints (4 bits
  /// each).
  uint32_t Constraints;

That suggests that multiple constraints can be set per operand?

411

nit: Maybe pull the semantic checks out into a separate function?

441

Should this be >= Operands.size() ?

llvm/utils/TableGen/InstrInfoEmitter.cpp
187

This only working for 4 operands doesn't seem very future proof. I guess you can still redefine that bitfield to take '8bits' info for each constraint, so that there are 3 bytes for the 3 constraints + one byte that specifies which constraints are enabled. New constraints that are added after this one would mean changing the int32_t -> int64_t, but given that after all these years only IsTied and EarlyClobber exist, shows doesn't need to happen very often :)

greened added inline comments.Oct 7 2020, 12:03 PM
llvm/utils/TableGen/CodeGenInstruction.cpp
256

I was following existing convention but I agree with you!

291

TableGen will not allow multiple constraints on an operand. There are checks and aborts for that. Perhaps that restriction can be lifted but that's a whole other can of worms.

llvm/utils/TableGen/InstrInfoEmitter.cpp
187

Again, I think that may be a fine change as a follow-up but I'd rather not conflate it with this change. This change is large enough already. :)

greened added inline comments.Oct 8 2020, 3:14 PM
llvm/utils/TableGen/CodeGenInstruction.cpp
291

Thinking about this some more, I'll see if I can lift this restriction. It certainly would simplify the handling of this constraint since we wouldn't need all the rigmarole around finding a (single) place for it.

I don't really understand the upper/lower 16 bit thing. If we have the ability to signal 16 different constraints on an operand, why does the comment talk about four bits of value for each constraint? That would seem to mean only four constraints are allowed at once. It's an odd partition of the 32 bits available.

llvm/utils/TableGen/InstrInfoEmitter.cpp
187

I like your 8 bit value idea though. If we think it's important to do sooner rather than later (because near-future ISAs needing this constraint will have more than four operands) perhaps I can make the 8 bit change first and then land this one. I don't really know what the implications are of changing the constraint layout though. It's been this way for so long that it makes me nervous to change it. What do you think? Important to do now or can it wait?

sdesmalen added inline comments.Oct 9 2020, 9:22 AM
llvm/utils/TableGen/CodeGenInstruction.cpp
291

Thinking about this some more, I'll see if I can lift this restriction. It certainly would simplify the handling of this constraint since we wouldn't need all the rigmarole around finding a (single) place for it.

Thanks! The tests are quite confusing on how it tries to find a suitable operand to place the constraint.

I don't really understand the upper/lower 16 bit thing. If we have the ability to signal 16 different constraints on an operand, why does the comment talk about four bits of value for each constraint? That would seem to mean only four constraints are allowed at once. It's an odd partition of the 32 bits available.

The partitioning is odd indeed, but with only two constraints I doubt there was never a need to reconsider this.

llvm/utils/TableGen/InstrInfoEmitter.cpp
187

It should be fine to fix in a follow-up patch as the instructions you're trying to target only have four operands and those will be the only instructions needing the new constraint for now.
That said, from a quick glance MCInstrDesc::getOperandConstraint seems to be the only interface directly using it (returning as an int), so I think it's worth a try to see if this a one/few line change is sufficient (and if not, to just leave it for a follow-up patch)

This is still on my TODO list but got pushed aside for a bit due to other issues. I expect to be back on this in the next couple of weeks.

greened updated this revision to Diff 315984.Jan 11 2021, 7:52 PM

Updated to main, implemented some suggested changes.

greened marked 3 inline comments as done.Jan 11 2021, 8:13 PM
greened added inline comments.
llvm/utils/TableGen/CodeGenInstruction.cpp
291

I looked into this and it seems like allowing multiple contraints on an operand is going to require some major surgery. It will require that the Constraints vector in CGIOperandList::OperandInfo` become multi-dimensional, as currently the vector tracks constraints per "sub-operand."

Maybe that will be ok but I don't yet have a good handle about what the fallout from that would be.

greened added inline comments.Jan 15 2021, 8:31 AM
llvm/utils/TableGen/CodeGenInstruction.cpp
291

Actully I think a better approach would be to encode multiple constraints in ConstraintInfo and then have the generator use that to emit the operand info.

sdesmalen added inline comments.Jan 17 2021, 12:55 PM
llvm/utils/TableGen/CodeGenInstruction.cpp
291

Thanks for looking at this again @greened, I had to swap some pages back in my memory to understand the state of this :)

Looking at this patch with a slightly fresh view, I realised it may not be strictly necessary to add support for more than one constraint per operand, because the constraints can be sufficiently simplified. Some observations when there are operands with >1 constraint:

  • EarlyClobber is a more strict variant of not_all_same in that it specifically requires the earlyclobber operand to be allocated a different register from the rest. The not_all_same constraint is then already satisfied by the earlyclobber, making the constraint redundant. That means for a set of constraints, the following rule applies:
{not_all_same(Dst, Op0, Op1), earlyclobber(Dst)} => {earlyclobber(Dst)}
  • Tied operands conflict with not_all_same in that two operands cannot be the same if one of them must be different. That means the following rule applies:
{IsTied(Dst, Op0), not_all_same(Dst, Op0, Op1, Op2)} => {IsTied(Dst, Op0), not_all_same(Op1, Op2)}

The register allocator is still free to choose either Op1 or Op2 to be the same as Dst(==Op0), but not both.

  • For the case where there would only be one remaining register that needs to be different, the following rule applies:
{IsTied(Dst, Op0), not_all_same(Dst, Op0, Op1)} => {IsTied(Dst, Op0), EarlyClobber(Dst)}

This would satisfy that Op1 must be different from Dst/Op0. According to the LangRef this case is already supported:

It is permitted to tie an input to an “early-clobber” output. In that case, no other input may share the same register as the input tied to the early-clobber (even when the other input has the same value).

The only case that would not be supported is having multiple overlapping not_all_same constraints, but I'm not sure how important it is to support that (as long as TableGen gives an error when that happens).

Adding support for multiple constraints is better if we want to extend TableGen with new constraints in the future, but if my reasoning above is correct, it may not be required to consistently represent not_all_same. I thought it was worth pointing out there is something in between the current patch and major surgery. Curious to hear your thoughts!

Matt added a subscriber: Matt.Jan 19 2021, 9:13 AM
dancgr added a subscriber: dancgr.May 7 2021, 9:05 AM

@paulwalker-arm @sdesmalen I will be picking up this work, I will study this patch and see if I have any questions. Thanks!

dancgr added inline comments.May 12 2021, 9:41 AM
llvm/utils/TableGen/CodeGenInstruction.cpp
291

@sdesmalen As far as I understood, you are trying to question whether or not we need to add support for multiple constraints on the same operand, correct?

I agree with your statement. I don't think we need not_all_same and early_clobber at the same time, if early_clobber is statisfied, then not_all_same will also be satisfied. And I agree that there is a conflict between not_all_same and tied operands. Then we are left with the possible need of overlapping not_all_same constraints.

The main example is preventing this:

vz0 = FDIV_ZERO pz1/z, vz2, vz2 => z0 = FDIV_ZERO p1, z0, z0

To support this case we need only:

let Constraints = "not_all_same($Zdn, $_Zdn, $Zm)" in {
  // a[i] = p[i] ? b[i] / c[i] : 0
  defm FDIV_ZERO  : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
  // a[i] = p[i] ? b[i] / c[i] : 0
  defm FDIVR_ZERO : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
}

For that we will have: The single constraint prevents vz0 and vz2 from being assinged the same register.

vz0 = FDIV_ZERO pz1/z, vz2, vz2 => z1 = FDIV_ZERO p1, z0, z0

If that is the case, I think we can change this patch to only add an extra constraint and remove the multiple constraints aspect.

Is there anything that I missed?

@paulwalker-arm @sdesmalen I will be picking up this work, I will study this patch and see if I have any questions. Thanks!

Thank you. I'm sorry about not being present. I had a job change and can't spend time on this right now.

Allen added a subscriber: Allen.Apr 23 2022, 6:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2022, 6:33 AM
Herald added a subscriber: ctetreau. · View Herald Transcript

Could you use hasExtraSrcRegAllocReq/hasExtraDefRegAllocReq?