This is an archive of the discontinued LLVM Phabricator instance.

[GlobalIsel][X86] Update legalization of G_ADD/G_SUB
ClosedPublic

Authored by RKSimon on Jun 1 2023, 4:29 AM.

Details

Summary

Replace the legacy legalizers for G_ADD/G_SUB with an approach that should work for the full mixture of scalar/vector types we need to support for most opcodes.

If we can decide on this as a suitable pattern we can reuse it for the other op types, but G_ADD/G_SUB are the only ones that have decent legalizer test coverage at this stage.

I've also refactored the G_CTPOP/G_LZCNT entries to match for reference (I'll commit these separately).

Diff Detail

Event Timeline

RKSimon created this revision.Jun 1 2023, 4:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2023, 4:29 AM
RKSimon requested review of this revision.Jun 1 2023, 4:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2023, 4:29 AM
Herald added a subscriber: wdng. · View Herald Transcript

This probably does not compile, but it is my prototype for G_AND:

// AND and PAND
unsigned bitWidth = 64;
auto& bitBuilder = getActionDefinitionsBuilder(G_AND);

if (Subtarget.hasAVX512()) {
  bitBuilder.legalFor({/*512 bit*/v64s8, v32s16, v16s32, v8s64,
    /* 256 bit */ v32s8, v16s16, v8s32, v4s64,
    /* 128 bit */ v16s8, v8s16, v4s32, v2s64,
    /* 64 bit */ s8, s16, s32, s64});
  bitWidth = 512;
} else if (Subtarget.hasAVX2()) {
  bitBuilder.legalFor({
    /* 256 bit */ v32s8, v16s16, v8s32, v4s64,
    /* 128 bit */ v16s8, v8s16, v4s32, v2s64,
    /* 64 bit */ s8, s16, s32, s64});
  bitWidth = 256;
} else if (Subtarget.hasSSE2()) {
  bitBuilder.legalFor({
    /* 128 bit */ v16s8, v8s16, v4s32, v2s64,
    /* 64 bit */ s8, s16, s32, s64});
  bitWidth = 128;
} else { // 64-bit
  bitBuilder.legalFor({/* 64 bit */ s8, s16, s32, s64});
}

bitBuilder.widenScalarToNextPow(0, /*Min=*/ 8)
  .clampScalar(0, s8, s64)
  .clampNumElements(0, s8, s64)
  .clampMaxNumElements(0, s8, bitWidth / 8)
  .clampMaxNumElements(0, s16, bitWidth / 16)
  .clampMaxNumElements(0, s32, bitWidth / 32)
  .clampMaxNumElements(0, s64, bitWidth / 64);

Yes, logic, mul, fp, shifts, etc are all on my wip branch (as with most of the opcodes the gisel tests need a major cleanup first).

I found this pattern in AArch64. Note that they invoke legalForonly once! In my prototype, I also legalize only once.

auto &MinMaxActions = getActionDefinitionsBuilder(
    {G_SMIN, G_SMAX, G_UMIN, G_UMAX});
if (HasCSSC)
  MinMaxActions
      .legalFor({s32, s64, v8s8, v16s8, v4s16, v8s16, v2s32, v4s32})
      // Making clamping conditional on CSSC extension as without legal types we
      // lower to CMP which can fold one of the two sxtb's we'd otherwise need
      // if we detect a type smaller than 32-bit.
      .minScalar(0, s32);
else
  MinMaxActions
      .legalFor({v8s8, v16s8, v4s16, v8s16, v2s32, v4s32});
MinMaxActions
    .clampNumElements(0, v8s8, v16s8)
    .clampNumElements(0, v4s16, v8s16)
    .clampNumElements(0, v2s32, v4s32)
    // FIXME: This sholdn't be needed as v2s64 types are going to
    // be expanded anyway, but G_ICMP doesn't support splitting vectors yet
    .clampNumElements(0, v2s64, v2s64)
    .lower();
RKSimon updated this revision to Diff 527411.Jun 1 2023, 7:17 AM

Keep track of the maximum vector element type for type clamping - still not happy with this, but definitely cleaner

I'm not certain if the legalizer API has a better way we can do this?

I'm not super happy with the legalize rules API. I think it's over-constraining. In particular I don't see the point of having separate fewer/more actions. We could just have one ChangeType/ChangeElement action.

The clamping is also really annoying in the common case where you don't want intermediate non-power-of-2s as you've noticed. Some of the existing rule orderings are also working around missing paths in the legalizer, some of which are fixed. For example if you clamp a non-power-of-2 operation, the requested type ends up being the upper bound of the range and fails for implementations that only handle evenly divisible cases

RKSimon updated this revision to Diff 527556.Jun 1 2023, 12:25 PM
RKSimon edited the summary of this revision. (Show Details)

Use legalIf() and split clampNumElements to simplify max num elements logic.

I'm pretty happy with this for now in terms of clarity, and it should work easily to let us remove the existing legacy legalizers and then add other legal ops on top.

arsenm accepted this revision.Jun 1 2023, 12:47 PM
This revision is now accepted and ready to land.Jun 1 2023, 12:47 PM
tschuett added a comment.EditedJun 1 2023, 1:03 PM

No tests? X86 shows this. How would a legalizer API for X86 look like without abusing legalIf.

The G_ADD legalizer tests are already in place for the legacy legalizer, I can add equivalent G_SUB coverage pretty easily - support for further opcodes needs better tests before they can be added. I'll start with the logic ops and integer multiplies.

If we wanted to add X86 specific legalizers, legalFor64Bit/legalForSSE2/AVX2 style wrappers etc. would probably be quite clean - can/should LegalizeRuleSet be overloaded on a per-target basis?

A wrapper class that takes an ActionBuilder and a SubTarget?

This revision was landed with ongoing or failed builds.Jun 2 2023, 3:35 AM
This revision was automatically updated to reflect the committed changes.