This is an archive of the discontinued LLVM Phabricator instance.

GlobalISel: Allow bitcount ops to have different result type
ClosedPublic

Authored by arsenm on Jan 25 2019, 8:48 AM.

Details

Summary

For AMDGPU the result is always 32-bit for 64-bit inputs.

Diff Detail

Event Timeline

arsenm created this revision.Jan 25 2019, 8:48 AM
aemerson accepted this revision.Jan 30 2019, 1:43 PM

Seems ok, but maybe we should have some verifier checks that the dest size is wide enough to count the bits of the source?

This revision is now accepted and ready to land.Jan 30 2019, 1:43 PM
arsenm closed this revision.Jan 30 2019, 6:09 PM

r352717

rtereshin added inline comments.
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

Hi Matt @arsenm,

Looks like adding widenScalarDst call for type index 0 is not enough to adjust the widening to the new type constraints in full. All the code below it that now would handle type index 1 widening appears to be written under the assumption that destination and source types are the same, and it widens both, source and the destination. It was perfectly fine an assumption before, but not anymore. If the source needs to be widened, it generates incorrect code.

Here's a unit test that reproduces one of the possible issues (thanks Aditya @aditya_nandakumar for coming up with it!):

diff --git a/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp b/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp
index 815f1e3d321..a5798693c08 100644
--- a/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp
+++ b/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp
@@ -341,6 +341,31 @@ TEST_F(GISelMITest, WidenBitCountingCTPOP) {
   EXPECT_TRUE(CheckMachineFunction(*MF, CheckStr)) << *MF;
 }
 
+// CTPOP widening.
+TEST_F(GISelMITest, WidenBitCountingCTPOP1) {
+  if (!TM)
+    return;
+
+  // Declare your legalization info
+  DefineLegalizerInfo(A, {
+    getActionDefinitionsBuilder(G_CTPOP).legalFor({{s16, s16}});
+  });
+  // Build
+  // Trunc it to s8.
+  LLT s8{LLT::scalar(8)};
+  LLT s16{LLT::scalar(16)};
+  auto MIBTrunc = B.buildTrunc(s8, Copies[0]);
+  auto MIBCTPOP = B.buildInstr(TargetOpcode::G_CTPOP, {s16}, {MIBTrunc});
+  AInfo Info(MF->getSubtarget());
+  DummyGISelObserver Observer;
+  LegalizerHelper Helper(*MF, Info, Observer, B);
+  EXPECT_TRUE(Helper.widenScalar(*MIBCTPOP, 1, s16) ==
+              LegalizerHelper::LegalizeResult::Legalized);
+
+
+  MF->dump();
+}
+
 // CTTZ_ZERO_UNDEF widening.
 TEST_F(GISelMITest, WidenBitCountingCTTZ_ZERO_UNDEF) {
   if (!TM)

Could you please take a look at this?

test/CodeGen/AMDGPU/GlobalISel/legalize-ctlz.mir
78

Is this really legal?

unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp
338

Another example of the behavior - isn't widening for type index 1 supposed to widen only the source? But as could be seen here, it ends up widening both the source and the destination. The resulting code isn't broken largely because the source and destination types of the input instruction are the same.

arsenm marked 2 inline comments as done.Feb 4 2019, 1:46 PM

These are all bugs hidden by not calling the observer when doing the index 0 legalization

unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp
338

Yes, for these the source and dest can be separately legalized

arsenm marked an inline comment as done.Feb 4 2019, 2:26 PM
arsenm added inline comments.
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

Try r353102

rtereshin added inline comments.
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

Hi Matt,

That's much better, thank you!

I feel a little bit divided though about the fact that in some cases widening one type index ends up changing the type of the other type index.

For instance, in this case widening type index 1 ends up also widening type index 0:

// CTLZ widening.
TEST_F(GISelMITest, WidenBitCountingCTLZ) {
  if (!TM)
    return;

  // Declare your legalization info
  DefineLegalizerInfo(A, {
    getActionDefinitionsBuilder(G_CTLZ).legalFor({{s16, s16}});
  });
  // Build
  // Trunc it to s8.
  LLT s8{LLT::scalar(8)};
  LLT s16{LLT::scalar(16)};
  auto MIBTrunc = B.buildTrunc(s8, Copies[0]);
  auto MIBCTLZ = B.buildInstr(TargetOpcode::G_CTLZ, {s8}, {MIBTrunc}); // We start with the type signature {s8, s8}
  AInfo Info(MF->getSubtarget());
  DummyGISelObserver Observer;
  LegalizerHelper Helper(*MF, Info, Observer, B);
  EXPECT_TRUE(Helper.widenScalar(*MIBCTLZ, 1, s16) ==  // We ask the API to widen type index 1 to s16, probably expecting type signature {s8, s16} as a result
              LegalizerHelper::LegalizeResult::Legalized);

  auto CheckStr = R"(
  CHECK: [[Trunc:%[0-9]+]]:_(s8) = G_TRUNC
  CHECK: [[Zext:%[0-9]+]]:_(s16) = G_ZEXT [[Trunc]]
  CHECK: [[Ctlz:%[0-9]+]]:_(s16) = G_CTLZ [[Zext]]  # The API ends up widening not just type index 1, but also type index 0 with the resulting {s16, s16} type signature
  CHECK: [[Cst8:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
  CHECK: [[Sub:%[0-9]+]]:_(s16) = G_SUB [[Ctlz]]:_, [[Cst8]]:_
  CHECK: [[Trunc:%[0-9]+]]:_(s8) = G_TRUNC [[Sub]]
  )";

  // Check
  EXPECT_TRUE(CheckMachineFunction(*MF, CheckStr)) << *MF;
}

In this other case widening type index 1 ends up also narrowing type index 0:

// Test a strange case where the result is wider than the source
TEST_F(GISelMITest, WidenBitCountingCTPOP2) {
  if (!TM)
    return;

  // Declare your legalization info
  DefineLegalizerInfo(A, {
      getActionDefinitionsBuilder(G_CTPOP).legalFor({{s32, s16}});
    });

  // Build
  // Trunc it to s8.
  LLT s8{LLT::scalar(8)};
  LLT s16{LLT::scalar(16)};
  LLT s32{LLT::scalar(32)};
  auto MIBTrunc = B.buildTrunc(s8, Copies[0]);
  auto MIBCTPOP = B.buildInstr(TargetOpcode::G_CTPOP, {s32}, {MIBTrunc}); // We start with type signature {s32, s8}
  AInfo Info(MF->getSubtarget());
  DummyGISelObserver Observer;
  LegalizerHelper Helper(*MF, Info, Observer, B);
  EXPECT_EQ(LegalizerHelper::LegalizeResult::Legalized,  // We ask the API to widen type index 1 to s16, probably expecting {s32, s16} as a result
            Helper.widenScalar(*MIBCTPOP, 1, s16));

  auto CheckStr = R"(
  CHECK: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC %0:_(s64)
  CHECK: [[ZEXT:%[0-9]+]]:_(s16) = G_ZEXT [[TRUNC]]:_(s8)
  CHECK: [[CTPOP:%[0-9]+]]:_(s16) = G_CTPOP [[ZEXT]]  # API ends up narrowing the type index 0 in the process with {s16, s16} type signature as a result.
  CHECK: [[COPY:%[0-9]+]]:_(s32) = G_ZEXT [[CTPOP]]:_(s16)
  )";

  EXPECT_TRUE(CheckMachineFunction(*MF, CheckStr)) << *MF;
}

I talked to @aditya_nandakumar, @volkan, and @qcolombet offline and it appears that we don't have any contract about what pre-defined operations like widenScalar, narrowScalar etc. are allowed to do with the type indices not included in the request, but I think it maybe worth introducing such a contract. Specifically the one that would state that it's guaranteed that these generic, pre-defined operations aren't allowed to change the types of the type indices except the one(s) explicitly included in the request.

I think not having these operations following such a contract makes their behavior quite unexpected for anyone who focuses on writing legalizations rules for their target w/o having an extensive knowledge about how exactly the implementation of the pre-defined API-provided operations looks like. In particular, I think it might make it easier for the users to accidentally introduce loops in legalization rules, and harder for them to understand where they are coming from.

Please let me know what do you think!

Thanks,
Roman

arsenm marked an inline comment as done.Feb 4 2019, 3:58 PM
arsenm added inline comments.
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

In general I don't think it's in general possible to avoid changing other type indexes. For example, FewerElements for a select's condition type requires doing the same FewerElements for the select type. The same applies for any vector cast. The two type indexes won't match, but they still need to have the same number of elements.

I don't think these examples actually show this though. Note in these examples, these aren't producing the fully legalized operation. They're only doing one step, but need at least one more to legalize the operation

arsenm marked an inline comment as done.Feb 4 2019, 4:08 PM
arsenm added inline comments.
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

A bigger problem I have is the order of legalization steps matters. The intermediate operations will be done in a different type if you specify the legalize actions for type index 1 before 0 vs. the other wya around. You wouldn't see the new operation with the narrower type if you legalized in the other order

rtereshin added inline comments.Feb 4 2019, 4:13 PM
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

A bigger problem I have is the order of legalization steps matters. The intermediate operations will be done in a different type if you specify the legalize actions for type index 1 before 0 vs. the other wya around. You wouldn't see the new operation with the narrower type if you legalized in the other order

That's a very good point!

dsanders added inline comments.Feb 4 2019, 4:16 PM
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

I generally agree that the types should generally be modified independently. Not following that makes it fairly easy to get into infinite loops where one type is too big and one is too small. However...

There are some difficulties with such a rule though (at least with the current definitions of operations). Suppose you have:

%dst:(s4) = G_EXTRACT %src:(s16), %idx:s32

and the legalizer is told to widen %dst to s32. G_EXTRACT currently expects all the result bits to originate from the input which isn't possible if the input stays fixed at s16. Arguably, G_EXTRACT should have a size operand anyway at which point we can treat it the excess bits as an any-extension.

There's also some performance costs to such a rule as it increases the number of hops needed to get from illegal to legal. I generally think targets should be allowed to leap directly to the right answer whenever possible. From that point of view, the limitation that we can only specify one index and type as the target for a legalization step is a bit of a nuisance. Personally, I'd like to replace the LegalizationAction with a function pointer that just performs the desired change and have a standard library of actions that can be called for the common changes. This also has the nice side-benefit of removing the Custom action and the limitations that has (the opcode must always be changed to something that is Legal rather than Custom).

qcolombet added inline comments.Feb 5 2019, 9:53 AM
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

For example, FewerElements for a select's condition type requires doing the same FewerElements for the select type. The same applies for any vector cast. The two type indexes won't match, but they still need to have the same number of elements.

That's a good example, but that doesn't mean we shouldn't come up with some kind of contract.
The thing I am after is a way to tell if we are creating loops in our legalization rules.

I haven't thought if this is practical, but we could imagine that the contract if we don't change the element type of the other operands. The number of elements can change but in a reasonable way (at least for the generic code, custom legalization does what it wants).

What do you think?

Personally, I'd like to replace the LegalizationAction with a function pointer that just performs the desired change and have a standard library of actions that can be called for the common changes.

@dsanders didn't we try something like that for ISel and the performance was terrible?

Should we revert this change to have the conversation going or am I the only one unhappy with the absence of guarantees for the other operands?

Personally, I'd like to replace the LegalizationAction with a function pointer that just performs the desired change and have a standard library of actions that can be called for the common changes.

@dsanders didn't we try something like that for ISel and the performance was terrible?

You're thinking of the hundreds-thousands of single-use lambdas that were generated. It was the ugly temporary code that was used to inject multiple statements inside an if-statement condition and looked like this:

if ([&](...) {
    if (!...) return false;
    if (!...) return false;
    if (!...) return false;
    if (!...) return false;
    return true;
    }(Arg1, Arg2)) {
      ...
}

The legalizer is already done the way I'm talking about on the rule matching side and performance was marginally better on average than the old setAction() interface.

arsenm marked an inline comment as done.Feb 5 2019, 10:11 AM
arsenm added inline comments.
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

I guess it depends what you mean by changing the other types. In this case with these, this is really expanding the operation which happens to involve a new instance of the same operation in the requested narrowed type.

Other operations really have type constraints not expressed in the simple, independent type index list (like build_vector requiring the input scalars to match the output vector element type). If you want to widen the scalars, you have to widen the result as well.

I agree legalizations should try to avoid changing the other types if possible, but for most operations I don't think it's avoidable. I was thinking we should maybe enforce in these sorts of linked cases by only allowing one of the type indexes to be specified in a mutation. I've found it kind of annoying to have to figure out what the other type index needs to do in either case, and it would simplify code to only have to worry about one of them.

For the shift amount type, and bit count result type, I still think these are more of an ask the target for what it wants to use situation rather than a type index which should be subject to legalization

The legalizer is already done the way I'm talking about on the rule matching side and performance was marginally better on average than the old setAction() interface.

Got it.
That said, to me the legalizer is no different than ISel or a sort of InstCombine, thus I would expect we eventually use the same engine/automaton for selecting/lowering what goes through. The same goes with the way we represent them, although admittedly legalization is a bit more complicated because we have to come up with a way of representing the constraints on the unbound input types.

lib/CodeGen/GlobalISel/LegalizerHelper.cpp
756

I agree legalizations should try to avoid changing the other types if possible, but for most operations I don't think it's avoidable.

I agree this is unavoidable in some case. What I have in mind is that we should somehow document this effect in the API, so that we can detect cycles and run verifier.
I understand that doing all this manually is painful and we may not want to tackle that for now, however, long term, I believe we should TableGen most of the legalization transformation and generate checks along the way.

The legalizer is already done the way I'm talking about on the rule matching side and performance was marginally better on average than the old setAction() interface.

Got it.
That said, to me the legalizer is no different than ISel or a sort of InstCombine, thus I would expect we eventually use the same engine/automaton for selecting/lowering what goes through. The same goes with the way we represent them, although admittedly legalization is a bit more complicated because we have to come up with a way of representing the constraints on the unbound input types.

I'd agree with that. Though it's worth pointing out that none of those currently build a description of what it wants to do like the legalizer currently does, they just make the change straightaway when the rules match. Removing in LegalizationAction in favour of just making the change brings the legalizer closer to the others.

Should we revert this change to have the conversation going or am I the only one unhappy with the absence of guarantees for the other operands?

I'm not happy with the current state too!

Though, maybe reverting is a bit of a hassle, there is already a follow-up committed later that fixes functional bugs in here (https://reviews.llvm.org/rL353102), it'll have to be reverted too. Maybe a follow up commit instead?

In general, I think we can continue figuring out what the contract needs to be, and adopt "don't change type indices that aren't explicitly requested to be changed unless it's impossible to do so w/o introducing broken MIR" as a guideline right away.

Also, it's just a gut feeling, so it could be a complete nonsense as I didn't give it much thought, but I feel especially bad about the version that widens one type index and narrows the other one. At least just widening more than asked to strictly increases the entire type tuple (as in partial order). Skewing the type tuple like so on the other hand might be more loop-prone, I think.

Though it's worth pointing out that none of those currently build a description of what it wants to do like the legalizer currently does, they just make the change straightaway when the rules match. Removing in LegalizationAction in favour of just making the change brings the legalizer closer to the others.

Agree. This kind of mode would only be used for debug purposes anyway (like with expensive_checks).

The type used in this case for the result has more to do with the type required for the operations after required for the legalization. The G_SUB can be done in either width, but the narrower sub seems like a more preferable canonical form.

The type used in this case for the result has more to do with the type required for the operations after required for the legalization. The G_SUB can be done in either width, but the narrower sub seems like a more preferable canonical form.

That's a very good point. On the other hand, I think it's a general and expected behavior of the Legalizer to use truncates and extends to bridge the types. So it's possible to keep the result of the bit op itself as it is, truncate it if it's safe (due to the original narrow type of the source), and do the G_SUB of the truncated type.

Subsequent steps of the legalization, if they narrow the destination of the bit op itself, will eliminate that truncate as a legalization artifact. In the same time, having G_SUB with of a narrow type generated in the process of widening something isn't much of a problem contract-wise or introducing-loops-wise because AFAIK the legalizer in general legalizes one instruction at a time based on the properties on that instruction only, so that G_SUB will be legalized later on just like any other G_SUB with no interference from the surrounding ops. That latter argument is not true when it's different operands (or type indices) of the same opcode specifically because of what we're discussing here.