Page MenuHomePhabricator

[RFC][GlobalISel] Enable legalizing non-power-of-2 sized types.
ClosedPublic

Authored by kristof.beyls on Mar 2 2017, 2:43 AM.

Details

Summary

I've been working on and off on implementing/designing support for
non-power-of-2-sized types in GlobalISel. By now, I think I've iterated
to a plausible design, but the patch is large-ish and I think it might
make more sense to first discuss the design tradeoffs at a higher level.
That's why I tried to add a higher-level description here that I hope
will help in reviewing the design.

  1. Interface for targets to describe how to legalize.

In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.

For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel works today. For example, for a
target that supports 32 and 64 bit adds natively:

for (auto Ty : {s32, s64})

setAction({G_ADD, 0, s32}, Legal);

or for a target that needs a library call for a 32 bit division:

setAction({G_SDIV, s32}, Libcall);

The main conceptual change I propose to the LegalizerInfo API, is in
specifying how to legalize the type sizes for which a change of size is
needed. For example, in the above example, how to specify how all types
from i1 to i8388607 (apart from s32 and s64 which are legal) need to be
legalized and expressed in terms of operations on the available legal
sizes (again, i32 and i64 in this case). Up till now, the implementation
only allows specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar). A worse limitation is that if you'd want to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.

Instead, my proposal is to specify the legalization actions that need
to change the size of the type to be specified using a
"SizeChangeStrategy". For example:

setLegalizeScalarToDifferentSizeStrategy(
    G_ADD, 0, widenToLargerAndNarrowToLargest);

This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.

Another example, taken from the ARM backend is:

for (unsigned Op : {G_SDIV, G_UDIV}) {
  setLegalizeScalarToDifferentSizeStrategy(Op, 0,
      widenToLargerTypesUnsupportedOtherwise);
  if (ST.hasDivideInARMMode())
    setAction({Op, s32}, Legal);
  else
    setAction({Op, s32}, Libcall);
}

For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).

The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:

setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
setLegalizeVectorElementToDifferentSizeStrategy(
    G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);

As currently implemented in this patch, vector types are legalized by
first making the vector element size legal, followed by then making the
number of lanes legal. The strategy to follow in the first step is set
by a call to setLegalizeVectorElementToDifferentSizeStrategy, see
example above. The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see patch for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.

Therefore, for the above specification, some example legalizations are:

  • getAction({G_ADD, LLT::vector(3, 3)}) returns {WidenScalar, LLT::vector(3, 8)}
  • getAction({G_ADD, LLT::vector(3, 8)}) then returns {MoreElements, LLT::vector(8, 8)}
  • getAction({G_ADD, LLT::vector(20, 8)}) returns {FewerElements, LLT::vector(16, 8)}
  1. Key implementation aspects.

How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:

setScalarAction({G_ADD, LLT:scalar(1)},
                {{1, WidenScalar},  // bit sizes [ 1, 31[
                 {32, Legal},       // bit sizes [32, 33[
                 {33, WidenScalar}, // bit sizes [33, 64[
                 {64, Legal},       // bit sizes [64, 65[
                 {65, NarrowScalar} // bit sizes [65, +inf[
                });

Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is missing, this is just trying to make it
possible for targets to specify what is legal, and how non-legal types
should be legalized. Probably quite a bit of further work is needed in
the actual legalizing and the other passes in GlobalISel to support
non-power-of-2 sized types.

I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.

In the existing targets having some support for GlobalIsel, I tried to not
change the semantics of what is defined in setAction at the moment for ARM,
X86 and AMDGPU.

This drops the need for:

  • LLT::{half,double}...Size().

This might make legalization slower than before (I didn't try to measure
it yet), but I'm assuming that by introducing one or a few caches (see
FIXMEs), we can remove most of the overhead. I thought I'd try to get
some feedback on the high-level design before putting too much further
effort in...

Diff Detail

Repository
rL LLVM

Event Timeline

kristof.beyls created this revision.Mar 2 2017, 2:43 AM
  • Rebased to top-of-trunk.
  • Correctly store Action info for pointer types for targets with multiple address spaces.
  • Added a few basic checks to verify correctness of a targets legalization specification.
  • Added one more static function to LegalizerInfo to make specifications shorter and more readable. (UnsupportedButFor).
  • Adapted setAction API to be able to specify if a particular setAction should be the first specification on the operation, or if it could be a refinement.
  • Introduced a few typedefs to improve readability.
  • Made legalization info identical to current ToT for all backends, so that this patch becomes NFCi.

Hi Kristof,

I haven't seen the patch at all, but what about situations where 64-bit is done with lib calls? For example, ldivmod takes 64-bit arguments and you wouldn't want to narrow them to 32-bits.

If this patch is intended to just simplify the legal vs. others, it shouldn't have a narrow-all that spans to +inf. Makes sense?

cheers,
--renato

Hi Kristof,

I haven't seen the patch at all, but what about situations where 64-bit is done with lib calls? For example, ldivmod takes 64-bit arguments and you wouldn't want to narrow them to 32-bits.

If this patch is intended to just simplify the legal vs. others, it shouldn't have a narrow-all that spans to +inf. Makes sense?

cheers,
--renato

Hi Renato,

This patch should allow to specify everything you want to do for each individual bitsize, if that's what you want.
I'm not exactly sure what exact actions you're thinking of are needed for the different sizes of div or mod, but for example, you could specify:

setAction({G_REM, LLT:scalar(1)},
          {{1, WidenScalar},  // bit sizes [ 1, 31[
           {32, Lower},       // bit sizes [32, 33[
           {33, NarrowScalar}, // bit sizes [33, 63[
           {64, Libcall}, // bit sizes [64, 65[
           {65, Unsupported} // bit sizes [65, +inf[
          });

I'm assuming the above example wouldn't be what you want to do in detail, but it just shows that for different bit sizes, you can specify to do different things to legalize.

Since the design should allow to specify what to do for all bit sizes, there should be a mapping from the set of natural numbers (all bit sizes) to what action to take.
In this patch, I chose for that mapping to be represented using a simple vector, with the boundary bit sizes where the action changes represented as an element in the array.

As this can get a bit verbose, I've added a few helper/syntactic sugar functions that help to specify typical specifications concisely. I created these functions based on the specifications that already exist in the existing backends that support GlobalISel.
The most-used example in this patch is UnsupportedButFor. An example of how it's used from the AArch64 backend is:

for (unsigned BinOp : {G_SREM, G_UREM})
  setAction({BinOp, Scalar}, UnsupportedButFor({1,8,16,32,64}, Lower));

which without this helper function, would be written as something like:

for (unsigned BinOp : {G_SREM, G_UREM})
  setAction({BinOp, Scalar}, 
        {{1, Lower},  // bit sizes [ 1, 1[
         {2, Unsupported},       // bit sizes [2, 8[
         {8, Lower},  // bit sizes [ 8, 8[
         {9, Unsupported},       // bit sizes [9, 16[
         {16, Lower},  // bit sizes [ 16, 17[
         {17, Unsupported},       // bit sizes [17, 32[
         {32, Lower},  // bit sizes [ 32, 33[
         {33, Unsupported},       // bit sizes [33, 64[
         {64, Lower},  // bit sizes [ 64, 65[
         {65, Unsupported} // bit sizes [65, +inf[
        }

You can find more examples in the changes in this patch in the {Target}LegalarizerInfo.cpp files.
Of course, the whole idea of this patch is to be able to easily specify what to do to legalize ranges of bit sizes, e.g. using "NarrowScalar" or "WidenScalar" on a wide range of bitsizes.
Without this patch, currently every single bitsize has to be explicitly enumerated, which is far from ideal.
There are a few examples of how this is done in the first version of the patch on this review. In the second version, I decided to make the patch NFC, which as a consequence means there aren't many (no?) examples of "NarrowScalar" or "WidenScalar" over a range of bit sizes, as that wasn't easily specified before.
One simple example could be:

for (unsigned BinOp : {G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL}) {
    // These operations naturally get the right answer when used on
    // GPR32, even if the actual type is narrower.
    setAction({BinOp, s1},
      {{1, WidenScalar},
       {32, Legal},
       {33, WidenScalar},
       {64, Legal},
       {65, NarrowScalar}
      });

which specifies the intended action for all bit sizes. Before this patch, you'd have to specify (have a call to setAction) for every single bit size you'd want to support. Apart from wasting a lot of memory in tables, you'd also need to decide what the largest bit size would be you'd like to support, as you wouldn't be able to specify "all bit sizes larger than this should be legalized using this action".

I hope the above makes sense?

Thanks,

Kristof

rengolin added a comment.EditedMar 13 2017, 5:06 AM
setAction({G_REM, LLT:scalar(1)},
          {{1, WidenScalar},  // bit sizes [ 1, 31[
           {32, Lower},       // bit sizes [32, 33[
           {33, NarrowScalar}, // bit sizes [33, 63[
           {64, Libcall}, // bit sizes [64, 65[
           {65, Unsupported} // bit sizes [65, +inf[
          });

I'd expect 33~63 to Widen+LibCall here.

I hope the above makes sense?

It does, but there are two issues here:

  1. How would this inter-operate with table-gen?

IIUC, the idea is to move as much as possible to table-gen. Currently (SelDAG), instructions that are described in table-gen are "legal". It would be good to re-use as much as possible of that, to avoid table-gen bloat.

Are we going to ignore that info and re-build a specific lowering database? Or re-use that for the lowering (thus needing merge, see below)? Or is this technique only for when the generic instruction doesn't map to anything in table-gen?

  1. Would this allow merging data?

When sub-arch specific decisions are concerned, having a way to override a base default case would reduce the amount of code on both table-gen and c++ parts.

For example, we could have a default catch-all case like {1,widen; 32,legal; 33,unsupp}, and later on, based on sub-arch decisions, increment legality, lib calls, etc.

cheers,
--renato

I hope the above makes sense?

It does, but there are two issues here:

Those are probably the most interesting questions around this patch that I don't have an answer too, and I hope this review can help with getting closer to an answer.
Thanks for making them explicit here. (Of course there may be other big issues hiding here, but I'm not aware of them at the moment).

The hard part is that I don't think there's a good answer yet indeed on the question on whether it is possible to incrementally specify how to legalize different bitsizes on a specific operation.
A few more thoughts inline below.

  1. How would this inter-operate with table-gen?

IIUC, the idea is to move as much as possible to table-gen. Currently (SelDAG), instructions that are described in table-gen are "legal". It would be good to re-use as much as possible of that, to avoid table-gen bloat.

Are we going to ignore that info and re-build a specific lowering database? Or re-use that for the lowering (thus needing merge, see below)? Or is this technique only for when the generic instruction doesn't map to anything in table-gen?

Indeed, it would be best not to ignore that info. Or at least not violate the DRY principle. Or if we did end up violating that principle, in an asserts-build make sure that we'd assert if the different pieces of info would conflict.
That being said, there might be a hint of a solution already in this patch. One of the helper functions to more concisely specify how to legalize all bit sizes in this patch is getWidenToLargerTypesAndNarrowToLargest.
Let me copy paste the documentation I wrote for it here:

/// Helper function for the common case where legalization for a particular
/// operation consists of widening the type to a large legal type, unless
/// there is no such type and then instead it should be narrowed to the
/// largest legal type. E.g.
/// setAction({G_ADD, LLT:scalar(1)},
///           {{1, WidenScalar},  // bit sizes [ 1, 31[
///            {32, Legal},       // bit sizes [32, 33[
///            {33, WidenScalar}, // bit sizes [33, 64[
///            {64, Legal},       // bit sizes [64, 65[
///            {65, NarrowScalar} // bit sizes [65, +inf[
///           });
/// can be shortened to:
/// setAction({G_ADD, LLT:scalar(1)},
///           getWidenToLargerTypesAndNarrowToLargest(
///            {32, Legal}, {64, Legal}));

The info that a G_ADD is legal on 32 and 64-bit types could indeed be retrieved from tablegen.
The fact that WidenScalar is a good way to legalize if there is a wider legal type is, if I'm not mistaken, target-indepedent, so that could be logic in the target-independent part.
I'm not entirely sure on the exact conditions for NarrowScalar to be an appropriate way to legalize adds that are larger than the largest legal size. Maybe it is also fully target-independent.
If it is, than the above setAction could indeed fully be derived from tablegen info and some target-independent logic.
FWIW, the above seems quite similar to what current SelDAG type-legalization does at https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/TargetLoweringBase.cpp#L1341 (if I understand that code correctly).

  1. Would this allow merging data?

When sub-arch specific decisions are concerned, having a way to override a base default case would reduce the amount of code on both table-gen and c++ parts.

For example, we could have a default catch-all case like {1,widen; 32,legal; 33,unsupp}, and later on, based on sub-arch decisions, increment legality, lib calls, etc.

With the patch as is, you indeed need to re-specify the info for all bit sizes.
It could be that I pushed the current patch way too far in breaking existing APIs and if I turned back the patch to only change the internal representations used in LegalizerInfo.cpp,
this would work. So, the idea being that tablegen info and targets specify for which data types an action can be taken that doesn't require changing bitsize, such as Legal, Lower, Libcall.
And then the target-independent logic can hopefully make decisions on most/all operations on how to adapt types to different bitsizes when that's needed.
I'll look into that.

Thanks for sharing your thoughts!

dsanders edited edge metadata.Mar 13 2017, 11:35 AM

I haven't read the actual code yet, but I've got a couple questions and a comment based on the description and the conversation so far.

Another major change is that getAction no longer returns a single action, but
returns a sequence of actions, as legalizing non-power-of-2 types may need
multiple actions. For example: findLegalAction({G_REM, 13}) should return

[(WidenScalar, 32), (Lower, 32)], indicating to first widen the s13
scalar to 32 bits, and to then lower it, assuming the setAction on SREM
was something like:
setAction({G_REM, LLT:scalar(1)},

{{1, WidenScalar},  // bit sizes [ 1, 31[
 {32, Lower},       // bit sizes [32, 33[
 {33, NarrowScalar} // bit sizes [65, +inf[
});

Does findLegalAction() need to return a sequence here? I'm thinking that it could simply be called twice:

iterator I = findLegalAction({G_REM, 13}); // *I == (WidenScalar, 32)
iterator J = findLegalAction({G_REM, 32}); // *J == (Lower, 32), I could also be given as an argument to speed up the search

Also, given that the 2nd argument to setAction() describes all the bit sizes, is the bit-size of the LLT::scalar(1) still required for something?

How would this inter-operate with table-gen?
IIUC, the idea is to move as much as possible to table-gen. Currently (SelDAG), instructions that are described in table-gen are "legal". It would be good to re-use as much as possible of that, to avoid table-gen bloat.

I agree that there's a correlation there but I don't think it's the tablegen definition that specifies that they are legal. In SelectionDAG, it's the calls to setOperationAction() that specify legality and the default is 'Legal'.

Would this allow merging data?
When sub-arch specific decisions are concerned, having a way to override a base default case would reduce the amount of code on both table-gen and c++ parts.
For example, we could have a default catch-all case like {1,widen; 32,legal; 33,unsupp}, and later on, based on sub-arch decisions, increment legality, lib calls, etc.

With the patch as is, you indeed need to re-specify the info for all bit sizes.
It could be that I pushed the current patch way too far in breaking existing APIs and if I turned back the patch to only change the internal representations used in LegalizerInfo.cpp,
this would work. So, the idea being that tablegen info and targets specify for which data types an action can be taken that doesn't require changing bitsize, such as Legal, Lower, Libcall.
And then the target-independent logic can hopefully make decisions on most/all operations on how to adapt types to different bitsizes when that's needed.
I'll look into that.

One way to allow mergable data in your current API is with an 'Inherit' action like so:

setAction({G_ADD, LLT::scalar(1)}, {{1, Inherit}, {33, WidenScalar}, {64, Legal}, {65, NarrowScalar}});

This would keep existing actions for sizes 1-32, and replace the actions for size 33 and up.

One other possibility is to move the specification to tablegen, and have it figure out an array layout that is cheap to configure. For example, it could decide to take

{{1, WidenScalar}, {32, Legal}, {33, NarrowScalar}
{{1, WidenScalar}, {32, Legal}, {33, WidenScalar}, {64, Legal}, {65, NarrowScalar}} // if 64-bit supported

and create a default array like so:

{{1, WidenScalar}, {32, Legal}, {33, NarrowScalar}, {64, NarrowScalar}, {65, NarrowScalar}}

so that when 64-bit support is enabled it can just replace elements 2 and 3 with:

{33, WidenScalar}, {64, Legal}

I haven't read the actual code yet, but I've got a couple questions and a comment based on the description and the conversation so far.

Thanks for the comments - they're very useful!

Another major change is that getAction no longer returns a single action, but
returns a sequence of actions, as legalizing non-power-of-2 types may need
multiple actions. For example: findLegalAction({G_REM, 13}) should return

[(WidenScalar, 32), (Lower, 32)], indicating to first widen the s13
scalar to 32 bits, and to then lower it, assuming the setAction on SREM
was something like:
setAction({G_REM, LLT:scalar(1)},

{{1, WidenScalar},  // bit sizes [ 1, 31[
 {32, Lower},       // bit sizes [32, 33[
 {33, NarrowScalar} // bit sizes [65, +inf[
});

Does findLegalAction() need to return a sequence here? I'm thinking that it could simply be called twice:

iterator I = findLegalAction({G_REM, 13}); // *I == (WidenScalar, 32)
iterator J = findLegalAction({G_REM, 32}); // *J == (Lower, 32), I could also be given as an argument to speed up the search

I think both options are doable (calling multiple times like your example above, or returning all the legalization steps in on go like the patch currently does).
I don't have a very strong preference on one versus the other - at the moment, just returning the full sequence of legalization steps seemed a little bit conceptually cleaner to me.
I think that the decision on which option to take probably should be done based on which one is the higher-performing one.
My expectation is that the linear or binary search through the vector might be the slowest part, and therefore gathering all legalization steps in one go may be most efficient.
But of course, if you'd keep an iterator and pass it on between different findLegalAction calls, maybe the performance difference wouldn't be big.

I'm also assuming that we'll end up caching the results to findLegalAction queries per function to speed this up, and then the speed difference may be completely irrelevant.

Also, given that the 2nd argument to setAction() describes all the bit sizes, is the bit-size of the LLT::scalar(1) still required for something?

The only relevant part is that it's a LLT::scalar and not an LLT::pointer.
I could get rid of it by having separate setScalarAction/setPointerAction functions. The idea crossed my mind before.
Probably that will make the specifications easier to read, as I agree that the LLT::scalar(1) is a bit confusing.
I'll look into changing that.

Would this allow merging data?
When sub-arch specific decisions are concerned, having a way to override a base default case would reduce the amount of code on both table-gen and c++ parts.
For example, we could have a default catch-all case like {1,widen; 32,legal; 33,unsupp}, and later on, based on sub-arch decisions, increment legality, lib calls, etc.

With the patch as is, you indeed need to re-specify the info for all bit sizes.
It could be that I pushed the current patch way too far in breaking existing APIs and if I turned back the patch to only change the internal representations used in LegalizerInfo.cpp,
this would work. So, the idea being that tablegen info and targets specify for which data types an action can be taken that doesn't require changing bitsize, such as Legal, Lower, Libcall.
And then the target-independent logic can hopefully make decisions on most/all operations on how to adapt types to different bitsizes when that's needed.
I'll look into that.

One way to allow mergable data in your current API is with an 'Inherit' action like so:

setAction({G_ADD, LLT::scalar(1)}, {{1, Inherit}, {33, WidenScalar}, {64, Legal}, {65, NarrowScalar}});

This would keep existing actions for sizes 1-32, and replace the actions for size 33 and up.

That sounds like a promising idea!
It seems to have the nice quality that you could check that "Inherit" covers all but the largest bitsize specified before, so that asserts can protect users of the API from accidentally overwriting earlier specifications.
Or in other words, checking that when using "Inherit", you only extend the specification details, not re-specify earlier specifications. I just think it'd be nice to have those kinds of asserts as on architectures with many different subTargets, it's probably easy to make some mistake somewhere.
My assumption here is that most subTarget extensions will make more bitsizes natively supported/legal, rather than fewer.
This probably needs a bit more experimenting/going through several examples to see how it would play out in practice.

One other possibility is to move the specification to tablegen, and have it figure out an array layout that is cheap to configure. For example, it could decide to take

{{1, WidenScalar}, {32, Legal}, {33, NarrowScalar}
{{1, WidenScalar}, {32, Legal}, {33, WidenScalar}, {64, Legal}, {65, NarrowScalar}} // if 64-bit supported

and create a default array like so:

{{1, WidenScalar}, {32, Legal}, {33, NarrowScalar}, {64, NarrowScalar}, {65, NarrowScalar}}

so that when 64-bit support is enabled it can just replace elements 2 and 3 with:

{33, WidenScalar}, {64, Legal}

Moving this kind of information into tablegen currently seems a bit of overkill to me - but maybe I'm not getting the full reason why it might be beneficial.
As to the idea where you can replace elements in a fixed-size array: my assumption so far is that the creation of these data structures will not be on the critical path, so not worth optimizing for that. But I could be wrong of course.
Unless the technique would have benefits beyond making the creation of the data structures faster?

  • Split LegalizerInfo::setAction into setScalarAction and setPointerAction to avoid having to specify a mostly meaningless LLT as an argument just to indicate whether the type is a scalar or a pointer(with address space).
kristof.beyls retitled this revision from [GlobalISel] Enable specifying how to legalize non-power-of-2 size types. to [GlobalISel] Enable specifying how to legalize non-power-of-2 size types. [NFC-ish].
  • Made the API change to LegalizerInfo::setAction much smaller: the setAction API is largely unchanged now. The only difference is that it no longer allows legalizationActions that change the size to be specified this way.
  • Instead, specifying how to legalize when the size of the type legalized needs to change is specified using a SizeChangeStrategy. In follow-up work, I think that these size-changing strategies will turn out to be largely target-independent, and therefore can be shared between all targets, and not need to be respecified for each target separately.
  • Split setAction into setScalarAction and setPointerAction to avoid having to specify an LLT just to indicate whether the type is a scalar or a pointer(with address space).
  • To keep this patch as NFC as possible, for AArch64, I had to come up with some complicated SizeChangeStrategies. While not pretty, it demonstrates that it is possible to create very custom SizeChangeStrategies. These ugly SizeChangeStrategies are also only expect to be there for a short while, until we make functional-change-changes to allow all non-power-of-two-sized types
  • Moved some of the implementation code from LegalizerInfo.h to LegalizerInfo.cpp
  • A lot of smaller cleanups.
  1. Would this allow merging data?

When sub-arch specific decisions are concerned, having a way to override a base default case would reduce the amount of code on both table-gen and c++ parts.

For example, we could have a default catch-all case like {1,widen; 32,legal; 33,unsupp}, and later on, based on sub-arch decisions, increment legality, lib calls, etc.

With the patch as is, you indeed need to re-specify the info for all bit sizes.
It could be that I pushed the current patch way too far in breaking existing APIs and if I turned back the patch to only change the internal representations used in LegalizerInfo.cpp,
this would work. So, the idea being that tablegen info and targets specify for which data types an action can be taken that doesn't require changing bitsize, such as Legal, Lower, Libcall.
And then the target-independent logic can hopefully make decisions on most/all operations on how to adapt types to different bitsizes when that's needed.
I'll look into that.

The above is what the latest version of the patch does now.

kristof.beyls retitled this revision from [GlobalISel] Enable specifying how to legalize non-power-of-2 size types. [NFC-ish] to [RFC][GlobalISel] Enable legalizing non-power-of-2 sized types..
kristof.beyls edited the summary of this revision. (Show Details)
t.p.northover edited edge metadata.Jul 25 2017, 3:58 PM

Hi Kristof,

Thanks for working on this and I'm really sorry it took so long to reply.

I like the basic structure and I think it should be able to represent everything we need. I originally thought that mixing the strategies with setAction calls was clunky, but I assume almost all of that is going to go away once sensible default strategies are in place for all the operations?

lib/CodeGen/GlobalISel/LegalizerInfo.cpp
45–46 ↗(On Diff #104662)

This should probably be widen-then-narrow, but I assume it's like this to minimize the functional diff.

81–89 ↗(On Diff #104662)

Since v is only used for the push_back maybe just do it directly with ifs:

auto SizeAction = std::make_pair(Type.getSizeInBits(), Action);
if (Type.isPointer())
  AddressSpace2SpecifiedActions[Type.getAddressSpace()].push_back(SizeAction);
else if (Type.isVector())
  ElemSize2SpecifiedActions[Type.getElementType().getSizeInBits()].push_back(SizeAction);
else
  ScalarSpecifiedActions.push_back(SizeAction);
253–260 ↗(On Diff #104662)

I think this is approximately

VecIdx = std::lower_bound(Vec.begin(), Vec.end()) - Vec.begin();

which has added binary-search goodness.

275–277 ↗(On Diff #104662)

I think the assertion would be reasonable.

lib/Target/AArch64/AArch64LegalizerInfo.cpp
107–108 ↗(On Diff #104662)

I think this is the same as widen_1_8_16_narrowToLargest isn't it?

Hi Kristof,

Thanks for working on this and I'm really sorry it took so long to reply.

No problem, and thanks very much for the review!

I like the basic structure and I think it should be able to represent everything we need. I originally thought that mixing the strategies with setAction calls was clunky, but I assume almost all of that is going to go away once sensible default strategies are in place for all the operations?

I'm assuming you're talking about how the target defines legality in the TargetLegalizerInfo, right? Yes, I expect most of the strategy specifications to go away there in follow-up patches that don't aim to be as NFC-ish as this one, introducing more default strategies.
I agree that the mixing of setAction and strategies and how they work together isn't fully trivial, but it seems to me that they probably would work well in practice, which is why I also liked this basic structure. So, probably, once this lands, a brief explanation might be needed somewhere on http://llvm.org/docs/GlobalISel.html - enough for target authors to understand how the setAction and SizeChangingStrategies work together.

lib/CodeGen/GlobalISel/LegalizerInfo.cpp
45–46 ↗(On Diff #104662)

Hmmm, I can't tell of the top of my head. I'll look into it.

81–89 ↗(On Diff #104662)

I think both styles are probably roughly equally readable, but I don't have a preference for one over another, so I'll go for your suggestion, thanks!

253–260 ↗(On Diff #104662)

Yeah, I thought of that while writing this, but also thought that typically the Vec array being searched to be very short, and therefore linear search to potentially be faster than binary search. But, true, that is premature optimization not based on any empirical data, and the std::lower_bound expresses the intended semantics more clearly, so I'll look into going with that.

275–277 ↗(On Diff #104662)

I'm still not entirely sure if it wouldn't be possible to come up with a theoretical example where it still would make sense for 2 consecutive actions to both NeedsLegalizingToDifferentSize().
But indeed, probably best to just assert on that and reintroduce the loops if we have an example demonstrating there really is such a case.

qcolombet edited edge metadata.Sep 25 2017, 10:31 AM

Hi Kristof,

I was under the impression that the patch is good to land, at least as a first step.
What are we missing to push this change?

Cheers,
-Quentin

Hi Kristof,

I was under the impression that the patch is good to land, at least as a first step.
What are we missing to push this change?

Cheers,
-Quentin

Hi Quentin,

This should indeed not need much more work to land. I just need to find a bit of time to:

  • rebase to ToT.
  • address the few minor remarks made by Tim during review, which shouldn't be hard.
  • do some basic correctness testing and ideally compile time impact on the test-suite.

I've been hoping to push on with the above for a little while now, but have failed so far with more urgent stuff popping up all the time....
I hope to make progress on this this week....

Thanks,

Kristof

  • Rebased to ToT.
  • Addressed all outstanding review comments.
  • Used the test-suite on AArch64 to make sure there are no correctness regressions, both in fallback mode and in assert-when-not-legalizable mode.
  • Measured compile time impact of this change: it's below the noise level I see on gathering CTMark compile time numbers on my system.

I believe this makes the patch as is ready to be committed.
I noticed that on X86, with this patch, there will be 7 new failures when GlobalISel is enabled in the test-suite, seemingly because the X86 RegisterBankSelector cannot handle G_TRUNC nodes with non-power-of-2-sized types.
As the testing on AArch64 demonstrates that those are handled correctly by the AArch64 RegisterBankSelector, I'm inclined to commit this patch as is to avoid letting this patch increase further in size.

kristof.beyls marked 9 inline comments as done.Sep 29 2017, 7:24 AM
kristof.beyls added inline comments.
lib/CodeGen/GlobalISel/LegalizerInfo.cpp
45–46 ↗(On Diff #104662)

I've made G_ADD widenToLargerTypesAndNarrowToLargest.
There are going to be lots of tiny functional differences in here that will be extremely hard to avoid, so I might as well change this.

253–260 ↗(On Diff #104662)

It turned out to be slightly less trivial than VecIdx = std::lower_bound(...); but still concise enough to go for it.

275–277 ↗(On Diff #104662)

It turns out the loops are actually needed - see new comment I put in in case NarrowScalar to explain why.

lib/Target/AArch64/AArch64LegalizerInfo.cpp
107–108 ↗(On Diff #104662)

Good catch! I removed the function and replaced its uses with wide_1_8_16_narrowToLargest

kristof.beyls marked 6 inline comments as done.Sep 29 2017, 7:25 AM
aemerson added inline comments.Sep 29 2017, 7:39 AM
include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
342 ↗(On Diff #117141)

Not to block this patch, but std::map seems a little heavy handed for use here, given I think its a BST underneath. I'm assuming DenseMap won't work because of you need to define another tombstone key. Maybe use unordered_map or a simple vector?

Use unordered_map instead of (ordered) map.

kristof.beyls marked an inline comment as done.Sep 29 2017, 8:47 AM
kristof.beyls added inline comments.
include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
342 ↗(On Diff #117141)

Thanks Amara - indeed, an ordered map is not needed.
I switched it to unordered_map.
A pre-allocated vector would waste too much space IMHO, and I don't think it's useful to add the complexity of resizing the vector at run-time until we've seen unordered_map actually is too slow in practice.
As you've guessed, I looked into using a DenseMap here before and concluded that it wouldn't work easily.

kristof.beyls marked an inline comment as done.
  • updated to ToT; adapting the G_OR narrowing support added recently by Quentin.
  • slightly improve test arm64-fallback.ll by working around not being able to legalize non-power-of-2-sized G_IMPLICIT_DEFs yet.
t.p.northover accepted this revision.Oct 25 2017, 6:27 AM

Thanks Kristof. I think this looks pretty reasonable as a starting point.

This revision is now accepted and ready to land.Oct 25 2017, 6:27 AM
This revision was automatically updated to reflect the committed changes.