I've been working on and off on implementing/designing support for
non-power-of-2-sized types in GlobalISel. By now, I think I've iterated
to a plausible design, but the patch is large-ish and I think it might
make more sense to first discuss the design tradeoffs at a higher level.
That's why I tried to add a higher-level description here that I hope
will help in reviewing the design.
- Interface for targets to describe how to legalize.
In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.
For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel works today. For example, for a
target that supports 32 and 64 bit adds natively:
for (auto Ty : {s32, s64})
setAction({G_ADD, 0, s32}, Legal);
or for a target that needs a library call for a 32 bit division:
setAction({G_SDIV, s32}, Libcall);
The main conceptual change I propose to the LegalizerInfo API, is in
specifying how to legalize the type sizes for which a change of size is
needed. For example, in the above example, how to specify how all types
from i1 to i8388607 (apart from s32 and s64 which are legal) need to be
legalized and expressed in terms of operations on the available legal
sizes (again, i32 and i64 in this case). Up till now, the implementation
only allows specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar). A worse limitation is that if you'd want to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.
Instead, my proposal is to specify the legalization actions that need
to change the size of the type to be specified using a
"SizeChangeStrategy". For example:
setLegalizeScalarToDifferentSizeStrategy( G_ADD, 0, widenToLargerAndNarrowToLargest);
This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.
Another example, taken from the ARM backend is:
for (unsigned Op : {G_SDIV, G_UDIV}) { setLegalizeScalarToDifferentSizeStrategy(Op, 0, widenToLargerTypesUnsupportedOtherwise); if (ST.hasDivideInARMMode()) setAction({Op, s32}, Legal); else setAction({Op, s32}, Libcall); }
For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).
The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:
setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal); setLegalizeVectorElementToDifferentSizeStrategy( G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);
As currently implemented in this patch, vector types are legalized by
first making the vector element size legal, followed by then making the
number of lanes legal. The strategy to follow in the first step is set
by a call to setLegalizeVectorElementToDifferentSizeStrategy, see
example above. The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see patch for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.
Therefore, for the above specification, some example legalizations are:
- getAction({G_ADD, LLT::vector(3, 3)}) returns {WidenScalar, LLT::vector(3, 8)}
- getAction({G_ADD, LLT::vector(3, 8)}) then returns {MoreElements, LLT::vector(8, 8)}
- getAction({G_ADD, LLT::vector(20, 8)}) returns {FewerElements, LLT::vector(16, 8)}
- Key implementation aspects.
How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:
setScalarAction({G_ADD, LLT:scalar(1)}, {{1, WidenScalar}, // bit sizes [ 1, 31[ {32, Legal}, // bit sizes [32, 33[ {33, WidenScalar}, // bit sizes [33, 64[ {64, Legal}, // bit sizes [64, 65[ {65, NarrowScalar} // bit sizes [65, +inf[ });
Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is missing, this is just trying to make it
possible for targets to specify what is legal, and how non-legal types
should be legalized. Probably quite a bit of further work is needed in
the actual legalizing and the other passes in GlobalISel to support
non-power-of-2 sized types.
I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.
In the existing targets having some support for GlobalIsel, I tried to not
change the semantics of what is defined in setAction at the moment for ARM,
X86 and AMDGPU.
This drops the need for:
- LLT::{half,double}...Size().
This might make legalization slower than before (I didn't try to measure
it yet), but I'm assuming that by introducing one or a few caches (see
FIXMEs), we can remove most of the overhead. I thought I'd try to get
some feedback on the high-level design before putting too much further
effort in...
Not to block this patch, but std::map seems a little heavy handed for use here, given I think its a BST underneath. I'm assuming DenseMap won't work because of you need to define another tombstone key. Maybe use unordered_map or a simple vector?