Extract the relevant bits with shifts.
This feels more like lower than widenScalar as it replaces the G_UNMERGE_VALUES with G_LSHR+G_TRUNC. I would have expected widenScalar to change something like:
%1, %2 = G_UNMERGE_VALUES %0
%5 = G_ANYEXT %0 %1, %2, %3, %4 = G_UNMERGE_VALUES %5
where %3 and %4 are unused. The main limitation with that is that sizeof(%5) would have to be a multiple of sizeof(%1). If that's not true then it would have to lower instead as G_UNMERGE_VALUES wouldn't be able to handle it.
What was the reason for emitting G_LSHR+G_TRUNC?
I’m not quite sure where to handle the various steps for merge/unmerge. D65447 is essentially the same, except it bitcasts from a vector which seems more like a lower action. Lower however does not have a specified type for it. I thought using the type hint for these primitives is more relevant than the operation itself.
I’m also having trouble deciding whether to handle the source or destination type for some of these. The fact that widen/narrow are distinct actions is something of a complication. Sometimes a wider type needs to be introduced when narrowing, and it’s not clear to me how it should be handled. This is sometimes avoidable by doing something resembling the opposite on the other type index.
I’m also not sure in cases where the types do not evenly divide whether a single legalize step should attempt to deal with it, or if the target legalization rules should have the burden of specifying actions to intermediate types which would be simpler to handle
I agree it's rather fuzzy. My understanding is that widenScalar/narrowScalar expect to emit the same core operations but on intermediates of the given type plus some glue MIR to make it all fit. Similarly for moreElements/fewerElements. Lower/Libcall (and Bitcast if we were to add one) expect to change the operations but stick to the same type. Custom doesn't really have any expectations and is free to do whatever.
FWIW, I'd like to push the legalizer more towards only having two real actions: Legal and ChangeIt. Where ChangeIt is given a function that makes the desired change (those functions could be from the widenScalar/lower/etc. family). Everything beyond that like widenScalar/lower/etc. is just sugar. The main reason I want to move in this direction is because it eliminates the Custom action. At the moment if you use Custom then one function has to be capable of every change you might want to make even if different cases are completely different.
Could you show me an example of needing to widen a type as part of narrowing?
You mean for things like widening/narrowing the sources of a G_MERGE_VALUES or the results of a G_UNMERGE_VALUES? Changing the other side of these two is fairly simple (you just add undefs or unused results and G_ANYEXT/G_TRUNC the one that has a remainder) but changing them from this side is tricky.
I think the intent is that a single legalize step is supposed to handle it but it's certainly possible that we haven't given that enough thought.
We currently have a mix of different strategies to deal with cases that don't evenly divide, although it would probably be better to have one consistent one. There is a tradeoff in complexity between LegalizerHelper and the target's LegalizerInfo rules.
For AMDGPU, to a 1st order approximation, every type should be broken up into 32-bit pieces. One example I've been looking at in the past couple of days is:
--- name: test_unmerge_s16_s48 body: | bb.0: liveins: $vgpr0_vgpr1 %0:_(s64) = COPY $vgpr0_vgpr1 %1:_(s48) = G_TRUNC %0 %2:_(s16), %3:_(s16), %4:_(s16) = G_UNMERGE_VALUES %1 %5:_(s32) = G_ANYEXT %2 %6:_(s32) = G_ANYEXT %3 %7:_(s32) = G_ANYEXT %4 $vgpr0 = COPY %5 $vgpr1 = COPY %6 $vgpr2 = COPY %7 ...
I want this to end up with s32, s32 = G_UNMERGE_VALUES s64, with shifts on the 32-bit elements to get the used 16-bit parts. I'm not sure if this logically is a widening of the result s16, or a narrowing of the source s48. If I try to treat it as a narrowing of the source, it is convenient to widen the original source so that a sub-unmerge can be created to s32s, before unmerging those to the original output registers (with an extra unused output)
The current strategy widens the result type to the requested s32, and multiplies the source type to s96, and produces some ugly s96 shifts which get expanded which is what I'm now trying to avoid. This strategy is fine when the result source is at most s32.
I see a few too many options for handling this:
These all involve some guesswork for when it makes sense to try to use a shift in the original source type. At some point, I think any non-legal register width unmerge will need to emit some kind of shift. So far I've been trying to make a reasonable guess when the requested type ends up matching the source, and making this a lower would require more target defined legalization rules to switch to lowering at some point.
I can also imagine a system where the legalize action specifies the types for all type indexes, although that probably makes things more complicated for the legalizer.